I am trying to use Python to download the results from the following website:

http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY

I tried mechanize first, but then realized that the Download File link relies on JavaScript, which mechanize does not support. My code so far fetches the web page, as shown below. I am stuck on how to reach the Download File link on that page so I can save the data to my machine.

import urllib2

def downloadFile():

    url = 'http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY'
    t = urllib2.urlopen(url)
    s = t.read()
    print s

The results that are printed are

<html>
<head></head>
<body>
  <form name="apiForm" method="POST">
    <input type="hidden" name="rowids">
    <input type="hidden" name="annot">

    <script type="text/javascript">
      document.apiForm.rowids.value="4791928,3403495,....";   //There are really about 500 values
      document.apiForm.annot.value="48";
      document.apiForm.action = "chartReport.jsp";
      document.apiForm.submit();
    </script>

  </form>
</body>
</html>

Does anybody know how I can select and move to the Download File page and save that file to my computer?

  • Did my solution work for you? Commented Jun 27, 2011 at 19:03

1 Answer

After some more research on that link, I came up with this. You can definitely use mechanize to do it.

import mechanize

def getJSVariableValue(content, variable):
    value_start_index = content.find(variable)
    value_start_index = content.find('"', value_start_index) + 1

    value_end_index = content.find('"', value_start_index)

    value = content[value_start_index:value_end_index]
    return value

def getChartReport(url):
    br = mechanize.Browser()
    resp = br.open(url)
    content = resp.read()
    br.select_form(name = 'apiForm')
    br.form.set_all_readonly(False)
    br.form['rowids'] = getJSVariableValue(content, 'document.apiForm.rowids.value')
    br.form['annot'] = getJSVariableValue(content, 'document.apiForm.annot.value')
    br.form.action = 'http://david.abcc.ncifcrf.gov/' + getJSVariableValue(content, 'document.apiForm.action')

    print br.form['rowids']
    print br.form['annot']

    br.submit()

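    # Follow the "Download File" link on the results page.
    resp = br.follow_link(text_regex=r'Download File')
    content = resp.read()
    # A context manager makes sure the file is flushed and closed.
    with open('output.txt', 'w') as f:
        f.write(content)
    # Return the data so the caller's chart_output is not None.
    return content


url = 'http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY'
chart_output = getChartReport(url)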
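
For anyone without mechanize, here is a rough Python 3 sketch of the same idea using only the standard library: pull the values out of the inline script and build the POST request that the script would have submitted. The names extract_js_assignment and build_form_post are made up for this illustration, and I am assuming the page keeps the structure shown in the question (double-quoted values assigned to document.apiForm fields). It only builds the request; it does not contact the server or locate the Download File link.

```python
import re
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def extract_js_assignment(html, target):
    """Return the double-quoted string assigned to `target`, or None if absent.

    Hypothetical helper: assumes assignments look like  target = "value";
    """
    pattern = re.escape(target) + r'\s*=\s*"([^"]*)"'
    match = re.search(pattern, html)
    return match.group(1) if match else None


def build_form_post(page_url, html, fields=("rowids", "annot")):
    """Build the POST request that the page's inline script would submit."""
    action = extract_js_assignment(html, "document.apiForm.action")
    if action is None:
        raise ValueError("form action not found in page")

    data = {}
    for name in fields:
        value = extract_js_assignment(html, "document.apiForm.%s.value" % name)
        if value is None:
            raise ValueError("value for %r not found in page" % name)
        data[name] = value

    # The action is relative ("chartReport.jsp"), so resolve it against the page URL.
    target_url = urljoin(page_url, action)
    # Supplying a body makes urllib treat the request as a POST.
    return Request(target_url, data=urlencode(data).encode("ascii"))
```

Passing the returned request to urllib.request.urlopen should give you the results page, which you would then still need to search for the download link; I have not tested that part against the live site.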

5 Comments

I have tried versions of this and get the following error:

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    downloadFile()
  File "C:\Python27\DAVIDLink.py", line 14, in downloadFile
    content = br.follow_link(text_regex=r"Download File").read()
  File "C:\Python27\lib\mechanize.py", line 569, in follow_link
    return self.open(self.click_link(link, **kwds))
  File "C:\Python27\lib\mechanize.py", line 553, in click_link
    link = self.find_link(**kwds)
  File "C:\Python27\lib\mechanize.py", line 620, in find_link
    raise LinkNotFoundError()
LinkNotFoundError

From my understanding, this is because mechanize cannot interpret the JavaScript that the website relies on. @Jordan
Ah, you are correct. I accidentally missed that step when browsing it for myself. I'll see if I can figure something out.
@Marea OK, I edited my answer with a working, tested example. It writes the content to a file named output.txt in the folder you run the script from. I'm sure you can modify it to suit your needs.
