Downloading Javascript File from Website using Python

Question

I am trying to use python to download the results from the following website:

http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY

I was attempting to use mechanize before I realized that the Download File page is reached through JavaScript, which mechanize does not execute. My code so far opens the web page, as shown below. I am stuck on how to reach the Download link on the page in order to save the data to my machine.

import urllib2

def downloadFile():

    url = 'http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY'
    t = urllib2.urlopen(url)
    s = t.read()
    print s

The printed result is:

<html>
<head></head>
<body>
  <form name="apiForm" method="POST">
    <input type="hidden" name="rowids">
    <input type="hidden" name="annot">

    <script type="text/javascript">
      document.apiForm.rowids.value="4791928,3403495,....";   //There are really about 500 values
      document.apiForm.annot.value="48";
      document.apiForm.action = "chartReport.jsp";
      document.apiForm.submit();
    </script>

  </form>
</body>
</html>

Does anybody know how I can select and move to the Download File page and save that file to my computer?
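The script in that output only assigns string literals to the form's fields and action, so the values can be read out of the page text with a regular expression instead of running the JavaScript. A minimal Python 3 sketch, assuming every value is a double-quoted literal as in the output above; the names `ASSIGNMENT` and `extract_form_values` are hypothetical:

```python
import re

# Matches assignments such as
#   document.apiForm.rowids.value="4791928,3403495";
#   document.apiForm.action = "chartReport.jsp";
# capturing the field name (rowids, annot, action) and the quoted string.
ASSIGNMENT = re.compile(r'document\.apiForm\.(\w+)(?:\.value)?\s*=\s*"([^"]*)"')


def extract_form_values(html):
    """Return {field_name: value} for each string assigned to apiForm in the page."""
    return {name: value for name, value in ASSIGNMENT.findall(html)}
```

Calls such as `document.apiForm.submit();` contain no `=` and are ignored, so the result holds only `rowids`, `annot` and `action`.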

Did my solution work for you? – Jordan, Commented Jun 27, 2011 at 19:03

Jordan · Accepted Answer · 2011-06-23 14:52:47Z

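After some more research on that link, I came up with this. You can definitely use mechanize to do it. The api.jsp page only contains a form whose hidden fields are filled in and submitted by JavaScript. mechanize cannot run that script, but you can read the values out of the page source, set the fields yourself, and submit the form; the chart report page that comes back has the Download File link.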

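import mechanize

def getJSVariableValue(content, variable):
    """Return the double-quoted string assigned to `variable` in the page's inline JavaScript."""
    value_start_index = content.find(variable)
    if value_start_index == -1:
        raise ValueError('%s not found in page' % variable)
    # Skip to the opening quote, then read up to the closing quote
    value_start_index = content.find('"', value_start_index) + 1
    value_end_index = content.find('"', value_start_index)
    return content[value_start_index:value_end_index]

def getChartReport(url, output_path='output.txt'):
    br = mechanize.Browser()
    resp = br.open(url)
    content = resp.read()

    # Do by hand what the page's JavaScript would do: fill in the hidden
    # fields, set the form action, then submit the form.
    br.select_form(name='apiForm')
    br.form.set_all_readonly(False)  # hidden inputs are read-only by default
    br.form['rowids'] = getJSVariableValue(content, 'document.apiForm.rowids.value')
    br.form['annot'] = getJSVariableValue(content, 'document.apiForm.annot.value')
    br.form.action = 'http://david.abcc.ncifcrf.gov/' + getJSVariableValue(content, 'document.apiForm.action')

    print(br.form['rowids'])
    print(br.form['annot'])

    br.submit()

    # The chart report page has an ordinary "Download File" link to follow
    resp = br.follow_link(text_regex=r'Download File')
    content = resp.read()
    # Binary mode avoids newline translation on Windows; the file is closed automatically
    with open(output_path, 'wb') as f:
        f.write(content)
    return content


url = 'http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY'
chart_output = getChartReport(url)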
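For readers on Python 3 without mechanize, the POST that `document.apiForm.submit()` triggers can also be built with the standard library. This is a sketch, not part of the tested answer: it assumes the `rowids`, `annot` and `action` strings have already been scraped from the api.jsp page, and `build_chart_report_request` is a hypothetical name. It only builds the request; sending it is left to `urllib.request.urlopen`.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_chart_report_request(page_url, values):
    """Build (but do not send) the POST that apiForm.submit() would make.

    `values` is a dict holding the 'rowids', 'annot' and 'action' strings
    scraped from the api.jsp page.
    """
    # Resolve the relative action ("chartReport.jsp") against the page URL,
    # the same way a browser would
    action_url = urljoin(page_url, values['action'])
    # The form's fields travel URL-encoded in the request body
    body = urlencode({'rowids': values['rowids'], 'annot': values['annot']}).encode('ascii')
    return Request(action_url, data=body, method='POST')
```

Unlike mechanize, plain `urlopen` keeps no cookies between requests. If the server turns out to need the session from the first page (an assumption not verified here), build an opener with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor())` and use it for both requests. The Download File link still has to be located on the returned page, as the mechanize version does.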

answered Jun 22, 2011 at 19:02 by Jordan, edited Jun 23, 2011 at 14:52


5 Comments

Marea: I have tried versions of this. I receive the following error:

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    downloadFile()
  File "C:\Python27\DAVIDLink.py", line 14, in downloadFile
    content = br.follow_link(text_regex=r"Download File").read()
  File "C:\Python27\lib\mechanize.py", line 569, in follow_link
    return self.open(self.click_link(link, **kwds))
  File "C:\Python27\lib\mechanize.py", line 553, in click_link
    link = self.find_link(**kwds)
  File "C:\Python27\lib\mechanize.py", line 620, in find_link
    raise LinkNotFoundError()
LinkNotFoundError

Marea: From my understanding, this is because mechanize cannot interpret JavaScript, which the website is primarily written in. @Jordan

Jordan: Ah, you are correct. I accidentally missed that step when browsing it for myself. I'll see if I can figure something out.

Jordan: @Marea OK, I edited my answer with an actual working, tested example. It writes the content to a file named output.txt in the folder you run the script from. I'm sure you can modify it to suit your needs.
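The `LinkNotFoundError` in the traceback above is consistent with Marea's diagnosis: the page returned by api.jsp contains a form and a script but no `<a>` elements at all, so `follow_link` has nothing to match until the form has actually been submitted. A rough Python 3 check using only the standard library; `LinkTextCollector` and `link_texts` are hypothetical names, and this is a simplification of how mechanize finds links, not its actual code:

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the visible text of every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._in_link = True
            self.links.append('')

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_link = False

    def handle_data(self, data):
        # Only text inside an <a> element counts as link text
        if self._in_link:
            self.links[-1] += data


def link_texts(html):
    """Return the stripped text of each link found in `html`."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.links]
```

Feeding it the api.jsp output from the question yields an empty list, which is why the Download File link only becomes reachable after the hidden form is filled in and submitted.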
