I'm trying to use mechanize to grab fares for New York's Metro-North Railroad from this site:
http://as0.mta.info/mnr/fares/choosestation.cfm

The problem is that when you select an origin station in the first list, the site uses JavaScript to populate the list of possible destinations. I have written equivalent code in Python, but I can't get the whole thing working. Here's what I have so far:

import mechanize
import cookielib
from bs4 import BeautifulSoup

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

br.open("http://as0.mta.info/mnr/fares/choosestation.cfm")

br.select_form(name="form1")
br.form.set_all_readonly(False)

origin_control = br.form.find_control("orig_stat", type="select")
origin_control_list = origin_control.items
origin_control.value = [origin_control.items[0].name]

destination_control_list = reFillList(0, origin_control_list)

destination_control = br.form.find_control("dest_stat", type="select")
destination_control.items = destination_control_list
destination_control.value = [destination_control.items[0].name]

response = br.submit()
response_text = response.read()
print response_text

I haven't included the code for reFillList() because it's long, but assume it correctly creates a list of mechanize.option objects. Python doesn't complain about anything, but on submit I get the HTML for this alert:

"Fare information for travel between two lines is not available on-line. Please contact our Customer Information Center at 511 and ask to speak to a representative for further information."

Am I missing something here? Thanks for all the help!

1 Answer

If you know the station IDs, it is easier to POST the request yourself:

import mechanize
import urllib

post_url = 'http://as0.mta.info/mnr/fares/get_fares.cfm'

orig = 295  # BEACON FALLS
dest = 292  # ANSONIA

params = urllib.urlencode({'dest_stat': dest, 'orig_stat': orig})
rq = mechanize.Request(post_url, params)

fares_page = mechanize.urlopen(rq)

print fares_page.read()
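
In case it helps anyone on Python 3, where urllib.urlencode moved to urllib.parse.urlencode: below is a minimal sketch that builds the same POST request with only the standard library. The helper name build_fares_request is my own, not something from mechanize or the site, and the sketch deliberately stops short of sending the request, so it makes no claim about what the server returns.

```python
from urllib.parse import urlencode
from urllib.request import Request

POST_URL = 'http://as0.mta.info/mnr/fares/get_fares.cfm'


def build_fares_request(orig, dest):
    """Build (but do not send) the POST request for one station pair.

    Hypothetical helper for illustration; the field names orig_stat and
    dest_stat are the ones used in the snippet above.
    """
    body = urlencode({'orig_stat': orig, 'dest_stat': dest}).encode('ascii')
    # Attaching a data payload is what makes urllib issue a POST.
    return Request(POST_URL, data=body)


rq = build_fares_request(295, 292)  # BEACON FALLS -> ANSONIA
print(rq.get_method())  # POST
print(rq.data)          # b'orig_stat=295&dest_stat=292'
```

Passing the returned object to urllib.request.urlopen() would send it, which is the Python 3 counterpart of the mechanize.urlopen(rq) call above.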

If you have the code to find the list of destination IDs for a given origin ID (i.e. a variant of reFillList()), you can then run this request for each combination:

import mechanize
import urllib
import urllib2
from bs4 import BeautifulSoup

url = 'http://as0.mta.info/mnr/fares/choosestation.cfm'
post_url = 'http://as0.mta.info/mnr/fares/get_fares.cfm'

def get_fares(orig, dest):
    # POST the two station IDs and print the raw HTML of the fares page
    params = urllib.urlencode({'dest_stat': dest, 'orig_stat': orig})
    rq = mechanize.Request(post_url, params)

    fares_page = mechanize.urlopen(rq)
    print(fares_page.read())

pool = BeautifulSoup(urllib2.urlopen(url).read())

# let's keep our stations organised
stations = {}

# dict by station id; skip any option without a value (e.g. a placeholder entry)
for option in pool.find('select', {'name': 'orig_stat'}).find_all('option'):
    if option.get('value'):
        stations[option['value']] = {'name': option.string}

# iterate over all routes
for origin in stations:
    destinations = get_list_of_dests(origin)  # use your code for this
    stations[origin]['dests'] = destinations

    for destination in destinations:
        print('Processing from %s to %s' % (origin, destination))
        get_fares(origin, destination)
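
If you would rather not depend on BeautifulSoup for the "dict by station id" step, here is a rough sketch of the same idea using Python 3's built-in html.parser. The class name StationOptionParser and the SAMPLE_HTML markup are made up for illustration; the sample is a hand-written stand-in and not a copy of the real page, so treat the structure it assumes (option tags with a value attribute inside a named select) as an assumption to check against the actual HTML.

```python
from html.parser import HTMLParser


class StationOptionParser(HTMLParser):
    """Collect {value: label} for <option> tags inside one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.stations = {}
        self._in_select = False
        self._current_value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'select' and attrs.get('name') == self.select_name:
            self._in_select = True
        elif tag == 'option' and self._in_select:
            # Options with a missing or empty value (placeholders) are skipped.
            value = attrs.get('value')
            self._current_value = value if value else None
            if value:
                self.stations[value] = ''

    def handle_data(self, data):
        if self._current_value is not None:
            self.stations[self._current_value] += data.strip()

    def handle_endtag(self, tag):
        if tag == 'option':
            self._current_value = None
        elif tag == 'select':
            self._in_select = False


# Hand-written sample markup (an assumption, not the real page source).
SAMPLE_HTML = """
<select name="orig_stat">
  <option value="">Select a station</option>
  <option value="292">ANSONIA</option>
  <option value="295">BEACON FALLS</option>
</select>
<select name="dest_stat">
  <option value="">Select destination</option>
</select>
"""

parser = StationOptionParser('orig_stat')
parser.feed(SAMPLE_HTML)
print(parser.stations)  # {'292': 'ANSONIA', '295': 'BEACON FALLS'}
```

The resulting dict plays the same role as stations in the script above: iterate over its keys, look up the destinations for each origin with your own code, and post each pair.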