7

My page returns a JSON HTTP response which contains id: 14

Is there a way to grab this with Selenium in Python? I searched the web and could not find any solutions, and now I am wondering whether it is simply not possible. I could grab this id from the database, but I am trying to avoid that. Please tell me if there is any way around this. Thank you.

2
  • You can see the source of the page using driver.page_source. But if the format of the response is plain JSON, is it necessary to use Selenium? Or can you use something lighter-weight instead (e.g. requests, urllib2, etc...)? Commented Oct 30, 2014 at 19:49
  • Selenium is necessary here because I am running a Selenium-based test that needs that variable. Commented Oct 31, 2014 at 15:36

3 Answers

24

The source of your difficulty is that when a browser receives raw JSON data, it wraps it in a small amount of HTML so that it can be displayed to the user on the screen.

When I visit https://httpbin.org/user-agent in Firefox, for example, the following raw JSON appears in my browser window:

{"user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0"
}

But in fact Firefox (and Chrome) has wrapped the JSON in a bit of extra HTML in order to create a document it can actually display. Here is the HTML that Firefox wraps it in, which I can see right in the JavaScript console by evaluating the expression document.documentElement.innerHTML:

<head><link rel="alternate stylesheet" type="text/css"
 href="resource://gre-resources/plaintext.css" title="Wrap Long Lines"></head>
 <body><pre>{"user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:42.0)
 Gecko/20100101 Firefox/42.0"
}
</pre></body>

Using BeautifulSoup to parse the HTML, as suggested in another answer, has two serious disadvantages: it introduces a new dependency to your project, and will also be quite slow compared to taking advantage of the fact that the browser will already have parsed the HTML for you and have the resulting DOM ready for your use.

To have the browser extract the JSON for you, simply ask it for the text inside the <body> element. All of the extra structure that the browser has added will be excluded, and the pure JSON will be returned:

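# Selenium 4 locator API; older releases spelled this find_element_by_tag_name('body')
from selenium.webdriver.common.by import By
driver.find_element(By.TAG_NAME, 'body').text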

Or, if you want it parsed into a Python data structure:

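import json
from selenium.webdriver.common.by import By

# Selenium 4 locator API; older releases spelled this find_element_by_tag_name('body')
json.loads(driver.find_element(By.TAG_NAME, 'body').text)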
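
For the question's case, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the name json_from_body is made up for illustration, the response is assumed to look like {"id": 14}, and the browser is assumed to render the response as plain text inside <body> (a built-in JSON viewer, if enabled, may change that). It reads the body text through execute_script, which avoids depending on a particular locator API:

```python
import json


def json_from_body(driver):
    """Parse the JSON that the browser is currently displaying.

    `driver` is assumed to be a Selenium WebDriver whose current page is a
    raw JSON response rendered as plain text inside <body>.
    """
    # execute_script runs JavaScript in the page and returns its result;
    # textContent is the body text without the wrapping tags.
    body_text = driver.execute_script("return document.body.textContent;")
    # json.loads tolerates leading and trailing whitespace around the document.
    return json.loads(body_text)


# Usage, assuming the page returned something like {"id": 14}:
# record_id = json_from_body(driver)["id"]
```

If the key is missing, the lookup raises KeyError, which is usually what a test wants.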

2 Comments

This is clearly a much better solution! p.s. love your PyCon videos Brandon
The same works with selenium+splinter: br.find_by_tag('body').text (instead of br.html)
6

You can use BeautifulSoup to parse the page and extract the JSON. The code you need should look something like this. You may need to change the soup.find call if the JSON isn't directly in the body of the response.

from bs4 import BeautifulSoup
import json

# Name the parser explicitly; without it bs4 picks one itself and emits a warning
soup = BeautifulSoup(driver.page_source, "html.parser")
dict_from_json = json.loads(soup.find("body").text)

1 Comment

Asking Python to parse the raw HTML not only requires an extra third-party library, but will be rather slow compared to just letting the browser do the parsing.
0

The other solutions didn't work for me. I found this solution using requests to be fast and simple:

import requests
requests.get(browser.current_url).json()
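
One caveat: requests sends a fresh HTTP request outside the browser, so it carries none of the browser's cookies or session state, and an endpoint that creates a record on every call may return a different id. If the page needs a login, a rough sketch like the following can forward the browser's cookies. The helper name cookies_for_requests is hypothetical, and it assumes plain name/value cookies are enough for the site:

```python
def cookies_for_requests(browser):
    """Flatten Selenium's cookie list into a name -> value dict."""
    # get_cookies() returns a list of dicts, each with "name" and "value" keys.
    return {cookie["name"]: cookie["value"] for cookie in browser.get_cookies()}


# Usage (requires the third-party requests package):
# import requests
# data = requests.get(browser.current_url,
#                     cookies=cookies_for_requests(browser)).json()
```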
