I want to return the "id" values from the JavaScript variable meta using BeautifulSoup and Python. Is this possible? I also don't know how to find the particular 'script' tag that contains the meta variable, because it has no unique identifier and the site has many other 'script' tags. I'm using Selenium as well, so answers that rely on it are welcome.

<script>
    var meta = "variants":[{"id":12443604615241,"price":14000}, 
    {"id":12443604648009,"price":14000}]
</script>
2
  • What have you tried so far with Python? Commented Aug 10, 2018 at 1:37
  • @FrankDiGiacomoKnarFTHUNDER Update the HTML with the parent node of the <script> tag Commented Aug 10, 2018 at 2:12

2 Answers

If you are using Selenium, there's no need to parse the HTML to get the JS variable. Just call the driver's execute_script() method to bring it into Python:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://whatever.com/')
meta = driver.execute_script('return meta')

And that's it: meta now holds the JS variable, and its type is preserved.
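To get from the returned value to the "id" values the question asks for, something like the following sketch may help. It assumes the page defines meta as an object with a "variants" array, in which case Selenium hands it back as a Python dict holding a list of dicts. The helper name variant_ids is hypothetical, and the literal dict below only stands in for the result of driver.execute_script('return meta') so the example runs without a browser.

```python
def variant_ids(meta):
    """Return the id of each entry under meta['variants']."""
    return [variant['id'] for variant in meta['variants']]


# Stand-in for the value driver.execute_script('return meta') would return,
# assuming the page defines meta as an object with a "variants" array.
meta = {
    'variants': [
        {'id': 12443604615241, 'price': 14000},
        {'id': 12443604648009, 'price': 14000},
    ]
}

print(variant_ids(meta))
```

If meta on the real page has a different shape, the key lookups would need to change accordingly.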

2 Comments

Thanks, didn't know it was that simple for my case.
This works better than expected - "maintains its type" is an understatement. The captured variable, in my case a js array, could be directly used as a Python list!

You can use the built-in re and json modules to extract JavaScript variables:

from bs4 import BeautifulSoup
import re
import json
from pprint import pprint

data = '''
<html>
<body>

<script>
    var meta = "variants":[{"id":12443604615241,"price":14000},
    {"id":12443604648009,"price":14000}]
</script>

</body>
'''

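soup = BeautifulSoup(data, 'lxml')

# soup.find('script') returns only the first <script> tag in the document.
# Capture everything after "meta =" up to the closing "}]" of the array.
match = re.search(r'meta\s*=\s*(.*?}])\s*\n', str(soup.find('script')), flags=re.DOTALL)

# The captured text lacks the outer braces, so add them to form valid JSON.
json_data = json.loads('{' + match.group(1) + '}')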

pprint(json_data)

This prints:

{'variants': [{'id': 12443604615241, 'price': 14000},
              {'id': 12443604648009, 'price': 14000}]}

2 Comments

That seems like the right idea, but I got an error: "TypeError: 'NoneType' object is not subscriptable". Remember that there are sometimes about 50 other script tags without any unique identifier on the site, so I think I need to find the one that contains the meta variable. I don't know if that's the problem. Thanks.
@FrankDiGiacomoKnarFTHUNDER I don't know the structure of the HTML you have, so it's hard to help without seeing it. All I can say is that it comes down to selecting the script you want and having the right regular expression to extract the variable.
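The TypeError reported above is what you get when re.search() finds no match and returns None, which is likely when soup.find('script') picks up the first of many script tags rather than the one defining meta. One possible way around it is to select the script by its text content. The sketch below assumes the assignment looks like the snippet in the question; the sample page, the other script tags, and the error messages are made up for illustration, and it uses the 'html.parser' backend so lxml is not required.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical page with several anonymous <script> tags; only one defines meta.
html = '''
<html><body>
<script>var unrelated = 1;</script>
<script>
    var meta = "variants":[{"id":12443604615241,"price":14000},
    {"id":12443604648009,"price":14000}]
</script>
<script>console.log("another script");</script>
</body></html>
'''

soup = BeautifulSoup(html, 'html.parser')

# Match on the script's text, since the tag has no id or class to select by.
script = soup.find('script', string=re.compile(r'\bmeta\s*='))
if script is None:
    raise ValueError('no <script> tag assigns to meta')

# Capture everything after "meta =" up to the first "}]".
match = re.search(r'meta\s*=\s*(.*?}])', script.string, flags=re.DOTALL)
if match is None:
    raise ValueError('meta assignment did not match the expected shape')

# Wrap the captured fragment in braces so it parses as a JSON object.
data = json.loads('{' + match.group(1) + '}')
ids = [variant['id'] for variant in data['variants']]
print(ids)
```

Checking for None before indexing turns the opaque TypeError into a message that says which step failed. If the real meta value is nested more deeply, the "}]" stopping point in the regular expression would need adjusting.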
