
I'm trying to parse a page's HTML to get data from tags nested inside other tags, but when I prettify the response I get JavaScript instead. How do I get the information out of this JavaScript? How do I turn it into HTML? Is there a better way to get this information? This is my first question, and I apologize if I've made any mistakes. Thank you.

This is my code:

from bs4 import BeautifulSoup as bs
import requests

html = requests.get(url)
soup = bs(html.content, 'html.parser')
print(soup.prettify())

The output is what looks like a byte string of the un-prettified code, followed by:

<html>
<head>
</head>
<script language="javascript">
var strUrl = window.location.href;


if (strUrl.indexOf("modisoftinc.com") > 0)
    window.location.replace("https://www.modisoftinc.com/home.html");
if (strUrl.indexOf("www.modisoftinc.com") > 0)
    window.location.replace("https://www.modisoftinc.com/home.html");
if (strUrl.indexOf("http://modisoftinc.com") > 0)
    window.location.replace("https://www.modisoftinc.com/home.html");
if (strUrl.indexOf("www.modisoftinc.com") > 0)
    window.location.replace("https://www.modisoftinc.com/home.html");


if (strUrl.indexOf("echecks.modisoftinc.com") > 0)
    window.location.replace("https://echecks.modisoftinc.com/Account/Logon");


if (strUrl.indexOf("pos.modisoftinc.com") > 0)
    window.location.replace("https://pos.modisoftinc.com/Account/Logon");


if (strUrl.indexOf("clock.modisoftinc.com") > 0)
    window.location.replace("https://clock.modisoftinc.com/Account/Logon");


if (strUrl.indexOf("admin11.modisoftinc.com") > 0)
    window.location.replace("https://admin11.modisoftinc.com/Account/Logon");




if (strUrl.indexOf("modisoft.com") > 0)
    window.location.replace("https://www.modisoft.com/home.html");
if (strUrl.indexOf("www.modisoft.com") > 0)
    window.location.replace("https://www.modisoft.com/home.html");
if (strUrl.indexOf("http://modisoft.com") > 0)
    window.location.replace("https://www.modisoft.com/home.html");
if (strUrl.indexOf("www.modisoft.com") > 0)
    window.location.replace("https://www.modisoft.com/home.html");


if (strUrl.indexOf("echecks.modisoft.com") > 0)
    window.location.replace("https://echecks.modisoft.com/Account/Logon");

if (strUrl.indexOf("app.modisoft.com") > 0)
    window.location.replace("https://app.modisoft.com/Account/Logon");

if (strUrl.indexOf("app1.modisoft.com") > 0)
    window.location.replace("https://app1.modisoft.com/Account/Logon");

if (strUrl.indexOf("app2.modisoft.com") > 0)
    window.location.replace("https://app2.modisoft.com/Account/Logon");

if (strUrl.indexOf("pos.modisoft.com") > 0)
    window.location.replace("https://pos.modisoft.com/Account/Logon");

if (strUrl.indexOf("clock.modisoft.com") > 0)
    window.location.replace("https://clock.modisoft.com/Account/Logon");

    if (strUrl.indexOf("admin11.modisoft.com") > 0)
    window.location.replace("https://admin11.modisoft.com/Account/Logon");



if (strUrl.indexOf("modisoftrewards.com") > 0)
    window.location.replace("https://www.modisoftrewards.com/index.html");
if (strUrl.indexOf("www.modisoftrewards.com") > 0)
    window.location.replace("https://www.modisoftrewards.com/index.html");
if (strUrl.indexOf("http://modisoftrewards.com") > 0)
    window.location.replace("https://www.modisoftrewards.com/index.html");
if (strUrl.indexOf("www.modisoftrewards.com") > 0)
    window.location.replace("https://www.modisoftrewards.com/index.html");






   if (strUrl.indexOf("localhost") > 0)
       window.location.replace("Account/Logon");
</script>
<body>
</body>
</html>
  • You can't turn it into HTML. Depending on what you want from the page, you will need either to automate a browser so the JavaScript runs and then grab what you want, or to use the browser's network tab (or another network monitoring tool) to see whether the content you want is available from another URI via XHR, and then make a request to that endpoint. Commented Jun 30, 2020 at 4:40

1 Answer

How do I get the information out of this javascript? How do I turn it into html?

You need browser automation (Selenium, headless Chrome) to execute the on-site JS. Once it runs, the JS fills in the HTML with the missing data. E.g.:

  1. https://webscraping.pro/javascript-rendering-library-for-scraping-javascript-sites/

  2. https://webscraping.pro/java-library-to-scrape-linkedin-its-data-affiliates/
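As a rough illustration of the browser-automation route, here is a minimal sketch assuming Selenium and a local Chrome install are available. The function names `make_headless_chrome` and `fetch_rendered_html` are made up for this example. The driver is passed in as a factory only so the function can be exercised without launching a real browser.

```python
def make_headless_chrome():
    """Create a headless Chrome driver (assumes selenium and Chrome are installed)."""
    # Imported lazily so the rest of the module works without selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(url, driver_factory=make_headless_chrome):
    """Load url in a browser so its JavaScript runs, then return the page HTML."""
    driver = driver_factory()
    try:
        driver.get(url)            # the browser executes the page's scripts
        return driver.page_source  # HTML as the browser currently sees it
    finally:
        driver.quit()              # always close the browser, even on errors
```

The returned string can be handed to `bs(html, 'html.parser')` exactly as in the question. If the page redirects or loads data late, an explicit wait before reading `page_source` may be needed.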

Hack

In some cases you can use plain code (Python, PHP) to imitate the requests the JS makes (usually XHR/Ajax) and get the missing info directly. E.g.: Scrape a JS Lazy load page by Python requests
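For the page in the question, the script makes no XHR calls at all: it only compares the current URL against a list of host names and calls `window.location.replace(...)`. A variant of the same hack is to read those rules out of the script text and request the target page yourself. The sketch below assumes the script keeps the exact `indexOf(...) > 0` / `window.location.replace(...)` shape shown in the question; `REDIRECT_RULE` and `js_redirect_targets` are names invented for this example.

```python
import re
from urllib.parse import urljoin

# Matches: if (strUrl.indexOf("NEEDLE") > 0) window.location.replace("TARGET");
REDIRECT_RULE = re.compile(
    r'strUrl\.indexOf\("([^"]+)"\)\s*>\s*0\)\s*'
    r'window\.location\.replace\("([^"]+)"\)'
)


def js_redirect_targets(script_text, page_url):
    """Return every redirect target whose rule matches page_url, in script order."""
    targets = []
    for needle, target in REDIRECT_RULE.findall(script_text):
        # Mirror the JS test: the needle must occur at an index greater than 0.
        if page_url.find(needle) > 0:
            # Resolve relative targets such as "Account/Logon" against the page URL.
            targets.append(urljoin(page_url, target))
    return targets
```

With the question's code, the script text could come from `soup.find('script').string`, and a matching target can then be fetched with `requests.get(...)`. When several rules match, which one the browser ends up on is an assumption you would need to check; the last match is a reasonable first guess.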
