3
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import java.io.FileReader;

public class Main {

    public static void main(String[] args) {

        ScriptEngineManager manager = new ScriptEngineManager();
        ScriptEngine engine = manager.getEngineByName("js");
        try {
            FileReader reader = new FileReader("C:/yourfile.js");
            engine.put("urlfromjava", "http://www.something.com/?asvb");
            engine.eval(reader);
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Right now, yourfile.js contains the following:

function urlget(url)
{
    print("URL:"+url);
    var loc = window.open(url);
    var link = document.getElementsByTagName('a')["61"].href;
    return ("\nLink is: \n"+link); 

}
var x = urlget(urlfromjava);
print(x);

I get the error

"javax.script.ScriptException: sun.org.mozilla.javascript.internal.EcmaError: ReferenceError: "window" is not defined"

How can I open a URL and get its links from Java?

5 Answers

6

You can embed Env.js in Rhino to get this kind of functionality.


2

According to the documentation:

The window object represents an open window in a browser.

Since you are not executing your script in a browser, the window object is not defined.

You can read the page with the URL/URLConnection classes and feed its content to the ScriptEngine. There is a tutorial here.

2 Comments

I like the answer, except that W3Schools is as much "the documentation" as Wikipedia or a random web search result. So the first two lines of this answer are incorrect.
I'm surprised no one told you to use JavaFX. You can achieve headlessness by using a JFrame.
0

In JavaScript, window refers to the browser window. When you execute this script from Java there is no browser window, so the reference cannot be resolved and you get the error. You can use the URL class in Java to fetch the content of the URL.

3 Comments

Actually, the content of the URL has hyperlinks that I can retrieve only by using document.getElementsByTagName('a'). So I need to load the URL in memory, run that call, and get the link.
You can parse the string using a regex pattern.
The link is not there in the source of the page. It gets loaded by JavaScript executed on server side.
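
A minimal sketch of the regex suggestion above, assuming the page markup is already held in a String. The class name HrefRegexDemo, the extractLinks helper, and the sample HTML are made up for illustration. A regex only sees anchors that exist in the raw markup, so, as the last comment points out, links that are injected by JavaScript will not be found this way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefRegexDemo {

    // Matches href="..." or href='...' inside an <a ...> tag; group 2 is the value.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the href values of all anchors in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup; real input would be the downloaded page.
        String html = "<p><a href=\"http://example.com/a\">A</a> "
                + "<a class='x' href='/b'>B</a></p>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Regexes are fragile against unusual markup (unquoted attributes, anchors inside comments), so this is only a rough fit for simple pages.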
0

Try this:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL yahoo = new URL("http://www.yahoo.com/");
        URLConnection yc = yahoo.openConnection();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(yc.getInputStream()))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                // Or collect the lines in a StringBuilder (sb.append(inputLine)) and
                // pass sb.toString() to the method that extracts links; see getLinks below.
                System.out.println(inputLine);
            }
        }
    }
}



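// Imports needed by the members below. StringUtils is assumed to come from Apache Commons Lang 3.
import java.util.HashSet;
import java.util.Set;
import org.apache.commons.lang3.StringUtils;

// These members go inside a class of your choice.
private static final String CLOSING_QUOTE   = "\"";
private static final String HREF_PREFIX     = "href=\"";
private static final String HTTP_PREFIX     = "http://";

// Splits the page on href=" and keeps every target that starts with http://
// (relative and https:// links are ignored).
public static Set<String> getLinks(String page) {
    Set<String> links = new HashSet<String>();
    String[] rawLinks = StringUtils.splitByWholeSeparator(page, HREF_PREFIX);
    for (String str : rawLinks) {
        if (str.startsWith(HTTP_PREFIX)) {
            // Everything up to the closing quote is the link target
            links.add(StringUtils.substringBefore(str, CLOSING_QUOTE));
        }
    }
    return links;
}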
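
To show how the two snippets above might fit together, here is a rough, JDK-only sketch that collects the page into a StringBuilder and then extracts the links. The class name PageLinksDemo, the readAll helper, and the sample HTML are hypothetical; getLinks here is a plain-JDK analogue of the Commons Lang version, not a drop-in copy. A StringReader stands in for the network stream so the example runs offline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

class PageLinksDemo {

    private static final String HREF_PREFIX = "href=\"";
    private static final String HTTP_PREFIX = "http://";

    // Reads everything from the reader into one string, line by line.
    static String readAll(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Splits on href=" and keeps targets that start with http://, in page order.
    static Set<String> getLinks(String page) {
        Set<String> links = new LinkedHashSet<>();
        String[] parts = page.split(Pattern.quote(HREF_PREFIX));
        // parts[0] is the text before the first href, so start at 1.
        for (int i = 1; i < parts.length; i++) {
            int end = parts[i].indexOf('"');
            if (parts[i].startsWith(HTTP_PREFIX) && end > 0) {
                links.add(parts[i].substring(0, end));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup. With a live connection the reader would be
        // new InputStreamReader(yc.getInputStream()) as in URLConnectionReader.
        String html = "<a href=\"http://example.com/one\">1</a>\n"
                + "<a href=\"/relative\">2</a>\n";
        System.out.println(getLinks(readAll(new StringReader(html))));
    }
}
```

Like the original, this only sees links that are present in the downloaded markup.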

3 Comments

The problem is that the link in the page is generated by JavaScript, so it only appears after the URL has been loaded; it is not in the source of the HTML file. That is why, after loading the URL, I call document.getElementsByTagName('a') rather than using the URL class in Java to extract the links.
URL.openConnection emulates what the client's browser does, so you get exactly the same markup that you get via the browser. Try it and I believe you will see that it works. If it does not, let me know what you get and we can try to work it out further.
Sure, will do that and tell you.
0

You can use HtmlUnit, a Java API. I think it can help you access the content produced by the executed JavaScript as plain HTML.

WebClient webClient = new WebClient();
HtmlPage myPage = (HtmlPage) webClient.getPage(new URL("YourURL"));
System.out.println(myPage.getVisibleText());
