
I'm working on a website, using the Java Jsoup library to extract some hyperlinks:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.connect("http://www.saudisale.com/SS_a_mpg.aspx").get();

// For each table, print the onclick value of the first matching input that has one
for (Element table : doc.select("table")) {
    System.out.println(table.select("tbody tr td input").attr("onclick"));
}

Sample output:

window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dyaralez.html ','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dyaralez.html ','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dalel.html','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('http://ads.saudisale.com/dalel.html','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('SS_a_car.aspx?carid=37240','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
window.open('SS_a_car.aspx?carid=37240','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');

Since Jsoup does not execute JavaScript, I have to write some Java code myself to convert each window.open(hyperlink) call into an absolute hyperlink.

For example, this JavaScript from the output:

window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode=1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1')

should become: http://saudisale.com/arPrivatePage.aspx?id=21871638

and

window.open('SS_a_car.aspx?carid=37149','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');

should become: http://www.saudisale.com/SS_a_car.aspx?carid=37149

Could someone show me how to do this in Java?

2 Answers


Use a regex. This will do what you want:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');";

// Group 1 captures the first argument of window.open(...), without its quotes
String regex = "window\\.open\\(['\"]*(.*?)(\\s*['\"]*,.*?)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    String output = matcher.group(1);
    System.out.println(output);
}
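If you'd rather not use a regex, a minimal sketch is to take the text between the first pair of single quotes. This assumes, as in all of your sample output, that the URL is the first argument of window.open and is wrapped in single quotes; the class and method names are made up for illustration.

```java
public class FirstArgument {
    // Returns the text between the first pair of single quotes, trimmed,
    // or null if there is no complete quoted argument.
    // Assumption: the URL is the first single-quoted argument.
    static String firstQuotedArg(String onclick) {
        int start = onclick.indexOf('\'');
        if (start < 0) {
            return null;
        }
        int end = onclick.indexOf('\'', start + 1);
        if (end < 0) {
            return null;
        }
        // trim() drops stray spaces such as the one in 'dyaralez.html '
        return onclick.substring(start + 1, end).trim();
    }

    public static void main(String[] args) {
        String input = "window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');";
        System.out.println(firstQuotedArg(input));
    }
}
```

This only works as long as the URL itself never contains a single quote; the regex approach has a similar limitation with commas.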

The last two URLs in your output are relative, so you also have to convert them to absolute URLs by resolving them against the site's base URL, as described here.
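A minimal sketch of that conversion using java.net.URI, assuming the links were scraped from http://www.saudisale.com/SS_a_mpg.aspx (the page in the question). URI.resolve returns an absolute URL unchanged and resolves a relative one against the base, so every extracted link can go through the same call. The class and method names are hypothetical.

```java
import java.net.URI;

public class ResolveLinks {
    // Assumption: the page the links were scraped from
    static final URI BASE = URI.create("http://www.saudisale.com/SS_a_mpg.aspx");

    // Relative links are resolved against BASE; absolute links come back unchanged.
    // trim() matters: URI parsing rejects the trailing space in 'dyaralez.html '.
    static String toAbsolute(String link) {
        return BASE.resolve(link.trim()).toString();
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("SS_a_car.aspx?carid=37149"));
        System.out.println(toAbsolute("http://saudisale.com/arPrivatePage.aspx?id=21871638"));
    }
}
```

With this base, the relative link resolves to http://www.saudisale.com/SS_a_car.aspx?carid=37149, which matches the target in the question.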


2 Comments

Your code works fine. For the second URL I tried the following code: String input2 = "window.open('SS_a_car.aspx?carid=37149','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1')"; URL baseURL = new URL("saudisale.com/"); URL url = new URL(baseURL, input2); System.out.println(url); But it doesn't give the desired output. Any more help, please?
@JavaFan, use http://saudisale.com/ as your baseURL. You must include the http:// part at the beginning. Try that; it should work.
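A short sketch of why the protocol matters: new URL("saudisale.com/") throws MalformedURLException because there is no protocol, while "http://saudisale.com/" works as a base. It also assumes the link has already been extracted from window.open(...); passing the whole onclick string as the relative part would not give the desired URL. resolveOrNull is a hypothetical helper.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class BaseUrlDemo {
    // Resolves a relative link against a base URL string.
    // Returns null if the base is malformed, e.g. when it has no protocol.
    static String resolveOrNull(String base, String link) {
        try {
            return new URL(new URL(base), link).toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Assumption: this link was already extracted from window.open(...)
        String link = "SS_a_car.aspx?carid=37149";
        System.out.println(resolveOrNull("saudisale.com/", link));        // null: no protocol
        System.out.println(resolveOrNull("http://saudisale.com/", link)); // http://saudisale.com/SS_a_car.aspx?carid=37149
    }
}
```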

For the relative URLs I used this code. It works fine:

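import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input2 = "window.open('SS_a_car.aspx?carid=37149','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1')";

// The base URL must include the protocol (http://)
URL baseURL = new URL("http://saudisale.com/");

String regex = "window\\.open\\(['\"]*(.*?)(\\s*['\"]*,.*?)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input2);
while (matcher.find()) {
    // Group 1 is the link passed to window.open(...)
    String output = matcher.group(1);
    // Resolve the relative link against the base URL
    URL url = new URL(baseURL, output);
    System.out.println(url); // http://saudisale.com/SS_a_car.aspx?carid=37149
}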
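Your sample output contains duplicates and a trailing space inside one URL, so a sketch that handles the whole list might extract, resolve and de-duplicate in one pass. It reuses the regex idea from the answer, swaps in java.net.URI for the resolution step, and assumes http://www.saudisale.com/ as the base (matching the desired output in the question). The input strings are shortened copies of the sample output, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OnclickLinks {
    // Group 1 is the first argument of window.open(...), without quotes or trailing spaces
    static final Pattern WINDOW_OPEN =
            Pattern.compile("window\\.open\\(['\"]*(.*?)\\s*['\"]*,");
    // Assumption: base URL of the scraped site
    static final URI BASE = URI.create("http://www.saudisale.com/");

    // Extracts, resolves and de-duplicates links, keeping first-seen order
    static Set<String> extractLinks(List<String> onclickValues) {
        Set<String> links = new LinkedHashSet<>();
        for (String onclick : onclickValues) {
            Matcher m = WINDOW_OPEN.matcher(onclick);
            while (m.find()) {
                links.add(BASE.resolve(m.group(1).trim()).toString());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1');",
                "window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1');",
                "window.open('http://ads.saudisale.com/dyaralez.html ','_blank','channelmode =1');",
                "window.open('SS_a_car.aspx?carid=37240','_blank','channelmode =1');");
        extractLinks(sample).forEach(System.out::println);
    }
}
```

A LinkedHashSet drops the repeated links while keeping them in the order they appeared on the page; a plain HashSet would also de-duplicate but lose that order.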
