
I'm desperate to get the content of this URL.

No authentication is required when accessing this page from a web browser, but when I try to get the content from a web application I get an sso file as the response. The code I used is as follows:

HttpClient httpClient = new DefaultHttpClient();
HttpGet httpGet = new HttpGet("http://search.lib.monash.edu/primo_library/libweb/action/search.do?dscnt=1&frbg=&tab=default_tab&srt=rank&ct=search&mode=Basic&dum=true&tb=&indx=1&vl%28freeText0%29=java&fn=search&vid=MON");
HttpResponse httpResponse = httpClient.execute(httpGet);
HttpEntity responseEntity = httpResponse.getEntity();

BufferedReader in = new BufferedReader(
        new InputStreamReader(responseEntity.getContent()));
String inputLine;
StringBuffer response = new StringBuffer();

while ((inputLine = in.readLine()) != null) {
    response.append(inputLine);
}
in.close();

System.out.println(response.toString());

The sso file I get as the response is as follows:

<!-- filename: sso --> <html> <head> <title>Login </title> <!-- START filename: meta-tags.pds --> <META HTTP-EQUIV="Cache-Control" CONTENT="no-cache">  <META HTTP-EQUIV="Pragma" CONTENT="no-cache">  <META HTTP-EQUIV="Expires" CONTENT="Sun, 06 Nov 1994 08:49:37 GMT">  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8"> <!-- END   filename: meta-tags.pds --> <link rel="stylesheet" href="http://monash-dc05.hosted.exlibrisgroup.com:8991/PDSMExlibris.css" TYPE="text/css"> </head> <body onload = "location = '/goto/http://search.lib.monash.edu:80/primo_library/libweb/action/login.do?afterPDS=true&vid=MON&vid=MON&dscnt=2&targetURL=http%3A%2F%2Fsearch.lib.monash.edu%2Fprimo_library%2Flibweb%2Faction%2Fsearch.do%3Fdscnt%3D0&frbg=&tab=default%5Ftab&dstmp=1394940513823&srt=rank&ct=search&mode=Basic&dum=true&indx=1&tb=&vl%28freeText0%29=java&fn=search&pds_handle=GUEST';"> <noscript> <div id="header">      <div>         <img src="http://monash-dc05.hosted.exlibrisgroup.com:8991//exlibris/primo/p4_1/pds/html_form/icon/exlibrislogo.jpg" alt="Exlibris Logo"><p>&nbsp;</p>     </div> </div> <div id="connect">  <a href="/goto/http://search.lib.monash.edu:80/primo_library/libweb/action/login.do?afterPDS=true&vid=MON&vid=MON&dscnt=2&targetURL=http%3A%2F%2Fsearch.lib.monash.edu%2Fprimo_library%2Flibweb%2Faction%2Fsearch.do%3Fdscnt%3D0&frbg=&tab=default%5Ftab&dstmp=1394940513823&srt=rank&ct=search&mode=Basic&dum=true&indx=1&tb=&vl%28freeText0%29=java&fn=search&pds_handle=GUEST">Return from Check SSO </a></noscript> </div> </body> </html></body></html>

Please help.


1 Answer


This is not an authentication issue.

The returned page has an onload event handler on its body element. Because of that, when you open the URL in a browser:

  1. The browser first receives the response HTML, the same text you have in your response string.
  2. It then starts to render and display it.
  3. In the meantime, the onload event fires and loads the URL assigned by location='/goto/......
  4. Before the current page is displayed, the new page is received and displayed in the browser.

From the response you received, observe this:

<body onload = "location = '/goto/http://search.lib.monash.edu:80/primo_library/libweb/action/login.do?afterPDS=true&vid=MON&vid=MON&dscnt=2&targetURL=http%3A%2F%2Fsearch.lib.monash.edu%2Fprimo_library%2Flibweb%2Faction%2Fsearch.do%3Fdscnt%3D0&frbg=&tab=default%5Ftab&dstmp=1394940513823&srt=rank&ct=search&mode=Basic&dum=true&indx=1&tb=&vl%28freeText0%29=java&fn=search&pds_handle=GUEST';">

In the Java code, you are only reading the content from the URL you specified.
You are not passing it to anything that parses the HTML and runs its script, so it is treated as static text and the onload redirect never happens.

That is why the response in your Java code differs from what you see in a web browser.
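To illustrate what the browser does with that onload attribute, here is a minimal sketch that pulls the redirect target out of the HTML text and resolves it against the URL of the page that returned it. The class name `OnloadRedirect`, both method names, the regular expression, and the `example.org` URL are assumptions made for this illustration; the pattern only handles the `onload = "location = '...'"` shape shown above, and the base URL must be the address that actually served the sso page, which may differ from the one you requested if HttpClient followed redirects. Whether requesting the resolved URL returns usable search results is not something this sketch can confirm.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OnloadRedirect {

    // Hypothetical pattern: matches onload = "location = '<target>'"
    // with optional whitespace, ignoring case.
    private static final Pattern ONLOAD_LOCATION = Pattern.compile(
            "onload\\s*=\\s*\"\\s*location\\s*=\\s*'([^']*)'",
            Pattern.CASE_INSENSITIVE);

    // Returns the redirect target from the onload attribute, if one is present.
    public static Optional<String> extractTarget(String html) {
        Matcher m = ONLOAD_LOCATION.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Resolves a (possibly relative) target against the URL of the page that contained it.
    public static URI resolve(String pageUrl, String target) {
        return URI.create(pageUrl).resolve(target);
    }

    public static void main(String[] args) {
        // Shortened stand-in for the sso page; the real target is much longer.
        String html = "<body onload = \"location = '/goto/next?x=1';\">";
        String pageUrl = "http://example.org/pds?func=sso"; // assumed base URL

        extractTarget(html)
                .map(target -> resolve(pageUrl, target))
                .ifPresent(System.out::println);
    }
}
```

The resolved URI could then be passed to a second `HttpGet` on the same `HttpClient` instance, so that any cookies set by the first response are sent along.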

Other suggestions:
readLine() strips the line terminator, so when you read a line and append it to the buffer, also append a CRLF.

Change:

    response.append(inputLine);

To:

    response.append( inputLine ).append( "\r\n" );

This keeps the response text multi-line and more readable.
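For reference, a self-contained sketch of the read loop with that change applied. The class name `ReadLines` and the method `readAll` are made up for this example, and it reads from a `StringReader` instead of the HTTP response so it runs without a network connection. It also swaps in `StringBuilder` and try-with-resources, which are stylistic choices and not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadLines {

    // Reads every line from the source and re-appends a CRLF after each one,
    // because BufferedReader.readLine() drops the line terminator.
    public static String readAll(Reader source) {
        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                response.append(inputLine).append("\r\n");
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the example short.
            throw new UncheckedIOException(e);
        }
        return response.toString();
    }

    public static void main(String[] args) {
        // Stand-in for responseEntity.getContent() wrapped in an InputStreamReader.
        Reader source = new StringReader("<html>\n<body>\n</body>\n</html>");
        System.out.print(readAll(source));
    }
}
```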


4 Comments

Thanks for your explanation; I didn't really understand what was happening! In my application I just want to get the search results as static text for some sort of processing. If I go to the "go to" URL I do get to the right page, but it only involves the java code, not the rendered results. I'm now wondering how I would get the search results. I really appreciate your help!
For search results, you would be better off relying on an RSS feed service from the site, if one is available. Otherwise you need a third-party tool.
There isn't any sort of API for the system. Do you recommend any tool for doing that? Thanks!
I don't have any information on such tools at hand at the moment.
