
Given this:

URL u=new URL("someURL");

How do I identify the top-level domain of the URL?

  • Very similar, pretty much a duplicate: stackoverflow.com/questions/1379958/… Commented Jan 26, 2010 at 17:26
  • ... "top-level domain" != "domain name" (... "uk" != "www.amazon.co.uk" ) Commented Feb 2, 2013 at 13:08

4 Answers


Guava provides a nice utility for this. It works as follows:

InternetDomainName.from("someurl.co.uk").publicSuffix() will get you co.uk
InternetDomainName.from("someurl.de").publicSuffix() will get you de
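Guava's publicSuffix() is backed by the Public Suffix List, which is why it returns co.uk rather than just uk. To illustrate the longest-match idea behind that (this is not Guava's implementation), here is a stdlib-only sketch. The class PublicSuffixSketch, the method publicSuffix and the four-entry SUFFIXES set are hypothetical stand-ins made up for this example; real code should use the full list, e.g. via Guava.

```java
import java.util.Locale;
import java.util.Set;

class PublicSuffixSketch {
    // Hypothetical, tiny stand-in for the Public Suffix List (the real list has thousands of entries).
    private static final Set<String> SUFFIXES = Set.of("uk", "co.uk", "de", "com");

    // Returns the longest suffix of host found in SUFFIXES, or null if none matches.
    static String publicSuffix(String host) {
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            // Labels are stripped from the left, so the first hit is the longest match.
            if (SUFFIXES.contains(candidate)) {
                return candidate;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null; // no suffix in the set matches
            }
            candidate = candidate.substring(dot + 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(publicSuffix("someurl.co.uk")); // co.uk
        System.out.println(publicSuffix("someurl.de"));    // de
    }
}
```

The real list also has wildcard and exception rules, which this sketch ignores.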


So you want to have the top-level domain part only?

import java.net.MalformedURLException;
import java.net.URL;

//parameter urlString: a String
//returns: a String representing the TLD of urlString, or null iff urlString is malformed (or null)
private String getTldString(String urlString) {
    String tldString = null;
    try {
        URL url = new URL(urlString);
        // split() takes a regex, so the dot must be escaped
        String[] domainNameParts = url.getHost().split("\\.");
        tldString = domainNameParts[domainNameParts.length - 1];
    }
    catch (MalformedURLException e) {
        // malformed or null input: fall through and return null
    }

    return tldString;
}

Let's test it!

@Test 
public void identifyLocale() {
    String ukString = "http://www.amazon.co.uk/Harry-Potter-Sheet-Complete-Series/dp/0739086731";
    logger.debug("ukString TLD: {}", getTldString(ukString));

    String deString = "http://www.amazon.de/The-Essential-George-Gershwin/dp/B00008GEOT";
    logger.debug("deString TLD: {}", getTldString(deString));

    String ceShiString = "http://例子.测试";
    logger.debug("ceShiString TLD: {}", getTldString(ceShiString));

    String dokimeString = "http://παράδειγμα.δοκιμή";
    logger.debug("dokimeString TLD: {}", getTldString(dokimeString));

    String nullString = null;
    logger.debug("nullString TLD: {}", getTldString(nullString));

    String lolString = "lol, this is a malformed URL, amirite?!";
    logger.debug("lolString TLD: {}", getTldString(lolString));

}

Output:

ukString TLD: uk
deString TLD: de
ceShiString TLD: 测试
dokimeString TLD: δοκιμή
nullString TLD: null
lolString TLD: null

1 Comment

Your solution doesn't handle .co.uk domains, for example.

According to the docs, the host part of the URL conforms to RFC 2732, i.e. an IPv6 literal host is returned enclosed in square brackets. That implies that simply splitting the string you get from

  String host = u.getHost();

is not always enough. You need to handle the RFC 2732 format when searching the host, or, if you can guarantee that all addresses are of the form server.com, you can search for the last "." in the string and take the TLD after it.
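A minimal sketch of the "last dot" approach, assuming java.net.URL as in the question. It returns null for bracketed IPv6 literals, single-label hosts such as localhost, and all-numeric last labels (TLDs are never all-numeric, so that case is an IPv4 literal). The class TldSketch and method tldOf are hypothetical names for this example.

```java
import java.net.MalformedURLException;
import java.net.URL;

class TldSketch {
    // Returns the label after the last '.' of the URL's host, or null when there is
    // no meaningful TLD (malformed URL, IPv6 literal, IPv4 literal or dotless host).
    static String tldOf(String urlString) {
        String host;
        try {
            host = new URL(urlString).getHost();
        } catch (MalformedURLException e) {
            return null;
        }
        // Per the URL#getHost() docs (RFC 2732), IPv6 literals come back in brackets, e.g. "[::1]".
        if (host.isEmpty() || host.startsWith("[")) {
            return null;
        }
        // Ignore a trailing dot on fully qualified names such as "example.com."
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1);
        }
        int lastDot = host.lastIndexOf('.');
        if (lastDot < 0) {
            return null; // single-label host such as "localhost"
        }
        String tld = host.substring(lastDot + 1);
        // An all-numeric (or empty) last label is not a TLD, e.g. "192.168.0.1".
        return tld.chars().allMatch(Character::isDigit) ? null : tld;
    }
}
```

Like the split-based answer, this still returns uk for www.amazon.co.uk; use the Public Suffix List if you need co.uk.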


Use URL#getHost() and then, if necessary, String#split() on "\\.".

Update: if the host is actually an IP address, you first need to resolve it to a host name separately, e.g. with InetAddress#getHostName().

2 Comments

String[] htokens = u.getHost().toString().split("."); Is there anything wrong with this line? After this statement, the length of the htokens array is still 0, even though u.getHost().toString() returns "a.bc.de". Please help.
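The linked docs mention that split() takes a regex pattern. In a regex, "." matches any character, so every character is treated as a delimiter, all tokens are empty, and trailing empty strings are dropped, leaving an empty array. That's why the answer says you need to split on "\\.".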
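A small demonstration of that pitfall, using the host from the comment above. The class SplitDemo and helper labels are hypothetical names for this example; Pattern.quote is the standard java.util.regex way to treat a string literally.

```java
import java.util.regex.Pattern;

class SplitDemo {
    // Splits a host name on literal dots; Pattern.quote(".") escapes the regex metacharacter.
    static String[] labels(String host) {
        return host.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String host = "a.bc.de";
        // "." matches any character, so every token is empty and trailing empties are removed.
        System.out.println(host.split(".").length);                 // 0
        // Escaping the dot by hand works too.
        System.out.println(String.join(",", host.split("\\.")));     // a,bc,de
        System.out.println(String.join(",", labels(host)));          // a,bc,de
    }
}
```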
