
I'm writing a library in Java which creates the URL from a list of filenames in this way:

final String domain = "http://www.example.com/";

String filenames[] = {"Normal text","Ich weiß nicht", "L'ho inserito tra i princìpi"};

System.out.println(domain + normalize(filenames[0]));
// Prints "http://www.example.com/Normal_text"
System.out.println(domain + normalize(filenames[1]));
// Prints "http://www.example.com/Ich_weib_nicht"
System.out.println(domain + normalize(filenames[2]));
// Prints "http://www.example.com/L_ho_inserito_tra_i_principi"

Is there a Java library that provides a normalize method like the one I'm using in the code above?


2 Comments

  • Take a look at this: stackoverflow.com/questions/21489289/… Commented Feb 10, 2014 at 13:32
  • @PopoFibo Yes, it works! I had never seen the Normalizer class in Java! Thanks a lot! Can you post an answer with a short example? Commented Feb 10, 2014 at 13:37

2 Answers

7

Taking the content from my previous answer here: you can use java.text.Normalizer, which covers most of what is needed to normalize Strings in Java. An example of such a normalization:

Accent removal:

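import java.text.Normalizer;

String accented = "árvíztűrő tükörfúrógép";
// NFD splits each accented letter into a base letter plus combining marks
String normalized = Normalizer.normalize(accented, Normalizer.Form.NFD);
// Remove everything outside ASCII, which drops the combining marks
normalized = normalized.replaceAll("[^\\p{ASCII}]", "");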

System.out.println(normalized);

Output:

arvizturo tukorfurogep
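
For the filenames in the question, the same idea could be wrapped into a helper along the following lines. This is only a sketch under stated assumptions, not a library method: the class name UrlNames is hypothetical, and every run of characters other than ASCII letters and digits is collapsed into a single underscore. Note that NFD does not decompose "ß", so the ASCII filter above would silently drop it; the sketch maps it to "ss" as an assumption, which differs from the "b" shown in the question's expected output.

```java
import java.text.Normalizer;

// Hypothetical helper class; the name and the exact rules are assumptions.
class UrlNames {

    static String normalize(String input) {
        // Assumption: transliterate the German sharp s (U+00DF) explicitly,
        // because Unicode normalization leaves it unchanged.
        String s = input.replace("\u00DF", "ss");
        // Split accented letters into base letter + combining marks.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove the combining marks, keeping the base letters.
        s = s.replaceAll("\\p{M}+", "");
        // Collapse every run of non-alphanumeric characters into one underscore.
        s = s.replaceAll("[^A-Za-z0-9]+", "_");
        // Drop underscores left at the start or end.
        return s.replaceAll("^_+|_+$", "");
    }

    public static void main(String[] args) {
        final String domain = "http://www.example.com/";
        String[] filenames = {"Normal text", "Ich wei\u00DF nicht", "L'ho inserito tra i princ\u00ECpi"};
        for (String name : filenames) {
            System.out.println(domain + normalize(name));
        }
        // Prints:
        // http://www.example.com/Normal_text
        // http://www.example.com/Ich_weiss_nicht
        // http://www.example.com/L_ho_inserito_tra_i_principi
    }
}
```

Different inputs can map to the same result (for example "a b" and "a'b"), so if the URLs must be unique, a collision check would still be needed.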

3

Assuming you want to encode the strings to make them safe for use in a URL, use URLEncoder:

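import java.net.URLEncoder;

final String domain = "http://www.example.com/";

String filenames[] = {"Normal text","Ich weiß nicht", "L'ho inserito tra i princìpi"};

// encode(String, String) throws the checked UnsupportedEncodingException
System.out.println(domain + URLEncoder.encode(filenames[0], "UTF-8"));
// Prints "http://www.example.com/Normal+text"
System.out.println(domain + URLEncoder.encode(filenames[1], "UTF-8"));
// Prints "http://www.example.com/Ich+wei%C3%9F+nicht"
System.out.println(domain + URLEncoder.encode(filenames[2], "UTF-8"));
// Prints "http://www.example.com/L%27ho+inserito+tra+i+princ%C3%ACpi"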

5 Comments

No, because I can't use "special" chars like % and so on, which come from the URLEncoder.encode() method. I'm creating URLs which must pass a special XML validator. It requires no whitespace, no special chars, and so on.
So they aren't URLs then
No no, they are! The XML contains a list of elements; each element has an rdf:about property which has a URL as its value.
In which case I would use StringEscapeUtils.escapeXml() from the Apache Commons Lang library, but I don't see what that has to do with URLs.
That method is not OK for the validator. E.g., StringEscapeUtils.escapeXml("l'avevo all'università") is escaped as l&apos;avevo all&apos;università: the accented "a" is still there! Moreover, the text is not human-readable.
