1

I have this text:

   <message id="dsds" to="[email protected]" type="video" from="test@test"><body>TESTTESTTEST</body><active xmlns="http://jabber.org"/></message>

I want to get the part inside <body></body> from this string.

In Java, I searched and found split(), but it can't solve my problem. How can I get the text between <body> and </body> in Java?

5 Comments
  • Do you want to parse XML? Commented Jan 7, 2015 at 11:42
  • Which part of the String do you want? Commented Jan 7, 2015 at 11:42
  • @JamesFox Probably depends. :) Commented Jan 7, 2015 at 11:42
  • @James Fox: I want the part inside <body></body>. @Patryk: no, it's not XML, it's my string data. Commented Jan 7, 2015 at 11:43
  • Have a look at jsoup.org, which is a Java HTML parser. Commented Jan 7, 2015 at 11:57

6 Answers

4

Using a parser such as SAXParser or DocumentBuilder is much preferred: it reads the tags accurately and lets you process the data. Parsers are particularly handy when you have many tags to process.

Here is an example that uses a SAX parser to read the body tag:

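    import java.io.StringReader;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class BodyExtractor {
        public static void main(String[] args) throws Exception {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            DefaultHandler handler = new DefaultHandler() {

                // characters() may deliver one text node in several chunks, so accumulate them
                final StringBuilder body = new StringBuilder();
                boolean isBody = false;

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
                    if (qName.equalsIgnoreCase("body")) {
                        isBody = true;
                        body.setLength(0);
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) throws SAXException {
                    if (isBody) {
                        body.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) throws SAXException {
                    if (qName.equalsIgnoreCase("body")) {
                        isBody = false;
                        // the whole body text is available once the closing tag is reached
                        System.out.println("body : " + body);
                    }
                }
            };

            String xml = "<message id=\"dsds\" to=\"[email protected]\" type=\"video\" from=\"test@test\"><body id=\"dd\">TESTTESTTEST</body><active xmlns=\"http://jabber.org\"/></message>";
            saxParser.parse(new InputSource(new StringReader(xml)), handler);
        }
    }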
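The answer also mentions DocumentBuilder. A minimal DOM-based sketch, assuming the input is well-formed XML and that only the first body element matters (the class DomBodyExtractor and the method extractBody are hypothetical names):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomBodyExtractor {

    // Parses the whole string into a DOM tree and returns the text content
    // of the first <body> element, or null if there is no <body> element.
    static String extractBody(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList bodies = doc.getElementsByTagName("body");
        return bodies.getLength() > 0 ? bodies.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<message id=\"dsds\" to=\"[email protected]\" type=\"video\" from=\"test@test\">"
                + "<body>TESTTESTTEST</body><active xmlns=\"http://jabber.org\"/></message>";
        System.out.println(extractBody(xml)); // prints TESTTESTTEST
    }
}
```

DOM keeps the whole document in memory, which is fine for a short message like this one; SAX is the better fit for very large inputs.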

2

Use a regex like this (it works for <body>asas asasa </body> as well as <body> </body>):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class BodyRegex {
        public static void main(String[] args) {
            String s = "<message id=\"dsds\" to=\"[email protected]\" type=\"video\" from=\"test@test\"><body>TESTTESTTEST</body><active xmlns=\"http://jabber.org\"/></message>";
            Pattern p = Pattern.compile("<body.*>(.*?)</body>");
            Matcher m = p.matcher(s);
            // print group(1), the text captured between the tags, for every match
            while (m.find()) {
                System.out.println(m.group(1));
            }
        }
    }

Output:

TESTTESTTEST

9 Comments

Have you seen the answers before answering? Same as my answer, but 9 minutes later! :)
@FarhangAmary - Does your answer work for the inputs I have provided, such as <body>asas asasa </body>? Please check. Also, my regex is different. And if it helps, I saw your answer and agreed with Thilo.
Well, there is something wrong in your regex: it contains an odd number of quotes. And as far as I can see, it would also fail if the body tag contains whitespace (<body >) or attributes.
@Tom - Corrected it, it was a typo. Thanks :). Can you give me a sample input where this might fail?
@TheLostMind Check my edit of the last comment :P. I already noticed that :).
1

Use the java.util.regex package:

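    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class BodyText {
        public static void main(String[] args) {
            String htmlString = "<message id=\"dsds\" to=\"[email protected]\" type=\"video\" from=\"test@test\"><body>TESTTESTTEST</body><active xmlns=\"http://jabber.org\"/></message>";
            String bodyText = "";
            // group(1) is the text between the opening and closing body tags
            Pattern p = Pattern.compile("<body.*>(.*?)</body.*>");
            Matcher m = p.matcher(htmlString);

            if (m.find()) {
                bodyText = m.group(1);
            }
            System.out.println(bodyText);
        }
    }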

OUTPUT: TESTTESTTEST

2 Comments

Caveat: Won't work if the body tag has any attributes (or just whitespace in it).
@Thilo Then they can use (.*?) instead of (\\S+) inside the pattern.
1

In this specific case, I'd recommend using regular expressions with Matcher.

Possible solution: Java regex to extract text between tags
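A minimal sketch of the regex-with-Matcher approach this answer recommends, assuming the body tag has no attributes (the class BodyMatcher and the method bodies are hypothetical names). Pattern.DOTALL is added so that . also matches line breaks, which matters if the body text ever spans several lines:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BodyMatcher {

    // (.*?) is a lazy capture, so each match stops at the nearest </body>.
    // DOTALL lets . match line terminators as well.
    private static final Pattern BODY = Pattern.compile("<body>(.*?)</body>", Pattern.DOTALL);

    // Returns the content of every <body>...</body> pair, in order of appearance.
    static List<String> bodies(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BODY.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(bodies("<message><body>line 1\nline 2</body></message>"));
    }
}
```

Without Pattern.DOTALL, the same input yields an empty list, because . does not match \n by default.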

3 Comments

You should include the essential parts of your links in your answer. If the link becomes invalid your answer will then be meaningless and this should be avoided.
The link is to a possible duplicated question/solution. Should I include "essential parts" from another Stack Overflow answer in my answer?
Either that, or flag this question as a possible duplicate of the question you found (the latter approach is better).
1

You can write the code like this:

    public class BodySubstring {
        public static void main(String[] args) {
            String s = "<message id=\"dsds\" to=\"[email protected]\" type=\"video\" from=\"test@test\"><body>TESTTESTTEST</body><active xmlns=\"http://jabber.org\"/></message>"; // use a backslash (\) to escape each "
            int firstIndex = s.indexOf("<body>");
            int lastIndex = s.indexOf("</body>");
            System.out.println(s.substring(firstIndex + "<body>".length(), lastIndex));
        }
    }

It prints the expected result, TESTTESTTEST.
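String.indexOf returns -1 when the text is not found, so the indexOf version may throw a StringIndexOutOfBoundsException or return the wrong text if either tag is missing. A minimal guarded sketch, still assuming the opening tag has no attributes (the class BodySubstringSafe and the method between are hypothetical names):

```java
public class BodySubstringSafe {

    // Returns the text between the first occurrence of open and the next
    // occurrence of close after it, or null if either marker is missing.
    static String between(String s, String open, String close) {
        int start = s.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        // search for the closing tag only after the opening tag
        int end = s.indexOf(close, start);
        return end < 0 ? null : s.substring(start, end);
    }

    public static void main(String[] args) {
        String s = "<message id=\"dsds\"><body>TESTTESTTEST</body></message>";
        System.out.println(between(s, "<body>", "</body>"));            // prints TESTTESTTEST
        System.out.println(between("<message/>", "<body>", "</body>")); // prints null
    }
}
```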


0

Answers have already been given for solving this with a regex (though an XML parser might have been the better choice).

Here is a simple suggestion to modify the regex proposed in the solutions above:

Regex proposed: <body.*>(.*?)</body.*> (this regex is greedy)
Non-greedy regex: <body[^>]*>(.*?)</body[^>]*>

Making it non-greedy improves running time. The problem with the original regex is that .* keeps matching until the end of the string and then has to backtrack, whereas [^>]* stops as soon as it reaches a right angle bracket. (Strictly speaking, [^>]* is still a greedy quantifier; it is faster because it cannot run past the >.) I ran a simple test comparing the two regexes: the greedy one took 3 times as long as the non-greedy one.
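Besides speed, the two patterns can also return different results. A small sketch comparing them on two made-up inputs, a string with two body elements and a body whose text contains a > (the class GreedyVsBracketed and the method captures are hypothetical names). With .*, the engine backtracks from the end of the string and settles on the last > that still allows a match; [^>]* cannot cross a >, so it stops at the end of the opening tag:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsBracketed {

    static final Pattern DOT_STAR = Pattern.compile("<body.*>(.*?)</body.*>");
    static final Pattern NOT_BRACKET = Pattern.compile("<body[^>]*>(.*?)</body[^>]*>");

    // Collects group(1) of every match of the pattern in the input.
    static List<String> captures(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String twoBodies = "<body>one</body><body>two</body>";
        String angleInText = "<body>a > b</body>";
        System.out.println(captures(DOT_STAR, twoBodies));      // prints [two]
        System.out.println(captures(NOT_BRACKET, twoBodies));   // prints [one, two]
        System.out.println(captures(DOT_STAR, angleInText));    // prints [ b]
        System.out.println(captures(NOT_BRACKET, angleInText)); // prints [a > b]
    }
}
```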

