I'm trying to parse a bunch of XML files from a folder and return all the tags that contain a particular expression. Below is what I have so far:

import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class MyDomParser {

    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            File folder = new File("C:\\Users\\xmlfolder");

            DocumentBuilder builder = factory.newDocumentBuilder();
            for (File workfile : folder.listFiles()) {
                if (workfile.isFile()) {
                    // doc is parsed here; the matching tags still need to be found
                    Document doc = builder.parse(workfile);
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}

How do I loop through all the tags in each XML file and return the tags that contain the expression "/server[^<]*"?

Any help is much appreciated.

2 Answers

1

You could create a separate method that recursively goes through all nodes in the current XML file and adds the matched tags to a List of Nodes.

Example:

public static void parseTags (Node node, List<Node> list)
{
      NodeList nodeList = node.getChildNodes();
      for (int i = 0; i < nodeList.getLength(); i++)
      {
           Node n = nodeList.item(i);
           if (n.getNodeType() == Node.ELEMENT_NODE)
           {
               String content = n.getTextContent();

               // if the tag content matches your criteria, add it to the list
               if (content.matches("/server[^<]*"))
               {
                   list.add(n);
               }
               parseTags(n, list);
           }
      }
}

You can call this method in your existing code like this:

// create your list outside the loop like this:
List<Node> list = new ArrayList<Node>();

for(File workfile : folder.listFiles())
{
    if(workfile.isFile())
    {
        Document doc = builder.parse(workfile);

        // call the recursive method here:
        parseTags(doc.getDocumentElement(), list);
    }
}
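
Two details of the method above are worth keeping in mind. String.matches only succeeds when the whole string matches the pattern, and getTextContent also returns the text of all descendant elements, so a parent can be tested against its children's text. The following is a minimal sketch of one way to adjust for both, not a definitive implementation: it compiles the pattern once, uses find() to look for the expression anywhere in the text, and tests only an element's own text. The names ServerTagFinder, collectMatches and ownText are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; the name is an assumption made for this sketch.
class ServerTagFinder {

    // Compiled once and reused for every element.
    private static final Pattern SERVER = Pattern.compile("/server[^<]*");

    // Adds every descendant element of 'node' whose own text contains a match.
    static void collectMatches(Node node, List<Element> matches) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            // find() looks for the pattern anywhere in the text,
            // whereas String.matches requires the whole text to match.
            if (SERVER.matcher(ownText(child)).find()) {
                matches.add((Element) child);
            }
            collectMatches(child, matches);
        }
    }

    // Concatenates only the direct text and CDATA children of a node,
    // leaving out the text of nested elements.
    private static String ownText(Node node) {
        StringBuilder text = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString();
    }
}
```

Like parseTags, this only tests the children of the node it is given. Calling collectMatches(doc, list) with the Document itself, rather than doc.getDocumentElement(), means the root element is tested as well. Each match is an Element, so both the tag name (getTagName()) and its text (getTextContent()) are available afterwards.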

2 Comments

Michael, to be exact, my question was how to search the text between the tags and return the tags along with their text if it matches. Does the code above search only the tag names?
This answer looks good for the most part, just one thing - If the regex doesn't change, then it's more efficient to create a (static final) Pattern once, rather than use String#matches, which internally creates a new Pattern and Matcher each call. +1 though
0

This is a job for XQuery. It's a one-liner:

collection('file://my-folder/?recurse=yes;select=*.xml')//*[matches(., '/server[^<]*')]

The syntax of collection URIs may vary from one XQuery implementation to another; the above works with Saxon.

Parsing each of the files using DOM and then navigating them using DOM interfaces is just absurdly inefficient both in terms of your time and in terms of machine performance.

You can of course invoke XQuery from Java, and get the results back in a form that Java can manipulate.
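
Running real XQuery from Java needs a third-party processor such as Saxon, which is not shown here. As a rough stand-in, the sketch below uses the XPath 1.0 engine that ships with the JDK (javax.xml.xpath) to run a query against an already parsed Document. This is a swapped-in technique, not the XQuery approach described above: XPath 1.0 has no regular expressions, so the query selects text nodes containing the literal "/server" and the regex is then re-checked in Java. The class name XPathServerSearch and the method name find are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; the name is an assumption made for this sketch.
class XPathServerSearch {

    private static final Pattern SERVER = Pattern.compile("/server[^<]*");

    // Returns the elements that directly contain text matching the pattern.
    static List<Element> find(Document doc) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Coarse filter in XPath 1.0: text nodes containing the literal "/server".
        NodeList textNodes = (NodeList) xpath.evaluate(
                "//text()[contains(., '/server')]", doc, XPathConstants.NODESET);

        List<Element> result = new ArrayList<>();
        for (int i = 0; i < textNodes.getLength(); i++) {
            Node text = textNodes.item(i);
            Node parent = text.getParentNode();
            // Exact check with the regex; for this particular pattern it agrees
            // with the contains() filter, but it matters for stricter patterns.
            if (parent instanceof Element
                    && SERVER.matcher(text.getNodeValue()).find()
                    && !result.contains(parent)) {
                result.add((Element) parent);
            }
        }
        return result;
    }
}
```

Inside the existing file loop this could be called as XPathServerSearch.find(doc) right after builder.parse(workfile). It still parses every file into a DOM, so it does not address the performance concern raised above; it only replaces the hand-written tree walk with a query.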
