
I receive XML files that need parsing. I code in Java regularly, so Java SAX was my natural first choice. The XML files contain a combination of text elements and one binary element (an .xls file).

My parser handler is as follows:

public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException{

        if(qName.equalsIgnoreCase("To")){
           toFlag = true;
        }

        if(qName.equalsIgnoreCase("Subject")){
           subjectFlag = true;
        }

        if(qName.equalsIgnoreCase("OutDocumentId")){
           outdocmentIdFlag = true;
        }

        if(qName.equalsIgnoreCase("Filename")){
           filenameFlag = true;
        }

        if(qName.equalsIgnoreCase("EmailType")){
            emailTypeFlag = true;
        }

        if(qName.equalsIgnoreCase("Context")){
            contextTypeFlag = true;
        }

        if(qName.equalsIgnoreCase("Blob")){
            blobTypeFlag = true;
        }


    }

And the element data is parsed here:

public void characters(char ch[], int start, int length) throws SAXException{

        String text = null;
        if (toFlag) {
            text = new String(ch, start, length);
            getRequest().setRecipientEmail(text);
            toFlag = false;
        }

        if (subjectFlag) {
            text = new String(ch, start, length);
            getRequest().setSubject(text);
            subjectFlag = false;
        }

        if (outdocmentIdFlag) {             
            text = new String(ch, start, length);
            getRequest().setOutDocId(text);
            outdocmentIdFlag = false;
        }

        if (filenameFlag) {
            text = new String(ch, start, length);
            getRequest().setFilename(text);
            filenameFlag = false;
        }

        if(emailTypeFlag) {
            text = new String(ch, start, length);
            getRequest().setEmailType(Integer.parseInt(text));
            emailTypeFlag = false;
        }

        if(contextTypeFlag) {
            text = new String(ch, start, length);
            getRequest().setContext(text);
            contextTypeFlag = false;
        }

        if(blobTypeFlag) {
            text = new String(ch, start, length);               
            try {
                getRequest().setBlob(Hibernate.createBlob(text.getBytes("UTF-16")));
            } catch (UnsupportedEncodingException e) {
                     System.out.println("Error creating blob");
                     e.printStackTrace();
            }
            blobTypeFlag = false;
        }

    }

}

The problem is with the blob element: it's being read in as a char[] (which I believe is incorrect), because that's what the parent class allows me to override during event processing.

Does anybody know how to use the SAX parser when one element is binary rather than text?

Greatly appreciated

  • If they're dumping raw binary into a file, it isn't well-formed XML (because there are a bunch of characters that are not permitted), and no conforming parser will parse it. Hopefully they encode the binary data using Base64 or something similar, in which case it is in fact character data (but needs to be decoded). Commented Jan 3, 2013 at 19:02
  • If they are dumping raw binary into a file, you need to get them to change that. Because what they're giving you is not XML. Commented Jan 3, 2013 at 19:03
  • Unless you're parsing enormous documents or performance is absolutely critical, consider using a DOM parser such as JDOM (www.jdom.org) -- it will make your task much simpler. Commented Jan 3, 2013 at 19:09
  • Consider using StAX - it will improve readability a lot Commented Jan 3, 2013 at 22:51

1 Answer

Take the char data and send it to a Base64 decoder.
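
A minimal sketch of that decoding step, assuming the Blob element's text really is Base64 (the question does not confirm the encoding). It uses java.util.Base64 from Java 8 and later instead of the Commons Codec Base64.decodeBase64() call mentioned in the comments; the class and method names are hypothetical.

```java
import java.util.Base64;

// Hypothetical helper, not part of the original handler.
class BlobDecodeSketch {

    // Turns the element text into the original bytes.
    // Assumption: the text is Base64. The MIME decoder is used because it
    // skips line breaks and other characters outside the Base64 alphabet,
    // which are common when long payloads are wrapped inside XML.
    static byte[] decode(String elementText) {
        return Base64.getMimeDecoder().decode(elementText);
    }
}
```

The resulting byte[] is what should go into the blob. Calling text.getBytes("UTF-16") as in the question would store the encoded text itself rather than the spreadsheet bytes.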

2 Comments

Thanks for the help thus far, but I'm not certain I understand how I get from a ch[] to using the Base64.decodeBase64() .. do I need to convert the ch[] to a string first? I feel I would lose something if so.
First convert the char array to a String as you would any other text value. Then pass the String to the decoder. As others have pointed out, the only way to store binary data in an XML document is to first encode it as text (usually with the Base64 algorithm). To get the original binary data you need to reverse this process -- that is, convert the text back to the binary data.
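
One caveat when doing this inside a SAX handler: the parser may deliver an element's text through several characters() calls, so a flag that is cleared after the first call can keep only part of a large Base64 payload. The sketch below buffers the text and decodes it in endElement(). It assumes the Blob content is Base64 and that the elements of interest contain only text; the class name, fields, and parse() helper are hypothetical, and only two of the question's elements are shown.

```java
import java.io.StringReader;
import java.util.Base64;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers character data per element, then decodes it.
class BlobHandlerSketch extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String filename;
    private byte[] blob;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        text.setLength(0); // start a fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element, so append, never assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("Filename")) {
            filename = text.toString().trim();
        } else if (qName.equalsIgnoreCase("Blob")) {
            // Assumption: Base64 text; the MIME decoder ignores line breaks.
            blob = Base64.getMimeDecoder().decode(text.toString());
        }
        text.setLength(0);
    }

    String getFilename() { return filename; }
    byte[] getBlob() { return blob; }

    // Convenience entry point for trying the handler on an in-memory document.
    static BlobHandlerSketch parse(String xml) throws Exception {
        BlobHandlerSketch handler = new BlobHandlerSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

Going from char[] to String loses nothing here, because Base64 text consists only of ASCII characters; the binary content is recovered by the decoder, not by the character conversion.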
