
In Java, I want to parse a file with heterogeneous data (numbers and characters), fast.

I've been reading about ByteBuffer and memory mapped files.

I can copy the file into a buffer, but parsing the data is where it gets tricky. I'd like to do it by allocating several bytes at a time, but doesn't that make the parsing dependent on the encoding?

If the format of the file is, for instance:

someString 8
some other string 88

How can I parse it to String or Integer objects?

Thanks!

Udo.

    If sequentially accessed and reading "text" and "integers saved as text" is the primary use-case, then I would start with a BufferedReader as a base. ByteBuffer is good for some things. This is generally not it. Commented Feb 8, 2011 at 20:44

3 Answers


Assuming your format is something like

{string possibly with spaces} {integer}\r?\n

You need to search for the newline, then work backward until you find the first space. You can decode the number yourself into an int, or extract it as a String and parse it. I wouldn't use an Integer object unless you have to. Once you know where the line starts and where the integer starts, you can extract the text as bytes and convert it into a String using your desired encoding.

This assumes that newline and space are each one byte in your encoding. It would be more complicated if they are multi-byte, but it can still be done.
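To make the single-byte assumption concrete, here is a minimal sketch (the class and method names are made up for illustration). In UTF-8, space and newline each encode to one byte, and every byte of a multi-byte character has its high bit set, so a byte equal to ' ' or '\n' can only be a real space or newline. In UTF-16 each of them takes two bytes, so a byte-wise scan would have to step through the buffer two bytes at a time.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SeparatorBytes {
    // Number of bytes the string occupies in the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // True if every byte has its high bit set (negative as a Java byte).
    static boolean allHighBitSet(byte[] data) {
        for (byte b : data) {
            if (b >= 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("space in UTF-8:      " + encodedLength(" ", StandardCharsets.UTF_8) + " byte(s)");
        System.out.println("space in UTF-16LE:   " + encodedLength(" ", StandardCharsets.UTF_16LE) + " byte(s)");
        System.out.println("newline in UTF-16LE: " + encodedLength("\n", StandardCharsets.UTF_16LE) + " byte(s)");
        // U+00E9 is a two-byte character in UTF-8; neither byte can be mistaken for ASCII.
        byte[] accented = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("non-ASCII bytes all >= 0x80: " + allHighBitSet(accented));
    }
}
```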

EDIT: The following example prints...

text: ' someString', number: 8
text: 'some other string', number: -88

Code

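import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

ByteBuffer bb = ByteBuffer.wrap(" someString 8\r\nsome other string -88\n".getBytes(StandardCharsets.UTF_8));
while (bb.remaining() > 0) {
    int start = bb.position(), end, ptr;
    // scan forward to the end of the line (or the end of the buffer)
    for (end = start; end < bb.limit(); end++) {
        byte b = bb.get(end);
        if (b == '\r' || b == '\n')
            break;
    }
    // read the number backwards, starting from the last byte of the line
    long value = 0;
    long tens = 1;
    for (ptr = end - 1; ptr >= start; ptr--) {
        byte b = bb.get(ptr);
        if (b >= '0' && b <= '9') {
            value += tens * (b - '0');
            tens *= 10;
        } else if (b == '-') {
            value = -value;
            ptr--;
            break;
        } else {
            break;
        }
    }
    // assume the separator is a single space; ptr now points at it
    byte[] bytes = new byte[Math.max(0, ptr - start)];
    bb.get(bytes);
    String text = new String(bytes, StandardCharsets.UTF_8);
    System.out.println("text: '" + text + "', number: " + value);

    // skip the line terminator (\n or \r\n); the last line may have none
    if (end + 1 < bb.limit() && bb.get(end) == '\r' && bb.get(end + 1) == '\n') end++;
    bb.position(Math.min(end + 1, bb.limit()));
}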
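
Since the question mentions memory-mapped files: the loop above only uses position(), limit() and absolute get(index) calls, so it should work the same on a MappedByteBuffer. The following is a minimal sketch of obtaining one, not a tuned implementation. The class name, the map helper and the temporary file are assumptions for illustration only. Note that a single mapping cannot exceed Integer.MAX_VALUE bytes, so larger files would have to be mapped in pieces.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileExample {
    // Maps the whole file read-only. The mapping stays valid after the
    // channel is closed, so the channel can be released right away.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real data file.
        Path file = Files.createTempFile("mapped", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "someString 8\nsome other string 88\n".getBytes(StandardCharsets.UTF_8));

        MappedByteBuffer bb = map(file);
        System.out.println("mapped bytes: " + bb.remaining());
        // From here, bb can be scanned with the same loop as the wrapped array.
    }
}
```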

You can try it this way:

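import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// sb: your StringBuffer holding the text (toString() must be called on an instance)
CharacterIterator it = new StringCharacterIterator(sb.toString());
for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
    if (Character.isDigit(c)) {
        // character is a digit
    } else {
        // character is not a digit
    }
}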

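Or you can use a regex if you prefer:

String str = sb.toString(); // sb: your StringBuffer holding the text
String numbers = str.replaceAll("\\D", "");       // keeps only the digits
String letters = str.replaceAll("[^A-Za-z]", ""); // keeps only ASCII letters (\W would also keep digits and '_')

Then call Integer.parseInt() as usual on the numbers string. Note that this concatenates the digits from the whole input, so split the input into lines first if you need each line's number separately.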
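
If each line has to yield its own text and number, one pattern with two capture groups may be simpler than two replaceAll calls. This is only a sketch under the assumption that every line looks like "text, one space, integer"; the class name and the split helper are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineRegexExample {
    // The greedy (.*) takes everything up to the last space;
    // the second group is the (optionally negative) integer.
    private static final Pattern LINE = Pattern.compile("(.*) (-?\\d+)");

    // Returns {text, digits}, or null if the line does not match.
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        for (String line : new String[] { "someString 8", "some other string 88" }) {
            String[] parts = split(line);
            int number = Integer.parseInt(parts[1]);
            System.out.println("text: '" + parts[0] + "', number: " + number);
        }
    }
}
```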

5 Comments

Thank you, but I was looking for a more specific ByteBuffer implementation.
A ByteBuffer giving Strings and Integers based on some content?
What would someString 8 return then? A string someString or an integer 8?
Well, just iterate, or have some way to parse someString to a String object and 8 to an Integer.
Ah, like o = buf.getNextSequence(); and o instanceof Integer or buf.isInt(). I'm afraid such operations wouldn't necessarily be much faster, since the character checks are only pushed down into the buffer implementation.

Are you looking for java.util.Scanner? Unless you have really exotic performance requirements, that should be fast enough:

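    // needs: import java.io.File; import java.util.Scanner;
    // note: next() returns one whitespace-delimited token,
    // so this assumes the labels contain no spaces
    Scanner s = new Scanner(new File("C:\\test.txt"));
    while (s.hasNext()) {
        String label = s.next();
        int number = s.nextInt();

        System.out.println(number + " " + label);
    }
    s.close();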
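
For labels that contain spaces, such as "some other string 88" from the question, a line-based variant is one option. The sketch below reads whole lines with a BufferedReader and splits each at its last space. The class and helper names are made up, a StringReader stands in for the file, and every line is assumed to contain at least one space followed by an integer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LineSplitExample {
    // Everything before the last space is the label.
    static String label(String line) {
        return line.substring(0, line.lastIndexOf(' '));
    }

    // Everything after the last space is parsed as the number.
    static int number(String line) {
        return Integer.parseInt(line.substring(line.lastIndexOf(' ') + 1));
    }

    public static void main(String[] args) throws IOException {
        // For a real file, Files.newBufferedReader(path, charset) could supply the reader.
        String data = "someString 8\nsome other string 88\n";
        try (BufferedReader in = new BufferedReader(new StringReader(data))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(number(line) + " " + label(line));
            }
        }
    }
}
```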

2 Comments

Why? Have you verified the performance impact justifies the additional effort?
That's a good point, but I'm doing this just to learn how to use it.
