
I want to read a 1.5 GB file into an array. It takes a long time, so I want to switch to some other approach. Can anybody help me?

If I preprocess the byte file into some database (or in some other way), can I make it faster?

Is there any other way to make it faster?

Actually, I have to process more than fifty 1.5 GB files, so this operation is quite expensive for me.

  • Why are you reading them into an array? Commented Jul 31, 2012 at 17:18
  • And why are you reading 50 such files? What are you going to do with them? Commented Jul 31, 2012 at 17:19
  • You might be able to speed it up a bit, but the real hits are reserving that much memory and the disk I/O. The way to address that is basically not to load it all into an array. So why do you want to load it all up? Commented Jul 31, 2012 at 17:19
  • @MarkusMikkolainen, for accessing the array elements (non-sequentially), processing them (e.g. comparing them with other elements), and printing them. Commented Jul 31, 2012 at 17:21
  • How do you process them? Could you use RandomAccessFile to access them from disk, or mmap the file? Commented Jul 31, 2012 at 17:22

3 Answers


It depends on what you want to do.

If you only need to access a few random bytes, then reading the whole thing into an array isn't a good fit; a MappedByteBuffer would be better.
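For the random-access case, a minimal sketch of mapping a file and reading individual bytes. The file here is a small generated one so the demo runs; in practice you would open your real file. Note that a single mapping is limited to 2 GB, so a 1.5 GB file still fits in one MappedByteBuffer:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedRead {
    public static void main(String[] args) throws IOException {
        // Small generated file for the demo; substitute your 1.5 GB file.
        Path file = Files.createTempFile("demo", ".bin");
        Files.write(file, new byte[] {10, 20, 30, 40, 50});

        try (FileChannel ch = FileChannel.open(file)) {
            // Map the whole file read-only; nothing is copied onto the heap.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Random access by offset, without reading the rest of the file:
            System.out.println(buf.get(3)); // prints 40
        }
        Files.delete(file);
    }
}
```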

If you want to read all the data and process it sequentially, a small portion at a time, then you could stream it.
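A sketch of the streaming approach, under the assumption that the processing only needs one chunk at a time (the file is generated here so the demo is self-contained):

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamChunks {
    public static void main(String[] args) throws IOException {
        // Small generated file for the demo; in practice this is the 1.5 GB file.
        Path file = Files.createTempFile("demo", ".bin");
        Files.write(file, new byte[1024 * 1024]);

        byte[] chunk = new byte[64 * 1024];   // only 64 KB in memory at a time
        long total = 0;
        try (BufferedInputStream in =
                 new BufferedInputStream(new FileInputStream(file.toFile()))) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                // process chunk[0..n) here instead of storing it all
                total += n;
            }
        }
        System.out.println("bytes processed: " + total);
        Files.delete(file);
    }
}
```

This keeps memory use constant no matter how large the file is, at the cost of losing random access.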

If you need to do computations that do random access of the whole dataset, particularly if you need to repeatedly read elements, then loading into an array might be sensible (but a ByteBuffer is still a candidate).

Can you show some example code or explain further?




How fast is your disk subsystem?

If you can read 40 MB per second, reading 1500 MB should take about 40 seconds. If you want to go faster than this, you need a faster disk subsystem. If you are reading from a local drive and it's taking minutes, you have a tuning problem, and there is not much you can do in Java to fix it because Java is not the problem.
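To check whether the disk (rather than Java) is the bottleneck, you can time a full sequential read and compute the throughput yourself. A sketch, using a small generated file so it runs as-is; point it at the real 1.5 GB file to get a meaningful number:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class Throughput {
    public static void main(String[] args) throws IOException {
        // 16 MB sample file for the demo; use your real file instead.
        Path file = Files.createTempFile("bench", ".bin");
        Files.write(file, new byte[16 * 1024 * 1024]);

        byte[] buf = new byte[64 * 1024];
        long bytes = 0;
        long start = System.nanoTime();
        try (BufferedInputStream in =
                 new BufferedInputStream(new FileInputStream(file.toFile()))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes += n;
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d bytes in %.3f s = %.1f MB/s%n",
                bytes, seconds, bytes / 1e6 / seconds);
        Files.delete(file);
    }
}
```

Beware of the operating system cache: a second run over the same file may report a much higher rate than the disk can actually sustain.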

You can use a memory mapped file instead, but this will only speed up the access if you don't need all the data. If you need it all, you are limited by the speed of your hardware.

5 Comments

One way to go faster would be NOT to read the whole files, if you really don't need every byte. If you do need every byte, then you can try to improve things by reading every byte exactly once.
@MarkusMikkolainen, I need every byte to process.
Then you are out of luck; buy a faster disk.
FileChannel.map and accessing the MappedByteBuffer might be the fastest way to access a file, since I assume it is implemented very efficiently (it is platform code).
Heh. For me, on my desktop, it takes 354 milliseconds to read a 1.25 GB file that happens to be in the operating system cache already. So Java shouldn't be the problem even with FileInputStream.

Using BufferedInputStream or InputStream is probably as fast as you can get (faster than RandomAccessFile). The largest int is 2,147,483,647, so with 1.5 GB (1,610,612,736 bytes) you're getting somewhat close to the maximum size of an array.

I'd recommend you just access the file using a BufferedInputStream for best speed, with skip() and read() to get the data you want. Maybe write a class that wraps those, keeps track of its position, and takes care of the seeking for you when you give it an offset to read from. I believe you have to close and reopen the input stream to get back to the beginning.
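A sketch of such a position-tracking wrapper (the class name and API are made up for illustration; it reopens the stream when asked to seek backwards, as the answer describes):

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: tracks its own position, skips forward with skip(),
// and reopens the stream to "rewind" for a backwards seek.
public class SeekableReader implements AutoCloseable {
    private final String path;
    private BufferedInputStream in;
    private long pos;

    public SeekableReader(String path) throws IOException {
        this.path = path;
        this.in = new BufferedInputStream(new FileInputStream(path));
        this.pos = 0;
    }

    /** Read one byte at the given offset; returns -1 past end of file. */
    public int readAt(long offset) throws IOException {
        if (offset < pos) {                    // can't skip backwards: reopen
            in.close();
            in = new BufferedInputStream(new FileInputStream(path));
            pos = 0;
        }
        while (pos < offset) {
            long skipped = in.skip(offset - pos);
            if (skipped <= 0) return -1;       // nothing left to skip
            pos += skipped;
        }
        int b = in.read();
        if (b != -1) pos++;
        return b;
    }

    @Override public void close() throws IOException { in.close(); }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("demo", ".bin");
        Files.write(f, new byte[] {5, 6, 7, 8, 9});
        try (SeekableReader r = new SeekableReader(f.toString())) {
            System.out.println(r.readAt(4)); // forward seek: prints 9
            System.out.println(r.readAt(1)); // backwards seek: prints 6
        }
        Files.delete(f);
    }
}
```

Note that backwards seeks are expensive with this scheme (the file is re-read from the start), which is another reason a MappedByteBuffer may suit truly random access better.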

And... you may not want to store them in an array at all, and instead just read from the file on demand. That might help if loading time is your killer.

Comments
