I am storing a large text file (10 GB, N rows and 4 columns) in an HDF5 file using the h5py package, primarily because I do not want to load it all into RAM.

I would like to sort the rows of the file by the second column. Any suggestions on how to do that?

I have also heard that the sort can be done in chunks. Any help with that, please?

Thanks!
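
For reference, the chunked approach mentioned above is usually an external merge sort: sort each chunk in memory, store the sorted chunks ("runs") on disk, then merge the runs while reading only a small block of each at a time. The following is a minimal sketch, not a tuned implementation. It assumes the data is a single 2-D numeric dataset; the dataset name `data`, the function name `sort_hdf5_by_column`, the temporary dataset `runs`, and the default chunk size are all hypothetical choices for illustration. The merge step loops over rows in Python, so it will be slow on 10 GB, and it assumes the sort column contains no NaNs.

```python
import heapq

import h5py
import numpy as np


def sort_hdf5_by_column(src_path, dst_path, name="data", col=1, chunk_rows=1_000_000):
    """External merge sort of a 2-D dataset by one column (col=1 is the second column)."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        data = src[name]

        # Pass 1: sort each chunk in memory and store it as a sorted run.
        runs = dst.create_dataset("runs", shape=data.shape, dtype=data.dtype)
        bounds = []
        for start in range(0, data.shape[0], chunk_rows):
            stop = min(start + chunk_rows, data.shape[0])
            block = data[start:stop]
            order = np.argsort(block[:, col], kind="stable")
            runs[start:stop] = block[order]
            bounds.append((start, stop))

        # Pass 2: k-way merge. Each run is read in small blocks so that all
        # runs together hold roughly one chunk's worth of rows in memory.
        block_rows = max(1, chunk_rows // max(1, len(bounds)))

        def iter_run(start, stop):
            for s in range(start, stop, block_rows):
                for row in runs[s:min(s + block_rows, stop)]:
                    yield row

        out = dst.create_dataset(name, shape=data.shape, dtype=data.dtype)
        merged = heapq.merge(*(iter_run(a, b) for a, b in bounds),
                             key=lambda row: row[col])

        # Buffer merged rows and write them out in blocks, not one by one.
        buf, pos = [], 0
        for row in merged:
            buf.append(row)
            if len(buf) == block_rows:
                out[pos:pos + len(buf)] = np.asarray(buf)
                pos += len(buf)
                buf = []
        if buf:
            out[pos:pos + len(buf)] = np.asarray(buf)

        # Removes the link to the temporary runs; HDF5 does not shrink the file.
        del dst["runs"]
```

A call such as `sort_hdf5_by_column("input.h5", "sorted.h5")` (both file names are placeholders) would write the sorted rows to a dataset named `data` in the second file. Peak memory is governed by `chunk_rows`, so it can be lowered on machines with little RAM.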

  • Does this help - stackoverflow.com/questions/21271727/…? Commented Jul 23, 2020 at 2:15
  • Instead of h5py, use PyTables (aka tables). It has optimized sort and search algorithms. Both packages can create and operate on an HDF5 file. (Obviously, you will have to read your text data into the HDF5 file first. There are other SO posts that show how to do that.) Commented Jul 23, 2020 at 12:13
  • @kcw78: Thanks. I am able to store my data in an HDF5 file, but I do not understand how to sort it. Can you please share an MWE? Commented Jul 28, 2020 at 20:15
  • @bigbounty: This link gives commands, but where do I use them in my Python script? Consider me a beginner; I would appreciate it if you could provide an MWE. Commented Jul 28, 2020 at 20:18
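
The PyTables suggestion in the comments above could look roughly like the sketch below. It assumes the data has already been stored as a PyTables `Table` (not a plain h5py array); the table name `data`, the column name `c1` for the second column, the output name `data_sorted`, and the helper `sort_table_by_column` are hypothetical. The idea is to build a completely sorted index (CSI) on the sort column and then let PyTables write a sorted copy on disk.

```python
import numpy as np
import tables


def sort_table_by_column(path, table_name="data", sort_col="c1",
                         sorted_name="data_sorted"):
    """Write a copy of a PyTables table sorted by one column, without loading it all."""
    with tables.open_file(path, mode="a") as h5:
        table = h5.get_node("/", table_name)
        column = getattr(table.cols, sort_col)

        # Sorting a copy requires a completely sorted index on the column.
        if not column.is_indexed:
            column.create_csindex()

        # The sorted copy is created next to the original table.
        table.copy(newname=sorted_name, sortby=sort_col,
                   checkCSI=True, overwrite=True)
```

After the call, the sorted rows can be read back in slices from `h5.root.data_sorted`, so the full table never has to be held in memory at once.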
