I have some CSV text files in the format:
1.3, 0, 1.0
20.0, 3.2, 0
30.5, 5.0, 5.2
The files are about 3.5 GB each, and I cannot read any of them into memory with pandas in a useful amount of time.
But I don't need to read the whole file: all I want is to pick some random lines and read the values there. I know this is theoretically possible if the file is laid out so that every field has the same size, for instance float16 values in a binary file.
Now, I think I can just convert it using the NumPy method described in the answer to the question "How to output list of floats to a binary file in Python".
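A minimal sketch of such a conversion, streaming the text file row by row so the whole file never has to fit in memory. The function name `csv_to_binary` is hypothetical, and float32 is an assumption made here instead of float16, since float16 keeps only about three significant decimal digits. The output is raw values in native byte order with no header.

```python
import numpy as np

def csv_to_binary(csv_path, bin_path, dtype=np.float32):
    """Stream a comma-separated text file into a flat binary file.

    Hypothetical helper: every row is written as raw fixed-size values,
    so all rows occupy the same number of bytes.
    """
    n_rows = 0
    with open(csv_path) as src, open(bin_path, "wb") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            # float() tolerates the spaces after the commas
            row = np.array([float(x) for x in line.split(",")], dtype=dtype)
            dst.write(row.tobytes())
            n_rows += 1
    return n_rows
```

Writing one row at a time is simple but not fast; batching several thousand rows per write would likely help on files this size.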
But how do I go about picking a random line from it once the conversion is done?
In a normal text file, I could just do:
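import os
import random
filesize = os.path.getsize('really_big_file')
offset = random.randrange(filesize)
with open('really_big_file', 'rb') as f:  # binary mode, so seeking to any byte offset is allowed
    f.seek(offset)  # go to a random position
    f.readline()  # discard: almost certainly a partial line
    random_line = f.readline()  # bingo! (bytes; empty if the offset fell in the last line)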
But I can't find a way for this to work in a binary file made from NumPy.
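For reference, the fixed-size idea amounts to something like the sketch below: with every row occupying the same number of bytes, row `i` starts at byte `i * row_bytes`, so there is no partial line to discard. It assumes the binary file contains nothing but raw float32 values, three per row, with no header; the name `read_random_row` and its parameters are hypothetical.

```python
import os
import random

import numpy as np

def read_random_row(bin_path, n_cols=3, dtype=np.float32):
    """Return one uniformly chosen row from a headerless fixed-width binary file."""
    row_bytes = n_cols * np.dtype(dtype).itemsize   # bytes per row
    n_rows = os.path.getsize(bin_path) // row_bytes
    i = random.randrange(n_rows)                    # random row index
    with open(bin_path, "rb") as f:
        f.seek(i * row_bytes)                       # jump straight to row i
        return np.frombuffer(f.read(row_bytes), dtype=dtype)
```

Unlike the text-file trick, every row is equally likely to be picked here, because the offset is computed from a row index rather than from a random byte. `np.memmap` with `shape=(n_rows, n_cols)` might be an alternative way to index rows without loading the file.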