I have about 30 data files and need to extract the 4th, 5th, and 6th columns, then skip 14 columns, grab the next 3 columns, and so on until the end of the file. Each data file is about 400 rows and 17000 columns. So far I have this:
    import glob
    import numpy as np

    # Concatenate every .dat file into one big result.dat
    file_list = glob.glob('*.dat')
    with open("result.dat", "wb") as outfile:
        for f in file_list:
            with open(f, "rb") as infile:
                outfile.write(infile.read())

    # Load the combined file and slice out the repeating column groups
    arr = np.loadtxt('result.dat')
    a = arr[:, 4:-1:17]
    b = arr[:, 5:-1:17]
    c = arr[:, 6:-1:17]
This writes a file called result.dat containing all of the data from the individual files, and then I extract the columns I need. However, building the array takes a long time because result.dat also contains all of the information I do not need. Is there a way to read in only the specific columns I am interested in, instead of everything, so the time is cut down significantly?
Is writing result.dat slow, or just reading it? Experiment with the usecols parameter of loadtxt. loadtxt also reads the file one line at a time, splits it, collects the requested columns, and saves it all in a list of lists; only at the end does it turn everything into an array.
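A minimal sketch of that approach, assuming the column pattern described in the question (columns 4, 5, 6 of every 17-column block, matching the slicing in the posted code) and whitespace-delimited files; the names n_cols, wanted_cols, and per_file are just illustrative:

    import glob
    import numpy as np

    file_list = glob.glob('*.dat')

    # Columns 4, 5, 6 and every 17th column after each of them,
    # mirroring the arr[:, 4:-1:17] style slices in the question.
    n_cols = 17000  # assumed total number of columns per row
    wanted_cols = sorted(c for start in (4, 5, 6)
                           for c in range(start, n_cols, 17))

    # Read only the wanted columns from each file, then stack the rows.
    per_file = [np.loadtxt(f, usecols=wanted_cols) for f in file_list]
    arr = np.vstack(per_file)

This skips the intermediate result.dat entirely; whether it is actually faster depends on whether the bottleneck was the file concatenation or the parsing, so it is worth timing both variants on a single file first.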