Saving several numpy arrays to one csv

Question

I have several different 'columns' I need to save to a CSV. Currently I do this:

f = open(out_csv, 'w', newline='') 
w = csv.writer(f, delimiter=",", )
w.writerow(['id_a', 'id_b',
            'lat_a','lon_a',
            'lat_b','lon_b',
            'proj_metres'])
w.writerows(np.column_stack((
            id_labels[udist.row],
            id_labels[udist.col],
            points[udist.row],
            points[udist.col],
            udist.data)))

Perhaps not important but for completeness:

tree_dist = tree.sparse_distance_matrix(tree)
udist = sparse.tril(tree_dist, k=-1)

The data is around 30 million rows by 7 columns (two of which are strings: id_labels), so this takes a while (around 8 minutes) and uses a lot of RAM. I think Python creates a new temporary object when I call np.column_stack, so at one point it holds double the data it needs.

Is there a better way to create the CSV I need?

Francesco Nazzaro · Accepted Answer · 2016-03-01 14:27:16Z

2

You can open a file in append mode and then use np.savetxt:

import numpy as np
array1 = np.arange(12).reshape((2, 6))
array2 = np.ones(18).reshape((3, 6))
with open('outputfile.csv', 'ab') as f:
    np.savetxt(f, array1, delimiter=',')
    np.savetxt(f, array2, delimiter=',')
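The snippet above appends the arrays one below the other. For the question's layout, where several arrays sit side by side as columns, the same idea of writing to an open file handle can be applied chunk by chunk so that only a small slice is ever stacked. This is a minimal sketch, not the answerer's code: the helper name save_columns_in_chunks, the chunk size, and the sample arrays are all made up for illustration, and every value is converted to a string so that the text and numeric columns can share one block.

```python
import os
import tempfile

import numpy as np


def save_columns_in_chunks(path, header, columns, chunk_size=100_000):
    """Write equal-length arrays side by side as CSV columns, one chunk at a time.

    Hypothetical helper for illustration; `columns` may mix 1-D and 2-D arrays.
    """
    n_rows = len(columns[0])
    with open(path, "w") as f:
        f.write(",".join(header) + "\n")
        for start in range(0, n_rows, chunk_size):
            stop = start + chunk_size
            # Only this slice is converted and stacked, so the temporary
            # copy is at most chunk_size rows long.
            block = np.column_stack(
                [np.asarray(col[start:stop]).astype(str) for col in columns]
            )
            np.savetxt(f, block, delimiter=",", fmt="%s")


# Made-up stand-ins for the question's id_labels, points and udist.data.
ids = np.array(["a", "b", "c", "d", "e"])
points = np.arange(10, dtype=float).reshape(5, 2)
dist = np.array([1.5, 2.5, 3.5, 4.5, 5.5])

out_path = os.path.join(tempfile.mkdtemp(), "pairs.csv")
save_columns_in_chunks(
    out_path, ["id", "lat", "lon", "dist"], [ids, points, dist], chunk_size=2
)

with open(out_path) as f:
    lines = f.read().splitlines()
```

Because the floats pass through str, they are written in their shortest round-trip form rather than with a fixed number of decimals. If a fixed format matters, the numeric columns would need to be written separately or formatted before stacking.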

edited Mar 1, 2016 at 14:27

answered Mar 1, 2016 at 13:46

Francesco Nazzaro


7 Comments

mptevsion Over a year ago

Thanks, this looks great. I'm having a small problem with savetxt: TypeError: Mismatch between array dtype ('float64') and format specifier ('%f,%f'). I import my array as dtype=(float, float), however can't save it as such

Francesco Nazzaro Over a year ago

I can't understand your input array. Does it have 2 columns? Can you write it down?

mptevsion Over a year ago

Sure, if I do points[:3] I get:

[[ 5.15317040e+01 -3.31830000e-02]
 [ 5.10514740e+01 -4.04532300e+00]
 [ 5.38018130e+01 -1.77162300e+00]]

Francesco Nazzaro Over a year ago

Try fmt='%f %f' instead of fmt=('%f', '%f').

mptevsion Over a year ago

Unfortunately no difference. If it helps, this is how I import:

points = np.genfromtxt(path_to_csv,
                       delimiter=',',
                       skip_header=1,
                       usecols=(0,1),
                       dtype=(float, float))
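The thread never establishes what caused the TypeError, so the following does not diagnose it. It only sketches the three forms that np.savetxt accepts for fmt on a plain two-column float array: a single format applied to every column, a sequence with one format per column, and a multi-format string that is used as-is for each row. The sample values and buffer names are made up.

```python
import io

import numpy as np

points = np.array([[51.531704, -0.033183],
                   [51.051474, -4.045323]])

# A single format is applied to every column; `delimiter` goes between them.
buf_single = io.StringIO()
np.savetxt(buf_single, points, delimiter=",", fmt="%.6f")

# A sequence must have exactly one format per column.
buf_seq = io.StringIO()
np.savetxt(buf_seq, points, delimiter=",", fmt=["%.6f", "%.2f"])

# A multi-format string is used verbatim for each row; `delimiter` is ignored.
buf_multi = io.StringIO()
np.savetxt(buf_multi, points, fmt="%.6f;%.6f")
```

In the multi-format case the number of % specifiers has to match the number of columns, so fmt='%f %f' only fits an array with exactly two columns.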


B. M. · Accepted Answer · 2016-03-01 14:35:02Z

1

First, write the arrays out one at a time to avoid memory problems.

Let's consider 3 solutions:

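import csv
import pickle

import numpy as np

a = np.random.rand(10000, 7)  # 10000 rows x 7 columns of random floats

def testfile():
    # Row-by-row write with the csv module.
    with open('test.csv', 'w', newline='') as f:
        w = csv.writer(f)
        w.writerows(a)

def testsavetxt():
    # The default delimiter is a space; pass delimiter=',' for a comma-separated file.
    np.savetxt('test.csv', a)

def testpickle():
    # Binary dump of the whole array; the result is not a CSV.
    with open('test.pickle', 'wb') as f:
        pickle.dump(a, f)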

Some tests:

In [43]: %timeit testfile()
1 loops, best of 3: 576 ms per loop

In [44]: %timeit testsavetxt()
1 loops, best of 3: 442 ms per loop

In [45]: %timeit testpickle()
100 loops, best of 3: 12.3 ms per loop

So np.savetxt is slightly faster than csv.writer.

If CSV is not a requirement, pickle offers a binary protocol, which is about 40x faster.
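For completeness, a pickle round trip might look like the sketch below. The temporary path and the smaller array size are arbitrary choices for the example. The array comes back with the same dtype and values, but the file is binary and cannot be opened as a CSV.

```python
import os
import pickle
import tempfile

import numpy as np

a = np.random.rand(1000, 7)
path = os.path.join(tempfile.mkdtemp(), "test.pickle")

# Dump the array in binary form.
with open(path, "wb") as f:
    pickle.dump(a, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back; only unpickle files from a source you trust.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

NumPy also has its own binary format via np.save and np.load, which is another option when only arrays need to be stored.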

answered Mar 1, 2016 at 14:35

B. M.


3 Comments

mptevsion Over a year ago

I like the np.savetxt option - what would you suggest I do to handle several arrays - use the same np.column_stack(.. ? I'm having a bit of bother using 'ab' mode

B. M. Over a year ago

What you are doing is not very readable, and you will have to do the same when loading. Perhaps pandas can help you collect your data?

mptevsion Over a year ago

That would mean converting data to pandas, however. I think I will go with:

np.savetxt(out_csv, np.column_stack((
            id_labels[udist.row],
            id_labels[udist.col],
            points[udist.row],
            points[udist.col],
            udist.data)))

Thanks
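For comparison, B. M.'s pandas suggestion could look roughly like the sketch below. It is an illustration only: the arrays are small made-up stand-ins for id_labels[udist.row], points[udist.row] and so on, and building the DataFrame still copies the data, which is the conversion cost mentioned above. The upside is that each column keeps its own dtype, so strings and floats do not have to be forced into one array.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up stand-ins for the question's indexed arrays.
id_a = np.array(["p1", "p2", "p3"])
id_b = np.array(["p0", "p0", "p1"])
pts_a = np.array([[51.5, -0.03], [51.0, -4.04], [53.8, -1.77]])
pts_b = np.array([[50.0, 0.0], [50.0, 0.0], [51.5, -0.03]])
dist = np.array([10.0, 20.5, 30.25])

# One named column per CSV field; the column order follows the dict order.
df = pd.DataFrame({
    "id_a": id_a,
    "id_b": id_b,
    "lat_a": pts_a[:, 0],
    "lon_a": pts_a[:, 1],
    "lat_b": pts_b[:, 0],
    "lon_b": pts_b[:, 1],
    "proj_metres": dist,
})

out_path = os.path.join(tempfile.mkdtemp(), "pairs.csv")
df.to_csv(out_path, index=False)  # index=False leaves out the row-number column

with open(out_path) as f:
    lines = f.read().splitlines()
```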

acdr · Accepted Answer · 2016-03-01 13:19:11Z

0

Not necessarily fast, but:

import numpy as np

out_csv = "out.csv"  # placeholder output path
arr1 = np.array([1,2,3,4])
arr2 = np.array([11,12,13,14])
arr3 = np.array([21,22,23,24])
numpy_arrays = [arr1, arr2, arr3]

with open(out_csv, "w") as f:
    for values in zip(*numpy_arrays): # or just zip(arr1, arr2, arr3)
        # Join with commas so a row does not end in a trailing comma.
        f.write(",".join(str(value) for value in values) + "\n")

This won't use up much more memory than just the memory needed for your separate arrays.
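The same row-by-row idea can be combined with the csv module from the question, which also takes care of quoting. The sketch below assumes the question's column layout and uses small made-up arrays in place of the real indexed ones. zip yields one row at a time and the column slices are views, so no stacked copy of the full table is built.

```python
import csv
import os
import tempfile

import numpy as np

# Made-up stand-ins for id_labels[udist.row], points[udist.row], etc.
id_a = np.array(["p1", "p2", "p3"])
id_b = np.array(["p0", "p0", "p1"])
pts_a = np.array([[51.5, -0.03], [51.0, -4.04], [53.8, -1.77]])
pts_b = np.array([[50.0, 0.0], [50.0, 0.0], [51.5, -0.03]])
dist = np.array([10.0, 20.5, 30.25])

out_path = os.path.join(tempfile.mkdtemp(), "pairs.csv")
with open(out_path, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id_a", "id_b", "lat_a", "lon_a", "lat_b", "lon_b", "proj_metres"])
    # zip is lazy, so writerows consumes one row tuple at a time.
    w.writerows(zip(id_a, id_b,
                    pts_a[:, 0], pts_a[:, 1],
                    pts_b[:, 0], pts_b[:, 1],
                    dist))

with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
```

This trades memory for speed: iterating in Python is likely slower than a vectorised write, which matches the "not necessarily fast" caveat above.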

answered Mar 1, 2016 at 13:19

acdr
