I want to convert .csv file to a Numpy array

Question

I would like to convert a mydata.csv file to a Numpy array.

The file mydata.csv contains a matrix of signed values (14 rows by 79 columns) with no header row:

-0.094391   -0.086641   0.31659 0.66066 -0.33076    0.02751 …
-0.26169    -0.022418   0.47564 0.39925 -0.22232    0.16129 …
-0.33073    0.026102    0.62409 -0.098799   -0.086641   0.31832 …
-0.22134    0.15488 0.69289 -0.26515    -0.021011   0.47096 …

I thought this code would work for this case.

import numpy as np

data = np.genfromtxt('mydata.csv', dtype=float, delimiter=',', names=False)

but it did not work.

I would like the final Numpy array to have data.shape = (14, 79).

This is the error message I get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-060012d7c568> in <module>
      1 import numpy as np
      2 
----> 3 data = np.genfromtxt('output.csv', dtype=float, delimiter=',', names=False)

~\Anaconda3\envs\tensorflow\lib\site-packages\numpy\lib\npyio.py in genfromtxt(fname, dtype, comments, delimiter, skip_header, skip_footer, converters, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise, max_rows, encoding)
   1810                            deletechars=deletechars,
   1811                            case_sensitive=case_sensitive,
-> 1812                            replace_space=replace_space)
   1813     # Make sure the names is a list (for 2.5)
   1814     if names is not None:

~\Anaconda3\envs\tensorflow\lib\site-packages\numpy\lib\_iotools.py in easy_dtype(ndtype, names, defaultfmt, **validationargs)
    934             # Simple dtype: repeat to match the nb of names
    935             if nbtypes == 0:
--> 936                 formats = tuple([ndtype.type] * len(names))
    937                 names = validate(names, defaultfmt=defaultfmt)
    938                 ndtype = np.dtype(list(zip(names, formats)))

TypeError: object of type 'bool' has no len()

In the sample data the delimiter isn't a comma (probably a tab) and "names" should be "None" or some other things but not "False". — Michael Butscher
– Michael Butscher, Commented Oct 25, 2019 at 20:59
@MichaelButscher I tried data = np.genfromtxt('mydata.csv', dtype=float, delimiter='\t', names=None), but the data is now [nan nan nan nan nan nan nan nan nan nan nan nan nan nan] — mario119
– mario119, Commented Oct 25, 2019 at 21:09
Apparently you have tried delimiter=',' and delimiter='\t'. Can you find out exactly what the delimiter in the file actually is instead of guessing? How was the file created? Can you open the file in an editor and check the character(s) that separate the fields? — Warren Weckesser
– Warren Weckesser, Commented Oct 25, 2019 at 21:40
@WarrenWeckesser I will share mydata.csv here pastebin.com/eKf9Sqip — mario119
– mario119, Commented Oct 25, 2019 at 21:52
Both np.loadtxt('mydata.csv', delimiter='\t') and np.genfromtxt('mydata.csv', delimiter='\t') worked for me. — Warren Weckesser
– Warren Weckesser, Commented Oct 25, 2019 at 23:01
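
One way to follow the suggestion above and check the delimiter instead of guessing is to print the repr() of the first line, which makes tabs visible as \t. This is only a minimal sketch: the small file written here is a hypothetical stand-in for mydata.csv and is assumed to be tab-separated.

```python
import os
import tempfile

# Hypothetical stand-in for mydata.csv (assumed tab-separated for this sketch).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "mydata.csv")
with open(path, "w") as f:
    f.write("-0.094391\t-0.086641\t0.31659\n")
    f.write("-0.26169\t-0.022418\t0.47564\n")

# repr() shows invisible separators explicitly: a tab appears as \t.
with open(path) as f:
    first_line = f.readline()
print(repr(first_line))

# Count a few candidate delimiters in the first line.
candidates = [("comma", ","), ("tab", "\t"), ("space", " ")]
counts = {name: first_line.count(ch) for name, ch in candidates}
print(counts)
```

Whichever candidate has a non-zero count is the likely value to pass as delimiter.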

Muhammad Usman Bashir · Accepted Answer · 2020-04-21 16:45:46Z

3

First build a list of the CSV files (file_names) that you want to combine, read each one, and concatenate them into a single DataFrame. Pass header=None because the files have no header row; otherwise pandas uses the first data row of each file as column names:

import pandas as pd
import numpy as np

# sep='\t' assumes tab-delimited files, as reported in the comments on the question
combined_csv_files = pd.concat([pd.read_csv(f, sep='\t', header=None) for f in file_names])

If you want to export the combined data to a single .csv file, write it without the index and without a header row:

combined_csv_files.to_csv("combined_csv.csv", index=False, header=False)

To obtain a Numpy array, read the combined file back and take its values:

data_set = pd.read_csv('combined_csv.csv', header=None)

required_array = data_set.to_numpy()
print(required_array)

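You can then reshape the Numpy array in place by assigning to its shape. This requires the array to hold exactly 100 * 14 * 79 values (for example, 100 files of 14 rows by 79 columns each):

required_array.shape = (100, 14, 79)

I ran a simple test in the interpreter to confirm that assigning to shape works: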

>>> y = np.zeros((2, 3, 4))
>>> y.shape
(2, 3, 4)
>>> y.shape = (3, 8)
>>> y
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
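
As an alternative to concatenating and reshaping, the per-file matrices can be stacked directly along a new first axis. The sketch below is not part of the original answer; it assumes tab-delimited files named mydata_1.csv, mydata_2.csv, and so on (hypothetical names taken from the follow-up comment), and it writes three small files itself so that it runs on its own.

```python
import os
import tempfile

import numpy as np

tmp_dir = tempfile.mkdtemp()
n_files = 3  # the follow-up comment mentions 100; 3 keeps the sketch small

# Write hypothetical tab-delimited files mydata_1.csv ... mydata_3.csv.
originals = []
for i in range(1, n_files + 1):
    matrix = np.arange(14 * 79, dtype=float).reshape(14, 79) + i
    np.savetxt(os.path.join(tmp_dir, f"mydata_{i}.csv"), matrix, delimiter="\t")
    originals.append(matrix)

# Build the names from the index so the order is numeric (1, 2, 3, ...),
# not the lexicographic order a sorted glob would give (1, 10, 100, 2, ...).
file_names = [os.path.join(tmp_dir, f"mydata_{i}.csv") for i in range(1, n_files + 1)]

# np.stack adds a new leading axis: one (14, 79) matrix per file.
stacked = np.stack([np.loadtxt(name, delimiter="\t") for name in file_names])
print(stacked.shape)
```

np.stack raises an error if any file has a different shape, which can be a useful check before feeding the array to a network.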

edited Apr 21, 2020 at 16:45

answered Oct 25, 2019 at 21:35

Muhammad Usman Bashir


2 Comments

mario119 Over a year ago

It worked! But I have another question. If I want to combine a lot of datasets (say mydata_1.csv, mydata_2.csv, mydata_3.csv, ..., mydata_100.csv), how can I automatically combine them into a numpy array with shape = (100, 14, 79)? I need to use my own data with Convolutional Neural Network code that was written for the MNIST dataset.

mins Over a year ago

@mario119: Instead of adding your requirements in a comment, you should add them to the question. Now the selected answer covers topics not part of the questions, and the actual question has been moved to the background. This is not helpful for those who reach this page with a search engine.

Yepram Yeransian · Accepted Answer · 2019-10-25 21:47:49Z

2

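Try this:

import pandas as pd
import numpy as np
# header=None keeps the first row as data instead of using it as column names;
# add sep='\t' if the file is tab-delimited
mydata = pd.read_csv("mydata.csv", header=None)
mydata_array = np.array(mydata)

Out:
array([[-0.094391, -0.086641,  0.31659 ,  0.66066 , -0.33076 ,  0.02751 ],
       [-0.26169 , -0.022418,  0.47564 ,  0.39925 , -0.22232 ,  0.16129 ],
       [-0.33073 ,  0.026102,  0.62409 , -0.098799, -0.086641,  0.31832 ],
       [-0.22134 ,  0.15488 ,  0.69289 , -0.26515 , -0.021011,  0.47096 ]])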
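
If the separator turns out to be tabs or runs of spaces rather than commas, the pandas route can still work by passing a whitespace pattern as sep. The following is a sketch under that assumption, using the four sample rows from the question held in memory instead of the real file:

```python
import io

import numpy as np
import pandas as pd

# Sample rows from the question, assumed to be separated by tabs.
text = (
    "-0.094391\t-0.086641\t0.31659\t0.66066\t-0.33076\t0.02751\n"
    "-0.26169\t-0.022418\t0.47564\t0.39925\t-0.22232\t0.16129\n"
    "-0.33073\t0.026102\t0.62409\t-0.098799\t-0.086641\t0.31832\n"
    "-0.22134\t0.15488\t0.69289\t-0.26515\t-0.021011\t0.47096\n"
)

# sep=r"\s+" splits on any run of whitespace; header=None keeps row 0 as data.
frame = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)
array = frame.to_numpy()
print(array.shape)
```

With the real file, the io.StringIO(text) argument would be replaced by the file path.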

answered Oct 25, 2019 at 21:47

Yepram Yeransian


1 Comment

mario119 Over a year ago

It worked! I think the output is a numpy array. However, what is the difference between array([[.., .., ..], ...]) and [[... ... ...] [... ... ...]...]?
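
On the question in the comment above: the two forms are most likely the same array displayed in two ways. An interactive prompt shows repr(), which includes the array( prefix and commas, while print() shows str(), which omits both. A small sketch with a made-up array:

```python
import numpy as np

a = np.array([[1.5, 2.0], [3.0, 4.5]])  # made-up values for illustration

as_repr = repr(a)  # what an interactive prompt displays for `a`
as_str = str(a)    # what print(a) displays

print(as_repr)
print(as_str)
print(type(a))
```

In both cases the object itself is a numpy.ndarray; only the text representation differs.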

hpaulj · Accepted Answer · 2019-10-25 23:33:42Z

In [347]: txt = """-0.094391   -0.086641   0.31659 0.66066 -0.33076    0.02751
     ...: -0.26169    -0.022418   0.47564 0.39925 -0.22232    0.16129
     ...: -0.33073    0.026102    0.62409 -0.098799   -0.086641   0.31832
     ...: -0.22134    0.15488 0.69289 -0.26515    -0.021011   0.47096""".splitlines()
In [348]: txt                                                                   
Out[348]: 
['-0.094391   -0.086641   0.31659 0.66066 -0.33076    0.02751',
 '-0.26169    -0.022418   0.47564 0.39925 -0.22232    0.16129',
 '-0.33073    0.026102    0.62409 -0.098799   -0.086641   0.31832',
 '-0.22134    0.15488 0.69289 -0.26515    -0.021011   0.47096']

In [349]: np.genfromtxt(txt)                                                    
Out[349]: 
array([[-0.094391, -0.086641,  0.31659 ,  0.66066 , -0.33076 ,  0.02751 ],
       [-0.26169 , -0.022418,  0.47564 ,  0.39925 , -0.22232 ,  0.16129 ],
       [-0.33073 ,  0.026102,  0.62409 , -0.098799, -0.086641,  0.31832 ],
       [-0.22134 ,  0.15488 ,  0.69289 , -0.26515 , -0.021011,  0.47096 ]])

False is a bad value for names:

In [350]: np.genfromtxt(txt, names=False)                                       
---------------------------------------------------------------------------
...
TypeError: object of type 'bool' has no len()

names=None would be OK, but that's the default value, so it's not needed.

It looks like the delimiter is whitespace. I don't see any commas. The default dtype is float.
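
Putting these points together, a call with no delimiter, dtype, or names arguments should be enough for whitespace-separated data. The sketch below uses the four sample rows as a list of strings; with the real file, the file name would be passed instead and the shape should come out as (14, 79), assuming every row has 79 values.

```python
import numpy as np

# Sample rows from the question (tabs assumed between values).
lines = [
    "-0.094391\t-0.086641\t0.31659\t0.66066\t-0.33076\t0.02751",
    "-0.26169\t-0.022418\t0.47564\t0.39925\t-0.22232\t0.16129",
    "-0.33073\t0.026102\t0.62409\t-0.098799\t-0.086641\t0.31832",
    "-0.22134\t0.15488\t0.69289\t-0.26515\t-0.021011\t0.47096",
]

# Defaults: delimiter=None splits on any whitespace, dtype is float, names is None.
data = np.genfromtxt(lines)
print(data.shape, data.dtype)

# np.loadtxt uses the same whitespace default and accepts the same input.
data_loadtxt = np.loadtxt(lines)
print(np.array_equal(data, data_loadtxt))
```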
