I would like to load the file mydata.csv into a NumPy array.

The file holds a 14×79 matrix of signed values, with no header row:

-0.094391   -0.086641   0.31659 0.66066 -0.33076    0.02751 …
-0.26169    -0.022418   0.47564 0.39925 -0.22232    0.16129 …
-0.33073    0.026102    0.62409 -0.098799   -0.086641   0.31832 …
-0.22134    0.15488 0.69289 -0.26515    -0.021011   0.47096 …

I thought this code would work for this case.

import numpy as np

data = np.genfromtxt('mydata.csv', dtype=float, delimiter=',', names=False) 

But it did not work.

I would like the final NumPy array to have data.shape == (14, 79).

This is the error message I get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-060012d7c568> in <module>
      1 import numpy as np
      2 
----> 3 data = np.genfromtxt('output.csv', dtype=float, delimiter=',', names=False)

~\Anaconda3\envs\tensorflow\lib\site-packages\numpy\lib\npyio.py in genfromtxt(fname, dtype, comments, delimiter, skip_header, skip_footer, converters, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise, max_rows, encoding)
   1810                            deletechars=deletechars,
   1811                            case_sensitive=case_sensitive,
-> 1812                            replace_space=replace_space)
   1813     # Make sure the names is a list (for 2.5)
   1814     if names is not None:

~\Anaconda3\envs\tensorflow\lib\site-packages\numpy\lib\_iotools.py in easy_dtype(ndtype, names, defaultfmt, **validationargs)
    934             # Simple dtype: repeat to match the nb of names
    935             if nbtypes == 0:
--> 936                 formats = tuple([ndtype.type] * len(names))
    937                 names = validate(names, defaultfmt=defaultfmt)
    938                 ndtype = np.dtype(list(zip(names, formats)))

TypeError: object of type 'bool' has no len()
  • In the sample data the delimiter isn't a comma (probably a tab), and names should be None or another valid value, not False. Commented Oct 25, 2019 at 20:59
  • @MichaelButscher I tried data = np.genfromtxt('mydata.csv', dtype=float, delimiter='\t', names=None), but the data is now [nan nan nan nan nan nan nan nan nan nan nan nan nan nan]. Commented Oct 25, 2019 at 21:09
  • Apparently you have tried delimiter=',' and delimiter='\t'. Can you find out exactly what the delimiter in the file actually is instead of guessing? How was the file created? Can you open the file in an editor and check the character(s) that separate the fields? Commented Oct 25, 2019 at 21:40
  • @WarrenWeckesser I have shared mydata.csv here: pastebin.com/eKf9Sqip Commented Oct 25, 2019 at 21:52
  • Both np.loadtxt('mydata.csv', delimiter='\t') and np.genfromtxt('mydata.csv', delimiter='\t') worked for me. Commented Oct 25, 2019 at 23:01
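One way to follow the suggestion above and check the separator rather than guess is to print the first line with repr(), which shows a tab as \t. This is only a sketch: the two-row sample written here is a hypothetical stand-in for the real mydata.csv, and it assumes tab-separated fields, as the last comment reports.

```python
import os
import tempfile

# Hypothetical stand-in for mydata.csv: two tab-separated rows.
sample = "-0.094391\t-0.086641\t0.31659\n-0.26169\t-0.022418\t0.47564\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mydata.csv")
    with open(path, "w") as f:
        f.write(sample)

    # repr() makes invisible separators visible: a tab is shown as \t.
    with open(path) as f:
        first_line = f.readline()

print(repr(first_line))
```

If the printed line contains \t between the numbers, delimiter='\t' is the right choice; if it contains only spaces, leaving the delimiter at its default should work.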

3 Answers

To combine several CSV files, first build a list of their paths (file_names) and concatenate them with pandas. The result can be written to a single CSV file and then loaded as a NumPy array:

import pandas as pd
import numpy as np

# header=None: the files have no header row, so the first line is data.
# Pass sep='\t' as well if the files are tab-delimited.
combined_csv_files = pd.concat([pd.read_csv(f, header=None) for f in file_names])

To export the combined data to a single .csv file, use:

# header=False keeps pandas from writing its default column labels as a row.
combined_csv_files.to_csv("combined_csv.csv", index=False, header=False)

To obtain a NumPy array, read the combined file back:

data_frame = pd.read_csv('combined_csv.csv', header=None)

required_array = data_frame.to_numpy()
print(required_array)

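You can also reshape the NumPy array in place by assigning to its shape attribute, provided the total number of elements matches the new shape (here 100 * 14 * 79):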

required_array.shape = (100, 14, 79)

I ran a simple test in the Python interpreter to confirm this:

>>> y = np.zeros((2, 3, 4))
>>> y.shape
(2, 3, 4)
>>> y.shape = (3, 8)
>>> y
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

2 Comments

It worked! But I have another question. If I want to combine many datasets (say mydata_1.csv, mydata_2.csv, mydata_3.csv, ..., mydata_100.csv), how can I combine them automatically into a NumPy array with shape (100, 14, 79)? I need to use my own data with convolutional neural network code that was written for the MNIST dataset.
@mario119: Instead of adding your requirements in a comment, you should add them to the question. Now the selected answer covers topics not part of the questions, and the actual question has been moved to the background. This is not helpful for those who reach this page with a search engine.
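For the follow-up question in the comment above, one possible approach is to load each file as a (14, 79) array and stack the results along a new first axis with np.stack. This is a sketch under assumptions, not the answerer's method: the files are generated here with random placeholder values so the example runs on its own, the mydata_<i>.csv naming follows the comment, and the tab delimiter is assumed from the comments on the question.

```python
import os
import tempfile

import numpy as np

n_files, n_rows, n_cols = 100, 14, 79  # sizes taken from the comment

with tempfile.TemporaryDirectory() as tmp:
    # Write placeholder files (random values) so the sketch is self-contained.
    rng = np.random.default_rng(0)
    file_names = []
    for i in range(1, n_files + 1):
        path = os.path.join(tmp, f"mydata_{i}.csv")
        np.savetxt(path, rng.standard_normal((n_rows, n_cols)), delimiter="\t")
        file_names.append(path)

    # Each file loads as a (14, 79) array; np.stack adds a new leading axis.
    stacked = np.stack([np.loadtxt(f, delimiter="\t") for f in file_names])

print(stacked.shape)
```

With real data, file_names would instead be built from the actual paths, and every file must have the same shape for np.stack to succeed.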

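Try this:

import pandas as pd
import numpy as np
# header=None: the file has no header row; without it the first row is consumed as column names.
mydata = pd.read_csv("mydata.csv", header=None)
mydata_array = np.array(mydata)

Out:
array([[-0.094391, -0.086641,  0.31659 ,  0.66066 , -0.33076 ,  0.02751 ],
       [-0.26169 , -0.022418,  0.47564 ,  0.39925 , -0.22232 ,  0.16129 ],
       [-0.33073 ,  0.026102,  0.62409 , -0.098799, -0.086641,  0.31832 ],
       [-0.22134 ,  0.15488 ,  0.69289 , -0.26515 , -0.021011,  0.47096 ]])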

1 Comment

It worked, and I think the output is a NumPy array. However, what is the difference between array([[.., .., ..], ...]) and [[... ... ...] [... ... ...] ...]?
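On the question in the comment above: both forms are the same NumPy array, shown through two different string conversions. A short sketch, using two of the sample rows truncated to two columns for brevity:

```python
import numpy as np

a = np.array([[-0.26169, -0.022418], [-0.33073, 0.026102]])

# repr(a) is what an interactive prompt shows: array([[...], [...]])
print(repr(a))

# str(a) is what print(a) shows: [[... ...] [... ...]], with no commas
print(str(a))

# Either way, the object is a numpy.ndarray.
print(type(a))
```

The array(...) form appears when the value is echoed at a prompt or in a notebook cell; the bracket-only form appears when the array is passed to print().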
In [347]: txt = """-0.094391   -0.086641   0.31659 0.66066 -0.33076    0.02751
     ...: -0.26169    -0.022418   0.47564 0.39925 -0.22232    0.16129
     ...: -0.33073    0.026102    0.62409 -0.098799   -0.086641   0.31832
     ...: -0.22134    0.15488 0.69289 -0.26515    -0.021011   0.47096""".splitlines()
In [348]: txt                                                                   
Out[348]: 
['-0.094391   -0.086641   0.31659 0.66066 -0.33076    0.02751',
 '-0.26169    -0.022418   0.47564 0.39925 -0.22232    0.16129',
 '-0.33073    0.026102    0.62409 -0.098799   -0.086641   0.31832',
 '-0.22134    0.15488 0.69289 -0.26515    -0.021011   0.47096']

In [349]: np.genfromtxt(txt)                                                    
Out[349]: 
array([[-0.094391, -0.086641,  0.31659 ,  0.66066 , -0.33076 ,  0.02751 ],
       [-0.26169 , -0.022418,  0.47564 ,  0.39925 , -0.22232 ,  0.16129 ],
       [-0.33073 ,  0.026102,  0.62409 , -0.098799, -0.086641,  0.31832 ],
       [-0.22134 ,  0.15488 ,  0.69289 , -0.26515 , -0.021011,  0.47096 ]])

False is a bad value for names:

In [350]: np.genfromtxt(txt, names=False)                                       
---------------------------------------------------------------------------
...
TypeError: object of type 'bool' has no len()

names=None would be OK, but that's the default value, so it's not needed.

It looks like the delimiter is whitespace. I don't see any commas. The default dtype is float.
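As a small check of the whitespace point, the sketch below feeds genfromtxt a string that mixes tabs and spaces between fields, read through io.StringIO. The two shortened rows are adapted from the sample data; the mix of tabs and spaces is an assumption for illustration, not the actual file contents.

```python
import io

import numpy as np

# Fields separated by a mix of tabs and runs of spaces.
text = "-0.094391\t-0.086641   0.31659\n-0.26169 \t-0.022418\t0.47564\n"

# With delimiter left at its default (None), any run of whitespace
# separates fields, so tabs and spaces are handled alike.
data = np.genfromtxt(io.StringIO(text))

print(data.shape)
print(data.dtype)
```

If the shape printed for the real file is not (14, 79), or the array contains nan values, the delimiter passed to genfromtxt probably does not match the file.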
