Read data into a NumPy array

Question

I have a file like this:

label,feature
0,70 80 90 50 33 58 ...
2,53 56 84 56 25 12 ...
1,32 56 84 89 65 87 ...
...
2,56 48 57 56 99 22 ...
4,25 65 84 54 54 15 ...

I want the data to look like this:

Ytrain = [0,2,1,...2,4]  (int, ndarray)
Xtrain = [[70 80 90 50 33 58...],
          [53 56 84 56 25 12...],
          ...
          [25 65 84 54 54 15...]] (int, ndarray)

Here is my code:

import numpy as np
import pandas as pd

data = pd.read_csv('train.csv')
Ytrain = np.array(data.iloc[:, 0]).astype(int)
train = np.array(data.iloc[:, 1:]).astype(str)

Xtrain = []
for i in range(len(train)):
    tmp = [int(x) for x in train[i][0].split()]
    Xtrain.append(tmp)
Xtrain = np.array(Xtrain)

Is there a better way to do this?

jezrael · Accepted Answer · 2018-02-11 12:10:35Z

2

Pass read_csv a regex separator that matches both commas and whitespace, together with header=None and skiprows=1 so that the CSV header row is not read as data:

data = pd.read_csv('train.csv', sep=r"[,\s]+", header=None, skiprows=1, engine='python')
print (data)
   0   1   2   3   4   5   6
0  0  70  80  90  50  33  58
1  2  53  56  84  56  25  12
2  1  32  56  84  89  65  87
3  2  56  48  57  56  99  22
4  4  25  65  84  54  54  15

Then select the label column and the feature columns with iloc:

Ytrain = data.iloc[:,0].values
Xtrain = data.iloc[:,1:].values
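
As a self-contained sketch of the regex-separator approach, the snippet below parses a small in-memory sample instead of train.csv. The SAMPLE string is a hypothetical stand-in for the file, and the pattern r"[,\s]+" (one or more commas or whitespace characters treated as a single delimiter) is an assumption about how the real file is laid out.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv: a label, then space-separated features
SAMPLE = """label,feature
0,70 80 90 50 33 58
2,53 56 84 56 25 12
1,32 56 84 89 65 87
"""

# A run of commas and/or whitespace counts as one delimiter.
# Regex separators require the slower Python parsing engine.
data = pd.read_csv(io.StringIO(SAMPLE), sep=r"[,\s]+", header=None,
                   skiprows=1, engine="python")

Ytrain = data.iloc[:, 0].to_numpy()   # first column: labels
Xtrain = data.iloc[:, 1:].to_numpy()  # remaining columns: features

print(Ytrain.shape, Xtrain.shape)
```

Because every field is parsed as a number directly, no separate string-to-int conversion step should be needed with this variant.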

Alternatively, read the file normally and use str.split with expand=True to expand the feature column into a DataFrame:

data = pd.read_csv('train.csv')
Ytrain = data.iloc[:,0].values.astype(int)
Xtrain = data.iloc[:,1].str.split(expand=True).values.astype(int)

print (Ytrain)
[0 2 1 2 4]

print (Xtrain)
[[70 80 90 50 33 58]
 [53 56 84 56 25 12]
 [32 56 84 89 65 87]
 [56 48 57 56 99 22]
 [25 65 84 54 54 15]]
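
The split-based variant can also be wrapped in a small function, sketched below. The name load_labels_and_features and the SAMPLE string are hypothetical, and the sketch assumes every row has the same number of features; rows of unequal length would leave missing values after the split and make the integer conversion fail.

```python
import io

import pandas as pd


def load_labels_and_features(source):
    """Hypothetical helper: return (labels, features) as integer ndarrays."""
    data = pd.read_csv(source)
    labels = data.iloc[:, 0].astype(int).to_numpy()
    # Split the space-separated feature string into one column per value
    features = data.iloc[:, 1].str.split(expand=True).astype(int).to_numpy()
    return labels, features


# Hypothetical stand-in for train.csv
SAMPLE = """label,feature
0,70 80 90 50 33 58
2,53 56 84 56 25 12
"""

Ytrain, Xtrain = load_labels_and_features(io.StringIO(SAMPLE))
print(Ytrain.shape, Xtrain.shape)
```

Passing a file path such as 'train.csv' instead of the StringIO object should work the same way, since read_csv accepts both.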

edited Feb 11, 2018 at 12:10

answered Feb 11, 2018 at 11:58

jezrael


jpp · Accepted Answer · 2018-02-11 12:14:11Z

0

You can use numpy for this. Since you have multiple delimiters, a little more work is required.

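import io
import numpy as np

# Replace commas with spaces so every field is whitespace-delimited
with open('train.csv', 'r') as f:
    s = f.read().replace(',', ' ')

# genfromtxt treats a plain string as a filename, so wrap the text in StringIO;
# skip_header=1 drops the "label feature" row
arr = np.genfromtxt(io.StringIO(s), skip_header=1, dtype=int)

Ytrain = arr[:, 0]   # first column holds the labels
Xtrain = arr[:, 1:]  # remaining columns hold the features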

edited Feb 11, 2018 at 12:14

answered Feb 11, 2018 at 12:00

jpp


1 Comment

jpp Over a year ago

@Sam, updated - play around / google with open, it should be possible to feed this into genfromtxt.
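
One way to feed the cleaned-up text into genfromtxt, sketched here under assumptions: genfromtxt accepts a list of lines as well as a file name, so the lines can be rewritten and passed in directly. The SAMPLE string is a hypothetical stand-in for train.csv, and io.StringIO stands in for open('train.csv').

```python
import io

import numpy as np

# Hypothetical stand-in for the contents of train.csv
SAMPLE = """label,feature
0,70 80 90 50 33 58
2,53 56 84 56 25 12
1,32 56 84 89 65 87
"""

with io.StringIO(SAMPLE) as f:  # replace with open('train.csv') for a real file
    # Turn the comma into a space so each line is purely whitespace-delimited
    lines = [line.replace(',', ' ') for line in f]

# skip_header=1 drops the header row; dtype=int yields integer arrays
arr = np.genfromtxt(lines, skip_header=1, dtype=int)

Ytrain = arr[:, 0]   # labels
Xtrain = arr[:, 1:]  # features

print(Ytrain.shape, Xtrain.shape)
```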
