4

I have a .csv file that looks like this:

 1, 1 2 3 4 5
 3, 2 3 4 5 6
 2, 5 6 5 4 8
 5, 5 4 8 6 2
 ... 

How can I get the first column

a = [1 3 2 5 ...] 

and the matrix

b = [ 1 2 3 4 5
      2 3 4 5 6
      5 6 5 4 8
      5 4 8 6 2 ]

as integer NumPy arrays? I have tried

data = np.asarray(pd.read_csv('Data.csv'))

but it only makes things worse...

3 Answers

2

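I think you need:

import numpy as np
import pandas as pd

df = pd.read_csv('Data.csv', header=None)  # the file has no header row
first_col = np.array(df.iloc[:, 0])        # all rows, first column
df_array = np.array(df.iloc[:, 1:])        # all rows, remaining columns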
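
Because the file is only comma separated between the first and second field, the second column comes back as one string per row, so it still has to be split. A minimal sketch of one way to do that, assuming the file has exactly the layout shown in the question (the `csv_text` string and `io.StringIO` are stand-ins for `Data.csv`, not part of the original answer):

```python
import io

import numpy as np
import pandas as pd

# Stand-in for Data.csv, using the rows from the question (assumption).
csv_text = "1, 1 2 3 4 5\n3, 2 3 4 5 6\n2, 5 6 5 4 8\n5, 5 4 8 6 2\n"

df = pd.read_csv(io.StringIO(csv_text), header=None)

# The first column parses as integers directly.
a = df.iloc[:, 0].to_numpy()

# The second column is one string per row, e.g. " 1 2 3 4 5".
# Split it on whitespace into separate columns, then convert to int.
b = df.iloc[:, 1].str.split(expand=True).astype(int).to_numpy()
```

With a real file, replace `io.StringIO(csv_text)` with the path to the file.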

1

A pure NumPy approach would be to use np.loadtxt() and convert the strings to the proper type by passing in converter functions:

In [70]: col1, col2 = np.loadtxt('test.csv', converters={0:int, 1:bytes.decode}, dtype=str, delimiter=',', unpack=True)

In [71]: col1 = col1.astype(int)

In [72]: col2 = np.vstack(np.char.split(col2)).astype(int)

Result:

In [73]: col1
Out[73]: array([1, 3, 2, 5])

In [74]: col2
Out[74]: 
array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [5, 6, 5, 4, 8],
       [5, 4, 8, 6, 2]])

Note that before converting col2 to an array of integers, it is an array of strings like the following:

In [76]: col2
Out[76]: 
array([' 1 2 3 4 5', ' 2 3 4 5 6', ' 5 6 5 4 8', ' 5 4 8 6 2'], 
      dtype='<U10')

If you want the values split but kept as strings, skip vstack() and astype() in the next step. In that case you'll get:

In [77]: np.char.split(col2)
Out[77]: 
array([['1', '2', '3', '4', '5'], ['2', '3', '4', '5', '6'],
       ['5', '6', '5', '4', '8'], ['5', '4', '8', '6', '2']], dtype=object)
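
If you only need the integer arrays, a simpler variant (a different technique from the converter approach above, not what this answer originally used) is to turn each comma into a space before parsing, so that np.loadtxt() sees plain whitespace-separated integers. A rough sketch, assuming every row has the same number of values; `csv_text` is a stand-in for the file contents:

```python
import io

import numpy as np

# Stand-in for the file contents, using the rows from the question (assumption).
csv_text = "1, 1 2 3 4 5\n3, 2 3 4 5 6\n2, 5 6 5 4 8\n5, 5 4 8 6 2\n"

# Replace the comma on each line with a space; loadtxt splits on
# whitespace by default and treats repeated spaces as one delimiter.
lines = (line.replace(",", " ") for line in io.StringIO(csv_text))
data = np.loadtxt(lines, dtype=int)

col1 = data[:, 0]   # first column
col2 = data[:, 1:]  # remaining columns as a 2-D array
```

For a file on disk, the same generator can be built from an open file handle instead of `io.StringIO(csv_text)`.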


1

pandas supports multiple delimiters via a regex separator when you pass engine='python' to pd.read_csv. You can try something like this:

df = pd.read_csv('Data.csv', header=None, sep=' |, ',
                 engine='python', dtype=int)

Then retrieve your data as follows:

a = df.iloc[:, 0].values
b = df.iloc[:, 1:].values
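
To check the result without touching a file, here is a small self-contained sketch of the same idea. It assumes the rows look exactly like the ones in the question; `csv_text` and `io.StringIO` are stand-ins for `Data.csv`:

```python
import io

import pandas as pd

# Stand-in for Data.csv, using the rows from the question (assumption).
csv_text = "1, 1 2 3 4 5\n3, 2 3 4 5 6\n2, 5 6 5 4 8\n5, 5 4 8 6 2\n"

# sep is a regex: split on a single space, or on a comma followed by a space.
df = pd.read_csv(io.StringIO(csv_text), header=None, sep=" |, ",
                 engine="python")

a = df.iloc[:, 0].values   # first column
b = df.iloc[:, 1:].values  # remaining columns

# Confirm both arrays hold integers rather than strings.
print(a.dtype, b.dtype)
```

If the dtypes print as object instead of an integer type, the separator did not match the file's actual layout and some fields were left as strings.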

3 Comments

The actual separator would be ", ", not just "," but you could remove the resulting NaN column with .dropna(axis=1)
@MaximilianPeters Thanks, I haven't tested this code; it's just a proof of concept. But I have updated it nonetheless.
Sorry, the problem is that the b matrix is str, not int.
