I need to use NumPy (and only NumPy, not Pandas, scikit-learn, etc.) to read in a CSV file. The CSV file contains rows that look as follows:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S

I am reading and printing the data as follows:

dataset = np.genfromtxt(dataset_path, delimiter=',', names=True, skip_header=1)
print(dataset)

The file is being read in, but when I look at the output, the string information is missing (it appears as nan):

[(  2., 1., 1., nan, nan, nan, 38.  , 1., 0.,          nan,  71.2833, nan, nan)
 (  3., 1., 3., nan, nan, nan, 26.  , 0., 0.,          nan,   7.925 , nan, nan)
 (  4., 1., 1., nan, nan, nan, 35.  , 1., 0., 1.138030e+05,  53.1   , nan, nan)
 (  5., 0., 3., nan, nan, nan, 35.  , 0., 0., 3.734500e+05,   8.05  , nan, nan)
 (  6., 0., 3., nan, nan, nan,   nan, 0., 0., 3.308770e+05,   8.4583, nan, nan)]

How can I read this CSV file, keeping the comma as the delimiter, and also read in the string values?

1 Answer

For a file with a consistent number of columns and mixed data types, use:

import numpy as np
np.genfromtxt('filename', dtype=None, delimiter=",")

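dtype=None tells genfromtxt to infer the type of each column separately, so the result is a structured array rather than a plain 2-D float array. To access a column, index the array by its field name, for example data['f0'], or by the header name if you also pass names=True.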
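As a rough illustration of the field access described above, here is a minimal sketch. The in-memory sample, the variable names, and the encoding="utf-8" argument are my own assumptions, not part of the original question. The sample deliberately leaves out the quoted Name column: genfromtxt splits on every comma and does not honor quotes, so a field like "Braund, Mr. Owen Harris" would be read as two columns.

```python
import io
import numpy as np

# Hypothetical, simplified sample: same kind of data as the question,
# but without the quoted Name column (which contains embedded commas).
sample = io.StringIO(
    "PassengerId,Survived,Sex,Age,Fare\n"
    "1,0,male,22,7.25\n"
    "2,1,female,38,71.2833\n"
    "3,1,female,26,7.925\n"
)

# dtype=None: infer each column's type individually.
# names=True: take the field names from the first line.
# encoding="utf-8": read text columns as str rather than bytes.
data = np.genfromtxt(sample, delimiter=",", dtype=None,
                     names=True, encoding="utf-8")

print(data.dtype.names)     # field names taken from the header line
print(data["Sex"])          # string column, kept as text instead of nan
print(data["Fare"].mean())  # numeric columns still behave like numbers
```

Note that names=True already consumes the header line, so adding skip_header=1 on top of it (as in the question) would drop the real header and use the first data row as field names.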

1 Comment

I never knew that was a thing. Was on my way to suggesting a custom dtype. TIL
