numpy different data type of multidimensional array

Question

I want to insert 2D data into a NumPy array and then display it as a pandas DataFrame. The data has 10 rows and 5 columns. The data types of the 5 columns are, in order: int, string, int, string, string.

import random
import numpy as np

_MaxRows = 10
_MaxColumns = 5

person = np.zeros([_MaxRows, _MaxColumns])  # np.zeros creates a float64 array by default

person

def personGen():
    for i in range(_MaxRows):
        # add person dynamically
        # person[i][0] = i
        # person[i][1] = names[i]
        # person[i][2] = random.randint(10,50)
        # person[i][3] = random.choice(['M','F'])
        # person[i][4] = 'Desc'
        pass  # placeholder so the loop body is valid

personGen()

Required output as a DataFrame:

Id  Name     Age  Gender  Desc
1   Sumeet   12   'M'     'HELLO'
2   Sumeet2  13   'M'     'HELLO2'

yar · Accepted Answer · 2017-11-23 12:45:05Z

A regular NumPy array has a single data type, so you cannot mix ints and strings in the same array. Instead, you could keep one 1-D array (or list) per column, each with its own data type.
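To see why the array in the question cannot hold the names, here is a small sketch of what NumPy does with mixed values. The sample rows are illustrative and not taken from the question's real data.

```python
import numpy as np

# Mixing ints and strings in one regular array: NumPy picks a single
# common dtype, here a Unicode string type, so the ints become strings.
mixed = np.array([[1, "Sumeet", 12], [2, "Sumeet2", 13]])
print(mixed.dtype.kind)  # 'U' means Unicode string
print(repr(mixed[0, 0]))  # a string, not an integer

# np.zeros returns a float array by default, which rejects text.
person = np.zeros([2, 3])
try:
    person[0][1] = "Sumeet"
    assignment_failed = False
except ValueError:
    assignment_failed = True
print("string assignment failed:", assignment_failed)
```

Either way the integer columns are lost or the string columns are refused, which is why a per-column layout is the more practical route.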

For example, with one list or array per column:

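import numpy as np
import pandas as pd

names = ["asd", "shd", "wdf"]
ages = np.array([12, 35, 23])

# Each dict key becomes a column; each column keeps its own data type
d = {'name': names, 'age': ages}
df = pd.DataFrame(data=d)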
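Applied to the five columns from the question, the same idea might look like the sketch below. The `names` list is hypothetical (the question does not show it), and the constant `MAX_ROWS` is just a renamed stand-in for `_MaxRows`.

```python
import random

import numpy as np
import pandas as pd

MAX_ROWS = 10  # same row count as in the question

# Hypothetical names; the question does not show its `names` list.
names = [f"Person{i}" for i in range(1, MAX_ROWS + 1)]

# One entry per column, each with its own data type.
columns = {
    "Id": np.arange(1, MAX_ROWS + 1),
    "Name": names,
    "Age": np.array([random.randint(10, 50) for _ in range(MAX_ROWS)]),
    "Gender": [random.choice(["M", "F"]) for _ in range(MAX_ROWS)],
    "Desc": ["Desc"] * MAX_ROWS,
}

df = pd.DataFrame(columns)
print(df.dtypes)  # integer columns stay integer; text columns hold strings
print(df.head())
```

Building the columns first and creating the DataFrame once tends to be simpler than filling a pre-allocated array row by row.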

edited Nov 23, 2017 at 12:45

answered Nov 23, 2017 at 10:41


1 Comment

Sumeet Kumar Yadav Over a year ago

I am new to this; an example would help.

Tonsic · Accepted Answer · 2020-12-03 18:25:30Z

Building on yar's answer:

A regular NumPy array, whether one-dimensional or multidimensional, holds a single data type. A NumPy structured array, however, can give each field its own data type.

import random

import numpy
import pandas as pd

_MaxRows = 2
_MaxNameSize = 7
_MaxDescSize = 4

names = ['Sumeet', 'Sumeet2']

# One tuple per row: (id, name, age, gender, description)
data = [(i, names[i], random.randint(10, 50), random.choice(['M', 'F']), 'Desc') for i in range(_MaxRows)]

# 'U<n>' is a fixed-width Unicode string of at most n characters.
# ('S<n>' also works, but in Python 3 it stores bytes, shown as b'Sumeet'.)
nparray = numpy.array(data, dtype=[('id', int), ('name', f'U{_MaxNameSize}'), ('age', int), ('Gender', 'U1'), ('Desc', f'U{_MaxDescSize}')])

# If you cannot specify the string sizes, you can use the object dtype as below. However, I have read that this slows down NumPy operations, because the array then stores pointers to Python objects rather than the string data in contiguous memory.
# nparray = numpy.array(data, dtype=[('id', int), ('name', object), ('age', int), ('Gender', 'U1'), ('Desc', object)])

print('Numpy structured array:\n', nparray)

pddataframe = pd.DataFrame(nparray)

print('\nPandas dataframe:\n', pddataframe)

A pandas DataFrame already creates a default index (0, 1, ...) for incoming data, so you may choose to omit the 'id' column.
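As a rough sketch of that last point, the structured array can be built without an 'id' field and the DataFrame index relabelled to start at 1, matching the output requested in the question. The field names and sizes here are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Structured array without an 'id' field (field names are illustrative).
people = np.array(
    [("Sumeet", 12, "M", "HELLO"), ("Sumeet2", 13, "M", "HELLO2")],
    dtype=[("Name", "U7"), ("Age", int), ("Gender", "U1"), ("Desc", "U6")],
)

df = pd.DataFrame(people)  # default index is 0, 1, ...

# Optional: relabel the index so it starts at 1 and is named 'Id'.
df.index = pd.RangeIndex(start=1, stop=len(df) + 1, name="Id")
print(df)
```

If the identifier needs to be an ordinary column instead, `df.reset_index()` would turn the named index back into one.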
