I want to put 2D data into a NumPy array and display it as a pandas DataFrame. The data has 10 rows and 5 columns, and the column data types are, in order: int, string, int, string, string.

import random
import numpy as np

_MaxRows = 10
_MaxColumns = 5

person = np.zeros([_MaxRows, _MaxColumns])

person

def personGen():
    for i in range(0, _MaxRows):
        # add person dynamically
        # person[i][0] = i
        # person[i][1] = names[i]
        # person[i][2] = random.randint(10,50)
        # person[i][3] = random.choice(['M','F'])
        # person[i][4] = 'Desc'
        pass

personGen()

Required output as a DataFrame:

Id  Name     Age  Gender  Desc
1   Sumeet   12   'M'     'HELLO'
2   Sumeet2  13   'M'     'HELLO2'

2 Answers

A regular NumPy array has a single data type, so you cannot mix int and string columns in it. Instead, you could keep a separate one-dimensional array (or list) per column, each with its own data type, and build the DataFrame from those.

For example:

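import numpy as np
import pandas as pd

names = ["asd", "shd", "wdf"]
ages = np.array([12, 35, 23])

# each dict key becomes a column, and each column keeps its own dtype
d = {'name': names, 'age': ages}
df = pd.DataFrame(data=d)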
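
Applied to the five columns from the question, the same per-column approach might look like the sketch below. The `names` list is a hypothetical placeholder (the question does not show it), and the seeded `random.Random(0)` is only there to make the sketch repeatable.

```python
import random

import numpy as np
import pandas as pd

MAX_ROWS = 10
rng = random.Random(0)  # seeded only so the sketch is repeatable

# Hypothetical placeholder names; the question does not show its `names` list.
names = [f"Person{i}" for i in range(1, MAX_ROWS + 1)]

# One entry per column; each column keeps its own data type.
columns = {
    "Id": np.arange(1, MAX_ROWS + 1),                                 # int
    "Name": names,                                                    # string
    "Age": np.array([rng.randint(10, 50) for _ in range(MAX_ROWS)]),  # int
    "Gender": [rng.choice(["M", "F"]) for _ in range(MAX_ROWS)],      # string
    "Desc": ["Desc"] * MAX_ROWS,                                      # string
}

df = pd.DataFrame(columns)
print(df.dtypes)
print(df.head())
```

Printing `df.dtypes` is a quick way to confirm that the integer columns stayed numeric while the text columns are stored as strings.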

1 Comment

I am new to this; an example would help.

Building from yar's answer:

You cannot have different data types in a regular NumPy array, whether one-dimensional or multi-dimensional. You can, however, have a different data type per field in a NumPy structured array.
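
To see the limitation concretely, here is a small sketch of what a regular array does with mixed values. The variable names are illustrative only.

```python
import numpy as np

# Mixed ints and strings are coerced to one common dtype (a unicode string type).
mixed = np.array([1, "Sumeet", 12])
print(mixed.dtype.kind)  # 'U' means unicode string
print(repr(mixed[0]))    # the integer 1 has become the string '1'

# np.zeros creates a float array, so assigning a name into it fails.
person = np.zeros((2, 5))
try:
    person[0, 1] = "Sumeet"
    assigned = True
except ValueError as exc:
    assigned = False
    print("Cannot store a string in a float array:", exc)
```

This is why the `np.zeros` array in the question cannot hold the name, gender, and description columns.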

import random

import numpy
import pandas as pd

_MaxRows = 2
_MaxNameSize = 7
_MaxDescSize = 4

names = ['Sumeet', 'Sumeet2']

# one tuple per row: (id, name, age, gender, description)
data = [(i, names[i], random.randint(10, 50), random.choice(['M', 'F']), 'Desc') for i in range(_MaxRows)]

# 'U<n>' is a fixed-width unicode string of n characters ('S<n>' would store bytes, shown as b'...' in Python 3)
nparray = numpy.array(data, dtype=[('id', int), ('name', f'U{_MaxNameSize}'), ('age', int), ('Gender', 'U1'), ('Desc', f'U{_MaxDescSize}')])

# If you cannot specify the string sizes in advance, you can use object fields as below. However, I have read that
# this reduces the performance of NumPy operations, because the strings are no longer stored in the array's contiguous memory.
# nparray = numpy.array(data, dtype=[('id', int), ('name', object), ('age', int), ('Gender', 'U1'), ('Desc', object)])

print('Numpy structured array:\n', nparray)

pddataframe = pd.DataFrame(nparray)

print('\nPandas dataframe:\n', pddataframe)

A pandas DataFrame already creates a default index (0, 1, ...) for your incoming data, so you may choose to omit the 'id' column.
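
As a rough sketch of the two options for the id, using the two sample rows from the question: the field widths (`U7`, `U1`, `U6`) are assumptions sized to fit that sample data.

```python
import numpy as np
import pandas as pd

# Two sample rows from the question, stored in a structured array.
records = np.array(
    [(1, "Sumeet", 12, "M", "HELLO"), (2, "Sumeet2", 13, "M", "HELLO2")],
    dtype=[("Id", int), ("Name", "U7"), ("Age", int), ("Gender", "U1"), ("Desc", "U6")],
)

# Option 1: keep the ids, but use them as the DataFrame index.
by_id = pd.DataFrame(records).set_index("Id")

# Option 2: drop the id column and rely on the default 0-based index.
no_id = pd.DataFrame(records).drop(columns="Id")

print(by_id)
print(no_id)
```

Option 1 is convenient when the ids are meaningful lookup keys (for example `by_id.loc[2]`); option 2 is enough when the row position is all you need.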
