I'm currently having an issue with the way that Pandas casts a NumPy array into a DataFrame.
Sample code:
import numpy as np
import pandas as pd

# Rows of different types: ints, strings, floats, NaNs
example_array = np.array([
    [1, 2, 3],
    ['one', 'two', 'three'],
    [4.01, 5.01, 6.01],
    [np.nan, np.nan, np.nan]])
df = pd.DataFrame(example_array, index=['int', 'string', 'float', 'nan'])
df = df.T
df.dtypes
output:
int object
string object
float object
nan object
dtype: object
It seems that neither NumPy nor pandas recognises the per-row types or converts them properly. While looking around, one suggestion was to specify the dtype when creating each Series, but this does not help me, as I'm working with a large NumPy array.
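As far as I can tell, the root cause is that a NumPy array has a single dtype for all of its elements: mixing ints, strings, and floats makes NumPy promote everything to one common type (a fixed-width string here), so pandas only ever sees strings and can only store them in object columns. A minimal sketch of a post-hoc fix, assuming the numeric columns parse cleanly with pd.to_numeric:

# The array itself has already lost the per-row types:
# every element was promoted to one string dtype.
print(example_array.dtype)  # a string dtype such as <U32

# Parse the numeric columns back after building the DataFrame;
# the 'string' column stays object.
for col in ['int', 'float', 'nan']:
    df[col] = pd.to_numeric(df[col])
print(df.dtypes)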
Example by @r-max:
In [2]: df = pd.DataFrame({'x': pd.Series(['1.0', '2.0', '3.0'], dtype=float), 'y': pd.Series(['1', '2', '3'], dtype=int)})
In [3]: df
Out[3]:
   x  y
0  1  1
1  2  2
2  3  3

[3 rows x 2 columns]
In [4]: df.dtypes
Out[4]:
x float64
y int64
dtype: object
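For what it's worth, a possible way to get the same dtypes without constructing each Series by hand is astype with a column-to-dtype mapping (a sketch, assuming a pandas version that accepts a dict argument):

df = pd.DataFrame({'x': ['1.0', '2.0', '3.0'], 'y': ['1', '2', '3']})
df = df.astype({'x': float, 'y': int})
df.dtypes  # x -> float64, y -> int64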
Is there a better solution to this problem? Is this a bug?
Thanks!