19

I have a numpy array of type object. I want to find the columns with numerical values and cast them to float. I also want to find the indices of the columns with object values. This is my attempt:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : [1,2,3,4,5],'B' : ['A', 'A', 'C', 'D','B']})
X = df.values.copy()
obj_ind = []
for ind in range(X.shape[1]):
    try:
        X[:,ind] = X[:,ind].astype(np.float32)
    except:
        obj_ind = np.append(obj_ind,ind)

print obj_ind

print X.dtype

and this is the output I get:

[ 1.]
object
4
  • It's unclear what you're expecting here. Your output shows that the second column could not be cast to float and that the dtype is object, which is correct because that column holds str values. If you wanted the column name, you could use obj_ind = np.append(obj_ind, df.columns[ind]) Commented Aug 25, 2015 at 15:10
  • I want to convert my first columns to type float @EdChum Commented Aug 25, 2015 at 15:53
  • 1
    The elements of a numpy array can't have different dtypes. You might need a structured array instead. Commented Aug 25, 2015 at 16:22
  • Does this answer your question? Converting numpy dtypes to native python types Commented May 15, 2020 at 15:27

3 Answers

23

Generally your idea of trying to apply astype to each column is fine.

In [590]: X[:,0].astype(int)
Out[590]: array([1, 2, 3, 4, 5])

But you have to collect the results in a separate list. You can't just put them back in X, because X has dtype object and anything assigned into it is stored as an object again. The collected columns can then be concatenated.

In [601]: numlist=[]; obj_ind=[]

In [602]: for ind in range(X.shape[1]):
   .....:     try:
   .....:         x = X[:,ind].astype(np.float32)
   .....:         numlist.append(x)
   .....:     except:
   .....:         obj_ind.append(ind)

In [603]: numlist
Out[603]: [array([ 1.,  2.,  3.,  4.,  5.], dtype=float32)]

In [604]: np.column_stack(numlist)
Out[604]: 
array([[ 1.],
       [ 2.],
       [ 3.],
       [ 4.],
       [ 5.]], dtype=float32)

In [606]: obj_ind
Out[606]: [1]

X is a numpy array with dtype object:

In [582]: X
Out[582]: 
array([[1, 'A'],
       [2, 'A'],
       [3, 'C'],
       [4, 'D'],
       [5, 'B']], dtype=object)
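
A minimal sketch of why writing the converted column back into X has no visible effect. The three-row X here is a shortened stand-in for the array above, not the original data. The converted column is float32 on its own, but the container's dtype is fixed, so the assignment leaves X as an object array:

```python
import numpy as np

# Shortened stand-in for the object array shown above
X = np.array([[1, 'A'], [2, 'A'], [3, 'C']], dtype=object)

col = X[:, 0].astype(np.float32)  # a new, separate float32 array
X[:, 0] = col                     # values are stored back as objects

print(col.dtype)  # dtype of the standalone converted column
print(X.dtype)    # dtype of the container, unchanged by the assignment
```

This is why the converted columns need to live in their own arrays (or in a structured array) rather than in X itself.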

You could use the same conversion logic to create a structured array with a mix of int and object fields.

In [616]: ytype=[]

In [617]: for ind in range(X.shape[1]):
   .....:     try:
   .....:         x = X[:,ind].astype(np.float32)
   .....:         ytype.append('i4')
   .....:     except:
   .....:         ytype.append('O')

In [618]: ytype
Out[618]: ['i4', 'O']

In [620]: Y=np.zeros(X.shape[0],dtype=','.join(ytype))

In [621]: for i in range(X.shape[1]):
   .....:     Y[Y.dtype.names[i]] = X[:,i]

In [622]: Y
Out[622]: 
array([(1, 'A'), (2, 'A'), (3, 'C'), (4, 'D'), (5, 'B')],
      dtype=[('f0', '<i4'), ('f1', 'O')])

Y['f0'] gives the numeric field.
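
The two steps above (detect the field types, then fill a structured array) could be wrapped into one function. This is only a sketch: to_structured is a hypothetical helper name, the fields are stored as 'f4' rather than 'i4' because the question asks for floats, and the three-row X is a shortened stand-in for the original array.

```python
import numpy as np

def to_structured(X):
    """Hypothetical helper: copy a 2-D object array into a structured array.

    Columns that can be cast to float32 become 'f4' fields; the rest stay
    as object fields, and their column indices are returned as well.
    """
    columns, formats, obj_ind = [], [], []
    for ind in range(X.shape[1]):
        try:
            columns.append(X[:, ind].astype(np.float32))
            formats.append('f4')
        except (ValueError, TypeError):
            # The cast failed, so keep the column as objects
            columns.append(X[:, ind])
            formats.append('O')
            obj_ind.append(ind)
    # Explicit field names f0, f1, ... mirror NumPy's default naming
    dtype = np.dtype([('f%d' % i, fmt) for i, fmt in enumerate(formats)])
    Y = np.zeros(X.shape[0], dtype=dtype)
    for name, col in zip(Y.dtype.names, columns):
        Y[name] = col
    return Y, obj_ind

# Shortened stand-in for the object array in the question
X = np.array([[1, 'A'], [2, 'A'], [3, 'C']], dtype=object)
Y, obj_ind = to_structured(X)
print(Y['f0'])   # the numeric field
print(obj_ind)   # indices of the columns that stayed as objects
```

Note that a column of numeric-looking strings such as '1.5' would also be cast successfully, so the try/except detects "convertible to float", not "already numeric".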

2

I think this might help

def func(x):
  a = None
  try:
    a = x.astype(float)
  except:
    # x.name represents the current index value 
    # which is column name in this case
    obj.append(x.name) 
    a = x
  return a

obj = []
new_df = df.apply(func, axis=0)

This will keep the object columns as such which you can use later.

Note: When working with a pandas.DataFrame, avoid iterating with an explicit loop where possible, as this is often slower than performing the same operation with apply.

1

df.dtypes returns a pandas Series, which can be operated on further:

# find columns of type int
mask = df.dtypes==int
# select the names of the matching columns
cols = df.dtypes[mask].index
# select these columns and convert to float
new_cols_df = df[cols].astype(float)
# Replace these columns in original df
df[new_cols_df.columns] = new_cols_df

1 Comment

What I have posted is a minimal working example. In my full code I will not have access to df, only to X. @shanmuga
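
If only X is available, as the comment above notes, one possible route is to wrap X in a DataFrame and let pandas re-infer the column types. The sketch below uses DataFrame.infer_objects and DataFrame.select_dtypes; the three-row X is a shortened stand-in for the real array, and this is one option among several, not the approach taken in the answers.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the object array in the question
X = np.array([[1, 'A'], [2, 'A'], [3, 'C']], dtype=object)

# Columns are labelled 0, 1, ... so the labels double as column indices
df = pd.DataFrame(X).infer_objects()

num_cols = df.select_dtypes(include='number').columns
obj_ind = [int(i) for i in df.columns if i not in num_cols]

# Numeric columns as a plain float32 array
numeric = df[num_cols].to_numpy(dtype=np.float32)

print(obj_ind)
print(numeric.dtype, numeric.shape)
```

Unlike the try/except approach, this only treats a column as numeric if its values already are numbers; numeric-looking strings stay in the non-numeric group.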
