I have an exported pandas dataframe that is now a numpy.array object.

subset = array[:4,:]
array([[  2.        ,  12.        ,  33.33333333,   2.        ,
         33.33333333,  12.        ],
       [  2.        ,   2.        ,  33.33333333,   2.        ,
         33.33333333,   2.        ],
       [  2.8       ,   8.        ,  45.83333333,   2.75      ,
         46.66666667,  13.        ],
       [  3.11320755,  75.        ,  56.        ,   3.24      ,
         52.83018868,  33.        ]])
print subset.dtype
dtype('float64')

I want to convert the column values to specific types and set column names as well, which means I need to convert it to an ndarray.

Here are my dtypes:

[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'), 
('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8'),
('NULL_COUNT_B', '<f8')]

When I go to convert the array, I get:

 ValueError: new type not compatible with array.

How do I cast each column to a specific type so I can convert the array to an ndarray?

Thanks

  • You should use correct dtype like np.int16, np.float32, np.float64 .... Commented Nov 17, 2016 at 16:25
  • You can do it in pandas itself using the .astype method. Why convert to an array unnecessarily? Commented Nov 17, 2016 at 16:26
  • @Kartik the program I'm working with uses numpy array. Commented Nov 17, 2016 at 17:43

1 Answer

You already have an ndarray. What you are seeking is a structured array, one with this compound dtype. First see if pandas can do it for you. If that fails we might be able to do something with tolist and a list comprehension.
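
As a rough sketch of the pandas route mentioned above: cast the integer column with `astype` and export with `to_records`, which returns a record array (a structured-array subclass) whose field names come from the column labels. The small `df` below is a hypothetical stand-in for the original DataFrame, built from two rows of the question's data; the real frame is not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original DataFrame (two rows from the question).
columns = ['PERCENT_A_NEW', 'JoinField', 'NULL_COUNT_B',
           'PERCENT_COMP_B', 'RANKING_A', 'RANKING_B']
df = pd.DataFrame([[2.0, 12.0, 33.33333333, 2.0, 33.33333333, 12.0],
                   [2.8, 8.0, 45.83333333, 2.75, 46.66666667, 13.0]],
                  columns=columns)

# Cast only the column that should be an integer; the rest stay float64.
df = df.astype({'JoinField': 'int32'})

# Export as a record array; index=False leaves the row index out of the fields.
records = df.to_records(index=False)

print(records.dtype.names)
print(records['JoinField'])
```

Each field can then be read by name, for example `records['RANKING_A']`.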

In [84]: dt=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'),
    ...: ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8'),
    ...: ('NULL_COUNT_B', '<f8')]
In [85]: subset=np.array([[  2.        ,  12.        ,  33.33333333,   2.        ,
    ...:          33.33333333,  12.        ],
    ...:        [  2.        ,   2.        ,  33.33333333,   2.        ,
    ...:          33.33333333,   2.        ],
    ...:        [  2.8       ,   8.        ,  45.83333333,   2.75      ,
    ...:          46.66666667,  13.        ],
    ...:        [  3.11320755,  75.        ,  56.        ,   3.24      ,
    ...:          52.83018868,  33.        ]])
In [86]: subset
Out[86]: 
array([[  2.        ,  12.        ,  33.33333333,   2.        ,
         33.33333333,  12.        ],
       [  2.        ,   2.        ,  33.33333333,   2.        ,
         33.33333333,   2.        ],
       [  2.8       ,   8.        ,  45.83333333,   2.75      ,
         46.66666667,  13.        ],
       [  3.11320755,  75.        ,  56.        ,   3.24      ,
         52.83018868,  33.        ]])

Now make an array with dt. The input for a structured array has to be a list of tuples, so I'm using tolist and a list comprehension:

In [87]: np.array([tuple(row) for row in subset.tolist()],dtype=dt)
....
ValueError: field 'NULL_COUNT_B' occurs more than once

The dtype names NULL_COUNT_B twice, which gives it seven fields while the array has only six columns. Checking the shape and the dtype confirms it:

In [88]: subset.shape
Out[88]: (4, 6)
In [89]: dt
Out[89]: 
[('PERCENT_A_NEW', '<f8'),
 ('JoinField', '<i4'),
 ('NULL_COUNT_B', '<f8'),
 ('PERCENT_COMP_B', '<f8'),
 ('RANKING_A', '<f8'),
 ('RANKING_B', '<f8'),
 ('NULL_COUNT_B', '<f8')]

Dropping the duplicate last field leaves six fields, one per column:

In [90]: dt=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'),
    ...: ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8')]
In [91]: np.array([tuple(row) for row in subset.tolist()],dtype=dt)
Out[91]: 
array([(2.0, 12, 33.33333333, 2.0, 33.33333333, 12.0),
       (2.0, 2, 33.33333333, 2.0, 33.33333333, 2.0),
       (2.8, 8, 45.83333333, 2.75, 46.66666667, 13.0),
       (3.11320755, 75, 56.0, 3.24, 52.83018868, 33.0)], 
      dtype=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'), ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8')])
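
As an alternative sketch to the list-of-tuples conversion above, assuming NumPy 1.16 or later: `numpy.lib.recfunctions.unstructured_to_structured` maps the last axis of a plain array onto the fields of a compound dtype and casts each column as it goes. The data below is two rows from the question, and the field-count check is an illustrative safeguard, not part of the original answer.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Two rows of the question's data, as a plain float64 array.
subset = np.array([[2.0, 12.0, 33.33333333, 2.0, 33.33333333, 12.0],
                   [2.8, 8.0, 45.83333333, 2.75, 46.66666667, 13.0]])

# The corrected dtype: six uniquely named fields, one per column.
dt = np.dtype([('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'),
               ('NULL_COUNT_B', '<f8'), ('PERCENT_COMP_B', '<f8'),
               ('RANKING_A', '<f8'), ('RANKING_B', '<f8')])

# Illustrative safeguard: the field count must equal the column count.
if len(dt.names) != subset.shape[1]:
    raise ValueError("dtype has %d fields but the array has %d columns"
                     % (len(dt.names), subset.shape[1]))

# Each column is cast to its field's type (float -> int32 for JoinField).
structured = rfn.unstructured_to_structured(subset, dtype=dt)

print(structured['JoinField'])
```

The result is a 1-d structured array with one record per original row, so `structured.shape` is `(2,)` here rather than `(2, 6)`.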