Is there a better way to convert 'object' type array to numpy array by replacing 'na' with mean? [duplicate]

Question

I have an array of strings in which some elements, such as 'na', can't be converted to float with x.astype(np.float).

Please suggest a better way than the one I used. My procedure is below (a snippet from my Jupyter notebook; I show the intermediate steps just to demonstrate the changes):

In [4]: val_inc

Out [4]:

array(['na', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
       '39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
       '37.2844', '39.5835', 43.9194, '42.5485', '36.9052', 'na', 41.9264,
       45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
       '38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
       '40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
       40.3944, '40.2466', '32.2567', 'na', '38.8594', '43.947', 41.7973,
       '41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
       '38.9249', '33.2077', '42.4053', '42.559'], dtype=object)

In [5]: val_inc[val_inc == 'na']='0'

In [6]: val_inc

Out [6]:

array(['0', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
       '39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
       '37.2844', '39.5835', 43.9194, '42.5485', '36.9052', '0', 41.9264,
       45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
       '38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
       '40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
       40.3944, '40.2466', '32.2567', '0', '38.8594', '43.947', 41.7973,
       '41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
       '38.9249', '33.2077', '42.4053', '42.559'], dtype=object)

In [7]: val_inc = val_inc.astype(np.float)

In [8]: val_inc

Out [8]:

array([  0.    ,  38.012 ,  38.7816,  38.0736,  40.7118,  44.7382,
        39.6416,  38.9177,  36.9031,  43.2611,  38.2732,  40.7129,
        37.2844,  39.5835,  43.9194,  42.5485,  36.9052,   0.    ,
        41.9264,  45.3568,  44.6239,  38.1079,  45.2393,  32.785 ,
        44.6239,  38.0216,  38.4608,  42.5644,  35.3127,  33.2936,
        33.0556,  40.4476,  35.6581,  35.5574,  43.1096,  34.4751,
        42.0554,  40.3944,  40.2466,  32.2567,   0.    ,  38.8594,
        43.947 ,  41.7973,  41.8105,  40.3797,  31.2868,  45.3644,
        40.7177,  41.8558,  38.9249,  33.2077,  42.4053,  42.559 ])

In [9]: np.mean(val_inc[val_inc!=0.])

Out [9]: 39.587374509803915

In [10]: val_inc[val_inc==0.]=np.mean(val_inc[val_inc!=0.])

In [11]: val_inc

Out [11]:

array([ 39.58737451,  38.012     ,  38.7816    ,  38.0736    ,
        40.7118    ,  44.7382    ,  39.6416    ,  38.9177    ,
        36.9031    ,  43.2611    ,  38.2732    ,  40.7129    ,
        37.2844    ,  39.5835    ,  43.9194    ,  42.5485    ,
        36.9052    ,  39.58737451,  41.9264    ,  45.3568    ,
        44.6239    ,  38.1079    ,  45.2393    ,  32.785     ,
        44.6239    ,  38.0216    ,  38.4608    ,  42.5644    ,
        35.3127    ,  33.2936    ,  33.0556    ,  40.4476    ,
        35.6581    ,  35.5574    ,  43.1096    ,  34.4751    ,
        42.0554    ,  40.3944    ,  40.2466    ,  32.2567    ,
        39.58737451,  38.8594    ,  43.947     ,  41.7973    ,
        41.8105    ,  40.3797    ,  31.2868    ,  45.3644    ,
        40.7177    ,  41.8558    ,  38.9249    ,  33.2077    ,
        42.4053    ,  42.559     ])

Replace 'na' with 'nan' and it will be convertible to floating point.
– MB-F, Commented Jan 9, 2018 at 8:11
@kazemakase thanks for your suggestion. I was not aware that the string 'nan' could be converted directly to np.nan.
– thepunitsingh, Commented Jan 10, 2018 at 18:16
Apologies that my question turned out to be a duplicate; I will work on my searching skills.
– thepunitsingh, Commented Jan 10, 2018 at 18:24
No need to apologize. On the contrary, being marked as a duplicate, your question now serves as a signpost for others who search with the same terms you did.
– MB-F, Commented Jan 10, 2018 at 18:34

Julien · Accepted Answer · 2018-01-09 08:41:26Z

Replace 'na' with 'nan' so that it converts to np.nan, then use np.nanmean.

Example:

test = np.array(['0','1','nan'], dtype=float)
np.where(np.isnan(test), np.nanmean(test), test)

array([ 0. ,  1. ,  0.5])
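
Applied to a mixed object array like the one in the question, the same idea might look like the sketch below. The short val_inc here is a made-up sample in the style of the question's data, not the full array, and the names cleaned and filled are only illustrative.

```python
import numpy as np

# Made-up sample in the style of the question's val_inc:
# an object array mixing numeric strings, floats, and the 'na' marker.
val_inc = np.array(['na', '38.012', 43.2611, '36.9052', 'na', 41.9264],
                   dtype=object)

cleaned = val_inc.copy()
cleaned[cleaned == 'na'] = 'nan'   # float('nan') parses, float('na') does not
cleaned = cleaned.astype(float)    # every element goes through float()

# np.nanmean ignores the NaN entries; np.where swaps them for that mean.
filled = np.where(np.isnan(cleaned), np.nanmean(cleaned), cleaned)
print(filled)
```

Unlike the 0 placeholder used in the question, NaN cannot collide with a genuine value in the data, so no real measurement is overwritten by the mean.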

edited Jan 9, 2018 at 8:41

answered Jan 9, 2018 at 8:11


Your suggestion was the fastest way to solve my problem among other suggestions. Thanks!
– thepunitsingh, Commented over a year ago

rnso · Accepted Answer · 2018-01-09 08:36:03Z

It would be better to first convert 'na' to a proper NaN. Then one can use the data any way one wants:

import numpy as np
val_inc[val_inc == 'na'] = np.nan   # 'na' to proper NaN or missing value
val_inc = val_inc.astype(float)     # no error now; np.float was removed in NumPy 1.24, builtin float is equivalent
print(val_inc)

Output:

[     nan  38.012   38.7816  38.0736  40.7118  44.7382  39.6416  38.9177
  36.9031  43.2611  38.2732  40.7129  37.2844  39.5835  43.9194  42.5485
  36.9052      nan  41.9264  45.3568  44.6239  38.1079  45.2393  32.785
  44.6239  38.0216  38.4608  42.5644  35.3127  33.2936  33.0556  40.4476
  35.6581  35.5574  43.1096  34.4751  42.0554  40.3944  40.2466  32.2567
      nan  38.8594  43.947   41.7973  41.8105  40.3797  31.2868  45.3644
  40.7177  41.8558  38.9249  33.2077  42.4053  42.559 ]
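
To finish the job the question asks for, the NaN entries can then be overwritten with the mean of the remaining values. Below is a minimal sketch that wraps the steps in a helper. The function name fill_na_with_mean and its marker parameter are hypothetical and not part of NumPy.

```python
import numpy as np

def fill_na_with_mean(values, marker='na'):
    """Hypothetical helper: return a float array in which every `marker`
    entry is replaced by the mean of the remaining values."""
    arr = np.array(values, dtype=object)   # copy, so the caller's data is untouched
    arr[arr == marker] = np.nan            # marker -> proper missing value
    arr = arr.astype(float)
    arr[np.isnan(arr)] = np.nanmean(arr)   # mean computed over non-NaN entries only
    return arr

result = fill_na_with_mean(['na', '0', '2.0', 4.0])
print(result)
```

In this small example the genuine '0' entry is kept as 0.0 and still counts toward the mean, which is the main reason to prefer NaN over 0 as the placeholder.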
