
I have an object array of strings (with a few plain floats mixed in). Some elements, such as 'na', can't be converted to float with x.astype(float) as given here.

Is there a better way than the one I used? My procedure is below (a snippet from my Jupyter notebook; I have shown the intermediate steps only to illustrate the changes):

In [4]: val_inc

Out [4]:

array(['na', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
       '39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
       '37.2844', '39.5835', 43.9194, '42.5485', '36.9052', 'na', 41.9264,
       45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
       '38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
       '40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
       40.3944, '40.2466', '32.2567', 'na', '38.8594', '43.947', 41.7973,
       '41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
       '38.9249', '33.2077', '42.4053', '42.559'], dtype=object)

In [5]: val_inc[val_inc == 'na']='0'

In [6]: val_inc

Out [6]:

array(['0', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
       '39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
       '37.2844', '39.5835', 43.9194, '42.5485', '36.9052', '0', 41.9264,
       45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
       '38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
       '40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
       40.3944, '40.2466', '32.2567', '0', '38.8594', '43.947', 41.7973,
       '41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
       '38.9249', '33.2077', '42.4053', '42.559'], dtype=object)

In [7]: val_inc = val_inc.astype(float)

In [8]: val_inc

Out [8]:

array([  0.    ,  38.012 ,  38.7816,  38.0736,  40.7118,  44.7382,
        39.6416,  38.9177,  36.9031,  43.2611,  38.2732,  40.7129,
        37.2844,  39.5835,  43.9194,  42.5485,  36.9052,   0.    ,
        41.9264,  45.3568,  44.6239,  38.1079,  45.2393,  32.785 ,
        44.6239,  38.0216,  38.4608,  42.5644,  35.3127,  33.2936,
        33.0556,  40.4476,  35.6581,  35.5574,  43.1096,  34.4751,
        42.0554,  40.3944,  40.2466,  32.2567,   0.    ,  38.8594,
        43.947 ,  41.7973,  41.8105,  40.3797,  31.2868,  45.3644,
        40.7177,  41.8558,  38.9249,  33.2077,  42.4053,  42.559 ])

In [9]: np.mean(val_inc[val_inc!=0.])

Out [9]: 39.587374509803915

In [10]: val_inc[val_inc==0.]=np.mean(val_inc[val_inc!=0.])

In [11]: val_inc

Out [11]:

array([ 39.58737451,  38.012     ,  38.7816    ,  38.0736    ,
        40.7118    ,  44.7382    ,  39.6416    ,  38.9177    ,
        36.9031    ,  43.2611    ,  38.2732    ,  40.7129    ,
        37.2844    ,  39.5835    ,  43.9194    ,  42.5485    ,
        36.9052    ,  39.58737451,  41.9264    ,  45.3568    ,
        44.6239    ,  38.1079    ,  45.2393    ,  32.785     ,
        44.6239    ,  38.0216    ,  38.4608    ,  42.5644    ,
        35.3127    ,  33.2936    ,  33.0556    ,  40.4476    ,
        35.6581    ,  35.5574    ,  43.1096    ,  34.4751    ,
        42.0554    ,  40.3944    ,  40.2466    ,  32.2567    ,
        39.58737451,  38.8594    ,  43.947     ,  41.7973    ,
        41.8105    ,  40.3797    ,  31.2868    ,  45.3644    ,
        40.7177    ,  41.8558    ,  38.9249    ,  33.2077    ,
        42.4053    ,  42.559     ])
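One caveat with the procedure above: after 'na' is replaced by '0', any genuine 0.0 in the data would also be treated as missing and overwritten by the mean. A minimal sketch of the same mean imputation that avoids the sentinel, assuming the only missing marker is the exact string 'na' (the helper name impute_missing_with_mean is hypothetical, not from the question), computes the missing-value mask before converting:

```python
import numpy as np

def impute_missing_with_mean(arr, missing="na"):
    """Hypothetical helper: convert an object array to float, replacing
    entries equal to `missing` with the mean of the remaining values."""
    arr = np.asarray(arr, dtype=object)
    # Build the mask on the raw values, so real zeros are never
    # confused with missing entries.
    mask = arr == missing
    out = np.empty(arr.shape, dtype=float)
    # Convert only the non-missing entries (strings or floats) to float.
    out[~mask] = arr[~mask].astype(float)
    # Fill the missing slots with the mean of the observed values.
    out[mask] = out[~mask].mean()
    return out

vals = np.array(["na", "38.012", 43.2611, "0", "na"], dtype=object)
result = impute_missing_with_mean(vals)
print(result)
```

Here the "0" entry stays 0.0 and only the two 'na' slots receive the mean.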
  • Replace 'na' with 'nan' and it will be convertible to floating point. Commented Jan 9, 2018 at 8:11
  • @kazemakase Thanks for your suggestion. I was not aware that the string 'nan' could be converted directly to np.nan. Commented Jan 10, 2018 at 18:16
  • Apologies that my question turned out to be a duplicate; I will work on my searching skills. Commented Jan 10, 2018 at 18:24
  • No need to apologize. On the contrary, now that it is marked as a duplicate, your question serves as a signpost for others who search with the same terms you did. Commented Jan 10, 2018 at 18:34
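The first comment's suggestion works because float('nan') parses to NaN while float('na') raises ValueError. A short sketch, using a few illustrative values shaped like the question's object array:

```python
import numpy as np

# Illustrative object array mixing strings, Python floats and the 'na' marker.
val_inc = np.array(["na", "38.012", 43.2611, "na"], dtype=object)

# Swap the marker for a string that float() understands.
val_inc[val_inc == "na"] = "nan"

# Every element is now convertible, so the cast succeeds.
as_float = val_inc.astype(float)
print(as_float)
print(np.isnan(as_float))
```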

2 Answers


Replace 'na' with 'nan' so that it converts to np.nan, then use np.nanmean.

Example:

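import numpy as np

test = np.array(['0','1','nan'], dtype=float)      # the string 'nan' parses to np.nan
np.where(np.isnan(test), np.nanmean(test), test)   # replace each NaN with the mean of the other values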

array([ 0. ,  1. ,  0.5])
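The reason for np.nanmean rather than np.mean is that ordinary reductions propagate NaN, whereas the nan-prefixed variants skip NaN entries. A minimal comparison on the same test array:

```python
import numpy as np

test = np.array(["0", "1", "nan"], dtype=float)

plain_mean = np.mean(test)    # NaN propagates: result is nan
nan_mean = np.nanmean(test)   # NaN is skipped: mean of 0 and 1 is 0.5

print(plain_mean, nan_mean)
```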

1 Comment

Of all the suggestions, yours was the fastest way to solve my problem. Thanks!

A better approach is to first convert 'na' to a proper NaN. Then you can use the data however you like:

import numpy as np
val_inc[val_inc == 'na'] = np.nan   # 'na' to proper NaN or missing value
val_inc = val_inc.astype(float)     # no error now: NaN is a valid float
print(val_inc)

Output:

[     nan  38.012   38.7816  38.0736  40.7118  44.7382  39.6416  38.9177
  36.9031  43.2611  38.2732  40.7129  37.2844  39.5835  43.9194  42.5485
  36.9052      nan  41.9264  45.3568  44.6239  38.1079  45.2393  32.785
  44.6239  38.0216  38.4608  42.5644  35.3127  33.2936  33.0556  40.4476
  35.6581  35.5574  43.1096  34.4751  42.0554  40.3944  40.2466  32.2567
      nan  38.8594  43.947   41.7973  41.8105  40.3797  31.2868  45.3644
  40.7177  41.8558  38.9249  33.2077  42.4053  42.559 ]
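As a sketch of what "use the data however you like" can mean once missing values are NaN, here are a few NaN-aware operations. The short array is illustrative only; in the answer, val_inc comes from the question:

```python
import numpy as np

# Illustrative values; NaN marks the former 'na' entries.
val_inc = np.array([np.nan, 38.012, 38.7816, np.nan, 41.9264])

missing = np.isnan(val_inc)
n_missing = int(missing.sum())       # how many entries were missing
observed = val_inc[~missing]         # drop missing entries entirely

# nan-aware reductions ignore NaN without any explicit filtering.
mean_observed = np.nanmean(val_inc)
max_observed = np.nanmax(val_inc)

print(n_missing, observed, mean_observed, max_observed)
```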
