
I have a binary pandas dataframe with values 0.0, 1.0, and NaN.

import pandas as pd
df = pd.read_csv("file.csv")

I would like to turn the floats 1.0 and 0.0 into the integers 1 and 0. Unfortunately, because of the NaN values, this command fails:

df.applymap(int)

The error is:

ValueError: ('cannot convert float NaN to integer', 'occurred at index 0')

Are there "pandas" alternatives?

  • What do you want the integer value of NaN to be? What should be the output for input of 0.0, 1.0, NaN? Commented Aug 25, 2016 at 17:40
  • @recursive I want 1.0 to be 1, 0.0 to be 0 and NaN to be ignored Commented Aug 25, 2016 at 17:42

2 Answers


UPDATE:

If you only need the values to look like integers when displayed, you can do this:

In [84]: df.astype(object)
Out[84]:
   a  b    c
0  0  1    0
1  0  0    1
2  1  1    1
3  0  1    1
4  1  1  NaN

Note, however, that the columns then have the object dtype, not an integer dtype:

In [85]: df.astype(object).dtypes
Out[85]:
a    object
b    object
c    object
dtype: object

Timings against a 500K-row DataFrame:

In [86]: df = pd.concat([df] * 10**5, ignore_index=True)

In [87]: df.shape
Out[87]: (500000, 3)

In [88]: %timeit df.astype(object)
10 loops, best of 3: 113 ms per loop

In [89]: %timeit df.applymap(lambda x: int(x) if pd.notnull(x) else x).astype(object)
1 loop, best of 3: 7.86 s per loop

OLD answer:

AFAIK you can't do it using modern pandas versions: a NumPy-backed integer column cannot hold NaN, so pandas converts such a column to float.

Here is a demo:

In [52]: df
Out[52]:
     a    b    c
0  1.0  NaN  0.0
1  NaN  1.0  1.0
2  0.0  0.0  NaN

In [53]: df[pd.isnull(df)] = -1

In [54]: df
Out[54]:
     a    b    c
0  1.0 -1.0  0.0
1 -1.0  1.0  1.0
2  0.0  0.0 -1.0

In [55]: df = df.astype(int)

In [56]: df
Out[56]:
   a  b  c
0  1 -1  0
1 -1  1  1
2  0  0 -1

We are almost there. Let's replace -1 with NaN, which turns the columns back into floats:

In [57]: df[df < 0] = np.nan

In [58]: df
Out[58]:
     a    b    c
0  1.0  NaN  0.0
1  NaN  1.0  1.0
2  0.0  0.0  NaN

Another demo:

In [60]: df = pd.DataFrame(np.random.choice([0,1], (5,3)), columns=list('abc'))

In [61]: df
Out[61]:
   a  b  c
0  1  0  0
1  1  0  1
2  0  1  1
3  0  0  1
4  0  0  1

Look at what happens to the c column if we change a single cell in it to NaN:

In [62]: df.loc[4, 'c'] = np.nan

In [63]: df
Out[63]:
   a  b    c
0  1  0  0.0
1  1  0  1.0
2  0  1  1.0
3  0  0  1.0
4  0  0  NaN
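
The upcast shown above follows from NaN being a float value: a plain NumPy integer column has no way to store it, so pandas falls back to float64. Here is a minimal sketch of the same behaviour, using made-up Series (the names `ints` and `with_nan` are only for illustration, not from the question):

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the question's file.
ints = pd.Series([1, 0, 1])
with_nan = pd.Series([1, 0, np.nan])

print(ints.dtype)      # an integer dtype
print(with_nan.dtype)  # float64, because NaN is a float

# Casting back to a plain int fails while NaN is still present.
try:
    with_nan.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

print(cast_failed)
```

This is why the -1 round trip above ends up with floats again as soon as NaN is put back.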

3 Comments

The best answer appears to be df.astype(object).
@ShanZhengYang, so you don't need integer values as your subject states? Do you need strings that look like integers?
Actually, that didn't work either. Whenever I save the matrix via df.to_csv(), it saves the integers as floats. Any other ideas what to do?

As of pandas 0.24 (January 2019), you can achieve what you need without resorting to object, by using the nullable integer dtype instead. Using @MaxU's example:

In [125]: df
Out[125]:
   a  b    c
0  0  1  0.0
1  0  0  1.0
2  1  1  1.0
3  0  1  1.0
4  1  1  NaN

In [126]: df.astype('Int64')
Out[126]:
   a  b    c
0  0  1    0
1  0  0    1
2  1  1    1
3  0  1    1
4  1  1  NaN
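
Since the comments on the other answer mention that the values were still written as floats when saving, here is a small sketch of how the nullable dtype behaves with to_csv. It assumes pandas 0.24 or later; the two-row frame and the in-memory buffer are stand-ins for the real data and output file:

```python
import io

import numpy as np
import pandas as pd

# Stand-in data: float columns containing 0.0, 1.0 and NaN.
df = pd.DataFrame({"a": [0.0, 1.0], "c": [1.0, np.nan]})

# Cast every column to the nullable integer dtype (capital "I").
converted = df.astype("Int64")
print(converted.dtypes)

# Write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
converted.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

The values are written as 0 and 1 rather than 0.0 and 1.0, and the missing entry is written as an empty field (the default na_rep).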
