Difficulty Creating column of Pandas Dataframe

Question

I wanted to create a new column of a dataframe based on existing columns; however, I want it to be conditional on another existing column in my dataframe. The following code is not working. Does anyone know why?

if CV['keyword'] == 0:
    CV['left out'] = (CV['Prediction Numerator'] - (CV['Rate'] *10000))/(CV['Prediction Denominator'] - 10000)
else:
    CV['left out'] = (CV['Prediction Numerator'] - (CV['Rate'] *10000 * 10))/(CV['Prediction Denominator'] - (10000 * 10))

I'm getting the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\bwei\Downloads\WinPython-64bit-2.7.9.4\python-2.7.9.amd64\lib\site-packages\pandas\core\generic.py", line 709, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here's a snippet of the first 4 columns of my dataframe.

        Zip  keyword  Prediction Numerator  Prediction Denominator  
0     01001        0        7650546.693200            40002.558782   
1     01001        0        7650546.693200            40002.558782   
2     01001        0        7650546.693200            40002.558782   
3     01001        0        7650546.693200            40002.558782   
4     01002        0            157.951741                0.718621   
5     01002        0            157.951741                0.718621   
6     01005        0        3600150.148240            20000.671431   
7     01005        0        3600150.148240            20000.671431   
8     01007        0        6932235.816260            30000.936191   
9     01007        0        6932235.816260            30000.936191   
10    01007        0        6932235.816260            30000.936191

Thanks, Ben

alex314159 · Accepted Answer · 2015-07-08 21:11:37Z

4

This should work:

CV.loc[CV['keyword']==0,'left out']=expression1
CV.loc[CV['keyword']!=0,'left out']=expression2
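For concreteness, here is a minimal sketch with the two formulas from the question plugged in for expression1 and expression2. The Numerator/Denominator values are taken from the question's snippet, but the third row's keyword and every 'Rate' value are hypothetical (Rate is not shown in the question).

```python
import pandas as pd

# Hypothetical sample frame: column names follow the question;
# 'Rate' values and the keyword == 1 row are invented for illustration.
CV = pd.DataFrame({
    'keyword': [0, 0, 1],
    'Prediction Numerator': [7650546.6932, 157.951741, 3600150.14824],
    'Prediction Denominator': [40002.558782, 0.718621, 20000.671431],
    'Rate': [190.0, 0.01, 30.0],
})

# expression1: formula for rows where keyword == 0
expr1 = (CV['Prediction Numerator'] - CV['Rate'] * 10000) / (CV['Prediction Denominator'] - 10000)
# expression2: formula for every other row
expr2 = (CV['Prediction Numerator'] - CV['Rate'] * 10000 * 10) / (CV['Prediction Denominator'] - 10000 * 10)

# .loc with a boolean mask writes only the selected rows; the right-hand
# Series is aligned on the index, so each row receives its own value.
# The first assignment also creates the 'left out' column.
CV.loc[CV['keyword'] == 0, 'left out'] = expr1
CV.loc[CV['keyword'] != 0, 'left out'] = expr2
print(CV['left out'])
```

Because both masks together cover every row, no NaN is left in the new column.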

answered Jul 8, 2015 at 21:11


2 Comments

ben890 Over a year ago

This was exactly what I wanted, thanks! Just so I understand what it's doing: it's essentially looking at the locations where keyword = 0 and then setting 'left out' equal to each given expression?

alex314159 Over a year ago

Exactly. It might not be the fastest, since you filter twice, but it is the simplest to read and understand.
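If the double filtering bothers you, one option is to evaluate the condition once, store it, and use ~ for the complementary rows. The sketch below also computes each formula only on the rows it applies to. It is an illustration on hypothetical data, not a benchmarked speed-up.

```python
import pandas as pd

# Hypothetical data using the question's column names
CV = pd.DataFrame({
    'keyword': [0, 1, 0, 1],
    'Prediction Numerator': [15000.0, 150000.0, 12000.0, 300000.0],
    'Prediction Denominator': [20000.0, 200000.0, 14000.0, 150000.0],
    'Rate': [0.5, 0.25, 0.25, 1.0],
})

# Evaluate the comparison once and reuse the boolean mask
is_zero = CV['keyword'] == 0
zero_rows = CV[is_zero]     # rows where keyword == 0
other_rows = CV[~is_zero]   # ~ inverts the mask: keyword != 0

# Each formula is computed only on its own subset; .loc aligns on the index
CV.loc[is_zero, 'left out'] = (
    (zero_rows['Prediction Numerator'] - zero_rows['Rate'] * 10000)
    / (zero_rows['Prediction Denominator'] - 10000)
)
CV.loc[~is_zero, 'left out'] = (
    (other_rows['Prediction Numerator'] - other_rows['Rate'] * 10000 * 10)
    / (other_rows['Prediction Denominator'] - 10000 * 10)
)
print(CV['left out'].tolist())  # [1.0, 1.25, 2.375, 4.0]
```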

Nir Friedman · Accepted Answer · 2015-07-08 20:56:20Z

1

Instead of CV['keyword'] == 0, you should use 'keyword' in CV.columns to see if there is a column named "keyword" in CV.
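A minimal sketch of that membership test on a hypothetical two-column frame. Note that it only tells you whether a column with that label exists; it says nothing about the values stored in the column.

```python
import pandas as pd

# Hypothetical frame with two of the question's columns
CV = pd.DataFrame({'Zip': ['01001', '01002'], 'keyword': [0, 0]})

# `in` on DataFrame.columns checks column labels, not cell values
has_keyword = 'keyword' in CV.columns
has_rate = 'Rate' in CV.columns
print(has_keyword, has_rate)  # True False
```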

answered Jul 8, 2015 at 20:56


3 Comments

ben890 Over a year ago

No, the keyword column is always there. I just want to say: if keyword = 0 for a specific row, then perform operation a, and if it's not equal to 0, perform operation b.

Nir Friedman Over a year ago

You're going to have to describe more precisely what you want. What does it mean to compare an entire column to 0? There are many possible interpretations. Please clarify your question.

ben890 Over a year ago

I edited my question to give more clarity. For example, at row 0, keyword = 0, so for that row I'd want ['left out'] to be equal to the if branch; if keyword == 1, I'd want ['left out'] to be equal to the else branch.

Ami Tavory · Accepted Answer · 2015-07-08 21:06:48Z

1

When you write

if CV['keyword'] == 0:

then CV['keyword'] is a column, and comparing it to 0 returns a boolean series. You cannot perform an if on such a series (which value would determine if it's True or False?), and hence the error.
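A small reproduction of that error on a hypothetical three-row frame, together with the explicit reductions the error message suggests. Keep in mind that .any() and .all() only answer "is any/every row zero?"; they collapse the Series to one value and do not give the per-row branching the question asks for.

```python
import pandas as pd

CV = pd.DataFrame({'keyword': [0, 0, 1]})

# Comparing a column to a scalar gives one boolean per row
mask = CV['keyword'] == 0
print(mask.tolist())  # [True, True, False]

# `if mask:` asks for a single truth value, which a multi-element Series refuses
raised = False
try:
    if mask:
        pass
except ValueError as err:
    raised = True
    print(err)

# Explicitly reducing the Series to one bool is allowed
print(mask.any(), mask.all())  # True False
```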

Fortunately, CV.columns works pretty much like a Python list, so you can check membership using it.

answered Jul 8, 2015 at 21:06


Comments

JoeCondron · Accepted Answer · 2015-07-08 21:17:41Z

0

What you want is

import numpy as np

# np.where picks, row by row, the first formula where keyword == 0 and the second elsewhere
CV['left out'] = np.where(CV['keyword'] == 0,
    (CV['Prediction Numerator'] - CV['Rate'] * 10000) / (CV['Prediction Denominator'] - 10000),
    (CV['Prediction Numerator'] - CV['Rate'] * 10000 * 10) / (CV['Prediction Denominator'] - 10000 * 10))

edited Jul 8, 2015 at 21:17

answered Jul 8, 2015 at 21:08


2 Comments

ben890 Over a year ago

Using the same if/else control flow?

JoeCondron Over a year ago

You have two formulas and you want to use the first formula for rows where CV['keyword'] == 0 and the other formula for rows where that's not the case, is that correct? np.where does the if/else element-wise, in a vectorized way. If you want to use a plain if/else, you would have to loop through your rows, which would be inefficient.
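To make the comparison concrete, here is a sketch that computes 'left out' both ways on hypothetical data: a row-by-row if/else (using DataFrame.apply with axis=1 as a stand-in for an explicit loop) and the vectorized np.where version. Both give the same numbers; the difference is that apply calls a Python function once per row, while np.where works on whole columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data using the question's column names
CV = pd.DataFrame({
    'keyword': [0, 1, 0, 1],
    'Prediction Numerator': [15000.0, 150000.0, 12000.0, 300000.0],
    'Prediction Denominator': [20000.0, 200000.0, 14000.0, 150000.0],
    'Rate': [0.5, 0.25, 0.25, 1.0],
})

def left_out_row(row):
    """Plain if/else applied to one row at a time (the slow approach)."""
    if row['keyword'] == 0:
        return (row['Prediction Numerator'] - row['Rate'] * 10000) / (row['Prediction Denominator'] - 10000)
    return (row['Prediction Numerator'] - row['Rate'] * 10000 * 10) / (row['Prediction Denominator'] - 10000 * 10)

# Row-wise version: one Python call per row
looped = CV.apply(left_out_row, axis=1)

# Vectorized version: both formulas computed on whole columns, np.where picks per row
vectorized = np.where(
    CV['keyword'] == 0,
    (CV['Prediction Numerator'] - CV['Rate'] * 10000) / (CV['Prediction Denominator'] - 10000),
    (CV['Prediction Numerator'] - CV['Rate'] * 10000 * 10) / (CV['Prediction Denominator'] - 10000 * 10),
)
print(looped.tolist())      # [1.0, 1.25, 2.375, 4.0]
print(vectorized.tolist())  # [1.0, 1.25, 2.375, 4.0]
```

One design point worth knowing: np.where evaluates both formulas for every row and then discards the unselected values, so each branch must be safe to compute on all rows.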
