How to delete specific rows from pandas data frame?

Question

I am dealing with a pandas data frame as shown below.

    id          x1          y1
 0  2           some_val    some_val
 1  2           some_val    some_val
 2  2           some_val    some_val
 3  2           some_val    some_val
 4  2           some_val    some_val
 5  0           0           0 
 6  3           some_val    some_val
 7  3           some_val    some_val
 8  0           0           0 
 9  5           some_val    some_val
10  5           some_val    some_val
11  5           some_val    some_val
12  0           0           0
13  6           some_val    some_val
14  6           some_val    some_val
15  6           some_val    some_val
16  6           some_val    some_val

My original data frame was the data frame without the rows with all '0' values. As per the project requirement I had to insert the rows with all 0's value whenever the "id" changes.

Now I want to delete all the rows of any "id" which has 3 and less than 3 rows. From the above data frame, I would want to delete all the respective rows of id- "3" and "5" . My resultant data frame should look like below:

   id          x1          y1
0  2           some_val    some_val
1  2           some_val    some_val
2  2           some_val    some_val
3  2           some_val    some_val
4  2           some_val    some_val
5  0           0           0
6  6           some_val    some_val
7  6           some_val    some_val
8  6           some_val    some_val
9  6           some_val    some_val

Kindly suggest me a way to obtain this result.

In your requirements you specify "has 3 and less than 3 rows"... what does this mean? — Andrew L
– Andrew L, Commented Mar 23, 2017 at 19:59

piRSquared · Accepted Answer · 2017-03-23 20:27:49Z

2

The simplest answer is to remove the zero rows because they may get in the way of the calculation if you have more than 3 of them. then do a group by. then filter. then add back zeros like you did in other question/answer

d1 = df.query('ProjID != 0').groupby('ProjID').filter(lambda df: len(df) > 3)
d1

    ProjID     Xcoord    Ycoord
0        2  -7.863509  5.221327
1        2   some_val  some_val
2        2   some_val  some_val
3        2   some_val  some_val
4        2   some_val  some_val
13       6   some_val  some_val
14       6   some_val  some_val
15       6   some_val  some_val
16       6   some_val  some_val

Then add back

pidv = d1.ProjID.values
pid_chg = np.append(pidv[:-1] != pidv[1:], True)

i = d1.index.repeat(pid_chg + 1)

d2 = d1.loc[i, :].copy()

d2.loc[i.duplicated()] = 0

d2.reset_index(drop=True)

    ProjID     Xcoord    Ycoord
0        2  -7.863509  5.221327
1        2   some_val  some_val
2        2   some_val  some_val
3        2   some_val  some_val
4        2   some_val  some_val
5        0          0         0
6        6   some_val  some_val
7        6   some_val  some_val
8        6   some_val  some_val
9        6   some_val  some_val
10       0          0         0

answered Mar 23, 2017 at 20:27

piRSquared

296k68 gold badges509 silver badges654 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Liza Over a year ago

Thank you so much !! You are a life saver.

kod5kod · Accepted Answer · 2017-03-23 19:26:31Z

1

Say your DataFrame name is df, you need to do the following:

df = df[df['col'<>=condition]]

Specifically to your case:

df = df[df['ProjID'!=3]]

Same with 5. You can combine both filters with an 'and' for efficiency.

This is called DataFrame indexing filters.

answered Mar 23, 2017 at 19:26

kod5kod

614 bronze badges

5 Comments

kod5kod Over a year ago

Regarding your other question, I am a bit confused by it: do you want to delete all 3=> rows as well, or you just want to assign 0 to them?

Liza Over a year ago

Thank You. I have tried this, but the problem is the data frame is very huge, with thousands of rows, I cant manually search for the specific ProjID with <=3 rows and delete them. Can you suggest a way where it iterates over the whole df ?

Liza Over a year ago

I have to assign a row with 0 values, whenever the ProjID changes. I think, I should do this part once I have dropped the unwanted rows.

kod5kod Over a year ago

The method I proposed does exactly what you want. The "!=3" is a condition, is not an index . This means that when applied, it will loop through the DataFrame and deletes (actually reassigns) any row that has 'ProjID' equal to 3. For example, df = df[df['ProjID'>3]] Will filter out all the 'ProjID' rows that are equal or less than 3.

kod5kod Over a year ago

Regarding your other question, it is better to re-assign zero at the same time. Well, to be more specific, you want to pick up all the rows that satisfy your condition (in this case df[df['ProjID'<=3]] ) , and then replace their values with zeros. This is best done with a lambda function and 'apply': df = df[df.ProjID.apply(lambda x: x if x>3 else 0)] Here is a relevant discussion: stackoverflow.com/questions/11869910/…

Vaishali · Accepted Answer · 2017-03-23 20:25:38Z

0

You can use groupby and filter the IDs with count less than three and use the resulting list to index the df.

filtered = df.groupby('ProjID').Xcoord.filter(lambda x: x.count() > 3)
df.iloc[filtered.index.tolist()]


    ProjID  Xcoord  Ycoord
0   2   -7.863509   5.221327
1   2   some_val    some_val
2   2   some_val    some_val
3   2   some_val    some_val
4   2   some_val    some_val
13  6   some_val    some_val
14  6   some_val    some_val
15  6   some_val    some_val
16  6   some_val    some_val

answered Mar 23, 2017 at 20:25

Vaishali

38.5k5 gold badges62 silver badges88 bronze badges

Collectives™ on Stack Overflow

How to delete specific rows from pandas data frame?

3 Answers 3

1 Comment

5 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

5 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related