1

I am dealing with a pandas data frame as shown below.

    id          x1          y1
 0  2           some_val    some_val
 1  2           some_val    some_val
 2  2           some_val    some_val
 3  2           some_val    some_val
 4  2           some_val    some_val
 5  0           0           0 
 6  3           some_val    some_val
 7  3           some_val    some_val
 8  0           0           0 
 9  5           some_val    some_val
10  5           some_val    some_val
11  5           some_val    some_val
12  0           0           0
13  6           some_val    some_val
14  6           some_val    some_val
15  6           some_val    some_val
16  6           some_val    some_val

My original data frame was the data frame without the rows with all '0' values. As per the project requirement I had to insert the rows with all 0's value whenever the "id" changes.

Now I want to delete all the rows of any "id" which has 3 and less than 3 rows. From the above data frame, I would want to delete all the respective rows of id- "3" and "5" . My resultant data frame should look like below:

   id          x1          y1
0  2           some_val    some_val
1  2           some_val    some_val
2  2           some_val    some_val
3  2           some_val    some_val
4  2           some_val    some_val
5  0           0           0
6  6           some_val    some_val
7  6           some_val    some_val
8  6           some_val    some_val
9  6           some_val    some_val

Kindly suggest me a way to obtain this result.

1
  • In your requirements you specify "has 3 and less than 3 rows"... what does this mean? Commented Mar 23, 2017 at 19:59

3 Answers 3

2

The simplest answer is to remove the zero rows because they may get in the way of the calculation if you have more than 3 of them. then do a group by. then filter. then add back zeros like you did in other question/answer

d1 = df.query('ProjID != 0').groupby('ProjID').filter(lambda df: len(df) > 3)
d1

    ProjID     Xcoord    Ycoord
0        2  -7.863509  5.221327
1        2   some_val  some_val
2        2   some_val  some_val
3        2   some_val  some_val
4        2   some_val  some_val
13       6   some_val  some_val
14       6   some_val  some_val
15       6   some_val  some_val
16       6   some_val  some_val

Then add back

pidv = d1.ProjID.values
pid_chg = np.append(pidv[:-1] != pidv[1:], True)

i = d1.index.repeat(pid_chg + 1)

d2 = d1.loc[i, :].copy()

d2.loc[i.duplicated()] = 0

d2.reset_index(drop=True)

    ProjID     Xcoord    Ycoord
0        2  -7.863509  5.221327
1        2   some_val  some_val
2        2   some_val  some_val
3        2   some_val  some_val
4        2   some_val  some_val
5        0          0         0
6        6   some_val  some_val
7        6   some_val  some_val
8        6   some_val  some_val
9        6   some_val  some_val
10       0          0         0
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you so much !! You are a life saver.
1

Say your DataFrame name is df, you need to do the following:

df = df[df['col'<>=condition]] 

Specifically to your case:

df = df[df['ProjID'!=3]] 

Same with 5. You can combine both filters with an 'and' for efficiency.

This is called DataFrame indexing filters.

5 Comments

Regarding your other question, I am a bit confused by it: do you want to delete all 3=> rows as well, or you just want to assign 0 to them?
Thank You. I have tried this, but the problem is the data frame is very huge, with thousands of rows, I cant manually search for the specific ProjID with <=3 rows and delete them. Can you suggest a way where it iterates over the whole df ?
I have to assign a row with 0 values, whenever the ProjID changes. I think, I should do this part once I have dropped the unwanted rows.
The method I proposed does exactly what you want. The "!=3" is a condition, is not an index . This means that when applied, it will loop through the DataFrame and deletes (actually reassigns) any row that has 'ProjID' equal to 3. For example, df = df[df['ProjID'>3]] Will filter out all the 'ProjID' rows that are equal or less than 3.
Regarding your other question, it is better to re-assign zero at the same time. Well, to be more specific, you want to pick up all the rows that satisfy your condition (in this case df[df['ProjID'<=3]] ) , and then replace their values with zeros. This is best done with a lambda function and 'apply': df = df[df.ProjID.apply(lambda x: x if x>3 else 0)] Here is a relevant discussion: stackoverflow.com/questions/11869910/…
0

You can use groupby and filter the IDs with count less than three and use the resulting list to index the df.

filtered = df.groupby('ProjID').Xcoord.filter(lambda x: x.count() > 3)
df.iloc[filtered.index.tolist()]


    ProjID  Xcoord  Ycoord
0   2   -7.863509   5.221327
1   2   some_val    some_val
2   2   some_val    some_val
3   2   some_val    some_val
4   2   some_val    some_val
13  6   some_val    some_val
14  6   some_val    some_val
15  6   some_val    some_val
16  6   some_val    some_val

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.