
I'm stuck.

I have a huge DataFrame that looks like this:

| Index| Field|
| -------- | -------------- |
| 1| A|
| 1| B|
| 1| C|
| 2| A|
| 2| C|
| 3| A|
| 3| B|

First, I grouped by "Index" and collected the values of the "Field" column into one list per group, using pandas and groupby.

Now my dataframe looks like this:

| Index| Field|
| -------- | -------------- |
| 1| [A, B, C]|
| 2| [A, C]|
| 3| [A, B]|

Next, I want to count how many times each combination, such as [A, B, C], occurs in the whole dataset. The result should look like this:

| Field | Counts|
| -------- | -------------- |
| [A, B, C]| 222|
| [A, C]| 530|
| [A, B]| 400|

Because I put the values into a list (or NumPy array), I don't know how to get to that output. I can't run another groupby on a column of lists/arrays, or I'm too blind to see how.
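For what it's worth, the reason the list column gets in the way is that Python lists are not hashable, while tuples are. Below is a minimal sketch, assuming the grouped frame already exists with a list-valued "Field" column; the frame `grouped` and its fourth row are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical stand-in for the already-grouped frame (row 4 is invented)
grouped = pd.DataFrame({
    "Index": [1, 2, 3, 4],
    "Field": [["A", "B", "C"], ["A", "C"], ["A", "B"], ["A", "B"]],
})

# Lists are mutable, so Python refuses to hash them
try:
    hash(["A", "B"])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Converting each list to a tuple makes the values hashable and countable
as_tuples = grouped["Field"].apply(tuple)
counts = as_tuples.value_counts()
print(counts)
```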

Can anybody give me a hint or a solution?

Thanks!

Edit: sorry for the bad formatting; Stack Overflow didn't let me use the table formatting without marking it as code.

  • Your counts do not make sense based on your sample data. Commented Aug 24, 2021 at 20:10

1 Answer

Aggregate into tuples instead of lists. Tuples are hashable, so value_counts works on them:

s = df.groupby('Index')['Field'].agg(tuple)
s.value_counts()
Out[642]: 
(A, B)       1
(A, B, C)    1
(A, C)       1
Name: Field, dtype: int64
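To get from that Series to the two-column table shown in the question, something like the following might work. It is only a sketch: it rebuilds the sample data from the question, and the names `combos`, `counts`, and `table` are made up here.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Index": [1, 1, 1, 2, 2, 3, 3],
    "Field": ["A", "B", "C", "A", "C", "A", "B"],
})

# One hashable tuple of fields per Index value
combos = df.groupby("Index")["Field"].agg(tuple)

# Count how often each combination occurs
counts = combos.value_counts()

# Turn the Series into a table with "Field" and "Counts" columns
table = counts.rename_axis("Field").reset_index(name="Counts")
print(table)
```

Note that tuples are order-sensitive, so (A, B) and (B, A) would be counted separately. If they should count as the same combination, sorting inside each group first, for example `.agg(lambda s: tuple(sorted(s)))`, is one option.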

1 Comment

This is exactly what I was looking for. Thank you!
