2

I have a pandas dataframe:

  col1 | col2 | col3 | col4 |
0.  A  | B    |    C |     G|
1.  I  | J    |    S |     D|
2.  O  | L    |    C |     G|
3.  A  | B    |    H |     D|
4.  H  | B    |    C |     P|

# reproducible
import pandas as pd
from string import ascii_uppercase as uc  # just for sample data
import random  # just for sample data

random.seed(365)
df = pd.DataFrame({'col1': [random.choice(uc) for _ in range(20)],
                   'col2': [random.choice(uc) for _ in range(20)],
                   'col3': [random.choice(uc) for _ in range(20)],
                   'col4': [random.choice(uc) for _ in range(20)]})

I'm looking for a function like this:

func('H')

which will return all the names of indexes and columns where "H" is. Any ideas?

0

3 Answers 3

3

Use, np.argwhere along with df.to_numpy:

rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols]))

Or,

indices = df.where(df.eq('H')).stack().index.tolist()

# print(indices)
[(3, 'col3'), (4, 'col1')]

timeit comparision of all the answers:

df.shape
(50000, 4)

%%timeit -n100 @Shubham1
rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols])) 
8.87 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Scott
r,c = np.where(df == 'H')
_ = list(zip(df.index[r], df.columns[c])) 
17.4 ms ± 510 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Shubham2
indices = df.where(df.eq('H')).stack().index.tolist()
26.8 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Roy
df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
_ = t[t.value == "H"]
29 ms ± 656 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Sign up to request clarification or add additional context in comments.

2 Comments

Out of curiosity try.. r,c = np.where(df.to_numpy() == 'H')
Its identical to using argwhere, nearly same timings ;).
2

One solution would be to use melt:

df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
print(t[t.value == "H"])

The output is:

    inx variable value
4     4     col1     H
13    3     col3     H

You can now easily extract columns and indices.

1 Comment

does this answer your question?
2

Use np.where and indexing (updated to add performance):

r, c = np.where(df.to_numpy() == 'H')
list(zip(df.index[r], df.columns[c]))

Output:

[(3, 'col3'), (4, 'col1')]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.