Finding Column and Index in pandas dataframe

Question

I have a pandas dataframe:

   col1 | col2 | col3 | col4
0  A    | B    | C    | G
1  I    | J    | S    | D
2  O    | L    | C    | G
3  A    | B    | H    | D
4  H    | B    | C    | P

# reproducible
import pandas as pd
from string import ascii_uppercase as uc  # just for sample data
import random  # just for sample data

random.seed(365)
df = pd.DataFrame({'col1': [random.choice(uc) for _ in range(20)],
                   'col2': [random.choice(uc) for _ in range(20)],
                   'col3': [random.choice(uc) for _ in range(20)],
                   'col4': [random.choice(uc) for _ in range(20)]})

I'm looking for a function like this:

func('H')

which returns every index label and column name where "H" occurs. Any ideas?

Shubham Sharma · Accepted Answer · 2020-06-17 17:21:32Z

3

Use np.argwhere along with df.to_numpy:

import numpy as np

rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols]))

Or:

indices = df.where(df.eq('H')).stack().index.tolist()

# print(indices)
[(3, 'col3'), (4, 'col1')]
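
Wrapped up as the func('H') the question asks for, the first approach might look like the sketch below. The name find_value and the 5-row sample frame (copied from the table in the question) are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd


def find_value(df, value):
    """Return (index label, column name) pairs for every cell equal to value."""
    # Compare on the raw NumPy array; argwhere yields one (row, col) position per match.
    rows, cols = np.argwhere(df.to_numpy() == value).T
    # Translate integer positions back into index labels and column names.
    return list(zip(df.index[rows], df.columns[cols]))


# Sample data taken from the table in the question.
df = pd.DataFrame({'col1': list('AIOAH'),
                   'col2': list('BJLBB'),
                   'col3': list('CSCHC'),
                   'col4': list('GDGDP')})

print(find_value(df, 'H'))
```

For the sample table this gives the same pairs as shown above.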

timeit comparison of all the answers:

df.shape
(50000, 4)

%%timeit -n100 @Shubham1
rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols])) 
8.87 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Scott
r,c = np.where(df == 'H')
_ = list(zip(df.index[r], df.columns[c])) 
17.4 ms ± 510 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Shubham2
indices = df.where(df.eq('H')).stack().index.tolist()
26.8 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Roy
df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
_ = t[t.value == "H"]
29 ms ± 656 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
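
The %%timeit cells above only run in IPython/Jupyter. A rough way to repeat part of the comparison in a plain script is sketched below with the standard-library timeit module. How the 50000-row frame was built is not stated in the answer, so the random-letter construction here is an assumption, and the function names are illustrative. Timings will differ by machine and library version.

```python
import random
import timeit
from string import ascii_uppercase as uc

import numpy as np
import pandas as pd

# Assumed test data: 50000 rows of random uppercase letters in 4 columns.
random.seed(365)
df = pd.DataFrame({f'col{i}': random.choices(uc, k=50_000) for i in range(1, 5)})


def via_argwhere():
    rows, cols = np.argwhere(df.to_numpy() == 'H').T
    return list(zip(df.index[rows], df.columns[cols]))


def via_where_df():
    # Comparison done on the DataFrame itself, as in the @Scott timing cell.
    r, c = np.where(df == 'H')
    return list(zip(df.index[r], df.columns[c]))


def via_where_numpy():
    # Comparison done on the NumPy array, as suggested in the comments.
    r, c = np.where(df.to_numpy() == 'H')
    return list(zip(df.index[r], df.columns[c]))


for fn in (via_argwhere, via_where_df, via_where_numpy):
    # Average seconds per call over 10 runs, reported in milliseconds.
    per_call = timeit.timeit(fn, number=10) / 10
    print(f'{fn.__name__}: {per_call * 1e3:.2f} ms')
```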

edited Jun 17, 2020 at 17:21

answered Jun 17, 2020 at 16:43


2 Comments

Scott Boston Over a year ago

Out of curiosity, try r, c = np.where(df.to_numpy() == 'H')

Shubham Sharma Over a year ago

It's identical to using argwhere, with nearly the same timings ;).
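
The equivalence mentioned in this comment can be checked directly: np.argwhere returns the matching positions as rows of an (n, 2) array, while single-argument np.where returns the same positions as a tuple of per-axis arrays. A small check on the question's sample table (the frame construction is illustrative):

```python
import numpy as np
import pandas as pd

# Sample data taken from the table in the question.
df = pd.DataFrame({'col1': list('AIOAH'),
                   'col2': list('BJLBB'),
                   'col3': list('CSCHC'),
                   'col4': list('GDGDP')})

mask = df.to_numpy() == 'H'

rows_a, cols_a = np.argwhere(mask).T  # transpose of the (n, 2) position array
rows_w, cols_w = np.where(mask)       # tuple of (row positions, column positions)

# Both calls report the same positions in the same order.
print(np.array_equal(rows_a, rows_w) and np.array_equal(cols_a, cols_w))
```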

Roy2012 · Accepted Answer · 2020-06-17 16:39:25Z

2

One solution would be to use melt:

df.index.name = "inx"
t = df.reset_index().melt(id_vars="inx")
print(t[t.value == "H"])

The output is:

    inx variable value
4     4     col1     H
13    3     col3     H

You can now easily extract columns and indices.
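
One way to do that extraction is sketched below, again on the question's sample table. The variable names hits and pairs are illustrative; the "variable" and "value" column names are the defaults that melt produces.

```python
import pandas as pd

# Sample data taken from the table in the question.
df = pd.DataFrame({'col1': list('AIOAH'),
                   'col2': list('BJLBB'),
                   'col3': list('CSCHC'),
                   'col4': list('GDGDP')})

df.index.name = "inx"
t = df.reset_index().melt(id_vars="inx")
hits = t[t["value"] == "H"]

# Pair each matching index label with the column it was found in.
pairs = list(zip(hits["inx"], hits["variable"]))
print(pairs)  # [(4, 'col1'), (3, 'col3')]
```

Because melt stacks the frame column by column, the pairs come out in column order rather than row order; sort them if the order matters.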

answered Jun 17, 2020 at 16:39


1 Comment

Roy2012 Over a year ago

Does this answer your question?

Scott Boston · Accepted Answer · 2020-06-17 18:21:52Z

2

Use np.where and indexing (updated to add performance):

r, c = np.where(df.to_numpy() == 'H')
list(zip(df.index[r], df.columns[c]))

Output:

[(3, 'col3'), (4, 'col1')]

edited Jun 17, 2020 at 18:21

answered Jun 17, 2020 at 16:46
