Get data from Pandas DataFrame using column values

Question

>>> df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
...                   index=['dog', 'hawk'])
>>> df
      num_legs  num_wings
dog          4          0
hawk         2          2
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index='dog', num_legs=4, num_wings=0)
Pandas(Index='hawk', num_legs=2, num_wings=2)

I am parsing an Excel sheet with pandas and iterating over it with pandas.DataFrame.itertuples, which yields one row per iteration as a namedtuple (not a DataFrame), as shown above.

From each row, e.g. Pandas(Index='dog', num_legs=4, num_wings=0), I would like to access values by column name, e.g. row['num_legs'], but doing so raises the following exception:

TypeError: tuple indices must be integers, not str

How can I retrieve the values from each row using the column headers directly?
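For reference, here is a minimal sketch using the df from the question that checks what itertuples actually yields. Each row is an instance of a namedtuple class (named Pandas by default), so integer indexing and attribute access work, but string indexing does not.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
                  index=['dog', 'hawk'])

row = next(df.itertuples())      # first row only
print(type(row).__name__)        # Pandas: a namedtuple class, not a DataFrame
print(isinstance(row, tuple))    # True
print(row._fields)               # ('Index', 'num_legs', 'num_wings')
print(row[1])                    # 4: positional access works
print(row.num_legs)              # 4: attribute access works
# row['num_legs'] raises TypeError, since tuples only accept integer or slice indices
```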

Mohit Musaddi · Answer · 2019-02-19 11:31:34Z

4

I faced the same error when using a variable.

v = 'num_legs'
for row in df.itertuples():
    print(row[v])

TypeError: tuple indices must be integers or slices, not str

To keep using df.itertuples() with the column name stored in a variable, look the attribute up with getattr():

v = 'num_legs'
for row in df.itertuples():
    print(getattr(row, v))
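The same getattr() idea extends to several column names held in a list. A minimal sketch, where the list name wanted and the results dict are hypothetical names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
                  index=['dog', 'hawk'])

# Hypothetical: column names held in variables, e.g. read from a config
wanted = ['num_legs', 'num_wings']

results = {}
for row in df.itertuples():
    # getattr() looks up the namedtuple field whose name is stored in `col`
    results[row.Index] = {col: getattr(row, col) for col in wanted}

print(results)
```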

Also, df.itertuples() is generally faster than df.iterrows(), because iterrows() builds a new Series object for every row.
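One way to check the speed claim yourself is a small timeit comparison. This is a sketch on a hypothetical random 5,000-row frame; absolute timings depend on your machine and pandas version, so no numbers are claimed here. Both functions compute the same sum, so only the iteration style differs.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame: 5,000 rows of random integers
rng = np.random.default_rng(0)
big = pd.DataFrame({'num_legs': rng.integers(0, 5, 5_000),
                    'num_wings': rng.integers(0, 3, 5_000)})

def sum_itertuples(frame):
    # Attribute access on each namedtuple row
    return sum(row.num_legs for row in frame.itertuples())

def sum_iterrows(frame):
    # iterrows builds a new Series for every row, which adds overhead
    return sum(row['num_legs'] for _, row in frame.iterrows())

t_tuples = timeit.timeit(lambda: sum_itertuples(big), number=2)
t_rows = timeit.timeit(lambda: sum_iterrows(big), number=2)
print(f"itertuples: {t_tuples:.3f}s  iterrows: {t_rows:.3f}s")
```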


3 Comments

Krishna Over a year ago

How did you evaluate this to conclude that itertuples is faster than iterrows?

Mohit Musaddi Over a year ago

You can check this link; I also tested this last week with large DataFrames.

Mohit Musaddi Over a year ago

And also this

meW · Accepted Answer · 2019-02-19 11:09:36Z

1

Access each value as an attribute of the row namedtuple:

for row in df.itertuples():
    print(row.num_legs)
    # print(row.num_wings)   # Other column values

# Output
4
2
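Attribute access only works when the column name is a valid Python identifier. As far as I know, pandas builds the row namedtuple with rename=True, so a column such as 'num legs' (with a space) gets a positional field name like _1 instead. A minimal sketch with a hypothetical column name:

```python
import pandas as pd

# Hypothetical frame whose first column name contains a space
df = pd.DataFrame({'num legs': [4, 2], 'num_wings': [0, 2]},
                  index=['dog', 'hawk'])

row = next(df.itertuples())
print(row._fields)    # ('Index', '_1', 'num_wings'): 'num legs' was renamed
print(row._1)         # 4: use the positional field name
print(row[1])         # 4: or plain integer indexing
```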

edited Feb 19, 2019 at 11:09

answered Feb 19, 2019 at 11:07


5 Comments

Krishna Over a year ago

Accepting this, since I was using itertuples to iterate over DataFrames.

Krishna Over a year ago

I tried the same approach after reading a CSV with read_csv, but the first row after the comment lines in the CSV is not treated as the column names, and I get an exception when using row["columnHeader"].

meW Over a year ago

That's a separate question which you should ask, but as a hint, experiment with the header argument of read_csv.

Krishna Over a year ago

I tried the header argument, but unfortunately the CSV has extra column data besides the column headers, so parsing fails when I use the header argument.

meW Over a year ago

@darth_coder Then I suggest you ask a separate question that describes only this problem, with a proper explanation.
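For the comment-lines part of this thread, a minimal sketch: with comment='#', read_csv ignores fully commented lines, and header=0 then refers to the first non-comment line. The CSV text below is hypothetical, and this does not address the "extra column data" problem mentioned above.

```python
import io

import pandas as pd

# Hypothetical CSV text: comment lines starting with '#' precede the header row
csv_text = """# exported from a spreadsheet
# another comment line
animal,num_legs,num_wings
dog,4,0
hawk,2,2
"""

# comment='#' makes read_csv skip lines that start with '#',
# so header=0 denotes the first non-comment line
df = pd.read_csv(io.StringIO(csv_text), comment='#', header=0,
                 index_col='animal')

for row in df.itertuples():
    print(row.Index, row.num_legs)
```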

Mohamed Thasin ah · Answer · 2019-02-19 11:13:35Z

1

You could use iterrows() instead, which yields (index, Series) pairs, so bracket access by column name works:

for u, row in df.iterrows():
    print(u)
    print(row)
    print(row['num_legs'])

Output:

dog
num_legs     4
num_wings    0
Name: dog, dtype: int64
4
hawk
num_legs     2
num_wings    2
Name: hawk, dtype: int64
2
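One caveat worth knowing with iterrows(): each row is a single Series, so values are coerced to one common dtype. With a hypothetical frame that mixes an integer and a float column, the integer comes back as a float, while itertuples keeps the per-column value.

```python
import pandas as pd

# Hypothetical frame mixing an integer and a float column
df = pd.DataFrame({'num_legs': [4, 2], 'weight_kg': [30.5, 1.2]},
                  index=['dog', 'hawk'])

_, row = next(df.iterrows())
print(row.dtype)          # float64: the row Series uses one common dtype
print(row['num_legs'])    # 4.0: the integer was upcast

tup = next(df.itertuples())
print(tup.num_legs)       # 4: itertuples keeps the integer value
```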


1 Comment

Krishna Over a year ago

This answer is also correct, and I will now use iterrows rather than itertuples in my code, since row['column'] access mimics the array indexing operator.
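If bracket-style access is the main reason to prefer iterrows, a possible middle ground is the standard namedtuple method _asdict(), which turns each itertuples row into a dict keyed by field name. A minimal sketch; note that it builds a dict per row, so it adds some overhead of its own.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
                  index=['dog', 'hawk'])

for row in df.itertuples():
    # _asdict() maps field names to values, so string keys work
    d = row._asdict()
    print(d['Index'], d['num_legs'])
```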
