I have been exploring the Titanic dataset. I am trying to create a DataFrame that holds the ages of the passengers who survived the sinking and the ages of those who didn't, in two separate columns.

    import pandas as pd

    train = pd.read_csv('train.csv')
    test = pd.read_csv('test.csv')
    # Stack both files into one frame (each keeps its own row labels)
    whole = pd.concat([train, test])
    df = pd.DataFrame({'survived': whole['Age'][whole['Survived'] == 1],
                       'died': whole['Age'][whole['Survived'] == 0]})

But I am getting this error:

    pandas.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

What am I doing wrong?

  • It runs without an error on pandas 0.20.1. Commented May 28, 2017 at 18:00
  • Change whole = pd.concat([train, test]) to whole = pd.concat([train, test]).reset_index(drop=True). Commented May 28, 2017 at 18:00
  • @Nain Yes, it worked. Can you explain what was the problem? Commented May 28, 2017 at 18:07
  • @ayhan I was using pandas version 0.19.2. Upgrading to 0.20.1 did not work for me. Commented May 28, 2017 at 18:08

1 Answer

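Change the concat line in your code to whole = pd.concat([train, test]).reset_index(drop=True). By default pd.concat keeps each frame's original row labels, so the combined index contains duplicates (0, 1, 2, ... from train and again from test); reset_index(drop=True) replaces them with a fresh 0..n-1 index.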
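Here is a minimal sketch of the effect. The two small frames are made-up stand-ins for train.csv and test.csv, not real Titanic rows, and the name whole_dup is only used here for illustration. Whether the original code actually raises the error seems to depend on the pandas version (see the comments on the question), so the sketch only checks index uniqueness and then builds the two-column frame on the fixed index.

```python
import pandas as pd

# Made-up stand-ins for train.csv and test.csv (assumption: test has no 'Survived' column)
train = pd.DataFrame({'Age': [22.0, 38.0, 26.0], 'Survived': [0, 1, 1]})
test = pd.DataFrame({'Age': [34.0, 47.0]})

# Plain concat keeps each frame's own row labels, so labels 0 and 1 occur twice
whole_dup = pd.concat([train, test])
print(whole_dup.index.is_unique)

# Dropping the old labels gives every row its own label
whole = pd.concat([train, test]).reset_index(drop=True)
print(whole.index.is_unique)

# With a unique index, the two filtered Series can be aligned into one frame
df = pd.DataFrame({'survived': whole['Age'][whole['Survived'] == 1],
                   'died': whole['Age'][whole['Survived'] == 0]})
print(df)
```

Each row of df has a value in only one of the two columns and NaN in the other, because the two Series are aligned on the row labels of whole.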

3 Comments

We can use pd.concat([train, test], ignore_index=True) instead ;)
@MaxU This works too. What happens when you set ignore_index to True?
pd.concat will create a new default index (np.arange(len(concatenated_df))) for you, so it does not need to join the two existing indexes only to drop the result and create a new one.
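As a quick check that the two spellings give the same frame, here is a small sketch. The frames are again made-up stand-ins for the real CSV files, and the names a and b are only for illustration.

```python
import pandas as pd

# Made-up stand-ins for train.csv and test.csv
train = pd.DataFrame({'Age': [22.0, 38.0, 26.0], 'Survived': [0, 1, 1]})
test = pd.DataFrame({'Age': [34.0, 47.0]})

# Two steps: concatenate with the old labels, then replace them
a = pd.concat([train, test]).reset_index(drop=True)

# One step: let concat build the new default index directly
b = pd.concat([train, test], ignore_index=True)

print(list(b.index))
print(a.equals(b))
```

Both give a frame labelled 0 through 4 here; ignore_index=True just skips the intermediate index.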
