8

I'm having trouble changing the header row in an existing DataFrame using pandas in python. After importing pandas and the csv file I set a header row as None in order to be able to remove duplicate dates after transposing. However this leaves me with a row header (and in fact an index column) that I do not want.

df = pd.read_csv(spreadfile, header=None)

df2 = df.T.drop_duplicates([0], take_last=True)
del df2[1]

indcol = df2.ix[:,0]
df3 = df2.reindex(indcol)

The above unimaginative code however fails on two counts. The index column is now the required one however all entries are now NaN. My understanding of python is not yet good enough to recognise what python is doing. The desired output below is what I need, any help would be greatly appreciated!

df2 before reindexing:

     0             2             3             4             5
0        NaN  XS0089553282  XS0089773484  XS0092157600  XS0092541969
1  01-May-14         131.7         165.1         151.8          88.9
3  02-May-14           131         164.9         151.7          88.5
5  05-May-14         131.1           165         151.8          88.6
7  06-May-14         129.9         163.4         151.2          87.1

df2 after reindexing:

             0    2    3    4    5
0                                 
NaN        NaN  NaN  NaN  NaN  NaN
01-May-14  NaN  NaN  NaN  NaN  NaN
02-May-14  NaN  NaN  NaN  NaN  NaN
05-May-14  NaN  NaN  NaN  NaN  NaN
06-May-14  NaN  NaN  NaN  NaN  NaN

df2 desired:

       XS0089553282  XS0089773484  XS0092157600  XS0092541969
01-May-14         131.7         165.1         151.8          88.9
02-May-14           131         164.9         151.7          88.5
05-May-14         131.1           165         151.8          88.6
06-May-14         129.9         163.4         151.2          87.1

2 Answers 2

6

Assign the columns directly:

indcol = df2.ix[:,0]
df2.columns = indcol

The problem with reindex is it'll use the existing index and column values of your df, so your passed in new column values don't exist, hence why you get all NaNs

A simpler approach to what you're trying to do:

In [147]:
# take the cols and index values of interest
cols = df.loc[0, '2':]
idx = df['0'].iloc[1:]
print(cols)
print(idx)

2    XS0089553282
3    XS0089773484
4    XS0092157600
5    XS0092541969
Name: 0, dtype: object

1    01-May-14
3    02-May-14
5    05-May-14
7    06-May-14
Name: 0, dtype: object

In [157]:
# drop the first row and the first column
df2 = df.drop('0', axis=1).drop(0)
# overwrite the index values
df2.index = idx.values
df2

Out[157]:
               2      3      4     5
01-May-14  131.7  165.1  151.8  88.9
02-May-14    131  164.9  151.7  88.5
05-May-14  131.1    165  151.8  88.6
06-May-14  129.9  163.4  151.2  87.1

In [158]:
# now overwrite the column values    
df2.columns = cols.values
df2

Out[158]:
          XS0089553282 XS0089773484 XS0092157600 XS0092541969
01-May-14        131.7        165.1        151.8         88.9
02-May-14          131        164.9        151.7         88.5
05-May-14        131.1          165        151.8         88.6
06-May-14        129.9        163.4        151.2         87.1
Sign up to request clarification or add additional context in comments.

Comments

1
In [310]:
cols = df.iloc[0 , 1:]
cols
Out[310]:
1    XS0089553282
2    XS0089773484
3    XS0092157600
4    XS0092541969
Name: 0, dtype: object

In [311]:
df.drop(0 , inplace=True)
df
Out[311]:
           0    1       2          3    4
1   01-May-14   131.7   165.1   151.8   88.9
2   02-May-14   131     164.9   151.7   88.5
3   05-May-14   131.1   165     151.8   88.6
4   06-May-14   129.9   163.4   151.2   87.1

In [312]:
df.set_index(0 , inplace=True)
df

Out[312]:
    0           1   2           3   4       
01-May-14   131.7   165.1   151.8   88.9
02-May-14   131     164.9   151.7   88.5
05-May-14   131.1   165     151.8   88.6
06-May-14   129.9   163.4   151.2   87.1

In [315]:

df
df.columns = cols
df
Out[315]:
            XS0089553282    XS0089773484    XS0092157600    XS0092541969                
01-May-14   131.7                  165.1    151.8           88.9
02-May-14   131                    164.9    151.7           88.5
05-May-14   131.1                    165    151.8           88.6
06-May-14   129.9                  163.4    151.2           87.1

4 Comments

inplace = True generates an error for me: TypeError: drop() got an unexpected keyword argument 'inplace'
this error indicates that there is no argument with name inplace for the method drop which of course is not the case , I'm not sure about this , are you sure you followed the same steps ?
I've tried again and same thing, it's really strange because set_index has the inplace argument. The only arguments for df.drop are labels, axis and level.
ok , no problem just exclude this parameter for now , but you will need to assign the result of the function to the same df like this df = df.set_index(0)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.