Create index from pandas DataFrame Columns

Question

I have a DataFrame that looks like this (where 'ID' is the name of the index):

                      VAF
ID  
chr1-115227855-T-A  0.002491
chr1-115227855-T-C  0.005449
chr1-115227856-C-A  0.000466
chr1-115227856-C-G  0.000311
chr1-115227856-C-T  0.002331

And a second DataFrame that looks like this:

    Chrom   Loc WT  Var Change  ConvChange  AO  DP  VAF IntEx   Gene    Upstream    Downstream  Individual
0   chr1    115227855   T   C   T>C T>C 43  16155   0.00266171  TIII    TIIIa   NaN NaN 1
1   chr1    115227856   C   T   C>T C>T 25  16179   0.00154521  TIII    TIIIa   NaN NaN 1
2   chr1    115227857   C   T   C>T C>T 20  16178   0.00123625  TIII    TIIIa   NaN NaN 1
3   chr1    115227858   A   T   A>T T>A 29  16178   0.00179256  TIII    TIIIa   NaN NaN 1
4   chr1    115227880   C   T   C>T C>T 18  16150   0.00111455  TIII    TIIIa   NaN NaN 1

I would like to make the second DataFrame look like the first. I have tried setting a new index like this:

df2.set_index(['Chrom','Loc','WT','Var']).VAF

But this just gives me a multi-indexed result.

Is there a way to do this?

piRSquared · Accepted Answer · 2018-08-10 18:07:58Z

`apply` a `format_map`

fmt = '{Chrom}-{Loc}-{WT}-{Var}'.format_map
df[['VAF']].set_index(df.apply(fmt, 1).rename('ID'))

                         VAF
ID                          
chr1-115227855-T-C  0.002662
chr1-115227856-C-T  0.001545
chr1-115227857-C-T  0.001236
chr1-115227858-A-T  0.001793
chr1-115227880-C-T  0.001115

one-line

because it's cool ¯\_(ツ)_/¯

df[['VAF']].set_index(df.apply('{Chrom}-{Loc}-{WT}-{Var}'.format_map, 1).rename('ID'))

Explanation

Create a function that takes a dictionary and passes its key:value pairs as parameters to be used in a formatting string. Notice that 'Loc' can be str or int, as format/format_map uses the string representation.

fmt = '{Chrom}-{Loc}-{WT}-{Var}'.format_map
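To see what fmt does on its own, here is a small sketch that calls it on a plain dict and on a pandas.Series. The row values are copied from the first row of the question's frame; the names row_dict and row_series are just for illustration.

```python
import pandas as pd

fmt = '{Chrom}-{Loc}-{WT}-{Var}'.format_map

# A plain dict works: keys fill the named fields, unused keys are ignored.
row_dict = {'Chrom': 'chr1', 'Loc': 115227855, 'WT': 'T', 'Var': 'C', 'VAF': 0.00266171}
from_dict = fmt(row_dict)

# A pandas.Series supports label lookup with [], so format_map accepts it too.
row_series = pd.Series(row_dict)
from_series = fmt(row_series)

print(from_dict)
print(from_series)
```

Both calls should produce the same string, and the integer 'Loc' is converted through its string representation.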

Make a new series object by applying the function to each row of df using df.apply with axis=1. In this case, each row will be passed as a pandas.Series and can be processed in a dictionary context. That's perfect for format_map. I'll end up renaming the series to 'ID' to match OP's output.

idx = df.apply(fmt, 1).rename('ID')
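A self-contained sketch of this step, assuming a cut-down two-row version of the question's frame (only the columns that matter here):

```python
import pandas as pd

# Two rows from the question's second DataFrame, trimmed to the relevant columns.
df = pd.DataFrame({
    'Chrom': ['chr1', 'chr1'],
    'Loc': [115227855, 115227856],
    'WT': ['T', 'C'],
    'Var': ['C', 'T'],
    'VAF': [0.00266171, 0.00154521],
})

fmt = '{Chrom}-{Loc}-{WT}-{Var}'.format_map

# axis=1 hands each row to fmt as a Series; the result is one string per row.
idx = df.apply(fmt, axis=1).rename('ID')

print(idx.tolist())
print(idx.name)
```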

Now if we use a pandas.Series within a set_index, Pandas will align the existing index with the index of the passed series... which is fine.

Use a double square bracket to slice the columns [['VAF']] to make sure we keep a dataframe with the columns equal to ['VAF']. Otherwise, if we used df['VAF'] we would get back a series object whose name is 'VAF'. Also, pandas.Series doesn't have a set_index method, while pandas.DataFrame does.

df[['VAF']].set_index(idx)

                         VAF
ID                          
chr1-115227855-T-C  0.002662
chr1-115227856-C-T  0.001545
chr1-115227857-C-T  0.001236
chr1-115227858-A-T  0.001793
chr1-115227880-C-T  0.001115

We could have done this instead to get a series:

df.set_index(idx)['VAF']

ID
chr1-115227855-T-C    0.002662
chr1-115227856-C-T    0.001545
chr1-115227857-C-T    0.001236
chr1-115227858-A-T    0.001793
chr1-115227880-C-T    0.001115
Name: VAF, dtype: float64

See! Same data, but now a series whose name is 'VAF'
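The difference between the two forms can be checked directly. This is a minimal sketch with a two-row stand-in frame; as_frame and as_series are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'VAF': [0.002662, 0.001545]})
idx = pd.Series(['chr1-115227855-T-C', 'chr1-115227856-C-T'], name='ID')

# Double brackets keep a one-column DataFrame, which has set_index.
as_frame = df[['VAF']].set_index(idx)

# Setting the index first and then selecting one column yields a Series.
as_series = df.set_index(idx)['VAF']

print(type(as_frame).__name__, list(as_frame.columns), as_frame.index.name)
print(type(as_series).__name__, as_series.name, as_series.index.name)
```

In both cases the index picks up its name, 'ID', from the series passed to set_index.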

jezrael · Accepted Answer · 2018-08-10 17:47:41Z

First join the columns together into a Series, pass it to set_index, change the index name with rename_axis, and select column VAF with double [] to get a one-column DataFrame:

s = df['Chrom'] + '-' + df['Loc'].astype(str) + '-' +  df['WT'] + '-' + df['Var']

df1 = df.set_index(s).rename_axis('ID')[['VAF']]
print(df1)
                         VAF
ID                          
chr1-115227855-T-C  0.002662
chr1-115227856-C-T  0.001545
chr1-115227857-C-T  0.001236
chr1-115227858-A-T  0.001793
chr1-115227880-C-T  0.001115
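As a variation on the same idea (not part of the answer above), the key can also be built with Series.str.cat, which takes the separator once instead of repeating '-' between every pair. A sketch on a two-row stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({
    'Chrom': ['chr1', 'chr1'],
    'Loc': [115227855, 115227856],
    'WT': ['T', 'C'],
    'Var': ['C', 'T'],
    'VAF': [0.00266171, 0.00154521],
})

# str.cat only joins strings, so the integer column still needs astype(str).
s = df['Chrom'].str.cat([df['Loc'].astype(str), df['WT'], df['Var']], sep='-')

df1 = df.set_index(s).rename_axis('ID')[['VAF']]
print(df1)
```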


1 Comment

PMende Over a year ago

This is likely to be much faster by avoiding apply.
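A rough way to check this claim on your own machine is to time both approaches on a larger frame. The sketch below repeats two of the question's rows 2,500 times; the frame size and repeat count are arbitrary choices, and the timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

small = pd.DataFrame({
    'Chrom': ['chr1', 'chr1'],
    'Loc': [115227855, 115227856],
    'WT': ['T', 'C'],
    'Var': ['C', 'T'],
    'VAF': [0.00266171, 0.00154521],
})
# Repeat the two rows to get a 5,000-row frame for timing.
df = pd.concat([small] * 2500, ignore_index=True)


def with_apply(frame):
    # Row-wise: one Python-level format call per row.
    return frame.apply('{Chrom}-{Loc}-{WT}-{Var}'.format_map, axis=1)


def with_concat(frame):
    # Column-wise: string concatenation over whole columns.
    return (frame['Chrom'] + '-' + frame['Loc'].astype(str)
            + '-' + frame['WT'] + '-' + frame['Var'])


idx_apply = with_apply(df)
idx_concat = with_concat(df)

t_apply = timeit.timeit(lambda: with_apply(df), number=3)
t_concat = timeit.timeit(lambda: with_concat(df), number=3)
print(f'apply:  {t_apply:.4f}s')
print(f'concat: {t_concat:.4f}s')
```

Both functions should return the same strings, so the comparison is only about speed.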
