Selecting a subset of columns from a dataframe using multi index

Question

I have a dataframe that contains 3 channels of measured data recorded at various depths.

       5     5        5       10     10     10
       x     y        z       x       y     z
1   -22.2    0.9    -88.6   -124.8  -76.7    83.2
2   -94.7   -67.9   -162.6  -200.8  -159.0   2.2
3   -128.7  -99.7   -196.4  -248.5  -219.8  -46.8
4   -127.8  -98.4   -195.1  -256.4  -239.1  -55.7
5   -141.0  -110.9  -208.8  -275.2  -265.7  -76.9
6   -142.1  -111.5  -209.6  -280.7  -276.3  -83.3
7   -147.1  -116.0  -214.6  -287.8  -286.0  -91.6
8   -149.2  -117.8  -216.7  -291.5  -290.9  -96.0

The dataframe is multi indexed using a repeating sequence of X, Y and Z (for each of the 3 components) and a floating point depth, as follows:

import itertools
from natsort import natsorted

c = list(itertools.repeat(['x', 'y', 'z'], n))
col_a = list(itertools.chain(*c))

col_b = natsorted(depths * 3)

df.columns = [col_a, col_b]

Where n is the number of depths and depths is a user defined list of floats describing the depth of each measurement (5 and 10 in the example table above).

I would like to be able to create subsets of the data (to write to csv or to plot on the screen) from either of the column index levels. Selecting the component (X, Y or Z) isn't an issue.

x1 = df['x']
x1.to_csv('x_out.csv')

However, selecting all columns from a particular depth doesn't work:

x1 = df['10']

I have tried various forms of .ix and .loc, but I think that the problem may lie in the float data type of the "depth" column keys.

My question is: is there a way to select the subset based upon a column key of floating point values, or would I be better off using a different method here?

Usually one would make a depth column and only have one each of x, y, and z. Then you could select df[df.depth == 10].
– U2EF1, Commented Jan 10, 2014 at 11:50
The data is made up of 7120 samples (rows) from 3 sensors. At each depth, 3 columns of data are produced (X, Y, Z at 5 m; X, Y, Z at 10 m, etc.). A depth column wouldn't allow all of the samples from each measured channel to be labeled with the depth at which they were collected, would it?
– Tim Nixon, Commented Jan 10, 2014 at 12:08
It would. Your depth column would look like [5,5,5,5,5,10,10,10,10,10,15,15,15,15,15].
– U2EF1, Commented Jan 10, 2014 at 18:26

HYRY · Accepted Answer · 2014-01-10 13:08:23Z

1

Try this:

import numpy as np
import pandas as pd

# Depth is the outer column level and the component is the inner level.
depths = [5.0, 5.0, 5.0, 10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
names = list("xyzxyzxyz")

df = pd.DataFrame(np.random.rand(8, 9))
df.columns = pd.MultiIndex.from_arrays((depths, names))
print(df[10])  # all columns recorded at depth 10

output:

          x         y         z
0  0.767859  0.274721  0.986447
1  0.166864  0.143640  0.896246
2  0.029581  0.951677  0.626415
3  0.822003  0.358323  0.061943
4  0.764663  0.955426  0.831934
5  0.192194  0.001171  0.181386
6  0.649342  0.186907  0.109016
7  0.360859  0.163483  0.597824

To select "x" from the inner level:

df.xs("x", axis=1, level=1)

output:

         5         10        20
0  0.075749  0.767859  0.691237
1  0.305108  0.166864  0.595809
2  0.432526  0.029581  0.317391
3  0.410563  0.822003  0.884315
4  0.865121  0.764663  0.808828
5  0.590033  0.192194  0.657932
6  0.658829  0.649342  0.006082
7  0.677408  0.360859  0.320102
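The two selections above can be combined into one sketch that works whichever way round the levels are stored. This is an illustration only, under assumptions not in the original answer: the level names "depth" and "component", the example depth values, and the use of `np.arange` instead of random data are all invented here. It also suggests that the float dtype itself is not the obstacle: an integer key matches a float label, whereas a string key such as '10' does not.

```python
import numpy as np
import pandas as pd

depths = [5.0, 10.0, 20.0]      # assumed example depths
components = ["x", "y", "z"]

# Depth-major column order: (5.0, x), (5.0, y), (5.0, z), (10.0, x), ...
# The level names are hypothetical labels chosen for this sketch.
columns = pd.MultiIndex.from_product(
    [depths, components], names=["depth", "component"]
)
df = pd.DataFrame(np.arange(8 * 9, dtype=float).reshape(8, 9), columns=columns)

# Plain [] looks up the outer level; the int key 10 matches the float label 10.0.
at_10 = df[10]

# xs selects from a named level, so it does not depend on the level order.
x_all_depths = df.xs("x", axis=1, level="component")
at_10_by_name = df.xs(10.0, axis=1, level="depth")

# With the component as the outer level (the layout in the question),
# df["x"] works directly and xs still selects a depth.
swapped = df.swaplevel(axis=1).sort_index(axis=1)
x_from_swapped = swapped["x"]
at_10_from_swapped = swapped.xs(10.0, axis=1, level="depth")

# A string key does not match a float label and raises KeyError.
string_key_failed = False
try:
    df["10"]
except KeyError:
    string_key_failed = True
```

Any of these results can then be written out as in the question, for example with at_10.to_csv('depth_10.csv') (a hypothetical file name).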

edited Jan 10, 2014 at 13:08

answered Jan 10, 2014 at 12:29

HYRY


2 Comments

Tim Nixon Over a year ago

Thanks for replying. This works great with the depth key but gives a KeyError with print df['x']. Is this because the column keys are input in the opposite order to my original code above?

HYRY Over a year ago

I modified the answer, please check it.

Brian Wylie · Accepted Answer · 2014-01-10 15:39:10Z

0

I agree with @U2EF1. For example, let's take the first row from your data above and make it two rows based on the depth value:

       x     y        z     depth
1   -22.2    0.9    -88.6   5
2   -124.8  -76.7    83.2   10

You can then use many pandas commands to organize the data based on depth:

df[df.depth == 10]           # select one depth, as U2EF1 suggested
df.groupby('depth')          # This + unstack() can be great for plotting
df['depth'].value_counts()   # I always use this for sanity checks
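As a rough sketch of how the layout in the question could be turned into this long form, the snippet below builds one block of rows per depth and concatenates them. The level names "depth" and "component", the "sample" index name, and the small example values are assumptions made for illustration; they are not from the original post.

```python
import numpy as np
import pandas as pd

# Wide layout: depth as the outer column level, component as the inner level.
columns = pd.MultiIndex.from_product(
    [[5.0, 10.0], ["x", "y", "z"]], names=["depth", "component"]
)
wide = pd.DataFrame(np.arange(4 * 6, dtype=float).reshape(4, 6), columns=columns)
wide.index.name = "sample"   # hypothetical name for the row index

# One block of rows per depth, each tagged with its depth value.
depth_values = wide.columns.get_level_values("depth").unique()
long = pd.concat(
    [wide[d].reset_index().assign(depth=d) for d in depth_values],
    ignore_index=True,
)
long.columns.name = None     # drop the leftover "component" label

# The operations from the answer now apply directly.
at_10 = long[long["depth"] == 10]
means = long.groupby("depth")[["x", "y", "z"]].mean()
counts = long["depth"].value_counts()
```

Each sample keeps its x, y and z readings together with the depth at which they were recorded, which is the point raised in the comments on the question.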

edited Jan 10, 2014 at 15:39

answered Jan 10, 2014 at 12:38

Brian Wylie
