1

I have a pandas DataFrame that contains some array columns. What is the recommended way to index some of these columns by different position indices? For example, from the array column named l I need the second elements, from the array column named a I need the first elements. The results should be a new DataFrame. Array column can either contain a Python list or a Numpy array, but this probably does not matter.

I have three solutions, but I don't really like any of them.

df= pd.DataFrame({'l': [[1, 2, 4], [3, 2, 0, 10]], \
                  'a':[np.array(["foo", "bar", "baz"]), np.array(["qux", "quux"])], \
                  'dontcare': [10, 20]})
               l                a  dontcare
0      [1, 2, 4]  [foo, bar, baz]        10
1  [3, 2, 0, 10]      [qux, quux]        20

Solution 1, with str and join

df['l'].str[1].to_frame('l').join(df['a'].str[0])
   l    a
0  2  foo
1  2  qux

Solution 2, with the function apply and creating Series

df.apply(lambda row: pd.Series([row['l'][1], row['a'][0]], index=['l', 'a']), axis=1)

Solution 3, with apply and broadcast.

df[['l', 'a']].apply(lambda row: [row['l'][1], row['a'][0]], axis=1, result_type='broadcast')

We can assume that the output column names match the input column names and we dont need multiple elements of any array column.

1 Answer 1

1

I think it depends.

First solution is most general, working always if indices not exist - then returned NaNs. But it is also reason for slowiest solution if large DataFrame.

print (df['l'].str[3].to_frame('l').join(df['a'].str[2]))
      l    a
0   NaN  baz
1  10.0  NaN

Another solution with apply should be faster, but fail, if value not exist.

print (df.apply(lambda row: pd.Series([row['l'][3], row['a'][2]], index=['l', 'a']), axis=1))

IndexError: ('list index out of range', 'occurred at index 0')


Another idea if always exist values in lists is use list comprehension (but fail similar like apply, if not exist) with *c for tail:

df= pd.DataFrame({'l': [[1, 2, 4], [3, 2, 0, 10]], \
                  'a':[np.array(["foo", "bar", "baz"]), np.array(["qux", "quux"])], \
                  'dontcare': [10, 20],
                   's': [10, 20], 
                   'b': [10, 20]})
print (df)
               l                a  dontcare   s   b
0      [1, 2, 4]  [foo, bar, baz]        10  10  10
1  [3, 2, 0, 10]      [qux, quux]        20  20  20

df = pd.DataFrame([(a[1], b[0]) for a,b, *c in df.values], columns=['l', 'a'])
print (df)
   l    a
0  2  foo
1  2  qux

Or select column by list for processing:

df = pd.DataFrame([(a[1], b[0]) for a,b in df[['l','a']].values], columns=['l', 'a'])
Sign up to request clarification or add additional context in comments.

6 Comments

Thanks for your comment on indexing out-of-bound. Please change the code of your last idea, to avoid modifying the original DataFrame. pd.DataFrame([(a[1], b[0]) for a,b,c in df.values], columns=['l', 'a']). Also, do you need to list all columns? My DataFrame may contain 100 columns.
@FerencBodon - If use python 3, then is possible use *c for tail - link, edited answer.
Your solution is definitely worth listing here - it has lessons to learn. Nevertheless, we need to acknowledge that (1) you either repeat the column names three times or you introduce other names, a, b, c, (2) the *c based solution is sensitive to the column order, i.e. if tomorrow somebody changes the column order then the code will silently return false values.
@FerencBodon - hmm, also is not possible use df = pd.DataFrame([(a[1], b[0]) for a,b in df[['l','a']].values], columns=['l', 'a']) ? Because generally what is logic for selecting? Get all columns with lists and select e.g. first value of lists?
you are right. I edited my post to make it more specific. Theoretically, we have a list of column name, index pairs, .e.g from column l, I am interested in the second element, for column a i need the first elements.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.