
Hi, I have the following pandas Series of NumPy arrays:

 datetime
    03-Sep-15     [53.5688348969, 31.2542494769, 18.002043765]
    04-Sep-15     [46.845084292, 27.0833015735, 15.5997887379]
    08-Sep-15    [52.8701581666, 30.7347431703, 17.6379377917]
    09-Sep-15    [47.9535624339, 27.7063099999, 15.9126963643]
    10-Sep-15     [51.2900606534, 29.600945626, 16.8756260105]

How can I convert it into a DataFrame with 3 columns? Thanks!

  • What do you really have? a single series? Commented Sep 15, 2015 at 19:32
  • Yes, exactly. This is now a Series of arrays. Commented Sep 15, 2015 at 19:34
  • How are you creating that? Post a runnable example please. Commented Sep 15, 2015 at 19:36
  • let me get back to this. Because I just realized the array has some NaNs that are treated as single rows. Don't spend any time on it yet. Commented Sep 15, 2015 at 19:43

3 Answers


Feeding a list of lists to pd.DataFrame is a more efficient approach:

import numpy as np
import pandas as pd

s = pd.Series([np.array([53.5688348969, 31.2542494769, 18.002043765]),
               np.array([46.845084292, 27.0833015735, 15.5997887379]),
               np.array([52.8701581666, 30.7347431703, 17.6379377917]),
               np.array([47.9535624339, 27.7063099999, 15.9126963643]),
               np.array([51.2900606534, 29.600945626, 16.8756260105])],
              index=['03-Sep-15', '04-Sep-15', '08-Sep-15', '09-Sep-15', '10-Sep-15'])

df = pd.DataFrame(s.values.tolist(), index=s.index)

print(df)

                   0          1          2
03-Sep-15  53.568835  31.254249  18.002044
04-Sep-15  46.845084  27.083302  15.599789
08-Sep-15  52.870158  30.734743  17.637938
09-Sep-15  47.953562  27.706310  15.912696
10-Sep-15  51.290061  29.600946  16.875626
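The comments on the question mention entries that are NaN rather than arrays. One way to handle that case, sketched here under assumptions, is to pad each scalar NaN to a full row before building the DataFrame. The names s_gappy, width, and df_clean are hypothetical, and the width of 3 is assumed from the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical Series in which one entry is a scalar NaN instead of an array.
s_gappy = pd.Series([np.array([53.5688348969, 31.2542494769, 18.002043765]),
                     np.nan,
                     np.array([52.8701581666, 30.7347431703, 17.6379377917])],
                    index=['03-Sep-15', '04-Sep-15', '08-Sep-15'])

width = 3  # assumed number of columns, taken from the question's data

# Keep real arrays as they are; replace anything else with a row of NaNs.
rows = [v if isinstance(v, np.ndarray) else np.full(width, np.nan)
        for v in s_gappy]

df_clean = pd.DataFrame(rows, index=s_gappy.index)
print(df_clean)
```

Rows that were NaN in the Series stay in the result as all-NaN rows, so they can be dropped afterwards with df_clean.dropna(how='all') if that is preferred.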

Benchmarking on Python 3.6 / Pandas 0.19:

%timeit pd.DataFrame(s.values.tolist(), index=s.index)  # 448 µs per loop
%timeit s.apply(pd.Series)                              # 1.5 ms per loop
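Timings depend on the pandas version and the size of the data, so it may be worth measuring on your own Series. A rough sketch using the standard timeit module outside IPython follows; the synthetic 1000-row Series s_big and the helper names are made up for illustration:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: an object Series holding 1000 arrays of length 3.
rng = np.random.default_rng(0)
s_big = pd.Series(list(rng.random((1000, 3))))

def via_tolist():
    return pd.DataFrame(s_big.values.tolist(), index=s_big.index)

def via_apply():
    return s_big.apply(pd.Series)

# Total seconds for 5 calls of each approach.
t_tolist = timeit.timeit(via_tolist, number=5)
t_apply = timeit.timeit(via_apply, number=5)

print(f"tolist: {t_tolist:.4f} s, apply: {t_apply:.4f} s")
```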

1 Comment

Not in my case at least:

2.88 s ± 40.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
<magic-timeit>:1: FutureWarning: Returning a DataFrame from Series.apply when the supplied function returns a Series is deprecated and will be removed in a future version.
851 ms ± 5.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

It won't be super-performant, but you should be able to apply(pd.Series):

>>> ser
03-Sep-15     [53.5688348969, 31.2542494769, 18.002043765]
04-Sep-15     [46.845084292, 27.0833015735, 15.5997887379]
08-Sep-15    [52.8701581666, 30.7347431703, 17.6379377917]
09-Sep-15    [47.9535624339, 27.7063099999, 15.9126963643]
10-Sep-15     [51.2900606534, 29.600945626, 16.8756260105]
dtype: object
>>> type(ser.values[0])
<class 'numpy.ndarray'>
>>> ser.apply(pd.Series)
                   0          1          2
03-Sep-15  53.568835  31.254249  18.002044
04-Sep-15  46.845084  27.083302  15.599789
08-Sep-15  52.870158  30.734743  17.637938
09-Sep-15  47.953562  27.706310  15.912696
10-Sep-15  51.290061  29.600946  16.875626



You can also do this:

df = pd.DataFrame(np.stack(s.tolist()), index=s.index)

I believe it's a little faster than pd.DataFrame(s.values.tolist(), index=s.index).
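As a quick check, here is a small sketch showing that np.stack builds a 2-D array from the Series and that the resulting DataFrame matches the list-of-lists version. It uses only the first two rows of the question's data, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.array([53.5688348969, 31.2542494769, 18.002043765]),
               np.array([46.845084292, 27.0833015735, 15.5997887379])],
              index=['03-Sep-15', '04-Sep-15'])

# np.stack joins the equal-length 1-D arrays along a new first axis.
stacked = np.stack(s.tolist())

df_stack = pd.DataFrame(stacked, index=s.index)
df_list = pd.DataFrame(s.values.tolist(), index=s.index)

print(stacked.shape)
print(np.array_equal(df_stack.to_numpy(), df_list.to_numpy()))
```

Note that np.stack raises a ValueError when the arrays do not all have the same shape, so this variant only fits Series whose arrays have equal length.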
