I used the NumPy C API from C++ and got the following array in Python:

>>> my_array
array([array([20211101., 20211101., 20211101., 20211101., 20211101.]),
       array([10601155, 10603088, 10603982, 10600983, 10603283], dtype=int32),
       array([30000011, 30000021, 30000031, 30000041, 30000051], dtype=int32),
       array([93003000., 93003000., 93003000., 93003000., 93003000.]),
       array([-1., -1., -1.,  1., -1.]),
       array([b'Sell', b'Sell', b'Sell', b'Buy', b'Sell'], dtype='|S4'),
       array([b'SQZ', b'SQZ', b'SQZ', b'SQZ', b'SQZ'], dtype='|S4'),
       array([ 100, 1100,  100,  200,  200], dtype=int32),
       array([34.19,  9.97, 29.46,  8.96, 27.85]),
       array([b'5', b'0', b'5', b'0', b'0'], dtype='|S4')], dtype=object)

The shape of this array is

>>> my_array.shape
(10,)

My goal is to convert this array to a 2D NumPy array and create a dataframe with pd.DataFrame(data=my_array). This fails because pandas expects a NumPy array like

np.array([[...],[...],[...],...])

not

array([array([...]),array([...]),array([...]),...])

I understand that I can build the dataframe with a for loop, but that would be very slow for a large dataset. Is there a way to convert my array to a real 2D NumPy array and get a dataframe object?

  • can you try df = pd.DataFrame(a.T), assuming a is the array? Commented Apr 14, 2022 at 14:45
  • @mozway it gives a dataframe that only has one column and 10 rows. Each row is an array Commented Apr 14, 2022 at 15:00
  • strange, I tried on the provided example and it gives me a shape (5,10) Commented Apr 14, 2022 at 15:04
  • @mozway So for my case, df = pd.DataFrame(my_array.T) and df = pd.DataFrame(my_array) give me the same (1, 10) result. Is this because each element of my array is also an array? Commented Apr 14, 2022 at 15:23
  • @mozway, a simple copy-and-paste makes an array from a list of arrays. If they are all the same shape, the result is a 2-d array of single-element objects (or as high-dimensional an array as NumPy can make). You have to use something like my example [135][136] to construct a 1-d array containing the arrays. If the subarrays vary in size you get the 1-d array (and a ragged-array warning), but you shouldn't count on it. Commented Apr 15, 2022 at 4:34

1 Answer

Making a list from your sample:

In [132]: alist
Out[132]: 
[array([20211101., 20211101., 20211101., 20211101., 20211101.]),
 array([10601155, 10603088, 10603982, 10600983, 10603283], dtype=int32),
 array([30000011, 30000021, 30000031, 30000041, 30000051], dtype=int32),
 array([93003000., 93003000., 93003000., 93003000., 93003000.]),
 array([-1., -1., -1.,  1., -1.]),
 array([b'Sell', b'Sell', b'Sell', b'Buy', b'Sell'], dtype='|S4'),
 array([b'SQZ', b'SQZ', b'SQZ', b'SQZ', b'SQZ'], dtype='|S4'),
 array([ 100, 1100,  100,  200,  200], dtype=int32),
 array([34.19,  9.97, 29.46,  8.96, 27.85]),
 array([b'5', b'0', b'5', b'0', b'0'], dtype='|S4')]

Using 'list transpose' to make a list of tuples, one per "row/record" of the frame:

In [133]: df = pd.DataFrame([tuple(x) for x in zip(*alist)])
In [134]: df
Out[134]: 
            0         1         2           3  ...       6     7      8     9
0  20211101.0  10601155  30000011  93003000.0  ...  b'SQZ'   100  34.19  b'5'
1  20211101.0  10603088  30000021  93003000.0  ...  b'SQZ'  1100   9.97  b'0'
2  20211101.0  10603982  30000031  93003000.0  ...  b'SQZ'   100  29.46  b'5'
3  20211101.0  10600983  30000041  93003000.0  ...  b'SQZ'   200   8.96  b'0'
4  20211101.0  10603283  30000051  93003000.0  ...  b'SQZ'   200  27.85  b'0'

[5 rows x 10 columns]

Since the subarrays are all the same length, making an object array from them requires some special handling. We can't just copy-and-paste your display, because np.array would build a 2-d array from it.

In [135]: arr = np.zeros(len(alist),object)
In [136]: arr[:] = alist

This makes a 1-d array like yours, which works the same way as the list:

In [138]: df = pd.DataFrame([tuple(x) for x in zip(*arr)])
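
To make the difference concrete, here is a minimal sketch comparing the two ways of building the object array. The three-array sample and the names `stacked` and `arr` are assumptions made for illustration (they stand in for the ten arrays above), and the behaviour shown is what I would expect from a recent NumPy and pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical three-array sample standing in for the ten arrays above.
alist = [
    np.array([20211101.0, 20211101.0, 20211101.0, 20211101.0, 20211101.0]),
    np.array([100, 1100, 100, 200, 200], dtype=np.int32),
    np.array([b"Sell", b"Sell", b"Sell", b"Buy", b"Sell"], dtype="S4"),
]

# np.array on equal-length subarrays stacks them into a 2-d object array.
stacked = np.array(alist, dtype=object)
print(stacked.shape)

# Preallocating and assigning keeps one subarray per element (1-d),
# which is the layout shown in the question.
arr = np.empty(len(alist), dtype=object)
arr[:] = alist
print(arr.shape)

# The 1-d object array becomes a single column holding whole arrays.
print(pd.DataFrame(arr).shape)

# The 'list transpose' yields one tuple per row, so the frame is 2-d.
df = pd.DataFrame([tuple(x) for x in zip(*arr)])
print(df.shape)
```

If the array is already 2-d like `stacked`, `pd.DataFrame(stacked.T)` gives the same row/column layout, which is probably why the transpose suggested in the comments worked on a copy-pasted sample but not on the original array.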

pandas may have another way of creating a frame with one column/series per array of a list, but this is the best I can do from a numpy base.
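
One candidate for such a pandas-side route, offered as a sketch rather than a tested recommendation: pass a dict that maps a column label to each subarray. The integer labels from `enumerate` and the small sample below are assumptions for illustration. Since each array becomes its own column, numeric dtypes should be preserved instead of passing through Python tuples.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a 1-d object array holding equal-length subarrays.
alist = [
    np.array([20211101.0, 20211101.0, 20211101.0, 20211101.0, 20211101.0]),
    np.array([100, 1100, 100, 200, 200], dtype=np.int32),
    np.array([b"Sell", b"Sell", b"Sell", b"Buy", b"Sell"], dtype="S4"),
]
arr = np.empty(len(alist), dtype=object)
arr[:] = alist

# Map column label -> subarray; pandas builds one column per dict entry.
df = pd.DataFrame(dict(enumerate(arr)))

print(df.shape)
print(df.dtypes)  # the bytes column is stored with object dtype
```

This still loops over the subarrays in Python, but only once per column rather than once per row, so the cost should not grow with the number of rows the way a row-by-row loop does.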

1 Comment

Yes, lists work for me. Thank you!
