I have a dataset which is a numpy array with shape (1536 x 16 x 48). A quick explanation of these dimensions that might be helpful:

  • The dataset consists of data collected by EEG sensors at 256Hz rate (1 second = 256 measures/values);
  • 1536 values represent 6 seconds of EEG data (256 * 6 = 1536);
  • 16 is the number of electrodes used to collect data;
  • 48 is the number of samples.

In summary: I have 48 samples of 6 seconds (1536 values) of EEG data, collected by 16 electrodes.

I need to create a pandas DataFrame with all this data, and therefore turn this 3D array into a 2D one. The depth dimension (48) can be removed if I stack all samples one above another, so the new dataset will have shape (1536 * 48) x 16.

In addition, since this is a classification problem, I have a vector with 48 values that represents the class of each EEG sample. The new dataset should also have this as a "class" column, so the final shape would be (1536 * 48) x (16 + 1), where the extra column is the class.

I could easily do that by looping through the depth dimension of the 3D array and concatenating everything into a new 2D array. But this seems like a poor approach, since I will be dealing with many datasets like this one and performance is an issue. Is there a more efficient way of doing it?
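For reference, the loop-based approach described above might look like the following minimal sketch. The tiny shapes, the `data` array, and the `labels` vector are made-up stand-ins for the real (1536, 16, 48) array and the 48-value class vector; the sketch is only meant to pin down the target layout.

```python
import numpy as np

# Small stand-in for the (1536, 16, 48) array: (time, electrodes, samples).
n_time, n_elec, n_samples = 4, 3, 2
data = np.arange(n_time * n_elec * n_samples).reshape(n_time, n_elec, n_samples)
labels = np.array([0, 1])  # hypothetical class of each sample

blocks = []
for s in range(n_samples):
    block = data[:, :, s]                   # one sample: (n_time, n_elec)
    cls = np.full((n_time, 1), labels[s])   # class repeated for every row of the sample
    blocks.append(np.hstack((block, cls)))  # (n_time, n_elec + 1)

# Samples stacked one above another: (n_time * n_samples, n_elec + 1)
stacked = np.vstack(blocks)
```

Any vectorized replacement should produce the same rows in the same order as `stacked`.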

I tried to provide as much information as I could, but since this is not a trivial task, feel free to ask for further details if needed.

Thanks in advance.

2 Answers

Setup

>>> import numpy as np
>>> import pandas as pd
>>> a = np.zeros((4,3,3),dtype=int) + [0,1,2]
>>> a *= 10
>>> a += np.array([1,2,3,4])[:,None,None]
>>> a
array([[[ 1, 11, 21],
        [ 1, 11, 21],
        [ 1, 11, 21]],

       [[ 2, 12, 22],
        [ 2, 12, 22],
        [ 2, 12, 22]],

       [[ 3, 13, 23],
        [ 3, 13, 23],
        [ 3, 13, 23]],

       [[ 4, 14, 24],
        [ 4, 14, 24],
        [ 4, 14, 24]]])

Split evenly along the last dimension; stack those elements, reshape, feed to DataFrame. Using the lengths of the array's dimensions simplifies the process.

>>> d0,d1,d2 = a.shape
>>> pd.DataFrame(np.stack(np.dsplit(a,d2)).reshape(d0*d2,d1))
     0   1   2
0    1   1   1
1    2   2   2
2    3   3   3
3    4   4   4
4   11  11  11
5   12  12  12
6   13  13  13
7   14  14  14
8   21  21  21
9   22  22  22
10  23  23  23
11  24  24  24
>>>
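As a cross-check, the split/stack/reshape chain appears to give the same result as moving the last axis to the front with `transpose` and reshaping. The sketch below compares the two on a small made-up array (the shape (4, 3, 5) is arbitrary, not from the question).

```python
import numpy as np

# Arbitrary small 3D array with distinct values so row order is visible.
a = np.arange(4 * 3 * 5).reshape(4, 3, 5)
d0, d1, d2 = a.shape

# Approach from this answer: split along the last axis, stack, reshape.
via_split = np.stack(np.dsplit(a, d2)).reshape(d0 * d2, d1)

# Alternative: bring the last axis to the front, then flatten the first two axes.
via_transpose = a.transpose(2, 0, 1).reshape(d0 * d2, d1)

same = np.array_equal(via_split, via_transpose)
print(same)
```

The `transpose` form avoids building a list of split arrays, which may matter when many datasets are processed, though that has not been measured here.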

Using your shape.

>>> b = np.random.random((1536, 16, 48))
>>> d0,d1,d2 = b.shape
>>> df = pd.DataFrame(np.stack(np.dsplit(b,d2)).reshape(d0*d2,d1))
>>> df.shape
(73728, 16)
>>>

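After making the DataFrame from the 3D array, add the classification column to it with df['class'] = data. Note that data needs one entry per row (73728 here), so the 48-value class vector has to be expanded to that length first. See "Column selection, addition, deletion" in the pandas documentation.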
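One way to expand the class vector is `np.repeat`, since the reshape above keeps each sample's 1536 rows together in one contiguous block. The sketch below assumes a made-up `labels` vector with one class per sample; the real labels would take its place.

```python
import numpy as np
import pandas as pd

n_time, n_elec, n_samples = 1536, 16, 48
b = np.random.random((n_time, n_elec, n_samples))
labels = np.arange(n_samples) % 2  # hypothetical class of each of the 48 samples

# Same reshape as above: rows 0..1535 belong to sample 0, the next 1536 to sample 1, ...
df = pd.DataFrame(np.stack(np.dsplit(b, n_samples)).reshape(n_time * n_samples, n_elec))

# Repeat each label once per time step so it lines up with its sample's rows.
df["class"] = np.repeat(labels, n_time)
print(df.shape)
```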

For the numpy part

import numpy as np
x = np.random.random((1536, 16, 48))  # ndarray with a similar shape
x = x.swapaxes(1, 2)  # swap axes 1 and 2, i.e. 16 and 48 -> (1536, 48, 16)
x = x.reshape((-1, 16), order='C')  # order is important; you may want to check the docs
c = np.zeros((x.shape[0], 1))  # placeholder class column, shape=(73728, 1)
x = np.hstack((x, c))  # final dataset
x.shape

Output

(73728, 17)

Or in one line, starting from the original 3D x and with c defined as above:

x = np.hstack((x.swapaxes(1,2).reshape((-1, 16), order='C'), c))

Finally,

import pandas as pd
x = pd.DataFrame(x)
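A note on row order, since it determines how the 48-value class vector lines up: as far as I can tell, `swapaxes(1, 2)` followed by a C-order reshape groups rows by time step first (all 48 samples at t=0, then all 48 at t=1, and so on), whereas `transpose(2, 0, 1)` keeps each sample's rows together. The sketch below uses small made-up shapes and a hypothetical `labels` vector to show both orderings and the matching label expansion for each.

```python
import numpy as np

# Small stand-in: (time, electrodes, samples)
n_time, n_elec, n_samples = 4, 3, 2
x = np.arange(n_time * n_elec * n_samples).reshape(n_time, n_elec, n_samples)
labels = np.array([7, 9])  # hypothetical class of each sample

# Rows ordered (t0, s0), (t0, s1), (t1, s0), ...: samples are interleaved.
time_major = x.swapaxes(1, 2).reshape(-1, n_elec)
time_major_cls = np.tile(labels, n_time)       # 7, 9, 7, 9, ...

# Rows ordered (s0, t0), (s0, t1), ..., (s1, t0), ...: samples stacked one above another.
sample_major = x.transpose(2, 0, 1).reshape(-1, n_elec)
sample_major_cls = np.repeat(labels, n_time)   # 7, 7, 7, 7, 9, 9, 9, 9

with_class = np.hstack((sample_major, sample_major_cls[:, None]))
```

Either layout works as long as the label expansion matches it; mixing them (for example `np.repeat` with the `swapaxes` layout) would silently assign wrong classes.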

4 Comments

I'm trying to reproduce your code, but I get the following error: TypeError: 'tuple' object is not callable. Do you have any clue what it is?
Made a small typo: c = np.zeros((x.shape(0), 1)) should have been c = np.zeros((x.shape[0], 1)). Fixed now.
That is a really good way of transforming the 3D array into 2D, but what about the part of concatenating the 48-length vector into the new array? In your example, you concatenated the c vector as a vector with 73728 values instead of 48.
According to your post, the array shape should be (1536 * 48) x 16 + 1 = 73728x17. So, 73728 samples, 16 feature columns, and a classification column. When you say concatenating the 48-length vector, which dimension are you referring to?
