I am trying to create an array from two lists, one of which has a list as each element. In the first case np.column_stack does what I want, but in the second case, although the input lists look similar in structure, my list of lists enters the array flattened, which is not what I need.

I am attaching two examples to reproduce this. In the first case I get an array where each row has a string as its first element and a list as its second; in the second case I get 4 columns (the list is flattened) for no obvious reason.


Example 1

temp_list_column1=['St. Raphael',
 'Goppingen',
 'HSG Wetzlar',
 'Huttenberg',
 'Kiel',
 'Stuttgart',
 'Izvidac',
 'Viborg W',
 'Silkeborg-Voel W',
 'Bjerringbro W',
 'Lyngby W',
 'Most W',
 'Ostrava W',
 'Presov W',
 'Slavia Prague W',
 'Dicken',
 'Elbflorenz',
 'Lubeck-Schwartau',
 'HK Ogre/Miandum',
 'Stal Mielec',
 'MKS Perla Lublin W',
 'Koscierzyna W',
 'CS Madeira W',
 'CSM Focsani',
 'CSM Bucuresti',
 'Constanta',
 'Iasi',
 'Suceava',
 'Timisoara',
 'Saratov',
 'Alisa Ufa W',
 'Pozarevac',
 'Nove Zamky',
 'Aranas',
 'Ricoh',
 'H 65 Hoor W',
 'Lugi W',
 'Strands W']

temp_list_column2=[['32', '16', '16'],
 ['32', '16', '16'],
 ['27', '13', '14'],
 ['23', '9', '14'],
 ['29', '14', '15'],
 ['24', '17', '7'],
 ['30', '15', '15'],
 ['26', '12', '14'],
 ['27', '13', '14'],
 ['26'],
 ['18', '9', '9'],
 ['34', '15', '19'],
 ['30', '13', '17'],
 ['31', '13', '18'],
 ['27', '10', '17'],
 ['28', '14', '14'],
 ['24', '14', '10'],
 ['28', '12', '16'],
 ['28', '9', '19'],
 ['22', '13', '9'],
 ['30', '14', '16'],
 ['22', '14', '8'],
 ['17', '8', '9'],
 ['26'],
 ['41', '21', '20'],
 ['36', '18', '18'],
 ['10'],
 ['25', '12', '13'],
 ['27', '16', '11'],
 ['31', '15', '16'],
 ['25', '15', '10'],
 ['24', '8', '16'],
 ['28', '14', '14'],
 ['24', '13', '11'],
 ['26', '14', '12'],
 ['33', '17', '16'],
 ['26', '12', '14'],
 ['17', '12', '5']]

import numpy as np
temp_array = np.column_stack((temp_list_column1,temp_list_column2))

output

array([['St. Raphael', ['32', '16', '16']],
       ['Goppingen', ['32', '16', '16']],
       ['HSG Wetzlar', ['27', '13', '14']],
       ['Huttenberg', ['23', '9', '14']],
       ['Kiel', ['29', '14', '15']],
       ['Stuttgart', ['24', '17', '7']],
       ['Izvidac', ['30', '15', '15']],
       ['Viborg W', ['26', '12', '14']],
       ['Silkeborg-Voel W', ['27', '13', '14']],
       ['Bjerringbro W', ['26']],
       ['Lyngby W', ['18', '9', '9']],
       ['Most W', ['34', '15', '19']],
       ['Ostrava W', ['30', '13', '17']],
       ['Presov W', ['31', '13', '18']],
       ['Slavia Prague W', ['27', '10', '17']],
       ['Dicken', ['28', '14', '14']],
       ['Elbflorenz', ['24', '14', '10']],
       ['Lubeck-Schwartau', ['28', '12', '16']],
       ['HK Ogre/Miandum', ['28', '9', '19']],
       ['Stal Mielec', ['22', '13', '9']],
       ['MKS Perla Lublin W', ['30', '14', '16']],
       ['Koscierzyna W', ['22', '14', '8']],
       ['CS Madeira W', ['17', '8', '9']],
       ['CSM Focsani', ['26']],
       ['CSM Bucuresti', ['41', '21', '20']],
       ['Constanta', ['36', '18', '18']],
       ['Iasi', ['10']],
       ['Suceava', ['25', '12', '13']],
       ['Timisoara', ['27', '16', '11']],
       ['Saratov', ['31', '15', '16']],
       ['Alisa Ufa W', ['25', '15', '10']],
       ['Pozarevac', ['24', '8', '16']],
       ['Nove Zamky', ['28', '14', '14']],
       ['Aranas', ['24', '13', '11']],
       ['Ricoh', ['26', '14', '12']],
       ['H 65 Hoor W', ['33', '17', '16']],
       ['Lugi W', ['26', '12', '14']],
       ['Strands W', ['17', '12', '5']]], dtype=object)

Example 2

temp_list_column1b=['Benidorm',
 'Alpla Hard',
 'Dubrava',
 'Frydek-Mistek',
 'Karvina',
 'Koprivnice',
 'Nove Veseli',
 'Vardar',
 'Meble Elblag Wojcik',
 'Zaglebie',
 'Benfica',
 'Barros W',
 'Juvelis W',
 'Assomada W',
 'UOR No.2 Moscow',
 'Izhevsk W',
 'Stavropol W',
 'Din. Volgograd W',
 'Zvenigorod W',
 'Adyif W',
 'Crvena zvezda',
 'Ribnica',
 'Slovan',
 'Jeruzalem Ormoz',
 'Karlskrona',
 'Torslanda W']

temp_list_column2b=[['28', '14', '14'],
 ['27', '12', '15'],
 ['24', '13', '11'],
 ['24', '14', '10'],
 ['28', '17', '11'],
 ['30', '16', '14'],
 ['26', '15', '11'],
 ['38', '18', '20'],
 ['24', '13', '11'],
 ['33', '15', '18'],
 ['24', '10', '14'],
 ['18', '11', '7'],
 ['22', '9', '13'],
 ['25', '12', '13'],
 ['19', '11', '8'],
 ['24', '10', '14'],
 ['21', '9', '12'],
 ['18', '10', '8'],
 ['31', '17', '14'],
 ['29', '15', '14'],
 ['26', '14', '12'],
 ['29', '12', '17'],
 ['25', '11', '14'],
 ['33', '19', '14'],
 ['32', '14', '18'],
 ['19', '12', '7']]



import numpy as np
temp_arrayb = np.column_stack((temp_list_column1b,temp_list_column2b))

output

array([['Benidorm', '28', '14', '14'],
       ['Alpla Hard', '27', '12', '15'],
       ['Dubrava', '24', '13', '11'],
       ['Frydek-Mistek', '24', '14', '10'],
       ['Karvina', '28', '17', '11'],
       ['Koprivnice', '30', '16', '14'],
       ['Nove Veseli', '26', '15', '11'],
       ['Vardar', '38', '18', '20'],
       ['Meble Elblag Wojcik', '24', '13', '11'],
       ['Zaglebie', '33', '15', '18'],
       ['Benfica', '24', '10', '14'],
       ['Barros W', '18', '11', '7'],
       ['Juvelis W', '22', '9', '13'],
       ['Assomada W', '25', '12', '13'],
       ['UOR No.2 Moscow', '19', '11', '8'],
       ['Izhevsk W', '24', '10', '14'],
       ['Stavropol W', '21', '9', '12'],
       ['Din. Volgograd W', '18', '10', '8'],
       ['Zvenigorod W', '31', '17', '14'],
       ['Adyif W', '29', '15', '14'],
       ['Crvena zvezda', '26', '14', '12'],
       ['Ribnica', '29', '12', '17'],
       ['Slovan', '25', '11', '14'],
       ['Jeruzalem Ormoz', '33', '19', '14'],
       ['Karlskrona', '32', '14', '18'],
       ['Torslanda W', '19', '12', '7']], 
      dtype='<U19')

In the first case the shape is (38, 2), while in the second it is (26, 4) (I am interested in the number of columns only). Am I missing something obvious?

column_stack first turns the lists into arrays. Test that yourself with np.array(the_list). Commented Apr 2, 2018 at 15:04

2 Answers

Your problem here seems to be that your first list of lists is jagged (its sublists have different lengths), while your second is rectangular.

Look at the difference in how NumPy converts the following two lists into arrays (which, as @hpaulj points out, is exactly what happens when you pass them to column_stack):

In [1]: b1 = [
   ...: [1,2,3],
   ...: [2,3,4],
   ...: [3,4,5],
   ...: [4,5,6]]

In [2]: np.array(b1)
Out[2]:
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])

In [3]: b2 = [
   ...: [1,2,3],
   ...: [2,3],
   ...: [3]]

In [4]: np.array(b2)
Out[4]: array([list([1, 2, 3]), list([2, 3]), list([3])], dtype=object)

Thus, when column stacking your example lists, in the first case you have a 1D array of lists that becomes a single column, whereas in the second case you have a 2D array of values with 3 columns, each of which becomes its own column in the result.
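
The two conversions can be checked directly. This is a minimal sketch with a few sample rows taken from the question; the variable names are made up for illustration. One assumption to flag: NumPy 1.24 and later raise a ValueError for a jagged list unless dtype=object is passed explicitly, so the sketch passes it rather than relying on the older implicit behaviour shown above.

```python
import numpy as np

names = ['Benidorm', 'Alpla Hard']

# Rectangular: every sublist has the same length, so NumPy builds a 2D array.
rectangular = [['28', '14', '14'], ['27', '12', '15']]
rect_arr = np.array(rectangular)

# Jagged: sublists differ in length, so the result is a 1D array of list objects.
# dtype=object is given explicitly because newer NumPy versions require it.
jagged = [['32', '16', '16'], ['26']]
jag_arr = np.array(jagged, dtype=object)

print(rect_arr.shape)  # (2, 3)
print(jag_arr.shape)   # (2,)

# column_stack treats a 1D input as one column and a 2D input as several.
print(np.column_stack((names, jag_arr)).shape)   # (2, 2)
print(np.column_stack((names, rect_arr)).shape)  # (2, 4)
```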

You should probably not use NumPy's column_stack here at all; just zip the two lists together. If you want a NumPy array as your final result, use np.array(list(zip(list_a, list_b))) (on recent NumPy versions you may need to pass dtype=object as well).
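
If the goal is always a two-column array with a list in the second column, whether or not the sublists happen to be the same length, one option is to fill a preallocated object array row by row. This is only a sketch under that assumption; pair_columns is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def pair_columns(names, lists):
    """Return an (n, 2) object array: names in column 0, one list per cell in column 1."""
    if len(names) != len(lists):
        raise ValueError("both inputs must have the same length")
    out = np.empty((len(names), 2), dtype=object)
    for i, (name, values) in enumerate(zip(names, lists)):
        out[i, 0] = name
        out[i, 1] = values  # stored as a single Python list, not expanded into columns
    return out

# Rectangular input (the case that was flattened in the question).
names = ['Benidorm', 'Alpla Hard']
scores = [['28', '14', '14'], ['27', '12', '15']]
arr = pair_columns(names, scores)
print(arr.shape)  # (2, 2)

# Jagged input gives the same layout.
arr_jagged = pair_columns(['Kiel', 'Iasi'], [['29', '14', '15'], ['10']])
print(arr_jagged.shape)  # (2, 2)
```

Assigning one cell at a time avoids NumPy's shape inference entirely, which is why the result does not depend on the sublist lengths.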


EDIT: In retrospect, your data structure sounds more like what's typically referred to as a DataFrame, rather than a matrix which is what Numpy is trying to give you.

import pandas as pd
data = pd.DataFrame()
data['name'] = temp_list_column1
data['numbers'] = temp_list_column2

# Or
data = pd.DataFrame(list(zip(temp_list_column1, temp_list_column2)), columns=['name', 'numbers'])

Which gives you a data structure that looks like this (shown with placeholder values):

    name    numbers
0   John  [1, 2, 3]
1  James  [2, 3, 4]
2  Peter  [3, 4, 5]
3   Paul  [4, 5, 6]

3 Comments

I would add that storing numeric + non-numeric data in object type isn't a good idea either.. a structured array might be better.
Agreed, the main problem in the OP question seems to be that he's using numpy functions but doesn't have matrix-like data; the data type sounds more like a dataframe than a matrix, so a simple Python list or Pandas DF seems more applicable than numpy arrays. Hence why just using zip seems like the right way to go here.
I was debating whether to get into the issue of constructing a 1d array of lists when all lists are the same. But your idea of using the list zip bypasses that.
Diagnosis

It seems the issue is that in the 2nd example all the sublists have 3 elements, while in the first example some sublists have length 1, e.g. ['Bjerringbro W', ['26']]; the list ['26'] has only one element.
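
A quick way to confirm this on your own data is to collect the distinct sublist lengths. This is a small sketch; sublist_lengths is a hypothetical helper, and the sample rows are a subset of the lists in the question.

```python
def sublist_lengths(rows):
    """Return the distinct sublist lengths, sorted ascending."""
    return sorted({len(row) for row in rows})

# A few rows from each example in the question.
example_1 = [['32', '16', '16'], ['26'], ['18', '9', '9']]
example_2 = [['28', '14', '14'], ['27', '12', '15']]

print(sublist_lengths(example_1))  # [1, 3]
print(sublist_lengths(example_2))  # [3]
```

More than one length means the list is jagged and NumPy cannot turn it into a regular 2D array.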

In the second case, np.column_stack apparently does not keep lists as cell elements. In fact, we could have another discussion about why you want lists as cell elements, which I will not go into here. Here is the solution.

Special Case Solution

I assume you don't mind using pandas

import pandas as pd
series_1 = pd.Series(temp_list_column1b).to_frame(name='col1') # name it whatever you want
series_2 = pd.Series(temp_list_column2b).to_frame(name='col2') # name it whatever you want

df = pd.concat([series_1, series_2], axis=1)
# print(df) # view in pandas form
# print(df.values) # to see what it looks like as a NumPy array
# print(df.values.shape) # to see the shape in NumPy terms

Generalized Solution

Assuming you have a list of such columns called list_of_cols, then:

import pandas as pd

# list_of_cols: all the lists you want to combine, for example:
# list_of_cols = [temp_list_column1b, temp_list_column2b]

df = pd.concat([pd.Series(temp_col).to_frame() for temp_col in list_of_cols], axis=1)

I hope this helps!
