
I have a list of lists that looks like this:

[['A'],
 ['America'],
 ['2017-39', '2017-40', '2017-41', '2017-42', '2017-43'],
 [10.0, 6.0, 6.0, 6.0, 1.0],
 [5.0,7.0,8.0,9.0,1.0],
 ['B'],
 ['Britan'],
 ['2017-38', '2017-39', '2017-40', '2017-41', '2017-42', '2017-43', '2017-44'],
 [41.0, 27.0, 38.0, 36.0, 33.0, 41.0, 8.0],
 [40.0, 38.0, 28.0, 27.0, 23.0, 65.0, 4.0]]

I want to convert this into a DataFrame that looks like this:

A America     2017-39   10.0  5.0
na   na       2017-40    6.0  7.0
na   na       2017-41    6.0  8.0
na   na       2017-42    6.0  9.0
na   na       2017-43    1.0  1.0
B Britan      2017-38   41.0 40.0
na   na       2017-39   27.0 38.0
na   na       2017-40   38.0 28.0
na   na       2017-41   36.0 27.0
na   na       2017-42   33.0 23.0
na   na       2017-43   41.0 65.0
na   na       2017-44    8.0  4.0

How can I do this? I am pretty new to Python, so I am having a hard time.

I would really appreciate your time and effort in helping me with this.

3 Answers

I use groupby and re-create the columns:

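import pandas as pd

# lst is the list of lists from the question
s = pd.DataFrame(lst).T      # one column per sublist, shorter ones padded with None
s.columns = s.columns // 5   # the 5 columns of each record now share a label (0 or 1)
pd.concat([pd.DataFrame(x.T.values) for _, x in s.T.groupby(level=0)]).dropna(axis=0, thresh=1)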
Out[146]: 
      0        1        2   3   4
0     A  America  2017-39  10   5
1  None     None  2017-40   6   7
2  None     None  2017-41   6   8
3  None     None  2017-42   6   9
4  None     None  2017-43   1   1
0     B   Britan  2017-38  41  40
1  None     None  2017-39  27  38
2  None     None  2017-40  38  28
3  None     None  2017-41  36  27
4  None     None  2017-42  33  23
5  None     None  2017-43  41  65
6  None     None  2017-44   8   4
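A minimal sketch of what each step above does, run on the question's list (with the stray comma removed). One deliberate change: instead of `s.groupby(level=0, axis=1)`, it groups the transposed frame by its index with `s.T.groupby(level=0)`, which selects the same blocks of columns; the `axis` argument of `DataFrame.groupby` is deprecated in recent pandas releases. The names `blocks` and `out` are illustrative only.

```python
import pandas as pd

# The list of lists from the question (stray comma removed).
lst = [['A'], ['America'],
       ['2017-39', '2017-40', '2017-41', '2017-42', '2017-43'],
       [10.0, 6.0, 6.0, 6.0, 1.0],
       [5.0, 7.0, 8.0, 9.0, 1.0],
       ['B'], ['Britan'],
       ['2017-38', '2017-39', '2017-40', '2017-41', '2017-42', '2017-43', '2017-44'],
       [41.0, 27.0, 38.0, 36.0, 33.0, 41.0, 8.0],
       [40.0, 38.0, 28.0, 27.0, 23.0, 65.0, 4.0]]

# Step 1: one column per sublist; shorter sublists are padded with missing values.
s = pd.DataFrame(lst).T
print(s.shape)  # (7, 10): 7 rows (longest sublist), 10 columns (sublists)

# Step 2: integer-divide the column labels so each record's 5 columns share a label.
s.columns = s.columns // 5
print(s.columns.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Step 3: split into one block per record, give each block fresh 0..4 column
# labels, stack the blocks, and drop rows that are entirely padding.
blocks = [pd.DataFrame(x.T.values) for _, x in s.T.groupby(level=0)]
out = pd.concat(blocks).dropna(axis=0, thresh=1)
print(len(out))  # 12: 5 rows for A plus 7 rows for B
```

`thresh=1` keeps any row with at least one non-missing value, so only the two all-padding rows of record A are dropped.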

4 Comments

This is just brilliant +1. But probably not as fast as itertools.chain :).
Thanks, Wen, brilliant I should say. With you and the other guys around, I can master Python soon.
@AhamedMoosa, be careful: you may be on your way to mastering pandas, but this is in no way equivalent to Python. Just a friendly reminder :).
@jpp I am looking to master pandas to start with. Yes, as you rightly said, Python is an ocean and I may sink if I try to surf :) But you guys are brilliant :)
import pandas as pd
data = [['A'],
 ['America'],
 ['2017-39', '2017-40', '2017-41', '2017-42', '2017-43'],
 [10.0, 6.0, 6.0, 6.0, 1.0],
 [5.0,7.0,8.0,9.0,1.0],
 ['B'],
 ['Britan'],
 ['2017-38', '2017-39', '2017-40', '2017-41', '2017-42', '2017-43', '2017-44'],
 [41.0, 27.0, 38.0, 36.0, 33.0, 41.0, 8.0],
 [40.0, 38.0, 28.0, 27.0, 23.0, 65.0, 4.0]]

result = {}
for letters, countries, dates, val1, val2 in zip(*[iter(data)]*5):
    result[tuple(letters+countries)] = pd.DataFrame({'date':dates, 'val1':val1, 'val2':val2})
result = pd.concat(result)
print(result)

yields

                date  val1  val2
A America 0  2017-39  10.0   5.0
          1  2017-40   6.0   7.0
          2  2017-41   6.0   8.0
          3  2017-42   6.0   9.0
          4  2017-43   1.0   1.0
B Britan  0  2017-38  41.0  40.0
          1  2017-39  27.0  38.0
          2  2017-40  38.0  28.0
          3  2017-41  36.0  27.0
          4  2017-42  33.0  23.0
          5  2017-43  41.0  65.0
          6  2017-44   8.0   4.0

The main idea above is to use the "grouper idiom" zip(*[iter(data)]*5) to split the items in data into groups of 5. That way, you can use

for letters, countries, dates, val1, val2 in zip(*[iter(data)]*5):

to loop through 5 items of data at a time.
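A small, self-contained illustration of the grouper idiom on a toy list of integers (the list is made up for the demo), showing why it works and what happens when the length is not a multiple of 5:

```python
items = list(range(10))

# [iter(items)] * 5 is a list holding the SAME iterator five times, so zip
# pulls five consecutive items from it for every tuple it builds.
groups = list(zip(*[iter(items)] * 5))
print(groups)  # [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]

# If the length is not a multiple of 5, zip silently drops the leftover items.
print(list(zip(*[iter(range(7))] * 5)))  # [(0, 1, 2, 3, 4)]
```

The silent truncation is worth keeping in mind: a missing or extra sublist in `data` shifts every later group. On Python 3.12 and later, `itertools.batched(items, 5)` yields the same tuples here, but it keeps a short final batch instead of dropping it.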


pd.concat can accept a dict of DataFrames as input and concatenate them into a single DataFrame with a MultiIndex composed of the keys of the dict. So the for-loop is used to compose the dict of DataFrames,

for letters, countries, dates, val1, val2 in zip(*[iter(data)]*5):
    result[tuple(letters+countries)] = pd.DataFrame({'date':dates, 'val1':val1, 'val2':val2})

and then

result = pd.concat(result)

produces the desired DataFrame.
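A minimal sketch of the `pd.concat`-on-a-dict behaviour, using two tiny frames keyed by `(letter, country)` tuples like the loop above (`pieces` and `combined` are illustrative names):

```python
import pandas as pd

# Two small frames keyed by tuples, mirroring the (letter, country) keys above.
pieces = {
    ('A', 'America'): pd.DataFrame({'val1': [10.0, 6.0]}),
    ('B', 'Britan'): pd.DataFrame({'val1': [41.0]}),
}

combined = pd.concat(pieces)

# Each 2-tuple key contributes two index levels; each frame's own
# 0..n-1 row index becomes the third level.
print(combined.index.nlevels)  # 3

# Selecting on the first two levels gets one original frame back.
print(combined.xs(('B', 'Britan'))['val1'].tolist())  # [41.0]
```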


Note that you could drop the last level of the MultiIndex:

In [91]: result.index = result.index.droplevel(level=-1)

In [92]: result
Out[92]: 
              date  val1  val2
A America  2017-39  10.0   5.0
  America  2017-40   6.0   7.0
  America  2017-41   6.0   8.0
  America  2017-42   6.0   9.0
  America  2017-43   1.0   1.0
B Britan   2017-38  41.0  40.0
  Britan   2017-39  27.0  38.0
  Britan   2017-40  38.0  28.0
  Britan   2017-41  36.0  27.0
  Britan   2017-42  33.0  23.0
  Britan   2017-43  41.0  65.0
  Britan   2017-44   8.0   4.0

but I wouldn't recommend this since it makes the index non-unique:

In [96]: result.index.is_unique
Out[96]: False

and this can cause difficulties later, since some pandas operations (reindex, for example) only work on DataFrames with unique indexes.
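A minimal sketch of the kind of difficulty meant here, simplified to a single-level index with a repeated label (the 'Canada' label is made up purely as a reindex target):

```python
import pandas as pd

# A single-level index with a repeated label, like the 'America' rows after droplevel.
df = pd.DataFrame({'val1': [10.0, 6.0, 41.0]},
                  index=['America', 'America', 'Britan'])
print(df.index.is_unique)  # False

# Label lookups still work, but 'America' now selects two rows rather than one.
print(df.loc['America'].shape)  # (2, 1)

reindex_failed = False
try:
    # reindex must map each existing label to exactly one row, so duplicates are rejected.
    df.reindex(['America', 'Britan', 'Canada'])
except ValueError as exc:
    reindex_failed = True
    print('reindex failed:', exc)
```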

1 Comment

Thanks, unutbu, for your answer, your suggestion, and for educating me. I really appreciate your help. The code worked perfectly for my goal.

One solution is to use itertools to perform some chaining magic.

There are 2 essential idioms we use:

  1. For identifier columns, zip the lengths of the date lists together with the identifiers, padding each identifier with NaN so it is as long as its record.
  2. For data columns, use chain.from_iterable (assigned to chainer) to combine every 5th sublist.
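The two idioms can be sketched on a cut-down version of the question's data (two records, with two dates and one date respectively); the trimmed `data` below is for illustration only:

```python
from itertools import chain, islice

import numpy as np

# Cut-down version of the question's data: two records of 5 sublists each.
data = [['A'], ['America'], ['2017-39', '2017-40'], [10.0, 6.0], [5.0, 7.0],
        ['B'], ['Britan'], ['2017-38'], [41.0], [40.0]]

chainer = chain.from_iterable

# Rows needed per record = length of its date list (every 5th sublist from index 2).
lens = [len(d) for d in islice(data, 2, None, 5)]
print(lens)  # [2, 1]

# Idiom 1: pad each identifier with NaN so it is as long as its record's date list.
id1 = list(chainer(list(j) + [np.nan] * (i - 1)
                   for i, j in zip(lens, islice(data, 0, None, 5))))
print(id1)  # ['A', nan, 'B']

# Idiom 2: flatten every 5th sublist (here, the dates) into one long column.
dates = list(chainer(islice(data, 2, None, 5)))
print(dates)  # ['2017-39', '2017-40', '2017-38']
```

Because both idioms produce lists of the same total length, they can be dropped straight into a `pd.DataFrame` constructor as columns.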

In both cases, we utilise islice to avoid creating lists unnecessarily as intermediate steps.
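A small comparison of `islice` with ordinary slicing, on a cut-down version of the question's data (one date per record, for illustration only):

```python
from itertools import islice

data = [['A'], ['America'], ['2017-39'], [10.0], [5.0],
        ['B'], ['Britan'], ['2017-38'], [41.0], [40.0]]

# data[2::5] builds a new list of the selected sublists up front...
sliced = data[2::5]

# ...whereas islice returns a lazy iterator that yields them one at a time.
lazy = islice(data, 2, None, 5)
print(type(lazy).__name__)   # islice
print(list(lazy) == sliced)  # True: same sublists, same order

# An islice iterator is single-use: once consumed, it yields nothing more.
print(list(lazy))  # []
```

The single-use behaviour is why the solution creates a fresh `islice(...)` each time it needs a column. For a list with only ten sublists the memory saved is negligible; the benefit grows with longer inputs.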

data is defined as per @unutbu's post.

Solution

import numpy as np
import pandas as pd
from itertools import chain, islice

chainer = chain.from_iterable

lens = list(map(len, islice(data, 2, None, 5)))

res = pd.DataFrame({'id1': list(chainer(list(j)+[np.nan]*(i-1) for i, j in
                                zip(lens, islice(data, 0, None, 5)))),
                    'id2': list(chainer(list(j)+[np.nan]*(i-1) for i, j in 
                                zip(lens, islice(data, 1, None, 5)))),
                    'date': list(chainer(islice(data, 2, None, 5))),
                    'num1': list(chainer(islice(data, 3, None, 5))),
                    'num2': list(chainer(islice(data, 4, None, 5)))})

res = res[['id1', 'id2', 'date', 'num1', 'num2']]

Result

print(res)

    id1      id2     date  num1  num2
0     A  America  2017-39  10.0   5.0
1   NaN      NaN  2017-40   6.0   7.0
2   NaN      NaN  2017-41   6.0   8.0
3   NaN      NaN  2017-42   6.0   9.0
4   NaN      NaN  2017-43   1.0   1.0
5     B   Britan  2017-38  41.0  40.0
6   NaN      NaN  2017-39  27.0  38.0
7   NaN      NaN  2017-40  38.0  28.0
8   NaN      NaN  2017-41  36.0  27.0
9   NaN      NaN  2017-42  33.0  23.0
10  NaN      NaN  2017-43  41.0  65.0
11  NaN      NaN  2017-44   8.0   4.0

2 Comments

I have checked all the solutions; this is indeed the fastest of all :) But I am giving it to Wen this time. Keep supporting me on my journey. I really appreciate it.
@AhamedMoosa, No worries, I like Wen's solution too :).
