
I am filling out a dataframe using the content of a list of lists such as:

import pandas as pd

desc_prep = [['aesthet', 'abod'], [['arb', 'abod'], ['forest', 'abod']]]

col_names = ['desc_name', 'desc_avg_vector']
df_desc_prep = pd.DataFrame(columns=col_names)
df_desc_prep['desc_name'] = desc_prep

At this point the dataframe looks like this:

                         desc_name
0                [aesthet, abod]
1  [[arb, abod], [forest, abod]]

Iterating over the dataframe with items() (named iteritems() before pandas 2.0) yields a tuple with the column name and the column content as a Series:

for index, value in df_desc_prep.items():
    print("index: ", index)       # --> index:  desc_name
    print("value: ", value)       # --> value:  0                  [aesthet, abod]
    print("value[0]:", value[0])  # --> ['aesthet', 'abod']
    print("value[1]:", value[1])  # --> [['arb', 'abod'], ['forest', 'abod']]
    if isinstance(value[0], list):  # value[0] is ['aesthet', 'abod']
        pass  # body not shown

Iterating with iterrows() yields a Series for each row:

for index, value in df_desc_prep.iterrows():
    print("index: ", index)  # --> index:  0
    print("value: ", value)  # --> value:  desc_name    [aesthet, abod]
    # value[0] is ['aesthet', 'abod']; value[1] raises IndexError: index out of bounds
    if isinstance(value[0], list):
        pass  # body not shown

I expected value[0] to be aesthet and value[1] to be abod. Instead, accessing value[1] raises IndexError: index out of bounds.

How can I iterate over the dataframe so that value[0] is aesthet for the row ['aesthet', 'abod'] and value[0] is ['arb', 'abod'] for the row [['arb', 'abod'], ['forest', 'abod']]?

1 Answer

...
for index, value in df_desc_prep.iterrows():
    # value is the row as a Series; .iloc[0] is its first cell (the list)
    print(value.iloc[0][0])
    print(value.iloc[0][1])

aesthet
abod
['arb', 'abod']
['forest', 'abod']

for index, value in df_desc_prep.iterrows():
    print(value['desc_name'][0])
    print(value['desc_name'][1])

aesthet
abod
['arb', 'abod']
['forest', 'abod']
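
Some background on why the extra index is needed: iterrows() yields each row as a Series with one entry per column, so in a one-column frame the row Series has length 1 and value[1] is out of bounds. The list sits inside that single cell, which is why it takes a second index to reach aesthet. If only one column matters, iterating over the column directly may be simpler. The following is a minimal sketch, assuming a one-column frame like the one printed in the question; the helper name first_elements is hypothetical and not part of pandas.

```python
import pandas as pd

desc_prep = [['aesthet', 'abod'], [['arb', 'abod'], ['forest', 'abod']]]

# Assumption: a one-column frame, matching the printed output in the question.
df_desc_prep = pd.DataFrame({'desc_name': desc_prep})


def first_elements(df, column):
    """Return the first element of the list stored in each cell of `column`.

    Hypothetical helper for illustration only.
    """
    # Iterating over a column yields the cell values directly,
    # so no row Series sits between the loop variable and the list.
    return [cell[0] for cell in df[column]]


firsts = first_elements(df_desc_prep, 'desc_name')
print(firsts)

# Each row from iterrows() is a Series with one entry per column.
row_lengths = [len(row) for _, row in df_desc_prep.iterrows()]
print(row_lengths)
```

Here firsts holds one value per row, and row_lengths confirms that every row Series contains a single entry.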

2 Comments

Well done! Works like a charm. One question: why do you have to indicate the name of the column even when the dataframe has only one column (a series)? For instance, col_names = ['desc_name'] df_desc_prep = pd.DataFrame(columns=col_names)
@JuanPerez To show that elements can also be accessed by column name, which is useful when there are more columns.
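
To illustrate the point about several columns, here is a small sketch. It assumes the second column from the question, desc_avg_vector, has been filled; the numbers in it are placeholders invented for this example, not data from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'desc_name': [['aesthet', 'abod'], [['arb', 'abod'], ['forest', 'abod']]],
    # Placeholder values, invented for this example only.
    'desc_avg_vector': [[0.1, 0.2], [0.3, 0.4]],
})

names_first = []
by_position = []
for index, row in df.iterrows():
    # Access by label keeps working if columns are added or reordered.
    names_first.append(row['desc_name'][0])
    # Access by position depends on the column order; .iloc makes that explicit.
    by_position.append(row.iloc[0][0])

print(names_first)
print(by_position)
```

Both lists agree here because desc_name happens to be the first column; only the label-based version would survive a change in column order.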
