I have a dataset that looks like this:

Zn  Pb    Ag  Cu  Mo   Cr  Ni  Co   Ba
87   7  0.02  42   2   57  38  14  393
70   6  0.02  56   2   27  29  20  404
75   5  0.02  69   2   44  23  17  417
70   6  0.02  54   1   20  19  12  377

I want to create a pandas dataframe out of this dataset. I have written the function below:

def correlation_iterated(raw_data, element_concentration):

    columns = element_concentration.split()
    df1 = pd.DataFrame(columns=columns)

    data1 = []
    selected_columns = raw_data.loc[:, element_concentration.split()].columns
    for i in selected_columns:
        for j in selected_columns:
            # another function that takes 'i' and 'j' and returns 'a'
            zipped1 = zip([i], a)
            data1.append(dict(zipped1))

    df1 = df1.append(data1, True)

    print(df1)

This function is supposed to run the calculation for every pair of elements and store each result in one cell of a 9 by 9 pandas dataframe. Instead, I get the following:

          Zn  Pb  Ag  Cu  Mo  Cr  Ni  Co        Ba
0   1.000000 NaN NaN NaN NaN NaN NaN NaN       NaN
1   0.460611 NaN NaN NaN NaN NaN NaN NaN       NaN
2   0.127904 NaN NaN NaN NaN NaN NaN NaN       NaN
3   0.276086 NaN NaN NaN NaN NaN NaN NaN       NaN
4  -0.164873 NaN NaN NaN NaN NaN NaN NaN       NaN
..       ...  ..  ..  ..  ..  ..  ..  ..       ...
76       NaN NaN NaN NaN NaN NaN NaN NaN  0.113172
77       NaN NaN NaN NaN NaN NaN NaN NaN  0.027251
78       NaN NaN NaN NaN NaN NaN NaN NaN -0.036409
79       NaN NaN NaN NaN NaN NaN NaN NaN  0.041396
80       NaN NaN NaN NaN NaN NaN NaN NaN  1.000000

[81 rows x 9 columns] 

In other words, the function stores the results for the first element in the first column only, then appends the results for the next element as new rows instead of as a new column. How can I change the code so that the results for each element go into the next column once the previous column is finished? I want something like this:

    Zn         Pb         Ag Cu Mo Cr Ni Co Ba
0   1.000000   0.460611   ...
1   0.460611   1.000000   ...
2   0.127904   0.111559   ...
3   0.276086   0.303925   ...
4  -0.164873  -0.190886   ...
5   0.402046   0.338073   ...
6   0.174774   0.096724   ...
7   0.165760  -0.005301   ...
8  -0.043695   0.174193   ...

[9 rows x 9 columns]

  • I don't know that I follow why you're zipping things inside your function? You're zipping a single item (the column) as a list, with the result, and converting it to a dictionary, essentially with one element, for every row/column in your final data frame? Is it that the "rows", if they were named, also correspond to the columns? Commented Aug 18, 2021 at 21:52
  • @sequoia In the nested for-loop I'm doing the following: running the calculation, zipping the column header (e.g. Zn) with the result, then converting the pair to a dictionary so that there is a key (the column header) with a value (the calculation). I then build a dataframe from these dictionaries. Commented Aug 18, 2021 at 21:58
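To make the behaviour discussed in these comments concrete, here is a minimal sketch (not the asker's code) of what pandas does with a list of single-key dictionaries. The three values are taken from the output shown in the question and are only illustrative; `DataFrame.append` was removed in pandas 2.0, so the sketch passes the list to the `pd.DataFrame` constructor instead, which treats each dictionary as a row in the same way.

```python
import pandas as pd

# Each dictionary holds a single key, so each one becomes its own row.
records = [{"Zn": 1.000000}, {"Zn": 0.460611}, {"Ba": 0.113172}]

df = pd.DataFrame(records)
print(df)

# Every row has a value in exactly one column; the other column is NaN.
values_per_row = df.notna().sum(axis=1).tolist()
```

With nine elements this pattern yields 9 * 9 = 81 rows, each with a single filled cell, which matches the 81 rows in the question.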

1 Answer


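Could you not just do something like this? Build one list of results per column, store it in a dict keyed by the column name, and create the dataframe once at the end:

import pandas as pd

def correlation_iterated(raw_data, element_concentration):

    columns = element_concentration.split()

    data = {}
    selected_columns = raw_data.loc[:, columns].columns
    for i in selected_columns:
        temp = []  # results for column i, one entry per j
        for j in selected_columns:
            # another function that takes 'i' and 'j' and returns 'a'
            temp.append(a)

        data[i] = temp  # one complete column of the result

    df = pd.DataFrame(data)
    print(df)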
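
For a version that runs end to end, here is a rough sketch of the same dict-of-lists idea. The question does not say what the per-pair calculation is, so this assumes it is a Pearson correlation computed with `Series.corr`; the function name `pairwise_table` is hypothetical, and the sample frame uses only three columns from the rows shown in the question.

```python
import pandas as pd


def pairwise_table(raw_data, element_concentration):
    """Build a square table with one column per element (hypothetical helper)."""
    columns = element_concentration.split()
    data = {}
    for i in columns:
        # Assumption: the per-pair calculation is a Pearson correlation.
        data[i] = [raw_data[i].corr(raw_data[j]) for j in columns]
    # Label the rows with the element names as well as the columns.
    return pd.DataFrame(data, index=columns)


# The four rows from the question, restricted to three columns.
raw_data = pd.DataFrame({
    "Zn": [87, 70, 75, 70],
    "Pb": [7, 6, 5, 6],
    "Cu": [42, 56, 69, 54],
})

result = pairwise_table(raw_data, "Zn Pb Cu")
print(result)
```

If the calculation really is a plain Pearson correlation, `raw_data[columns].corr()` should give the same table without any loops; the explicit version is only needed when the per-pair function is something custom.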