I'm very new to Python, so apologies in advance if my question has already been asked.

I have a large dataset, k_cc, that contains degree sequences for different years. The length of the degree sequence varies from year to year. I am trying to generate a series of configuration models from these degree sequences for every year in the data, so that I can extract a couple of measures I need for my analyses. I know how to run the code for one year, but I don't know how to loop over the years, since their sequences differ in length.

Below is a reproducible example of my problem, shown for one year.

import networkx as nx
import pandas as pd

# Data
k_cc = {'degree':  [4,4,6,3,7,8,6,3,5,1,4,2,8,9,4],
        'Year': [1990, 1990, 1990, 1991, 1991, 1991, 1992, 1992, 1992, 1992, 1992, 1993, 1993, 1993, 1994]}
k_cc = pd.DataFrame(k_cc)
k_cc
Out[13]: 
    degree  Year
0        4  1990
1        4  1990
2        6  1990
3        3  1991
4        7  1991
5        8  1991
6        6  1992
7        3  1992
8        5  1992
9        1  1992
10       4  1992
11       2  1993
12       8  1993
13       9  1993
14       4  1994
# Analyses for one year
k_cc_1990 = k_cc[k_cc['Year']==1990]
k_cc_1990 = k_cc_1990["degree"]
k_cc_1990 = k_cc_1990.values.tolist()

# Generate a configuration model
net_meas_random = pd.DataFrame(columns = ['cluscoef','avlen'])

for i in range(10):                                   
    cm = nx.configuration_model(k_cc_1990)                      
    cm = nx.Graph(cm)                              
    cm.remove_edges_from( nx.selfloop_edges(cm) )    
    net_meas_random.loc[i,'cluscoef'] = nx.average_clustering(cm)
    Gcc_cm = sorted(nx.connected_components(cm), key=len, reverse=True )   
    H_cm = cm.subgraph(Gcc_cm[0]).copy()
    net_meas_random.loc[i,'avlen'] = nx.average_shortest_path_length(H_cm)

results = {'Mean_Clus_Coeff':  [net_meas_random['cluscoef'].mean()],
        'StdDev_Clus_Coeff': [net_meas_random['cluscoef'].std()],
        'Mean_ave_short_path_leng':  [net_meas_random['avlen'].mean()],
        'StdDev_ave_short_path_leng': [net_meas_random['avlen'].std()],
        'Year': [1990]}
results = pd.DataFrame(results)

Many thanks in advance for any tips!

  • Did you provide the actual data? nx.configuration_model() needs a degree sequence whose sum is even. Commented Jul 29, 2022 at 9:05
  • There is a typo in the degree values of your example. In addition, running your code I get the following error: Invalid degree sequence: sum of degrees must be even, not odd. In which part of your code do the different lengths cause a problem? I did not see any hard-coded values at first glance. Commented Jul 29, 2022 at 9:07
  • Sorry, I had created a toy dataset that looked like mine, and I hadn't checked if it ran. I edited the question, and this new sample dataset runs. Commented Jul 29, 2022 at 9:13

1 Answer

If your second code example works for every given year you could do the following:

1. Define a function that does your analysis:

def eval_seq(data, year):
    k_cc = data
    # Put the single-year analysis code from the question here,
    # replacing the hard-coded 1990 with year.
    return results

2. Call your function in a loop:

results={} # dict for storing all results
for year in sorted(list(set(k_cc['Year']))): #get a List of all years in your dataset
    results[year]=eval_seq(k_cc, year)

EDIT

I was not able to reproduce your error. However, the example data was still wrong. Please note the modifications below:

import networkx as nx
import pandas as pd

# Data
data = {'degree':  [4,4,6,3,7,8,6,3,5,1,5,2,8,8,4],
        'Year': [1990, 1990, 1990, 1991, 1991, 1991, 1992, 1992, 1992, 1992, 1992, 1993, 1993, 1993, 1994]}
k_cc = pd.DataFrame(data)

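Two numbers were changed so that every year's degree sum is even: in the question's data, the 1992 and 1993 sequences each summed to 19, which nx.configuration_model() rejects.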
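The even-sum requirement raised in the comments can be checked per year before any graph is built. The following is a minimal sketch, assuming the question's original toy data; the names k_cc_orig, degree_sums and odd_years are made up for this illustration.

```python
import pandas as pd

# Toy data exactly as posted in the question (before the two corrections)
k_cc_orig = pd.DataFrame({
    'degree': [4, 4, 6, 3, 7, 8, 6, 3, 5, 1, 4, 2, 8, 9, 4],
    'Year': [1990, 1990, 1990, 1991, 1991, 1991, 1992, 1992,
             1992, 1992, 1992, 1993, 1993, 1993, 1994],
})

# Sum the degrees within each year; every edge uses two stubs,
# so a valid degree sequence must have an even sum
degree_sums = k_cc_orig.groupby('Year')['degree'].sum()

# Years whose degree sum is odd cannot be passed to nx.configuration_model()
odd_years = degree_sums[degree_sums % 2 == 1].index.tolist()
print(odd_years)  # [1992, 1993]
```

Running the same check on real data before the loop would show which years need attention, rather than stopping at the first failing year.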

def eval_seq(data, year):
    # Degree sequence for the requested year (the question hard-coded 1990)
    degree_seq = data.loc[data['Year'] == year, 'degree'].tolist()

    # Generate 10 configuration models and record two measures for each
    net_meas_random = pd.DataFrame(columns=['cluscoef', 'avlen'])

    for i in range(10):
        cm = nx.configuration_model(degree_seq)
        cm = nx.Graph(cm)  # collapse parallel edges
        cm.remove_edges_from(list(nx.selfloop_edges(cm)))
        net_meas_random.loc[i, 'cluscoef'] = nx.average_clustering(cm)
        # Path length needs a connected graph, so use the largest component
        Gcc_cm = sorted(nx.connected_components(cm), key=len, reverse=True)
        H_cm = cm.subgraph(Gcc_cm[0]).copy()
        net_meas_random.loc[i, 'avlen'] = nx.average_shortest_path_length(H_cm)

    # Results are scalars instead of one-element lists
    results = {'Mean_Clus_Coeff': net_meas_random['cluscoef'].mean(),
               'StdDev_Clus_Coeff': net_meas_random['cluscoef'].std(),
               'Mean_ave_short_path_leng': net_meas_random['avlen'].mean(),
               'StdDev_ave_short_path_leng': net_meas_random['avlen'].std(),
               'Year': year}
    return results

I removed the [] around the values in your results dict to simplify it. There is no need for one-element lists.

results={} # dict for storing all results
for year in sorted(list(set(k_cc['Year'].values.tolist()))): #get a List of all years in your dataset
    results[year]=eval_seq(k_cc, year)
print(results)
df=pd.DataFrame(results)
df.head()

This will run without error and the result is also converted into a DataFrame.

                                1990      1991      1992      1993  1994
Mean_Clus_Coeff                  0.5       0.7  0.383333       0.4     0
StdDev_Clus_Coeff           0.527046  0.483046  0.279881  0.516398     0
Mean_ave_short_path_leng     1.16667       1.1       1.5       1.2     0
StdDev_ave_short_path_leng  0.175682  0.161015  0.105409  0.172133     0
Year                            1990      1991      1992      1993  1994
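As an alternative layout, here is a minimal sketch that assumes the corrected toy data from this answer. pandas groupby hands over each year's rows directly, and collecting one dict per year gives a DataFrame with one row per year instead of one column per year. The helper name summarize_year and its n_models and seed parameters are hypothetical and introduced only for this illustration; the seed just makes runs repeatable.

```python
import networkx as nx
import pandas as pd

def summarize_year(degrees, n_models=10, seed=0):
    """Hypothetical helper: summary statistics over n_models random graphs."""
    rows = []
    for i in range(n_models):
        # Simple graph from a configuration model: drop parallel edges and self-loops
        cm = nx.Graph(nx.configuration_model(degrees, seed=seed + i))
        cm.remove_edges_from(list(nx.selfloop_edges(cm)))
        # Path length is computed on the largest connected component only
        largest = max(nx.connected_components(cm), key=len)
        rows.append({
            'cluscoef': nx.average_clustering(cm),
            'avlen': nx.average_shortest_path_length(cm.subgraph(largest)),
        })
    meas = pd.DataFrame(rows)
    return {
        'Mean_Clus_Coeff': meas['cluscoef'].mean(),
        'StdDev_Clus_Coeff': meas['cluscoef'].std(),
        'Mean_ave_short_path_leng': meas['avlen'].mean(),
        'StdDev_ave_short_path_leng': meas['avlen'].std(),
    }

# Corrected toy data (every year's degree sum is even)
k_cc = pd.DataFrame({
    'degree': [4, 4, 6, 3, 7, 8, 6, 3, 5, 1, 5, 2, 8, 8, 4],
    'Year': [1990, 1990, 1990, 1991, 1991, 1991, 1992, 1992,
             1992, 1992, 1992, 1993, 1993, 1993, 1994],
})

records = []
for year, group in k_cc.groupby('Year'):
    stats = summarize_year(group['degree'].tolist())
    stats['Year'] = year
    records.append(stats)

# One row per year, indexed by Year
summary = pd.DataFrame(records).set_index('Year')
print(summary)
```

With one row per year, the summary can be filtered, plotted, or merged on Year like any other DataFrame. The exact numbers depend on the random graphs and will differ from the table above.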

4 Comments

I ran it, but I get an error: Unsupported operand type(s) for +: 'int' and 'str'.
In which part of your code did you get this error?
In the last part, when I call the function in a loop.
Thank you so much! I've run it on the original code and it works perfectly (and fast) :)
