I need some help getting started with running parallel code in Python. I don't think I can share executable code for my problem, but you can still help me solve the issue conceptually.
I have written a function that takes a pandas DataFrame row as input. The function makes some calculations and returns a row for a pandas DataFrame that has different column names than the input.
So far I have been calling it in a for loop, passing in the rows one at a time and appending each returned row to the new DataFrame:
new_df = pd.DataFrame(columns=['1', '2', 'unique', 'occurence', 'timediff', 'ueid'], dtype='float')
for i in range(small_pd.shape[0]):  # small_pd is the input DataFrame
    new_df = new_df.append(SequencesExtractTime(small_pd.loc[i]))
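In case a runnable miniature helps, here is a self-contained sketch of the sequential pattern, with a dummy stand-in for SequencesExtractTime (I can't share the real function); I collect the rows and use pd.concat at the end, since DataFrame.append is deprecated in recent pandas:

```python
import pandas as pd

# Dummy stand-in for SequencesExtractTime: takes one input row (a Series)
# and returns a one-row DataFrame with different column names.
def SequencesExtractTime(row):
    return pd.DataFrame([{'unique': row['a'] * 2, 'timediff': row['b'] - row['a']}])

small_pd = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4.0, 5.0, 6.0]})

# Sequential version: apply the function row by row, then merge the results.
pieces = []
for i in range(small_pd.shape[0]):
    pieces.append(SequencesExtractTime(small_pd.loc[i]))
new_df = pd.concat(pieces, ignore_index=True)
```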
Now I want to run this code in parallel. I have found joblib and the multiprocessing package:
from joblib import Parallel, delayed
import multiprocessing
num_cores = multiprocessing.cpu_count()
results = Parallel(n_jobs=num_cores)(SequencesExtractTime(small_pd.loc)(i) for i in range(0,small_pd.shape[0]))
but unfortunately this does not run, since I do not know how to declare that the input should be the individual rows of the DataFrame.
Can you please help me achieve this kind of parallelization in Python? The inputs are the rows of a DataFrame, and the outputs are rows that need to be merged together into a new DataFrame.
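For concreteness, here is a minimal self-contained sketch of the pattern I believe I am after, again with a dummy stand-in for SequencesExtractTime: joblib's delayed wraps the function and I pass each row as the argument, then merge the returned rows at the end. I would appreciate confirmation that this is the right shape:

```python
import multiprocessing

import pandas as pd
from joblib import Parallel, delayed

# Dummy stand-in for SequencesExtractTime: one input row in, one-row DataFrame out.
def SequencesExtractTime(row):
    return pd.DataFrame([{'unique': row['a'] * 2, 'timediff': row['b'] - row['a']}])

small_pd = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4.0, 5.0, 6.0]})

num_cores = multiprocessing.cpu_count()

# delayed(f)(arg) captures the call without running it; Parallel then
# executes the captured calls across worker processes.
results = Parallel(n_jobs=num_cores)(
    delayed(SequencesExtractTime)(small_pd.loc[i]) for i in range(small_pd.shape[0])
)

# results is a list of one-row DataFrames, in input order; merge them.
new_df = pd.concat(results, ignore_index=True)
```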
Thanks a lot
Regards
Alex