My pandas/numpy is rusty, and the code I have written feels inefficient.
I'm initializing a NumPy array of 1000 zeros in Python 3.x. For my purposes the values are simply integer counts (np.zeros defaults to float64, which is why the outputs below print as 0.0, 1.0, and so on):
import numpy as np
array_of_zeros = np.zeros((1000,))
I also have the following DataFrame (which is much smaller than my actual data)
import pandas as pd
dict1 = {'start' : [100, 200, 300], 'end':[400, 500, 600]}
df = pd.DataFrame(dict1)
print(df)
##
##    start  end
## 0    100  400
## 1    200  500
## 2    300  600
The DataFrame has two columns, start and end. Each row represents a range of integers, and start is always smaller than end. Above, the first row is the range 100-400, the second is 200-500, and the third is 300-600.
My goal is to iterate through the pandas DataFrame row by row and increment the numpy array array_of_zeros at the index positions each row covers. For example, for a row with start 10 and end 20, I would like to add 1 to every element at indices 10 through 20 (both ends included).
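To make the inclusive range concrete, here is a minimal sketch for a single range. The values 10 and 20 are a hypothetical row that is not in df above, and using a slice instead of an inner loop is my assumption about what should be equivalent for one row:

```python
import numpy as np

array_of_zeros = np.zeros(1000)

# Hypothetical single row (start=10, end=20); not part of df above.
start, end = 10, 20

# A slice excludes its stop index, so end + 1 makes the range inclusive.
array_of_zeros[start:end + 1] += 1

# Indices 9 and 21 lie just outside the range, 10 and 20 just inside.
print(array_of_zeros[9], array_of_zeros[10], array_of_zeros[20], array_of_zeros[21])
## output: 0.0 1.0 1.0 0.0
```

This only removes the inner loop for one row; the row-by-row iteration over the DataFrame would still be there.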
Here is the code which does what I would like:
import numpy as np
array_of_zeros = np.zeros((1000,))
import pandas as pd
dict1 = {'start' : [100, 200, 300], 'end':[400, 500, 600]}
df = pd.DataFrame(dict1)
print(df)
for idx, row in df.iterrows():
    for i in range(int(row.start), int(row.end)+1):
        array_of_zeros[i]+=1
And it works!
print(array_of_zeros[15])
## output: 0.0
print(array_of_zeros[600])
## output: 1.0
print(array_of_zeros[400])
## output: 3.0
print(array_of_zeros[100])
## output: 1.0
print(array_of_zeros[200])
## output: 2.0
My question: this code is very clumsy! I shouldn't need so many for-loops with numpy arrays, and this solution will be very slow if the input DataFrame is large.
Is there a more efficient (i.e. more numpy-based) method that avoids this for-loop?
    for i in range(int(row.start), int(row.end)+1):
        array_of_zeros[i]+=1
Perhaps there is a pandas-oriented solution?
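One direction that might work is a "difference array": mark +1 where each range opens and -1 just past where it closes, then take a running sum. The sketch below is an untested-at-scale idea rather than a known best answer. It assumes every end is below the array length and that integer counts are acceptable; the names n, diff, starts, ends and counts are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'start': [100, 200, 300], 'end': [400, 500, 600]})

n = 1000  # length of the target array (assumed larger than every end value)

# One extra slot so that end + 1 is a valid index even when end == n - 1.
diff = np.zeros(n + 1, dtype=np.int64)

starts = df['start'].to_numpy()
ends = df['end'].to_numpy()

# np.add.at accumulates correctly when the same index occurs more than once,
# which a plain diff[starts] += 1 would not do.
np.add.at(diff, starts, 1)      # a range opens at start
np.add.at(diff, ends + 1, -1)   # and closes just past end (end is inclusive)

# The running sum of the +1/-1 markers is the number of ranges covering each index.
counts = np.cumsum(diff)[:n]

print(counts[[15, 100, 200, 400, 600]])
## output: [0 1 2 3 1]
```

These are the same values the nested loops give at those indices, but as integers rather than floats. The work is proportional to the number of rows plus the array length, instead of the total length of all ranges.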