I have a pretty simple operation involving two not so large arrays:
- For every element in the first (larger) array, located in position i
- Find if it exists in the second (smaller) array
- If it does, find its index in the second array: j
- Store a float taken from a third array (same length as first array) in the position i, in the position j of a fourth array (same length as second array)
The for block below works, but gets very slow for not so large arrays (>10000).
Can this implementation be made faster?
import numpy as np
import random
##############################################
# Generate some random data.
# 'Nb' is always smaller than 'Na'
Na, Nb = 50000, 40000
# List of IDs (could be any string, I use integers here for simplicity)
ids_a = random.sample(range(1, Na * 10), Na)
ids_a = [str(_) for _ in ids_a]
random.shuffle(ids_a)
# Some floats associated to these IDs
vals_in_a = np.random.uniform(0., 1., Na)
# Smaller list of repeated IDs from 'ids_a'
ids_b = random.sample(ids_a, Nb)
# Array to be filled
vals_in_b = np.zeros(Nb)
##############################################
# This block needs to be *a lot* more efficient
#
# For each string in 'ids_a'
for i, id_a in enumerate(ids_a):
    # if it exists in 'ids_b'
    if id_a in ids_b:
        # find where in 'ids_b' this element is located
        j = ids_b.index(id_a)
        # store in that position the value taken from 'vals_in_a'
        vals_in_b[j] = vals_in_a[i]
Build a dictionary that maps the elements of ids_b to their index, then just use that dictionary instead of checking membership on an array.
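A minimal sketch of the dictionary idea, assuming the IDs are hashable (strings, as in the question). The function name fill_vals_in_b and the variable index_in_b are made up for illustration; they are not from the original code. Both "id_a in ids_b" and "ids_b.index(id_a)" scan the list from the start, so the loop in the question does roughly Na * Nb comparisons, while a dictionary lookup takes constant time on average.

```python
import numpy as np


def fill_vals_in_b(ids_a, vals_in_a, ids_b):
    """Copy vals_in_a[i] into position j of a new array, where ids_b[j] == ids_a[i]."""
    # Build the ID -> index lookup once. setdefault keeps the first
    # position of a repeated ID, which matches what list.index returns.
    index_in_b = {}
    for j, id_b in enumerate(ids_b):
        index_in_b.setdefault(id_b, j)

    vals_in_b = np.zeros(len(ids_b))
    for i, id_a in enumerate(ids_a):
        # One average-case O(1) lookup replaces both the membership
        # test and the index() call of the original loop.
        j = index_in_b.get(id_a)
        if j is not None:
            vals_in_b[j] = vals_in_a[i]
    return vals_in_b
```

With the data from the question this would be called as vals_in_b = fill_vals_in_b(ids_a, vals_in_a, ids_b). IDs in ids_b that never appear in ids_a keep the value 0, the same as in the original loop.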