Optimize Python: Large arrays, memory problems

Question

I'm having a speed problem running a python / numypy code. I don't know how to make it faster, maybe someone else?

Assume there is a surface with two triangulation, one fine (..._fine) with M points, one coarse with N points. Also, there's data on the coarse mesh at every point (N floats). I'm trying to do the following:

For every point on the fine mesh, find the k closest points on coarse mesh and get mean value. Short: interpolate data from coarse to fine.

My code right now goes like that. With large data (in my case M = 2e6, N = 1e4) the code runs about 25 minutes, guess due to the explicit for loop not going into numpy. Any ideas how to solve that one with smart indexing? M x N arrays blowing the RAM..

import numpy as np

p_fine.shape => m x 3
p.shape => n x 3

data_fine = np.empty((m,))
for i, ps in enumerate(p_fine):
    data_fine[i] = np.mean(data_coarse[np.argsort(np.linalg.norm(ps-p,axis=1))[:k]])

Cheers!

is there a reason why you cannot use nearest neighbors regression in sklearn? probably much more efficient than doing it by hand. — benten
– benten, Commented Sep 28, 2016 at 14:15
I think numpy is not the good module to do that kind of things, because the loop over the point of the fine mesh cannot be vectorized. If you need to code it by hand, I suggest using Cython and working with explicit for loops. — Bertrand Gazanion
– Bertrand Gazanion, Commented Sep 28, 2016 at 14:24
If I understand you correctly, p and p_fine are meshes. Since a mesh is usually structured, it would be a lot faster if you switched to a different data structure (e.g. a kD-tree) where searching for spatial data is fast. — Hannes Ovrén
– Hannes Ovrén, Commented Sep 28, 2016 at 15:31

benten · Accepted Answer · 2016-09-29 16:46:42Z

3

First of all thanks for the detailed help.

First, Divakar, your solutions gave substantial speed-up. With my data, the code ran for just below 2 minutes depending a bit on the chunk size.

I also tried my way around sklearn and ended up with

def sklearnSearch_v3(p, p_fine, k):
    neigh = NearestNeighbors(k)
    neigh.fit(p)
    return data_coarse[neigh.kneighbors(p_fine)[1]].mean(axis=1)

which ended up being quite fast, for my data sizes, I get the following

import numpy as np
from sklearn.neighbors import NearestNeighbors

m,n = 2000000,20000
p_fine = np.random.rand(m,3)
p = np.random.rand(n,3)
data_coarse = np.random.rand(n)
k = 3

yields

%timeit sklearv3(p, p_fine, k)
1 loop, best of 3: 7.46 s per loop

edited Sep 29, 2016 at 16:46

benten

1,9892 gold badges24 silver badges38 bronze badges

answered Sep 29, 2016 at 10:06

Max

3571 silver badge15 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Divakar Over a year ago

This seems to be the better one! Good job on researching these.

Divakar · Accepted Answer · 2016-09-28 17:29:40Z

Approach #1

We are working with large sized datasets and memory is an issue, so I will try to optimize the computations within the loop. Now, we can use np.einsum to replace np.linalg.norm part and np.argpartition in place of actual sorting with np.argsort, like so -

out = np.empty((m,))
for i, ps in enumerate(p_fine):
    subs = ps-p
    sq_dists = np.einsum('ij,ij->i',subs,subs)
    out[i] = data_coarse[np.argpartition(sq_dists,k)[:k]].sum()
out = out/k

Approach #2

Now, as another approach we can also use Scipy's cdist for a fully vectorized solution, like so -

from scipy.spatial.distance import cdist
out = data_coarse[np.argpartition(cdist(p_fine,p),k,axis=1)[:,:k]].mean(1)

But, since we are memory bound here, we can perform these operations in chunks. Basically, we would get chunks of rows from that tall array p_fine that has millions of rows and use cdist and thus at each iteration get chunks of output elements instead of just one scalar. With this, we would cut the loop count by the length of that chunk.

So, finally we would have an implementation like so -

out = np.empty((m,))
L = 10 # Length of chunk (to be used as a param)
num_iter = m//L
for j in range(num_iter):
    p_fine_slice = p_fine[L*j:L*j+L]
    out[L*j:L*j+L] = data_coarse[np.argpartition(cdist\
                           (p_fine_slice,p),k,axis=1)[:,:k]].mean(1)

Runtime test

Setup -

# Setup inputs
m,n = 20000,100
p_fine = np.random.rand(m,3)
p = np.random.rand(n,3)
data_coarse = np.random.rand(n)
k = 5

def original_approach(p,p_fine,m,n,k):
    data_fine = np.empty((m,))
    for i, ps in enumerate(p_fine):
        data_fine[i] = np.mean(data_coarse[np.argsort(np.linalg.norm\
                                                 (ps-p,axis=1))[:k]])
    return data_fine

def proposed_approach(p,p_fine,m,n,k):    
    out = np.empty((m,))
    for i, ps in enumerate(p_fine):
        subs = ps-p
        sq_dists = np.einsum('ij,ij->i',subs,subs)
        out[i] = data_coarse[np.argpartition(sq_dists,k)[:k]].sum()
    return out/k

def proposed_approach_v2(p,p_fine,m,n,k,len_per_iter):
    L = len_per_iter
    out = np.empty((m,))    
    num_iter = m//L
    for j in range(num_iter):
        p_fine_slice = p_fine[L*j:L*j+L]
        out[L*j:L*j+L] = data_coarse[np.argpartition(cdist\
                               (p_fine_slice,p),k,axis=1)[:,:k]].sum(1)
    return out/k

Timings -

In [134]: %timeit original_approach(p,p_fine,m,n,k)
1 loops, best of 3: 1.1 s per loop

In [135]: %timeit proposed_approach(p,p_fine,m,n,k)
1 loops, best of 3: 539 ms per loop

In [136]: %timeit proposed_approach_v2(p,p_fine,m,n,k,len_per_iter=100)
10 loops, best of 3: 63.2 ms per loop

In [137]: %timeit proposed_approach_v2(p,p_fine,m,n,k,len_per_iter=1000)
10 loops, best of 3: 53.1 ms per loop

In [138]: %timeit proposed_approach_v2(p,p_fine,m,n,k,len_per_iter=2000)
10 loops, best of 3: 63.8 ms per loop

So, there's about 2x improvement with the first proposed approach and 20x over the original approach with the second one at the sweet spot with the len_per_iter param set at 1000. Hopefully this will bring down your 25 minutes runtime to little over a minute. Not bad I guess!

Collectives™ on Stack Overflow

Optimize Python: Large arrays, memory problems

2 Answers 2

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related