
I'm trying to eliminate the bottleneck in my application, which is an elementwise sum of two matrices.

I'm using NumPy and Cython. I have a cdef class with a matrix attribute. Since Cython still doesn't support typed buffer arrays as class attributes, I followed this suggestion and tried to use a pointer to the data attribute of the matrix instead. The thing is, I'm clearly doing something wrong, as the results indicate.

What I tried to do is more or less the following:

cimport numpy as np

ctypedef np.float64_t float_t   # assuming the matrix holds 64-bit floats

cdef class the_class:
    cdef np.ndarray the_matrix
    cdef float_t* the_matrix_p

    def __init__(self):
        self.the_matrix_p = <float_t*> self.the_matrix.data

    cpdef the_function(self):
        other_matrix = self.get_other_matrix()
        self.the_matrix_p += other_matrix.data
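
In plain NumPy terms, what the code above is meant to do is nothing more than an in-place elementwise add; here is a minimal sketch, with other_matrix standing in for whatever get_other_matrix() returns:

import numpy as np

# Sketch of the intended operation in pure NumPy: add other_matrix into
# the_matrix elementwise, in place (shapes and dtypes assumed to match).
the_matrix = np.zeros((1000, 1000))
other_matrix = np.random.rand(1000, 1000)

np.add(the_matrix, other_matrix, out=the_matrix)   # same as: the_matrix += other_matrix
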
  • So, what's the problem? What error are you getting? Commented Jan 25, 2013 at 12:54

2 Answers


I have serious doubts that adding two numpy arrays is a bottleneck you can solve by rewriting things in C. See the following code, which uses scipy.weave:

import numpy as np
from scipy.weave import inline

a = np.random.rand(10000000)
b = np.random.rand(10000000)
c = np.empty((10000000,))

def c_sum(a, b, c):
    length = a.shape[0]
    # Plain C loop computing the elementwise sum c = a + b
    code = '''
           for(int j = 0; j < length; j++)
           {
               c[j] = a[j] + b[j];
           }
           '''
    inline(code, ['a', 'b', 'c', 'length'])
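
A quick sanity check, assuming the arrays defined above (the very first call is also what triggers the C compilation):

c_sum(a, b, c)                  # first call compiles and caches the C code
assert np.allclose(c, a + b)    # the C loop matches the NumPy result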

After running c_sum(a, b, c) once to get the C code compiled, these are the timings I get:

In [12]: %timeit c_sum(a, b, c)
10 loops, best of 3: 33.5 ms per loop

In [16]: %timeit np.add(a, b, out=c)
10 loops, best of 3: 33.6 ms per loop

So, unless the timing difference is simply random noise, you are looking at roughly a 0.3% performance improvement on an operation that takes a few tens of milliseconds for arrays of ten million elements. If this really is your bottleneck, rewriting it in C is hardly going to solve it.


1 Comment

Yeah, I think you are right. After a couple of measurements, I came to the conclusion that my code is as fast as it can be in Python.

Try compiling ATLAS and recompiling numpy against it. This probably won't help with plain addition, but you can get a really nice performance boost with more complicated matrix operations (if you use any, of course).

Check out this simple benchmark. If your results fall too far from those given in the post, your numpy is probably not linked against an optimized BLAS implementation.
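
As a quick check (this is just numpy's standard introspection call, not something from the benchmark post), you can print the build configuration and see which BLAS/LAPACK libraries numpy was compiled against:

import numpy as np

# Lists the BLAS/LAPACK libraries numpy was built against; if no optimized
# implementation (ATLAS, OpenBLAS, MKL, ...) shows up here, numpy is falling
# back to its slower bundled routines.
np.show_config()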

