Declaring numpy array and c pointer in cython

Question

In my code I usually use numpy arrays to interface between methods and classes. To optimize the core parts of my program, I use Cython with C pointers to the data of those numpy arrays. Unfortunately, the way I'm currently declaring the arrays is quite long.

For example, let's say I have a method that should return a numpy array someArrayNumpy, but inside the function a pointer *someArrayPointers should be used for speed. This is how I usually declare this:

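cimport numpy   # compile-time types (numpy.ndarray, numpy.double_t)
import numpy    # runtime functions (numpy.zeros)

cdef:
    # Generic array object that is returned to the caller
    numpy.ndarray someArrayNumpy = numpy.zeros(someArraySize)
    # Typed buffer declaration referring to the same array
    numpy.ndarray[numpy.double_t, ndim=1] someArrayBuff = someArrayNumpy
    # Raw C pointer to the array's data
    double *someArrayPointers = <double *> someArrayBuff.data

[... some Code ...]

return someArrayNumpy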

As you can see, this takes up 3 lines of code for basically one array, and often I have to declare more of those arrays.

Is there a more compact/clever way to do this? I think I am missing something.

EDIT:

Because J. Martinot-Lagarde asked, I timed C pointers against "numpy pointers" (the typed buffer). The code was basically

for ii in range(someArraySize):
    someArrayPointers[ii] += 1

and

for ii in range(someArraySize):
    someArrayBuff[ii] += 1

with the definitions from above, but I added "ndim=1, mode='c'" just to make sure. Results are for someArraySize = 1e8 (time in ms):

testMartinot("cPointers")
531.276941299
testMartinot("numpyPointers")
498.730182648

That's what I roughly remember from previous/different benchmarks.

If anyone is reading this: I have since moved on to Cython's typed memoryviews. In my experience they are very close to C pointers in performance (closer than the numpy buffer is) and much easier to use. In fact, on some rare occasions I made "small" (and thus not easily recognizable or avoidable) mistakes with C pointers that made them slower than typed memoryviews. I really recommend typed memoryviews if and where possible.
– oli, Commented Jan 31, 2014 at 14:50
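
Cython's typed memoryviews (declared as, for example, double[::1]) acquire the array through the buffer protocol, the same mechanism behind Python's built-in memoryview. The following is only a plain-Python analogue, not Cython, so it shows the shared-memory behaviour and says nothing about speed. The function name increment_via_view is hypothetical.

```python
import numpy as np


def increment_via_view(size):
    """Allocate a float64 array and modify it through a buffer-protocol view."""
    arr = np.zeros(size, dtype=np.float64)
    # memoryview() does not copy: it exposes arr's own buffer.
    view = memoryview(arr)
    for i in range(size):
        view[i] += 1.0  # the write is visible through arr as well
    return arr


incremented = increment_via_view(4)
```

In Cython the view would be declared with a C type so that the loop compiles to direct memory access. Here the loop still runs at interpreter speed.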

J. Martinot-Lagarde · Accepted Answer · 2013-07-10 13:57:15Z

6

You're actually declaring two numpy array variables here: the first one is generic and the second one has a specific dtype. You can skip the first line, since someArrayBuff is already an ndarray.

This gives:

numpy.ndarray[numpy.double_t] someArrayNumpy = numpy.zeros(someArraySize)
double *someArrayPointers = <double *> someArrayNumpy.data

You need at least two lines because you use someArrayPointers and return someArrayNumpy, so both have to be declared.
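
That the pointer and the returned array refer to the same memory can be checked from plain Python. This is only a sketch: it uses ctypes instead of Cython, so it illustrates the aliasing but not the performance. The function name fill_through_pointer is hypothetical and does not appear in the question.

```python
import ctypes

import numpy as np


def fill_through_pointer(size):
    """Allocate a float64 array and write to it through a raw C pointer."""
    # The single allocation; this object is what the caller receives.
    arr = np.zeros(size, dtype=np.float64)
    # A double* aliasing arr's buffer, analogous to
    # <double *> someArrayNumpy.data in Cython. Nothing is copied.
    ptr = arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    for i in range(size):
        ptr[i] += 1.0  # writes land directly in arr's memory
    return arr


result = fill_through_pointer(5)
```

The array is returned without any copy back from the pointer, which is why declaring one typed array plus one pointer is enough.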

As a side note, are you sure that pointers are faster than ndarrays if you declare the type and the number of dimensions of the array?

numpy.ndarray[numpy.double_t, ndim=1] someArrayNumpy = numpy.zeros(someArraySize)


1 Comment

oli Over a year ago

Thank you for your answer. I somehow thought that the numpy.dtype_t stuff was a necessary buffer. By the way, I added some timings above to justify the use of C pointers. It's not much, but in my case a general >5% speed-up is worth the effort.
