
In my code I usually use numpy arrays to interface between methods and classes. To optimize the core parts of my program, I use Cython with C pointers into those numpy arrays. Unfortunately, the way I'm currently declaring the arrays is quite long.

For example, let's say I have a method which should return a numpy array someArrayNumpy, but inside the function a pointer *someArrayPointers should be used for speed. This is how I usually declare this:

cdef:
    numpy.ndarray someArrayNumpy = numpy.zeros(someArraySize)
    numpy.ndarray[numpy.double_t, ndim=1] someArrayBuff = someArrayNumpy
    double *someArrayPointers = <double *> someArrayBuff.data

[... some Code ...]

return someArrayNumpy

As you can see, this takes up 3 lines of code for basically one array, and often I have to declare more of those arrays.

Is there a more compact/clever way to do this? I think I am missing something.

EDIT:

So because it was asked by J. Martinot-Lagarde I timed C pointers and "numpy pointers". The code was basically

for ii in range(someArraySize):
    someArrayPointers[ii] += 1

and

for ii in range(someArraySize):
    someArrayBuff[ii] += 1

with the definitions from above, but I added "ndim=1, mode='c'" just to make sure. Results are for someArraySize = 1e8 (time in ms):

testMartinot("cPointers")
531.276941299
testMartinot("numpyPointers")
498.730182648

That's what I roughly remember from previous/different benchmarks.

  • If anyone is reading this: I have since moved on to using Cython's typed memoryviews. In my experience they are very close to C pointers in performance (closer than to the numpy buffer) and much easier to use. In fact, on some rare occasions I made "small" (thus not easily recognizable/avoidable) mistakes with C pointers which made them slower than the typed memoryviews. I really recommend typed memoryviews if and where possible. Commented Jan 31, 2014 at 14:50
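For readers who want to see what the typed-memoryview variant mentioned in this comment might look like, here is a minimal sketch. The function name make_incremented_view is hypothetical and only for illustration; someArraySize and someArrayNumpy follow the names used in the question. It assumes a 1-D, C-contiguous array of doubles, which is what numpy.zeros returns by default.

```cython
import numpy

def make_incremented_view(Py_ssize_t someArraySize):
    # Hypothetical example function, not part of the original question.
    cdef Py_ssize_t ii
    # double[::1] declares a 1-D, C-contiguous typed memoryview of doubles.
    cdef double[::1] someArrayView

    someArrayNumpy = numpy.zeros(someArraySize)
    # The memoryview shares memory with the numpy array; no copy is made.
    someArrayView = someArrayNumpy

    for ii in range(someArraySize):
        someArrayView[ii] += 1

    # Return the numpy array, which reflects the writes made through the view.
    return someArrayNumpy
```

With this approach there is one typed declaration per array, and no cast to a raw pointer is needed. Whether it matches pointer speed in a given program is something to measure rather than assume.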

1 Answer

You're actually declaring the numpy array twice here: the first declaration is generic and the second one has a specific dtype. You can skip the first line, because someArrayBuff is already an ndarray.

This gives:

numpy.ndarray[numpy.double_t] someArrayNumpy = numpy.zeros(someArraySize)
double *someArrayPointers = <double *> someArrayNumpy.data

You need at least two lines because you're using someArrayPointers and returning someArrayNumpy so you have to declare them.
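Put into a complete function, the two-line version could look roughly like the sketch below. The function name make_incremented is hypothetical, and the loop body is just the increment from the question's timing test standing in for the real work. It assumes numpy is both imported and cimported under the name numpy, as the declarations in the question imply.

```cython
import numpy
cimport numpy

def make_incremented(Py_ssize_t someArraySize):
    # Hypothetical example function, not part of the original question.
    # Typed ndarray: this is the object handed back to the Python caller.
    cdef numpy.ndarray[numpy.double_t, ndim=1, mode='c'] someArrayNumpy = numpy.zeros(someArraySize)
    # Raw pointer into the same memory, used inside the hot loop.
    cdef double *someArrayPointers = <double *> someArrayNumpy.data
    cdef Py_ssize_t ii

    for ii in range(someArraySize):
        someArrayPointers[ii] += 1

    return someArrayNumpy
```

The pointer is only valid while someArrayNumpy is alive and not reallocated, so it should not be stored beyond the function call.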


As a side note, are you sure that pointers are faster than ndarrays if you declare the type and the number of dimensions of the array?

numpy.ndarray[numpy.double_t, ndim=1] someArrayNumpy = numpy.zeros(someArraySize)

1 Comment

Thank you for your answer, I somehow thought that the numpy.dtype_t stuff is a necessary buffer. Btw, I added some timings above to justify the use of C pointers. It's not much, but in my case a general >5% speed-up is worth the effort.
