
I have a C++ function computing a large tensor which I would like to return to Python as a NumPy array via pybind11.

From the pybind11 documentation, it seems that using an STL unique_ptr is desirable. In the following example, the commented-out version works, whereas the given one compiles but fails at runtime ("Unable to convert function return value to a Python type!").

Why is the smartpointer version failing? What is the canonical way to create and return a NumPy array?

PS: Due to the program structure and the size of the array, it is desirable not to copy memory but to create the array from a given pointer. Memory ownership should be taken over by Python.

typedef typename py::array_t<double, py::array::c_style | py::array::forcecast> py_cdarray_t;

// py_cd_array_t _test()
std::unique_ptr<py_cdarray_t> _test()
{
    double * memory = new double[3]; memory[0] = 11; memory[1] = 12; memory[2] = 13;
    py::buffer_info bufinfo (
        memory,                                   // pointer to memory buffer
        sizeof(double),                           // size of underlying scalar type
        py::format_descriptor<double>::format(),  // python struct-style format descriptor
        1,                                        // number of dimensions
        { 3 },                                    // buffer dimensions
        { sizeof(double) }                        // strides (in bytes) for each index
    );

    //return py_cdarray_t(bufinfo);
    return std::unique_ptr<py_cdarray_t>( new py_cdarray_t(bufinfo) );
}

3 Answers


A few comments (then a working implementation).

  • pybind11's C++ object wrappers around Python types (like pybind11::object, pybind11::list, and, in this case, pybind11::array_t<T>) are really just wrappers around an underlying Python object pointer. In this respect they are already taking on the role of a shared pointer wrapper, so there's no point in wrapping one in a unique_ptr: returning the py::array_t<T> object directly is already essentially just returning a glorified pointer.
  • pybind11::array_t can be constructed directly from a data pointer, so you can skip the py::buffer_info intermediate step and just give the shape and strides directly to the pybind11::array_t constructor. A numpy array constructed this way won't own its own data, it'll just reference it (that is, the numpy owndata flag will be set to false).
  • Memory ownership can be tied to the life of a Python object, but you're still on the hook for doing the deallocation properly. pybind11 provides a py::capsule class to help you do exactly this. What you want to do is make the numpy array depend on this capsule as its base object by specifying it as the base argument to array_t. That will make the numpy array reference it, keeping it alive as long as the array itself is alive, and invoke the cleanup function when the capsule is no longer referenced.
  • The c_style flag in the older (pre-2.2) releases only had an effect on new arrays, i.e. when not passing a value pointer. That was fixed in the 2.2 release to also affect the automatic strides if you specify only shapes but not strides. It has no effect at all if you specify the strides directly yourself (as I do in the example below).

So, putting the pieces together, this code is a complete pybind11 module that demonstrates how you can accomplish what you're looking for (and includes some C++ output to demonstrate that it is indeed working correctly):

#include <iostream>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

PYBIND11_PLUGIN(numpywrap) {
    py::module m("numpywrap");
    m.def("f", []() {
        // Allocate and initialize some data; make this big so
        // we can see the impact on the process memory use:
        constexpr size_t size = 100*1000*1000;
        double *foo = new double[size];
        for (size_t i = 0; i < size; i++) {
            foo[i] = (double) i;
        }

        // Create a Python object that will free the allocated
        // memory when destroyed:
        py::capsule free_when_done(foo, [](void *f) {
            double *foo = reinterpret_cast<double *>(f);
            std::cerr << "Element [0] = " << foo[0] << "\n";
            std::cerr << "freeing memory @ " << f << "\n";
            delete[] foo;
        });

        return py::array_t<double>(
            {100, 1000, 1000}, // shape
            {1000*1000*8, 1000*8, 8}, // C-style contiguous strides for double
            foo, // the data pointer
            free_when_done); // numpy array references this parent
    });
    return m.ptr();
}

Compiling that and invoking it from Python shows it working:

>>> import numpywrap
>>> z = numpywrap.f()
>>> # the python process is now taking up a bit more than 800MB memory
>>> z[1,1,1]
1001001.0
>>> z[0,0,100]
100.0
>>> z[99,999,999]
99999999.0
>>> z[0,0,0] = 3.141592
>>> del z
Element [0] = 3.14159
freeing memory @ 0x7fd769f12010
>>> # python process memory size has dropped back down

6 Comments

For anybody who comes across this now, creating and assigning the capsule object is no longer necessary.
@Nimitz14 What do you mean precisely by "is no longer necessary"? It seems by default the array_t ctor now copies the data unless you specify a base. Getting non-copy semantics still seems to require a capsule.
Is it unnecessary to define a move return policy via return_value_policy::move in this case? Wondering if the returned array_t will be copied or not.
providing a base argument py::capsule to py::array_t<float> constructor as py::array_t<float>(fsv_size, fsv, capsule) fails with Windows fatal exception: code 0xc0000374 (Heap Corruption) on Windows while works without problems on unix and macos, more details here. Any suggestions why that's the case?
@Nimitz14 I know this was a long time ago, but any idea what the current recommended approach for returning NumPy array would be?

Since the accepted answer seems to be out of date, I'll give a quick example of how to implement it with the current API. The key thing to keep in mind is that if we are constructing an array "from scratch", we don't need to manually allocate any data, we just need to give the py::array_t constructor our desired shape and strides, and it will manage its own memory.

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

PYBIND11_MODULE(numpywrap, m) {
    m.def("f", []() {
        constexpr size_t elsize = sizeof(double);
        size_t shape[3]{100, 1000, 1000};
        size_t strides[3]{1000 * 1000 * elsize, 1000 * elsize, elsize};
        auto a = py::array_t<double>(shape, strides);
        auto view = a.mutable_unchecked<3>();
        
        for (py::ssize_t i = 0; i < a.shape(0); i++) {
          for (py::ssize_t j = 0; j < a.shape(1); j++) {
            for (py::ssize_t k = 0; k < a.shape(2); k++) {
              view(i, j, k) = whatever_data_should_go_here(i, j, k);
            }
          }
        }
        return a;
    });
}

4 Comments

This works great! I'm just popping in to comment for posterity that you need to explicitly initialise the shape and strides variables (as shown here). pybind11 gets mad if you try to pass the initialiser lists directly to the py::array_t constructor.
How does this work if you don't know the number of dimensions in the array ahead of time. The solution is for a 3-way array. What does the generic solution look like for a k-way array, where k is a runtime integer that might be 1, 2, 3, 4, ...?
@StevenScott see the header: github.com/pybind/pybind11/blob/… it takes a generic container in the constructor argument, so you could for example use std::vector for shapes and strides
Thank you @AwkwardWhale. I think my question was ambiguous. I see how to build the array object using the various constructors. What I intended to ask about was the utility of a view object where you reference data as view(i, j, k, m) when you don't know at compile time how many indices you need. The view object was promoted in this answer as a convenient way to copy data into the newly constructed array, but I don't see how to use it in the generic setting where the number of dimensions is unknown. My solution was to work out indexing myself. Could view have made it easier?

I recommend using ndarray. A foundational principle is that the underlying data is never copied unless explicitly requested (or you quickly end up with huge inefficiencies). Below is an example of it in use, but there are other features I haven't shown, including conversion to Eigen arrays (ndarray::asEigen(array)), which makes it pretty powerful.

Header:

#ifndef MYTENSORCODE_H
#define MYTENSORCODE_H

#include "ndarray_fwd.h"

namespace myTensorNamespace {

ndarray::Array<double, 2, 1> myTensorFunction(int param1, double param2);

}  // namespace myTensorNamespace

#endif  // include guard

Lib:

#include "ndarray.h"
#include "myTensorCode.h"

namespace myTensorNamespace {

ndarray::Array<double, 2, 1> myTensorFunction(int param1, double param2) {
    std::size_t const size = calculateSize();
    ndarray::Array<double, 2, 1> array = ndarray::allocate(size, size);
    array.deep() = 0;  // initialise
    for (std::size_t ii = 0; ii < size; ++ii) {
        array[ii][ndarray::view(ii, ii + 1)] = 1.0;
    }
    return array;
}

}  // namespace myTensorNamespace

Wrapper:

#include "pybind11/pybind11.h"
#include "ndarray.h"
#include "ndarray/pybind11.h"

#include "myTensorCode.h"

namespace py = pybind11;
using namespace pybind11::literals;

namespace myTensorNamespace {
namespace {

PYBIND11_MODULE(myTensorModule, mod) {
    mod.def("myTensorFunction", &myTensorFunction, "param1"_a, "param2"_a);
}

}  // anonymous namespace
}  // namespace myTensorNamespace

