
Consider a system with n_channels channels transmitting n_samples samples at a given sampling rate. The 1D buffer holding the timestamps and the flat buffer holding the 2D data of shape (n_channels, n_samples) are allocated as follows:

from ctypes import c_double, c_float

# Assume a 2-second window, 3 channels, sampled at 1024 Hz
# data: (n_channels, n_samples) = (3, 2048)
# timestamps: (n_samples,) = (2048,)
n_channels = 3
n_samples = 2048
n_data_values = n_channels * n_samples
data_buffer = (c_float * n_data_values)()
ts_buffer = (c_double * n_samples)() 

I have a C++ binary library that fills the buffers. The call can be summarized as:

from ctypes import byref

fill_buffers(
    byref(data_buffer),
    byref(ts_buffer),
)

At this point, I have 2 filled buffers, one with 2048 elements (timestamps) and one with 3 * 2048 elements (data). I want to load those 2 buffers into NumPy arrays as efficiently as possible.

np.frombuffer seems great for reading a 1D array, e.g. the timestamps, but I can't find a counterpart for N-dimensional arrays.

# read from buffer for the 1D array
timestamps = np.frombuffer(ts_buffer)  # 192 ns ± 1.11 ns per loop
timestamps = np.array(ts_buffer)  # 854 ns ± 2.99 ns per loop

For now, the data array is loaded with:

data = np.array(data_buffer).reshape(-1, n_channels, order="C").T

Is there any way to use the same efficient method as np.frombuffer while specifying the output shape and the order?


This question is different from "How can I initialize a NumPy array from a multidimensional buffer?" and from "How to restore a 2-dimensional numpy.array from a bytestring?" because it asks not merely for an alternative to np.frombuffer, but for one that is just as efficient.


EDIT: Why does np.frombuffer(data_buffer).reshape(-1, n_channels).T not work? With 3 channels and 1024 points (to speed up my testing), I get len(data_buffer) = 3072, but:

np.array(data_buffer).reshape(-1, 3).T.size  # 3072
np.frombuffer(data_buffer).reshape(-1, 3).T.size  # 1536

The application is a LabStreamingLayer buffer. The buffer is filled here https://github.com/labstreaminglayer/liblsl-Python/blob/87276974a311bcf7ceb3383e9d04c6bdcf302771/pylsl/pylsl.py#L854-L861 using the C++ library https://github.com/sccn/liblsl, specifically this function: https://github.com/sccn/liblsl/blob/08aa186326e9a339316b7d5677ef31b3651b4aad/src/lsl_inlet_c.cpp#L180-L185

1 Answer

Does np.frombuffer(data_buffer, dtype=c_float).reshape(-1, n_channels, order="C").T not work correctly? The way you are doing it, np.array treats the buffer as a 1D array until you reshape it anyway.

For me, the following code produces the right shapes (it is hard to verify that the values are correct without an MWE for the data that should be in the buffers).

import numpy as np
from ctypes import c_double, c_float

# Assume a 2-second window, 3 channels, sampled at 1024 Hz
# data: (n_channels, n_samples) = (3, 2048)
# timestamps: (n_samples,) = (2048,)
n_channels = 3
n_samples = 2048
n_data_values = n_channels * n_samples
data_buffer = (c_float * n_data_values)()  # Note: c_float is typically 32 bits (4 bytes), while c_double and NumPy's default float64 are 64 bits (8 bytes)
ts_buffer = (c_double * n_samples)()

# Create a mock buffer

input_data = np.arange(0,n_data_values, dtype=c_float)
input_data_buffer = input_data.tobytes()


timestamps = np.frombuffer(ts_buffer) 

# Note: specify the dtype explicitly for the array of floats
data = np.frombuffer(input_data_buffer, dtype=c_float).reshape(-1, n_channels, order="C").T
# data has values 0,1,2 for first time point, 3,4,5 for second, and so on
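
As a hedged sketch, the same approach can be applied to the ctypes buffer itself rather than to a bytes copy. The loop below is only a stand-in for the C++ fill (an assumption, since the real fill function is not reproduced here), and it assumes the library writes the samples interleaved, i.e. all channels of the first sample, then all channels of the second, and so on.

```python
import numpy as np
from ctypes import c_float

n_channels = 3
n_samples = 2048
data_buffer = (c_float * (n_channels * n_samples))()

# Stand-in for the C++ fill (assumption): interleaved, sample-major values.
for i in range(n_channels * n_samples):
    data_buffer[i] = float(i)

# View the ctypes array without copying; the dtype must match c_float.
flat = np.frombuffer(data_buffer, dtype=np.float32)
data = flat.reshape(n_samples, n_channels).T  # shape (n_channels, n_samples)

print(data.shape)
print(data[:, 0])  # channel values of the first sample

# `data` is a view on the buffer, so later writes to the buffer are visible.
data_buffer[0] = 42.0
print(data[0, 0])
```

Because the result is a view and not a copy, the array changes the next time the library refills the buffer; calling data.copy() would detach it if that matters.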

5 Comments

I'll try this one (just in case), but when I did np.frombuffer(data_buffer), I got only 2048 elements (a single channel).
Hard to say without the code creating the buffers, but for the code I posted I get a (3, 2048) data array. Perhaps the C code is not creating/filling the buffer correctly.
So there is something weird at play; I was also expecting your solution to work. But I'm getting only half of the data, even though the buffer length is correct. C++ is beyond my current knowledge. I edited my post with additional information and links, but it seems some digging into this library is required to figure out what is happening here.
Have you tried specifying the dtype for frombuffer? Perhaps np.array infers it automatically, but I needed to specify the dtype as c_float rather than c_double, which corresponds to NumPy's default data type (float64). c_float is half the size of c_double, so with the correct dtype you'll get twice as many values. (The sizes of c_float and c_double may be architecture/compiler dependent, but I'm not very experienced with C/C++ bindings in Python.)
And I'm an idiot... obviously, I need to load with the correct dtype. Thanks a lot for the help, works like a charm!
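
The dtype mismatch discussed in these comments can be reproduced with a small sketch, using the 3 channels and 1024 points from the question's edit. It assumes the common case where c_float is 4 bytes, so np.float32 is used as the matching NumPy dtype.

```python
import numpy as np
from ctypes import c_float, sizeof

n_channels, n_samples = 3, 1024
data_buffer = (c_float * (n_channels * n_samples))()

# Total size in bytes: 3072 elements of 4 bytes each (assuming 4-byte c_float).
n_bytes = sizeof(data_buffer)

# Default dtype is float64: every 8 bytes become one element.
as_f64 = np.frombuffer(data_buffer)

# Matching dtype: every 4 bytes become one element.
as_f32 = np.frombuffer(data_buffer, dtype=np.float32)

print(n_bytes, as_f64.size, as_f32.size)
```

With the default dtype the element count is n_bytes / 8, which is half the buffer length and matches the 1536 reported in the question.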
