How to write consistent code using Numpy/Scipy?

Question

I'm new to Python and NumPy/SciPy. The design of the NumPy array and the broadcasting rules in NumPy/SciPy are sometimes quite helpful, but they also cause me a lot of pain.

I read something like

Numpy tries to keep the array in the lowest dimension.

somewhere.

Here are some situations.

I would like to receive a matrix, calculate its eigenvalues and do some stuff. At some point a 0d array (e.g. array(1.0), which comes from the result of a numpy operation), namely a scalar, gets passed to this function. I need to write something like
```
if (A.ndim < 2):
    A = sp.array([[A]])
```
to prevent scipy.linalg.eig showing

ValueError: expected square matrix

When working on a machine learning problem, I write things like
```
n_samples, n_features = X.shape if X.ndim > 1 else (1, X.shape[0])
```
I have to write this extra code just to get the number of samples and features and to prevent

IndexError: tuple index out of range

Or sometimes I only need the number of features, where each row of the matrix represents a sample and each column represents a feature. I need to write something like
```
n_features = X.shape[1] if X.ndim > 1 else X.shape[0]
```
or do some preprocessing like
```
if (X.ndim < 2):
    X = X[np.newaxis, :]
```
to keep things working.

Sometimes I write things like
```
sp.dot(weight.T, X.T - mu[:, sp.newaxis])
```
Everything seems fine until I find that mu may be a 0d array or an int scalar! Then the exceptions

TypeError: 'int' object is not subscriptable or

IndexError: too many indices for array

almost drive me crazy.

There are even more cases like this... All of this seems to come from the rule in the first quote: when I am expecting a matrix, even a 1x1 one, numpy tries to reduce it to a 0-dim array (namely, array(1.0)).

I used to be a MATLAB user and am now getting into NumPy/SciPy. Beyond its simple and math-friendly syntax, MATLAB is still less painful in this respect.

I read some of the source code of the sklearn package, and there is also a lot of code there worrying about 'is this thing a vector or a matrix?' and 'shall we add a new axis to it?'.

What is the best way to reduce the pain of writing this?

This question is not related to MATLAB. I'm going to remove the tag. – user2271770, Commented Mar 11, 2016 at 12:52
It wasn't me. I just removed the [matlab] tag as irrelevant. – user2271770, Commented Mar 11, 2016 at 17:55
@CST-Link I know it, I am just looking for a place to shout ... – Saddle Point, Commented Mar 12, 2016 at 1:04

hpaulj · Accepted Answer · 2016-03-11 18:10:49Z

np.atleast_2d might solve all your issues.
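
As a rough sketch of how that could cover the first cases in the question: the helper names `as_samples` and `eigvals_any` below are made up for illustration, `np.linalg.eigvals` stands in for `scipy.linalg.eig` to keep the example numpy-only, and treating a 1d array as a single sample is the asker's convention, not something numpy decides.

```python
import numpy as np

def as_samples(X):
    """Return X as a 2d (n_samples, n_features) array.

    Assumption: a 1d input is one sample; a scalar is one sample with one feature.
    """
    return np.atleast_2d(X)

def eigvals_any(A):
    """Eigenvalues of A, where A may be a scalar, a 0d array or a square matrix."""
    return np.linalg.eigvals(np.atleast_2d(A))

# No more `if X.ndim > 1 else ...` when unpacking the shape.
n_samples, n_features = as_samples(np.array([1.0, 2.0, 3.0])).shape
print(n_samples, n_features)               # 1 3
print(as_samples(np.ones((5, 3))).shape)   # (5, 3)
print(eigvals_any(np.array(2.0)))          # [2.]
```

Normalizing once at the top of a function like this keeps the `ndim` checks out of the rest of the code.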

Even the last might be written as:
```
np.dot(weight.T, (X - np.atleast_2d(mu)).T)
```
or maybe
```
np.dot(X-np.atleast_2d(mu), weight)  # tested with (3,2),(2,3) arrays
```
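
A small check of that expression, assuming the (3,2) and (2,3) shapes mentioned in the comment and a few plausible forms of `mu` (a Python int, a 0d array, a 1d array and a 1-row matrix). The array values are arbitrary test data.

```python
import numpy as np

X = np.arange(6.0).reshape(3, 2)        # 3 samples, 2 features
weight = np.arange(6.0).reshape(2, 3)   # maps 2 features to 3 outputs

for mu in (1, np.array(1.0), np.array([1.0, 2.0]), np.array([[1.0, 2.0]])):
    # atleast_2d turns every form of mu into a (1, 1) or (1, 2) array,
    # which then broadcasts against the (3, 2) array X.
    centered = X - np.atleast_2d(mu)
    out = np.dot(centered, weight)
    print(np.shape(mu), centered.shape, out.shape)
```

In each iteration `centered` has shape (3, 2) and `out` has shape (3, 3), with no indexing of `mu` that could raise `TypeError` or `IndexError`.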

The issue of matlab/numpy compatibility has been around for a long time. There's at least one documentation page devoted to the topic.

There's an ndarray subclass, np.matrix, that ensures everything it touches is a 2d matrix. That was true of MATLAB when I started to use it; it was a big deal when they allowed 3d and higher. But experienced numpy users discourage its use, and there are SO questions in which the poster got messed up by np.matrix.

I learned early on in MATLAB that keeping dimensions straight was the biggest part of debugging. I got in the habit of defining test matrices with shapes that helped identify mismatches (and the need for actions like transpose). The same applies in numpy.

MATLAB users are always wondering, 'is this a column vector or a row vector?', and throwing x.' expressions around. numpy gives you a third choice - a 1d vector.
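
To make the three choices concrete, here is a minimal illustration with an arbitrary 3-element vector:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # 1d vector, shape (3,)
row = v[np.newaxis, :]          # row vector, shape (1, 3)
col = v[:, np.newaxis]          # column vector, shape (3, 1)

print(v.T.shape)           # (3,)   transposing a 1d array changes nothing
print(row.T.shape)         # (3, 1)
print((row + col).shape)   # (3, 3) a row broadcast against a column
print(np.dot(v, v))        # 14.0   inner product, no transpose needed
```

The no-op transpose on a 1d array is a common surprise for people arriving from MATLAB.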

There have been lots of questions about ensuring an array has n dimensions. Look up the np.atleast_xxx functions, the ...[None,:] indexing syntax, and reshape(n,1) and reshape(...-1). Functions like np.sum accept a keepdims parameter.
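
A short sketch of those tools side by side; the arrays are arbitrary test data.

```python
import numpy as np

X = np.arange(6.0).reshape(3, 2)

mu = X.mean(axis=0)                           # shape (2,): the reduced axis is dropped
mu_kept = X.mean(axis=0, keepdims=True)       # shape (1, 2): the reduced axis is kept
row_sums = np.sum(X, axis=1, keepdims=True)   # shape (3, 1)

v = np.array([1.0, 2.0, 3.0])
print(v[None, :].shape)         # (1, 3)
print(v.reshape(3, 1).shape)    # (3, 1)
print(v.reshape(1, -1).shape)   # (1, 3), -1 lets numpy infer that length
print((X / row_sums).shape)     # (3, 2), each row divided by its own sum
```

With `keepdims=True` the result of a reduction still broadcasts against the array it came from, which avoids adding the axis back by hand.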

The code for np.atleast_2d is:

```
def atleast_2d(*arys):
    res = []
    for ary in arys:
        ary = asanyarray(ary)
        if len(ary.shape) == 0 :
            result = ary.reshape(1, 1)
        elif len(ary.shape) == 1 :
            result = ary[newaxis,:]
        else :
            result = ary
        res.append(result)
    if len(res) == 1:
        return res[0]
    else:
        return res
```
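
Calling the public function shows the three branches of that code in action (a small illustration, with inputs chosen arbitrarily):

```python
import numpy as np

scalar = np.atleast_2d(3.0)               # 0d branch: reshaped to (1, 1)
vector = np.atleast_2d([1.0, 2.0, 3.0])   # 1d branch: new leading axis, (1, 3)
matrix = np.atleast_2d(np.ones((2, 3)))   # 2d or higher: returned unchanged

print(scalar.shape, vector.shape, matrix.shape)   # (1, 1) (1, 3) (2, 3)

# Several inputs give back one result per input.
a, b = np.atleast_2d(1.0, [1.0, 2.0])
print(a.shape, b.shape)                           # (1, 1) (1, 2)
```

Note that a 1d array always becomes a row. If a column is what you need, `v[:, None]` or `v.reshape(-1, 1)` has to be used instead.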

Thanks for your answer! After reading some articles, it seems that np.matrix is not recommended. Maybe np.atleast_2d is the unavoidable compromise :)
