
I'm new to Python and numpy/scipy. The design of the numpy array and the broadcasting rules are sometimes quite helpful, but they also cause me a lot of pain.

I read something like

Numpy tries to keep the array in the lowest dimension.

somewhere.

Here are some situations.

  1. I would like to receive a matrix, calculate its eigenvalues, and do some stuff with them. At some point a 0-d array (e.g., array(1.0), which comes from the result of a numpy operation), namely a scalar, gets passed to this function. I need to write something like

    if (A.ndim < 2):
        A = sp.array([[A]])
    

    to prevent scipy.linalg.eig showing

    ValueError: expected square matrix

  2. When working on machine learning problems, I write things like

    n_samples, n_features = X.shape if X.ndim > 1 else (1, X.shape[0])
    

    I need to write this extra code just to get the number of samples and features and to prevent

    IndexError: tuple index out of range

    Or sometimes I only need the number of features, where each row of the matrix represents a sample and each column represents a feature. I need to write something like

    n_features = X.shape[1] if X.ndim > 1 else X.shape[0]
    

    or do some preprocessing like

    if (X.ndim < 2):
        X = X[np.newaxis, :]
    

    to keep things working.

  3. Sometimes I write things like

    sp.dot(weight.T, X.T - mu[:, sp.newaxis])
    

    Everything seems fine until I find that mu may be a 0-d array or an int scalar! Then the exceptions

    TypeError: 'int' object is not subscriptable

    or

    IndexError: too many indices for array

almost drive me crazy.

There are even more cases like this. All of them seem to come from the rule quoted first above: when I am expecting a matrix, even a 1x1 one, numpy tries to reduce it to a 0-dim array (namely, array(1.0)).

I used to be a MATLAB user and am now getting into numpy/scipy. Beyond MATLAB's simple and math-friendly syntax, there is also less of this pain in MATLAB.

I read some of the sklearn source code, and there is also a lot of code there worrying about 'is this thing a vector or a matrix?' and 'shall we add a new axis to it?'.

What is the best way to reduce the pain of writing this?

  • This question is not related to MATLAB. I'm going to remove the tag. Commented Mar 11, 2016 at 12:52
  • Can somebody tell me why this question is being downvoted? Commented Mar 11, 2016 at 15:52
  • It wasn't me. I just removed the [matlab] tag as irrelevant. Commented Mar 11, 2016 at 17:55
  • @CST-Link I know it, I am just looking for a place to shout ... Commented Mar 12, 2016 at 1:04

1 Answer


np.atleast_2d might solve all your issues.
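As a rough sketch of how that could look for the first two situations in the question: the helper names eigenvalues and n_samples_features are made up for illustration, and np.linalg.eigvals is swapped in for scipy.linalg.eig only to keep the example numpy-only.

```python
import numpy as np

def eigenvalues(A):
    # Hypothetical helper: a Python scalar or 0-d array becomes a 1x1 matrix,
    # so the eigenvalue routine always sees something 2-d.
    A = np.atleast_2d(A)
    return np.linalg.eigvals(A)

def n_samples_features(X):
    # Hypothetical helper: a single 1-d sample is treated as one row.
    X = np.atleast_2d(X)
    return X.shape

print(eigenvalues(np.array(1.0)))          # one eigenvalue, no ValueError
print(n_samples_features(np.arange(3.0)))  # (1, 3)
print(n_samples_features(np.zeros((5, 3))))  # (5, 3)
```

Note that this only covers the scalar case for the eigenvalue problem; a 1-d array of length n becomes a (1, n) row, which is still not square.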

Even the last case might be written as:

np.dot(weight.T, (X - np.atleast_2d(mu)).T)

or maybe

np.dot(X-np.atleast_2d(mu), weight)  # tested with (3,2),(2,3) arrays
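To check that this form copes with every kind of mu mentioned in the question, here is a small test using the same (3,2) and (2,3) shapes. The concrete values are arbitrary and only there to make the shapes visible.

```python
import numpy as np

X = np.arange(6.0).reshape(3, 2)   # 3 samples, 2 features
weight = np.ones((2, 3))           # maps 2 features to 3 outputs

candidates = [
    ("int scalar", 1),
    ("0-d array", np.array(1.0)),
    ("1-d array", np.array([1.0, 1.0])),
]

results = {}
for name, mu in candidates:
    mu2d = np.atleast_2d(mu)       # (1, 1) for scalars, (1, 2) for the 1-d case
    results[name] = np.dot(X - mu2d, weight)
    print(name, mu2d.shape, results[name].shape)
```

All three variants broadcast against X without any indexing on mu, so neither the TypeError nor the IndexError can occur.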

The issue of matlab/numpy compatibility has been around for a long time. There's at least one documentation page devoted to the topic.

There's an ndarray subclass, np.matrix, that ensures everything it touches is a 2d matrix. That was true of MATLAB when I started to use it; it was a big deal when they allowed 3d and higher. But experienced numpy users discourage the use of np.matrix, and there are SO questions in which the poster got messed up by it.

I learned early on in MATLAB that keeping dimensions straight was the biggest part of debugging. I got in the habit of defining test matrices with shapes that helped identify mismatches (and the need for actions like transpose). The same applies in numpy.
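A small illustration of that habit, with made-up sizes: if every dimension of the test data is different, a misplaced transpose fails loudly instead of passing by accident.

```python
import numpy as np

n_samples, n_features, n_out = 4, 3, 2   # all different on purpose
X = np.ones((n_samples, n_features))
W = np.ones((n_features, n_out))

print(np.dot(X, W).shape)    # (4, 2)

mismatch = None
try:
    np.dot(X.T, W)           # (3, 4) dot (3, 2): inner dimensions disagree
except ValueError as err:
    mismatch = str(err)
    print("caught:", mismatch)

# With a square test matrix the same mistake would go unnoticed.
X_square = np.ones((3, 3))
print(np.dot(X_square.T, W).shape)   # (3, 2), no error raised
```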

MATLAB users are always wondering, 'is this a column vector or a row vector?', and throwing x.' expressions around. numpy gives you a third choice - a 1d vector.
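For instance, the three choices look like this (a minimal sketch, variable names are arbitrary):

```python
import numpy as np

v = np.arange(3)            # 1d vector, shape (3,)
row = v[np.newaxis, :]      # row vector, shape (1, 3)
col = v[:, np.newaxis]      # column vector, shape (3, 1)

print(v.shape, v.T.shape)       # transposing a 1d array changes nothing
print(row.shape, row.T.shape)   # (1, 3) and (3, 1)
print((row + col).shape)        # broadcasting gives (3, 3)
print(np.dot(v, v))             # 1d dot 1d is a plain scalar product
```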

There have been lots of questions about ensuring an array has n dimensions. Look up the np.atleast_xxx functions (np.atleast_1d, np.atleast_2d, np.atleast_3d), the ...[None,:] indexing syntax, and reshape(n,1) and reshape(...,-1). Functions like np.sum accept a keepdims parameter.
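A quick tour of those tools, as a sketch with arbitrary example data:

```python
import numpy as np

X = np.arange(6.0).reshape(3, 2)
v = np.arange(3.0)

# keepdims keeps the reduced axis as length 1, so the result still broadcasts.
row_sums = X.sum(axis=1, keepdims=True)     # shape (3, 1)
normalized = X / row_sums                   # each row now sums to 1
# Without keepdims the sum has shape (3,), which does not broadcast against (3, 2).

print(v[None, :].shape)        # (1, 3)
print(v.reshape(-1, 1).shape)  # (3, 1)
print(np.atleast_1d(5).shape, np.atleast_2d(5).shape, np.atleast_3d(5).shape)
```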


The code for np.atleast_2d is:

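from numpy import asanyarray, newaxis  # imports added so the snippet runs standalone

def atleast_2d(*arys):
    res = []
    for ary in arys:
        ary = asanyarray(ary)
        if len(ary.shape) == 0:
            # scalar or 0-d array -> shape (1, 1)
            result = ary.reshape(1, 1)
        elif len(ary.shape) == 1:
            # 1-d array of length n -> shape (1, n)
            result = ary[newaxis, :]
        else:
            # already 2-d or higher: returned unchanged
            result = ary
        res.append(result)
    if len(res) == 1:
        return res[0]
    else:
        return res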

1 Comment

Thanks for your answer! After reading some articles, it seems that np.matrix is not recommended. Maybe np.atleast_2d is the unavoidable compromise :)
