I am implementing linear regression in Python, and I think I am doing something wrong when converting a matrix to a NumPy array, but I cannot figure out what. Any help would be appreciated.

I am loading data from a CSV file that has 100 columns. y is the last column. I am not using columns 1 and 2 for the regression.

communities=np.genfromtxt("communities.csv", delimiter = ",", dtype=float)
xdata = communities[1:,2:99]
x = np.array([np.concatenate((v,[1]))for v in xdata])
y = communities[1:,99]

Function definition

def standRegress(xArr, yArr):
    xMat = mat(xArr); yMat = mat(yArr).T
    xTx = xMat.T*xMat
    if linalg.det(xTx)==0.0:
        print"singular matrix"
        return
    ws = xTx.I*(xMat.T*yMat)
    return ws

Calling the function

w = standRegress(x,y)
xMat = mat(x) #shape(1994L,98L)
yMat = mat(y) #shape (1L, 1994L)
yhat = xMat*w #shape (1994L, 1L)

Next I am trying to calculate the RMSE, and this is where I am having a problem

yMatT = yMat.T #shape(1994L, 1L)
err = yhat - yMatT #shape(1994L, 1L)
error = np.array(err)
total_error = np.dot(error,error)
rmse = np.sqrt(total_error/len(p))

I get an error when computing the dot product, so I cannot calculate the RMSE. I would appreciate it if someone could help me find my mistake.

Error: 
 ---> 11 np.dot(error,error)
 12 #test = (error)**2
 13 #test.sum()/len(y)
 ValueError: matrices are not aligned
  • Can you edit your question and include the specific error message you're receiving? Commented Oct 31, 2014 at 15:57
  • As you're using numpy, I just wonder whether there is any particular reason you're not using linalg? Commented Oct 31, 2014 at 16:04
  • @Anzel, I did not think of using linalg. Could you please show me how to use it? Commented Oct 31, 2014 at 16:15
  • @nasiajaffri, take a look at this numpy doc Commented Oct 31, 2014 at 16:21
  • @Michael0x2a, I have edited the question. Please have a look now. Commented Oct 31, 2014 at 16:24
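
Following up on the linalg suggestion in the comments, here is a minimal sketch of the same kind of fit using np.linalg.lstsq, which solves the least-squares problem without forming the inverse of xTx explicitly. The synthetic data and the names fit_least_squares, features and true_w are assumptions for illustration, not the communities.csv data from the question.

```python
import numpy as np

def fit_least_squares(x, y):
    """Hypothetical helper: least-squares weights w minimizing ||x @ w - y||."""
    # rcond=None selects NumPy's current default cutoff for small singular values
    w, residuals, rank, singular_values = np.linalg.lstsq(x, y, rcond=None)
    return w

# Synthetic stand-in data (assumption): 50 rows, 3 features plus a bias column of ones
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 3))
x = np.hstack([features, np.ones((50, 1))])
true_w = np.array([2.0, -1.0, 0.5, 3.0])
y = x @ true_w  # noise-free target, so the fit should recover true_w

w = fit_least_squares(x, y)
yhat = x @ w  # plain 1D ndarray, shape (50,)
rmse = np.sqrt(np.mean((yhat - y) ** 2))
```

Because x, y and yhat stay plain ndarrays here (y and yhat are 1D), the error term is 1D as well and no matrix-to-array conversion is needed.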

1 Answer

I'm not quite sure what the last dot is supposed to do, but you can't multiply error with itself this way. dot does a matrix multiplication, so the dimensions have to align.

See, e.g., the following example:

import numpy as np
A = np.ones((3, 4))
B = np.ones((3, 4))
print np.dot(A, B)

This yields the error ValueError: matrices are not aligned.

What is possible, however, is:

print np.dot(A.T, B)

Output:

[[ 3.  3.  3.  3.]
 [ 3.  3.  3.  3.]
 [ 3.  3.  3.  3.]
 [ 3.  3.  3.  3.]]

In your example error is just a column vector - but stored as a 2D array:

A = np.ones((3, 1))
B = np.ones((3, 1))
print np.dot(A, B)

Same error.

So you can either transpose one argument - as shown above - or extract one column as a 1D array:

print np.dot(A[:, 0], B[:, 0])

Output:

3.0
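
Applied to the RMSE from the question, the second option might look like the sketch below. The small yhat and y_col arrays are hypothetical stand-ins for the (1994, 1) column vectors in the question; ravel flattens the 2D column to 1D so that dot returns a scalar.

```python
import numpy as np

# Hypothetical stand-ins for yhat and yMatT from the question, both shaped (n, 1)
yhat = np.array([[1.0], [2.0], [4.0]])
y_col = np.array([[1.0], [3.0], [2.0]])

error = np.asarray(yhat - y_col)   # 2D column, shape (3, 1)
flat = error.ravel()               # 1D, shape (3,)

total_error = np.dot(flat, flat)   # sum of squared errors, a scalar
rmse = np.sqrt(total_error / flat.size)

# Equivalent without dot: elementwise square, then mean
rmse_alt = np.sqrt(np.mean(error ** 2))
```

The elementwise version works on the 2D column directly, so it avoids the alignment question altogether.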
4 Comments

Yes, you are right, but err is supposed to have 1994 rows and only 1 column. I am not sure what I am doing wrong before the dot product.
@nasiajaffri: Oh, I see. I edited my answer accordingly.
Also, from np.info(np.dot): ... Raises: ValueError if the last dimension of `a` is not the same size as the second-to-last dimension of `b` ...
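
For 2D inputs, the rule quoted from the docstring can be checked on the shapes before calling dot. A small sketch; shapes_align is a hypothetical helper written for this illustration, not part of NumPy.

```python
import numpy as np

def shapes_align(a, b):
    """Hypothetical helper: True if np.dot(a, b) is defined for 2D arrays a and b."""
    # Last dimension of a must match the second-to-last dimension of b
    return a.shape[-1] == b.shape[-2]

col = np.ones((3, 1))

ok_plain = shapes_align(col, col)          # (3, 1) with (3, 1): 1 != 3
ok_transposed = shapes_align(col.T, col)   # (1, 3) with (3, 1): 3 == 3

inner = np.dot(col.T, col)  # 1x1 array holding the sum of squares
```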
