
There are a few existing topics on this, but they involve deprecated packages (pandas etc.). Suppose I'm trying to predict a variable w from variables x, y and z. I want to run a multiple linear regression to predict w. There are quite a few solutions that will produce the coefficients, but I'm not sure how to use them. So, in pseudocode:

import numpy as np
from scipy import stats

w = np.array((1,2,3,4,5,6,7,8,9,10))  # Time series I'm trying to predict

x = np.array((1,3,6,1,4,6,8,9,2,2))   # The three variables to predict w
y = np.array((2,7,6,1,5,6,3,9,5,7)) 
z = np.array((1,3,4,7,4,8,5,1,8,2)) 

def model(w,x,y,z):
    # do something!

    return guess  # where guess is some 10 element array formed
                  # using multiple linear regression of x,y,z

guess = model(w,x,y,z)
r = stats.pearsonr(w,guess) # To see how good guess is 

Hopefully this makes sense, as I'm new to MLR. There is probably a package in scipy that does all this, so any help is welcome!
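
For reference, a minimal sketch of what model could look like using plain numpy's least-squares solver; the column_stack layout and the added intercept column are my assumptions, not part of the question:

import numpy as np
from scipy import stats

w = np.array((1,2,3,4,5,6,7,8,9,10))
x = np.array((1,3,6,1,4,6,8,9,2,2))
y = np.array((2,7,6,1,5,6,3,9,5,7))
z = np.array((1,3,4,7,4,8,5,1,8,2))

def model(w, x, y, z):
    # Predictors as columns, plus a column of ones for the intercept (assumption)
    X = np.column_stack((x, y, z, np.ones_like(x)))
    # Least-squares coefficients for x, y, z and the intercept
    coeffs = np.linalg.lstsq(X, w, rcond=None)[0]
    # Fitted values: X times the coefficients
    return X.dot(coeffs)

guess = model(w, x, y, z)
r = stats.pearsonr(w, guess)   # correlation between w and the fitted values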

2 Answers


You can use the normal equation method. Let your equation be of the form ax + by + cz + d = w. The least-squares coefficients are then given by the normal equation (X^T X)^(-1) X^T w, where X holds the three predictors plus a column of ones for the constant d (computed below with a pseudo-inverse). Then:

import numpy as np

x = np.asarray([[1,3,6,1,4,6,8,9,2,2],
                [2,7,6,1,5,6,3,9,5,7],
                [1,3,4,7,4,8,5,1,8,2],
                [1,1,1,1,1,1,1,1,1,1]]).T   # last column of ones supplies the constant d
y = np.asarray([1,2,3,4,5,6,7,8,9,10])      # the target (w in the question)

# Normal equation, computed with a pseudo-inverse for numerical safety
a,b,c,d = np.linalg.pinv((x.T).dot(x)).dot(x.T.dot(y))
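
To turn the coefficients into a prediction (the guess from the question), multiply the design matrix by them; a minimal follow-on sketch, reusing the x and y above:

from scipy import stats

coeffs = np.array([a, b, c, d])
guess = x.dot(coeffs)             # one prediction per row of x (the ones column supplies d)
print(stats.pearsonr(y, guess))   # how well the fit tracks the target, as in the question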

4 Comments

import statsmodels.api as sm
Using an OLS model gives different results: a, b, c come out as 0.0595, 0.5877, 0.3937 and the constant as 0.5599.
Could be due to the small number of input data points, maybe.
@JamesWarner statsmodels does not add a constant, except when using formulas.
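
For anyone comparing with the comment thread above, a minimal sketch of a statsmodels fit on the same data, with the intercept added explicitly via sm.add_constant (this plays the role of the column of ones in the answer):

import numpy as np
import statsmodels.api as sm

X = np.asarray([[1,3,6,1,4,6,8,9,2,2],
                [2,7,6,1,5,6,3,9,5,7],
                [1,3,4,7,4,8,5,1,8,2]]).T
w = np.asarray([1,2,3,4,5,6,7,8,9,10])

X = sm.add_constant(X)          # statsmodels does not add the intercept by itself
results = sm.OLS(w, X).fit()
print(results.params)           # constant first, then the coefficient for each column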

I think I've worked it out now. If anyone could confirm that this produces the correct results, that'd be great!

import numpy as np
from scipy import stats

# What I'm trying to predict
y = [-6,-5,-10,-5,-8,-3,-6,-8,-8]  

# Array that stores two predictors in columns
x = np.array([[-4.95,-4.55],[-10.96,-1.08],[-6.52,-0.81],
              [-7.01,-4.46],[-11.54,-5.87],[-4.52,-11.64],
              [-3.36,-7.45],[-2.36,-7.33],[-7.65,-10.03]])

# Fit linear least squares and get regression coefficients
beta_hat = np.linalg.lstsq(x, y, rcond=None)[0]
print(beta_hat)

# To store my best guess
estimate = np.zeros((9))

for i in range(0,9):

    # y = x1b1 + x2b2
    estimate[i] = beta_hat[0]*x[i,0]+beta_hat[1]*x[i,1]


# Correlation between best guess and real values
print(stats.pearsonr(estimate,y))
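
As a side note, the loop above is just a matrix product, so the same estimate can be written in one line (a sketch with no intercept, same as the original):

# Equivalent vectorised prediction: each row of x dotted with beta_hat
estimate = x.dot(beta_hat)
print(stats.pearsonr(estimate, y))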

