
There are a few existing topics on this, but they involve deprecated packages (pandas etc.). Suppose I'm trying to predict a variable w from variables x, y and z. I want to run a multiple linear regression to predict w. There are quite a few solutions that will produce the coefficients, but I'm not sure how to use them. So, in pseudocode:

import numpy as np
from scipy import stats

w = np.array((1,2,3,4,5,6,7,8,9,10))  # Time series I'm trying to predict

x = np.array((1,3,6,1,4,6,8,9,2,2))   # The three variables to predict w
y = np.array((2,7,6,1,5,6,3,9,5,7)) 
z = np.array((1,3,4,7,4,8,5,1,8,2)) 

def model(w,x,y,z):
    # do something!

    return guess  # where guess is some 10 element array formed
                  # using multiple linear regression of x,y,z

guess = model(w,x,y,z)
r = stats.pearsonr(w,guess) # To see how good guess is 

Hopefully this makes sense, as I'm new to MLR. There is probably a package in scipy that does all this, so any help is welcome!
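
For reference, a minimal sketch of what model could look like using plain numpy's least-squares solver; the column_stack layout and the added intercept column are my assumptions, not part of the question:

import numpy as np
from scipy import stats

w = np.array((1,2,3,4,5,6,7,8,9,10))
x = np.array((1,3,6,1,4,6,8,9,2,2))
y = np.array((2,7,6,1,5,6,3,9,5,7))
z = np.array((1,3,4,7,4,8,5,1,8,2))

def model(w, x, y, z):
    # Predictors as columns, plus a column of ones for the intercept (assumption)
    X = np.column_stack((x, y, z, np.ones_like(x)))
    # Least-squares coefficients for x, y, z and the intercept
    coeffs = np.linalg.lstsq(X, w, rcond=None)[0]
    # Fitted values: X times the coefficients
    return X.dot(coeffs)

guess = model(w, x, y, z)
r = stats.pearsonr(w, guess)   # correlation between w and the fitted values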

2 Answers


You can use the normal equation method. Let your equation be of the form ax + by + cz + d = w. The least-squares coefficients are then given by the normal equation (X^T X)^(-1) X^T w, where X holds the three predictors plus a column of ones for the constant d (computed below with a pseudo-inverse). Then:

import numpy as np

x = np.asarray([[1,3,6,1,4,6,8,9,2,2],
                [2,7,6,1,5,6,3,9,5,7],
                [1,3,4,7,4,8,5,1,8,2],
                [1,1,1,1,1,1,1,1,1,1]]).T   # last column of ones supplies the constant d
y = np.asarray([1,2,3,4,5,6,7,8,9,10])      # the target (w in the question)

# Normal equation, computed with a pseudo-inverse for numerical safety
a,b,c,d = np.linalg.pinv((x.T).dot(x)).dot(x.T.dot(y))
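
To turn the coefficients into a prediction (the guess from the question), multiply the design matrix by them; a minimal follow-on sketch, reusing the x and y above:

from scipy import stats

coeffs = np.array([a, b, c, d])
guess = x.dot(coeffs)             # one prediction per row of x (the ones column supplies d)
print(stats.pearsonr(y, guess))   # how well the fit tracks the target, as in the question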

4 Comments

import statsmodels.api as sm
Using an OLS model gives different results: a, b, c come out as 0.0595, 0.5877, 0.3937 and the constant as 0.5599.
Could be due to the small number of input data points, maybe.
@JamesWarner statsmodels does not add a constant, except when using formulas.
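
For anyone comparing with the comment thread above, a minimal sketch of a statsmodels fit on the same data, with the intercept added explicitly via sm.add_constant (this plays the role of the column of ones in the answer):

import numpy as np
import statsmodels.api as sm

X = np.asarray([[1,3,6,1,4,6,8,9,2,2],
                [2,7,6,1,5,6,3,9,5,7],
                [1,3,4,7,4,8,5,1,8,2]]).T
w = np.asarray([1,2,3,4,5,6,7,8,9,10])

X = sm.add_constant(X)          # statsmodels does not add the intercept by itself
results = sm.OLS(w, X).fit()
print(results.params)           # constant first, then the coefficient for each column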

I think I've worked it out now. If anyone could confirm that this produces the correct results, that'd be great!

import numpy as np
from scipy import stats

# What I'm trying to predict
y = [-6,-5,-10,-5,-8,-3,-6,-8,-8]  

# Array that stores two predictors in columns
x = np.array([[-4.95,-4.55],[-10.96,-1.08],[-6.52,-0.81],
              [-7.01,-4.46],[-11.54,-5.87],[-4.52,-11.64],
              [-3.36,-7.45],[-2.36,-7.33],[-7.65,-10.03]])

# Fit linear least squares and get regression coefficients
beta_hat = np.linalg.lstsq(x, y, rcond=None)[0]
print(beta_hat)

# To store my best guess
estimate = np.zeros((9))

for i in range(0,9):

    # y = x1b1 + x2b2
    estimate[i] = beta_hat[0]*x[i,0]+beta_hat[1]*x[i,1]


# Correlation between best guess and real values
print(stats.pearsonr(estimate,y))
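
As a side note, the loop above is just a matrix product, so the same estimate can be written in one line (a sketch with no intercept, same as the original):

# Equivalent vectorised prediction: each row of x dotted with beta_hat
estimate = x.dot(beta_hat)
print(stats.pearsonr(estimate, y))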

