
I am trying to learn the linear equation y = x1 + x2 + e, where e is a random error between 0 and 0.5. The data is defined as follows:

import numpy as np
import random

X1 = np.random.randint(1, 10000, 5000)
X2 = np.random.randint(1, 10000, 5000)
e = np.array([random.uniform(0, 0.5) for i in range(5000)])
y = X1 + X2 + e

When I implement simple gradient descent to find the parameters, the loss and the gradients all explode. Where am I going wrong? The code for gradient descent:

w1, w2, b = 1, 1, 0
n = X1.shape[0]
alpha = 0.01
for i in range(5):
    y_pred = w1 * X1 + w2 * X2 + b
    L = np.sum(np.square(y - y_pred))/(2 * n)
    dL_dw1 = (-1/n) * np.sum((y - y_pred) * X1)
    dL_dw2 = (-1/n) * np.sum((y - y_pred) * X2)
    dL_db = (-1/n) * np.sum((y - y_pred))
    w1 = w1 - alpha * dL_dw1
    w2 = w2 - alpha * dL_dw2
    b = b - alpha * dL_db
    print(L, w1, w2, b)

The output for this is:

0.042928723015982384 13.7023102434034 13.670617201430483 0.00254938447277222
9291487188.8259 -7353857.489486973 -7293941.123714662 -1261.9252592161051
3.096713445664372e+21 4247172241132.3584 4209117175658.749 728518135.2857293
1.0320897597938595e+33 -2.4520737800716524e+18 -2.4298158059267333e+18 -420579738783719.2
3.4398058610314825e+44 1.415615899689713e+24 1.402742160404974e+24 2.428043942370682e+20

2 Answers


All you are missing is data normalization. For gradient-based learning algorithms you have to make sure the data is normalized, i.e. it has mean = 0 and std = 1.
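To see why the raw data blows up, it helps to look at just the first update. Below is a minimal sketch reusing the question's own setup (the exact numbers depend on the random draw):

import numpy as np
import random

X1 = np.random.randint(1, 10000, 5000)
X2 = np.random.randint(1, 10000, 5000)
e = np.array([random.uniform(0, 0.5) for i in range(5000)])
y = X1 + X2 + e

# At the start w1 = w2 = 1, b = 0, so the residual y - y_pred is just the
# noise e (mean about 0.25), but the gradient multiplies it by X1, whose
# mean is about 5000.
w1, w2, b, alpha, n = 1, 1, 0, 0.01, X1.shape[0]
y_pred = w1 * X1 + w2 * X2 + b
dL_dw1 = (-1/n) * np.sum((y - y_pred) * X1)   # roughly -0.25 * 5000 = -1250
w1 = w1 - alpha * dL_dw1
print(w1)                                     # roughly 1 + 0.01 * 1250 = 13.5

That matches the jump to about 13.7 in the first line of your output; after that the residual is no longer small, so every subsequent step overshoots even harder and the loss explodes.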

Let's verify this by using a constant error (say e = 33).

import numpy as np

X1 = np.random.randint(1, 10000, 5000)
X2 = np.random.randint(1, 10000, 5000)
e = 33

# Normalize data
X1 = (X1 - np.mean(X1)) / np.std(X1)
X2 = (X2 - np.mean(X2)) / np.std(X2)

y = X1 + X2 + e

w1, w2, b = np.random.rand(), np.random.rand(), np.random.rand()

n = X1.shape[0]
alpha = 0.01
for i in range(1000):
    y_pred = w1 * X1 + w2 * X2 + b
    L = np.sum(np.square(y - y_pred)) / (2 * n)
    dL_dw1 = (-1/n) * np.sum((y - y_pred) * X1)
    dL_dw2 = (-1/n) * np.sum((y - y_pred) * X2)
    dL_db = (-1/n) * np.sum((y - y_pred))
    w1 = w1 - alpha * dL_dw1
    w2 = w2 - alpha * dL_dw2
    b = b - alpha * dL_db

    if i % 100 == 0:
        print("Loss:", L)

print(w1, w2, b)

Output:

Loss: 517.7575710514508
Loss: 69.36601211594098
Loss: 9.29326322560041
Loss: 1.2450619081931993
Loss: 0.16680720657514425
Loss: 0.022348057963833764
Loss: 0.002994096883392299
Loss: 0.0004011372165515275
Loss: 5.374289796164062e-05
Loss: 7.2002934167549005e-06
0.9999609731610163 0.9999911458582055 32.99861157362915

As you can see, it converged.

There are no issues in your code except that you have to normalize your data.

Now you can plug your random error back in and find the best possible estimates.
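If you go back to the original setup (y built from the raw X1 and X2), standardize only the inputs before fitting, and then want the coefficients on the original 1-10000 scale, you can undo the scaling afterwards. A minimal sketch, assuming the means and standard deviations were saved before normalizing (mu1, sigma1, mu2, sigma2 are illustrative names):

# Saved before normalizing the inputs:
# mu1, sigma1 = np.mean(X1), np.std(X1)
# mu2, sigma2 = np.mean(X2), np.std(X2)

# The model was fit as y ~ w1 * (X1 - mu1)/sigma1 + w2 * (X2 - mu2)/sigma2 + b,
# which in the original units becomes:
w1_orig = w1 / sigma1
w2_orig = w2 / sigma2
b_orig = b - w1 * mu1 / sigma1 - w2 * mu2 / sigma2
print(w1_orig, w2_orig, b_orig)   # should come out near 1, 1 and the mean of e (0.25)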


3 Comments

Thank you. I tried it with normalization and it worked. But I actually did the same problem using sklearn's default LinearRegression model and it was able to handle it without normalization. I thought I was missing something. The default value for the normalize parameter there is False and it still took care of the problem without explicitly setting it to True; see scikit-learn.org/stable/modules/generated/…. I was wondering how it's able to solve this without normalization. Any thoughts?
@dudefrmbgr sklearn's LinearRegression model does not use gradient descent for learning; it solves the least-squares problem in closed form instead.
Got it. Thank you.
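To illustrate the point made in the comment above: a closed-form least-squares solver never takes gradient steps, so the feature scale never interacts with a learning rate. A minimal sketch using NumPy's solver on the raw, unnormalized data from the question (not the asker's code, just an illustration of the same idea):

import numpy as np
import random

X1 = np.random.randint(1, 10000, 5000)
X2 = np.random.randint(1, 10000, 5000)
e = np.array([random.uniform(0, 0.5) for i in range(5000)])
y = X1 + X2 + e

# Design matrix with a column of ones for the intercept.
A = np.column_stack([X1, X2, np.ones(len(X1))])

# Closed-form least-squares solution; no step size, so no divergence.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)   # should be close to [1, 1, 0.25] (0.25 is the mean of e)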

Okay, there are a few problems with the problem formulation:

  1. Scaling: Gradient descent generally needs the variables to be well scaled so that alpha can be set properly. Everything is relative in most cases and you can always multiply a problem by a fixed constant, but because the weights are updated directly through the alpha value, very large or very small weight values are harder to reach. I am therefore scaling your setup down by about 10000 and reducing the random error to match:
import numpy as np
import random
X1 = np.random.random(5000)
X2 = np.random.random(5000)
e = np.array([random.uniform(0, 0.0005) for i in range(5000)])
y = X1 + X2 + e
  2. Dependence of y_pred on b: I am not sure what the value of b is supposed to do, or why you are explicitly introducing an error term into y_pred. Your prediction should assume that there is no error :D

  3. If X and y are scaled well, a few tries with the hyperparameters will yield a good value:

n = X1.shape[0]
# w1, w2 and alpha are set below
for i in range(5):
    y_pred = w1 * X1 + w2 * X2
    L = np.sum(np.square(y - y_pred)) / (2 * n)
    dL_dw1 = -(1/n) * np.sum((y - y_pred) * X1)
    dL_dw2 = -(1/n) * np.sum((y - y_pred) * X2)
    w1 = w1 - alpha * dL_dw1
    w2 = w2 - alpha * dL_dw2
    print(L, w1, w2)

You can play around with these values, but they will converge:

w1, w2, b = 1.1, 0.9, 0.01
alpha = 1

0.0008532534726479387 1.0911950693892498 0.9082610891021278
0.0007137567968828647 1.0833134985852988 0.9159869797801239
0.0005971536415151483 1.0761750602775175 0.9231234590515701
0.0004996145120126794 1.0696746682185534 0.9296797694772246
0.0004180103133293466 1.0637407602096771 0.9356885401106588
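As a quick check that they really do converge, running the same loop for more iterations (the iteration count below is just an illustrative choice) should drive the weights close to 1, 1:

w1, w2 = 1.1, 0.9
alpha = 1
n = X1.shape[0]
for i in range(2000):
    y_pred = w1 * X1 + w2 * X2
    dL_dw1 = -(1/n) * np.sum((y - y_pred) * X1)
    dL_dw2 = -(1/n) * np.sum((y - y_pred) * X2)
    w1 = w1 - alpha * dL_dw1
    w2 = w2 - alpha * dL_dw2
print(w1, w2)   # should end up very close to 1, 1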

2 Comments

Thank you. I am not sure about point number 2. In this case, just for learning purposes, I set the function as per my wish, and I think even if I model an additional parameter it should automatically learn to make it zero. Also, I tried the same thing with sklearn's default model sklearn.linear_model.LinearRegression and it took care of the problem without any normalization. I was wondering how it was able to handle this without explicitly setting the parameter normalize to True, as per this link: scikit-learn.org/stable/modules/generated/….
@dudefrmbgr I think sklearn's LinearRegression is a closed-form least-squares solver (it does not use SGD): kaggle.com/general/22793. Also note that the sklearn regression method you are citing provides a normalization parameter as well.
