
I am trying to learn the linear equation y = x1 + x2 + e, where e is a random error between 0 and 0.5. The data is defined as follows:

import numpy as np
import random

X1 = np.random.randint(1, 10000, 5000)
X2 = np.random.randint(1, 10000, 5000)
e = np.array([random.uniform(0, 0.5) for i in range(5000)])
y = X1 + X2 + e

When I implement simple gradient descent to find the parameters, the loss and the gradients all explode. Where am I going wrong? The code for gradient descent:

w1, w2, b = 1, 1, 0
n = X1.shape[0]
alpha = 0.01
for i in range(5):
    y_pred = w1 * X1 + w2 * X2 + b
    L = np.sum(np.square(y - y_pred))/(2 * n)
    dL_dw1 = (-1/n) * np.sum((y - y_pred) * X1)
    dL_dw2 = (-1/n) * np.sum((y - y_pred) * X2)
    dL_db = (-1/n) * np.sum((y - y_pred))
    w1 = w1 - alpha * dL_dw1
    w2 = w2 - alpha * dL_dw2
    b = b - alpha * dL_db
    print(L, w1, w2, b)

The output for this is:

0.042928723015982384 13.7023102434034 13.670617201430483 0.00254938447277222
9291487188.8259 -7353857.489486973 -7293941.123714662 -1261.9252592161051
3.096713445664372e+21 4247172241132.3584 4209117175658.749 728518135.2857293
1.0320897597938595e+33 -2.4520737800716524e+18 -2.4298158059267333e+18 -420579738783719.2
3.4398058610314825e+44 1.415615899689713e+24 1.402742160404974e+24 2.428043942370682e+20

2 Answers


All you are missing is data normalization. For gradient-based learning algorithms you have to make sure the data is normalized, i.e. it has mean = 0 and std = 1.
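To see why the raw data blows up, it helps to look at just the first update. Below is a minimal sketch reusing the question's own setup (the exact numbers depend on the random draw):

import numpy as np
import random

X1 = np.random.randint(1, 10000, 5000)
X2 = np.random.randint(1, 10000, 5000)
e = np.array([random.uniform(0, 0.5) for i in range(5000)])
y = X1 + X2 + e

# At the start w1 = w2 = 1, b = 0, so the residual y - y_pred is just the
# noise e (mean about 0.25), but the gradient multiplies it by X1, whose
# mean is about 5000.
w1, w2, b, alpha, n = 1, 1, 0, 0.01, X1.shape[0]
y_pred = w1 * X1 + w2 * X2 + b
dL_dw1 = (-1/n) * np.sum((y - y_pred) * X1)   # roughly -0.25 * 5000 = -1250
w1 = w1 - alpha * dL_dw1
print(w1)                                     # roughly 1 + 0.01 * 1250 = 13.5

That matches the jump to about 13.7 in the first line of your output; after that the residual is no longer small, so every subsequent step overshoots even harder and the loss explodes.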

Let's verify this by using a constant error (say e = 33).

import numpy as np

X1 = np.random.randint(1, 10000, 5000)
X2 = np.random.randint(1, 10000, 5000)
e = 33

# Normalize data
X1 = (X1 - np.mean(X1)) / np.std(X1)
X2 = (X2 - np.mean(X2)) / np.std(X2)

y = X1 + X2 + e

w1, w2, b = np.random.rand(), np.random.rand(), np.random.rand()

n = X1.shape[0]
alpha = 0.01
for i in range(1000):
    y_pred = w1 * X1 + w2 * X2 + b
    L = np.sum(np.square(y - y_pred)) / (2 * n)
    dL_dw1 = (-1/n) * np.sum((y - y_pred) * X1)
    dL_dw2 = (-1/n) * np.sum((y - y_pred) * X2)
    dL_db = (-1/n) * np.sum((y - y_pred))
    w1 = w1 - alpha * dL_dw1
    w2 = w2 - alpha * dL_dw2
    b = b - alpha * dL_db

    if i % 100 == 0:
        print("Loss:", L)

print(w1, w2, b)

Output:

Loss: 517.7575710514508
Loss: 69.36601211594098
Loss: 9.29326322560041
Loss: 1.2450619081931993
Loss: 0.16680720657514425
Loss: 0.022348057963833764
Loss: 0.002994096883392299
Loss: 0.0004011372165515275
Loss: 5.374289796164062e-05
Loss: 7.2002934167549005e-06
0.9999609731610163 0.9999911458582055 32.99861157362915

As you can see, it converged.

There are no issues in your code except that you have to normalize your data.

Now you can plug your random error back in and find the best possible estimates.
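If you go back to the original setup (y built from the raw X1 and X2), standardize only the inputs before fitting, and then want the coefficients on the original 1-10000 scale, you can undo the scaling afterwards. A minimal sketch, assuming the means and standard deviations were saved before normalizing (mu1, sigma1, mu2, sigma2 are illustrative names):

# Saved before normalizing the inputs:
# mu1, sigma1 = np.mean(X1), np.std(X1)
# mu2, sigma2 = np.mean(X2), np.std(X2)

# The model was fit as y ~ w1 * (X1 - mu1)/sigma1 + w2 * (X2 - mu2)/sigma2 + b,
# which in the original units becomes:
w1_orig = w1 / sigma1
w2_orig = w2 / sigma2
b_orig = b - w1 * mu1 / sigma1 - w2 * mu2 / sigma2
print(w1_orig, w2_orig, b_orig)   # should come out near 1, 1 and the mean of e (0.25)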


3 Comments

Thank you. I tried it with normalization and it worked. But I actually did the same problem using sklearn's default LinearRegression model and it was able to handle it without normalization. I thought I was missing something. The default value for the normalize parameter there is False and it still took care of the problem without explicitly setting it to True; see scikit-learn.org/stable/modules/generated/…. I was wondering how it's able to solve this without normalization. Any thoughts?
@dudefrmbgr sklearn's LinearRegression model does not use gradient descent for learning; it solves the least-squares problem in closed form instead.
Got it. Thank you.
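To illustrate the point made in the comment above: a closed-form least-squares solver never takes gradient steps, so the feature scale never interacts with a learning rate. A minimal sketch using NumPy's solver on the raw, unnormalized data from the question (not the asker's code, just an illustration of the same idea):

import numpy as np
import random

X1 = np.random.randint(1, 10000, 5000)
X2 = np.random.randint(1, 10000, 5000)
e = np.array([random.uniform(0, 0.5) for i in range(5000)])
y = X1 + X2 + e

# Design matrix with a column of ones for the intercept.
A = np.column_stack([X1, X2, np.ones(len(X1))])

# Closed-form least-squares solution; no step size, so no divergence.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)   # should be close to [1, 1, 0.25] (0.25 is the mean of e)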

Okay, there are a few problems with the problem formulation:

  1. Scaling: Gradient descent generally needs the variables to be well scaled so that alpha can be set properly. Everything is relative in most cases and you can always multiply a problem by a fixed constant, but because the weights are updated directly through the alpha value, very large or very small weight values are harder to reach. I am therefore scaling your setup down by about 10000 and reducing the random error to match:
import numpy as np
import random
X1 = np.random.random(5000)
X2 = np.random.random(5000)
e = np.array([random.uniform(0, 0.0005) for i in range(5000)])
y = X1 + X2 + e
  2. Dependence of y_pred on b: I am not sure what the value of b is supposed to do, or why you are explicitly introducing an error term into y_pred. Your prediction should assume that there is no error :D

  3. If X and y are scaled well, a few tries with the hyperparameters will yield a good value:

n = X1.shape[0]
# w1, w2 and alpha are set below
for i in range(5):
    y_pred = w1 * X1 + w2 * X2
    L = np.sum(np.square(y - y_pred)) / (2 * n)
    dL_dw1 = -(1/n) * np.sum((y - y_pred) * X1)
    dL_dw2 = -(1/n) * np.sum((y - y_pred) * X2)
    w1 = w1 - alpha * dL_dw1
    w2 = w2 - alpha * dL_dw2
    print(L, w1, w2)

You can play around with these values, but they will converge:

w1, w2, b = 1.1, 0.9, 0.01
alpha = 1

0.0008532534726479387 1.0911950693892498 0.9082610891021278
0.0007137567968828647 1.0833134985852988 0.9159869797801239
0.0005971536415151483 1.0761750602775175 0.9231234590515701
0.0004996145120126794 1.0696746682185534 0.9296797694772246
0.0004180103133293466 1.0637407602096771 0.9356885401106588
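As a quick check that they really do converge, running the same loop for more iterations (the iteration count below is just an illustrative choice) should drive the weights close to 1, 1:

w1, w2 = 1.1, 0.9
alpha = 1
n = X1.shape[0]
for i in range(2000):
    y_pred = w1 * X1 + w2 * X2
    dL_dw1 = -(1/n) * np.sum((y - y_pred) * X1)
    dL_dw2 = -(1/n) * np.sum((y - y_pred) * X2)
    w1 = w1 - alpha * dL_dw1
    w2 = w2 - alpha * dL_dw2
print(w1, w2)   # should end up very close to 1, 1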

2 Comments

Thank you. I am not sure about point number 2. In this case, just for learning purposes, I set the function as per my wish, and I think even if I model an additional parameter it should automatically learn to make it zero. Also, I tried the same thing with sklearn's default model sklearn.linear_model.LinearRegression and it took care of the problem without any normalization. I was wondering how it was able to handle this without explicitly setting the parameter normalize to True, as per this link: scikit-learn.org/stable/modules/generated/….
@dudefrmbgr I think sklearn's LinearRegression is a closed-form least-squares solver (it does not use SGD): kaggle.com/general/22793. Also note that the sklearn regression method you are citing provides a normalization parameter as well.
