I am trying to learn the linear equation y = x1 + x2 + e, where e is a random error uniformly distributed between 0 and 0.5. The data is generated as follows:
import random
import numpy as np

X1 = np.random.randint(1, 10000, 5000)                        # 5000 integers in [1, 10000)
X2 = np.random.randint(1, 10000, 5000)
e = np.array([random.uniform(0, 0.5) for i in range(5000)])   # uniform noise in [0, 0.5]
y = X1 + X2 + e
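As a quick sanity check that the data itself is learnable, a closed-form least-squares solve (with a column of ones for the intercept) should recover roughly w1 ≈ w2 ≈ 1 and b ≈ 0.25, the mean of e:

A = np.column_stack([X1, X2, np.ones(5000)])   # features plus an intercept column
print(np.linalg.lstsq(A, y, rcond=None)[0])    # expect approximately [1, 1, 0.25]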
But when I implement simple gradient descent to find the parameters, the loss and the gradients all explode. Where am I going wrong? The gradient descent code:
w1, w2, b = 1, 1, 0   # initial guesses for the parameters
n = X1.shape[0]       # number of samples (5000)
alpha = 0.01          # learning rate

for i in range(5):
    y_pred = w1 * X1 + w2 * X2 + b                  # predictions
    L = np.sum(np.square(y - y_pred)) / (2 * n)     # half mean squared error
    dL_dw1 = (-1 / n) * np.sum((y - y_pred) * X1)   # gradient w.r.t. w1
    dL_dw2 = (-1 / n) * np.sum((y - y_pred) * X2)   # gradient w.r.t. w2
    dL_db = (-1 / n) * np.sum(y - y_pred)           # gradient w.r.t. b
    w1 = w1 - alpha * dL_dw1
    w2 = w2 - alpha * dL_dw2
    b = b - alpha * dL_db
    print(L, w1, w2, b)
The output (one line per iteration: L, w1, w2, b):
0.042928723015982384, 13.7023102434034, 13.670617201430483, 0.00254938447277222
9291487188.8259, -7353857.489486973, -7293941.123714662, -1261.9252592161051
3.096713445664372e+21, 4247172241132.3584, 4209117175658.749, 728518135.2857293
1.0320897597938595e+33, -2.4520737800716524e+18, -2.4298158059267333e+18, -420579738783719.2
3.4398058610314825e+44, 1.415615899689713e+24, 1.402742160404974e+24, 2.428043942370682e+20
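Working the first rows out by hand (rough magnitudes, taking E[X1] ≈ E[X2] ≈ 5000 and E[e] ≈ 0.25), the very first gradient can be checked directly, since at i = 0 the residual y - y_pred is just e:

# At step 0: dL_dw1 = -mean(e * X1) ≈ -0.25 * 5000 = -1250, so the update
# is w1 = 1 + 0.01 * 1250 ≈ 13.5, which matches the first printed row.
print(-np.mean(e * X1))

After that overshoot the residual is about -12.5 * (X1 + X2) ≈ -1.3e5, the next gradient is on the order of 1e5 * 5000 ≈ 5e8, and the weights swing to about -7e6, matching the second row. So if my arithmetic is right, each iteration amplifies the error by roughly a factor of alpha * E[X1**2 + X1*X2] ≈ 6e5, which is the growth rate visible in the rows above.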