2

How can you plot the linear regression results from scikit learn after the analysis to see the "testing" data (real values vs. predicted values) at the end of the program? The code below is close but I believe it is missing a scaling factor.

input:

import pandas as pd
import numpy as np
import datetime

pd.core.common.is_list_like = pd.api.types.is_list_like # temp fix
import fix_yahoo_finance as yf
from pandas_datareader import data, wb
from datetime import date
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing, cross_validation, svm
import matplotlib.pyplot as plt

df = yf.download('MMM', start = date (2012, 1, 1), end = date (2018, 1, 1) , progress = False)
df_low = df[['Low']] # create a new df with only the low column
forecast_out = int(5) # predicting some days into future
df_low['low_prediction'] = df_low[['Low']].shift(-forecast_out) # create a new column based on the existing col but shifted some days

X_low = np.array(df_low.drop(['low_prediction'], 1))
X_low = preprocessing.scale(X_low) # scaling the input values

X_low_forecast = X_low[-forecast_out:] # set X_forecast equal to last 5 days
X_low = X_low[:-forecast_out] # remove last 5 days from X

y_low = np.array(df_low['low_prediction'])
y_low = y_low[:-forecast_out]

X_low_train, X_low_test, y_low_train, y_low_test = cross_validation.train_test_split(X_low, y_low, test_size = 0.2)

clf_low = LinearRegression() # classifier
clf_low.fit(X_low_train, y_low_train) # training

confidence_low = clf_low.score(X_low_test, y_low_test) # testing

print("confidence for lows: ", confidence_low)
forecast_prediction_low = clf_low.predict(X_low_forecast)
print(forecast_prediction_low)

plt.figure(figsize = (17,9))
plt.grid(True)
plt.plot(X_low_test, color = "red")
plt.plot(y_low_test, color = "green")
plt.show()

image:

enter image description here

1
  • 2
    I think you have just plotted the test data, x (X_low_test) and y (y_low_test) separately. What you have to do is to predict new y values based on the y_low_test, like: X_low_predicted = clf_low.predict(y_low_test). After that, plot the test set on predicted values of test set by plt.plot(X_low_test, y_low_test, label='real'); plt.plot(X_low_predicted, y_low_test, label='predicted'); plt.legend() Commented Oct 3, 2018 at 13:18

1 Answer 1

2

You plot y_test and X_test, while you should plot y_test and clf_low.predict(X_test) instead, if you want to compare target and predicted.

BTW, clf_low in your code is not a classifier, it is a regressor. It's better to use the alias model instead of clf.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.