How to plot SciKit-Learn linear regression graph

Question

I am new to scikit-learn and have been working on a regression problem (the King County CSV) on Kaggle. I have trained a regression model to predict house prices and want to plot the result, but I have no idea how to do so. I am using Python 3.6. Any advice or suggestions would be greatly appreciated.

#importing numpy and pandas, seaborn

import numpy as np #linear algebra
import pandas as pd #datapreprocessing, CSV file I/O
import seaborn as sns #for plotting graphs
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt

data = pd.read_csv('kc_house_data.csv')
data = data.drop('date',axis=1)
data = data.drop('id',axis=1)

X = data
Y = X['price'].values
X = X.drop('price', axis = 1).values

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30, random_state=21)

reg = LinearRegression()
# shuffle=True is required for random_state to take effect
# (recent scikit-learn versions raise an error otherwise)
kfold = KFold(n_splits=15, shuffle=True, random_state=21)
cv_results = cross_val_score(reg, X_train, Y_train, cv=kfold, scoring='r2')

print(cv_results)

print(round(np.mean(cv_results)*100, 2))

ardito.bryan · Accepted Answer · 2020-07-12 07:57:46Z

1

The scikit-learn documentation has an example that plots an ordinary least squares fit: https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
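For a single feature, the idea can be sketched roughly as follows. This is a minimal illustration, not the code from the linked page: the data is synthetic, and the names sqft, price and ols_fit.png are assumptions made up for the example. With the real data you would pass one column of the DataFrame instead.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one feature and the target (assumption: not the real data).
rng = np.random.default_rng(21)
sqft = rng.uniform(500, 4000, size=200)
price = 150.0 * sqft + 50_000 + rng.normal(0, 40_000, size=200)

X = sqft.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.30, random_state=21
)

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

# Sort by the feature so the fitted line is drawn left to right.
order = np.argsort(X_test[:, 0])

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(X_test[:, 0], y_test, color="black", s=10, label="test data")
ax.plot(X_test[order, 0], y_pred[order], color="blue", linewidth=2, label="fitted line")
ax.set_xlabel("living area (sq ft)")
ax.set_ylabel("price")
ax.legend()
fig.savefig("ols_fit.png")  # hypothetical output file name
```

In an interactive session, plt.show() can be used instead of fig.savefig(...). Note that a fitted line like this only makes sense for one feature at a time; a model trained on all columns cannot be drawn as a single line.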

Imran · Accepted Answer · 2020-07-12 08:11:51Z

1

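You can use matplotlib for plotting. For example, to plot the cross-validation scores from your code:

import matplotlib.pyplot as plt
plt.figure(figsize=(16, 9))
plt.plot(cv_results)  # one R^2 score per fold
plt.xlabel('fold')
plt.ylabel('R^2')
plt.show()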

There are several plot types you can use, such as a simple line plot or a scatter plot.

plt.barh(x, y)     # horizontal bar chart
plt.plot(x, y)     # line plot
plt.scatter(x, y)  # scatter plot
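When the model uses many features, as in the question, one common option is a scatter plot of predicted against actual values. The sketch below is only an illustration under assumptions: the data is synthetic, and true_coef and predicted_vs_actual.png are made-up names. With the real data, X_train, X_test, Y_train and Y_test would come from the question's train_test_split call.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic multi-feature data (assumption: stands in for the house data).
rng = np.random.default_rng(21)
X = rng.normal(size=(300, 5))
true_coef = np.array([3.0, -2.0, 0.5, 0.0, 1.5])
Y = X @ true_coef + rng.normal(scale=0.5, size=300)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.30, random_state=21
)

reg = LinearRegression().fit(X_train, Y_train)
Y_pred = reg.predict(X_test)

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(Y_test, Y_pred, s=10, alpha=0.6)

# Diagonal reference line: points on it are predicted exactly.
lims = [min(Y_test.min(), Y_pred.min()), max(Y_test.max(), Y_pred.max())]
ax.plot(lims, lims, color="red", linestyle="--", label="perfect prediction")

ax.set_xlabel("actual value")
ax.set_ylabel("predicted value")
ax.legend()
fig.savefig("predicted_vs_actual.png")  # hypothetical output file name
```

The closer the points sit to the dashed diagonal, the better the predictions on the held-out data.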

ajay sagar · Accepted Answer · 2020-07-12 08:18:11Z

1

Seaborn is a very useful visualization library. You can use seaborn.regplot to plot the data and the fitted regression line in one call. It takes a single predictor variable and the response variable, and draws the data points together with the best-fit line. Here is the documentation on how to use it:

https://seaborn.pydata.org/generated/seaborn.regplot.html
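A minimal sketch of such a call might look like the following. The DataFrame here is synthetic, and the column names sqft_living and price as well as the file name regplot.png are assumptions for illustration. Note that regplot fits its own simple regression on the two columns; it does not use a scikit-learn model you trained separately.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data (assumption: stands in for two columns of the real CSV).
rng = np.random.default_rng(21)
df = pd.DataFrame({"sqft_living": rng.uniform(500, 4000, size=200)})
df["price"] = 150.0 * df["sqft_living"] + 50_000 + rng.normal(0, 40_000, size=200)

fig, ax = plt.subplots(figsize=(8, 5))
# Scatter of the data plus a fitted line with a confidence band.
sns.regplot(
    x="sqft_living",
    y="price",
    data=df,
    ax=ax,
    scatter_kws={"s": 10},
    line_kws={"color": "red"},
)
fig.savefig("regplot.png")  # hypothetical output file name
```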

Michele Giglioni · Accepted Answer · 2020-07-12 10:19:56Z

0

I have also done the same competition on Kaggle. For regressions I would go for a scatter plot:

import matplotlib.pyplot as plt
plt.scatter(x, y)

As for the visualisations on that particular competition I would use the following code:

import matplotlib.pyplot as plt
import seaborn as sns

# Visualising some more outliers: one scatter plot of SalePrice per numeric feature.
# Assumes `train` is the competition's training DataFrame and `numeric`
# is a list of its numeric column names.
plt.figure(figsize=(12, 120))
plt.subplots_adjust(right=2, top=2)
for i, feature in enumerate(list(train[numeric]), 1):
    if feature == 'MiscVal':
        break  # stop once the MiscVal column is reached
    plt.subplot(len(list(numeric)), 3, i)
    sns.scatterplot(x=feature, y='SalePrice', hue='SalePrice', palette='Blues', data=train)

    plt.xlabel('{}'.format(feature), size=15, labelpad=12.5)
    plt.ylabel('SalePrice', size=15, labelpad=12.5)
    plt.tick_params(axis='both', labelsize=12)
    plt.legend(loc='best', prop={'size': 10})

plt.show()

I have actually uploaded the full code for that competition on my GitHub if you want to have a look ;) (I am currently in the top 14% on that competition).

2 Comments

Joel2342 Over a year ago

Thanks bro. Just to clarify, what does feature == 'MiscVal' mean?

Michele Giglioni Over a year ago

No problem. "MiscVal" is one of the columns of the train.csv file. According to the Kaggle description, it's basically the value assigned to the features not covered in other categories (lot area, street, building quality, roof material, ...)
