I'm following an edX course on Programming with Python in Data Science. When I use a function I was given to plot the results of my linear regression model, the graph looks wrong: all the scatter points are clustered at the bottom and the regression line sits far above them.
I'm not sure whether the provided drawLine function is incorrect or something else is wrong with my modeling process.
Here is the provided function:
def drawLine(model, X_test, y_test, title, R2):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(X_test, y_test, c='g', marker='o')
    ax.plot(X_test, model.predict(X_test), color='orange', linewidth=1, alpha=0.7)
    title += " R2: " + str(R2)
    ax.set_title(title)
    print(title)
    print("Intercept(s): ", model.intercept_)
    plt.show()
Here is the code I wrote:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import linear_model
from sklearn.model_selection import train_test_split
matplotlib.style.use('ggplot') # Look Pretty
# Reading in data
X = pd.read_csv('Datasets/College.csv', index_col=0)
# Wrangling data
X.Private = X.Private.map({'Yes':1, 'No':0})
# Splitting data
roomBoard = X[['Room.Board']]
accStudent = X[['Accept']]
X_train, X_test, y_train, y_test = train_test_split(roomBoard, accStudent, test_size=0.3, random_state=7)
# Training model
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
# Visualise results
drawLine(model, X_test, y_test, "Accept(Room&Board)", score)
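One sanity check I can think of, independent of matplotlib and scikit-learn: an ordinary least squares line with an intercept always passes through the point (mean of x, mean of y) of the data it was fitted on, so its predictions on that data average out to the mean of y. If that holds, a line drawn far above every point would suggest a plotting problem rather than a modeling one. The sketch below is plain Python with made-up numbers standing in for Room.Board and Accept (not the College data), and `fit_line` is a hypothetical helper, not part of the course code.

```python
from statistics import mean


def fit_line(xs, ys):
    """Closed-form least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up values for illustration only (assumption: not the real dataset)
xs = [3000.0, 3500.0, 4000.0, 4500.0, 5000.0, 5500.0]
ys = [800.0, 1200.0, 900.0, 2500.0, 1700.0, 6000.0]

slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]

# Prediction at the mean of x; should match the mean of y
centre_pred = slope * mean(xs) + intercept

print("mean of y:          ", mean(ys))
print("prediction at mean: ", centre_pred)
print("mean of predictions:", mean(preds))
print("y range:            ", min(ys), max(ys))
print("prediction range:   ", min(preds), max(preds))
```

The same comparison could be run on my real data by printing the min, max and mean of `model.predict(X_train)` next to those of `y_train`. This only holds exactly on the training split; on `X_test` the means should be close but need not match.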
The data I used can be found here.
Thank you for your time.
Any help or advice is appreciated.