I'm following an edX course on Programming with Python in Data Science. When I use a function provided by the course to plot the results of my linear regression model, the graph looks wrong: all the scatter points are clustered at the bottom and the regression line sits far above them.

I'm not sure whether the provided function drawLine is incorrect or something else is wrong with my modeling process.

Here is the provided function:

def drawLine(model, X_test, y_test, title, R2):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(X_test, y_test, c='g', marker='o')
    ax.plot(X_test, model.predict(X_test), color='orange', linewidth=1, alpha=0.7)

    title += " R2: " + str(R2)
    ax.set_title(title)
    print(title)
    print("Intercept(s): ", model.intercept_)

    plt.show()

Here is the code I wrote:

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import linear_model
from sklearn.model_selection import train_test_split

matplotlib.style.use('ggplot') # Look Pretty

# Reading in data
X = pd.read_csv('Datasets/College.csv', index_col=0)

# Wrangling data
X.Private = X.Private.map({'Yes':1, 'No':0})

# Splitting data
roomBoard = X[['Room.Board']]
accStudent = X[['Accept']]
X_train, X_test, y_train, y_test = train_test_split(roomBoard, accStudent, test_size=0.3, random_state=7)

# Training model
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
score = model.score(X_test, y_test)

# Visualise results
drawLine(model, X_test, y_test, "Accept(Room&Board)", score)

The data I used can be found here.

Thank you for your time.
Any help or advice is appreciated.

1 Answer

In your drawLine function, I changed ax.scatter to plt.scatter (and ax.plot to plt.plot). I also converted roomBoard and accStudent to NumPy arrays instead of pandas objects. Finally, I changed how you were updating the "Private" column to

X.loc[:, "Private"] = X.Private.map({'Yes':1, 'No':0})

The pandas docs explain why I made this change. The other small changes are cosmetic.
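
One documented pitfall this likely refers to (an assumption, since the linked page is not reproduced here) is that attribute-style assignment such as X.Private = ... updates a column that already exists but does not create a new one, whereas explicit indexing with .loc handles both cases. A minimal sketch on a toy frame; the values and the Elite and Public names are hypothetical and not part of College.csv:

```python
import warnings
import pandas as pd

# Toy frame standing in for College.csv (hypothetical values)
df = pd.DataFrame({"Private": ["Yes", "No", "Yes"]})

# Attribute assignment updates a column that already exists...
df.Private = df.Private.map({"Yes": 1, "No": 0})

# ...but for a new name it only sets a Python attribute, not a column
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about this pattern
    df.Elite = [1, 0, 1]

# Explicit indexing creates or updates the column either way
df.loc[:, "Public"] = 1 - df["Private"]

print(list(df.columns))
```

So the original X.Private = ... line should still have worked for the existing column; the .loc form is simply the more explicit habit.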

I got the following to work:

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import linear_model
from sklearn.model_selection import train_test_split

matplotlib.style.use('ggplot') # Look Pretty

# Reading in data
X = pd.read_csv('College.csv', index_col=0)

# Wrangling data
X.loc[:, "Private"] = X.Private.map({'Yes':1, 'No':0})

# Splitting data
roomBoard = X.loc[:, 'Room.Board'].values.reshape((len(X),1))
accStudent = X.loc[:, 'Accept'].values.reshape((len(X),1))
X_train, X_test, y_train, y_test = train_test_split(roomBoard, accStudent, test_size=0.3, random_state=7)

# Training model
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
score = model.score(X_test, y_test)

# Visualise results
def drawLine(model, X_test, y_test, title, R2):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # Plot the held-out points, then the fitted line over the same X values
    plt.scatter(X_test, y_test, c='g', marker='o')
    y_pred = model.predict(X_test)
    plt.plot(X_test, y_pred, color='orange', linewidth=1, alpha=0.7)

    title += " R2: " + str(R2)
    ax.set_title(title)
    print(title)
    print("Intercept(s): ", model.intercept_)

    plt.xticks(())
    plt.yticks(())

    plt.show()

drawLine(model, X_test, y_test, "Accept(Room&Board)", score)
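
The R2 value printed in the title comes from model.score, which for LinearRegression is the coefficient of determination on the test set. Checking that number by hand is one way to tell a bad fit from a bad plot: if the score is reasonable but the line sits far from the points, the plotting step is the more likely culprit. A rough sketch of the calculation using only NumPy and synthetic data; r2_score_manual is a hypothetical helper, not part of the course code:

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Synthetic data (hypothetical): points exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a straight line, then score the fitted values
slope, intercept = np.polyfit(x, y, 1)
perfect = r2_score_manual(y, slope * x + intercept)

# Predicting the mean everywhere is the zero-score baseline
baseline = r2_score_manual(y, np.full_like(y, y.mean()))

print(perfect, baseline)
```

With the real data, r2_score_manual(y_test, model.predict(X_test)) should agree with the score passed to drawLine.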

2 Comments

Thank you for your help, it worked! I tried making one change at a time and realised that the types of roomBoard and accStudent seemed to matter most for getting a correct graph. However, I'm pretty sure that in my original code roomBoard = X[['Room.Board']] returns a pandas.DataFrame. Could you enlighten me on why a NumPy 2D array works while a DataFrame doesn't?
The scikit-learn docs say the input must be a NumPy array.
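
To make the types in this exchange concrete, here is a small sketch comparing the three ways of selecting the column. It uses a toy frame with made-up values, so treat it as an illustration rather than a diagnosis. As far as I know, scikit-learn estimators accept array-like input, which includes DataFrames, so the conversion probably matters mainly for the plotting call in the installed matplotlib version; that part is an assumption worth verifying locally.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for College.csv (hypothetical values)
X = pd.DataFrame({"Room.Board": [3300, 6450, 3750, 5450],
                  "Accept": [1232, 1924, 1097, 349]})

as_frame = X[["Room.Board"]]   # double brackets: DataFrame, shape (4, 1)
as_series = X["Room.Board"]    # single brackets: Series, shape (4,)
as_array = X.loc[:, "Room.Board"].values.reshape((len(X), 1))  # ndarray, shape (4, 1)

for name, obj in [("frame", as_frame), ("series", as_series), ("array", as_array)]:
    print(name, type(obj).__name__, obj.shape)

# The DataFrame and the reshaped array hold the same 2-D values
same = np.array_equal(as_frame.to_numpy(), as_array)
print(same)
```

Since the values are identical, calling X_test.to_numpy() inside drawLine would be another way to hand matplotlib plain arrays without changing the rest of the script.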
