
After importing the file, I separate the x_values and y_values using NumPy like this:

import pandas as pd
from sklearn import linear_model
from  matplotlib import pyplot 
import numpy as np

#read data
dataframe = pd.read_csv('challenge_dataset.txt')
dataframe.columns=['Brain','Body']
x_values=np.array(dataframe['Brain'],dtype=np.float64).reshape(1,-1)
y_values=np.array(dataframe['Body'],dtype=np.float64).reshape(1,-1)

#train model on data
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values, y_values)
prediction=body_reg.predict(x_values)

print(prediction)
#visualize results
pyplot.scatter(x_values, y_values)
pyplot.plot(x_values,prediction)
pyplot.show()

I get the plot shown in the following image: it doesn't show the line of best fit, and when I print prediction, its values are the same as y_values.

[Image: scatter plot of the data with no regression line]

By contrast, when I use the following code, I get the regression line:

#read data
dataframe = pd.read_csv('challenge_dataset.txt')
dataframe.columns=['Brain','Body']
x_values=dataframe[['Brain']]
y_values=dataframe[['Body']]

[Image: scatter plot of the data with the fitted regression line]

Why is that?

Thanks in advance.

  • What would be the reason to do .reshape(1,-1)? Commented Sep 23, 2017 at 17:58
  • x_values=np.array(dataframe['Brain'],dtype=np.float64).reshape(1,-1): because I was taking the values of the Brain column in one dimension. I know it's weird; I could have taken them in two dimensions, but I was just experimenting. Commented Sep 23, 2017 at 18:08
  • What I mean is what happens if you leave .reshape(1,-1) out? Commented Sep 23, 2017 at 18:12
  • It throws this error: ValueError: Expected 2D array, got 1D array instead: Commented Sep 23, 2017 at 18:14


linear_model.LinearRegression().fit(X,y) expects its arguments to have these shapes:

X : numpy array or sparse matrix of shape [n_samples,n_features]
y : numpy array of shape [n_samples, n_targets]

Here you have one "feature" and one "target", so the expected shape of each input is (n_samples, 1).

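That is the shape you get from

x_values=dataframe[['Brain']]
y_values=dataframe[['Body']]

but np.array(dataframe['Brain'],dtype=np.float64).reshape(1,-1) has shape (1, n_samples): a single sample with n_samples features. Fitting on it trains the model on one sample with n_samples targets, so it simply reproduces y_values, which is why prediction prints the same values. The plot is empty for a related reason: pyplot.plot draws each column of a 2D array as a separate line, so plotting two (1, n_samples) arrays creates n_samples lines of one point each, and none of them has a visible segment.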
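A minimal sketch reproducing both symptoms from the question, using a small made-up x/y pair in place of challenge_dataset.txt (the numbers are hypothetical). It assumes scikit-learn and matplotlib are installed and selects the non-interactive Agg backend so no window opens.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; nothing is displayed
import numpy as np
from matplotlib import pyplot
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for the Brain and Body columns
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# reshape(1, -1): ONE sample with 4 features and 4 targets
x_row, y_row = x.reshape(1, -1), y.reshape(1, -1)
wrong = LinearRegression().fit(x_row, y_row)
wrong_pred = wrong.predict(x_row)
print(x_row.shape, wrong_pred)  # the prediction just repeats y_row

# plot() treats each column of a 2D array as its own line,
# so this creates 4 lines containing a single point each
lines = pyplot.plot(x_row, wrong_pred)
print(len(lines), [len(line.get_xdata()) for line in lines])

# reshape(-1, 1): 4 samples with 1 feature and 1 target, as fit() expects
x_col, y_col = x.reshape(-1, 1), y.reshape(-1, 1)
right = LinearRegression().fit(x_col, y_col)
print(x_col.shape, right.coef_, right.intercept_)
```

With only one sample, centering the data leaves nothing to fit, so the slope coefficients come out as zero and the intercept absorbs y itself.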

Another option to obtain the desired shape from the DataFrame columns is to add a new axis, turning each 1D column into a 2D array of shape (n_samples, 1):

x_values=dataframe['Brain'].values[:,np.newaxis]
y_values=dataframe['Body'].values[:,np.newaxis]
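A small sketch, assuming a toy DataFrame in place of the real data, showing that indexing with np.newaxis gives the same (n_samples, 1) array as reshape(-1, 1) or as selecting the column with double brackets. Series.to_numpy() is the newer pandas spelling of .values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for challenge_dataset.txt
dataframe = pd.DataFrame({'Brain': [3.4, 1.8, 5.2], 'Body': [20.1, 9.7, 33.0]})

a = dataframe['Brain'].values[:, np.newaxis]      # add a new trailing axis
b = dataframe['Brain'].to_numpy().reshape(-1, 1)  # same result via reshape
c = dataframe[['Brain']].to_numpy()               # double brackets keep 2D

print(dataframe['Brain'].values.shape)  # (3,) -- 1D, rejected by fit()
print(a.shape, b.shape, c.shape)        # (3, 1) (3, 1) (3, 1)
```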

Note that in order to show a nice line, you would probably want to sort the x values.

import pandas as pd
from sklearn import linear_model
from matplotlib import pyplot
import numpy as np

# generate random sample data in place of challenge_dataset.txt
x = np.random.rand(25,2)
x[:,1] = 2*x[:,0]+np.random.rand(25)
dataframe = pd.DataFrame(x,columns=['Brain','Body'])

# column vectors of shape (n_samples, 1)
x_values=dataframe['Brain'].values[:,np.newaxis]
y_values=dataframe['Body'].values[:,np.newaxis]

# train model on data
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values, y_values)

# predict on sorted x values so the line is drawn from left to right
x_sorted = np.sort(x_values, axis=0)
prediction=body_reg.predict(x_sorted)

# visualize results
pyplot.scatter(x_values, y_values)
pyplot.plot(x_sorted,prediction)
pyplot.show()
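The example sorts with np.sort(..., axis=0). A short check of why the axis matters for a column vector: NumPy's default axis=-1 sorts along the last axis, which has length 1 here, so it leaves the array unchanged. The values below are made up.

```python
import numpy as np

x_values = np.array([[0.7], [0.1], [0.4]])  # hypothetical (3, 1) column

default_sort = np.sort(x_values)         # axis=-1: sorts each length-1 row, no change
column_sort = np.sort(x_values, axis=0)  # sorts down the column

print(default_sort.ravel())  # [0.7 0.1 0.4]
print(column_sort.ravel())   # [0.1 0.4 0.7]
```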

[Image: scatter plot of the random sample data with the fitted regression line]
