I have some data like

arr = [
    [30.0, 0.0257],
    [30.0, 0.0261],
    [30.0, 0.0261],
    [30.0, 0.026],
    [30.0, 0.026],
    [35.0, 0.0387],
    [35.0, 0.0388],
    [35.0, 0.0387],
    [35.0, 0.0388],
    [35.0, 0.0388],
    [40.0, 0.0502],
    [40.0, 0.0503],
    [40.0, 0.0502],
    [40.0, 0.0498],
    [40.0, 0.0502],
    [45.0, 0.0582],
    [45.0, 0.0574],
    [45.0, 0.058],
    [45.0, 0.058],
    [45.0, 0.058],
    [50.0, 0.0702],
    [50.0, 0.0702],
    [50.0, 0.0698],
    [50.0, 0.0704],
    [50.0, 0.0703],
    [55.0, 0.0796],
    [55.0, 0.0808],
    [55.0, 0.0803],
    [55.0, 0.0805],
    [55.0, 0.0806],
]

which I have plotted with the Google Charts API.

I am trying to do linear regression on this, i.e. trying to find the slope and the (y-) intercept of the trend line, and also the uncertainty in slope and uncertainty in intercept.

The Google Charts API already finds the slope and the intercept value when I draw the trend line, but I am not sure how to find the uncertainties.

I have been doing this with the LINEST function in Excel, but I find that very cumbersome, since all my data is in Python.

So my question is, how can I find the two uncertainty values that I get in LINEST using Python?

I apologize for asking an elementary question like this.

I am pretty good at Python and JavaScript, but I am very poor at regression analysis, so when I tried to look these things up in the documentation, the difficult terminology left me very confused.

I hope to use some well-known Python library, although it would be ideal if I could do this within Google Charts API.

  • I think this might help you stackoverflow.com/questions/11479064/… Commented Sep 26, 2014 at 1:58
  • I am an absolute novice when it comes to regression or any statistical methods. Unfortunately, the link does not help. Sorry. Commented Sep 26, 2014 at 2:08

1 Answer

It could be done using statsmodels like this:

import statsmodels.api as sm

# arr is the list of [x, y] pairs from the question
x = [item[0] for item in arr]
y = [item[1] for item in arr]

# OLS does not fit an intercept by default, so add a constant column to x
x = sm.add_constant(x)

model = sm.OLS(y, x)
results = model.fit()

You could then access the values you require as follows. The intercept and the slope, in that order, are given by:

results.params # linear coefficients
# array([-0.036924 ,  0.0021368])

I assume that by uncertainty you mean the standard errors. They can be accessed like this:

results.bse # standard errors of the parameter estimates
# array([  1.03372221e-03,   2.38463106e-05])
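In case it helps to see where these numbers come from, here is a minimal sketch that computes the same four quantities with the textbook formulas for simple linear regression, using only the standard library. It assumes ordinary least squares with a single predictor; the names ys_by_x, sxx, sxy and s2 are my own and are not part of statsmodels. The data is the same as in the question, just grouped by x value to keep the listing short.

```python
import math

# Data from the question, grouped by x value (each x has five y readings).
ys_by_x = {
    30.0: [0.0257, 0.0261, 0.0261, 0.026, 0.026],
    35.0: [0.0387, 0.0388, 0.0387, 0.0388, 0.0388],
    40.0: [0.0502, 0.0503, 0.0502, 0.0498, 0.0502],
    45.0: [0.0582, 0.0574, 0.058, 0.058, 0.058],
    50.0: [0.0702, 0.0702, 0.0698, 0.0704, 0.0703],
    55.0: [0.0796, 0.0808, 0.0803, 0.0805, 0.0806],
}
xs = [xv for xv, group in ys_by_x.items() for _ in group]
ys = [yv for group in ys_by_x.values() for yv in group]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Sums of squares and cross products about the means.
sxx = sum((xi - x_mean) ** 2 for xi in xs)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(xs, ys))

# Least-squares estimates.
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Residual variance with n - 2 degrees of freedom (two fitted parameters).
ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))
s2 = ssr / (n - 2)

# Standard errors of the slope and the intercept.
slope_se = math.sqrt(s2 / sxx)
intercept_se = math.sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx))

print(intercept, slope)
print(intercept_se, slope_se)
```

The printed values should agree with results.params and results.bse above up to rounding. Since nothing here depends on a library, the same arithmetic could be ported to JavaScript if you want the uncertainties next to the chart.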

An overview can be obtained by running

>>> print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.997
Model:                            OLS   Adj. R-squared:                  0.996
Method:                 Least Squares   F-statistic:                     8029.
Date:                Fri, 26 Sep 2014   Prob (F-statistic):           5.61e-36
Time:                        05:47:08   Log-Likelihood:                 162.43
No. Observations:                  30   AIC:                            -320.9
Df Residuals:                      28   BIC:                            -318.0
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         -0.0369      0.001    -35.719      0.000        -0.039    -0.035
x1             0.0021   2.38e-05     89.607      0.000         0.002     0.002
==============================================================================
Omnibus:                        7.378   Durbin-Watson:                   0.569
Prob(Omnibus):                  0.025   Jarque-Bera (JB):                2.079
Skew:                           0.048   Prob(JB):                        0.354
Kurtosis:                       1.714   Cond. No.                         220.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

This might also be of interest for a summary of the properties of the resulting model.
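If you would rather quote an interval than a standard error, the [95.0% Conf. Int.] column of the summary can be reproduced by hand as estimate ± t × standard error. The sketch below assumes the usual two-sided 95% interval and uses a critical value of 2.048 taken from a t-table for 28 degrees of freedom (30 observations minus 2 fitted parameters); the variable names are my own.

```python
# Values copied from results.params and results.bse above.
intercept, slope = -0.036924, 0.0021368
intercept_se, slope_se = 1.03372221e-03, 2.38463106e-05

# Two-sided 95% critical value of Student's t for 28 residual degrees of
# freedom, read from a t-table (an assumption, not computed here).
t_crit = 2.048

intercept_ci = (intercept - t_crit * intercept_se,
                intercept + t_crit * intercept_se)
slope_ci = (slope - t_crit * slope_se,
            slope + t_crit * slope_se)

print("intercept: %.3f to %.3f" % intercept_ci)
print("slope: %.5f to %.5f" % slope_ci)
```

Rounded to three decimals, the intercept bounds come out as -0.039 and -0.035, as in the table. statsmodels can also return these intervals directly through results.conf_int().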

I did not compare the results to LINEST in Excel. I also don't know whether this is possible using only the Google Charts API.
