I am running (what I think is) as fairly straightforward multiple linear regression model fit using Stats model.
My code is as follows:
y = 'EXITS|20:00:00'
all_columns = "+".join(y_2015piv.columns - ['EXITS|20:00:00'])
reg_formula = "y~" + all_columns
lm= smf.ols(formula=reg_formula, data=y_2015piv).fit()
Because I have about 30 factor variables I'm creating the formula using Python string manipulation. "y" is as presented above. all_columns is the dataframe y_2015piv columns without "y".
This is all_columns:
DAY_Fri+DAY_Mon+DAY_Sat+DAY_Sun+DAY_Thu+DAY_Tue+DAY_Wed+ENTRIES|00:00:00+ENTRIES|04:00:00+ENTRIES|08:00:00+ENTRIES|12:00:00+ENTRIES|16:00:00+ENTRIES|20:00:00+EXITS|00:00:00+EXITS|04:00:00+EXITS|08:00:00+EXITS|12:00:00+EXITS|16:00:00+MONTH_Apr+MONTH_Aug+MONTH_Dec+MONTH_Feb+MONTH_Jan+MONTH_Jul+MONTH_Jun+MONTH_Mar+MONTH_May+MONTH_Nov+MONTH_Oct+MONTH_Sep
The values in the dataframe are continuous numerical variables and 0/1 dummy variables.
When I try and fit the model I get this error:
PatsyError: numbers besides '0' and '1' are only allowed with **
y~DAY_Fri+DAY_Mon+DAY_Sat+DAY_Sun+DAY_Thu+DAY_Tue+DAY_Wed+ENTRIES|00:00:00+ENTRIES|04:00:00+ENTRIES|08:00:00+ENTRIES|12:00:00+ENTRIES|16:00:00+ENTRIES|20:00:00+EXITS|00:00:00+EXITS|04:00:00+EXITS|08:00:00+EXITS|12:00:00+EXITS|16:00:00+MONTH_Apr+MONTH_Aug+MONTH_Dec+MONTH_Feb+MONTH_Jan+MONTH_Jul+MONTH_Jun+MONTH_Mar+MONTH_May+MONTH_Nov+MONTH_Oct+MONTH_Sep
There is nothing on line that addresses what this could be. Any help appreciated.
By the way, when I fit this model in Scikit-learn it works fine. So I figure the data is in order.
Thanks in advance.