Questions tagged [regression]
Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.
30,767 questions
2
votes
1
answer
69
views
Why does the coefficient of a regressor increase with sample size when using poly() in lm() and glm()?
I’m trying to use the R poly() function with degree 1 to force glm to interpret a factor linearly. I’m puzzled by the fact that the size of the sample seems to increase the coefficient of the ...
1
vote
0
answers
47
views
Prediction intervals for random-design simple linear regression
I am going through the creation of a prediction interval for a value drawn from the conditional distribution of $Y$ given $X=x$ under simple linear regression as shown in the image above. The ...
0
votes
0
answers
26
views
True slope parameter for quantile regression with heterogeneous error
I am trying to perform a Monte-Carlo simulation on quantile regression using R. Currently I am getting stuck simulating the data from the model below.
...
4
votes
2
answers
109
views
+100
Narrow vs Broad-based U-shape comparisons
I’m modeling mortality using a multivariate logistic regression model with a nonlinear effect of X1 and I’m examining whether this relationship changes across ...
0
votes
1
answer
47
views
Using ordinal logistic regression to extract insights with imbalanced data
I am attempting to understand how each independent variable affects the probability of each dependent variable, which are ordinal (0, 1 and 2). Therefore, I am attempting to use ordinal logistic ...
4
votes
4
answers
318
views
+50
Borderline interaction p value
I’m working on a logistic regression model where I want to examine whether the effect of one continuous predictor (X1) on a binary outcome depends on another ...
1
vote
1
answer
35
views
Interpretation of LMM output with scaled predictors
I'm running a linear mixed model, in which I have included a few categorical variables - time, sex - with two levels, as well as three continuous nutrition variables as fixed effects and their ...
0
votes
0
answers
30
views
Partialing out a time-trend I'm unable to evaluate
I am investigating the influence of policy X on grade outcomes.
Earlier research was able to utilise a partial implementation of policy X in the population of interest to establish a natural ...
1
vote
0
answers
60
views
How to analyze the influence of a variable on an outcome in clinical pre-post data (if the pre measurement as predictor isn't enough)?
I’m trying to get a better grasp of how to handle an issue in pre–post observational data.
Let’s say I have data from a rehab center with measures at admission and discharge (only these two ...
0
votes
0
answers
60
views
What regression model do I choose for my DV?
My data is a ratio of: perceived time elapsed/actual time elapsed. Now this ranges from 0 to +infinity. It is a continuous positive number.
My experiment is mixed model (with within and between subject ...
5
votes
1
answer
216
views
Number of knots in splines (internal vs total)
I’m trying to understand how natural cubic splines (splines::ns) and restricted cubic splines (rms::rcs) handle knots — ...
2
votes
0
answers
41
views
How to choose features for a Gamma regression, vs. Linear Regression
I'm new to using GLMs which are not Linear Regression, and am working on a project where I am using Gamma regression with a log-link. I'm having problems with the feature engineering step.
With linear ...
3
votes
2
answers
159
views
Understanding and interpreting Cox Regression when using ordered factors
I am trying to understand ordered factors (polynomial terms) and their interpretation in Cox Proportional Hazards regression model. I know when using lm() to fit ...
0
votes
0
answers
38
views
Maximum likelihood estimation for linear regression [duplicate]
When conducting maximum likelihood estimation for simple linear regression whilst considering the regressors as random, the joint distribution of $f_{X,Y}(x,y;\theta) = f_{Y|X}(y|x;\theta) * f_{X}(x;\...
1
vote
0
answers
49
views
What is the best statistical approach to forecast cash flow from run-off debt vintages with a growing balance?
I'm facing a modeling problem for cash flow forecasting and would like to know what the most robust mathematical/statistical approach is to solve it.
The Problem: Debt Recovery Forecasting
...
1
vote
1
answer
164
views
Does strict exogeneity imply uncorrelation among error terms?
Does the strict exogeneity assumption of OLS $ \mathbb{E} [\epsilon \mid X ] = 0 $ imply that the error terms of different observations are uncorrelated with one another, that is $ \text{Cov}( \...
2
votes
2
answers
99
views
lm() and glm() equivalence for log-transformed response variable [duplicate]
I can't seem to wrap my head around this:
What is the glm() equivalent for lm(log(y) ~ x1 + x2, data=data)?
Is it?
a. ...
1
vote
1
answer
93
views
Multicollinearity in logistic regression
I would like to check for multicollinearity of the independent variables in a binary logistic regression. Some independent variables are binary (coded 0, 1), others are polytomous (converted to dummy) ...
0
votes
0
answers
19
views
Non-linear regression for modeling accuracy of ML models
Suppose I have a slow model with an accuracy between 75% and 80%. I want to approximate this model with faster models. Fast models require $e$ effort and the more effort the better. I want to estimate ...
2
votes
1
answer
240
views
Calculating standard errors in least squares and the normality assumption
The question titled “How are the standard errors of coefficients calculated in a regression?” is asking how the standard errors of regression coefficient estimates are computed (for example, the ...
0
votes
1
answer
28
views
In linear regression, what changes when you use robust standard errors to overcome non-constant variance?
In my first course on linear regression, I learned the 4 basic assumptions that every textbook teaches: linearity, independence, homoscedasticity, and normality. However, I recently learned about ...
5
votes
1
answer
122
views
Is there a "better" approach when it comes to model evaluation on multiple test datasets?
I have two models trained and validated on the same training/validation data.
Now I need to evaluate them on multiple independent test datasets (e.g., 10 different datasets of the same measure).
Which ...
2
votes
1
answer
110
views
In a regression, does AIC tell anything that the mean squared error does not, except for the penalty for more variables?
The equation for AIC is
$$\mathrm{AIC} = n\ln(\mathrm{MSE})+2k$$
where:
$n ={}$ number of observations
$\mathrm{MSE} ={}$ mean squared error
$k ={}$ number of parameter estimates
The way I ...
1
vote
0
answers
110
views
Bias in standard error of regression slope with not-independent data and effective sample size
Consider a sample of $N/2$ pairs of individuals. Each pair belongs to a group $j$.
For each individual $i$ in the sample of $N$, I measure two variables ($y_{i}$ and $x_{i}$) and the average per group $...
0
votes
0
answers
41
views
Different optimal elbow points for different values of a second continuous variable in a regression model
I am analyzing the relationship between age, education, and the probability of having a high income (>50K) using data from the UCI Adult dataset. I've fit a logistic regression model with a natural ...