I am trying to understand ordered factors (polynomial terms) and their interpretation in Cox Proportional Hazards regression model. I know when using lm() to fit linear models, you can model a a variable as an unoredered factor, ordered factor, or quantitative variable.
I will use the mtcars dataset and then the mort dataset:
library(eha)
library(survival)
library(cmprsk)
library(car)
mtcarsData <- mtcars
mtcarsData$gearOF <- factor(mtcarsData$gear, c(3, 4, 5),
labels(3, 4, 5), ordered=TRUE)
mtcarsModel1 <- lm(mpg~gear, data=mtcarsData)
mtcarsModel2 <- lm(mpg~gearOF, data=mtcarsData)
mtcarsModel1 output:
Call:
lm(formula = mpg ~ gear, data = mtcarsData)
Residuals:
Min 1Q Median 3Q Max
-10.240 -2.793 -0.205 2.126 12.583
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.623 4.916 1.144 0.2618
gear 3.923 1.308 2.999 0.0054 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.374 on 30 degrees of freedom
Multiple R-squared: 0.2307, Adjusted R-squared: 0.205
F-statistic: 8.995 on 1 and 30 DF, p-value: 0.005401
I know here that gear has a significant effect on cars mpg. I know that the equation looks something like the following:
y = (B1 X) + B0
y = (3.923 * gear) + 5.623
For mtcarsModel2 output:
Call:
lm(formula = mpg ~ gearOF, data = mtcarsData)
Residuals:
Min 1Q Median 3Q Max
-6.7333 -3.2333 -0.9067 2.8483 9.3667
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.6733 0.9284 22.267 < 2e-16 ***
gearOF.L 3.7288 1.7191 2.169 0.03842 *
gearOF.Q -4.7275 1.4888 -3.175 0.00353 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.708 on 29 degrees of freedom
Multiple R-squared: 0.4292, Adjusted R-squared: 0.3898
F-statistic: 10.9 on 2 and 29 DF, p-value: 0.0002948
I know that .L and .Q refer to linear and quadratic polynomial terms. I think something like the following (correct me if I am wrong, I want to get better):
y = B2*X2^2 + B1*X1 + B0
y = (-4.7275 * X2^2) + (3.7288 * X1) + 20.6733
To me this second model is more of an interpretation of the trend of the data of gears as an ordered factor on mpg. That it could be modeled as either linear or quadratic regression. Is this a fair interpretation or am I wrong in thinking this?
Does the same hold true for a Cox Proportional Hazards model when looking at an odered factor?
Below I wil use the mort dataset as an example:
A data frame with 2058 observations on the following 6 variables:
id Personal identification number.
enter Start of duration. Measured in years since the fortieth birthday.
exit End of duration. Measured in years since the fortieth birthday.
event a logical vector indicating death at end of interval.
birthdate The birthdate in decimal form.
ses Socio-economic status, a factor with levels lower, upper
Example:
data <- mort
data$sesOF <- factor(data$ses, c('lower', 'upper'),
label=c('lower', 'upper'), ordered=TRUE)
surv_obj <- Surv(data$enter, data$exit, data$event )
model1 <- coxph(surv_obj ~ ses, data=data)
model2 <- coxph(surv_obj ~ sesOF, data=data)
Model1 output:
Call:
coxph(formula = surv_obj ~ ses, data = data)
n= 1208, number of events= 276
coef exp(coef) se(coef) z Pr(>|z|)
sesupper -0.4795 0.6191 0.1207 -3.972 7.13e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
sesupper 0.6191 1.615 0.4886 0.7844
Concordance= 0.56 (se = 0.015 )
Likelihood ratio test= 15.81 on 1 df, p=7e-05
Wald test = 15.77 on 1 df, p=7e-05
Score (logrank) test = 16.08 on 1 df, p=6e-05
Model2 output:
Call:
coxph(formula = surv_obj ~ sesOF, data = data)
n= 1208, number of events= 276
coef exp(coef) se(coef) z Pr(>|z|)
sesOF.L -0.33908 0.71243 0.08537 -3.972 7.13e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
sesOF.L 0.7124 1.404 0.6027 0.8422
Concordance= 0.56 (se = 0.015 )
Likelihood ratio test= 15.81 on 1 df, p=7e-05
Wald test = 15.77 on 1 df, p=7e-05
Score (logrank) test = 16.08 on 1 df, p=6e-05
My interpretation of model1 is that an upper socio-economic status lowers the risk of death (HR of 0.6191).
In model2, when socio-economic status is modeled as an odered factor it returns .L polynomial term. Would similar interpretations be made to model mtcarsModel2 in the sense that this output is more of an interpretation of the trend of socio-economic status as an ordered factor?
My intuition tells me that if I were to be reporting this output, model1 would be the preffered analysis to report, while model2 here tells us more about the trend of the data.
I appreciate any help on this! :)
References:
2- Interpreting the estimate of an ordered factor in regression
3- GLM interpretation of parameters of ordinal predictor variables
4- polynomial contrasts in survival models
5- Results of lm() function with a dependent ordered categorical variable?
