MULTINOMIAL LOGISTIC
REGRESSION
Dr. Athar Khan
matharm@yahoo.com
3/29/2020 DR ATHAR KHAN 1
Dichotomous Dependent
Variable
One or more independent
variables (continuous or
categorical)
Dependent variable
"exam performance"
dichotomous scale
"passed" or "failed"
Independent variables
"revision time"
"test anxiety"
"lecture attendance"
Binomial Logistic Regression
Nominal Variable
with more than two
levels
One or more independent
variables (continuous or
ordinal or nominal)
Dependent variable
"type of drink", with
four categories – Coffee,
Soft Drink, Tea and
Water
Independent variables
"location in UK"
"age"
Multinomial Logistic Regression is the regression analysis to
conduct when the dependent variable is nominal with more than
two levels.
We are studying predictors of people’s voting behavior
during the 2018 General Election.
We hypothesize that age, gender identification, economic
beliefs, and religious beliefs will predict whether a person
voted for Imran Khan (coded 1 on “vote2018”), Nawaz Sharif
(coded 2 on “vote2018”), an Other Candidate (coded 3), or
Did Not Vote (coded 4).
[Note: The “Other” category on the dependent variable was
created because of low frequencies, and it is not an
informative category.]
[Slide labels: age, gender identification, economic beliefs, and
religious beliefs are the IVs; vote2018 is the DV.]
▪ Our sample comprises n=120 observations.
▪ “Age” is measured as self-reported age of the
participant.
▪ “Econ.conlib” is a rating of self-reported economic
liberalism - 1=extremely conservative to 7=extremely
liberal (higher scores, thus, reflect greater liberalism).
▪ “Rel.conlib” is a rating of self-reported religious
liberalism - 1=extremely conservative to 7=extremely
liberal (higher scores, thus, reflect greater liberalism).
▪ Gender identification is coded 0=identified male,
1=identified female.
When setting up our analysis, we treat individuals indicating
they voted for Imran Khan (group 1) as the reference (or
baseline) category against which all other groups are
compared.
[We could use a different reference category if we were
interested in comparing some other baseline group with the
remaining groups.]
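
The role of the reference category can be illustrated with a small sketch. In a baseline-category logit model, each non-reference group has its own linear predictor (its log-odds versus the reference group), the reference group's linear predictor is fixed at zero, and the predicted probabilities follow from those values. The linear-predictor values below are hypothetical, chosen only for illustration; they are not estimates from this analysis.

```python
import math

def category_probabilities(etas, reference="Imran Khan"):
    """Convert linear predictors (log-odds versus the reference) to probabilities.

    etas maps each non-reference category to its linear predictor;
    the reference category implicitly has a linear predictor of 0.
    """
    denom = 1.0 + sum(math.exp(eta) for eta in etas.values())
    probs = {reference: 1.0 / denom}
    for category, eta in etas.items():
        probs[category] = math.exp(eta) / denom
    return probs

# HYPOTHETICAL log-odds for one person (not values from the slides).
etas = {"Nawaz Sharif": -0.5, "Other candidate": -1.0, "Did not vote": 0.2}
probs = category_probabilities(etas)
for category, p in probs.items():
    print(f"{category}: {p:.3f}")
```

The probabilities sum to 1, and the ratio of any group's probability to the reference group's probability equals the exponentiated linear predictor. This is why every coefficient is read as a comparison with Imran Khan voters.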
Because ‘genderid’ is a binary variable, we can include it
under Covariate(s), along with the remaining IVs.
SPSS Statistics classifies continuous independent variables as covariates and
nominal independent variables as factors. An ordinal independent variable,
however, can be entered either as a covariate or as a factor.
We’ll need to click the ‘Reference category’ button under
‘Dependent’ to reset the reference category to the first
group (i.e., Imran Khan voters).
Click on Statistics.
Check ‘Classification table’ and ‘Goodness-of-fit’, and leave
the other defaults selected.
13/20 *100 = 65%
5/20 *100 = 25%
This table contains information on the number and % of
cases observed in each category on the dependent
variable.
The “Model Fitting Information” table contains a likelihood ratio
chi-square test comparing the full model (i.e., containing all the predictors)
against a null (intercept-only) model.
In this example, we see that the full model is a significant improvement
in fit over the null model [χ²(12)=71.567, p<.001].
In other words, the full model predicts the dependent variable significantly
better than the intercept-only model.
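
The degrees of freedom and p-value of this test can be checked by hand. The sketch below assumes that all four predictors enter as covariates, so each contributes one coefficient per non-reference category (4 × 3 = 12, matching the reported df). The helper name chi2_sf_even_df is our own; it uses the exact closed form of the chi-square upper tail, which is valid only for even df.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form, even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Accumulate sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

n_predictors = 4   # age, genderid, econ.conlib, rel.conlib (all as covariates)
n_categories = 4   # Imran Khan, Nawaz Sharif, Other candidate, Did not vote
df = n_predictors * (n_categories - 1)

lr_chi2 = 71.567   # likelihood ratio chi-square reported in the table
p_value = chi2_sf_even_df(lr_chi2, df)
print(f"df = {df}, p = {p_value:.2e}")
```

The resulting p-value is far below .001, in line with the reported result.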
The “Goodness-of-Fit” table contains the Deviance and Pearson chi-square
tests, which are useful for determining whether a model exhibits good fit
to the data.
Non-significant test results are indicators that the model fits the data
well. Deviance is a measure of goodness of fit: the smaller the deviance,
the better the fit.
Pearson’s chi-square test indicates that the model does not fit the
data well [χ²(309)=370.099, p=.010], whereas the Deviance chi-square
does indicate good fit [χ²(309)=231.961, p=1.00].
[Note: The two measures of goodness of fit do not always agree, as in the
case we see here, so the results are somewhat mixed.]
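
As a rough check on the two p-values, the sketch below uses the Wilson-Hilferty cube-root normal approximation to the chi-square upper tail. This is an approximation chosen here so the example needs only the standard library; it is not the method SPSS uses. The function name is our own.

```python
import math

def chi2_sf_wilson_hilferty(x, df):
    """Approximate upper-tail chi-square probability (Wilson-Hilferty)."""
    # The cube root of a chi-square/df ratio is approximately normal.
    mean = 1.0 - 2.0 / (9.0 * df)
    sd = math.sqrt(2.0 / (9.0 * df))
    z = ((x / df) ** (1.0 / 3.0) - mean) / sd
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_pearson = chi2_sf_wilson_hilferty(370.099, 309)   # reported Pearson chi-square
p_deviance = chi2_sf_wilson_hilferty(231.961, 309)  # reported Deviance chi-square
print(f"Pearson p ~ {p_pearson:.3f}, Deviance p ~ {p_deviance:.3f}")
```

Both approximate values agree with the table: the Pearson statistic is well above its degrees of freedom (significant lack of fit), while the Deviance statistic is well below them (no evidence of lack of fit).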
These are pseudo-R-square values that are treated as
rough analogues to the R-square value in OLS regression.
In general, there is no strong guidance in the literature on
how these should be used or interpreted.
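
For orientation, the textbook definitions of three common pseudo-R-square measures (Cox and Snell, Nagelkerke, McFadden) can be sketched as follows. The slides' text does not report the -2 log-likelihood of the null model, so the value used below is a hypothetical placeholder; only n=120 and the likelihood ratio chi-square of 71.567 come from the source.

```python
import math

def pseudo_r2(neg2ll_null, neg2ll_full, n):
    """Textbook pseudo-R-square measures from -2 log-likelihood values."""
    lr = neg2ll_null - neg2ll_full            # likelihood ratio chi-square
    cox_snell = 1.0 - math.exp(-lr / n)
    # Nagelkerke rescales Cox and Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1.0 - math.exp(-neg2ll_null / n))
    mcfadden = 1.0 - neg2ll_full / neg2ll_null
    return {"cox_snell": cox_snell, "nagelkerke": nagelkerke, "mcfadden": mcfadden}

n = 120                  # sample size from the slides
lr_chi2 = 71.567         # likelihood ratio chi-square from the slides
neg2ll_null = 300.0      # HYPOTHETICAL placeholder, not reported in the text
neg2ll_full = neg2ll_null - lr_chi2

for name, value in pseudo_r2(neg2ll_null, neg2ll_full, n).items():
    print(f"{name}: {value:.3f}")
```

Only the Cox and Snell value depends solely on quantities reported in the text. The Nagelkerke and McFadden values printed here change with the placeholder and should not be read as results of this analysis.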
These results contain likelihood ratio tests of the overall
contribution of each independent variable to the model
(Note: if a variable is added in as a factor, the result for that
variable is treated as an omnibus test of that factor).
Using the conventional α=.05 threshold, we see that
economic liberalism was the only significant
predictor (p<.001) in the model, although age was “near
significant” (at p=.051).
These results involve comparisons of each voter group
against the Reference Category (Imran Khan voters).
The B column contains regression coefficients (expressed in
the metric of log-odds). The Exp(B) column contains odds
ratios.
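
The link between the two columns is simply Exp(B) = e^B. A minimal sketch, using the three significant coefficients reported in the parameter estimates for this analysis (the dictionary labels are our own):

```python
import math

# Coefficients (log-odds) reported in the slides; labels are illustrative.
reported_b = {
    "econ.conlib, Nawaz Sharif vs. Imran Khan": -1.568,
    "econ.conlib, Other candidate vs. Imran Khan": -0.808,
    "age, Did not vote vs. Imran Khan": -0.106,
}

# Exponentiating a log-odds coefficient gives the odds ratio.
odds_ratios = {label: math.exp(b) for label, b in reported_b.items()}

for label, odds_ratio in odds_ratios.items():
    pct_change = (odds_ratio - 1.0) * 100.0
    print(f"{label}: Exp(B) = {odds_ratio:.3f} ({pct_change:+.1f}% change in odds per unit)")
```

A negative B always gives an odds ratio below 1, meaning the odds of being in that group rather than the reference group decrease as the predictor increases.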
The first set of coefficients compares those voting for Nawaz Sharif with Imran
Khan voters (the reference category). Only ‘economic liberalism’ was a significant
predictor (b=-1.568, s.e.=.328, p<.001) in the model: persons scoring higher on
this variable were less likely to vote for Nawaz Sharif than for Imran Khan. The
odds ratio of .208 indicates that for every one-unit increase in economic
liberalism, the odds of a person voting for Nawaz Sharif rather than Imran Khan
are multiplied by .208 (in other words, the odds decrease).
The second set of coefficients compares those voting for an ‘Other candidate’ with
Imran Khan voters. Again, only ‘economic liberalism’ was a significant predictor
(b=-.808, s.e.=.290, p=.005) in the model: persons scoring higher on this variable
were less likely to vote for the ‘Other candidate’ than for Imran Khan. The odds
ratio of .446 indicates that for every one-unit increase in economic liberalism, the
odds of a person voting for the ‘Other candidate’ rather than Imran Khan are
multiplied by .446 (in other words, the odds decrease).
The final set of coefficients represents comparisons between Imran Khan voters and
those who ‘Did not vote’. Age was a significant negative predictor (b=-.106,
s.e.=.043, p=.013), indicating that persons who were older were more likely to vote
for Imran than to not vote. The regression coefficients for economic and religious
liberalism are consistent with the notion that individuals rating themselves as more
economically or religiously liberal were more likely to vote for Imran than to not
vote at all. Nevertheless, these predictors were not significant in the model.
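
The significance tests quoted for the three sets of coefficients can be approximately reproduced from b and its standard error. The sketch below computes a Wald z statistic, a two-sided normal p-value, and a 95% confidence interval for the odds ratio. It works from the rounded values printed on the slides, so its p-values may differ slightly from those SPSS reports; the function name is our own.

```python
import math

def wald_summary(b, se, z_crit=1.96):
    """Wald z, two-sided normal p-value, and approximate 95% CI for Exp(B)."""
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p-value
    ci = (math.exp(b - z_crit * se), math.exp(b + z_crit * se))
    return z, p, ci

# (b, s.e.) pairs as reported on the slides.
reported = {
    "econ.conlib, Nawaz Sharif": (-1.568, 0.328),
    "econ.conlib, Other candidate": (-0.808, 0.290),
    "age, Did not vote": (-0.106, 0.043),
}

results = {label: wald_summary(b, se) for label, (b, se) in reported.items()}
for label, (z, p, (low, high)) in results.items():
    print(f"{label}: z = {z:.2f}, p = {p:.4f}, 95% CI for Exp(B) = ({low:.3f}, {high:.3f})")
```

For all three coefficients the confidence interval for the odds ratio lies entirely below 1, which is another way of seeing that each effect is significantly negative at the .05 level.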
▪ These are classification statistics used to determine which group
memberships were best predicted by the model.
▪ Imran Khan voters were correctly predicted by the model 75.8% of the
time [as 25 of the 33 people who actually voted for Imran were
predicted to do so by the model; 25/(25+2+0+6) = .758].
▪ Nawaz Sharif voters were correctly predicted by the model 82.4% of
the time. Persons expressing that they Did Not Vote were correctly
predicted by the model 55.9% of the time.
▪ The model did a particularly poor job of predicting those who voted
for an Other candidate (a correct-prediction rate of only 5.3%).
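
The per-group percentages above are row percentages of the classification table: the number of correctly predicted cases in a row divided by the row total. A minimal sketch using the only row whose counts are given in the text (Imran Khan voters); the order of the three off-diagonal counts across the predicted categories is an assumption, and the function name is our own.

```python
def percent_correct(row_counts, correct_index):
    """Percentage of an observed group that the model classified correctly."""
    total = sum(row_counts)
    if total == 0:
        raise ValueError("row has no cases")
    return 100.0 * row_counts[correct_index] / total

# Observed Imran Khan voters by predicted category, from 25/(25+2+0+6).
# Only the first count (correct predictions) is certain; the order of the
# remaining three counts across the other categories is ASSUMED.
imran_row = [25, 2, 0, 6]
imran_pct = percent_correct(imran_row, correct_index=0)
print(f"Imran Khan voters correctly predicted: {imran_pct:.1f}%")
```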
THANKS