Generalized Linear Models

Kevin Donovan

February 28, 2021

Introduction

Recall: we discussed association analyses using correlation and linear regression

\[ \begin{align} &Y=\beta_0+\beta_1X_1+\ldots+\beta_pX_p+\epsilon \\ &\\ &\text{where E}(\epsilon)=0 \text{; Var}(\epsilon)=\sigma^2 \\ &\epsilon_i \perp \epsilon_j \text{ for }i\neq j; X_1,\ldots,X_p\perp \epsilon \end{align} \]

Generalized linear model

Regression for binary outcomes

Logistic regression

Model fitting

Maximum likelihood example

## 
## Call:
## glm(formula = factor(SSM_ASD_v24) ~ `V24 mullen,composite_standard_score`, 
##     family = binomial(), data = ibis_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1954  -0.5375  -0.3091  -0.1466   3.2387  
## 
## Coefficients:
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                            6.671167   0.827694    8.06 7.63e-16 ***
## `V24 mullen,composite_standard_score` -0.088884   0.009268   -9.59  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 514.73  on 572  degrees of freedom
## Residual deviance: 368.89  on 571  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 372.89
## 
## Number of Fisher Scoring iterations: 6

Logistic regression

Estimated model: \(\hat{\text{Pr}}[ASD=\text{YES}|MSEL]=\frac{e^{6.67-0.09MSEL}}{1+e^{6.67-0.09MSEL}}\)

Interpretation:

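  1. \(\hat{\beta_0}=6.67\): estimated log-odds of ASD when MSEL=0
  1. \(\hat{\beta_1}=-0.09\): estimated change in the log-odds of ASD per 1-point increase in MSEL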
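
As a worked reading of the slope, exponentiating the coefficient gives an odds ratio. The numbers below use the unrounded slope estimate from the output above; the 10-point contrast is an illustrative choice and does not come from the original slides.

\[ \begin{align} &\log\left(\frac{\hat{\text{Pr}}[ASD=\text{YES}|MSEL]}{1-\hat{\text{Pr}}[ASD=\text{YES}|MSEL]}\right)=6.67-0.09MSEL \\ &e^{\hat{\beta_1}}=e^{-0.088884}\approx 0.915 \text{ (odds ratio per 1-point increase in MSEL)} \\ &e^{10\hat{\beta_1}}=e^{-0.88884}\approx 0.41 \text{ (odds ratio per 10-point increase in MSEL)} \end{align} \]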

Logistic regression

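Intercept: refers to MSEL=0, far from the population mean of 100 for the standard score, so \(\hat{\beta_0}\) is hard to interpret directly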

Solution: center at means

## 
## Call:
## glm(formula = factor(SSM_ASD_v24) ~ mullen_center, family = binomial(), 
##     data = ibis_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1954  -0.5375  -0.3091  -0.1466   3.2387  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -2.336873   0.180835  -12.92   <2e-16 ***
## mullen_center -0.088884   0.009268   -9.59   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 514.73  on 572  degrees of freedom
## Residual deviance: 368.89  on 571  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 372.89
## 
## Number of Fisher Scoring iterations: 6

Interpretation:

  1. \(\hat{\beta_0} = -2.34\): estimated log-odds of ASD at the mean MSEL

\[ \hat{\text{Pr}}[ASD=\text{YES}|MSEL=\mu]=\frac{e^{-2.34}}{1+e^{-2.34}} \]

The slope \(\hat{\beta_1}\) is unchanged by centering
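
A short worked check of the centered fit, using the estimates from the two outputs above. It assumes `mullen_center` is MSEL minus its sample mean; the implied mean \(\hat{\mu}\) is back-calculated here from the two intercepts and is not reported in the original slides.

\[ \begin{align} &\hat{\beta_0}^{\text{centered}}=\hat{\beta_0}+\hat{\beta_1}\hat{\mu} \;\Rightarrow\; \hat{\mu}=\frac{-2.336873-6.671167}{-0.088884}\approx 101.3 \\ &\hat{\text{Pr}}[ASD=\text{YES}|MSEL=\mu]=\frac{e^{-2.34}}{1+e^{-2.34}}\approx 0.09 \end{align} \]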

Logistic regression

Model-based estimated probabilities (non-centered):

For a patient with MSEL=90 (the population mean for the standard score is 100):

\[ \hat{\text{Pr}}[ASD=\text{YES}|MSEL=90]=\frac{e^{6.67-0.09*90}}{1+e^{6.67-0.09*90}}=0.19 \]

Based on \(\hat{\text{Pr}}[ASD=\text{YES}|MSEL=90]\), a predicted response \(\hat{ASD}\) can be created by thresholding the estimated probability
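
The probability calculation and the thresholding step can be sketched in R. This is a minimal sketch under stated assumptions: it uses the rounded coefficients shown in the estimated model rather than the unrounded `glm` estimates, and the 0.5 cutoff and the helper names `pred_prob` and `pred_class` are hypothetical choices for illustration, not part of the original analysis.

```r
# Rounded coefficients as displayed in the estimated model (assumption:
# the unrounded glm estimates would give slightly different probabilities)
b0 <- 6.67
b1 <- -0.09

# Estimated probability of ASD for a given MSEL score (inverse logit)
pred_prob <- function(msel) plogis(b0 + b1 * msel)

# Hypothetical cutoff: predict "YES" when the estimated probability exceeds it
pred_class <- function(msel, cutoff = 0.5) {
  ifelse(pred_prob(msel) > cutoff, "YES", "NO")
}

p90 <- pred_prob(90)
round(p90, 2)     # 0.19
pred_class(90)    # "NO"
```

The cutoff trades off sensitivity against specificity, so 0.5 is only a default starting point; a lower cutoff would flag more patients as predicted ASD.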

Logistic regression: confounding

Example: Credit Card Default Rate

Generalized linear models

Structure of model:

  1. Choose conditional distribution \(f(y|x)\)

Generalized linear models

  1. Choose link function \(g(\mu_{y|x})=\beta_0+\beta_1X_1+\ldots+\beta_pX_p\)
  1. Construct likelihood and fit
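
The three steps can be sketched for logistic regression in R. This is an illustrative sketch on simulated data, not the IBIS data from the slides: the sample size, the predictor distribution, and the "true" coefficients are all assumptions made up for the example. The hand-written loop is a plain Fisher scoring update, included only to show what "construct likelihood and fit" amounts to; `glm` remains the tool to use in practice.

```r
set.seed(1)

# Simulated data (hypothetical values, chosen only for illustration)
n <- 500
x <- rnorm(n, mean = 100, sd = 15)
y <- rbinom(n, size = 1, prob = plogis(6.67 - 0.09 * x))

# Steps 1 and 2: binomial conditional distribution with a logit link.
# Step 3: glm() builds the likelihood and maximizes it.
fit <- glm(y ~ x, family = binomial(link = "logit"))

# Step 3 by hand: Fisher scoring on the Bernoulli log-likelihood
X <- cbind(1, x)                           # design matrix with intercept
beta <- c(0, 0)                            # starting values
for (iter in 1:25) {
  p     <- plogis(drop(X %*% beta))        # fitted probabilities
  score <- t(X) %*% (y - p)                # gradient of the log-likelihood
  info  <- t(X) %*% (X * (p * (1 - p)))    # Fisher information
  step  <- drop(solve(info, score))
  beta  <- beta + step
  if (max(abs(step)) < 1e-10) break        # stop once updates are negligible
}

coef(fit)   # estimates from glm()
beta        # estimates from the hand-written loop
```

Swapping the `family` and link (for example `poisson(link = "log")` for count outcomes) changes steps 1 and 2 while the fitting step keeps the same form.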

Generalized linear models