Presenting Regression Analyses in R: Part 2
Effect sizes and diagnostics
Kevin Donovan
April 16, 2021
Introduction
- We have discussed how to run regression analyses
- Presenting analyses so they communicate results is just as important
- Good presentation maximizes the impact of your work
- Part 1 introduced ways to communicate results; this part continues that discussion
- Focus: metrics that quantify results from analyses (effect sizes) and model diagnostics
Correlation analyses
- Recall our previous visualizations
Correlation analyses
- Correlations are an example of an effect size metric
- Standardized, unit-free measure of the strength of a linear relationship
- Value is unchanged by rescaling \(X\) or \(Y\) (e.g., changing units)
\[
\text{Cor}(X,Y)=\frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}
\]
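As a quick check of the formula, the sketch below uses simulated data (hypothetical, not the study data) to compute the correlation from the covariance and standard deviations, compare it with base R's cor(), and show that rescaling the variables leaves it unchanged.

```r
# Simulated data (hypothetical; not the study data)
set.seed(1)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

# Correlation = covariance divided by the product of the standard deviations
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

# Changing units (positive rescaling plus a shift) leaves the correlation unchanged
r_rescaled <- cor(1000 * x + 5, y / 10)

c(manual = r_manual, builtin = r_builtin, rescaled = r_rescaled)
```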
Group differences
- Use a t-test for two-group differences and an F-test for multi-group differences:
- \(H_0: \mu_1=\mu_2\)
- \(H_1: \mu_1\neq\mu_2\)
\[
T=\frac{\bar{X_1}-\bar{X_2}}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim \text{T}_{(n_1+n_2-2)}
\]
- \(H_0: \mu_1=\mu_2=\ldots=\mu_K\)
- \(H_1: \mu_j\neq\mu_k \text{ for at least one pair } (j,k)\)
\[
F=\frac{\text{between-group variance}}{\text{within-group variance}} \sim \text{F}_{(K-1, N-K)}
\]
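A minimal sketch on simulated two-group data (hypothetical) that computes the pooled-variance \(T\) statistic by hand, checks it against t.test(var.equal = TRUE), and shows that the one-way ANOVA \(F\) from lm() equals \(T^2\) when \(K=2\).

```r
set.seed(2)
# Hypothetical two-group data
g1 <- rnorm(30, mean = 10, sd = 2)
g2 <- rnorm(40, mean = 11, sd = 2)
n1 <- length(g1)
n2 <- length(g2)

# Pooled SD s_p, then the T statistic from the formula above
s_p <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))
t_manual <- (mean(g1) - mean(g2)) / (s_p * sqrt(1 / n1 + 1 / n2))

# Student (equal-variance) t-test: same statistic, df = n1 + n2 - 2
tt <- t.test(g1, g2, var.equal = TRUE)

# One-way ANOVA F-test on the same data; with K = 2 groups, F = T^2
dat <- data.frame(
  y = c(g1, g2),
  group = factor(rep(c("g1", "g2"), times = c(n1, n2)))
)
F_stat <- anova(lm(y ~ group, data = dat))[["F value"]][1]

c(T_manual = t_manual, T_ttest = unname(tt$statistic), F = F_stat, T_squared = t_manual^2)
```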
Group differences
- \(\bar{X_1}-\bar{X_2}\) depends on units of variables
- How big does a difference need to be to count as "big"?
- Effect size: Cohen's \(d\)
\[
d=\frac{\bar{X_1}-\bar{X_2}}{s_p}
\]
- i.e., the mean difference standardized by the pooled SD, so it no longer depends on units
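The definition translates directly into a few lines of R. In this sketch cohens_d is a hypothetical helper written for illustration, and the data are simulated; it also shows that changing the units of the outcome changes the raw difference but not \(d\).

```r
# Hypothetical helper: Cohen's d = (mean1 - mean2) / pooled SD
cohens_d <- function(x1, x2) {
  n1 <- length(x1)
  n2 <- length(x2)
  s_p <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  (mean(x1) - mean(x2)) / s_p
}

set.seed(3)
a <- rnorm(50, mean = 100, sd = 15)   # simulated group 1
b <- rnorm(60, mean = 95, sd = 15)    # simulated group 2

d_raw <- cohens_d(a, b)
# Rescaling both groups (e.g., mm^3 to cm^3) changes the raw difference, not d
d_rescaled <- cohens_d(a / 1000, b / 1000)

c(raw_difference = mean(a) - mean(b), d = d_raw, d_rescaled = d_rescaled)
```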
Group differences
- Effect size for multi-group differences: Cohen's \(f^2\)
\[
\begin{align}
&f^2=\frac{\eta^2}{1-\eta^2} \\
&\text{ where } \eta^2=\frac{SS_{group}}{SS_{total}}=\frac{SS_{group}}{SS_{group}+SS_{error}}
\end{align}
\]
- i.e., what proportion of the variability in the outcome is explained by group membership?
- Easy to calculate from lm output in R
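One way to get \(\eta^2\) and \(f^2\) from lm output is to read the sums of squares off anova(). The three-group data here are simulated (hypothetical); for a one-factor model \(\eta^2\) equals the model \(R^2\), and \(f^2\) simplifies to \(SS_{group}/SS_{error}\).

```r
set.seed(4)
# Hypothetical three-group data
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  y = rnorm(60) + rep(c(0, 0.5, 1), each = 20)
)

fit <- lm(y ~ group, data = dat)
tab <- anova(fit)                          # rows: group, Residuals
ss_group <- tab[["Sum Sq"]][1]
ss_error <- tab[["Sum Sq"]][2]

eta2 <- ss_group / (ss_group + ss_error)   # eta^2 = SS_group / SS_total
f2   <- eta2 / (1 - eta2)                  # Cohen's f^2

# With a single factor, eta^2 is the model R^2 from summary()
r2 <- summary(fit)$r.squared

c(eta2 = eta2, f2 = f2, r2 = r2)
```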
Summary statistics
- Can create nicely formatted summary statistics tables in code
- Add effect sizes using add_stat
| Characteristic | N | HR-ASD, N = 78¹ | HR-Neg, N = 273¹ | p-value² | ES (95% CI)³ |
|---|---|---|---|---|---|
| EACSF_V24 | 219 | 72,487 (16,123) | 70,358 (15,022) | 0.4 | 0.14 (-0.18, 0.46) |
| Missing | | 30 | 102 | | |
| ICV_V24 | 262 | 1,270,041 (123,302) | 1,223,180 (108,795) | 0.006 | 0.42 (0.12, 0.72) |
| Missing | | 23 | 66 | | |
| TBV_V24 | 262 | 1,104,088 (107,222) | 1,064,502 (95,094) | 0.008 | 0.4 (0.1, 0.7) |
| Missing | | 23 | 66 | | |
| TCV_V24 | 262 | 973,873 (99,189) | 938,532 (86,327) | 0.009 | 0.4 (0.1, 0.7) |
| Missing | | 23 | 66 | | |
| LeftAmygdala_V24 | 262 | 1,071 (132) | 1,029 (103) | 0.013 | 0.38 (0.08, 0.68) |
| Missing | | 23 | 66 | | |
| RightAmygdala_V24 | 262 | 1,072 (115) | 1,036 (97) | 0.020 | 0.35 (0.06, 0.65) |
| Missing | | 23 | 66 | | |

¹ Mean (SD)
² One-way ANOVA
³ Cohen's D (95% CI)
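The interval in the ES column needs a standard error for \(d\). The sketch below uses one common large-sample approximation, \(SE(d)\approx\sqrt{\frac{n_1+n_2}{n_1n_2}+\frac{d^2}{2(n_1+n_2)}}\); d_ci is a hypothetical helper, and we do not know which method produced the table. Plugging in the ICV_V24 row (\(d=0.42\), non-missing \(n_1=78-23=55\), \(n_2=273-66=207\)) gives roughly (0.12, 0.72), in line with the table.

```r
# Hypothetical helper: approximate CI for Cohen's d via a normal approximation
# to its standard error (one common choice; other methods exist)
d_ci <- function(d, n1, n2, level = 0.95) {
  se <- sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))
  z  <- qnorm(1 - (1 - level) / 2)
  c(estimate = d, lower = d - z * se, upper = d + z * se)
}

# ICV_V24 row: d = 0.42 with 55 (HR-ASD) and 207 (HR-Neg) non-missing values
round(d_ci(0.42, 55, 207), 2)
```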
Summary statistics in regression
ANOVA
- Recall: ANOVA = F-test for multi-group differences
- Model:
\[
\begin{align}
&MSEL = \beta_0+\beta_1*I(Group=\text{HR-ASD})+\beta_2*I(Group=\text{HR-Neg})+\epsilon \\
& I(Group=\text{x}) \text{ is dummy variable for group x} \\
& \rightarrow \text{LR is reference group}
\end{align}
\]
- Now the group-difference test can be expressed in terms of the \(\beta\) coefficients
\[
\begin{align}
&H_0: \mu_{LR}=\mu_{HR-ASD}=\mu_{HR-Neg} \leftrightarrow\\
&H_0: \beta_0=\beta_0+\beta_1=\beta_0+\beta_2 \leftrightarrow\\
&H_0: \beta_1=\beta_2=0
\end{align}
\]
- \(\rightarrow\) can use Cohen’s \(f^2\) for effect size
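The equivalence above can be checked numerically. In this sketch the MSEL-like scores are simulated (hypothetical); LR is set as the first factor level so R's default dummy coding makes it the reference group, and the nested-model F-test of \(\beta_1=\beta_2=0\) matches the one-way ANOVA F-test.

```r
set.seed(6)
# Hypothetical data: three groups with LR as the reference level
group <- factor(rep(c("LR", "HR-ASD", "HR-Neg"), times = c(40, 30, 50)),
                levels = c("LR", "HR-ASD", "HR-Neg"))
shift <- c("LR" = 0, "HR-ASD" = -8, "HR-Neg" = -2)
msel  <- 100 + shift[as.character(group)] + rnorm(length(group), sd = 15)
dat   <- data.frame(msel = unname(msel), group = group)

# Dummy coding: beta_1, beta_2 are differences from the LR mean
fit_full <- lm(msel ~ group, data = dat)
fit_null <- lm(msel ~ 1, data = dat)    # model under H0: beta_1 = beta_2 = 0

nested <- anova(fit_null, fit_full)     # F-test of H0 via nested models
oneway <- anova(fit_full)               # one-way ANOVA table

grp_means <- tapply(dat$msel, dat$group, mean)
coef(fit_full)
```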
Summary statistics in regression
\[
MSEL = \beta_0+\beta_1*I(Group=\text{HR-ASD})+\beta_2*I(Group=\text{HR-Neg})+\beta_3*TCV +\epsilon
\]
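- How can effect sizes be defined for the \(\beta\) estimates?
- Metrics:
- Semi-partial correlation \(\rho_{y,x|z}\) (correlation of \(y\) with the part of \(x\) not explained by \(z\)) using ppcor in R
- Adjusted Cohen's d \(= \frac{\text{adjusted group difference}}{\text{residual SD}} = \frac{\hat{\beta}_1}{\hat{\sigma}}\)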
- \(R^2\) = \(\frac{SS_{total}-SS_{error}}{SS_{total}}\)
- Recall the sums of squares (SS) are
\[
\begin{align}
&SS_{total}=\sum_{i=1}^{n}(y_i-\bar{y})^2 \\
&SS_{error}=\sum_{i=1}^{n}\hat{\epsilon}_i^2=\sum_{i=1}^{n}(y_i-\hat{y}_i)^2
\end{align}
\]
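Both metrics can be reproduced from lm() fits. This sketch uses simulated data (hypothetical; \(z\) stands in for a covariate such as TCV) and computes the semi-partial correlation by residualizing \(x\) on \(z\) rather than calling ppcor, so it does not depend on that package's argument conventions. The squared semi-partial correlation equals the gain in \(R^2\) when \(x\) is added to a model that already contains \(z\).

```r
set.seed(7)
n <- 200
z <- rnorm(n)                   # covariate (hypothetical, e.g. TCV)
x <- 0.6 * z + rnorm(n)         # predictor of interest, correlated with z
y <- 1 + 0.5 * x + 0.3 * z + rnorm(n)

# R^2 = (SS_total - SS_error) / SS_total, using the fitted residuals
fit_full <- lm(y ~ x + z)
ss_total <- sum((y - mean(y))^2)
ss_error <- sum(residuals(fit_full)^2)
r2_full  <- (ss_total - ss_error) / ss_total

# Semi-partial correlation: correlate y with the part of x not explained by z
x_resid <- residuals(lm(x ~ z))
sr <- cor(y, x_resid)

# sr^2 equals the increase in R^2 from adding x to a model with z only
r2_reduced <- summary(lm(y ~ z))$r.squared

c(r2_full = r2_full, sr = sr, sr_squared = sr^2, delta_r2 = r2_full - r2_reduced)
```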
Summary statistics in regression
\[
MSEL = \beta_0+\beta_1*I(Group=\text{HR-ASD})+\beta_2*I(Group=\text{HR-Neg})+\beta_3*TCV+\beta_4*Age+\delta_{0,i}+\delta_{1,i}*Age +\epsilon
\]
- Models MSEL over age (repeated measures), with group and TCV as covariates
- Random terms: subject-specific intercept (\(\delta_{0,i}\)) and age slope (\(\delta_{1,i}\)), plus residual error (\(\epsilon\))
- How to compute effect sizes:
- Group differences: a mixed-model version of Cohen's d, \(d_{mixed}\)
- Semi-partial marginal \(R^2\) from r2glmm in R
Regression diagnostics
- When modeling, we need to assess model fit:
- Residual distribution and variance
- Outliers and their effects on results
- Can easily create diagnostic visuals using ggfortify and olsrr
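Packages such as ggfortify and olsrr turn diagnostics into figures; the numbers behind them can also be pulled directly from an lm() fit in base R. This sketch uses simulated data with one planted outlier (hypothetical) and flags points with common rules of thumb (|standardized residual| > 3, Cook's distance > 4/n), which are conventions rather than fixed thresholds.

```r
set.seed(8)
n <- 100
x <- rnorm(n)
y <- 2 + 1.5 * x + rnorm(n)
y[1] <- y[1] + 8                    # plant one outlier (hypothetical)

fit <- lm(y ~ x)

# Residual distribution and variance: standardized residuals
std_res <- rstandard(fit)
# Influence: leverage (hat values) and Cook's distance
lev  <- hatvalues(fit)
cook <- cooks.distance(fit)

# Flag observations using rule-of-thumb cutoffs
flagged <- which(abs(std_res) > 3 | cook > 4 / n)
flagged

# Base R's four diagnostic panels (residuals vs fitted, Q-Q, scale-location, leverage):
# op <- par(mfrow = c(2, 2)); plot(fit); par(op)
```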
Customized visualizations
- Can use the flextable package with results stored as a data frame
- Create any table you want
- Can use ggplot with ggpubr to combine figures
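Getting results into a data frame is the step that makes a custom table possible. The sketch below fits a model to simulated data (hypothetical), collects estimates, 95% CIs, and p-values into a data frame, and renders it with flextable only if that package is installed; the column names are illustrative choices, not a required layout.

```r
set.seed(9)
# Hypothetical data and model
dat <- data.frame(x = rnorm(50), z = rnorm(50))
dat$y <- 1 + 0.4 * dat$x - 0.2 * dat$z + rnorm(50)
fit <- lm(y ~ x + z, data = dat)

est <- coef(summary(fit))   # Estimate, Std. Error, t value, Pr(>|t|)
ci  <- confint(fit)         # 95% confidence intervals

# One row per model term, formatted for presentation
res_df <- data.frame(
  Term     = rownames(est),
  Estimate = round(est[, "Estimate"], 2),
  CI_95    = sprintf("(%.2f, %.2f)", ci[, 1], ci[, 2]),
  p_value  = signif(est[, "Pr(>|t|)"], 2),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Render as a formatted table only when flextable is available
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- flextable::flextable(res_df)
  ft <- flextable::autofit(ft)
}

res_df
```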