Presenting Regression Analyses in R: Part 2
Effect sizes and diagnostics

Kevin Donovan

April 16, 2021

Introduction

We have discussed how to do regression analyses
- Presenting analyses to communicate results is just as important
- Maximizes the impact of your work
Previously introduced ways to communicate results; this part continues that discussion
- Focus on metrics that quantify the results of an analysis

Correlation analyses

Recall our previous visualizations

Correlation analyses

Correlations are an example of an effect size metric
- Measures the strength of the relationship in standardized units
- Strength is same regardless of scale for $X$, $Y$

\[ \text{Cor}(X,Y)=\frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}} \]
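As a small sketch of the scale-invariance point, the code below uses simulated data (the vectors `x` and `y` are made up for illustration). It computes the correlation as the covariance divided by the product of the two standard deviations, then checks that changing the units of either variable leaves it unchanged.

```r
set.seed(1)
# Simulated example data (not from the study)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

# Correlation from its definition: covariance scaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

# Rescale and shift x, rescale y: the correlation does not change
r_rescaled <- cor(10 * x + 3, 1000 * y)

all.equal(r_manual, r_builtin)
all.equal(r_builtin, r_rescaled)
```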

Group differences

Can use the t-test to evaluate pairwise differences and the F-test to evaluate multi-group differences:
- $H_0: \mu_1=\mu_2$
- $H_1: \mu_1\neq\mu_2$

\[ T=\frac{\bar{X_1}-\bar{X_2}}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim \text{T}_{(n_1+n_2-2)} \]

- $H_0: \mu_1=\mu_2=\ldots=\mu_K$
- $H_1: \text{at least one } \mu_k \text{ differs from the others}$

\[ F=\frac{\text{between-group variance}}{\text{within-group variance}} \sim \text{F}_{(K-1, N-K)} \]
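A minimal sketch of both tests in base R, using simulated data (the data frame `dat` and its groups A, B, C are hypothetical). The pooled-variance t statistic is also computed by hand so it can be compared with the formula for $T$.

```r
set.seed(2)
# Hypothetical data: one outcome measured in three groups
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  y = c(rnorm(30, 100, 15), rnorm(30, 108, 15), rnorm(30, 104, 15))
)

# Pairwise comparison: pooled-variance t-test of A versus B
ab <- droplevels(subset(dat, group %in% c("A", "B")))
tt <- t.test(y ~ group, data = ab, var.equal = TRUE)

# The same statistic computed directly from the formula
x1 <- ab$y[ab$group == "A"]
x2 <- ab$y[ab$group == "B"]
n1 <- length(x1)
n2 <- length(x2)
s_p <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
t_manual <- (mean(x1) - mean(x2)) / (s_p * sqrt(1 / n1 + 1 / n2))

# Multi-group comparison: one-way ANOVA F-test with (K - 1, N - K) df
f_tab <- anova(lm(y ~ group, data = dat))

tt
t_manual
f_tab
```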

Group differences

$\bar{X_1}-\bar{X_2}$ depends on units of variables
- How big of a difference is "big"?
- Effect size: Cohen’s $d$

\[ d=\frac{\bar{X_1}-\bar{X_2}}{s_p} \]

I.e., the difference standardized by the spread (pooled standard deviation $s_p$)
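A short sketch of the calculation, assuming two simulated samples (`x1`, `x2`) and a helper named `cohens_d` that is introduced here only for illustration. Rescaling the data changes the raw mean difference but not $d$.

```r
set.seed(3)
# Simulated samples (hypothetical)
x1 <- rnorm(40, mean = 105, sd = 15)
x2 <- rnorm(60, mean = 100, sd = 15)

# Cohen's d: mean difference divided by the pooled standard deviation
cohens_d <- function(a, b) {
  n_a <- length(a)
  n_b <- length(b)
  s_p <- sqrt(((n_a - 1) * var(a) + (n_b - 1) * var(b)) / (n_a + n_b - 2))
  (mean(a) - mean(b)) / s_p
}

d_raw <- cohens_d(x1, x2)
# Same data in different units: the raw difference shrinks, d stays the same
d_rescaled <- cohens_d(x1 / 100, x2 / 100)

d_raw
all.equal(d_raw, d_rescaled)
```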

Group differences

How about for F-Test?
- Cohen’s $f^2$:

\[ \begin{align} &f^2=\frac{\eta^2}{1-\eta^2} \\ &\text{ where } \eta^2=\frac{SS_{group}}{SS_{total}}=\frac{SS_{group}}{SS_{group}+SS_{error}} \end{align} \]

I.e., how much of the variability in the outcome is related to group?
Easy to calculate from the output of the lm function in R
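One way this could look with `lm`, sketched on simulated data (the outcome `y` and the group means are made up). The sums of squares come from the ANOVA table of the fitted model.

```r
set.seed(4)
# Simulated three-group data (hypothetical values)
dat <- data.frame(
  group = factor(rep(c("LR", "HR-ASD", "HR-Neg"), each = 40)),
  y = c(rnorm(40, 105, 12), rnorm(40, 96, 12), rnorm(40, 102, 12))
)

fit <- lm(y ~ group, data = dat)

# Sums of squares from the ANOVA table: first row is group, second is error
ss <- anova(fit)[["Sum Sq"]]
eta2 <- ss[1] / sum(ss)
f2 <- eta2 / (1 - eta2)

# In a one-way model, eta^2 equals the model R^2
all.equal(eta2, summary(fit)$r.squared)
c(eta2 = eta2, f2 = f2)
```

Note that $f^2$ simplifies to $SS_{group}/SS_{error}$, which is a quick check on the result.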

Summary statistics

Can easily create formatted summary statistics tables in code
Add effect sizes using add_stat (gtsummary package)

| Characteristic | N | HR-ASD, N = 78¹ | HR-Neg, N = 273¹ | p-value² | ES (95% CI)³ |
|---|---|---|---|---|---|
| EACSF_V24 | 219 | 72,487 (16,123) | 70,358 (15,022) | 0.4 | 0.14 (-0.18, 0.46) |
| Missing | | 30 | 102 | | |
| ICV_V24 | 262 | 1,270,041 (123,302) | 1,223,180 (108,795) | 0.006 | 0.42 (0.12, 0.72) |
| Missing | | 23 | 66 | | |
| TBV_V24 | 262 | 1,104,088 (107,222) | 1,064,502 (95,094) | 0.008 | 0.4 (0.1, 0.7) |
| Missing | | 23 | 66 | | |
| TCV_V24 | 262 | 973,873 (99,189) | 938,532 (86,327) | 0.009 | 0.4 (0.1, 0.7) |
| Missing | | 23 | 66 | | |
| LeftAmygdala_V24 | 262 | 1,071 (132) | 1,029 (103) | 0.013 | 0.38 (0.08, 0.68) |
| Missing | | 23 | 66 | | |
| RightAmygdala_V24 | 262 | 1,072 (115) | 1,036 (97) | 0.020 | 0.35 (0.06, 0.65) |
| Missing | | 23 | 66 | | |

¹ Mean (SD)
² One-way ANOVA
³ Cohen's D (95% CI)
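A hedged sketch of how an effect size column could be added with `add_stat`, assuming the gtsummary package is installed. The data are simulated, and the helper `cohens_d_stat` is a made-up name; it returns only the point estimate, not the confidence interval shown in the table. The gtsummary calls are skipped if the package is not available.

```r
set.seed(5)
# Simulated stand-in for the real data; values are hypothetical
dat <- data.frame(
  Group = rep(c("HR-ASD", "HR-Neg"), times = c(30, 90)),
  TCV = c(rnorm(30, 970000, 95000), rnorm(90, 940000, 90000)),
  ICV = c(rnorm(30, 1270000, 120000), rnorm(90, 1220000, 110000))
)

# Custom statistic for add_stat(): receives the data, the variable name and
# the name of the grouping variable, and returns one value per variable
cohens_d_stat <- function(data, variable, by, ...) {
  groups <- split(data[[variable]], data[[by]])
  a <- na.omit(groups[[1]])
  b <- na.omit(groups[[2]])
  s_p <- sqrt(((length(a) - 1) * var(a) + (length(b) - 1) * var(b)) /
                (length(a) + length(b) - 2))
  sprintf("%.2f", (mean(a) - mean(b)) / s_p)
}

if (requireNamespace("gtsummary", quietly = TRUE)) {
  library(gtsummary)
  tbl <- tbl_summary(dat, by = Group,
                     statistic = all_continuous() ~ "{mean} ({sd})")
  tbl <- add_n(tbl)
  tbl <- add_stat(tbl, fns = all_continuous() ~ cohens_d_stat)
  tbl
}
```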

Summary statistics in regression

ANOVA
- Recall: ANOVA = F-test for multi-group differences
- Model:

\[ \begin{align} &MSEL = \beta_0+\beta_1*I(Group=\text{HR-ASD})+\beta_2*I(Group=\text{HR-Neg})+\epsilon \\ & I(Group=\text{x}) \text{ is dummy variable for group x} \\ & \rightarrow \text{LR is reference group} \end{align} \]

Now can express group difference test in terms of $\beta$

\[ \begin{align} &H_0: \mu_{LR}=\mu_{HR-ASD}=\mu_{HR-Neg} \leftrightarrow\\ &H_0: \beta_0=\beta_0+\beta_1=\beta_0+\beta_2 \leftrightarrow\\ &H_0: \beta_1=\beta_2=0 \end{align} \]

$\rightarrow$ can use Cohen’s $f^2$ for effect size
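The dummy-variable formulation can be sketched with `lm` on simulated data (the MSEL values below are made up). Setting LR as the reference level makes $\beta_0$ the LR mean and $\beta_1$, $\beta_2$ the differences from it.

```r
set.seed(6)
# Simulated data with the three groups from the model (values are hypothetical)
dat <- data.frame(
  Group = factor(rep(c("LR", "HR-ASD", "HR-Neg"), each = 40)),
  MSEL = c(rnorm(40, 105, 12), rnorm(40, 95, 12), rnorm(40, 102, 12))
)

# Make LR the reference group
dat$Group <- relevel(dat$Group, ref = "LR")
fit <- lm(MSEL ~ Group, data = dat)

# Dummy variables that R builds for the model
head(model.matrix(fit))

# beta_0 is the LR mean; beta_1 and beta_2 are differences from the LR mean
coef(fit)
group_means <- tapply(dat$MSEL, dat$Group, mean)
group_means

# Overall F-test of H0: beta_1 = beta_2 = 0
anova(fit)
```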

Summary statistics in regression

General regression
- Consider model

\[ MSEL = \beta_0+\beta_1*I(Group=\text{HR-ASD})+\beta_2*I(Group=\text{HR-Neg})+\beta_3*TCV +\epsilon \]

How to define effect sizes for $\beta$ estimates?
Metrics:
- Semi-partial correlation of $y$ and $x$ given $z$, using ppcor in R
- Adjusted Cohen’s $d$
- $R^2$ = $\frac{SS_{total}-SS_{error}}{SS_{total}}$
- Recall sum of squares (SS) is

\[ \begin{align} &SS_{total}=\sum_{i=1}^{n}(y_i-\bar{y})^2 \\ &SS_{error}=\sum_{i=1}^{n}(y_i-\hat{y}_i)^2=\sum_{i=1}^{n}\hat{\epsilon}_i^2 \end{align} \]
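A base R sketch of two of these metrics on simulated data (all values are made up, and TCV is on an arbitrary scale). It assumes the convention that the semi-partial correlation removes the other covariates from the predictor only; ppcor is not called here, the correlation is built by residualizing TCV on Group.

```r
set.seed(7)
n <- 150
# Simulated data (hypothetical values)
dat <- data.frame(
  Group = factor(sample(c("LR", "HR-ASD", "HR-Neg"), n, replace = TRUE),
                 levels = c("LR", "HR-ASD", "HR-Neg")),
  TCV = rnorm(n, 950, 90)
)
dat$MSEL <- 100 - 8 * (dat$Group == "HR-ASD") + 0.03 * dat$TCV + rnorm(n, sd = 12)

fit <- lm(MSEL ~ Group + TCV, data = dat)

# R^2 from the sums of squares
ss_total <- sum((dat$MSEL - mean(dat$MSEL))^2)
ss_error <- sum(residuals(fit)^2)
r2 <- (ss_total - ss_error) / ss_total

# Semi-partial correlation of MSEL with TCV: remove Group from TCV only
tcv_resid <- residuals(lm(TCV ~ Group, data = dat))
sp_cor <- cor(dat$MSEL, tcv_resid)

# Its square is the gain in R^2 from adding TCV to a Group-only model
r2_reduced <- summary(lm(MSEL ~ Group, data = dat))$r.squared
all.equal(sp_cor^2, r2 - r2_reduced)
```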

Summary statistics in regression

Mixed models
- Consider model

\[ MSEL = \beta_0+\beta_1*I(Group=\text{HR-ASD})+\beta_2*I(Group=\text{HR-Neg})+\beta_3*TCV+\beta_4*Age+\delta_{0,i}+\delta_{1,i}*Age +\epsilon \]

Looking at change in MSEL over time by group and TCV
Random effects: intercept ($\delta_0$), slope for age ($\delta_1$), residual ($\epsilon$)
How to compute effect sizes?
- Group differences: effect size defined for the mixed model
- Semi-partial marginal $R^2$ from r2glmm in R
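A hedged sketch of fitting this kind of model, using the nlme package on simulated longitudinal data (visit ages, effect sizes and the standardized TCV are all assumptions made for illustration). The `r2beta` call from r2glmm is run only if that package is installed, and the choice of `method = "nsj"` is an assumption about which variant is wanted.

```r
library(nlme)

set.seed(8)
n_id <- 80
ages <- c(0.5, 1, 2, 3)  # hypothetical visit ages in years

# One row per subject: group, covariate and subject-specific random effects
subj <- data.frame(
  ID = factor(seq_len(n_id)),
  Group = factor(sample(c("LR", "HR-ASD", "HR-Neg"), n_id, replace = TRUE),
                 levels = c("LR", "HR-ASD", "HR-Neg")),
  TCV = rnorm(n_id),         # standardized TCV (simulated)
  d0 = rnorm(n_id, 0, 6),    # random intercept
  d1 = rnorm(n_id, 0, 2)     # random slope for Age
)

# Expand to one row per subject and visit
dat <- subj[rep(seq_len(n_id), each = length(ages)), ]
dat$Age <- rep(ages, times = n_id)
dat$MSEL <- 100 - 8 * (dat$Group == "HR-ASD") + 3 * dat$TCV + 2 * dat$Age +
  dat$d0 + dat$d1 * dat$Age + rnorm(nrow(dat), 0, 4)

# Random intercept and random slope for Age within subject
fit <- lme(MSEL ~ Group + TCV + Age, random = ~ Age | ID, data = dat)
summary(fit)$tTable

# Semi-partial marginal R^2 per fixed effect, if r2glmm is available
if (requireNamespace("r2glmm", quietly = TRUE)) {
  print(r2glmm::r2beta(fit, partial = TRUE, method = "nsj"))
}
```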

Regression diagnostics

When modeling, need to assess model fit
- Residual distribution and variance
- Outliers and their effects on results
Can easily create visuals using ggfortify and olsrr
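A minimal sketch of these checks on simulated data with one planted outlier (the data and the cutoff of 3 for standardized residuals are assumptions for illustration). The numeric diagnostics use base R; the ggfortify and olsrr plots are drawn only if those packages are installed.

```r
set.seed(9)
# Simulated data with one planted outlier in the last row
dat <- data.frame(x = rnorm(100))
dat$y <- 2 + 1.5 * dat$x + rnorm(100)
dat$y[100] <- dat$y[100] + 8

fit <- lm(y ~ x, data = dat)

# Residual distribution and influence
std_res <- rstandard(fit)      # standardized residuals
cooks <- cooks.distance(fit)   # influence of each observation
flagged <- which(abs(std_res) > 3)

# Effect of the flagged points on the results: refit without them
keep <- setdiff(seq_len(nrow(dat)), flagged)
fit_drop <- lm(y ~ x, data = dat[keep, ])
rbind(all_data = coef(fit), outliers_removed = coef(fit_drop))

# Diagnostic plots, if the packages are available
if (requireNamespace("ggfortify", quietly = TRUE)) {
  library(ggfortify)
  print(autoplot(fit))
}
if (requireNamespace("olsrr", quietly = TRUE)) {
  olsrr::ols_plot_cooksd_bar(fit)
}
```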

Regression diagnostics

Customized visualizations

Can use the flextable package with results stored as a data frame
- Create any table you want
Can use ggplot with ggpubr to combine figures
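A sketch of both steps on simulated data (the variables `x`, `g`, `y` and the column layout of `results` are made up for illustration). The results table is built as a plain data frame first; the flextable and ggpubr calls run only if those packages are installed.

```r
set.seed(10)
# Simulated data (hypothetical)
dat <- data.frame(x = rnorm(80), g = factor(rep(c("A", "B"), 40)))
dat$y <- 1 + 0.8 * dat$x + 0.5 * (dat$g == "B") + rnorm(80)
fit <- lm(y ~ x + g, data = dat)

# Store the regression results as a plain data frame
est <- summary(fit)$coefficients
ci <- confint(fit)
results <- data.frame(
  Term = rownames(est),
  Estimate = round(est[, "Estimate"], 2),
  `95% CI` = sprintf("(%.2f, %.2f)", ci[, 1], ci[, 2]),
  `p-value` = format.pval(est[, "Pr(>|t|)"], digits = 2),
  check.names = FALSE,
  row.names = NULL
)
results

# Turn the data frame into a formatted table
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- flextable::autofit(flextable::flextable(results))
}

# Combine two ggplot figures into one labelled panel
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggpubr", quietly = TRUE)) {
  library(ggplot2)
  p1 <- ggplot(dat, aes(x, y, colour = g)) + geom_point()
  p2 <- ggplot(dat, aes(g, y)) + geom_boxplot()
  combined <- ggpubr::ggarrange(p1, p2, ncol = 2, labels = c("A", "B"))
}
```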
