Association Analyses with IBIS Data: Correlation and Linear Regression Analyses

Kevin Donovan

January 27, 2021

Introduction

Previous Session: introduced concepts in statistical inference

Now: focus on analytic methods and their implementation in R

Multivariable Analysis

Previously: Discussed simple univariable analyses (comparing means)

Suppose one is interested in how multiple variables are related to one another (i.e., their joint distribution)

Simplest case: Two variables \(X\) and \(Y\)

Covariance and Correlation

Covariance: \(\text{Cov}(X, Y)=\text{E}[(X-\text{E}[X])(Y-\text{E}[Y])]\)

Looking inside the outer mean: \((X-\text{E}[X])(Y-\text{E}[Y])\) is positive when \(X\) and \(Y\) deviate from their means in the same direction, and negative when they deviate in opposite directions

Limitation: the magnitude is expressed in the variables' units, so it is not comparable across variable pairs

Pearson Correlation: \(\text{Corr}(X, Y)=\frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)}\sqrt{\text{Var}(Y)}}\)

Standardizes relationship size using variances
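A minimal R sketch computing covariance and Pearson correlation on simulated data (illustrative only, not IBIS data):

```r
# Simulate two related variables
set.seed(123)
x <- rnorm(100, mean = 50, sd = 10)
y <- 0.5 * x + rnorm(100, mean = 0, sd = 5)

cov(x, y)   # covariance: magnitude depends on the units of x and y
cor(x, y)   # Pearson correlation: unit-free, between -1 and 1
```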

Covariance: Visual Explanation

[Figure omitted: visual illustration of covariance]

Correlation in IBIS

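A hedged sketch of how a correlation between two IBIS measures might be computed in R; the data frame and column names below are placeholders, not actual IBIS variable names:

```r
# Stand-in for IBIS data: simulated data frame with placeholder column names
set.seed(123)
ibis_like <- data.frame(measure_a = rnorm(50),
                        measure_b = rnorm(50))
ibis_like$measure_b <- ibis_like$measure_b + 0.4 * ibis_like$measure_a

# Pearson correlation, dropping rows with missing values in either column
cor(ibis_like$measure_a, ibis_like$measure_b,
    use = "complete.obs", method = "pearson")
```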

Correlation: Assessing Significance
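A minimal sketch of testing \(H_0: \text{Corr}(X, Y) = 0\) in R with `cor.test()`, using simulated data:

```r
# Test the null hypothesis of zero correlation (illustrative data)
set.seed(123)
x <- rnorm(100)
y <- 0.3 * x + rnorm(100)

cor.test(x, y, method = "pearson")
# Output includes the correlation estimate, t statistic, p-value,
# and a 95% confidence interval for the correlation
```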

Correlation Estimates: Example

Limitations of Correlation

Linear Regression: Setup

Consider variables \(X\) and \(Y\) again

Consider a directional relationship:

\(X\) is denoted the independent variable; \(Y\) is denoted the dependent variable

\(X\) and \(Y\) related through mean: \(\text{E}(Y|X)=\beta_0+\beta_1X\)

Linear Regression: Setup

Full Model: \[ \begin{align} &Y=\beta_0+\beta_1X+\epsilon \\ &\\ &\text{where E}(\epsilon)=0 \text{; Var}(\epsilon)=\sigma^2 \\ &\epsilon_i \perp \epsilon_j \text{ for }i\neq j; X\perp \epsilon \end{align} \]
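A minimal sketch of fitting this model in R with `lm()`, using simulated data with true \(\beta_0 = 1\) and \(\beta_1 = 2\) (values chosen for illustration):

```r
# Fit Y = beta0 + beta1*X + error by least squares on simulated data
set.seed(123)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = 0.5)

fit <- lm(y ~ x)
coef(fit)  # estimated intercept (beta0_hat) and slope (beta1_hat)
```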

Linear Regression: Inference

Estimation:

Find the “line of best fit” in the data: choose \(\hat{\beta}_0, \hat{\beta}_1\) to minimize \(\sum_i (Y_i-\beta_0-\beta_1X_i)^2\) (least squares)

The slope estimate is a scaled correlation: \(\hat{\beta}_1=\widehat{\text{Corr}}(X,Y)\,\frac{\hat{\sigma}_Y}{\hat{\sigma}_X}\)
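A quick numerical check of this relationship in R, on simulated data:

```r
# Verify that the least-squares slope equals cor(x, y) * sd(y) / sd(x)
set.seed(123)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

coef(lm(y ~ x))["x"]       # slope estimate from lm()
cor(x, y) * sd(y) / sd(x)  # scaled correlation; same value
```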

Linear Regression: Inference

Confidence Intervals and Testing:

  1. If \(\epsilon_i \perp \epsilon_j\) for \(i \neq j\) and \(\epsilon \sim\text{Normal}(0,\sigma^2)\): exact \(t\)-based tests and confidence intervals for \(\beta_0, \beta_1\)
  2. If \(\epsilon_i \perp \epsilon_j\) for \(i \neq j\) and \(\text{Var}(\epsilon_i)=\sigma^2\) for all \(i\): approximate (large-sample) tests and confidence intervals
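A minimal sketch of obtaining these tests and confidence intervals in R from a fitted `lm()` object, using simulated data:

```r
# Wald t-tests and confidence intervals for the regression coefficients
set.seed(123)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

summary(fit)$coefficients   # estimates, SEs, t statistics, p-values (H0: beta_j = 0)
confint(fit, level = 0.95)  # 95% confidence intervals for beta_0 and beta_1
```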

Linear Regression: Covariates

All of the above applies to the general regression equation:

\(Y=\beta_0+\beta_1X_1+\ldots+\beta_pX_p+\epsilon\)

where \(\text{E}(Y|X_1, \ldots, X_p)=\beta_0+\beta_1X_1+\ldots+\beta_pX_p\)

Conditioning on \(X_1, \ldots, X_p\) = “controlling for \(X_1, \ldots, X_p\)”
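A sketch of fitting a regression with multiple covariates in R; the data frame `dat` and the variable names `outcome`, `group`, `age`, and `sex` are hypothetical placeholders, not actual IBIS variables:

```r
# Hypothetical example: simulated data frame standing in for real study data
set.seed(123)
dat <- data.frame(age   = rnorm(100, mean = 24, sd = 6),
                  sex   = factor(sample(c("M", "F"), 100, replace = TRUE)),
                  group = factor(sample(c("HR", "LR"), 100, replace = TRUE)))
dat$outcome <- 1 + 0.5 * (dat$group == "HR") + 0.1 * dat$age + rnorm(100)

# Each coefficient is the association with the outcome,
# controlling for the other covariates in the model
fit_adj <- lm(outcome ~ group + age + sex, data = dat)
summary(fit_adj)
```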


Confounders

Diagnostics

  1. Normality
  2. Homoskedasticity
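A minimal sketch of checking these two assumptions for a fitted `lm()` object in R, using simulated data:

```r
# Residual diagnostics for a fitted linear model (illustrative data)
set.seed(123)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

qqnorm(resid(fit)); qqline(resid(fit))  # 1. Normality: points should follow the line

plot(fitted(fit), resid(fit),           # 2. Homoskedasticity: spread should be
     xlab = "Fitted values",            #    roughly constant across fitted values
     ylab = "Residuals")
abline(h = 0, lty = 2)
```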