Kevin Donovan
February 13, 2021
Previous Session: discussed association analyses with correlation, linear regression
Now: focus on longitudinal analyses and linear regression with clustered data
Independent data: \(\{(X_1,Y_1), \ldots, (X_n,Y_n)\}\), with all observations independent of one another
Clustered data: observations grouped into clusters, with dependence within each cluster
\[ \begin{align} &\text{Cluster 1}: \{(X_{1,1},Y_{1,1}), \ldots, (X_{1,n_1},Y_{1,n_1})\} \\ &\text{Cluster 2}: \{(X_{2,1},Y_{2,1}), \ldots, (X_{2,n_2},Y_{2,n_2})\} \\ & \ldots \\ &\text{Cluster K}: \{(X_{K,1},Y_{K,1}), \ldots, (X_{K,n_K},Y_{K,n_K})\} \\ \end{align} \]
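The clustered structure above can be sketched with a small simulation. This is a hypothetical illustration in Python (the session's own code is in the RMD file): each cluster shares a cluster-specific shift in \(Y\), and cluster sizes \(n_1, \ldots, n_K\) may differ. All parameter values are made up.

```python
import random

random.seed(1)

# Hypothetical clustered data: K = 3 clusters with sizes n_1, n_2, n_3.
cluster_sizes = [4, 5, 3]

clusters = {}
for k, n_k in enumerate(cluster_sizes, start=1):
    shift = random.gauss(0, 2)  # shared by every observation in cluster k
    pairs = []
    for _ in range(n_k):
        x = random.gauss(0, 1)
        y = 1.5 * x + shift + random.gauss(0, 0.5)  # Y depends on X plus the cluster shift
        pairs.append((x, y))
    clusters[k] = pairs  # cluster k: {(X_{k,1}, Y_{k,1}), ..., (X_{k,n_k}, Y_{k,n_k})}

for k, pairs in clusters.items():
    print(f"Cluster {k}: {len(pairs)} (X, Y) pairs")
```

Observations in the same cluster share `shift`, so they are dependent; observations in different clusters are independent.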
Examples:
Estimating cluster-specific parameters (for example, within-cluster means)
Can we incorporate both in a single model/analysis framework?
For this session, we focus on longitudinal data
\(\leftrightarrow\) “cluster” = single participant
Goal: Suppose we want to analyze the associations between variables \(X\) and \(Y\)
Longitudinal dependence in the data may be a) a nuisance or b) of interest
Why not just compute a Pearson or Spearman correlation between \(X\) and \(Y\) on the whole data? These measures assume independent observations; with repeated measures per subject, the pooled estimate can be misleading
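A simulation can show why pooling can mislead. In this hypothetical sketch (all numbers invented), subjects with higher average \(X\) tend to have higher \(Y\), but within each subject \(Y\) decreases in \(X\): the pooled correlation comes out strongly positive even though every within-subject correlation is negative.

```python
import random
from statistics import mean

random.seed(42)

def pearson(xs, ys):
    """Plain Pearson correlation from the definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

all_x, all_y, within = [], [], []
for i in range(20):                                   # 20 subjects
    center = random.uniform(0, 10)                    # subject-level X location
    xs = [center + random.uniform(-1, 1) for _ in range(10)]
    # Between subjects: Y rises with center; within subject: Y falls in X.
    ys = [2 * center - (x - center) + random.gauss(0, 0.2) for x in xs]
    all_x += xs
    all_y += ys
    within.append(pearson(xs, ys))

print(f"pooled correlation:        {pearson(all_x, all_y):+.2f}")   # positive
print(f"mean within-subject corr.: {mean(within):+.2f}")            # negative
```

The pooled correlation answers a between-subject question, not the within-subject one; ignoring the clustering conflates the two.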
Okay, what about linear modeling?
For simplicity, suppose we are interested in the association between \(Y\) and covariate \(X\)
Suppose we also observe a time variable \(T\), with each subject observed \(n_i\) times
Recall: Linear regression model with \(X\) and \(T\) is
\[ \begin{align} &Y_{i,j}=\beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\epsilon_{i,j} \\ &\\ &\text{where E}(\epsilon_{i,j})=0 \text{; Var}(\epsilon_{i,j})=\sigma^2 \\ &\epsilon_{i,j} \perp \epsilon_{k,l} \text{ for }(i,j)\neq (k,l) \end{align} \]
Idea: Let’s tie together observations in the same cluster/subject using random effects
Example: Suppose we want to tie observations within a subject together based on their starting point
Model:
\[ \begin{align} &Y_{i,j}=\beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_i+\epsilon_{i,j} \\ &\\ &\text{where E}(\epsilon_{i,j})=0 \text{; Var}(\epsilon_{i,j})=\sigma^2 \text{; Cor}(\epsilon_{i,j}, \epsilon_{i,l})=\rho_{j,l} \\ &\text{E}(\phi_{i})=0 \text{; Var}(\phi_{i})=\sigma_{\phi}^2 \\ & \phi_{i} \perp \phi_{j} \text{ for }i\neq j \\ &\epsilon_{i,j} \perp \epsilon_{k,l} \text{ for }i\neq k \end{align} \]
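The shared \(\phi_i\) is exactly what induces within-subject correlation. A quick check by simulation (a sketch, assuming iid errors, i.e. \(\rho_{j,l}=0\), and dropping the fixed effects, which don't affect covariances): theory gives \(\text{Cor}(Y_{i,1}, Y_{i,2}) = \sigma_{\phi}^2 / (\sigma_{\phi}^2 + \sigma^2)\), which is 0.5 for the made-up values \(\sigma_{\phi} = \sigma = 1\).

```python
import random
from statistics import mean

random.seed(0)

sigma_phi, sigma = 1.0, 1.0   # hypothetical values; theoretical correlation = 0.5
y1, y2 = [], []
for i in range(200_000):                          # many subjects, 2 obs each
    phi = random.gauss(0, sigma_phi)              # shared random intercept
    y1.append(phi + random.gauss(0, sigma))       # Y_{i,1} (fixed effects dropped)
    y2.append(phi + random.gauss(0, sigma))       # Y_{i,2}

m1, m2 = mean(y1), mean(y2)
cov = mean((a - m1) * (b - m2) for a, b in zip(y1, y2))
v1 = mean((a - m1) ** 2 for a in y1)
v2 = mean((b - m2) ** 2 for b in y2)
corr = cov / (v1 * v2) ** 0.5
print(f"empirical within-subject correlation: {corr:.3f}")   # approx 0.5
```

This ratio is the intraclass correlation: the larger \(\sigma_{\phi}^2\) is relative to \(\sigma^2\), the more tightly a subject's observations cluster around their own level.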
How is this modeling dependence?
\[ \begin{align} &Y_{1,1} = \beta_0+\beta_1X_{1,1}+\beta_2T_{1,1}+\phi_1+\epsilon_{1,1} \\ &Y_{2,2} = \beta_0+\beta_1X_{2,2}+\beta_2T_{2,2}+\phi_2+\epsilon_{2,2} \end{align} \]
Between subjects, all pieces are independent of one another \(\rightarrow\) \(Y_{1,1}\) and \(Y_{2,2}\) are independent
\[ \begin{align} &Y_{1,1} = \beta_0+\beta_1X_{1,1}+\beta_2T_{1,1}+\phi_1+\epsilon_{1,1} \\ &Y_{1,2} = \beta_0+\beta_1X_{1,2}+\beta_2T_{1,2}+\phi_1+\epsilon_{1,2} \end{align} \]
Within a subject, both observations share \(\phi_1\) \(\rightarrow\) \(Y_{1,1}\) and \(Y_{1,2}\) are correlated
Can represent these two different means using our equation
Marginal (population-level) mean:
\[ \text{E}(Y_{i,j}|X_{i,j}, T_{i,j}) = \beta_0+\beta_1X_{i,j}+\beta_2T_{i,j} \]
Subject-specific (conditional) mean:
\[ \text{E}(Y_{i,j}|X_{i,j}, T_{i,j}, \phi_i) = \beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_i \]
Thus, mixed models are often referred to as hierarchical models
Have subject-level and population-level mean structure
Have within-subject and between-subject variance/covariance
Slopes and intercepts also have levels
\[ \begin{align} &Y_{i,j}=\beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_i+\epsilon_{i,j} \\ &Y_{i,j}=[\beta_0+\phi_i]+\beta_1X_{i,j}+\beta_2T_{i,j}+\epsilon_{i,j}\\ &\\ &Y_{i,j}=\beta_{0,i}+\beta_1X_{i,j}+\beta_2T_{i,j}+\epsilon_{i,j}\\ \end{align} \]
\[ \begin{align} &Y_{i,j}=\beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_{0,i}+\phi_{1,i}T_{i,j}+\epsilon_{i,j} \\ &\\ &\text{where E}(\epsilon_{i,j})=0 \text{; Var}(\epsilon_{i,j})=\sigma^2 \\ &\text{E}(\phi_{0,i})=\text{E}(\phi_{1,i})=0 \text{; Var}(\phi_{0,i})=\sigma_{\phi_0}^2 \text{; Var}(\phi_{1,i})=\sigma_{\phi_1}^2\\ & \text{Cor}(\phi_{0,i}, \phi_{1,i})=\rho_{\phi_{0,1}} \\ & \text{Cor}(\epsilon_{i,j}, \epsilon_{i,l})=\rho_{\epsilon_{j,l}} \\ & \phi_{l,i} \perp \phi_{m,j} \text{ for }i\neq j \\ &\epsilon_{i,j} \perp \epsilon_{k,l} \text{ for }i\neq k \end{align} \]
Marginal (population-level) mean:
\[ \text{E}(Y_{i,j}|X_{i,j}, T_{i,j}) = \beta_0+\beta_1X_{i,j}+\beta_2T_{i,j} \]
Subject-specific (conditional) mean:
\[ \text{E}(Y_{i,j}|X_{i,j}, T_{i,j}, \phi_{0,i}, \phi_{1,i}) = \beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_{0,i}+\phi_{1,i}T_{i,j} \]
\[ \begin{align} &Y_{i,j}=\beta_0+\beta_1X_{i,j}+\beta_2T_{i,j}+\phi_{0,i}+\phi_{1,i}T_{i,j}+\epsilon_{i,j} \\ &Y_{i,j}=[\beta_0+\phi_{0,i}]+\beta_1X_{i,j}+[\beta_2+\phi_{1,i}]T_{i,j}+\epsilon_{i,j}\\ & \\ &Y_{i,j}=\beta_{0,i}+\beta_1X_{i,j}+\beta_{2,i}T_{i,j}+\epsilon_{i,j}\\ \end{align} \]
See the accompanying RMD file for code