This study focuses on implementing hierarchical models with nested data and comparing this method to the pooled and unpooled methods. The pooled method completely ignores any nesting structure in the data, so it does not account for the variability in the response among the nesting groups. The unpooled method overstates the variability among the nesting groups by fitting a separate estimate for each nesting group without taking into account information from the other groups. Hierarchical models strike a balance between these two methods: they account for the variability among the nesting groups while fitting group estimates that use information from all the groups.
Two closely related concepts are central: borrowing strength and shrinkage. Borrowing strength refers to how estimates for groups with small sample sizes are pulled toward the average of all groups. Shrinkage refers to how estimates from hierarchical models lie closer together than the estimates from the unpooled method.
Level 1 observational units (i) are the units observed at the lowest level and are nested in groups. Level 2 observational units (j) are the groups in which the level 1 observational units are nested. Predictors at both levels can be used, but the response variable must be measured at level 1.
The music data set is from a performance anxiety study conducted by Sadler and Miller (2010). They collected data on 37 undergraduate music majors who filled out performance diaries over a whole academic year. Before each performance, each musician also completed a Positive Affect Negative Affect Schedule (PANAS), in which two variables were measured: negative affect (measure of anxiety) and positive affect (measure of happiness). In total, variables were measured on the musicians and each of their performances. This app will focus on how negative affect is associated with characteristics of the musicians and characteristics of the performances.
The pooled method completely pools all level 1 observational units and ignores that they are nested in level 2 observational units. Therefore, the pooled mean is the overall mean of the response variable, and it does not account for variability in the response among the level 2 observational units.
The unpooled method fits a separate average for each level 2 observational unit. An issue is that the unpooled method exaggerates the variability in the response among the level 2 observational units because information from the other level 2 observational units is ignored. For example, some level 2 observational units may have small sample sizes, which yield unreliable estimates with the unpooled method. However, this can be accounted for with hierarchical models.
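The contrast between the two baseline methods can be sketched on a small toy data set. The data and the function names `pooled_mean` and `unpooled_means` below are hypothetical illustrations, not values or code from the music study.

```python
from collections import defaultdict

# Hypothetical toy data: (level 2 group label, level 1 response) pairs.
# These numbers are made up for illustration only.
observations = [
    ("A", 10.0), ("A", 12.0), ("A", 14.0),
    ("B", 20.0), ("B", 22.0),
    ("C", 30.0),
]


def pooled_mean(obs):
    """Overall mean of the response, ignoring group membership."""
    values = [y for _, y in obs]
    return sum(values) / len(values)


def unpooled_means(obs):
    """Separate mean for each group, using only that group's observations."""
    by_group = defaultdict(list)
    for group, y in obs:
        by_group[group].append(y)
    return {group: sum(ys) / len(ys) for group, ys in by_group.items()}


print(pooled_mean(observations))
print(unpooled_means(observations))
```

Group C has a single observation, so its unpooled mean rests on one data point. This is the situation in which an unpooled estimate is least reliable.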
$$y_i = \alpha_{j[i]} + \epsilon_i, \quad \text{where } \epsilon_i \sim N(0,\sigma_y^{2})$$
$$\alpha_j = \mu_{\alpha} + \eta_j, \quad \text{where } \eta_j \sim N(0,\sigma_\alpha^{2})$$
$$y_i = \mu_{\alpha} + \eta_{j[i]} + \epsilon_i$$
$$\epsilon_i \sim N(0,\sigma_y^{2})$$
$$\eta_j \sim N(0,\sigma_\alpha^{2})$$
$$y_i = \text{true response for observational unit } i$$
$$\alpha_{j[i]} = \text{true HLM mean of group } j \text{ for observational unit } i$$
$$\epsilon_i = \text{deviation of observational unit } i \text{ from its group average}$$
$$\sigma_y^{2} = \text{within-group variance in response}$$
$$\mu_{\alpha} = \text{true average of group averages}$$
$$\eta_j = \text{deviation of group } j \text{ from true average}$$
$$\sigma_\alpha^{2} = \text{between-group variance in response}$$
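One way to make the varying-intercept model concrete is to simulate from it. The sketch below assumes NumPy is available. The function name `simulate_varying_intercepts` and its arguments are hypothetical, chosen to mirror the symbols above.

```python
import numpy as np


def simulate_varying_intercepts(group_sizes, mu_alpha, sigma_alpha, sigma_y, seed=0):
    """Draw data from y_i = alpha_{j[i]} + epsilon_i with alpha_j = mu_alpha + eta_j.

    sigma_alpha and sigma_y are standard deviations (square roots of the variances).
    """
    rng = np.random.default_rng(seed)
    n_groups = len(group_sizes)

    # Level 2: one deviation eta_j per group, so alpha_j ~ N(mu_alpha, sigma_alpha^2).
    eta = rng.normal(0.0, sigma_alpha, size=n_groups)
    alpha = mu_alpha + eta

    # j[i]: the group index of each level 1 observational unit.
    group_index = np.repeat(np.arange(n_groups), group_sizes)

    # Level 1: each unit deviates from its group mean by epsilon_i ~ N(0, sigma_y^2).
    epsilon = rng.normal(0.0, sigma_y, size=group_index.size)
    y = alpha[group_index] + epsilon
    return y, group_index, alpha


y, group_index, alpha = simulate_varying_intercepts(
    group_sizes=[2, 3, 1], mu_alpha=15.0, sigma_alpha=2.0, sigma_y=4.0
)
print(y.shape, group_index)
```

Setting `sigma_y` to zero makes every response equal its group mean, and setting `sigma_alpha` to zero makes all group means equal `mu_alpha`. These two limits correspond to having no within-group or no between-group variability.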
$$\hat{\alpha}_j^{HLM} = \hat{\omega}_j\,\hat{\mu}_\alpha+(1-\hat{\omega}_j)\,\bar{y}_j$$
$$\hat{\omega}_j = 1-\frac{n_j\hat{\sigma}_\alpha^{2}}{n_j\hat{\sigma}_\alpha^{2}+\hat{\sigma}_y^{2}}$$
$$\hat{\omega}_j = \text{pooling factor}$$
$$n_j = \text{sample size of group } j$$
$$\hat{\sigma}_y^{2} = \text{within-group (unexplained) variance in response}$$
$$\hat{\sigma}_\alpha^{2} = \text{between-group (explained) variance in response}$$
$$\bar{y}_{all} = \text{pooled mean}$$
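The pooling factor and the resulting hierarchical estimate can be computed directly from the two formulas above. This is a minimal sketch: the function names are hypothetical, and the variance components and overall mean are passed in as known numbers, whereas in practice they would be estimated from the data.

```python
def pooling_factor(n_j, sigma2_alpha, sigma2_y):
    """Weight placed on the overall mean for a group with n_j observations."""
    return 1.0 - (n_j * sigma2_alpha) / (n_j * sigma2_alpha + sigma2_y)


def hlm_estimate(ybar_j, n_j, mu_alpha, sigma2_alpha, sigma2_y):
    """Weighted average of the overall mean and the group's own sample mean."""
    omega_j = pooling_factor(n_j, sigma2_alpha, sigma2_y)
    return omega_j * mu_alpha + (1.0 - omega_j) * ybar_j


# Illustrative (made-up) values: between-group variance 4, within-group variance 16,
# overall mean 15, and a group whose sample mean is 20.
for n_j in (1, 16):
    omega = pooling_factor(n_j, sigma2_alpha=4.0, sigma2_y=16.0)
    estimate = hlm_estimate(20.0, n_j, mu_alpha=15.0, sigma2_alpha=4.0, sigma2_y=16.0)
    print(n_j, omega, estimate)
```

With these values, a group of size 1 has a pooling factor of about 0.8 and an estimate of about 16, while a group of size 16 has a pooling factor of about 0.2 and an estimate of about 19. The small group is pulled strongly toward the overall mean (borrowing strength), and both estimates sit closer to 15 than the unpooled mean of 20 (shrinkage).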
$$\alpha_j \sim N(\mu_{\alpha},\sigma_\alpha^{2})$$
$$y_i = \alpha_{j[i]} + \beta_{j[i]}x_i + \epsilon_i$$
$$\alpha_j = \mu_{\alpha} + \eta^{\alpha}_j$$
$$\beta_j = \mu_{\beta} + \eta^{\beta}_j$$
$$\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \sim N\begin{pmatrix} \begin{pmatrix} \mu_{\alpha} \\ \mu_{\beta} \end{pmatrix} , \begin{pmatrix} \sigma_\alpha^{2} & \rho\sigma_\alpha\sigma_\beta \\ \rho\sigma_\alpha\sigma_\beta & \sigma_\beta^{2} \end{pmatrix} \end{pmatrix}$$
Error terms:
$$\epsilon_i \sim N(0,\sigma_y^{2})$$
$$\begin{pmatrix} \eta^{\alpha}_j \\ \eta^{\beta}_j \end{pmatrix} \sim N\begin{pmatrix} \begin{pmatrix} 0 \\ 0 \end{pmatrix} , \begin{pmatrix} \sigma_\alpha^{2} & \rho\sigma_\alpha\sigma_\beta \\ \rho\sigma_\alpha\sigma_\beta & \sigma_\beta^{2} \end{pmatrix} \end{pmatrix}$$
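The joint distribution of the intercept and slope deviations can be illustrated by building the covariance matrix and simulating from the model. The sketch below assumes NumPy. The function names and the example inputs are hypothetical and are not taken from the music data.

```python
import numpy as np


def random_effects_cov(sigma_alpha, sigma_beta, rho):
    """2x2 covariance matrix of (eta_alpha_j, eta_beta_j) from SDs and correlation."""
    off_diagonal = rho * sigma_alpha * sigma_beta
    return np.array([
        [sigma_alpha ** 2, off_diagonal],
        [off_diagonal, sigma_beta ** 2],
    ])


def simulate_intercepts_slopes(x, group_index, n_groups, mu_alpha, mu_beta,
                               cov, sigma_y, seed=0):
    """Draw y_i = alpha_{j[i]} + beta_{j[i]} * x_i + epsilon_i."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    group_index = np.asarray(group_index)

    # Level 2: correlated deviations, one (intercept, slope) pair per group.
    eta = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_groups)
    alpha = mu_alpha + eta[:, 0]
    beta = mu_beta + eta[:, 1]

    # Level 1: unit-level error around each group's own regression line.
    epsilon = rng.normal(0.0, sigma_y, size=x.size)
    y = alpha[group_index] + beta[group_index] * x + epsilon
    return y, alpha, beta


cov = random_effects_cov(sigma_alpha=2.0, sigma_beta=0.5, rho=-0.3)
y, alpha, beta = simulate_intercepts_slopes(
    x=[0.0, 1.0, 2.0, 0.0, 1.0], group_index=[0, 0, 0, 1, 1], n_groups=2,
    mu_alpha=15.0, mu_beta=1.0, cov=cov, sigma_y=1.0,
)
print(cov)
print(y)
```

A negative `rho` means that groups with higher-than-average intercepts tend to have lower-than-average slopes, and a positive `rho` means the opposite.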
$$y_i = \alpha_{j[i]} + \beta_{j[i]}x_i + \epsilon_i$$
$$\alpha_j = \gamma^{\alpha}_0 + \gamma^{\alpha}_1\mu_j + \eta^{\alpha}_j$$
$$\beta_j = \gamma^{\beta}_0 + \gamma^{\beta}_1\mu_j + \eta^{\beta}_j$$
$$\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \sim N\begin{pmatrix} \begin{pmatrix} \gamma^{\alpha}_0 + \gamma^{\alpha}_1\mu_j \\ \gamma^{\beta}_0 + \gamma^{\beta}_1\mu_j \end{pmatrix} , \begin{pmatrix} \sigma_\alpha^{2} & \rho\sigma_\alpha\sigma_\beta \\ \rho\sigma_\alpha\sigma_\beta & \sigma_\beta^{2} \end{pmatrix} \end{pmatrix}$$
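When a level 2 predictor is added, the mean of each group's intercept and slope depends on that group's predictor value. The sketch below assumes NumPy and computes those means before drawing the coefficients. The level 2 predictor, written \(\mu_j\) in the equations above, is called `u` in the code to avoid confusion with \(\mu_\alpha\). All function names and numeric values are hypothetical.

```python
import numpy as np


def group_coefficient_means(u, gamma_alpha, gamma_beta):
    """Mean of (alpha_j, beta_j) for each group, given level 2 predictor values u.

    gamma_alpha = (gamma_0^alpha, gamma_1^alpha); gamma_beta = (gamma_0^beta, gamma_1^beta).
    """
    u = np.asarray(u, dtype=float)
    mean_alpha = gamma_alpha[0] + gamma_alpha[1] * u
    mean_beta = gamma_beta[0] + gamma_beta[1] * u
    return np.column_stack([mean_alpha, mean_beta])


def draw_group_coefficients(u, gamma_alpha, gamma_beta, cov, seed=0):
    """Draw (alpha_j, beta_j) by adding bivariate normal deviations to the means."""
    rng = np.random.default_rng(seed)
    means = group_coefficient_means(u, gamma_alpha, gamma_beta)
    eta = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=means.shape[0])
    return means + eta


# Made-up values: three groups with level 2 predictor values 0, 1 and 2.
means = group_coefficient_means([0.0, 1.0, 2.0], (15.0, -1.0), (0.5, 0.25))
cov = np.array([[4.0, -0.3], [-0.3, 0.25]])
coefficients = draw_group_coefficients([0.0, 1.0, 2.0], (15.0, -1.0), (0.5, 0.25), cov)
print(means)
print(coefficients)
```

Setting the second entry of both `gamma_alpha` and `gamma_beta` to zero removes the level 2 predictor, which recovers the random intercepts and slopes model without a group-level predictor.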