University of California, Santa Barbara
Figure. Gaussian mixture models. Data simulated from a 2-class model.
Lab preparation
Creating a version-controlled R-Project with Github
Download repository here: https://github.com/garberadamc/SEM-Lab10
On the Github repository webpage:
fork
your ownbranch
of the lab repository- copy the repository web URL address from the
clone or download
menu
Within R-Studio:
- click “NEW PROJECT”
- choose option
Version Control
- choose option
Git
- paste the repository web URL path copied from the
clone or download
menu on Github page - choose location of the R-Project
Data source:
- The first example closely follows the vignette used to demonstrate the
tidyLPA
package (Rosenberg, 2019): \(\color{blue}{\text{See detailed documentation of this model here}}\)
This model utilizes the PISA
data collected in the U.S. in 2015. To learn more about this data \(\color{blue}{\text{see here}}\).
To access the 2015 US PISA
data in R use the following code: a. devtools::install_github("jrosen48/pisaUSA15")
b. library(pisaUSA15)
c. open_codebook()
- The second examples utilizes 4 test score measures from the public-use dataset, The Longitudinal Survey of American Youth (LSAY): \(\color{blue}{\text{See documentation here}}\)
Load packages
library(naniar)
library(tidyverse)
library(haven)
library(glue)
library(MplusAutomation)
library(rhdf5)
library(here)
library(janitor)
library(gt)
library(tidyLPA)
Load data
Latent Profile Analysis
Figure. Picture adapted from tutorial (Rosenberg, 2019).
model 1
: Equal variances, and covariances fixed to 0 (Class-invariant / Diagonal)model 2
: Free variances and covariances fixed to 0 (Class-varying / Diagonal)model 3
: Equal variances and equal covariances (Class-invariant / Non-Diagonal)model 4
: Free variances, and equal covariancesmodel 5
: Equal variances, and free covariancesmodel 6
: Free variances and free covariances (Class Varying / Non-Diagonal)
Example 1: PISA dataset from the tidyLPA
package
Enumerate using estimate_profiles()
:
- Estimate models with classes \(K = 1:3\) using Mplus
- Model has 4 continuous indicators
- Default variance-covariance sprecifications is model 1
- In the followind example model 1 and model 6 are compared
- Add line
scale() %>%
to center indicator means
lpa_models <- pisa[1:500,] %>%
select(broad_interest, enjoyment, instrumental_mot, self_efficacy) %>%
estimate_profiles(1:3,
package = "MplusAutomation",
ANALYSIS = "starts = 100, 20;",
variances = c("equal", "varying"),
covariances = c("zero", "varying"))
get_fit(lpa_models)
Plot 3-class model
Note: single imputation is used in this example as plot_profiles()
requires complete cases
pisa[1:200,] %>%
select(broad_interest, enjoyment, instrumental_mot, self_efficacy) %>%
single_imputation() %>%
estimate_profiles(3, package = "MplusAutomation") %>%
plot_profiles(sd=FALSE)
Plot densities for classes k = 1:4
pisa[1:500, c("broad_interest","enjoyment")] %>%
single_imputation() %>%
estimate_profiles(1:4, package = "MplusAutomation") %>%
plot_density()
Example 2: Math, Science, Physics, and Biology measures (LSAY).
Read in data
A. Run a quick enumeration (default variance-covariance specification - model 1
)
lpa_k14 <- lapply(1:4, function(k) {
lpa_enum <- mplusObject(
TITLE = glue("Class {k}A"),
VARIABLE = glue(
"usevar = mth_scor-bio_scor;
classes = c({k}); "),
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 200 50;
processors = 10;",
OUTPUT = "sampstat residual tech11 tech14;",
PLOT =
"type = plot3;
series = mth_scor-bio_scor(*);",
usevariables = colnames(lsay_data),
rdata = lsay_data)
lpa_enum_fit <- mplusModeler(lpa_enum,
dataout=glue(here("enum_lpa", "c_lpaA_lsay_Lab10.dat")),
modelout=glue(here("enum_lpa", "c{k}_lpaA_lsay_Lab10.inp")) ,
check=TRUE, run = TRUE, hashfilename = FALSE)
})
Plot \(1:4\) class model profiles
lsay_data[1:500,5:8] %>%
single_imputation() %>%
estimate_profiles(1:4, package = "MplusAutomation") %>%
plot_profiles(sd=FALSE)
Figure. Here we see ordered solutions.
This result might be aniticipated given that the correlation between test scores ranges from moderate to high (r = .57 - .95).
Compare model fit.
all_output <- readModels(here("enum_lpa"), quiet = TRUE)
enum_extract <- LatexSummaryTable(all_output, #
keepCols=c("Title","Parameters", "LL", "BIC", #
"aBIC", "BLRT_PValue", "T11_VLMR_PValue"), #
sortBy = "Title") #
gt(enum_extract)
Which variance-covariance specification is appropriate?
In this tutorial we consider the following three models: A. equal variances and covariances fixed to zero (default) B. free variances and covariances fixed to zero (Model 2) C. free variances and free covariance for Physics and Science test scores (Model 6)
Figure. Picture adapted from lecture slides by Karen Nylund-Gibson.
Concept check 1: Equal Variances & Covariances fixed to zero
Figure. Picture of “diagonal class-invariant model” adapted from lecture slides by Karen Nylund-Gibson.
Concept check 2: Unequal Variances & Free Covariances
Figure. Picture of “non-diagonal class-varying model” adapted from lecture slides by Karen Nylund-Gibson.
B. Run enumeration specifying free variances
lpa_k14B <- lapply(1:4, function(k) {
lpa_enumB <- mplusObject(
TITLE = glue("Class {k}B"),
VARIABLE = glue(
"usevar = mth_scor-bio_scor;
classes = c({k}); "),
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 200 50;
processors = 10;",
MODEL =
"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! THE # OF CLASS-SPECIFIC STATEMENTS MUST BE EDITED TO CORRESPOND WITH K!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
%c#1%
[mth_scor-bio_scor]; ! means are are freely estimated
mth_scor-bio_scor; ! variances are freely estimated
%c#2%
[mth_scor-bio_scor];
mth_scor-bio_scor;
%c#3%
[mth_scor-bio_scor];
mth_scor-bio_scor;
%c#4%
[mth_scor-bio_scor];
mth_scor-bio_scor;",
OUTPUT = "sampstat residual tech11 tech14;",
PLOT =
"type = plot3;
series = mth_scor-bio_scor(*);",
usevariables = colnames(lsay_data),
rdata = lsay_data)
lpa_enumB_fit <- mplusModeler(lpa_enumB,
dataout=glue(here("enum_lpa", "c_lpaB_lsay_Lab10.dat")),
modelout=glue(here("enum_lpa", "c{k}_lpaB_lsay_Lab10.inp")) ,
check=TRUE, run = TRUE, hashfilename = FALSE)
})
C. Run enumeration specifying free variances and free covariances
lpa_k14C <- lapply(1:4, function(k) {
lpa_enumC <- mplusObject(
TITLE = glue("Class {k}C"),
VARIABLE = glue(
"usevar = mth_scor-bio_scor;
classes = c({k}); "),
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 200 50;
processors = 10;",
MODEL =
"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! THE # OF CLASS-SPECIFIC STATEMENTS MUST BE EDITED TO CORRESPOND WITH K!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
%c#1%
[mth_scor-bio_scor]; ! means are are freely estimated
mth_scor-bio_scor; ! variances are freely estimated
sci_scor with phy_scor;
%c#2%
[mth_scor-bio_scor];
mth_scor-bio_scor;
sci_scor with phy_scor;
%c#3%
[mth_scor-bio_scor];
mth_scor-bio_scor;
sci_scor with phy_scor;
%c#4%
[mth_scor-bio_scor];
mth_scor-bio_scor;",
OUTPUT = "sampstat residual tech11 tech14;",
PLOT =
"type = plot3;
series = mth_scor-bio_scor(*);",
usevariables = colnames(lsay_data),
rdata = lsay_data)
lpa_enumC_fit <- mplusModeler(lpa_enumC,
dataout=glue(here("enum_lpa", "c_lpaC_lsay_Lab10.dat")),
modelout=glue(here("enum_lpa", "c{k}_lpaC_lsay_Lab10.inp")) ,
check=TRUE, run = TRUE, hashfilename = FALSE)
})
Compare model fit
all_output <- readModels(here("enum_lpa"), quiet = TRUE)
enum_extract <- LatexSummaryTable(all_output, #
keepCols=c("Title","Parameters", "LL", "BIC", #
"aBIC", "BLRT_PValue", "T11_VLMR_PValue"), #
sortBy = "Title") #
gt(enum_extract)
Title | Parameters | LL | BIC | aBIC | BLRT_PValue | T11_VLMR_PValue |
---|---|---|---|---|---|---|
Class 1A | 8 | -46288.29 | 92640.89 | 92615.47 | NA | NA |
Class 1B | 8 | -46288.29 | 92640.89 | 92615.47 | NA | NA |
Class 1C | 9 | -42816.54 | 85705.44 | 85676.84 | NA | NA |
Class 2A | 13 | -43352.36 | 86809.23 | 86767.93 | 0 | 0e+00 |
Class 2B | 17 | -43313.36 | 86763.40 | 86709.38 | 0 | 0e+00 |
Class 2C | 19 | -41021.00 | 82194.76 | 82134.39 | 0 | 0e+00 |
Class 3A | 18 | -42126.11 | 84396.93 | 84339.74 | 0 | 0e+00 |
Class 3B | 26 | -42030.92 | 84270.88 | 84188.27 | 0 | 0e+00 |
Class 3C | 29 | -40319.12 | 80871.40 | 80779.26 | 0 | 0e+00 |
Class 4A | 23 | -41433.72 | 83052.37 | 82979.29 | 0 | 0e+00 |
Class 4B | 35 | -41324.70 | 82930.80 | 82819.59 | 0 | 2e-04 |
Class 4C | NA | NA | NA | NA | NA | NA |
References
Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus. Structural equation modeling: a multidisciplinary journal, 25(4), 621-638.
Miller, J. D., Hoffer, T., Suchner, R., Brown, K., & Nelson, C. (1992). LSAY codebook. Northern Illinois University.
Muthén, B. O., Muthén, L. K., & Asparouhov, T. (2017). Regression and mediation analysis using Mplus. Los Angeles, CA: Muthén & Muthén.
Muthén, L.K. and Muthén, B.O. (1998-2017). Mplus User’s Guide. Eighth Edition. Los Angeles, CA: Muthén & Muthén
Rosenberg, J. M., van Lissa, C. J., Beymer, P. N., Anderson, D. J., Schell, M. J. & Schmidt, J. A. (2019). tidyLPA: Easily carry out Latent Profile Analysis (LPA) using open-source or commercial software [R package]. https://data-edu.github.io/tidyLPA/
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/
Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686