University of California, Santa Barbara

Figure. Gaussian mixture models. Data simulated from a 2-class model.

Lab preparation

Creating a version-controlled R-Project with Github

Download repository here: https://github.com/garberadamc/SEM-Lab10

On the Github repository webpage:

fork your own branch of the lab repository
copy the repository web URL address from the clone or download menu

Within R-Studio:

click “NEW PROJECT”
choose option Version Control
choose option Git
paste the repository web URL path copied from the clone or download menu on Github page
choose location of the R-Project

Data source:

The first example closely follows the vignette used to demonstrate the tidyLPA package (Rosenberg, 2019): \(\color{blue}{\text{See detailed documentation of this model here}}\)

This model utilizes the PISA data collected in the U.S. in 2015. To learn more about this data \(\color{blue}{\text{see here}}\).

To access the 2015 US PISA data in R use the following code: a. devtools::install_github("jrosen48/pisaUSA15") b. library(pisaUSA15) c. open_codebook()

The second examples utilizes 4 test score measures from the public-use dataset, The Longitudinal Survey of American Youth (LSAY): \(\color{blue}{\text{See documentation here}}\)

Load packages

library(naniar)
library(tidyverse)
library(haven)
library(glue)
library(MplusAutomation)
library(rhdf5)
library(here)
library(janitor)
library(gt)
library(tidyLPA)

Load data

pisa <- pisaUSA15

Latent Profile Analysis

Figure. Picture adapted from tutorial (Rosenberg, 2019).

model 1: Equal variances, and covariances fixed to 0 (Class-invariant / Diagonal)
model 2: Free variances and covariances fixed to 0 (Class-varying / Diagonal)
model 3: Equal variances and equal covariances (Class-invariant / Non-Diagonal)
model 4: Free variances, and equal covariances
model 5: Equal variances, and free covariances
model 6: Free variances and free covariances (Class Varying / Non-Diagonal)

Example 1: PISA dataset from the `tidyLPA` package

Enumerate using estimate_profiles():

Estimate models with classes \(K = 1:3\) using Mplus
Model has 4 continuous indicators
Default variance-covariance sprecifications is model 1
In the followind example model 1 and model 6 are compared
Add line scale() %>% to center indicator means

lpa_models <- pisa[1:500,] %>%
    select(broad_interest, enjoyment, instrumental_mot, self_efficacy) %>%
    estimate_profiles(1:3,
                      package = "MplusAutomation",
                      ANALYSIS = "starts = 100, 20;",
                      variances = c("equal", "varying"),
                      covariances = c("zero", "varying"))

get_fit(lpa_models)

Plot 3-class model

Note: single imputation is used in this example as plot_profiles() requires complete cases

pisa[1:200,] %>%
    select(broad_interest, enjoyment, instrumental_mot, self_efficacy) %>%
    single_imputation() %>%
    estimate_profiles(3, package = "MplusAutomation") %>% 
    plot_profiles(sd=FALSE)

Plot densities for classes k = 1:4

pisa[1:500, c("broad_interest","enjoyment")] %>%
  single_imputation() %>%
  estimate_profiles(1:4, package = "MplusAutomation") %>%
  plot_density()

Example 2: Math, Science, Physics, and Biology measures (LSAY).

Read in data

lsay_data <- read_csv(here("data", "lsay_lab10.2_lpa.csv"))

A. Run a quick enumeration (default variance-covariance specification - model 1)

lpa_k14  <- lapply(1:4, function(k) {
  lpa_enum  <- mplusObject(
      
    TITLE = glue("Class {k}A"), 
  
    VARIABLE = glue(
    "usevar = mth_scor-bio_scor;
     classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_enum_fit <- mplusModeler(lpa_enum, 
    dataout=glue(here("enum_lpa", "c_lpaA_lsay_Lab10.dat")),
    modelout=glue(here("enum_lpa", "c{k}_lpaA_lsay_Lab10.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Plot \(1:4\) class model profiles

lsay_data[1:500,5:8] %>%
    single_imputation() %>%
    estimate_profiles(1:4, package = "MplusAutomation") %>% 
    plot_profiles(sd=FALSE)

Figure. Here we see ordered solutions.

This result might be aniticipated given that the correlation between test scores ranges from moderate to high (r = .57 - .95).

Compare model fit.

all_output <- readModels(here("enum_lpa"), quiet = TRUE)

enum_extract <- LatexSummaryTable(all_output,                                             #
                keepCols=c("Title","Parameters", "LL", "BIC",                             #
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue"),                     #
                sortBy = "Title")                                                         #

gt(enum_extract)

Which variance-covariance specification is appropriate?

In this tutorial we consider the following three models: A. equal variances and covariances fixed to zero (default) B. free variances and covariances fixed to zero (Model 2) C. free variances and free covariance for Physics and Science test scores (Model 6)

Figure. Picture adapted from lecture slides by Karen Nylund-Gibson.

Concept check 1: Equal Variances & Covariances fixed to zero

Figure. Picture of “diagonal class-invariant model” adapted from lecture slides by Karen Nylund-Gibson.

Concept check 2: Unequal Variances & Free Covariances

Figure. Picture of “non-diagonal class-varying model” adapted from lecture slides by Karen Nylund-Gibson.

B. Run enumeration specifying free variances

lpa_k14B  <- lapply(1:4, function(k) {
  lpa_enumB  <- mplusObject(
      
    TITLE = glue("Class {k}B"), 
  
    VARIABLE = glue(
    "usevar = mth_scor-bio_scor;
     classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = 
    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
     ! THE # OF CLASS-SPECIFIC STATEMENTS MUST BE EDITED TO CORRESPOND WITH K!
     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    
     %c#1%
     [mth_scor-bio_scor];    ! means are are freely estimated 
     mth_scor-bio_scor;      ! variances are freely estimated

     %c#2%
     [mth_scor-bio_scor];
     mth_scor-bio_scor;

     %c#3%
     [mth_scor-bio_scor];
     mth_scor-bio_scor;
     
     %c#4%
     [mth_scor-bio_scor];
     mth_scor-bio_scor;",
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_enumB_fit <- mplusModeler(lpa_enumB, 
    dataout=glue(here("enum_lpa", "c_lpaB_lsay_Lab10.dat")),
    modelout=glue(here("enum_lpa", "c{k}_lpaB_lsay_Lab10.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

C. Run enumeration specifying free variances and free covariances

lpa_k14C  <- lapply(1:4, function(k) {
  lpa_enumC  <- mplusObject(
      
    TITLE = glue("Class {k}C"), 
  
    VARIABLE = glue(
    "usevar = mth_scor-bio_scor;
     classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = 
    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
     ! THE # OF CLASS-SPECIFIC STATEMENTS MUST BE EDITED TO CORRESPOND WITH K!
     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
     %c#1%
     [mth_scor-bio_scor];    ! means are are freely estimated 
     mth_scor-bio_scor;      ! variances are freely estimated
     sci_scor with phy_scor;

     %c#2%
     [mth_scor-bio_scor];
     mth_scor-bio_scor;
     sci_scor with phy_scor;

     %c#3%
     [mth_scor-bio_scor];
     mth_scor-bio_scor;
     sci_scor with phy_scor;
     
     %c#4%
     [mth_scor-bio_scor];
     mth_scor-bio_scor;",
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_enumC_fit <- mplusModeler(lpa_enumC, 
    dataout=glue(here("enum_lpa", "c_lpaC_lsay_Lab10.dat")),
    modelout=glue(here("enum_lpa", "c{k}_lpaC_lsay_Lab10.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Compare model fit

all_output <- readModels(here("enum_lpa"), quiet = TRUE)

enum_extract <- LatexSummaryTable(all_output,                                             #
                keepCols=c("Title","Parameters", "LL", "BIC",                             #
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue"),                     #
                sortBy = "Title")                                                         #

gt(enum_extract)

Title	Parameters	LL	BIC	aBIC	BLRT_PValue	T11_VLMR_PValue
Class 1A	8	-46288.29	92640.89	92615.47	NA	NA
Class 1B	8	-46288.29	92640.89	92615.47	NA	NA
Class 1C	9	-42816.54	85705.44	85676.84	NA	NA
Class 2A	13	-43352.36	86809.23	86767.93	0	0e+00
Class 2B	17	-43313.36	86763.40	86709.38	0	0e+00
Class 2C	19	-41021.00	82194.76	82134.39	0	0e+00
Class 3A	18	-42126.11	84396.93	84339.74	0	0e+00
Class 3B	26	-42030.92	84270.88	84188.27	0	0e+00
Class 3C	29	-40319.12	80871.40	80779.26	0	0e+00
Class 4A	23	-41433.72	83052.37	82979.29	0	0e+00
Class 4B	35	-41324.70	82930.80	82819.59	0	2e-04
Class 4C	NA	NA	NA	NA	NA	NA

Lab Materials - Return to Home Page

References

Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus. Structural equation modeling: a multidisciplinary journal, 25(4), 621-638.

Miller, J. D., Hoffer, T., Suchner, R., Brown, K., & Nelson, C. (1992). LSAY codebook. Northern Illinois University.

Muthén, B. O., Muthén, L. K., & Asparouhov, T. (2017). Regression and mediation analysis using Mplus. Los Angeles, CA: Muthén & Muthén.

Muthén, L.K. and Muthén, B.O. (1998-2017). Mplus User’s Guide. Eighth Edition. Los Angeles, CA: Muthén & Muthén

Rosenberg, J. M., van Lissa, C. J., Beymer, P. N., Anderson, D. J., Schell, M. J. & Schmidt, J. A. (2019). tidyLPA: Easily carry out Latent Profile Analysis (LPA) using open-source or commercial software [R package]. https://data-edu.github.io/tidyLPA/

R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/

Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686

Lab 10.2 - Latent Profile Analysis

Structural Equation Modeling - Instructor: Karen Nylund-Gibson

Adam Garber

June 07, 2020

Lab preparation

Creating a version-controlled R-Project with Github

Data source:

Latent Profile Analysis

Example 1: PISA dataset from the `tidyLPA` package

Example 2: Math, Science, Physics, and Biology measures (LSAY).

Which variance-covariance specification is appropriate?

Compare model fit

Lab Materials - Return to Home Page

References

Lab preparation

Creating a version-controlled R-Project with Github

Data source:

Latent Profile Analysis

Example 1: PISA dataset from the tidyLPA package

Example 2: Math, Science, Physics, and Biology measures (LSAY).

Which variance-covariance specification is appropriate?

Compare model fit

Lab Materials - Return to Home Page

References

Example 1: PISA dataset from the `tidyLPA` package