In Bayesian statistics, the most widely used criteria for model assessment and comparison are the Deviance Information Criterion (DIC) and the Watanabe–Akaike Information Criterion (WAIC). We use a multilevel mediation model as an illustrative example to compare different types of DIC and WAIC. More specifically, we compare the performance of conditional and marginal DICs and WAICs and investigate their performance with missing data. We focus on two versions of DIC (DIC_1 and DIC_2) and one version of WAIC. In addition, we explore whether it is necessary to include the nuisance models of incomplete exogenous variables in the likelihood. Based on the simulation results, whether DIC_2 is better than DIC_1 and WAIC, and whether we should include the nuisance models of exogenous variables in the likelihood functions, depend on whether we use marginal or conditional likelihoods. Overall, we find that the marginal-likelihood-based DIC_2 that excludes the likelihood of the covariate models generally had the highest true model selection rates.
Du, H., Keller, B. T., *Alacam, E., & Enders, C. K. (2023). Comparing DIC and WAIC for multilevel models with missing data. Behavior Research Methods. Advance online publication.
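As a rough illustration of the WAIC mentioned in the abstract above, the following minimal sketch computes WAIC on the deviance scale from a matrix of pointwise log-likelihoods evaluated at posterior draws. It uses the common textbook definition (log pointwise predictive density minus the summed posterior variances of the log-likelihood), which is assumed here and may differ in detail from the version used in the paper. The function name `waic` and the input layout (draws in rows, units in columns) are hypothetical choices made for this example.

```python
import numpy as np

def waic(log_lik):
    """WAIC (deviance scale) from a (draws x units) matrix of pointwise log-likelihoods."""
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density: log of the posterior-mean likelihood per unit,
    # computed with the log-sum-exp trick for numerical stability.
    max_ll = log_lik.max(axis=0)
    lppd = max_ll + np.log(np.exp(log_lik - max_ll).sum(axis=0)) - np.log(n_draws)
    # Effective number of parameters: posterior variance of the log-likelihood per unit.
    p_waic = log_lik.var(axis=0, ddof=1)
    return -2.0 * (lppd.sum() - p_waic.sum())
```

Under this sketch, a conditional WAIC would pass one column per observation, with log-likelihoods evaluated given the sampled random effects, whereas a marginal WAIC would pass one column per cluster, with the random effects integrated out before the function is called.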
To handle the nonnormal data issue, Browne proposed an unbiased distribution free (DF) estimator (Γ̂_UDF) and an asymptotically distribution free estimator (Γ̂_ADF) of the covariance matrix of sample variances/covariances Γ to calculate robust test statistics and robust standard errors. However, Γ̂_UDF is ignored in methodological and substantive research, and has not been extended to models with mean structures. To improve robust standard errors and the model fit statistic for nonnormal data with mean structures (e.g., growth curve models), we propose an unbiased distribution free estimator with mean structures considered. In growth curve models, we apply Γ̂_UDF to four robust statistics that have relatively simple forms and denote them as T_USB, T_UMVA, T_UMVA2, and T_UCOR1. We compare their performance with 7 robust test statistics that employ Γ̂_ADF. We find that with the same model fit statistic, Γ̂_UDF generally leads to smaller Anderson-Darling distances from the theoretical distribution than Γ̂_ADF, except T_UMVA2 in some skewed cases. Additionally, the p-values from T_UMVA2 are distributed closest to the theoretical distribution Uniform(0,1) among the 12 examined statistics. In terms of Type I error rates, T_UMVA2 and T_MVA2 are the most stable statistics. Additionally, Γ̂_UDF provides smaller relative biases of the robust SE estimates than Γ̂_ADF. Hence, we suggest using Γ̂_UDF in both model fit statistics and robust SE calculation. Among the model fit statistics using Γ̂_UDF, we suggest T_UMVA2.
In structural equation modeling, researchers conduct goodness-of-fit tests to evaluate whether the specified model fits the data well. With nonnormal data, the standard goodness-of-fit test statistic T does not follow a chi-square distribution. Comparing T to χ²_df can fail to control Type I error rates and lead to misleading model selection conclusions. To better evaluate model fit, researchers have proposed various robust test statistics, but none of them consistently control Type I error rates under all examined conditions. To improve model fit statistics for nonnormal data, we propose to use an unbiased distribution free weight matrix estimator in robust test statistics. Specifically, using normal theory based parameter estimates with the unbiased distribution free weight matrix estimator, we calculate various robust test statistics and robust standard errors. We conducted a simulation study to compare 63 existing robust statistic combinations with the 4 proposed robust statistics with unbiased distribution free weight matrix estimator. The Satorra–Bentler statistic based on the unbiased distribution free weight matrix estimator provided acceptable Type I error rates at α = .01, .05, or .1 across all conditions (except a few cases with α = .01), regardless of the sample size and the distribution.
Du, H., & Bentler, P.M. (Accepted). 40-Year Old Unbiased Distribution Free Estimator Reliably Improves SEM Statistics for Nonnormal Data. Structural Equation Modeling.
Missing data, such as data missing at random (MAR), are unavoidable in real data and have the potential to undermine the validity of research results. Multiple imputation is one of the most widely used MAR-based methods in education and behavioral science applications. Arbitrarily specifying imputation models can lead to incompatibility and cause biased estimation. Building on the recent developments of model-based imputation and Arnold’s compatibility work, this paper systematically summarizes when the traditional fully conditional specification (FCS) is applicable and how to specify a model-based imputation model if needed. We summarize two Compatibility Requirements to help researchers check compatibility more easily and a decision tree to check whether the traditional FCS is applicable in a given scenario. Additionally, we present a clear overview of two types of model-based imputation: the sequential and separate specifications. We illustrate how to specify model-based imputation with examples and provide example code for Blimp, a free software program, for implementing model-based imputation.
Du, H., Alacam, E., Mena, S., & Keller, B. (Accepted). Compatibility in Imputation Specification. Behavior Research Methods.
Growth curve modeling is commonly used in psychological, educational, and social science research. The mainstream estimators for growth curve modeling are based on normal theory, but real data are unlikely to be exactly normally distributed. To improve estimation and inference with non-normal data, various estimators have been proposed. Among these estimators, the asymptotically distribution free (ADF) estimator does not need to rely on any distribution assumption but it is not efficient with small and modest sample sizes. We propose a distributionally weighted least squares (DLS) estimator in the growth curve modeling framework. DLS combines normal theory based and ADF based generalized least squares estimation to balance the information from the data and the normality assumption. Computer simulation results suggest that model-implied covariance based DLS (DLS_M) generally provides more accurate and efficient estimates than the examined alternative methods regardless of the distribution. In addition, the relative biases of standard error estimates and the Type I error rates of the Satorra–Bentler test statistic (T_SB) in DLS_M were competitive with the classical methods including maximum likelihood and generalized least squares estimation. We illustrate how to implement DLS_M and select the optimal tuning parameter by a bootstrap procedure in a real data example.
Du, H., Bentler, P.M., & Rosseel, Y. (In press). Distributionally-weighted least squares in structural equation modeling. Structural Equation Modeling.
Stefany Mena was awarded the National Science Foundation Graduate Research Fellowship (NSF GRFP) in 2020. The NSF GRFP is a three-year fellowship awarded to doctoral students in STEM fields.
Missing data are exceedingly common across a variety of disciplines, such as educational, social, and behavioral science areas. The missing not at random (MNAR) mechanism, where missingness is related to unobserved data, is widespread in real data and has detrimental consequences. However, the existing MNAR-based methods have potential problems such as leaving the data incomplete and failing to accommodate incomplete covariates with interactions, non-linear terms, and random slopes. We propose a Bayesian latent variable imputation approach to impute missing data due to MNAR (and other missingness mechanisms) and estimate the model of substantive interest simultaneously. In addition, even when the incomplete covariates involve interactions, non-linear terms, and random slopes, the proposed method can handle missingness appropriately. Computer simulation results suggested that the proposed Bayesian latent variable selection model (BLVSM) was quite effective when the outcome and/or covariates were MNAR. Except when the sample size was small, estimates from the proposed BLVSM tracked closely with those from the complete data analysis. With a small sample size, when the outcome was less predictable from the covariates, the missingness proportions of the covariates and the outcome were larger, and the missingness selection processes of the covariates and the outcome were more MNAR and MAR, the performance of BLVSM was less satisfactory. When the sample size was large, BLVSM always performed well. In contrast, the method with an MAR assumption provided biased estimates and confidence intervals with undercoverage when the missingness was MNAR. The robustness and the implementation of BLVSM in real data were also illustrated. The proposed method is available in the Blimp software application, and the paper includes a data analysis example illustrating its use.
Du, H., Enders, C. K., Keller, B. T., Bradbury, T., & Karney, B. (In press). A Bayesian latent variable selection model for nonignorable missingness. Multivariate Behavioral Research.
In real data analysis with structural equation modeling, data are unlikely to be exactly normally distributed. If we ignore the non-normality reality, the parameter estimates, standard error estimates, and model fit statistics from normal theory based methods such as maximum likelihood (ML) and normal theory based generalized least squares estimation (GLS) are unreliable. On the other hand, the asymptotically distribution free (ADF) estimator does not rely on any distribution assumption but cannot demonstrate its efficiency advantage with small and modest sample sizes. The methods which adopt misspecified loss functions including ridge GLS (RGLS) can provide better estimates and inferences than the normal theory based methods and the ADF estimator in some cases. We propose a distributionally-weighted least squares (DLS) estimator, and expect that it can perform better than the existing generalized least squares, because it combines normal theory based and ADF based generalized least squares estimation. Computer simulation results suggest that model-implied covariance based DLS (DLS_M) provided relatively accurate and efficient estimates in terms of RMSE. In addition, the empirical standard errors, the relative biases of standard error estimates, and the Type I error rates of the Jiang-Yuan rank adjusted model fit test statistic (T_JY) in DLS_M were competitive with the classical methods including ML, GLS, and RGLS. The performance of DLS_M depends on its tuning parameter a. We illustrate how to implement DLS_M and select the optimal a by a bootstrap procedure in a real data example.
Du, H., & Bentler, P.M. (In press). Distributionally-weighted least squares in structural equation modeling. Psychological Methods.
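To make the idea of combining normal theory based and ADF based generalized least squares more concrete, here is a minimal sketch of one way the two weight matrices could be blended through the tuning parameter a. It assumes the blend is a convex combination, (1 - a) Γ_N + a Γ_ADF, inverted to give the weight matrix, so that a = 0 recovers normal-theory GLS and a = 1 recovers ADF. That form is an assumption of this example and is not spelled out in the abstract. The function names are hypothetical, the ADF estimate uses the simple divisor n, and the normal-theory part is built from the sample covariance matrix rather than a model-implied one.

```python
import numpy as np

def vech_pairs(p):
    # Index pairs (i, j) with i >= j, in column-major vech order.
    return [(i, j) for j in range(p) for i in range(j, p)]

def gamma_normal(S):
    """Normal-theory covariance of vech(S): entry [(i,j),(k,l)] = s_ik*s_jl + s_il*s_jk."""
    pairs = vech_pairs(S.shape[0])
    return np.array([[S[i, k] * S[j, l] + S[i, l] * S[j, k] for (k, l) in pairs]
                     for (i, j) in pairs])

def gamma_adf(X):
    """ADF estimate: covariance (divisor n) of the vech'd cross-products of centered rows."""
    Xc = X - X.mean(axis=0)
    d = np.column_stack([Xc[:, i] * Xc[:, j] for (i, j) in vech_pairs(X.shape[1])])
    d = d - d.mean(axis=0)
    return d.T @ d / X.shape[0]

def dls_weight(X, a):
    """Weight matrix from the assumed blend (1 - a) * Gamma_N + a * Gamma_ADF."""
    S = np.cov(X, rowvar=False, bias=True)
    gamma = (1.0 - a) * gamma_normal(S) + a * gamma_adf(X)
    return np.linalg.inv(gamma)
```

A model-implied variant in the spirit of DLS_M would pass the fitted covariance matrix to `gamma_normal` instead of the sample covariance matrix. Candidate values of a could then be compared by refitting on bootstrap resamples, as the abstract describes.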