This article presents research into resolving common issues where incomplete biological data is presented in forensic samples.
Biological data are frequently nonlinear, heteroscedastic and conditionally dependent, and often researchers deal with missing data. To account for characteristics common in biological data in one algorithm, the authors developed the mixed cumulative probit (MCP), a novel latent trait model that is a formal generalization of the cumulative probit model usually used in transition analysis. Specifically, the MCP accommodates heteroscedasticity, mixtures of ordinal and continuous variables, missing values, conditional dependence and alternative specifications of the mean response and noise response. Cross-validation selects the best model parameters (mean response and the noise response for simple models, as well as conditional dependence for multivariate models), and the Kullback–Leibler divergence evaluates information gain during posterior inference to quantify mis-specified models (conditionally dependent versus conditionally independent). Two continuous and four ordinal skeletal and dental variables collected from 1296 individuals (aged birth to 22 years) from the Subadult Virtual Anthropology Database are used to introduce and demonstrate the algorithm. In addition to describing the features of the MCP, the authors provide material to help fit novel datasets using the MCP. The flexible, general formulation with model selection provides a process to robustly identify the modelling assumptions that are best suited for the data at hand. (Published abstract provided)
