Title: Model-based clustering for categorical data via Hamming distance
Abstract: We introduce a model-based approach for clustering categorical data with no natural ordering. The proposed method uses the Hamming distance to define a family of probability mass functions for categorical data. The members of this family serve as kernels of a finite mixture model with an unknown number of components. Fully Bayesian inference is carried out with a trans-dimensional blocked Gibbs sampler, which simplifies computation relative to the customary reversible-jump algorithm. Model performance is assessed in a simulation study, which shows improved clustering recovery over existing approaches. Finally, the method is illustrated with applications to reference datasets. Joint work with Raffaele Argiento and Edoardo Filippi-Mazzola.
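
To make the idea of a Hamming-distance-based probability mass function concrete, here is a minimal Python sketch. The abstract does not give the functional form, so the kernel below is an assumption chosen for illustration: each attribute j contributes a factor that decays as exp(-1/sigma_j) when the observed category differs from a "center" category. The names hamming_distance, hamming_pmf, center, sigma and n_levels are hypothetical and are not taken from the authors' work.

```python
import itertools
import math


def hamming_distance(x, c):
    """Number of positions at which the sequences x and c differ."""
    if len(x) != len(c):
        raise ValueError("x and c must have the same length")
    return sum(xi != ci for xi, ci in zip(x, c))


def hamming_pmf(x, center, sigma, n_levels):
    """Assumed kernel (illustrative, not the authors' stated form).

    p(x) = prod_j exp(-1[x_j != center_j] / sigma_j) / (1 + (m_j - 1) * exp(-1 / sigma_j))

    where m_j = n_levels[j] is the number of categories of attribute j
    and sigma_j > 0 controls how concentrated attribute j is on center_j.
    """
    prob = 1.0
    for xj, cj, sj, mj in zip(x, center, sigma, n_levels):
        decay = math.exp(-1.0 / sj)        # weight of a mismatch on attribute j
        norm = 1.0 + (mj - 1) * decay      # one match plus (m_j - 1) mismatches
        prob *= (1.0 if xj == cj else decay) / norm
    return prob


if __name__ == "__main__":
    center = (0, 1, 2)
    sigma = (0.5, 1.0, 2.0)
    n_levels = (2, 3, 4)

    # Enumerate the whole sample space to check that the kernel is normalized.
    space = itertools.product(*(range(m) for m in n_levels))
    total = sum(hamming_pmf(x, center, sigma, n_levels) for x in space)
    print(f"sum over all configurations: {total:.12f}")
    print("distance from center:", hamming_distance((0, 2, 2), center))
```

Under this assumed form, setting every sigma_j to the same value makes the probability depend on x only through its Hamming distance from the center, and the center is always the mode. A mixture model would combine several such kernels, each with its own center and scale.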