---
title: "Short Intro into Gaussian Mixture Models"
author: "Michael C. Thrun"
date: "`r format(Sys.time(), '%d %b %Y')`"
output:
  html_document:
    theme: united
    highlight: tango
    toc: true
    number_sections: true
    doc_depth: 2
    toc_float: true
    fig.width: 8
    fig.height: 8
vignette: >
  %\VignetteIndexEntry{Short Intro into Gaussian Mixture Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
# library(rgl)
# #library(rglwidget)
# setupKnitr()
# knitr::opts_chunk$set(echo = TRUE,
#                       fig.align = "center",
#                       warning = FALSE,
#                       webgl = TRUE,
#                       fig.width = 8,
#                       fig.height = 8,
#                       fig.keep = "all",
#                       fig.ext = "jpeg"
# )
```

# Gaussian Mixture Models (GMM)

Examples in which the EM algorithm alone is insufficient for fitting a GMM, but a visual modelling approach is appropriate, can be found in [Ultsch et al., 2015]. In general, a GMM is explainable if the overlap between the Gaussians remains small. A good example of modelling such a GMM on natural data can be found in the ECDA presentation available on ResearchGate [Thrun/Ultsch, 2015]. In the example below, the data is generated specifically so that the resulting GMM is statistically significant.

The interactive approach of AdaptGauss uses shiny. Hence, I do not know how to illustrate these examples in R Markdown, and the code is shown without being evaluated.

```{}
# Synthetic data drawn from three normal distributions
data=c(rnorm(3000,2,1),rnorm(3000,7,3),rnorm(3000,-2,0.5))

# Interactive fit in shiny, started from the given means, SDs and weights
gmm=AdaptGauss::AdaptGauss(data,
                           Means = c(-2, 2, 7),
                           SDs = c(0.5, 1, 4),
                           Weights = c(0.3333, 0.3333, 0.3333))

# Chi-square test of the fitted mixture against the data
AdaptGauss::Chi2testMixtures(data,
                             gmm$Means,gmm$SDs,gmm$Weights,PlotIt=TRUE)

# QQ plot of the data against the fitted mixture
AdaptGauss::QQplotGMM(data,gmm$Means,gmm$SDs,gmm$Weights)
```

## Multimodal Natural Dataset not Suitable for a GMM

Not every multimodal dataset should be modelled with a GMM. The following is an example of a multimodal dataset for which the GMM is not statistically significant.
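Because the interactive fit cannot be rendered here, it can help to look at the mixture density that a parameter set implies without starting shiny. The block below is a minimal sketch in base R, not the AdaptGauss implementation: `gmm_density` is a hypothetical helper, the grid range is chosen for illustration only, and the parameters are the starting values passed to `AdaptGauss()` for this dataset. Those weights sum to 1.0256 rather than 1, so the sketch renormalises them, which is an assumption about what was intended.

```r
# Hypothetical helper (not part of AdaptGauss): density of a univariate GMM,
# computed as the weighted sum of the component normal densities.
gmm_density <- function(x, means, sds, weights) {
  stopifnot(length(means) == length(sds), length(means) == length(weights))
  weights <- weights / sum(weights)  # assumption: renormalise weights to sum to 1
  dens <- numeric(length(x))
  for (i in seq_along(means)) {
    dens <- dens + weights[i] * dnorm(x, mean = means[i], sd = sds[i])
  }
  dens
}

# Starting parameters taken from the AdaptGauss() call for this dataset.
means   <- c(52.74, 385.38, 619.46, 162.08)
sds     <- c(38.22, 93.21, 57.72, 48.36)
weights <- c(0.2434, 0.5589, 0.1484, 0.0749)  # sums to 1.0256, not 1

# Evaluate the mixture on a grid (range chosen for illustration only).
grid <- seq(-200, 1000, length.out = 1000)
mixture <- gmm_density(grid, means, sds, weights)

# Plot the mixture (solid) and its weighted components (dashed) to see
# how strongly the Gaussians overlap.
plot(grid, mixture, type = "l", lwd = 2, xlab = "x", ylab = "density",
     main = "Mixture density implied by the parameters")
w <- weights / sum(weights)
for (i in seq_along(means)) {
  lines(grid, w[i] * dnorm(grid, mean = means[i], sd = sds[i]), lty = 2)
}
```

Strongly overlapping dashed curves are a hint that the components are hard to interpret individually. The interactive fit and the tests for this dataset, which require AdaptGauss, are: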
```{}
# Load the dataset shipped with AdaptGauss
data('LKWFahrzeitSeehafen2010', package = 'AdaptGauss')

# Interactive fit in shiny, started from a four-component parameter set
gmm=AdaptGauss::AdaptGauss(LKWFahrzeitSeehafen2010,
                           Means = c(52.74, 385.38, 619.46, 162.08),
                           SDs = c(38.22, 93.21, 57.72, 48.36),
                           Weights = c(0.2434, 0.5589, 0.1484, 0.0749))

# Chi-square test of the fitted mixture against the data
AdaptGauss::Chi2testMixtures(LKWFahrzeitSeehafen2010,
                             gmm$Means,gmm$SDs,gmm$Weights,PlotIt=TRUE)

# QQ plot of the data against the fitted mixture
AdaptGauss::QQplotGMM(LKWFahrzeitSeehafen2010,gmm$Means,gmm$SDs,gmm$Weights)
```

# References

Thrun, M. C., & Ultsch, A.: Models of Income Distributions for Knowledge Discovery, Proc. European Conference on Data Analysis (ECDA), DOI: 10.13140/RG.2.1.4463.0244, pp. 136-137, Colchester, 2015.

Ultsch, A., Thrun, M. C., Hansen-Goos, O., & Lötsch, J.: Identification of Molecular Fingerprints in Human Heat Pain Thresholds by Use of an Interactive Mixture Model R Toolbox (AdaptGauss), International Journal of Molecular Sciences, Vol. 16(10), pp. 25897-25911, 2015.