| Title: | Plausible Naive Bayes Classifier Using PDE |
|---|---|
| Description: | A nonparametric, multicore-capable plausible naive Bayes classifier based on the Pareto density estimation (PDE). It supports memory sharing within multicore computations and features a plausible approach to a pitfall in Bayes' theorem covering low-evidence cases. See Stier, Q., Hoffmann, J., and Thrun, M.C.: "Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naive Bayes" (2026), Machine Learning and Knowledge Extraction (MAKE), <DOI:10.3390/make8010013>. |
| Authors: | Michael Thrun [aut, cph, cre] (ORCID: <https://orcid.org/0000-0001-9542-5543>), Quirin Stier [aut, rev] (ORCID: <https://orcid.org/0000-0002-7896-4737>), Tim Robin Neldner [ctr, ctb] |
| Maintainer: | Michael Thrun <[email protected]> |
| License: | GPL-3 |
| Version: | 0.3.0 |
| Built: | 2026-06-21 15:36:34 UTC |
| Source: | https://github.com/mthrun/pdebayes |
Pareto Density Estimated naive Bayes Classifier

Index of help topics:

| Topic | Title |
|---|---|
| ApplyBayesTheorem4Likelihoods | ApplyBayesTheorem4Likelihoods |
| defineOrEstimateDistribution | defineOrEstimateDistribution |
| fitParameters | fitParameters |
| GetLikelihoods | GetLikelihoods |
| getPriors | getPriors |
| Hepta | Hepta introduced in [Ultsch, 2003] |
| PDEnaiveBayes-package | Plausible Naive Bayes Classifier Using PDE |
| PlotBayesianDecision2D | PlotBayesianDecision2D |
| PlotLikelihoodFuns | PlotLikelihoodFuns |
| PlotLikelihoods | PlotLikelihoods |
| PlotNaiveBayes | PlotNaiveBayes |
| PlotPosteriors | PlotPosteriors |
| Predict_naiveBayes | Predict_naiveBayes |
| predict.PDEbayes | predict.PDEbayes |
| Train_naiveBayes | Train_naiveBayes |
| Train_naiveBayes_multicore | Train a Multicore Pareto Density Naive Bayes Classifier |

(PDENB) of [Stier et al., 2026].
Michael Thrun
Maintainer: Michael Thrun <[email protected]>
[Stier et al., 2026] Stier, Q., Hoffmann, J., & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
[Thrun et al., 2020] Thrun, M. C., Gehlert, T., & Ultsch, A.: Analyzing the Fine Structure of Distributions, PloS one, Vol. 15(10), pp. e0238835, doi 10.1371/journal.pone.0238835, 2020.
[Thrun/Ultsch, 2020] Thrun, M. C., & Ultsch, A.: Clustering Benchmark Datasets Exploiting the Fundamental Clustering Problems, Data in Brief, Vol. 30(C), pp. 105501, doi 10.1016/j.dib.2020.105501, 2020.
[Ultsch et al., 2015] Ultsch, A., Thrun, M. C., Hansen-Goos, O., & Lötsch, J.: Identification of Molecular Fingerprints in Human Heat Pain Thresholds by Use of an Interactive Mixture Model R Toolbox (AdaptGauss), International journal of molecular sciences, Vol. 16(10), pp. 25897-25911, doi 10.3390/ijms161025897, 2015.
    if (requireNamespace("FCPS")) {
      V = FCPS::ClusterChallenge("Hepta", 1000)
      Data = V$Hepta
      Cls = V$Cls
      ind = 1:length(Cls)
      indtrain = sample(ind, 800)
      indtest = setdiff(ind, indtrain)

      # parametric
      # model = Train_naiveBayes(Data[indtrain, ], Cls[indtrain], Gaussian = TRUE)
      # ClsTrain = model$ClsTrain
      # table(Cls[indtrain], ClsTrain)
      # res = Predict_naiveBayes(Data[indtest, ], Model = model)
      # table(Cls[indtest], res$ClsTest)

      # PDEbayes
      model = Train_naiveBayes(Data[indtrain, ], Cls[indtrain], Gaussian = FALSE)
      ClsTrain = model$ClsTrain
      table(Cls[indtrain], ClsTrain)
      res = Predict_naiveBayes(Data[indtest, ], Model = model)
      table(Cls[indtest], res$ClsTest)
    }
Calculates the posteriors, for given likelihoods and priors using the Bayes Theorem
    ApplyBayesTheorem4Likelihoods(Likelihoods, Priors, threshold = .Machine$double.eps * 1000)
Likelihoods |
List of d numeric matrices, one per feature, each matrix with 1:k columns containing the distribution of class 1:k. |
Priors |
[1:k] Numeric vector with prior probability for each class. |
threshold |
(Optional: Default=.Machine$double.eps*1000, approximately 2.2e-13). |
Posteriors |
[1:n, 1:k] Numeric matrix with the posterior probability of each class according to Bayes' theorem. |
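As a minimal sketch of the underlying computation (not the package's implementation), the posterior of a class is its prior times the product of the per-feature likelihoods, normalized over all classes. The helper name `bayes_posteriors`, the assumption that every likelihood matrix has one row per case, and the fallback to the priors when the evidence drops below `threshold` are all assumptions made for illustration.

```r
# Hypothetical helper, not part of PDEnaiveBayes.
# lik_per_feature: list of d matrices [n x k] with the likelihood of each case per class.
# priors: numeric vector of length k.
bayes_posteriors <- function(lik_per_feature, priors,
                             threshold = .Machine$double.eps * 1000) {
  # Naive Bayes assumption: multiply the likelihoods across features.
  joint <- Reduce(`*`, lik_per_feature)
  # Weight each class column by its prior.
  numerator <- sweep(joint, 2, priors, `*`)
  evidence <- rowSums(numerator)
  posteriors <- numerator / evidence
  # Assumption: if the evidence is numerically zero, fall back to the priors.
  low <- evidence < threshold
  if (any(low)) {
    posteriors[low, ] <- rep(priors, each = sum(low))
  }
  posteriors
}

lik <- list(
  matrix(c(0.8, 0.1,
           0.2, 0.6,
           0.0, 0.0), ncol = 2, byrow = TRUE),
  matrix(c(0.5, 0.5,
           0.1, 0.9,
           0.3, 0.3), ncol = 2, byrow = TRUE)
)
priors <- c(0.5, 0.5)
post <- bayes_posteriors(lik, priors)
print(post)
```

The third case has zero likelihood under both classes, so the plain ratio would be 0/0; the sketch returns the priors for it instead.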
Michael Thrun
[Stier et al., 2026] Stier, Q.,Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
    if (requireNamespace("FCPS")) {
      data(Hepta)
      Data = Hepta$Data
      Cls = Hepta$Cls

      # parametric
      # V = Train_naiveBayes(Data, Cls, Gaussian = TRUE)
      # ClsTrain = V$ClsTrain
      # table(Cls, ClsTrain)

      # non-parametric
      V = Train_naiveBayes(Data, Cls, Gaussian = FALSE)
      ClsTrain = V$ClsTrain
      table(Cls, ClsTrain)
    }
The function estimates the distribution of the values of a feature that belong to a specific class, i.e., the class-conditional likelihood.
    defineOrEstimateDistribution(Feature, ClassInd, Gaussian = FALSE, ParetoRadius = NULL,
                                 InternalPlotIt = FALSE, SD_Threshold = 0.001, ...)
Feature |
[1:n] Numeric Vector |
ClassInd |
Integer Vector with class indices |
Gaussian |
(Optional: Default=FALSE). Assume Gaussian distribution. |
ParetoRadius |
Optional [1:d] numeric vector of Pareto radii computed beforehand, see |
InternalPlotIt |
(Optional: Default=FALSE). Create plot if set to TRUE. |
SD_Threshold |
Optional: Default=0.001. |
... |
|
Kernels |
[1:m] Numeric vector with kernels (x-values) of a 1D pdf. |
PDF |
[1:m] Numeric vector with the distribution values of a 1D pdf. |
Theta |
Numeric vector with the parameters mean and standard deviation of the Gaussian; NULL if no Gaussian is used. |
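The following sketch illustrates the shape of such a result in base R. It is not the package's method: `stats::density()` is used as a stand-in for the Pareto density estimation, and the function name `estimate_class_density` and the grid of 100 points in the Gaussian case are hypothetical choices.

```r
# Hypothetical sketch: stats::density() stands in for Pareto density
# estimation, which this block does not implement.
estimate_class_density <- function(Feature, ClassInd, Gaussian = FALSE) {
  x <- Feature[ClassInd]               # values of the feature within one class
  if (Gaussian) {
    theta <- c(Mean = mean(x), Std = sd(x))
    kernels <- seq(min(x), max(x), length.out = 100)
    pdf <- dnorm(kernels, mean = theta[["Mean"]], sd = theta[["Std"]])
  } else {
    theta <- NULL
    est <- density(x)                  # Gaussian kernel density estimate
    kernels <- est$x
    pdf <- est$y
  }
  list(Kernels = kernels, PDF = pdf, Theta = theta)
}

x <- iris$Sepal.Length
setosa <- which(iris$Species == "setosa")
fit <- estimate_class_density(x, setosa)
gauss <- estimate_class_density(x, setosa, Gaussian = TRUE)
str(fit)
```

Note that the number of kernels m is set by the estimator and is unrelated to the number of cases in the class.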
Michael Thrun
[Stier et al., 2026] Stier, Q.,Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
    if (requireNamespace("FCPS")) {
      data(Hepta)
      Data = Hepta$Data
      Cls = Hepta$Cls
      Priors = getPriors(Cls)
    }
Fit Gaussian parameters.
    fitParameters(Feature, ClassInd, Robust = FALSE, na.rm = TRUE, SD_Threshold = 0.0001)
Feature |
[1:n] Numeric Vector |
ClassInd |
Integer Vector with class indices |
Robust |
(Optional: Default=FALSE). Robust computation if set to TRUE. |
na.rm |
(Optional: Default=TRUE). Remove na. |
SD_Threshold |
(Optional: Default=0.0001). |
Parameters |
[1:2] Numeric vector with Mean and Std. |
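A minimal base-R sketch of such a fit is given below. It mirrors the documented arguments but is not the package code: the name `fit_gaussian`, the use of median and MAD for the robust variant, and the reading of `SD_Threshold` as a lower bound for the standard deviation are assumptions.

```r
# Hypothetical helper; mirrors the documented arguments but is not the package code.
fit_gaussian <- function(Feature, ClassInd, Robust = FALSE, na.rm = TRUE,
                         SD_Threshold = 0.0001) {
  x <- Feature[ClassInd]
  if (Robust) {
    # Assumption: median and scaled MAD as robust location and scale.
    m <- median(x, na.rm = na.rm)
    s <- mad(x, na.rm = na.rm)
  } else {
    m <- mean(x, na.rm = na.rm)
    s <- sd(x, na.rm = na.rm)
  }
  # Assumption: SD_Threshold is a lower bound guarding against zero variance.
  if (is.na(s) || s < SD_Threshold) s <- SD_Threshold
  c(Mean = m, Std = s)
}

x <- c(1, 2, 3, 4, 100, NA)
classic <- fit_gaussian(x, ClassInd = 1:4)
robust <- fit_gaussian(x, ClassInd = 1:5, Robust = TRUE)
constant <- fit_gaussian(c(5, 5, 5), ClassInd = 1:3)
print(classic)
print(robust)
```

In the robust call, the outlier 100 barely affects the estimates, whereas the mean and standard deviation of the same values would be dominated by it.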
Michael Thrun
    if (requireNamespace("FCPS")) {
      data(Hepta)
      Data = Hepta$Data
      Cls = Hepta$Cls
      Priors = getPriors(Cls)
    }
Yields the likelihoods per feature and class as values of a distribution that is either defined as Gaussian or estimated from the data using Pareto density estimation.
    GetLikelihoods(Data, Cls, ...)
Data |
[1:n,1:d] matrix of training data. It consists of n cases of d-dimensional data points. Every case has d attributes, variables or features. |
Cls |
[1:n] numerical vector with n numbers defining the classification. It has k unique numbers representing the arbitrary labels of the classification. |
... |
Further arguments for |
Due to Pareto density estimation per class and feature, the number of rows in each element of c_Kernels_list and ListOfLikelihoods usually varies and does not equal the number of rows n of the data.
c_Kernels_list |
List of d numeric matrices, one per feature, each matrix with 1:k columns containing the kernels of class 1:k |
ListOfLikelihoods |
List of d numeric matrices, one per feature, each matrix with 1:k columns containing distribution values (likelihood) of class 1:k |
Thetas |
If Gaussian=TRUE: List of d numeric matrices, one per feature, each matrix with 1:k rows containing the mean in the first column and the standard deviation in the second column of class 1:k. Otherwise: NULL |
ParetoRadiusPerFeauture |
Numeric vector with estimated pareto radius per feature. |
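The list-of-matrices layout described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's algorithm: `stats::density()` replaces the Pareto density estimation, all classes of a feature share one evaluation grid, and the names `likelihoods_per_feature` and `n_grid` are hypothetical.

```r
# Hypothetical sketch; stats::density() stands in for Pareto density estimation.
likelihoods_per_feature <- function(Data, Cls, n_grid = 64) {
  classes <- sort(unique(Cls))
  lapply(seq_len(ncol(Data)), function(i) {
    x <- Data[, i]
    # Shared evaluation grid (kernels) for all classes of this feature.
    grid <- seq(min(x), max(x), length.out = n_grid)
    # One column of density values per class.
    lik <- sapply(classes, function(cl) {
      density(x[Cls == cl], from = min(x), to = max(x), n = n_grid)$y
    })
    list(Kernels = grid, Likelihoods = lik)
  })
}

Data <- as.matrix(iris[, 1:4])
Cls <- as.numeric(iris[, 5])
res <- likelihoods_per_feature(Data, Cls)
dim(res[[1]]$Likelihoods)
```

The result has one list element per feature, and each likelihood matrix has `n_grid` rows rather than one row per case.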
Michael Thrun
[Stier et al., 2026] Stier, Q.,Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
    if (requireNamespace("FCPS")) {
      data(Hepta)
      Data = Hepta$Data
      Cls = Hepta$Cls
      Priors = getPriors(Cls)
    }
Get a prior via class proportions.
    getPriors(Cls)
Cls |
[1:n] numerical vector with n numbers defining the classification. It has k unique numbers representing the arbitrary labels of the classification. |
Priors |
[1:k] Numeric vector with prior probability for each class. |
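Estimating priors from class proportions amounts to counting the cases per class and dividing by the total number of cases. The sketch below shows this in base R; `class_priors` is a hypothetical re-implementation for illustration and may differ from the package function in details such as ordering or names.

```r
# Hypothetical re-implementation for illustration; not the package code.
class_priors <- function(Cls) {
  counts <- table(Cls)                        # number of cases per class label
  priors <- as.numeric(counts) / length(Cls)  # relative frequencies
  names(priors) <- names(counts)              # keep the class labels as names
  priors
}

Cls <- c(1, 1, 1, 2, 2, 3)
priors <- class_priors(Cls)
print(priors)
```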
Michael Thrun
    if (requireNamespace("FCPS")) {
      data(Hepta)
      Data = Hepta$Data
      Cls = Hepta$Cls
      Priors = getPriors(Cls)
    }
Clearly defined clusters, different variances. Detailed description of dataset and its clustering challenge is provided in [Thrun/Ultsch, 2020].
    data("Hepta")
Size 212, Dimensions 3, stored in Hepta$Data
Classes 7, stored in Hepta$Cls
[Ultsch, 2003] Ultsch, A.: Maps for the visualization of high-dimensional data spaces, Proc. Workshop on Self organizing Maps (WSOM), pp. 225-230, Kyushu, Japan, 2003.
[Thrun/Ultsch, 2020] Thrun, M. C., & Ultsch, A.: Clustering Benchmark Datasets Exploiting the Fundamental Clustering Problems, Data in Brief, Vol. 30(C), pp. 105501, doi:10.1016/j.dib.2020.105501, 2020.
    data(Hepta)
    str(Hepta)
Plots estimation of decision boundary in a 2D slice of the data using the posteriors
    PlotBayesianDecision2D(X, Y, Posteriors, Class = 1, NoBins, CellColorsOrPallette,
                           Showpoints = TRUE, xlim, ylim, xlab, ylab, main, PlotIt = TRUE)
X |
Numeric vector with point coordinates of first dimension of data selection. |
Y |
Numeric vector with point coordinates of second dimension of data selection. |
Posteriors |
[1:n, 1:Class] matrix of posteriors. |
Class |
Optional, integer defining which class to look at. |
NoBins |
Optional, number of bins for the class posteriors. |
CellColorsOrPallette |
Optional, either a function defining the color palette or a character vector of length NoBins stating colors. |
Showpoints |
Optional, if TRUE, points are displayed. |
xlim |
Optional, numeric vector of length 2 stating limits of x axis. |
ylim |
Optional, numeric vector of length 2 stating limits of y axis. |
xlab |
Optional, character stating name of x axis. |
ylab |
Optional, character stating name of y axis. |
main |
Optional, character stating the title of the plot. |
PlotIt |
Optional, TRUE: prints the ggplot2 object, FALSE: plot is not shown. |
Boundaries are assumed to be zero for plotting.
List of:
Mapping |
List containing a map for colors, kernels and bin number. |
GGobj |
ggplot2 object containing the 2D visualization of the posteriors. |
Michael Thrun
[Stier et al., 2026] Stier, Q.,Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
    Data = as.matrix(iris[, 1:4])
    Cls = as.numeric(iris[, 5])
    TrainIdx = c(17, 73, 46, 29, 68, 35, 131, 62, 132, 127, 71, 72, 144, 99, 93,
                 13, 38, 21, 102, 53, 36, 111, 114, 96, 57, 74, 145, 86, 3, 16,
                 52, 59, 140, 40, 122, 109, 6, 91, 79, 15, 108, 139, 37, 76, 20,
                 115, 66, 28, 100, 117, 44, 78, 80, 150, 146, 142, 9, 90, 45, 58,
                 134, 11, 87, 125, 141, 118, 136, 48, 124, 47, 8, 27, 33, 92, 130,
                 54, 65, 104, 23, 98, 129, 123, 34, 128, 135, 51, 64, 5, 94, 83,
                 42, 116, 101, 43, 7, 12, 82, 1, 84, 138, 2, 56, 4, 106, 120)
    TestIdx = c(60, 10, 75, 70, 81, 18, 97, 95, 67, 22, 55, 143, 88, 24, 105,
                26, 119, 31, 107, 63, 41, 61, 32, 147, 89, 14, 121, 19, 113, 49,
                126, 112, 25, 77, 137, 103, 50, 30, 149, 110, 39, 69, 148, 85, 133)
    TrainX = Data[TrainIdx, ]
    TestX = Data[TestIdx, ]
    TrainY = Cls[TrainIdx]
    TestY = Cls[TestIdx]
    VPDENB = Train_naiveBayes(Data = TrainX, Cls = TrainY, Plausible = FALSE)
    PlotBayesianDecision2D(X = TrainX[, 1], Y = TrainX[, 2],
                           Posteriors = VPDENB$Posteriors, Class = 1)
Plots the class-conditional Likelihoods per feature, given the generating likelihood functions.
    PlotLikelihoodFuns(LikelihoodFuns, Data, PlausibleLikelihoodFuns = NULL,
                       Epsilon = NULL, PlausibleCenters = NULL, PlotCutOff = 4, xlim)
LikelihoodFuns |
List of likelihood-generating functions. |
Data |
Numeric matrix with data. |
PlausibleLikelihoodFuns |
List with plausible Likelihoods. |
Epsilon |
Numeric scalar defining epsilon for plausible likelihoods. |
PlausibleCenters |
Numeric vector [1:k] plausible centers used to compute plausible likelihoods. |
PlotCutOff |
Either a scalar defining how many features, starting from the first, should be plotted, or a numeric vector defining the indices of the features to be plotted. In the second case, the vector should not be too long; otherwise the plot yields an error. |
xlim |
Numeric vector of length 2 stating limits of x axis. |
No return value.
Michael Thrun
[Stier et al., 2026] Stier, Q.,Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
    Data = as.matrix(iris[, 1:4])
    Cls = as.numeric(iris[, 5])
    TrainIdx = c(17, 73, 46, 29, 68, 35, 131, 62, 132, 127, 71, 72, 144, 99, 93,
                 13, 38, 21, 102, 53, 36, 111, 114, 96, 57, 74, 145, 86, 3, 16,
                 52, 59, 140, 40, 122, 109, 6, 91, 79, 15, 108, 139, 37, 76, 20,
                 115, 66, 28, 100, 117, 44, 78, 80, 150, 146, 142, 9, 90, 45, 58,
                 134, 11, 87, 125, 141, 118, 136, 48, 124, 47, 8, 27, 33, 92, 130,
                 54, 65, 104, 23, 98, 129, 123, 34, 128, 135, 51, 64, 5, 94, 83,
                 42, 116, 101, 43, 7, 12, 82, 1, 84, 138, 2, 56, 4, 106, 120)
    TestIdx = c(60, 10, 75, 70, 81, 18, 97, 95, 67, 22, 55, 143, 88, 24, 105,
                26, 119, 31, 107, 63, 41, 61, 32, 147, 89, 14, 121, 19, 113, 49,
                126, 112, 25, 77, 137, 103, 50, 30, 149, 110, 39, 69, 148, 85, 133)
    TrainX = Data[TrainIdx, ]
    TestX = Data[TestIdx, ]
    TrainY = Cls[TrainIdx]
    TestY = Cls[TestIdx]
    VPDENB = Train_naiveBayes(Data = TrainX, Cls = TrainY, Plausible = FALSE)
    PlotLikelihoodFuns(LikelihoodFuns = VPDENB$Model$PDFs_funs, Data = TrainX)
Plots the Likelihoods per feature.
    PlotLikelihoods(Likelihoods, Data, PlausibleLikelihoods = NULL, Epsilon = NULL,
                    PlausibleCenters = NULL, PlotCutOff = 4, xlim)
Likelihoods |
List with Likelihoods. |
Data |
Numeric matrix with data. |
PlausibleLikelihoods |
List with plausible Likelihoods. |
Epsilon |
Numeric scalar defining epsilon for plausible likelihoods. |
PlausibleCenters |
Numeric vector [1:k] plausible centers used to compute plausible likelihoods. |
PlotCutOff |
Either a scalar defining how many features, starting from the first, should be plotted, or a numeric vector defining the indices of the features to be plotted. In the second case, the vector should not be too long; otherwise the plot yields an error. |
xlim |
Numeric vector of length 2 stating limits of x axis. |
Boundaries are assumed to be zero for plotting.
No return value.
Michael Thrun
[Stier et al., 2026] Stier, Q.,Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
    Data = as.matrix(iris[, 1:4])
    Cls = as.numeric(iris[, 5])
    TrainIdx = c(17, 73, 46, 29, 68, 35, 131, 62, 132, 127, 71, 72, 144, 99, 93,
                 13, 38, 21, 102, 53, 36, 111, 114, 96, 57, 74, 145, 86, 3, 16,
                 52, 59, 140, 40, 122, 109, 6, 91, 79, 15, 108, 139, 37, 76, 20,
                 115, 66, 28, 100, 117, 44, 78, 80, 150, 146, 142, 9, 90, 45, 58,
                 134, 11, 87, 125, 141, 118, 136, 48, 124, 47, 8, 27, 33, 92, 130,
                 54, 65, 104, 23, 98, 129, 123, 34, 128, 135, 51, 64, 5, 94, 83,
                 42, 116, 101, 43, 7, 12, 82, 1, 84, 138, 2, 56, 4, 106, 120)
    TestIdx = c(60, 10, 75, 70, 81, 18, 97, 95, 67, 22, 55, 143, 88, 24, 105,
                26, 119, 31, 107, 63, 41, 61, 32, 147, 89, 14, 121, 19, 113, 49,
                126, 112, 25, 77, 137, 103, 50, 30, 149, 110, 39, 69, 148, 85, 133)
    TrainX = Data[TrainIdx, ]
    TestX = Data[TestIdx, ]
    TrainY = Cls[TrainIdx]
    TestY = Cls[TestIdx]
    VPDENB = Train_naiveBayes(Data = TrainX, Cls = TrainY, Plausible = FALSE)
    PlotLikelihoods(Likelihoods = VPDENB$Model$ListOfLikelihoods, Data = TrainX)
Visualize the class-conditional distributions of the Pareto Density estimated naive Bayes model (PDENB) [Stier et al., 2026].
    PlotNaiveBayes(Model, FeatureNames, ClassNames, DatasetName = "Data", nrows = 1,
                   FeatureOrder, NumFeaturesPerRow = 4, Colors, IndividualFigures = FALSE)
Model |
List with elements |
FeatureNames |
Character vector of names with a name for each feature contained in the data used to create the naive bayes model. |
ClassNames |
Character vector of class names to present in the legend of the plots. |
DatasetName |
Character title for each plot. |
nrows |
Number of rows inside one plot. |
FeatureOrder |
Numeric vector representing the order of the features to be displayed. |
NumFeaturesPerRow |
Maximum number of features to be displayed in one plot. |
Colors |
Character vector of color names. The length of the vector must be the same as the number of classes within the data modeled by the naive Bayes classifier. |
IndividualFigures |
Optional boolean: If set to TRUE, it returns a list of the individual figures for customization. |
Boundaries are assumed to be zero for plotting.
Cls |
[1:n] numerical vector with n numbers defining the classification. It has k unique numbers representing the arbitrary labels of the classification. |
Posteriors |
[1:n, 1:l] Numeric matrices with posterior probabilities. |
DataLikelihoodsPerClass |
list of length |
Quirin Stier
[Stier et al., 2026] Stier, Q.,Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
    Data = as.matrix(iris[, 1:4])
    Cls = as.numeric(iris[, 5])
    DatasetName = "Iris"
    TrainIdx = c(17, 73, 46, 29, 68, 35, 131, 62, 132, 127, 71, 72, 144, 99, 93,
                 13, 38, 21, 102, 53, 36, 111, 114, 96, 57, 74, 145, 86, 3, 16,
                 52, 59, 140, 40, 122, 109, 6, 91, 79, 15, 108, 139, 37, 76, 20,
                 115, 66, 28, 100, 117, 44, 78, 80, 150, 146, 142, 9, 90, 45, 58,
                 134, 11, 87, 125, 141, 118, 136, 48, 124, 47, 8, 27, 33, 92, 130,
                 54, 65, 104, 23, 98, 129, 123, 34, 128, 135, 51, 64, 5, 94, 83,
                 42, 116, 101, 43, 7, 12, 82, 1, 84, 138, 2, 56, 4, 106, 120)
    TestIdx = c(60, 10, 75, 70, 81, 18, 97, 95, 67, 22, 55, 143, 88, 24, 105,
                26, 119, 31, 107, 63, 41, 61, 32, 147, 89, 14, 121, 19, 113, 49,
                126, 112, 25, 77, 137, 103, 50, 30, 149, 110, 39, 69, 148, 85, 133)
    TrainX = Data[TrainIdx, ]
    TestX = Data[TestIdx, ]
    TrainY = Cls[TrainIdx]
    TestY = Cls[TestIdx]
    VPDENB = Train_naiveBayes(Data = TrainX, Cls = TrainY, Plausible = FALSE)
    FeatureNames = colnames(Data)
    PlotNaiveBayes(Model = VPDENB$Model, FeatureNames = FeatureNames)
Plots posteriors either using a panel of plots based on PlotBayesianDecision2D or in 1D as a line plot [Stier et al., 2026].
    PlotPosteriors(Data, Posteriors, Class = 1, CellColorsOrPallette, Showpoints = TRUE)
Data |
Either numeric matrix [1:n, 1:d] with data or one column of data. |
Posteriors |
[1:n, 1:Class] matrix of posteriors. |
Class |
Integer defining which class to look at if a numeric matrix is given; for a single column of data, all posteriors are overlaid in a line plot. |
CellColorsOrPallette |
Either a function defining the color palette or a character vector of length NoBins stating colors. |
Showpoints |
If TRUE, points are displayed. |
Plotting posteriors in one direction only often does not give any insight. The default option using PlotBayesianDecision2D is often more useful.
GGobj |
ggplot2 object containing the 2D visualization of the posteriors. |
Michael Thrun
[Stier et al., 2026] Stier, Q.,Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
    Data = as.matrix(iris[, 1:4])
    Cls = as.numeric(iris[, 5])
    TrainIdx = c(17, 73, 46, 29, 68, 35, 131, 62, 132, 127, 71, 72, 144, 99, 93,
                 13, 38, 21, 102, 53, 36, 111, 114, 96, 57, 74, 145, 86, 3, 16,
                 52, 59, 140, 40, 122, 109, 6, 91, 79, 15, 108, 139, 37, 76, 20,
                 115, 66, 28, 100, 117, 44, 78, 80, 150, 146, 142, 9, 90, 45, 58,
                 134, 11, 87, 125, 141, 118, 136, 48, 124, 47, 8, 27, 33, 92, 130,
                 54, 65, 104, 23, 98, 129, 123, 34, 128, 135, 51, 64, 5, 94, 83,
                 42, 116, 101, 43, 7, 12, 82, 1, 84, 138, 2, 56, 4, 106, 120)
    TestIdx = c(60, 10, 75, 70, 81, 18, 97, 95, 67, 22, 55, 143, 88, 24, 105,
                26, 119, 31, 107, 63, 41, 61, 32, 147, 89, 14, 121, 19, 113, 49,
                126, 112, 25, 77, 137, 103, 50, 30, 149, 110, 39, 69, 148, 85, 133)
    TrainX = Data[TrainIdx, ]
    TestX = Data[TestIdx, ]
    TrainY = Cls[TrainIdx]
    TestY = Cls[TestIdx]
    VPDENB = Train_naiveBayes(Data = TrainX, Cls = TrainY, Plausible = FALSE)

    # default option
    PlotPosteriors(Data = TrainX, Posteriors = VPDENB$Posteriors, Class = 1)

    # alternative option
    PlotPosteriors(Data = TrainX[, 3], Posteriors = VPDENB$Posteriors)
Predict classification with naive Bayes model [Stier et al., 2026].
    Predict_naiveBayes(Data, Model, ...)
Data |
[1:n,1:d] matrix of test data. It consists of n cases of d-dimensional data points. Every case has d attributes, variables or features. |
Model |
Optional, list with elements |
... |
|
The function is implemented in a way so that one can combine training and test data although it is intended to be applied on test data only.
Cls |
[1:n] numerical vector with n numbers defining the classification. It has k unique numbers representing the arbitrary labels of the classification. |
Posteriors |
[1:n, 1:l] Numeric matrices with posterior probabilities. |
DataLikelihoodsPerClass |
list of length |
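Given a matrix of posteriors, a class label can be derived by taking the class with the largest posterior in each row. The sketch below shows this maximum a posteriori rule in base R; the helper `posterior_to_class` and its tie-breaking rule (first maximum wins) are assumptions for illustration and need not match how the package assigns classes.

```r
# Hypothetical helper: assign each case to the class with the largest posterior.
posterior_to_class <- function(Posteriors, labels = seq_len(ncol(Posteriors))) {
  # max.col returns, per row, the column index of the maximum.
  labels[max.col(Posteriors, ties.method = "first")]
}

Posteriors <- rbind(c(0.7, 0.2, 0.1),
                    c(0.1, 0.3, 0.6),
                    c(0.4, 0.4, 0.2))
pred <- posterior_to_class(Posteriors)
print(pred)
```

The third row contains a tie between classes 1 and 2, which this sketch resolves in favor of class 1.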
Michael Thrun
[Stier et al., 2026] Stier, Q.,Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
    if (requireNamespace("FCPS")) {
      V = FCPS::ClusterChallenge("Hepta", 1000)
      Data = V$Hepta
      Cls = V$Cls
      ind = 1:length(Cls)
      indtrain = sample(ind, 800)
      indtest = setdiff(ind, indtrain)

      # PDEbayes
      model = Train_naiveBayes(Data[indtrain, ], Cls[indtrain], Gaussian = FALSE)
      ClsTrain = model$ClsTrain
      table(Cls[indtrain], ClsTrain)
      res = Predict_naiveBayes(Data[indtest, ], Model = model)
      table(Cls[indtest], res$ClsTest)
    }
Predict a classification with the Pareto Density estimated naive Bayes model (PDENB) [Stier et al., 2026].
    predict.PDEbayes(object, newdata, type = c("class", "response", "prob"), ...)
object |
Model obtained from training routine in PDEnaiveBayes package. |
newdata |
[1:n,1:d] matrix of test data. It consists of n cases of d-dimensional data points. Every case has d attributes, variables or features. |
type |
Optional character string, one of "class", "response", or "prob". |
... |
|
The function is implemented in a way so that one can combine training and test data although it is intended to be applied on test data only.
Cls |
Numeric vector with predicted class associated with newdata. |
Michael Thrun
[Stier et al., 2026] Stier, Q.,Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
    if (requireNamespace("FCPS")) {
      V = FCPS::ClusterChallenge("Hepta", 1000)
      Data = V$Hepta
      Cls = V$Cls
      ind = 1:length(Cls)
      indtrain = sample(ind, 800)
      indtest = setdiff(ind, indtrain)

      model = Train_naiveBayes(Data[indtrain, ], Cls[indtrain], Gaussian = FALSE)
      ClsTrain = model$ClsTrain
      table(Cls[indtrain], ClsTrain)
      ClsTest = predict.PDEbayes(object = model, newdata = Data[indtest, ])
      table(Cls[indtest], ClsTest)
    }
Trains a Pareto Density estimated naive Bayes model (PDENB) of [Stier et al., 2026].
Train_naiveBayes(Data, Cls, Predict = TRUE, Priors, ...)
Data |
|
Cls |
|
Predict |
Optional, boolean that decides the extent of the output. If TRUE, yields ClsTrain and Posteriors; otherwise, yields only Model and Thetas. Note: the parameter EvalPlausible can only be set to TRUE if Predict is set to TRUE. |
Priors |
Optional, |
... |
|
Precomputing ParetoRadiusPerFeauture can be useful for speeding up cross-validation, although it should be done on the training data only.
If Plausible is not given, both options are evaluated using Shannon information.
c_Kernels_list and ListOfLikelihoods each have d elements, each storing a matrix [1:m,1:k], where usually m!=n. In contrast, the matrices in DataLikelihoodsPerClass are of size [1:n,1:k] as a result of interpolation.
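The difference between the kernel-grid matrices of size [1:m,1:k] and the interpolated per-case matrices of size [1:n,1:k] can be illustrated with a minimal base-R sketch. It uses `stats::density` with a Gaussian kernel as a stand-in for the Pareto density estimation, so the values are not those produced by the package; the names `kernels`, `lik_grid`, and `lik_data` are hypothetical.

```r
set.seed(1)
# Toy data: n = 60 cases of one feature, k = 2 classes
x   <- c(rnorm(30, mean = 0), rnorm(30, mean = 4))
cls <- rep(1:2, each = 30)

# Common kernel grid with m points (m != n)
m       <- 50
kernels <- seq(min(x), max(x), length.out = m)

# Likelihood of each class on the kernel grid: an [1:m, 1:k] matrix
# (Gaussian kernel density as a stand-in for PDE)
lik_grid <- sapply(1:2, function(k) {
  d <- stats::density(x[cls == k], from = min(x), to = max(x), n = m)
  d$y
})

# Interpolate each class column at the observed cases: an [1:n, 1:k] matrix
lik_data <- sapply(1:2, function(k) {
  stats::approx(kernels, lik_grid[, k], xout = x, rule = 2)$y
})

dim(lik_grid)  # m rows, k columns
dim(lik_data)  # n rows, k columns
```

Linear interpolation via `stats::approx` is only one possible choice here; the package may interpolate differently.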
Model |
List of model parameters and results. |
c_Kernels_list |
List of matrices, where each matrix represents the kernels of one feature for all classes. |
ListOfLikelihoods |
List of matrices, where each matrix represents the likelihood of one feature for all classes. |
PDFs_funs |
Nested list of depth 1, where the first index assigns the feature index and the second index assigns the class. The elements are functions for the density estimation for each feature and each class. |
ParetoRadiusPerFeauture |
Numeric vector which stores the Pareto radius for each feature. |
Theta |
Parameters mean and standard deviation of the Gaussian distributions per class and feature. |
Priors |
Numeric vector which stores the prior probability of each class to appear. |
PlausibleCenters |
[1:d, 1:k] Numeric matrix which stores the centers for each feature and each class, where the row index assigns features and the column index assigns classes. |
ClsTrain |
[1:n] numerical vector with n numbers defining the classification. It has k unique numbers representing the arbitrary labels of the classification. |
Posteriors |
[1:n, 1:k] Numeric matrix with posterior probabilities. |
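How per-feature likelihoods and priors combine into a posterior matrix of size [1:n, 1:k] can be sketched in base R as follows. This is the plain naive Bayes rule without the plausible correction, not the package's internal code; the likelihood values and the names `lik`, `priors`, and `cls_pred` are made up for illustration.

```r
# Hypothetical likelihoods for n = 3 cases, d = 2 features, k = 2 classes.
# Each list element is an [1:n, 1:k] matrix for one feature (filled by column).
lik <- list(
  matrix(c(0.8, 0.5, 0.1,
           0.1, 0.5, 0.7), nrow = 3),
  matrix(c(0.6, 0.4, 0.2,
           0.2, 0.4, 0.9), nrow = 3)
)
priors <- c(0.5, 0.5)

# Naive Bayes: multiply likelihoods over features (sum of logs for stability)
log_joint <- Reduce(`+`, lapply(lik, log))
log_joint <- sweep(log_joint, 2, log(priors), `+`)

# Normalise each row so that the posteriors of one case sum to one
joint      <- exp(log_joint - apply(log_joint, 1, max))
posteriors <- joint / rowSums(joint)

# Assign each case to the class with the highest posterior
cls_pred <- max.col(posteriors, ties.method = "first")
posteriors
cls_pred
```

Working on the log scale avoids numerical underflow when many features are multiplied together.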
Michael Thrun
[Stier et al., 2026] Stier, Q., Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
if(requireNamespace("FCPS")){
  data(Hepta)
  Data=Hepta$Data
  Cls=Hepta$Cls
  #non-parametric
  V=Train_naiveBayes(Data,Cls,Gaussian=FALSE)
  ClsTrain=V$ClsTrain
  table(Cls,ClsTrain)
}
Trains a naive Bayes classifier using Gaussian likelihoods or nonparametric Pareto density estimation (PDE), with an optional plausible correction for low-evidence cases [Stier et al., 2026]. Model training can be distributed across a parallel cluster. Shared-memory computation can be enabled to prevent memory duplication among workers [Thrun and Märte, 2026].
Train_naiveBayes_multicore(
  cl = NULL,
  Data,
  Cls,
  Plausible = TRUE,
  Predict = FALSE,
  Priors,
  UseMemshare = FALSE,
  ...
)
cl |
A cluster object, typically created with
|
Data |
A numeric |
Cls |
A numeric vector of length |
Plausible |
Logical. If |
Predict |
Logical. If |
Priors |
Optional numeric vector of length |
UseMemshare |
Logical. If |
... |
Additional named arguments passed to the underlying likelihood-estimation and prediction functions. Common arguments include:
|
The function trains every column of Data independently
and combines the resulting components into one multivariate naive Bayes model.
With a non-NULL cluster, the feature-wise fits are evaluated in
parallel.
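The idea of fitting each feature independently, either sequentially or on a cluster, can be sketched with the base `parallel` package. The helper `fit_feature` is hypothetical and only computes class-wise means and standard deviations; it is not the package's estimation routine.

```r
library(parallel)

set.seed(42)
# Toy data: n = 100 cases, d = 3 features, k = 2 classes
Data <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("X1", "X2", "X3")))
Cls  <- rep(1:2, each = 50)

# Hypothetical per-feature fit: class-wise mean and standard deviation
fit_feature <- function(x, labels) {
  cbind(Mean = tapply(x, labels, mean), SD = tapply(x, labels, sd))
}

# Split the matrix into a list of columns, one element per feature
cols <- lapply(seq_len(ncol(Data)), function(j) Data[, j])
names(cols) <- colnames(Data)

# Sequential reference (corresponds to cl = NULL)
fits_seq <- lapply(cols, fit_feature, labels = Cls)

# Parallel evaluation of the same feature-wise fits on two workers
cl <- makeCluster(2)
fits_par <- parLapply(cl, cols, fit_feature, labels = Cls)
stopCluster(cl)

# Both routes give the same list of k by 2 matrices
all.equal(fits_seq, fits_par)
```

Because the features are treated as conditionally independent, no communication between workers is needed until the per-feature results are collected.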
When UseMemshare = FALSE, the standard parallel backend is used
and worker processes can hold separate copies of the required objects. When
UseMemshare = TRUE, memshare provides shared-memory objects during
the multicore computation [Thrun and Märte, 2026]. The potential memory saving is greatest for large
training matrices and multiple workers. Memory sharing has no effect when
cl = NULL.
If the number of rows in Data differs from the length of Cls, the
inputs are shortened to their common length with a warning. Observations with
non-finite class labels are removed before default priors are estimated.
If Data has no column names, feature names of the form X1,
X2, and so on are generated. Feature names are retained in the fitted
model components.
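The input handling described above can be sketched with a small base-R helper. `prepare_inputs` is a hypothetical name, and the use of relative class frequencies as default priors is an assumption made for this illustration; the source does not state how the defaults are computed.

```r
# Hypothetical helper mirroring the checks described above
prepare_inputs <- function(Data, Cls) {
  n <- min(nrow(Data), length(Cls))
  if (nrow(Data) != length(Cls)) {
    warning("Data and Cls differ in length; shortening both to ", n, " cases.")
  }
  Data <- Data[seq_len(n), , drop = FALSE]
  Cls  <- Cls[seq_len(n)]

  # Drop cases whose class label is NA, NaN, or infinite
  keep <- is.finite(Cls)
  Data <- Data[keep, , drop = FALSE]
  Cls  <- Cls[keep]

  # Generate feature names X1, X2, ... if none are present
  if (is.null(colnames(Data))) {
    colnames(Data) <- paste0("X", seq_len(ncol(Data)))
  }

  # Assumption: default priors are the relative class frequencies
  Priors <- table(Cls) / length(Cls)
  list(Data = Data, Cls = Cls, Priors = Priors)
}

Data <- matrix(1:12, ncol = 2)   # 6 cases, 2 unnamed features
Cls  <- c(1, 1, 2, NA, 2)        # 5 labels, one of them missing
out  <- suppressWarnings(prepare_inputs(Data, Cls))
colnames(out$Data)
out$Priors
```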
A precomputed Pareto radius can accelerate repeated fitting, such as during cross-validation. To avoid information leakage, it must be estimated separately within each training fold and never from the corresponding validation data.
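A leakage-free precomputation can be sketched as follows: the radius is estimated once per fold from the training rows only. The function `estimate_radius` is a simplified, hypothetical stand-in (a low percentile of pairwise distances within one feature) and is not claimed to match the package's Pareto radius computation.

```r
set.seed(7)
Data  <- matrix(rnorm(200), ncol = 2)        # n = 100 cases, d = 2 features
n     <- nrow(Data)
folds <- sample(rep(1:5, length.out = n))    # random 5-fold assignment

# Simplified stand-in for a per-feature radius estimate:
# a low percentile of the pairwise distances within one feature
estimate_radius <- function(x) {
  as.numeric(stats::quantile(as.vector(stats::dist(x)), probs = 0.18))
}

radius_per_fold <- lapply(1:5, function(f) {
  train <- Data[folds != f, , drop = FALSE]  # validation rows are excluded
  apply(train, 2, estimate_radius)
})

# One row per fold, one column per feature
do.call(rbind, radius_per_fold)
```

Each fold's radius vector would then be reused for all model fits on that fold's training data.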
A list with the following components:
Model |
An object of class PDEbayes containing the fitted model. Depending on the selected options, it contains: |
c_Kernels_list |
A list of length d. Each element is a matrix containing the density kernels for one feature across the k classes. |
ListOfLikelihoods |
A list of length d. Each element is a matrix containing the estimated likelihood values for one feature across the k classes. |
PDFs_funs |
A feature-indexed list of class-specific density functions. |
ParetoRadiusPerFeauture |
For non-Gaussian models, either a numeric vector with one Pareto radius per feature or, when GlobalPR = FALSE, a class-by-feature matrix. For Gaussian models, this component is not used for likelihood estimation. |
Thetas |
For Gaussian models, a list of length d. Each element is a k by 2 matrix containing the class-wise means and standard deviations. It is NULL for non-Gaussian models. |
Priors |
A named numeric vector containing the class prior probabilities. |
Plausible |
Logical value indicating whether plausible likelihood correction was requested. |
PlausibleCenters |
When plausible correction is used, a d by k numeric matrix of feature-wise class centers; otherwise NULL. |
ClsTrain |
If Predict = TRUE, a numeric vector of length n containing the predicted class labels for the training observations; otherwise NULL. |
Posteriors |
If Predict = TRUE, an n by k matrix of posterior class probabilities; otherwise NULL. |
Michael Thrun
[Stier et al., 2026] Stier, Q., Hoffmann, J. & Thrun, M. C.: Classifying with the Fine Structure of Distributions: Leveraging Distributional Information for Robust and Plausible Naïve Bayes, Machine Learning and Knowledge Extraction (MAKE), Vol. 8(1), 13, doi 10.3390/make8010013, MDPI, 2026.
[Thrun and Märte, 2026] Thrun, M.C., Märte, J.: Memshare: Memory Sharing for Multicore Computation in R with an Application to Feature Selection by Mutual Information using PDE, The R Journal, Vol. 17(4), pp. 306 - 322, doi 10.32614/RJ-2025-043, 2026.
Train_naiveBayes,
Predict_naiveBayes,
predict.PDEbayes
data(Hepta)
Data <- Hepta$Data
Cls <- Hepta$Cls
## Single-process debugging mode
Vdebug <- Train_naiveBayes_multicore(
  cl = NULL, Data = Data, Cls = Cls,
  Plausible = FALSE, Gaussian = FALSE, Predict = TRUE
)
if(requireNamespace("parallel")){
  cl = parallel::makeCluster(1) #set to number of cores >1
  #each core copies the memory
  V=Train_naiveBayes_multicore(cl=cl,Data=Data,Cls=Cls,
    Predict=TRUE,UseMemshare=FALSE)
  ClsTrain=V$ClsTrain
  table(Cls,ClsTrain)
  #each core shares the memory
  V=Train_naiveBayes_multicore(cl=cl,Data=Data,Cls=Cls,
    Predict=TRUE,UseMemshare=TRUE)
  ClsTrain=V$ClsTrain
  table(Cls,ClsTrain)
  #release the workers; on.exit() only takes effect inside a function
  parallel::stopCluster(cl)
}