Title: | Visualizations of High-Dimensional Data |
---|---|
Description: | Gives access to data visualisation methods that are relevant from the data scientist's point of view. The flagship idea of 'DataVisualizations' is the mirrored density plot (MD-plot) for either classified or non-classified multivariate data published in Thrun, M.C. et al.: "Analyzing the Fine Structure of Distributions" (2020), PLoS ONE, <DOI:10.1371/journal.pone.0238835>. The MD-plot outperforms the box-and-whisker diagram (box plot), the violin plot, the bean plot and the geom_violin plot of ggplot2. Furthermore, a collection of various visualization methods for univariate data is provided. In the case of exploratory data analysis, 'DataVisualizations' makes it possible to inspect the distribution of each feature of a dataset visually through a combination of four methods. One of these methods is the Pareto density estimation (PDE) of the probability density function (pdf). Additionally, visualizations of the distribution of distances using PDE, the scatter-density plot using PDE for two variables as well as the Shepard density plot and the Bland-Altman plot are presented here. Pertaining to classified high-dimensional data, a number of visualizations are described, such as the heat map and the silhouette plot. A political map of the world or Germany can be visualized with the additional information defined by a classification of countries or regions. By extending the political map further, an uncomplicated function for a choropleth map can be used, which is useful for measurements across a geographic area. For categorical features, pie charts, slope charts and fan plots, improved by the ABC analysis, become usable. More detailed explanations are found in the book by Thrun, M.C.: "Projection-Based Clustering through Self-Organization and Swarm Intelligence" (2018) <DOI:10.1007/978-3-658-20540-9>. |
Authors: | Michael Thrun [aut, cre, cph] , Felix Pape [aut, rev], Onno Hansen-Goos [ctr, ctb], Quirin Stier [ctb, rev] , Hamza Tayyab [ctr, ctb], Luca Brinkmann [ctr, ctb], Dirk Eddelbuettel [ctr], Craig Varrichio [ctr], Alfred Ultsch [dtc, ctb, ctr] |
Maintainer: | Michael Thrun <[email protected]> |
License: | GPL-3 |
Version: | 1.3.3 |
Built: | 2024-10-29 10:22:19 UTC |
Source: | https://github.com/mthrun/datavisualizations |
For a brief introduction to DataVisualizations please see the vignette A Quick Tour in Data Visualizations.
Please see https://www.deepbionics.org/. Depending on the context please cite either [Thrun, 2018] regarding visualizations in the context of clustering or [Thrun/Ultsch, 2018] for other visualizations.
For the Mirrored Density Plot (MD plot) please cite [Thrun et al., 2020] and see the extensive vignette in https://md-plot.readthedocs.io/en/latest/index.html. The MD plot is also available in Python https://pypi.org/project/md-plot/
Index of help topics:
ABCbarplot | Barplot with Sorted Data Colored by ABCanalysis |
AccountingInformation_PrimeStandard_Q3_2019 | Accounting Information in the Prime Standard in Q3 in 2019 (AI_PS_Q3_2019) |
BimodalityAmplitude | Bimodality Amplitude |
CCDFplot | plot Complementary Cumulative Distribution Function (CCDF) in Log/Log uses ecdf, CCDF(x) = 1-cdf(x) |
ChoroplethPostalCodesAndAGS_Germany | Postal Codes and AGS of Germany for a Choropleth Map |
Choroplethmap | Plots the Choropleth Map |
ClassBarPlot | ClassBarPlot |
ClassBoxplot | Creates Boxplot plot for all classes |
ClassErrorbar | ClassErrorbar |
ClassMDplot | Class MDplot for Data w.r.t. all classes |
ClassPDEplot | PDE Plot for all classes |
ClassPDEplotMaxLikeli | Create PDE plot for all classes with maximum likelihood |
Classplot | Classplot |
CombineCols | Combine vectors of various lengths |
CombineRows | Combine matrices of various lengths |
Crosstable | Crosstable plot |
DataVisualizations-package | Visualizations of High-Dimensional Data |
DefaultColorSequence | Default color sequence for plots |
DensityContour | Contour plot of densities |
DensityScatter | Scatter plot with densities |
DiagnosticAbility4Classifiers | DiagnosticAbility4Classifiers |
DrawWorldWithCls | Plot a classificated world map |
DualaxisClassplot | Dualaxis Classplot |
DualaxisLinechart | DualaxisLinechart |
Fanplot | The fan plot |
FundamentalData_Q1_2018 | Fundamental Data of the 1st Quarter in 2018 |
GermanPostalCodesShapes | GermanPostalCodesShapes |
GoogleMapsCoordinates | Google Maps with marked coordinates |
Heatmap | Heatmap for Clustering |
HeatmapColors | Default color sequence for plots |
ITS | Income Tax Share |
InspectBoxplots | Inspect Boxplots |
InspectCorrelation | Inspect the Correlation |
InspectDistances | Inspection of Distance-Distribution |
InspectScatterplots | Pairwise scatterplots and optimal histograms |
InspectStandardization | QQplot of Data versus Normalized Data |
InspectVariable | Visualization of Distribution of one variable |
JitterUniqueValues | Jitters Unique Values |
Lsun3D | Lsun3D inspired by FCPS [Thrun/Ultsch, 2020] introduced in [Thrun, 2018] |
MAplot | Minus versus Add plot |
MDplot | Mirrored Density plot (MD-plot) |
MDplot4multiplevectors | Mirrored Density plot (MD-plot) for Multiple Vectors |
MTY | Municipal Income Tax Yield |
Meanrobust | Robust Empirical Mean Estimation |
Multiplot | Plot multiple ggplots objects in one panel |
OpposingViolinBiclassPlot | OpposingViolinBiclassPlot |
OptimalNoBins | Optimal Number Of Bins |
PDEnormrobust | PDEnormrobust |
PDEplot | PDE plot |
ParetoDensityEstimation | Pareto Density Estimation V3 |
ParetoRadius | ParetoRadius for distributions |
Piechart | The pie chart |
Pixelmatrix | Plot of a Pixel Matrix |
Plot3D | 3D plot of points |
PlotGraph2D | PlotGraph2D |
PlotMissingvalues | Plot of the Amount Of Missing Values |
PlotProductratio | Product-Ratio Plot |
PmatrixColormap | P-Matrix colors |
QQplot | QQplot with a Linear Fit |
ROC | ROC plot |
RobustNorm_BackTrafo | Transforms the Robust Normalization back |
RobustNormalization | RobustNormalization |
ShepardDensityScatter | Shepard PDE scatter |
Sheparddiagram | Draws a Shepard Diagram |
SignedLog | Signed Log |
Silhouetteplot | Silhouette plot of classified data. |
Slopechart | Slope Chart |
StatPDEdensity | Pareto Density Estimation |
Stdrobust | Standard Deviation Robust |
Worldmap | plots a world map by country codes |
categoricalVariable | A categorical Feature. |
estimateDensity2D | estimateDensity2D |
stat_pde_density | Calculate Pareto density estimation for ggplot2 plots |
world_country_polygons | world_country_polygons |
zplot | Plotting for 3 dimensional data |
Michael Thrun, Felix Pape, Onno Hansen-Goos, Alfred Ultsch
Maintainer: Michael Thrun <[email protected]>
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, doi:10.1007/978-3-658-20540-9, 2018.
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
[Thrun et al., 2020] Thrun, M. C., Gehlert, T. & Ultsch, A.: Analyzing the Fine Structure of Distributions, PLoS ONE, Vol. 15(10), pp. 1-66, DOI 10.1371/journal.pone.0238835, 2020.
data("Lsun3D")
Data=Lsun3D$Data
Pixelmatrix(Data)
InspectDistances(as.matrix(dist(Data)))
MAlist=MAplot(ITS,MTY)
data("Lsun3D")
Cls=Lsun3D$Cls
Data=Lsun3D$Data
#clear cluster structure
plot(Data[,1:2],col=Cls)
#However, the silhouette plot does not indicate a very good clustering in cluster 1 and 2
Silhouetteplot(Data,Cls = Cls)
Heatmap(as.matrix(dist(Data)),Cls = Cls)
This plot can be read like a scree plot for PCA. It allows the most important values to be selected visually.
ABCbarplot(Data, Colors=DataVisualizations::DefaultColorSequence[1:3], main,xlab,ylab="Value")
Data |
[1:n] vector of Data, e.g. eigenvalues of PCA |
Colors |
three colors for A, B and C |
main |
title of plot |
xlab |
xlabel |
ylab |
ylabel |
ABC analysis is explained in ABCanalysis. The visualization is based on ggplot2.
List V of
ABCanalysis |
output of ABCanalysis |
ggobject |
object of ggplot2 plotted |
DF |
Data frame if another plot should be done manually |
Michael Thrun
Ultsch, A., Lötsch, J.: Computed ABC Analysis for Rational Selection of Most Informative Variables in Multivariate Data, PLoS ONE, Vol. 10(6), pp. e0129767, doi 10.1371/journal.pone.0129767, 2015.
data('FundamentalData_Q1_2018')
Data=as.matrix(FundamentalData_Q1_2018$Data)
Data[!is.finite(Data)]=0
results=prcomp(Data)
main="Scree plot with Class A of the Most-Important Eigenvalues"
plotlist = ABCbarplot(results$sdev,ylab='Eigenvalues',main=main)
plotlist$ggobject
Accounting Information of 261 companies traded in the Frankfurt stock exchange in the German Prime standard.
data("AccountingInformation_PrimeStandard_Q3_2019")
A list of three objects
Key
[1:n] Key of the 261 observations
Data
[1:n,1:d] numeric matrix of 261 observations on the 45 variables describing the accounting information
Cls
[1:n] a numeric vector of k clusters of the clustering performed in [Thrun/Ultsch, 2019]
Detailed data description can be found in [Thrun/Ultsch, 2019].
Yahoo Finance
[Thrun/Ultsch, 2019] Thrun, M. C., & Ultsch, A.: Stock Selection via Knowledge Discovery using Swarm Intelligence with Emergence, IEEE Intelligent Systems, Vol. under review, pp., 2019.
data(AccountingInformation_PrimeStandard_Q3_2019)
str(AI_PS_Q3_2019)
dim(AI_PS_Q3_2019$Data)
Computes the Bimodality Amplitude of [Zhang et al., 2003]
BimodalityAmplitude(x, PlotIt=FALSE)
x |
Data vector. |
PlotIt |
FALSE, TRUE if a figure with the antimodes and peaks is plotted |
This function calculates the Bimodality Amplitude of a data vector. It measures both the existence and the proportion of bimodality. The value lies in [0,1], where zero implies that the data is unimodal and one implies that the data consists of two point masses.
The function was rewritten following the control flow of a function by Sathish Deevi, because the original function was incorrect.
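To make the measure concrete, here is a minimal base-R sketch. It assumes the commonly cited form of the bimodality amplitude, (A_peak - A_antimode) / A_peak, where A_peak is the density at the smaller of the two main peaks and A_antimode is the lowest density between them. The helper name bimodality_amplitude_sketch and the use of stats::density() are assumptions made for illustration; this is not the package implementation, and the naive peak search can be misled by small spurious bumps in the tails.

```r
# bimodality_amplitude_sketch is a hypothetical helper, not the package function.
bimodality_amplitude_sketch <- function(x) {
  d <- stats::density(x[is.finite(x)])            # Gaussian kernel density estimate
  y <- d$y
  peaks <- which(diff(sign(diff(y))) == -2) + 1   # indices of local maxima
  if (length(peaks) < 2) return(0)                # a single peak is treated as unimodal
  # keep the two highest peaks, ordered from left to right
  top <- sort(peaks[order(y[peaks], decreasing = TRUE)][1:2])
  antimode <- min(y[top[1]:top[2]])               # lowest density between the peaks
  smaller  <- min(y[top])                         # density at the smaller peak
  (smaller - antimode) / smaller
}

set.seed(7)
x <- c(rnorm(300, 0, 1), rnorm(300, 8, 1))        # two well-separated modes
ba <- bimodality_amplitude_sketch(x)
print(ba)
```

For two well-separated modes the antimode density is close to zero, so the value approaches one; as the modes merge, the antimode rises towards the smaller peak and the value falls towards zero.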
Michael Thrun
Zhang, C., Mapes, B., & Soden, B.: Bimodality in tropical water vapour, Quarterly Journal of the Royal Meteorological Society, Vol. 129(594), pp. 2847-2866, 2003.
#Example 1
data<-c(rnorm(299,0,1),rnorm(299,5,1))
BimodalityAmplitude(data,TRUE)
#Example 2
dist1<-rnorm(2100,5,2)
dist2<-dist1+11
data<-c(dist1,dist2)
BimodalityAmplitude(data,TRUE)
#Example 3
dist1<-rnorm(210,-15,1)
dist2<-rep(dist1,3)+30
data<-c(dist1,dist2)
BimodalityAmplitude(data,TRUE)
#Example 4
data<-runif(1000,-15,1)
BimodalityAmplitude(data,TRUE)
Character vector of length 391029 with five different labels.
data("categoricalVariable")
data(categoricalVariable)
unique(categoricalVariable)
plot Complementary Cumulative Distribution Function (CCDF) in Log/Log uses ecdf, CCDF(x) = 1-cdf(x)
Feature |
Vector of data to be plotted, or a matrix with given probability density function in column 2 and/or a cumulative density function in column 3 |
pch |
Optional, default: pch=0 for Line, other numbers see documentation about pch of plot |
PlotIt |
Optional, if PlotIt==T (default) do a plot, otherwise return only values |
LogLogPlot |
Optional, if LogLogPlot==T (default) do a log/log plot |
xlab |
Optional, xlab of plot |
ylab |
Optional, ylab of plot |
main |
Optional, main of plot |
... |
Optional, further arguments for plot |
List V with the elements V$CCDFuniqX and V$CCDFuniqY, where CCDFuniqY = 1-cdf(CCDFuniqX), such that plot(CCDFuniqX, CCDFuniqY, ...) draws the CCDF.
Michael Thrun
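The relation CCDF(x) = 1-cdf(x) can be reproduced with base R alone. The following is a minimal sketch built on stats::ecdf(); the helper name ccdf_sketch is hypothetical, and the handling of non-finite values and of the zero at the largest data value are assumptions, not a description of how CCDFplot works internally.

```r
# Empirical CCDF on the unique data values: CCDF(x) = 1 - ecdf(x).
# ccdf_sketch is a hypothetical helper, not a DataVisualizations function.
ccdf_sketch <- function(x) {
  x <- x[is.finite(x)]          # drop NA, NaN and Inf
  ux <- sort(unique(x))         # unique values in increasing order
  cdf <- stats::ecdf(x)         # empirical cumulative distribution function
  list(CCDFuniqX = ux, CCDFuniqY = 1 - cdf(ux))
}

set.seed(1)
x <- rexp(500, rate = 2)        # positive data so that log/log axes are valid
V <- ccdf_sketch(x)

# The largest value has CCDF 0 and cannot be drawn on a log axis, so drop it.
keep <- V$CCDFuniqY > 0
plot(V$CCDFuniqX[keep], V$CCDFuniqY[keep], log = "xy", type = "l",
     xlab = "x", ylab = "CCDF(x) = 1 - cdf(x)")
```

On log/log axes, a power-law tail would appear as an approximately straight line, which is the usual reason for choosing this display.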
A thematic map with areas colored in proportion to the measurement of the statistical variable being displayed on the map. A political map generated by this function was used in the conference talk of the publication [Thrun/Ultsch, 2018].
Choroplethmap(Counts, PostalCodes, NumberOfBins = 0, Breaks4Intervals, percentiles = c(0.5, 0.95), digits = 0, PostalCodesShapes, PlotIt = TRUE, DiscreteColors, HighColorContinuous = "red", LowColorContinuous = "deepskyblue1", NAcolor = "grey", ReferenceMap = FALSE, main = "Political Map of Germany", legend = "Range of values", Silent = TRUE)
Counts |
vector [1:m], statistical variable being displayed |
PostalCodes |
vector [1:n], currently German postal codes (zip codes), if |
NumberOfBins |
Default: 0; a value of 1 or below changes the color continuously as defined by the package |
Breaks4Intervals |
If NumberOfBins>1 you can set here the intervals of the bins manually |
percentiles |
If NumberOfBins>1 and Breaks4Intervals not set, then the percentiles of min and max bin can be set here. See also |
digits |
number of digits for |
PostalCodesShapes |
Specially prepared shape file with postal codes and geographic boundaries. If you set this object, then you can use non-German zip codes. You can see the required structure in map.df, github trulia choroplethr blob master r chloropleth. The German PostalCodesShapes can be downloaded from https://github.com/Mthrun/DataVisualizations/tree/master/data. |
PlotIt |
Either Plot the map directly or change the object manually before plotting it |
DiscreteColors |
Set the discrete colors manually if NumberOfBins>1, else it is ignored |
HighColorContinuous |
if NumberOfBins<=1: color of highest continuous value, else it is ignored |
LowColorContinuous |
if NumberOfBins<=1: color of lowest continuous value, else it is ignored |
NAcolor |
Color of NA values in the map (postal codes without any counts) |
ReferenceMap |
TRUE: With Google map, FALSE: without Google map |
main |
title of plot |
legend |
title of legend |
Silent |
TRUE: disable warnings of |
This wrapper for the choroplethr package makes it easy to visualize a political map for German zip codes based on given counts and postal codes. Other postal codes are usable in principle.
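The effect of manual interval breaks can be illustrated with base R before calling the map function. This is a hedged sketch using cut(); it takes the counts and Breaks4Intervals values from the package example, but how Choroplethmap bins values internally (for example, which side of an interval is closed) is not stated here, so the result may differ in detail.

```r
# Counts and manual breaks as used in the second package example.
Counts <- c(4, 8, 5, 4)
Breaks4Intervals <- c(1, 2, 3, 5, 10)

# cut() assigns each count to a half-open interval (lower, upper].
Bins <- cut(Counts, breaks = Breaks4Intervals)
BinCounts <- table(Bins)
print(BinCounts)  # the bins (1,2] and (2,3] stay empty
```

Empty bins such as the first two here are the ones that would not show up in a map legend when bins are only presented if they contain values.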
List of
chorR6obj |
An R6 object of the package |
DataFrame |
Transformed PostalCodes and Counts in a way that they can be used in the package |
You could read https://www.r-bloggers.com/2016/05/case-study-mapping-german-zip-codes-in-r/, if you want to change the map (PostalCodesShapes shape object).
Michael Thrun
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
Google choroplethr package.
#Many postal codes are required to see a structure
#Exemplary two postal codes in the upper left corner of the map
out=Choroplethmap(c(4,8,5,4), c('49838', '26817', '49838', '26817'), NumberOfBins=2,PlotIt=FALSE)
out$chorR6obj$render()
#bins are only presented in the map if they have values within
out=Choroplethmap(c(4,8,5,4),c('49838', '26817', '49838', '26817'),NumberOfBins=5,
  Breaks4Intervals=c(1,2,3,5,10),PlotIt=FALSE)
out$chorR6obj$render()
# Result of [Thrun/Ultsch, 2018]
data('ChoroplethPostalCodesAndAGS_Germany')
res=Choroplethmap(as.numeric(ChoroplethPostalCodesAndAGS_Germany$Cls)+1,
  ChoroplethPostalCodesAndAGS_Germany$PLZ,NumberOfBins = 2,
  Breaks4Intervals = c(0,1,2,3,4,5,6),digits = 1,ReferenceMap = F,
  DiscreteColors = c('white','green','blue','red','magenta'),
  main = 'Classification of German Postal Codes based on Income Tax Share and Yield',
  legend = 'ITS vs MTY Classification in 2010',NAcolor = 'black',PlotIt=FALSE)
#takes time to process
res$chorR6obj$render()
Zip Codes and Community Identification Number of Germany which can be used in a Choropleth Map.
data("ChoroplethPostalCodesAndAGS_Germany")
A data frame with 8702 observations on the following 4 variables.
PLZ
German postal codes/zip codes
Cls
Aggregated clustering of German postal codes by the MTY and ITS features
AGS
It is the 'Amtlicher Gemeindeschluessel' (Community Identification Number) of German municipalities
Names
Names of municipalities
Cls contains the labels of an MTY versus ITS Bayesian classification showing two main groups of low quota ('1') and high quota ('2') municipalities. Additionally, outliers are manually classified into two separate groups called sponsors ('3') and promoted ('4'). In the Bayesian classification, non-classified data have the label '0'. If the 'AGS' code of a 'PLZ' was unclear, then the label is 'NaN'.
Class | 0 | low quota | high quota | sponsors | promoted | non classified | unclear mapping |
Labels | 0 | 1 | 2 | 3 | 4 | 5 | NaN |
CountPerClass | 31 | 1325 | 7239 | 10 | 95 | 5 | 2 |
Generated for [Thrun/Ultsch, 2018] using the approach of [Ultsch/Behnisch, 2017].
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
[Ultsch/Behnisch, 2017] Ultsch, A., Behnisch, M.: Effects of the payout system of income taxes to municipalities in Germany, Applied Geography, Vol. 81, pp. 21-31, 2017.
data(ChoroplethPostalCodesAndAGS_Germany)
str(ChoroplethPostalCodesAndAGS_Germany)
Represent values for each class and instance as bar plot with optional error deviation, e.g., mean values of features depending on class with standard deviation.
ClassBarPlot(Values, Cls, Deviation, Names, ylab = "Values", xlab = "Instances", PlotIt = TRUE)
Values |
[1:n] Numeric vector with values (y-axis) in matching order to Cls, Deviation and Names. |
Cls |
[1:n] Numeric vector of classes in matching order to Values and Deviation and Names. |
Deviation |
[1:n] Numeric vector with deviation in matching order to Values and Cls and Names. |
Names |
[1:n] Character or numeric vector of instances (x-axis) in matching order to Values and Cls and Deviation. |
ylab |
Character stating y label. |
xlab |
Character stating x label. |
PlotIt |
Logical value indicating whether visual output is created. TRUE: create visual output, FALSE: do not create visual output (Default: TRUE). |
ggplot2 object
Quirin Stier
library(ggplot2)
if(require(dplyr)){
  #assumption: the tmpVar objects hold the per-species mean and sd of the four iris features
  tmpVar1 = iris %>% group_by(Species) %>% summarise(mean = mean(Sepal.Length))
  tmpVar2 = iris %>% group_by(Species) %>% summarise(mean = mean(Sepal.Width))
  tmpVar3 = iris %>% group_by(Species) %>% summarise(mean = mean(Petal.Length))
  tmpVar4 = iris %>% group_by(Species) %>% summarise(mean = mean(Petal.Width))
  tmpVar5 = iris %>% group_by(Species) %>% summarise(sd = sd(Sepal.Length))
  tmpVar6 = iris %>% group_by(Species) %>% summarise(sd = sd(Sepal.Width))
  tmpVar7 = iris %>% group_by(Species) %>% summarise(sd = sd(Petal.Length))
  tmpVar8 = iris %>% group_by(Species) %>% summarise(sd = sd(Petal.Width))
  Values = c(tmpVar1$mean, tmpVar2$mean, tmpVar3$mean, tmpVar4$mean)
  Class = rep(1:3, 4)
  Deviation = c(tmpVar5$sd, tmpVar6$sd, tmpVar7$sd, tmpVar8$sd)
  if(length(Values) == length(Class)){
    ClassBarPlot(Values = Values, Cls = Class, Deviation = Deviation)
  }
}
Boxplot the data for all classes
ClassBoxplot(Data, Cls, ColorSequence = DataVisualizations::DefaultColorSequence, ClassNames = NULL,All=FALSE, PlotLegend = TRUE, main = 'Boxplot per Class', xlab = 'Classes', ylab = 'Range of Data')
Data |
Vector of the data to be plotted |
Cls |
Vector of class identifiers. |
ColorSequence |
Optional: The sequence of colors used, Default: DataVisualizations::DefaultColorSequence |
ClassNames |
Optional: The names of the classes. Default: C1 - C(Number of Classes) |
All |
Optional: adds full data vector for comparison against classes |
PlotLegend |
Optional: Add a legend to the plot. Default: TRUE |
main |
Optional: Title of the plot. Default: 'Boxplot per Class' |
xlab |
Optional: Title of the x axis. Default: 'Classes' |
ylab |
Optional: Title of the y axis. Default: 'Range of Data' |
A List of
ClassData |
The DataFrame used to plot |
ggobject |
The ggplot2 plot object |
The list is returned invisibly.
Michael Thrun, Felix Pape
data(ITS)
#please download package from cran
#model=AdaptGauss::AdaptGauss(ITS)
#Classification=AdaptGauss::ClassifyByDecisionBoundaries(ITS,
#DecisionBoundaries = AdaptGauss::BayesDecisionBoundaries(model$Means,model$SDs,model$Weights))
#the following call requires the Classification computed by the commented lines above
DataVisualizations::ClassBoxplot(ITS,Classification)$ggobject
Plots error bars at the Xvalues positions for one or more classes, with user-defined mean and whisker statistics.
ClassErrorbar(Xvalues, Ymatrix, Cls, ClassNames, ClassCols, ClassShape, MeanFun = median, SDfun, JitterPosition = 0.5, main = "Error bar plot", xlab, ylab, WhiskerWidth = 7, Whisker_lwd = 1, BW = TRUE)
Xvalues |
[1:m] Numerical or character vector, positions of the error bars (see details) on the x-axis for the m variables |
Ymatrix |
[1:n,1:d] matrix of n cases and d=m*k variables for which the error-bar statistics defined by MeanFun and SDfun should be computed |
Cls |
Optional, [1:d] numerical vector of k classes for the d variables. Each class is one method that will be shown as distinctive set of error bars in the plot |
ClassNames |
Optional, [1:k] character vector of k methods |
ClassCols |
Optional, [1:k] character vector of k colors |
ClassShape |
Optional, [1:k] numerical vector of k shapes, see pch in |
MeanFun |
Optional, error bar statistic of mean points, default=median |
SDfun |
Optional, error bar statistic for the length of whiskers, default is the robust estimation of standard deviation |
JitterPosition |
Optional, how much in values of Xvalues should the error bars jitter around Xvalues to not overlap |
main |
Optional, title of plot |
xlab |
Optional, x-axis label |
ylab |
Optional, y-axis label |
WhiskerWidth |
Optional, scalar above zero defining the width of the end of the whiskers |
Whisker_lwd |
Optional, scalar above zero defining the thickness of the whisker lines |
BW |
Optional, FALSE: usual ggplot2 background and style which is good for screen visualizations. Default: TRUE: theme_bw() is used which is more appropriate for publications |
If k=1, i.e., only one method is used, then d=m and Cls=rep(1,m). All vectors [1:k] assume that the classes occur in Cls ordered by increasing value.
Statistics are provided in long table format with the column names Xvalues, Mean, SD and Method. The method column specifies the names of the k classes.
If Xvalues is a character vector (see example), ggplot2 automatically sets the position on the x-axis. Otherwise specific numeric positions can be set. This also allows plotting a smooth line over the average (see example).
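The long table described above can be sketched in base R to show how Ymatrix, Cls and Xvalues relate. This is an illustration under stated assumptions, not the package code: the columns of each class are assumed to map to Xvalues in column order, stats::mad() stands in for the package's robust standard deviation, and the data are random.

```r
set.seed(42)
m <- 3                               # number of x positions
k <- 2                               # number of classes (methods)
Xvalues <- c(1, 2, 3)
Cls <- rep(1:k, each = m)            # class label of each of the d = m*k columns
Ymatrix <- matrix(rnorm(50 * m * k), nrow = 50, ncol = m * k)
ClassNames <- c("Class 1", "Class 2")

# Assumption: within each class, the columns map to Xvalues in column order.
PositionInClass <- ave(seq_along(Cls), Cls, FUN = seq_along)

Statistics <- data.frame(
  Xvalues = Xvalues[PositionInClass],
  Mean    = apply(Ymatrix, 2, median, na.rm = TRUE),  # MeanFun = median
  SD      = apply(Ymatrix, 2, mad, na.rm = TRUE),     # stand-in for a robust SD
  Method  = ClassNames[Cls]
)
print(Statistics)
```

Each row of this table corresponds to one error bar: the point is drawn at Mean and the whiskers extend by SD, with one color or shape per Method.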
List with
ggobj |
The ggplot object of the ClassErrorbar |
Statistics |
[1:(d*k),1:4] data frame of statistics per class used for plotting |
Michael Thrun
data('FundamentalData_Q1_2018')
Data=as.matrix(FundamentalData_Q1_2018$Data)
Cls = FundamentalData_Q1_2018$Cls
Class1Data = matrix(NA, nrow = nrow(Data), ncol = 2)
Class2Data = matrix(NA, nrow = nrow(Data), ncol = 2)
Class1Data[which(Cls==1), ] = Data[which(Cls==1), c("TotalAssets", "TotalLiabilities")]
Class2Data[which(Cls==2), ] = Data[which(Cls==2), c("TotalAssets", "TotalLiabilities")]
YMatrix = cbind(Class1Data, Class2Data)
#Option 1: character vector
ClassErrorbar(c("TotalRevenue","GrossProfit"), YMatrix, c(1,1,2,2),
  ClassNames=c("Class 1", "Class 2"),
  main="ClassErrorbar of Q1 2018 for total revenue and gross profit",
  xlab="GrossProfit/TotalRevenue", ylab="Median +- std", WhiskerWidth = 1)
#Option 2: numerical vector
ClassErrorbar(c(1,2), YMatrix, c(1,1,2,2),
  ClassNames=c("Class 1", "Class 2"),
  main="ClassErrorbar of Q1 2018 for total revenue and gross profit",
  xlab="GrossProfit/TotalRevenue", ylab="Median +- std", WhiskerWidth = 1)
#Option 3: numerical vector + line
## Not run:
#arbitrary data
Y_someOtherData=cbind(YMatrix,YMatrix, YMatrix,YMatrix)
some_values=c(2,3,4,5,6,8,9,10)
ClassErrorbar(some_values, Y_someOtherData, c(1,1,2,2),
  ClassNames=c("Class 1", "Class 2"),
  main="ClassErrorbar of Q1 2018 for total revenue and gross profit",
  xlab="GrossProfit/TotalRevenue", ylab="Median +- std", WhiskerWidth = 1)$ggobj+
  geom_smooth(method="auto", se=F, fullrange=F, level=0.95)
## End(Not run)
Creates a Mirrored-Density plot for each class of a numerical vector of data.
ClassMDplot(Data, Cls, ColorSequence = DataVisualizations::DefaultColorSequence, ClassNames = NULL, PlotLegend = TRUE,Ordering = "Columnwise", main = 'MDplot for each Class', xlab = 'Classes', ylab = 'PDE of Data per Class', Fill = 'darkblue', MinimalAmoutOfData=40, MinimalAmoutOfUniqueData=12,SampleSize=1e+05,...)
Data |
[1:n] Vector of the data to be plotted |
Cls |
[1:n] Vector of class identifiers of k clusters one number is the label of one cluster |
ColorSequence |
Optional: [1:k] vector, The sequence of colors used, Default: DataVisualizations::DefaultColorSequence |
ClassNames |
Optional: [1:k] named numerical vector, The names of the classes. Default: Class 1 - Class k with k being the number of classes |
PlotLegend |
Optional: Add a legend to the plot. Default: TRUE |
Ordering |
Optional: Ordering of Classes, please see |
main |
Optional: Title of the plot. Default: MDplot for each Class |
Fill |
Optional: [1:k] Vector with the colors, the MD's are to be colored with. If only one value is given, all MD's are colored in the same color. |
xlab |
Optional: Title of the x axis. Default: "Classes" |
ylab |
Optional: Title of the y axis. Default: 'PDE of Data per Class' |
MinimalAmoutOfData |
Optional: numeric value defining a threshold. Below this threshold no density estimation is performed and a Jitter plot with a median line is drawn. Please see |
MinimalAmoutOfUniqueData |
Optional: numeric value defining a threshold. Below this threshold no density estimation and statistical testing is performed and a Jitter plot is drawn. Only Data Science experts should change this value after they understand how the density is estimated (see [Ultsch, 2005]). |
SampleSize |
Optional: numeric value defining a threshold. Above this threshold, class-wise uniform sampling of finite cases is performed in order to shorten computation time. If required, |
... |
Further arguments that are documented in |
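The SampleSize threshold described above can be pictured with a small base-R sketch. It shows one plausible reading of class-wise uniform sampling and is not the package's implementation; the helper name `classwise_sample` and the proportional allocation per class are assumptions made purely for illustration.

```r
# Hypothetical helper: draws a uniform sample within each class so that
# class proportions are roughly preserved once the data exceed SampleSize.
classwise_sample <- function(Data, Cls, SampleSize) {
  n <- length(Data)
  if (n <= SampleSize) {
    return(list(Data = Data, Cls = Cls))  # below the threshold: nothing to do
  }
  frac <- SampleSize / n
  keep <- unlist(lapply(split(seq_len(n), Cls), function(ix) {
    k <- max(1, floor(length(ix) * frac))   # keep at least one case per class
    ix[sample.int(length(ix), k)]           # uniform sampling without replacement
  }))
  keep <- sort(keep)
  list(Data = Data[keep], Cls = Cls[keep])
}

set.seed(1)
Data <- rnorm(1000)
Cls  <- rep(1:2, times = c(900, 100))
out  <- classwise_sample(Data, Cls, SampleSize = 100)
table(out$Cls)
```

Sampling within each class, rather than over the pooled data, keeps small classes from vanishing in the reduced data set.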
Further examples for the ClassMDplot can be found in https://md-plot.readthedocs.io/en/latest/application/example_application.html.
The Cls vector is reordered from lowest to highest number. The ClassNames vector and ColorSequence vectors are matched by this ordering of Cls, i.e. the lowest number gets the first color or class name.
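The matching of colors and names to the sorted class labels can be sketched in base R as follows. The toy labels, colors and names are assumptions, and the code only illustrates the ordering rule stated above, not the internals of ClassMDplot.

```r
# Toy labels: they need not be consecutive
Cls <- c(3, 1, 3, 7, 1)
ColorSequence <- c("red", "green", "blue")
ClassNames <- c("low", "mid", "high")

# Sort the unique labels; position i in this vector receives color/name i
labels_sorted <- sort(unique(Cls))
idx <- match(Cls, labels_sorted)

data.frame(Cls = Cls,
           Color = ColorSequence[idx],
           Name = ClassNames[idx])
```

Label 1 receives "red" and "low", label 3 receives "green" and "mid", and label 7 receives "blue" and "high", independent of the order in which the labels occur in Cls.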
A List of
ClassData |
The matrix [1:m,1:NoOfClasses] used to plot with the reordered Cls; rows are partly filled with NaN, and m is the number of data points in the largest class. |
ggobject |
The ggplot2 plot object |
in mode invisible
The function is still experimental because ColorSequence does not work yet, as the colors cannot currently be specified in ggplot2. If someone knows a solution, please mail the maintainer of the package. A similar issue exists for PlotLegend.
Michael Thrun, Felix Pape
Thrun, M. C., Breuer, L., & Ultsch, A. : Knowledge discovery from low-frequency stream nitrate concentrations: hydrology and biology contributions, Proc. European Conference on Data Analysis (ECDA), Paderborn, Germany, 2018.
https://md-plot.readthedocs.io/en/latest/application/example_application.html
MDplot
https://pypi.org/project/md-plot/
data(ITS)
#shortcut for example if AdaptGauss not installed
Classification = kmeans(ITS, centers = 2)$cluster
#better approach
#please download package from cran
#model=AdaptGauss::AdaptGauss(ITS)
#Classification=AdaptGauss::ClassifyByDecisionBoundaries(ITS,
#DecisionBoundaries = AdaptGauss::BayesDecisionBoundaries(model$Means,model$SDs,model$Weights))
ClassNames=c(1,2)
names(ClassNames)=c("Insert name \n of Class 1","Insert name \n of Class 2")
ClassMDplot(ITS,Classification,ClassNames = ClassNames)
Draws a PDE plot of the data for all classes and weights each pdf with its prior.
ClassPDEplot(Data, Cls, ColorSequence, ColorSymbSequence, PlotLegend = 1, SameKernelsAndRadius = 0, xlim, ylim, ...)
Data |
The Data to be plotted |
Cls |
Vector of class identifiers. Can be integers or NaN's, need not be consecutive nor positive |
ColorSequence |
Optional: the sequence of colors used, Default: DefaultColorSequence |
ColorSymbSequence |
Optional: the plot symbols used (theoretically not necessary, since it only becomes relevant with more than 562 clusters) |
PlotLegend |
Optional: add a legend to plot (default == 1) |
SameKernelsAndRadius |
Optional: Use the same PDE kernels and radii for all distributions (default == 0) |
xlim |
Optional: range of the x axis |
ylim |
Optional: range of the y axis |
... |
further arguments passed to plot |
Kernels of the Pareto density estimation in mode invisible
Michael Thrun
data(ITS)
#shortcut for example if AdaptGauss not installed
Classification = kmeans(ITS, centers = 2)$cluster
#better approach
#please download package from cran
#model=AdaptGauss::AdaptGauss(ITS)
#Classification=AdaptGauss::ClassifyByDecisionBoundaries(ITS,
#DecisionBoundaries = AdaptGauss::BayesDecisionBoundaries(model$Means,model$SDs,model$Weights))
DataVisualizations::ClassPDEplot(ITS,Classification)$ggobject
Draws a PDE plot of the data for all classes and weights each plot with 1 (= maximum likelihood).
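The difference between weighting with priors (ClassPDEplot) and weighting with 1 (ClassPDEplotMaxLikeli) can be sketched in base R. In this sketch `density()` serves only as a stand-in for the Pareto density estimation, and the toy data, the grid and all variable names are assumptions.

```r
set.seed(1)
# Toy data: class 1 holds 75% of the cases, class 2 holds 25%
x   <- c(rnorm(300, mean = 0), rnorm(100, mean = 4))
cls <- rep(c(1, 2), times = c(300, 100))

# Common grid so that the class densities are comparable
grid_from <- min(x)
grid_to   <- max(x)
grid_n    <- 200

# One density per class (stand-in for PDE); columns correspond to classes
dens_by_class <- sapply(split(x, cls), function(v) {
  density(v, from = grid_from, to = grid_to, n = grid_n)$y
})

# Maximum-likelihood view: every class density keeps weight 1
ml_weighted <- dens_by_class

# Prior-weighted view: each class density is scaled by its relative frequency
priors <- as.numeric(table(cls)) / length(cls)
prior_weighted <- sweep(dens_by_class, 2, priors, `*`)
```

With weight 1 the curves of small and large classes have comparable height, whereas prior weighting shrinks the curve of the rare class in proportion to its share of the data.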
ClassPDEplotMaxLikeli(Data, Cls, ColorSequence = DataVisualizations::DefaultColorSequence, ClassNames, PlotLegend = TRUE, MinAnzKernels = 0,PlotNorm, main = "Pareto Density Estimation (PDE)", xlab = "Data", ylab = "ParetoDensity", xlim, ylim, lwd=1, ...)
Data |
The Data to be plotted |
Cls |
Vector of class identifiers. Can be integers or NaN's, need not be consecutive nor positive |
ColorSequence |
Optional: the sequence of colors used, Default: DefaultColorSequence |
ClassNames |
Optional: the names of the classes to be displayed in the legend |
PlotLegend |
Optional: add a legend to plot (default: TRUE) |
MinAnzKernels |
Optional: Minimum number of kernels |
PlotNorm |
Optional: ==1: plot a normal distribution on top, ==2: plot a robust normal distribution; default: PlotNorm = 0 |
main |
Optional: Title of the plot |
xlab |
Optional: title of the x axis |
ylab |
Optional: title of the y axis |
xlim |
Optional: area of the x-axis to be plotted |
ylim |
Optional: area of the y-axis to be plotted |
lwd |
numerical scalar defining the width of the lines |
... |
further arguments passed to plot |
Kernels |
Kernels of the distributions |
ClassParetoDensities |
Pareto densities for classes |
ggobject |
ggplot2 plot object. This should be used to further modify the plot |
Felix Pape
Aubert, A. H., Thrun, M. C., Breuer, L., & Ultsch, A.: Knowledge discovery from high-frequency stream nitrate concentrations: hydrology and biology contributions, Scientific Reports, Nature, Vol. 6(31536), doi 10.1038/srep31536, 2016.
data(ITS)
#shortcut for example if AdaptGauss not installed
Classification = kmeans(ITS, centers = 2)$cluster
#better approach
#please download package from cran
#model=AdaptGauss::AdaptGauss(ITS)
#Classification=AdaptGauss::ClassifyByDecisionBoundaries(ITS,
#DecisionBoundaries = AdaptGauss::BayesDecisionBoundaries(model$Means,model$SDs,model$Weights))
DataVisualizations::ClassPDEplotMaxLikeli(ITS,Classification)$ggobject
Plots one time series or feature with a classification as a labeled scatter plot with a line. The colors are the labels defined by the classification.
Classplot(X, Y, Cls, Plotter,Names = NULL, na.rm = FALSE, xlab = "X", ylab = "Y", main = "Class Plot", Colors = NULL, Size = 8,PointBorderCol="black", LineColor = NULL, LineWidth = 1, LineType = NULL, Showgrid = TRUE, pch, AnnotateIt = FALSE, SaveIt = FALSE, Nudge_x_Names = 0, Nudge_y_Names = 0, Legend = "", SmallClassesOnTop = TRUE, ...)
X |
[1:n] numeric vector or time |
Y |
[1:n] numeric vector of feature |
Cls |
[1:n] numeric vector of k classes, if not set per default every point is in first class |
Names |
[1:n] character vector of k classes, if not set per default Cls is used, if set, names the legend and the points |
na.rm |
Function may not work with non finite values. If these cases should be automatically removed, set parameter TRUE |
xlab |
Optional, string for xlabel |
ylab |
Optional, string for ylabel |
main |
Optional, string for title of plot |
Colors |
Optional, [1:k] string defining the k colors, one per class |
AnnotateIt |
Optional, in case of |
Size |
Optional, size of points, beware: default is appropriate for " |
PointBorderCol |
Optional, string, color of the dot outline for " |
LineColor |
Optional, name of color; in plotly all points are then connected by a curve, in ggplot2 all points of one class are connected by a curve in the color of the class |
LineWidth |
Optional, number defining the width of the curve (plotly only) |
LineType |
Optional, string defining the type of the curve in plotly only, " for ggplot2: just set =1 here and then the curve is plotted |
Showgrid |
Optional, boolean (plotly only) |
Plotter |
Optional, either " |
pch |
[1:n] numeric vector of length n of the cases of Cls for the k classes. It defines the symbols to use, for native |
SaveIt |
Optional, boolean, if true saves plot as html (plotly) or png (ggplot2) |
Nudge_x_Names |
Optional, numerical scalar, for |
Nudge_y_Names |
Optional, numerical scalar, for |
SmallClassesOnTop |
Optional, boolean, decide if small classes should be plotted on top for visibility (default setting) or not. |
Legend |
Optional, if argument is not missing, character string defining the title of the legend which automatically enables the legend |
... |
Further arguments for |
The mapping of colors to the labels of Cls is consecutive, i.e., the label with the smallest value in Cls gets the first color in Colors. The classes are plotted in order from the label with the highest number of points to the label with the lowest number of points, so that the smallest class is on top.
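The drawing order described above (largest class first, smallest class last and therefore on top) can be sketched in base R. The toy labels and variable names are assumptions, and the snippet only illustrates the ordering, not how Classplot implements it.

```r
# Toy labels: class 1 has three points, class 3 two, class 2 one
Cls <- c(1, 1, 1, 2, 3, 3)

# Size of the class each point belongs to
class_sizes <- table(Cls)
size_per_point <- as.numeric(class_sizes[as.character(Cls)])

# Draw points of large classes first; points drawn later end up on top
draw_order <- order(-size_per_point)
Cls[draw_order]
```

Drawing the points in `draw_order` places the single point of class 2 last, so it is not hidden by the larger classes.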
Default is "plotly" if Names are NULL. However, ggplot2 is preferable if the Names parameter is used because overlapping text labels are avoided. In that case the default is "ggplot". Note that ggplot2 options are currently slightly restricted.
For example, the function is useful to see if temporal clustering has time dependent variations and for Hidden Markov Models (see Mthrun/RHmm on GitHub).
plotly object or ggplot2 object, depending on Plotter
Michael Thrun
data(Lsun3D)
Classplot(Lsun3D$Data[,1],Lsun3D$Data[,2],Lsun3D$Cls)
#ggplot2 with different symbols
Classplot(Lsun3D$Data[, 1], Lsun3D$Data[, 2], Lsun3D$Cls,
  Plotter = "ggplot2", Size = 3, pch = Lsun3D$Cls + 5)
#plotly with line
data(Lsun3D)
Classplot(Lsun3D$Data[,1],Lsun3D$Data[,2],Lsun3D$Cls,
  LineType="-",LineColor = "green")
#ggplot2 with annotations
data(Lsun3D)
ind=sample(1:nrow(Lsun3D$Data),20)
Classplot(Lsun3D$Data[ind,1],Lsun3D$Data[ind,2],Lsun3D$Cls[ind],
  Names = rownames(Lsun3D$Data)[ind],Size =1,
  Plotter = "ggplot2",AnnotateIt = TRUE)
#ggplot2 with labels and legend per class
data(Lsun3D)
Classplot(Lsun3D$Data[,1],Lsun3D$Data[,2],Lsun3D$Cls,
  Names = paste0("C",Lsun3D$Cls),Size =2,Legend ="Classes")
Combine arbitrary vectors of data, filling in missing rows with NaN
CombineCols(...,na.rm=FALSE)
... |
d vectors of arbitrary lengths, see example |
na.rm |
boolean: FALSE: fills with NaN, TRUE: fills with zeros |
Robust alternative to cbind that fills missing values with NaN instead of extending the length of a vector by duplicating elements
matrix of dimensionality n x d with n being the length of the longest vector and d the number of vectors given as input
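The contrast with cbind can be illustrated with a minimal base-R sketch. The helper `pad_cbind` is hypothetical and only mimics the padding behaviour described above; it is not the code of CombineCols.

```r
# Hypothetical helper: pad every vector with NaN up to the longest length,
# then bind the padded vectors column-wise
pad_cbind <- function(..., fill = NaN) {
  vecs <- list(...)
  n <- max(lengths(vecs))
  padded <- lapply(vecs, function(v) c(v, rep(fill, n - length(v))))
  do.call(cbind, padded)
}

m <- pad_cbind(c(1, 2, 3), c(1), c(2, 3))
m

# For comparison: plain cbind recycles the shorter vector instead of padding
recycled <- cbind(c(1, 2, 3), c(1))
recycled
```

In `m` the second column is 1, NaN, NaN, whereas plain cbind repeats the value 1 in every row.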
special application by MCT of cbind.fill from the rowr package, which is no longer on CRAN
Craig Varrichio
CombineRows
CombineCols(c(1,2,3),c(1),c(2,3))
Combine arbitrary matrices of data, filling in missing columns with NaN
CombineRows(...,na.rm=FALSE)
... |
First argument is a matrix usually with named columns, thereafter either matrices or d vectors of arbitrary lengths, see example |
na.rm |
boolean: FALSE: fills with NaN, TRUE: fills with zeros |
Robust alternative to rbind that fills missing values with NaN. It tries to match the given column names if matrices are inserted; otherwise it fills up the missing columns at the end.
The first argument has to be a matrix. It is assumed that this matrix is to be filled up and that the other arguments have no more than d columns. Otherwise, the further elements stored in columns >d are ignored.
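Matching by column name can be pictured with a short base-R sketch. The matrices and variable names are assumptions, and the code only illustrates the idea of name matching with NaN filling, not the implementation of CombineRows.

```r
# First argument: a matrix with named columns
base_matrix <- cbind(a = c(1, 2), b = c(3, 4), c = c(5, 6))

# Additional one-row matrix with a subset of the columns in a different order
extra <- cbind(c = 9, a = 7)

# Start from a row of NaN and fill only the columns whose names match
new_row <- rep(NaN, ncol(base_matrix))
names(new_row) <- colnames(base_matrix)
new_row[colnames(extra)] <- extra[1, ]

combined <- rbind(base_matrix, new_row)
combined
```

The new row holds 7 in column a, NaN in column b and 9 in column c, so values land in the column with the same name rather than in positional order.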
matrix of dimensionality n x d with n being the number of rows of the first argument and d the number of columns of the first argument given as input
Michael Thrun
CombineRows
matrix_pattern=cbind(c(1,2,3),c(4,5,6),c(7,8,9))
CombineRows(matrix_pattern,c(1),c(2,3))
CombineRows(matrix_pattern,cbind(c(1,2,3),c(4,5,6)))
Presents a heatmap with values and a cross table of a given Data matrix of two features and a bin width or percentiled values. In this approach the bin width is fixed. A more general approach is the kernel density estimation plot of PDEscatter.
Crosstable(Data, xbins = seq(0, 100, 5), ybins = xbins, NormalizationFactor = 1, PlotIt = TRUE, main='Cross Table', PlotText=TRUE,TextDigits=0,TextProbs=c(0.05,0.95))
Data |
[1:n,1:2] matrix of two features from which the cross table should be generated |
xbins |
[1:k] start of k bins as a vector generated with |
ybins |
[1:k] start of k bins as a vector generated with |
NormalizationFactor |
Optional, Data features can be seen as regular time series, e.g. 1 measurement for a minute, in this case it is useful to normalize the output, e.g. to hours, then |
PlotIt |
Optional, Plots the heatmap if |
main |
In case of for |
PlotText |
In case of for |
TextDigits |
In case of for |
TextProbs |
In case of for |
The interval of each bin is closed on the left and open on the right. The cross table can be seen as a two-dimensional histogram. The idea to add histograms to the table is taken from [Charpentier. 2014].
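The binning rule (left-closed, right-open intervals forming a two-dimensional histogram) can be reproduced in base R with `cut()` and `table()`. This is a sketch under assumptions: the toy data are invented, and treating the last bin start as an interval that extends to infinity is a guess, not documented behaviour of Crosstable.

```r
set.seed(42)
# Toy data: two features scaled to the range [0, 100)
Data <- cbind(runif(200, 0, 100), runif(200, 0, 100))

xbins <- seq(0, 100, 5)   # starts of the bins, as in the default of the usage
ybins <- xbins

# right = FALSE gives intervals of the form [start, next start)
x_bin <- cut(Data[, 1], breaks = c(xbins, Inf), right = FALSE)
y_bin <- cut(Data[, 2], breaks = c(ybins, Inf), right = FALSE)

# Two-dimensional frequency table: rows are x bins, columns are y bins
tab <- table(x_bin, y_bin)

# A value exactly on a bin start belongs to the bin that starts there
boundary_bin <- as.character(cut(5, breaks = c(0, 5, 10), right = FALSE))
boundary_bin
```

Each cell of `tab` counts the cases falling into one combination of x bin and y bin, which is the number a heatmap cell would display.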
The cross table in invisible mode, which depicts the number of values (frequency) in a specific range with regard to two features.
The first feature is on the x-axis (left to right), and the second on the y-axis (top to bottom), contrary to the plot where it is bottom to top.
For non-percentiled values the PlotText part does not always seem to work; it is currently unclear why the text does not always overlap with the heatmap.
Michael Thrun
[Charpentier. 2014] Charpentier, Arthur, ed. Computational actuarial science with R. CRC Press, 2014.
data(ITS)
data(MTY)
#simple but not a good transformation
Data=(cbind(ITS/max(ITS),MTY/max(MTY)))*100
#choice for bins could be better
Crosstable(Data)
Defines the default color sequence for plots made within the Projections package.
data("DefaultColorSequence")
A vector with 562 different strings describing colors for plots.
Density estimation (PDE) [Ultsch, 2005] or "SDH" [Eilers/Goeman, 2004] used for a density contour plot.
DensityContour(X,Y, DensityEstimation="SDH", SampleSize, na.rm=FALSE,PlotIt=TRUE, NrOfContourLines=20,Plotter='ggplot', DrawTopView = TRUE, xlab, ylab, main="DensityContour", xlim, ylim, Legendlab_ggplot="value", AddString2lab="",NoBinsOrPareto=NULL,...)
X |
Numeric vector [1:n], first feature (for x axis values) |
Y |
Numeric vector [1:n], second feature (for y axis values) |
DensityEstimation |
|
SampleSize |
Numeric, positive scalar, maximum size of the sample used for calculation. High values increase runtime significantly. The default is that no sample is drawn |
na.rm |
Function may not work with non finite values. If these cases should be automatically removed, set parameter TRUE |
PlotIt |
|
NrOfContourLines |
Numeric, number of contour lines to be drawn. 20 by default. |
Plotter |
String, name of the plotting backend to use. Possible values are: " |
DrawTopView |
Boolean, TRUE means a contour plot is drawn, otherwise a 3D plot is drawn. Default: TRUE |
xlab |
String, title of the x axis. Default: "X", see |
ylab |
String, title of the y axis. Default: "Y", see |
main |
string, the same as "main" in |
xlim |
see |
ylim |
see |
Legendlab_ggplot |
String, in case of |
AddString2lab |
adds the same string of information to x and y axis label, e.g. useful for adding SI units |
NoBinsOrPareto |
Density specific parameters, for |
... |
further plot arguments |
The DensityContour function generates the density of the xy data as a z coordinate. Afterwards xyz will be plotted either as a contour plot or a 3d plot. It assumes that the cases of x and y are mapped to each other, meaning that a cbind(x,y) operation is allowed.
This function plots the density on top of a scatterplot. Variances of x and y should not differ by extreme numbers, otherwise calculate the percentiles on both first. If DrawTopView=FALSE only the plotly option is currently available. If another option is chosen, the method switches to plotly automatically.
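The advice to compute percentiles first when the variances of x and y differ strongly can be sketched in base R. Reading "percentiles" as the empirical percentile rank of each value is an assumption, and the helper `to_percentiles` is hypothetical.

```r
# Hypothetical helper: empirical percentile rank of every value, in (0, 100]
to_percentiles <- function(v) {
  100 * rank(v, ties.method = "average") / length(v)
}

set.seed(3)
X <- rnorm(500, sd = 1)      # small spread
Y <- rnorm(500, sd = 1000)   # spread larger by three orders of magnitude

Xp <- to_percentiles(X)
Yp <- to_percentiles(Y)

# The raw variances differ by orders of magnitude, the transformed ones do not
c(raw_ratio = var(Y) / var(X), percentile_ratio = var(Yp) / var(Xp))
```

After the transformation both features live on the same 0 to 100 scale, so a two-dimensional density estimate is no longer dominated by the feature with the larger variance.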
PlotIt=FALSE is useful if one likes to perform adjustments like axis scaling prior to plotting with ggplot2 or plotly.
List of:
X |
Numeric vector [1:m],m<=n, first feature used in the plot or the kernels used |
Y |
Numeric vector [1:m],m<=n, second feature used in the plot or the kernels used |
Densities |
Number of points within the ParetoRadius of each point, i.e. density information |
Handle |
Handle of the plot object |
MT contributed with several adjustments
Felix Pape
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, (Ultsch, A. & Huellermeier, E. Eds., 10.1007/978-3-658-20540-9), Doctoral dissertation, Heidelberg, Springer, ISBN: 978-3658205393, 2018.
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
[Ultsch, 2005] Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, In Baier, D. & Wernecke, K. D. (Eds.), Innovations in classification, data science, and information systems, (Vol. 27, pp. 91-100), Berlin, Germany, Springer, 2005.
[Eilers/Goeman, 2004] Eilers, P. H., & Goeman, J. J.: Enhancing scatterplots with smoothed densities, Bioinformatics, Vol. 20(5), pp. 623-628. 2004.
#taken from [Thrun/Ultsch, 2018]
data("ITS")
data("MTY")
Inds=which(ITS<900&MTY<8000)
plot(ITS[Inds],MTY[Inds],main='Bimodality is not visible in normal scatter plot')
DensityContour(ITS[Inds],MTY[Inds],DensityEstimation="SDH",
  xlab = 'ITS in EUR', ylab ='MTY in EUR',
  main='Smoothed Densities histogram indicates Bimodality')
DensityContour(ITS[Inds],MTY[Inds],DensityEstimation="PDE",
  xlab = 'ITS in EUR', ylab ='MTY in EUR',
  main='PDE indicates Bimodality')
Density estimation is performed by PDE [Ultsch, 2005] or "SDH" [Eilers/Goeman, 2004] and visualized in a density scatter plot [Brinkmann et al., 2023] in which the points are colored by their density.
DensityScatter(X,Y,DensityEstimation="SDH", Type="DDCAL", Plotter = "native",Marginals = FALSE, SampleSize,na.rm=FALSE, xlab, ylab, main="DensityScatter", AddString2lab="", xlim, ylim,NoBinsOrPareto=NULL,...)
X |
Numeric vector [1:n], first feature (for x axis values) |
Y |
Numeric vector [1:n], second feature (for y axis values) |
DensityEstimation |
(Optional), |
Type |
(Optional), |
Plotter |
in case of |
Marginals |
(Optional) Boolean, if TRUE the marginal distributions of X and Y will be plotted together with the 2D density of X and Y. Default is FALSE |
SampleSize |
(Optional), Numeric, positive scalar, maximum size of the sample used for calculation. High values increase runtime significantly. The default is that no sample is drawn |
na.rm |
(Optional), Function may not work with non finite values. If these cases should be automatically removed, set parameter TRUE |
xlab |
(Optional), String, title of the x axis. Default: "X", see |
ylab |
(Optional), String, title of the y axis. Default: "Y", see |
main |
(Optional), string, the same as "main" in |
AddString2lab |
(Optional), adds the same string of information to x and y axis label, e.g. useful for adding SI units |
xlim |
(Optional), in case of |
ylim |
in case of |
NoBinsOrPareto |
(Optional), in case of |
... |
(Optional), further arguments either to ScatterDensity::DensityScatter.DDCAL or to plot() |
The DensityScatter function generates the density of the xy data as a z coordinate. Afterwards the xy points will be plotted as a scatter plot, where the z values define the coloring of the xy points. It assumes that the cases of x and y are mapped to each other, meaning that a cbind(x,y) operation is allowed.
This function plots the density on top of a scatterplot. Variances of x and y should not differ by extreme numbers, otherwise calculate the percentiles on both first.
List of:
X |
Numeric vector [1:m],m<=n, first feature used in the plot or the kernels used |
Y |
Numeric vector [1:m],m<=n, second feature used in the plot or the kernels used |
Densities |
Number of points within the ParetoRadius of each point, i.e. density information |
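The notion of density as the number of points within a radius around each point, and its use for coloring, can be sketched in base R. The fixed `radius` below is an arbitrary assumption and not the Pareto radius, which the package derives from the data; the binning into five colors is likewise only illustrative.

```r
set.seed(7)
X <- rnorm(50)
Y <- rnorm(50)

# Assumed fixed radius for illustration only (not the Pareto radius)
radius <- 0.5

# Pairwise Euclidean distances between all two-dimensional points
D <- as.matrix(dist(cbind(X, Y)))

# Density of a point: number of points within the radius (the point itself included)
Densities <- rowSums(D <= radius)

# Map the densities onto a small color palette, one color per point
color_index  <- cut(Densities, breaks = 5, labels = FALSE)
point_colors <- hcl.colors(5)[color_index]

# plot(X, Y, col = point_colors, pch = 19) would draw the colored scatter plot
```

Points in crowded regions receive high counts and thus colors from the upper end of the palette, which is the visual idea behind a density scatter plot.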
MT contributed with several adjustments
Felix Pape
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, (Ultsch, A. & Huellermeier, E. Eds., 10.1007/978-3-658-20540-9), Doctoral dissertation, Heidelberg, Springer, ISBN: 978-3658205393, 2018.
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
[Ultsch, 2005] Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, In Baier, D. & Wernecke, K. D. (Eds.), Innovations in classification, data science, and information systems, (Vol. 27, pp. 91-100), Berlin, Germany, Springer, 2005.
[Eilers/Goeman, 2004] Eilers, P. H., & Goeman, J. J.: Enhancing scatterplots with smoothed densities, Bioinformatics, Vol. 20(5), pp. 623-628. 2004.
[Lux/Rinderle-Ma, 2023] Lux, M. & Rinderle-Ma, S.: DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling, Journal of Classification vol. 40, pp. 106-144, 2023.
[Brinkmann et al., 2023] Brinkmann, L., Stier, Q., & Thrun, M. C.: Computing Sensitive Color Transitions for the Identification of Two-Dimensional Structures, Proc. Data Science, Statistics & Visualisation (DSSV) and the European Conference on Data Analysis (ECDA), p.109, Antwerp, Belgium, July 5-7, 2023.
#taken from [Thrun/Ultsch, 2018]
data("ITS")
data("MTY")
Inds=which(ITS<900&MTY<8000)
plot(ITS[Inds],MTY[Inds],main='Bimodality is not visible in normal scatter plot')
DensityScatter(ITS[Inds],MTY[Inds],DensityEstimation="SDH",
  xlab = 'ITS in EUR', ylab ='MTY in EUR',
  main='Smoothed Densities histogram indicates Bimodality')
DensityScatter(ITS[Inds],MTY[Inds],DensityEstimation="PDE",
  xlab = 'ITS in EUR', ylab ='MTY in EUR',
  main='PDE indicates Bimodality')
DiagnosticAbility4Classifiers as applied in [...].
DiagnosticAbility4Classifiers(TrueCondition_Cls, ManyPredictedCondition_Cls, NamesOfConditions = NULL, PlotType = "PRC", xlab = "True Positive Rate", ylab = "False Positive Rate", main = "ROC Space", Colors, LineColor = NULL, Size = 8, LineWidth = 1, LineType = NULL, Showgrid = TRUE, SaveIt = FALSE)
TrueCondition_Cls |
[1:n] numeric vector of k classes (true classification), preferably of the testset |
ManyPredictedCondition_Cls |
[1:n,1:c] every col c is a Cls of one specific condition of the classifier trying to reproduce the classification (preferably on a test set) |
NamesOfConditions |
[1:c] character vector of c conditions, sets names of legend and the points |
PlotType |
possible are 'ROC': Receiver operating characteristic, 'PRC': Precision-Recall, and 'SenSpec': Sensitivity-Specificity plot |
xlab |
Optional, string |
ylab |
Optional, string |
main |
Optional, string |
Colors |
Optional, string |
LineColor |
Optional, name of color, then all points are connected by a curve |
Size |
Optional, number defining the Size of the curve |
LineWidth |
Optional, number defining the width of the curve |
LineType |
Optional, string defining the type of the curve |
Showgrid |
Optional, boolean |
SaveIt |
Optional, boolean, if true saves plot as html |
For unbalanced binary classes PRC should be preferred and not ROC [Saito/Rehmsmeier, 2016].
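Why the precision-recall view is more telling for unbalanced classes can be seen in a small base-R calculation. The toy labels are invented, and treating class 1 as the positive class is an assumption; the snippet does not call DiagnosticAbility4Classifiers.

```r
# Toy test set: 10 positives (class 1) and 990 negatives (class 2)
truth <- c(rep(1, 10), rep(2, 990))

# Toy prediction: 4 positives are missed, 30 negatives are flagged as positive
pred <- truth
pred[1:4]   <- 2
pred[11:40] <- 1

tp <- sum(pred == 1 & truth == 1)   # true positives
fp <- sum(pred == 1 & truth == 2)   # false positives
fn <- sum(pred == 2 & truth == 1)   # false negatives
tn <- sum(pred == 2 & truth == 2)   # true negatives

tpr       <- tp / (tp + fn)   # recall, y axis of ROC, x axis of PRC
fpr       <- fp / (fp + tn)   # x axis of ROC
precision <- tp / (tp + fp)   # y axis of PRC

c(TPR = tpr, FPR = fpr, Precision = precision)
```

The false positive rate looks harmless because it is divided by the many negatives, while the precision reveals that only one in six predicted positives is correct.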
List of:
Plot |
plotly handler |
X |
[1:c] vector of x axis values |
Y |
[1:c] vector of y axis values |
Currently developed only for binary classifiers.
Michael Thrun
[|] :Determination of CD43 and CD200 surface expression improves accuracy of B-cell lymphoma immunophenotyping, 2020.
[Saito/Rehmsmeier, 2016] Saito, Takaya and Rehmsmeier, Marc: The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets, PLoS ONE, https://doi.org/10.1371/journal.pone.0118432, 2016.
#TODo
This function plots a world map in which the individual countries are colored according to a classification.
CountryCode |
Vector of countries belonging to the Cls |
Cls |
Classes belonging to the Countries from CountryCode |
JoinCode |
System that is used for the CountryCodes. Possible are: "ISO3", "UN" |
Title |
Title that will be written above the map |
Colors |
Vector from which the colors for the classes are selected |
a plot
Florian Lerch
Plots two time series or features with one or two classification(s) as labeled scatter plots. The colors are the labels defined by the classification. Useful to see if temporal clustering has time dependent variations and for Hidden Markov Models (see Mthrun/RHmm on GitHub).
DualaxisClassplot(X, Y1, Y2, Cls1, Cls2, xlab = "X", y1lab = "Y1", y2lab = "Y2", main = "Dual Axis Class Plot", Colors, Showgrid = TRUE, SaveIt = FALSE)
X |
[1:n] numeric vector or time |
Y1 |
[1:n] numeric vector of feature |
Y2 |
[1:n] numeric vector of feature |
Cls1 |
[1:n] numeric vector defining a classification of k1 classes |
Cls2 |
Optional, [1:n] numeric vector defining a classification of k2 classes for |
xlab |
Optional, string |
y1lab |
Optional, string |
y2lab |
Optional, string |
main |
Optional, string |
Colors |
[1:(k1+k2)] Colornames |
Showgrid |
Optional, boolean |
SaveIt |
Optional, boolean |
plotly object
Michael Thrun
##ToDo
A line chart with dual axes
DualaxisLinechart(X, Y1, Y2, xlab = "X", y1lab = "Y1", y2lab = "Y2", main = "Dual Axis Line Chart", cols = c("black", "blue"),Overlaying="y", SaveIt = FALSE)
X |
[1:n] vector, both lines require the same x values, e.g. the time of the time series |
Y1 |
[1:n] vector of first line |
Y2 |
[1:n] vector of second line |
xlab |
Optional, string for xlabel |
y1lab |
Optional, string for first ylabel |
y2lab |
Optional, string for second ylabel |
main |
Optional, title of plot |
cols |
Optional, color of two lines |
Overlaying |
Change only default in case of using |
SaveIt |
Optional, default FALSE; TRUE if you want to save plot as html in |
Enables the visualization of two lines in one plot by overlaying them using plotly (e.g. two time series with two ranges of values)
plotly object
Michael Thrun
#subplot renames the numbering of subsequent plots
y1=runif(100,0,1)
y2=rnorm(100,mean=5,sd=1)
DualaxisLinechart(1:100, y1, y2,main="Random Time series")
y1=runif(100,0,1)
y2=(1:100*3+4)*runif(100,0,1)
p1=DualaxisLinechart(1:100, y1, y2,main="Random Time series",Overlaying="y2")
y3=1:100*(-2)+4
y4=rnorm(100,mean=0,sd=2)
p2=DualaxisLinechart(1:100, y3, y4,main="Random Time series",Overlaying="y4")
plotly::subplot(p1,p2)
Estimates densities for two-dimensional data with the given estimation type
estimateDensity2D(X, Y, DensityEstimation = "SDH", SampleSize, na.rm = FALSE, NoBinsOrPareto = NULL)
X |
[1:n] numerical vector of first feature |
Y |
[1:n] numerical vector of second feature |
DensityEstimation |
Either "PDE","SDH" or "kde2d" |
SampleSize |
Sample Size in case of big data |
na.rm |
The function may not work with non-finite values. To remove such cases automatically, set this parameter to TRUE |
NoBinsOrPareto |
Density-specific parameter: ParetoRadius for PDEscatter, nbins for SDH, or bins for kde2d |
Each two-dimensional data point is defined by its corresponding X and Y value.
List V with
X |
[1:m] numerical vector of first feature, m <= n, depending on whether all values are finite and on the na.rm parameter |
Y |
[1:m] numerical vector of second feature, m <= n, depending on whether all values are finite and on the na.rm parameter |
Densities |
the density of each two-dimensional data point |
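As a rough illustration of what a per-point density in the returned list looks like, the following base R sketch assigns each point the relative count of its cell in a plain two-dimensional histogram. It is not the SDH, PDE or kde2d estimator used by estimateDensity2D; the helper name binDensity2D and its nbins argument are hypothetical and only serve this example.

```r
# Hypothetical helper (not part of the package): crude per-point density
# taken from a plain, unsmoothed 2D histogram.
binDensity2D <- function(X, Y, nbins = 10, na.rm = FALSE) {
  if (na.rm) {
    keep <- is.finite(X) & is.finite(Y)  # drop non-finite cases pairwise
    X <- X[keep]
    Y <- Y[keep]
  }
  # Bin index of every point along each axis (1..nbins)
  bx <- cut(X, breaks = nbins, labels = FALSE, include.lowest = TRUE)
  by <- cut(Y, breaks = nbins, labels = FALSE, include.lowest = TRUE)
  # Count the points per occupied cell and map the counts back to the points
  cell <- paste(bx, by)
  counts <- table(cell)
  Densities <- as.numeric(counts[cell]) / length(X)
  list(X = X, Y = Y, Densities = Densities)
}

set.seed(1)
X <- c(runif(100), NA)
Y <- c(rnorm(100), 0)
V <- binDensity2D(X, Y, na.rm = TRUE)
str(V)
```

With na.rm = TRUE the non-finite case is dropped, so the returned vectors have length m = 100 rather than n = 101, mirroring the m <= n convention described above.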
Luca Brinkmann and Michael Thrun
[Ultsch, 2005] Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, In Baier, D. & Wernecke, K. D. (Eds.), Innovations in classification, data science, and information systems, (Vol. 27, pp. 91-100), Berlin, Germany, Springer, 2005.
[Eilers/Goeman, 2004] Eilers, P. H., & Goeman, J. J.: Enhancing scatterplots with smoothed densities, Bioinformatics, Vol. 20(5), pp. 623-628. 2004
X=runif(100)
Y=rnorm(100)
#V=estimateDensity2D(X,Y)
The fan plot is a better alternative to the pie chart for representing the relative amounts of the values given in the data.
Fanplot(Datavector,Names,Labels,MaxNumberOfSlices,main='',col, MaxPercentage=FALSE,ShrinkPies=0.05,Rline=1.1, lwd=2,LabelCols="black",...)
Datavector |
[1:n] vector of n non-unique values |
Names |
Optional, [1:k] names to search for in Datavector, if not set |
Labels |
Optional, [1:k] Labels if they are specially named, if not Names are used. |
MaxNumberOfSlices |
Default is k, integer value defining how many labels will be shown. Everything else will be summed up to |
main |
Optional, title below the fan pie, see |
col |
Optional, the default are the first [1:k] colors of the default color sequence used in this package, otherwise a character vector of [1:k] specifying the colors analog to |
MaxPercentage |
Default FALSE; if TRUE, the biggest slice represents 100 percent instead of the largest percentage count |
ShrinkPies |
Optional, distance between biggest and smallest slice of the pie |
Rline |
Optional, the distance between text and pie, given as the numeric length of the connecting line |
lwd |
Optional, the line width, a positive number, default is 2 |
LabelCols |
Color of labels |
... |
Further arguments to |
A normal pie plot is difficult to interpret for a human observer, because humans are not well trained to judge angles [Gohil, 2015, p. 102]. Therefore, the fan plot is used. As proposed in [Gohil, 2015], the fan.plot() function of the plotrix package is used to solve this problem.
If the number of slices is higher than MaxNumberOfSlices, then ABCanalysis is applied (see [Ultsch/Lotsch, 2015]) and group A is chosen.
If the number of slices in group A is higher than MaxNumberOfSlices, then the most important ones out of group A are chosen.
If MaxNumberOfSlices is higher than the number of slices in group A, additional slices are shown depending on the percentage (from high to low).
The color sequence is automatically shortened to the MaxNumberOfSlices used in the fan plot.
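The idea of limiting the number of slices can be sketched in base R as follows. This is a deliberate simplification: slices are ranked by percentage only and no ABC analysis is performed, so it does not reproduce the selection rule of Fanplot. The helper name slicePercentages and the label "Other" for the summed remainder are assumptions made for this example.

```r
# Hypothetical helper (not part of the package): percentages per category,
# keeping only the largest slices.
# Simplification: ranking by percentage only; no ABC analysis is done.
slicePercentages <- function(Datavector, MaxNumberOfSlices) {
  counts <- sort(table(Datavector), decreasing = TRUE)
  pct <- as.numeric(counts) / length(Datavector) * 100
  names(pct) <- names(counts)
  if (length(pct) > MaxNumberOfSlices) {
    kept <- pct[seq_len(MaxNumberOfSlices)]
    rest <- sum(pct[-seq_len(MaxNumberOfSlices)])
    pct <- c(kept, Other = rest)  # remainder summed into one slice
  }
  pct
}

x <- c(rep("a", 50), rep("b", 30), rep("c", 15), rep("d", 5))
p <- slicePercentages(x, MaxNumberOfSlices = 2)
p
```

Here the two largest categories are kept and the remaining 20 percent are collected in a single slice.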
A list, returned silently via invisible, with
Percentages |
[1:k] percent values visualized in fanplot |
Labels |
[1:k] see input |
Michael Thrun
[Gohil, 2015] Gohil, Atmajitsinh. R data Visualization cookbook. Packt Publishing Ltd, 2015.
[Ultsch/Lotsch, 2015] Ultsch, A., Lotsch, J.: Computed ABC Analysis for Rational Selection of Most Informative Variables in Multivariate Data, PloS one, Vol. 10(6), pp. e0129767, doi:10.1371/journal.pone.0129767, 2015.
data(categoricalVariable)
Fanplot(categoricalVariable)
This dataset was extracted out of Yahoo finance and was investigated in [Thrun et al., 2019] and clustered in [Thrun, 2019].
data("FundamentalData_Q1_2018")
The format is:
List of 3
 $ Data :'data.frame': 269 obs. of 45 variables:
  ..$ TotalRevenue : num [1:269] 3779000 78225 48220 63726 3084 ...
  ..$ CostofRevenue : num [1:269] 2348000 60835 26174 35203 882 ...
  ..$ GrossProfit : num [1:269] 1431000 17390 22046 28523 2202 ...
  ..$ SellingGeneralandAdministrative : num [1:269] 459000 NaN 15162 17072 2005 ...
  ..$ Others : num [1:269] -3000 10272 -52 3131 1784 ...
  ..$ TotalOperatingExpenses : num [1:269] 2872000 73833 41284 56787 5081 ...
  ..$ OperatingIncomeorLoss : num [1:269] 907000 4392 6936 6939 -1997 ...
  ..$ TotalOtherIncomeDIVxpensesNet : num [1:269] -28000 -344 1 -210 -240 ...
  ..$ EarningsBeforeInterestandTaxes : num [1:269] 907000 4392 6936 6939 -1997 ...
  ..$ InterestExpense : num [1:269] -20000 -415 NaN -243 -238 ...
  ..$ IncomeBeforeTax : num [1:269] 879000 4048 6937 6729 -2237 ...
  ..$ IncomeTaxExpense : num [1:269] 233000 1365 2188 1896 7 ...
  ..$ NetIncomeFromContinuingOps : num [1:269] 646000 2683 4749 4833 -2244 ...
  ..$ NetIncome_x : num [1:269] 644000 2817 4645 4833 -2244 ...
  ..$ NetIncome : num [1:269] 644000 2817 4645 4833 -2244 ...
  ..$ CashAndCashEquivalents : num [1:269] 926000 29047 45911 94859 11217 ...
  ..$ NetReceivables : num [1:269] 2527000 46171 20774 151952 2774 ...
  ..$ Inventory : num [1:269] 2011000 471 NaN 10572 8924 ...
  ..$ TotalCurrentAssets : num [1:269] 5674000 80224 68061 267187 25989 ...
  ..$ LongTermInvestments : num [1:269] 234000 450 NaN 4155 872 ...
  ..$ PropertyPlantandEquipment : num [1:269] 4216000 14561 3093 32247 7073 ...
  ..$ IntangibleAssets : num [1:269] 78000 40706 3975 6169 125 ...
  ..$ OtherAssets : num [1:269] 810000 8224 1091 2978 13310 ...
  ..$ DeferredLongTermAssetCharges : num [1:269] 759000 684 1091 784 1405 ...
  ..$ TotalAssets : num [1:269] 11262000 167807 83155 351220 47369 ...
  ..$ AccountsPayable : num [1:269] 1442000 10567 1698 17316 1386 ...
  ..$ ShortDIVurrentLongTermDebt : num [1:269] 1275000 30192 NaN 26668 917 ...
  ..$ OtherCurrentLiabilities : num [1:269] 1064000 36942 22781 92297 2659 ...
  ..$ TotalCurrentLiabilities : num [1:269] 2577000 54430 24479 114210 4299 ...
  ..$ OtherLiabilities : num [1:269] 1795000 19435 6876 29347 2018 ...
  ..$ TotalLiabilities : num [1:269] 5576000 97136 31355 165628 6980 ...
  ..$ CommonStock : num [1:269] 198000 14946 5198 15250 28644 ...
  ..$ RetainedEarnings : num [1:269] NaN 44030 34767 40374 -8965 ...
  ..$ TreasuryStock : num [1:269] 5455000 11686 NaN 129968 20710 ...
  ..$ OtherStockholderEquity : num [1:269] 5455000 11686 NaN 129968 20710 ...
  ..$ TotalStockholderEquity : num [1:269] 5653000 70662 51212 185592 40389 ...
  ..$ NetTangibleAssets : num [1:269] 5325000 6314 40302 140939 40264 ...
  ..$ Depreciation : num [1:269] 156000 2728 331 1381 410 ...
  ..$ AdjustmentsToNetIncome : num [1:269] 216000 1911 116 2912 39 ...
  ..$ ChangesInOtherOperatingActivities : num [1:269] -20000 -2174 -829 NaN 428 ...
  ..$ TotalCashFlowFromOperatingActivities : num [1:269] 452000 7349 4274 -8241 -1367 ...
  ..$ CapitalExpenditures : num [1:269] -88000 -966 -1778 -2067 -155 ...
  ..$ TotalCashFlowsFromInvestingActivities: num [1:269] 30000 -879 -1766 -2746 -484 ...
  ..$ TotalCashFlowsFromFinancingActivities: num [1:269] -789000 -6660 -21867 -961 -204 ...
  ..$ ChangeInCashandCashEquivalents : num [1:269] -306000 -215 2508 -11842 -2062 ...
 $ Names: chr [1:269, 1:6] "1COV" "A1OS" "AAD" "AAG" ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "Key" "ISIN" "Company" "Sector" ...
 $ Cls : num [1:269] 1 1 1 1 2 1 1 1 3 1 ...
Stocks are selected from the German Prime Standard according to "Names". Fundamental data with missing values is stored in "Data". The row names of "Data" have the same Key as the first column of "Names", which is the trading symbol. "Cls" provides the clustering as a numerical vector of 1:k classes performed by Databionic Swarm in [Thrun, 2019].
Yahoo finance
[Thrun, 2019] Thrun, M. C.: Knowledge Discovery in Quarterly Financial Data of Stocks Based on the Prime Standard using a Hybrid of a Swarm with SOM, in Verleysen, M. (Ed.), European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Vol. 27, pp. 397-402, Ciaco, ISBN: 978-287-587-065-0, Bruges, Belgium, 2019.
[Thrun et al., 2019] Thrun, M. C., Gehlert, Tino, & Ultsch, A. : Analyzing the Fine Structure of Distributions, arXiv:1908.06081, 2019.
data(FundamentalData_Q1_2018)
## maybe str(FundamentalData_Q1_2018) ; plot(FundamentalData_Q1_2018) ...
GermanPostalCodesShapes
data("GermanPostalCodesShapes")
GermanPostalCodesShapes
You could read https://www.r-bloggers.com/case-study-mapping-german-zip-codes-in-r/, if you want to change the map.
data(GermanPostalCodesShapes)
str(GermanPostalCodesShapes)
Google Maps with marked coordinates.
GoogleMapsCoordinates(Longitude,Latitude,Cls=rep(1,length(Longitude)), zoom=3,location= c(mean(Longitude),mean(Latitude)),stroke=1.7,size=6,sequence)
Longitude |
Spherical angle on the surface of the sphere, coordinate 1 |
Latitude |
Spherical angle on the surface of the sphere, coordinate 2 |
Cls |
Prior classification/clustering |
zoom |
Map zoom, an integer from 3 (continent) to 21 (building); the signature above defaults to 3, while 10 corresponds to city level. OpenStreetMap limits the zoom to 18, and the limit on Stamen maps depends on the maptype. "auto" automatically determines the zoom for bounding box specifications, and defaults to 10 with center/zoom specifications. Maps of the whole world are currently not supported |
location |
Optional, default: c(mean(Longitude),mean(Latitude)); an address, longitude/latitude pair (in that order), or left/bottom/right/top bounding box |
stroke |
Optional, plotting parameter, line thickness of the coordinate symbols |
size |
Optional, plotting parameter, size of the coordinate symbols |
sequence |
Optional, vector whose length equals the number of clusters, with numbers indicating the plotting symbols and colors to use |
This plot was used in [Thrun, 2018, p. 135].
ggobject()
Requires an Internet connection and a Google API key. See ?ggmap::register_google for details.
Michael Thrun
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, ISBN: 978-3-658-20539-3, Heidelberg, 2018.
Heatmap of the distances of data sorted by Cls. Clustering algorithms provide a classification of data, where the labels are defined as a numeric vector Cls. A typical cluster or group structure is then displayed by the Heatmap function.
At the margin of a heatmap, a dendrogram can be shown if hierarchical cluster algorithms are used [Wilkinson, 2009]. Here, only the heatmap itself is displayed; the dendrogram has to be shown separately.
Heatmap(DataOrDistances,Cls,method='euclidean', LowLim=0,HiLim,LineWidth=0.5,Clabel="Cluster No.")
DataOrDistances |
If not symmetric, the function assumes a [1:n,1:d] numeric matrix of n data cases in rows and d variables in columns; in this case, the distance metric specified in method is used. Otherwise, a symmetric [1:n,1:n] distance matrix |
Cls |
[1:n] numerical vector of numbers defining the classification as the main output of the clustering algorithm. It has k unique numbers for k clusters that represent the arbitrary labels of the clustering, assuming a descending order of 1 to k. If not ordered please use |
method |
Optional,
if |
LowLim |
Optional: limits for the color axis |
HiLim |
Optional: limits for the color axis |
LineWidth |
Width of lines separating the clusters in the heatmap |
Clabel |
Default " |
"Cluster heatmaps are commonly used in biology and related fields to reveal hierarchical clusters in data matrices. Heatmaps visualize a data matrix by drawing a rectangular grid corresponding to rows and columns in the matrix and coloring the cells by their values in the data matrix. In their most basic form, heatmaps have been used for over a century [Wilkinson, 2009]. In addition to coloring cells, cluster heatmaps reorder the rows and/or columns of the matrix based on the results of hierarchical clustering. (...) . Cluster heatmaps have high data density, allowing them to compact large amounts of information into a small space [Weinstein, 2008]", [Engle et al., 2017].
The procedure can be adapted to distance matrices [Thrun, 2018]. Then, the color scale is chosen such that pixels of low distances have blue and teal colors, pixels of middle distances have yellow colors, and pixels of high distances have orange and red colors [Thrun, 2018]. The distances are ordered by the clustering and the clusters are divided by black lines. A clustering is valid if the intra-cluster distances are distinctly smaller than the inter-cluster distances in the heatmap [Thrun, 2018]. For another example, please see [Thrun, 2018] (Fig. 3.7, p. 31).
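The validity criterion above (intra-cluster distances distinctly smaller than inter-cluster distances) can also be checked numerically. The following base R sketch uses synthetic data and orders the distance matrix by Cls in the same spirit as the heatmap; it is an illustration under these assumptions, not the internal code of Heatmap.

```r
set.seed(42)
# Two well-separated synthetic clusters in two dimensions
Data <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
              matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2))
Cls <- rep(c(1, 2), each = 20)

# Shuffle the cases, then restore the ordering by cluster label
perm <- sample(nrow(Data))
Data <- Data[perm, ]
Cls <- Cls[perm]
ord <- order(Cls)
D <- as.matrix(dist(Data, method = "euclidean"))[ord, ord]
ClsOrdered <- Cls[ord]

# Compare distances within clusters against distances between clusters
same <- outer(ClsOrdered, ClsOrdered, "==")
offDiag <- row(D) != col(D)      # exclude the zero self-distances
intra <- mean(D[same & offDiag])
inter <- mean(D[!same])
c(intra = intra, inter = inter)
```

In the ordered matrix D, the diagonal blocks hold the intra-cluster distances and the off-diagonal blocks the inter-cluster distances, which is the block structure the heatmap makes visible.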
object of ggplot2
Michael Thrun
[Wilkinson, 2009] Wilkinson, L., & Friendly, M.: The history of the cluster heat map, The American Statistician, Vol. 63(2), pp. 179-184. 2009.
[Engle et al., 2017] Engle, S., Whalen, S., Joshi, A., & Pollard, K. S.: Unboxing cluster heatmaps, BMC bioinformatics, Vol. 18(2), pp. 63. 2017.
[Weinstein, 2008] Weinstein, J. N.: A postgenomic visual icon, Science, Vol. 319(5871), pp. 1772-1773. 2008.
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, doi:10.1007/978-3-658-20540-9, 2018.
data("Lsun3D")
Cls=Lsun3D$Cls
Data=Lsun3D$Data
#Data
Heatmap(Data,Cls = Cls)
#Distances
Heatmap(as.matrix(dist(Data)),Cls = Cls)
Defines the default color sequence for plots made with PixelMatrixPlot
data("HeatmapColors")
A vector with different strings describing colors for this plot.
Enables inspection of the boxplots for multiple variables in ggplot2 syntax. Each boxplot also has a point for the mean of the variable.
InspectBoxplots(Data, Names,Means=TRUE)
Data |
Matrix containing the data. Each column is one variable. |
Names |
Optional: Names of the variables. If missing, the column names of Data are used. |
Means |
Optional: TRUE: with mean, FALSE: Only median. |
The ggplot object of the boxplots
Felix Pape
x <- cbind(A = rnorm(200, 1, 3), B = rnorm(200, -2, 5))
InspectBoxplots(x)
Inspects the correlation between two given features using density scatter plots.
InspectCorrelation(X, Y, DensityEstimation = "SDH", CorMethod = "spearman", na.rm = TRUE, SampleSize = round(sqrt(5e+08), -3), NrOfContourLines = 20, Plotter = "native", DrawTopView = T, xlab, ylab, main = "Spearman correlation coef.:", xlim, ylim, Legendlab_ggplot = "value", ...)
X |
Numeric vector [1:n], first feature (for x axis values) |
Y |
Numeric vector [1:n], second feature (for y axis values) |
DensityEstimation |
"SDH" is very fast but may be less accurate; "PDE" is slower but probably more accurate. |
CorMethod |
Correlation method passed to the cor function: one of "pearson" (the default of cor), "kendall", or "spearman" (the default in the signature above) |
SampleSize |
Numeric, positive scalar, maximum size of the sample used for calculation. High values increase runtime significantly. The default is that no sample is drawn |
na.rm |
The function may not work with non-finite values. To remove such cases automatically, set this parameter to TRUE |
NrOfContourLines |
Numeric, number of contour lines to be drawn. 20 by default. |
Plotter |
String, name of the plotting backend to use. Possible values are: " |
DrawTopView |
Boolean; TRUE means a contour plot is drawn, otherwise a 3D plot is drawn. Default: TRUE |
xlab |
String, title of the x axis. Default: "X", see |
ylab |
String, title of the y axis. Default: "Y", see |
main |
string, the same as "main" in |
xlim |
see |
ylim |
see |
Legendlab_ggplot |
String, in case of |
... |
Density-specific parameters, for |
The example shows that features with a high correlation coefficient need not be genuinely correlated: the coefficient can be caused by bimodality.
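The effect of bimodality on the correlation coefficient can be reproduced with synthetic data in base R. In this sketch, X and Y are independent within each of two modes, yet the overall Spearman coefficient is high because the modes are far apart. The data and variable names are made up for the illustration.

```r
set.seed(7)
# Two modes; within each mode X and Y are drawn independently
n <- 200
X <- c(rnorm(n, mean = 0), rnorm(n, mean = 10))
Y <- c(rnorm(n, mean = 0), rnorm(n, mean = 10))
Mode <- rep(c(1, 2), each = n)

overall <- cor(X, Y, method = "spearman")
within1 <- cor(X[Mode == 1], Y[Mode == 1], method = "spearman")
within2 <- cor(X[Mode == 2], Y[Mode == 2], method = "spearman")
round(c(overall = overall, within1 = within1, within2 = within2), 2)
```

A density scatter plot reveals the two modes directly, whereas the single coefficient hides them.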
plotting handler
Michael Thrun
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
data(ITS)
data(MTY)
Inds=which(ITS < 900 & MTY < 8000)
InspectCorrelation(ITS[Inds],MTY[Inds])
Visualizes the distances between objects in the data matrix
InspectDistances(DataOrDistances,method= "euclidean",sampleSize = 50000,...)
DataOrDistances |
[1:n,1:d] data cases in rows, variables in columns, if not symmetric or [1:n,1:n] distance matrix, if symmetric |
method |
Optional,
if Data[1:n,1:d]
see |
sampleSize |
double value defining the size of the sample for large distance matrices, see |
... |
further arguments passed on to |
For an interpretation of the distribution analysis of the distance please read [Thrun, 2018, p. 27, 185].
uses InspectVariable
Michael Thrun
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, ISBN: 978-3-658-20539-3, Heidelberg, 2018.
data("Lsun3D")
Data=Lsun3D$Data
InspectDistances(as.matrix(dist(Data)))
Pairwise scatterplots and optimal histograms of all features stored as columns of data are plotted
InspectScatterplots(Data,Names=colnames(Data))
Data |
[1:n,1:d] Data cases in rows (n), variables in columns (d) |
Names |
Optional: Names of the variables. If missing, the column names of Data are used. |
For two features, the PDEscatter function should be used to inspect modalities [Thrun/Ultsch, 2018]. For many features, that function takes too long; in such a case, this function can be used. See [Thrun/Ultsch, 2018] for the description of the optimal histogram.
Michael Thrun
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
Data=cbind(rnorm(100, mean = 2, sd = 3 ),rnorm(100,mean = 0, sd = 1),rnorm(100,mean = 6, sd = 0.5))
#InspectScatterplots(Data)
Allows inspecting whether standardization of the data makes sense.
InspectStandardization(Data, TransData, xug = -3, xog = 3, xlab = "Normal", yDataLab = "Data", yTransDataLab = "Trasformated Data", Symbol4Gerade = "red", main = "", ...)
Data |
... |
TransData |
... |
xug |
... |
xog |
... |
xlab |
... |
yDataLab |
... |
yTransDataLab |
... |
Symbol4Gerade |
... |
main |
... |
... |
... |
...
plot
Michael Thrun
Michael, J. R.: The stabilized probability plot, Biometrika, Vol. 70(1), pp. 11-17, 1983.
Enables distribution inspection by visualization as described in [Thrun, 2018] and for example used in [Thrun/Ultsch, 2018].
InspectVariable(Feature, Name, i = 1, xlim, ylim, sampleSize =1e+05, main)
Feature |
[1:n] Variable/Vector of Data to be plotted |
Name |
Optional, string, for x label |
i |
Optional, number of the variable/feature, e.g., the index of a for loop |
xlim |
[2] Optional, range of x-axis for PDEplot and histogram |
ylim |
[2] Optional, range of y-axis, only for PDEplot |
sampleSize |
Optional, default 100000; sample size used if the data vector is too big |
main |
string for the title if other than what is described in |
Michael Thrun
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, ISBN: 978-3-658-20539-3, Heidelberg, 2018.
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
data("ITS")
InspectVariable(ITS,Name='Income in EUR',main='ITS')
Numerical vector of length 11194. Details in [Ultsch/Behnisch, 2017; Thrun/Ultsch, 2018].
data("ITS")
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
[Ultsch/Behnisch, 2017] Ultsch, A., Behnisch, M.: Effects of the payout system of income taxes to municipalities in Germany, Applied Geography, Vol. 81, pp. 21-31, 2017.
data(ITS)
str(ITS)
Jitters Unique Values for Visualizations
JitterUniqueValues(Data, Npoints = 20, min = 0.99999, max = 1.00001)
Data |
[1:n] vector of data |
Npoints |
number of jittered points generated from the m unique values of the datavector Data |
min |
minimum value of jittering |
max |
maximum value of jittering |
min and max are either multiplied with or added to the data, depending on the range of values. If Npoints==2, then only two values per unique value of Data are jittered; otherwise additional values are generated. Npoints==1 does not jitter the values but returns the unique values.
vector of DataJitter[1:(m+Npoints-1)] jittered values
Michael Thrun
used for example in MDplot
data=c(rep(1,10),rep(0,10),rep(100,10))
JitterUniqueValues(data,Npoints=1)
JitterUniqueValues(data,Npoints=2)
DataJitter=JitterUniqueValues(data,Npoints=20)
Clearly defined clusters, different variances. Detailed description of dataset and its clustering challenge is provided in [Thrun/Ultsch, 2020].
data("Lsun3D")
Size 404, Dimensions 3
The dataset defines discontinuities, where the clusters have different variances. It contains three main clusters and four outliers (in cluster 4). For a more detailed description see [Thrun, 2018].
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, doi:10.1007/978-3-658-20540-9, 2018.
[Thrun/Ultsch, 2020] Thrun, M. C., & Ultsch, A.: Clustering Benchmark Datasets Exploiting the Fundamental Clustering Problems, Data in Brief, Vol. 30(C), pp. 105501, doi:10.1016/j.dib.2020.105501, 2020.
data(Lsun3D)
str(Lsun3D)
Cls=Lsun3D$Cls
Data=Lsun3D$Data
Bland-Altman plot [Altman/Bland, 1983].
MAplot(X,Y,islog=TRUE,LoA=FALSE,CI=FALSE, densityplot=FALSE,main,xlab,ylab, Cls,lwd=2,ylim=NULL,...)
X |
[1:n] numerical vector of a feature/variable |
Y |
[1:n] another numerical vector of a feature/variable |
islog |
Optional, TRUE: MAplot, FALSE: M=x-y versus a=0.5(x+y) |
LoA |
Optional, if TRUE: limits of agreement are plotted as lines if densityplot=FALSE |
CI |
Optional, if TRUE: confidence intervals for LoA, see [Stockl et al., 2004], if densityplot=FALSE |
densityplot |
Optional, FALSE: Scatterplot using |
main |
Optional, see |
xlab |
Optional, see |
ylab |
Optional, see |
Cls |
Optional, prior Classification as a numeric vector. |
lwd |
Optional, if |
ylim |
Optional, default |
... |
for example, |
Bland-Altman plot [Altman/Bland, 1983] for visual representation of genomic data or in order to decorrelate data.
"The limits of agreement (LoA) are defined as the mean difference +- 1.96 SD of differences. If these limits do not exceed the maximum allowed difference between methods (the differences within mean +- 1.96 SD are not clinically important), the two methods are considered to be in agreement and may be used interchangeably." (quoted from the URL cited in the references). Please note that the underlying assumption is a normal distribution of the differences. The input argument LoA=TRUE shows the mean of the difference in blue and +- 1.96 SD in green. The input argument CI=TRUE shows the mean of the difference in blue and the confidence interval as red dashed lines, similar to the cited URL.
In case of densityplot=FALSE, the function Classplot is always called with Plotter="native". Then, the input argument "Colors" of points can only be set in Classplot if "Cls" is given in this function; otherwise the points are always black. The input argument "Size" sets the size of points in Classplot.
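The quantities behind the plot can be computed directly in base R. The sketch below covers the non-logarithmic case (M = x - y versus A = 0.5(x + y)) and the limits of agreement as mean difference +- 1.96 SD; the synthetic measurements and variable names are assumptions for illustration, and the confidence intervals of [Stockl et al., 2004] are not included.

```r
set.seed(3)
# Two hypothetical measurement methods for the same quantity
X <- rnorm(100, mean = 50, sd = 10)
Y <- X + rnorm(100, mean = 1, sd = 2)

M <- X - Y           # minus component (difference)
A <- 0.5 * (X + Y)   # add component (mean of both methods)

meanDiff <- mean(M)
sdDiff <- sd(M)
LoA <- c(lower = meanDiff - 1.96 * sdDiff, upper = meanDiff + 1.96 * sdDiff)

# Share of differences inside the limits; close to 0.95 if M is roughly normal
inside <- mean(M >= LoA["lower"] & M <= LoA["upper"])
list(meanDiff = meanDiff, LoA = LoA, inside = inside)
```

Plotting M against A and adding horizontal lines at meanDiff and at the two LoA values gives the basic Bland-Altman picture.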
MA |
[1:n,2] Matrix of Minus component of two features and Add component of two features |
Handle |
see |
Statistics |
Named list of four elements, each consisting of one value depending on input parameters |
Michael Thrun
[Altman/Bland, 1983] Altman D.G., Bland J.M.: Measurement in medicine: the analysis of method comparison studies, The Statistician, Vol. 32, p. 307-317, doi:10.2307/2987937, 1983.
https://www.medcalc.org/manual/bland-altman-plot.php
[Stockl et al., 2004] Stockl, D., Rodriguez Cabaleiro, D., Van Uytfanghe, K., & Thienpont, L. M.: Interpreting method comparison studies by use of the Bland-Altman plot: reflecting the importance of sample size by incorporating confidence limits and predefined error limits in the graphic, Clinical chemistry, Vol. 50(11), pp. 2216-2218. 2004.
data("ITS")
data("MTY")
MAlist=MAplot(ITS,MTY)
This function creates an MD-plot for each variable of the data matrix. The MD-plot is a visualization of a boxplot-like shape of the PDF published in [Thrun et al., 2020], with the default ordering by shape. It is an improvement over violin or so-called bean plots and possesses advantages in comparison to the conventional, well-known box plot [Thrun et al., 2020].
A complete guide about the MDplot can be found in https://md-plot.readthedocs.io/en/latest/index.html.
MDplot(Data, Names, Ordering='Default', Scaling="None", Fill='darkblue', RobustGaussian=TRUE, GaussianColor='magenta', Gaussian_lwd=1.5, BoxPlot=FALSE,BoxColor='darkred', MDscaling='width', LineColor='black', LineSize=0.01, QuantityThreshold=50, UniqueValuesThreshold=12, SampleSize=5e+05,SizeOfJitteredPoints=1,OnlyPlotOutput=TRUE, main="MD-plot",ylab="Range of values in which PDE is estimated", BW=FALSE,ForceNames=FALSE)
Data |
[1:n,1:d] Numerical Matrix containing the n cases of d variables. Each column is one variable. A data.frame is automatically transformed to a numerical matrix. |
Names |
Optional: [1:d] Names of the variables. If missing, the column names of Data are used. If not missing, then the names can be cleaned or not (see |
Ordering |
Optional: string, either |
Scaling |
Optional, Default is |
Fill |
Optional: String or Vector, which gives the color(s) with which MDs are to be filled with. |
RobustGaussian |
Optional: If TRUE: each MDplot of a variable is overlaid with a robustly estimated unimodal Gaussian distribution in the range of this variable, if statistical testing does not yield a significant p-value. In this case the packages moments, diptest and signal are required. |
GaussianColor |
Optional: string, color of robustly estimated gaussian, only for |
Gaussian_lwd |
Optional: numerical, line width of robustly estimated gaussian, only for |
BoxPlot |
Optional: If TRUE: each MDplot is overlaid with a Box-Whisker Diagram. |
BoxColor |
Optional: string, color of Boxplot, only for |
MDscaling |
Optional: if "area", all violins have the same area (before trimming the tails). If "count", areas are scaled proportionally to the number of observations. If "width" (default), all MDs have the same maximum width. |
LineColor |
Optional: string, color of line around the mirrored densities. |
LineSize |
Optional: numerical, linewidth of line around the mirrored densities. |
QuantityThreshold |
Optional: numeric value defining the threshold of the minimal amount of values in data. Below this threshold no density estimation is performed and a 1D scatter plot with jittered points is drawn. Only Data Science experts should change this value after they understand how the density is estimated (see [Ultsch, 2005]). |
UniqueValuesThreshold |
Optional: numeric value defining the threshold of the minimal amount of unique values in data. Below this threshold no density estimation and statistical testing is performed and a 1D scatter plot with jittered points drawn. Only Data Science experts should change this value after they understand how the density is estimated (see [Ultsch, 2005]). |
SampleSize |
Optional: numeric value defining a threshold. Above this threshold, uniform sampling of finite cases is performed in order to shorten computation time. If rowr is not installed, uniform sampling of all cases is performed. If required, |
SizeOfJitteredPoints |
Optional: scalar. If not enough unique values for density estimation are given, data points are jittered. This parameter defines the size of the points. |
OnlyPlotOutput |
Optional: Default TRUE: only a ggplot object is returned; if FALSE: additionally, scaled data and ordering are the output of this function in a |
main |
string defining the (centered) title of the plot |
ylab |
string defining the y label, PDE= pareto density estimation (see [Ultsch, 2005]) |
BW |
FALSE: usual ggplot2 background and style which is good for screen visualizations TRUE: theme_bw() is used which is more appropriate for publications |
ForceNames |
FALSE: By default, column names are cleaned for proper plotting. TRUE: forces the column names to be set as given. Beware, this can result in plotting errors. |
In short, the MD-plot can be described as a PDE-optimized violin plot. The Pareto Density Estimation (PDE) is an approach to estimate the probability density function (pdf) [Ultsch, 2005].
The MD-plot was published in [Thrun et al., 2020].
Statistical testing is performed with dip.test and agostino.test.
For the parameter Ordering the following options are possible:
Default
Ordering of plots by convex/concave/unimodal/non-unimodal shapes using statistical criteria. In this case the package signal is required.
Columnwise
Ordering of plots by the order of columns of Data.
AsIs
Synonym of Columnwise: Ordering of plots by the order of columns of Data.
Alphabetical
Ordering of plots by the columns of Data sorted in alphabetical order of the column names.
Average
Ordering of plots by the columns of Data sorted in order of increasing column-wise average.
Bimodal
Ordering of plots by the columns of Data sorted in order of decreasing bimodality amplitude [Zhang et al., 2003].
Variance
Ordering of plots by the columns of Data sorted in order of increasing inter-quartile range.
Statistics
Ordering of plots depending on the logarithm of the p-values of statistical testing. In this case the packages moments, diptest and signal are required.
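The simpler Ordering options can be illustrated with a few lines of base R. This is only a sketch of the sorting criteria named above (column names, column-wise average, inter-quartile range); the matrix x and the variable names are made up for illustration, and the package may compute these orderings differently in detail.

```r
# Hypothetical example matrix with three named columns
x <- cbind(
  B = c(10, 20, 30, 40),
  A = c(26, 27, 28, 29),
  C = c(0, 0, 10, 12)
)

# 'Alphabetical': columns sorted by their names
ord_alphabetical <- order(colnames(x))

# 'Average': columns sorted by increasing column-wise mean
ord_average <- order(colMeans(x))

# 'Variance': columns sorted by increasing inter-quartile range
ord_variance <- order(apply(x, 2, IQR))

colnames(x)[ord_alphabetical]
colnames(x)[ord_average]
colnames(x)[ord_variance]
```

Each result is a permutation of the column indices, so x[, ord_average] would be the reordered matrix.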
For the parameter Scaling the following options are possible:
None
No scaling of data is done.
Percentalize
Data is scaled between zero and 100.
CompleteRobust
Data is first robustly scaled between zero and 1, then centered to zero, and outliers are capped by a robust formula described in RobustNormalization.
Robust
Data is robustly scaled between zero and 1 by a formula described in RobustNormalization.
Log
Data is transformed with a signed log, which allows negative values to be transformed with a logarithm of base 10; please see SignedLog for details.
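Two of the Scaling options are easy to sketch in base R. The code below is an illustration under assumptions, not the package code: percentalize_sketch assumes a plain min-max scaling to the range zero to 100, and signed_log10_sketch assumes the common definition sign(x) * log10(|x| + 1). Both function names are hypothetical; SignedLog and the package source remain the reference for the exact formulas.

```r
# Assumed min-max scaling to the range [0, 100]
percentalize_sketch <- function(x) {
  rng <- range(x, na.rm = TRUE)
  100 * (x - rng[1]) / (rng[2] - rng[1])
}

# Assumed signed logarithm of base 10: keeps the sign, compresses the magnitude
signed_log10_sketch <- function(x) {
  sign(x) * log10(abs(x) + 1)
}

scaled <- percentalize_sketch(c(2, 4, 6))
logged <- signed_log10_sketch(c(-9, 0, 9))
```

The signed variant is useful because an ordinary logarithm is undefined for zero and negative values.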
In the default case of OnlyPlotOutput==TRUE: The ggplot object of the MD-plot.
Otherwise, for OnlyPlotOutput==FALSE: A list of
ggplotObj |
The ggplot object of the MD-plot. |
Ordering |
The ordering of columns of data defined by |
DataOrdered |
[1:n,1:d] matrix of ordered and scaled data defined by |
Note that the package ggExtra is not strictly required, but if it is installed the feature names are automatically rotated.
1.) One would assume that in the first of the two following cases ggplot2 only adjusts the plotting region, but MDplot(MTY)+ylim(c(0,7000)) is equal to MDplot(MTY[MTY<7000]). This means that in both cases the data is clipped and the density estimation is performed AFTERWARDS.
2.) Because of a (sometimes) strange behavior of either ggplot2 or reshape2, numerical column names are changed to character by adding 'C_', which can be disabled using ForceNames=TRUE.
3.) Column names are automatically deblanked and cleaned. To force specific column names, the input Names can be used in combination with ForceNames=TRUE. However, this can result in plotting errors or other strange behavior.
4.) Overlaying MD-plots with robustly estimated Gaussians will occasionally yield magenta (or other GaussianColor) lines overlaying more than the violin plot they should overlay, because the width of the two plots is not the same (but I am unable to set it strictly in ggplot). In such a case, just call the function again.
Michael Thrun, Felix Pape contributed with the idea to use ggplot2 as the basic framework.
[Thrun et al., 2020] Thrun, M. C., Gehlert, T. & Ultsch, A.: Analyzing the Fine Structure of Distributions, PLoS ONE, Vol. 15(10), pp. 1-66, DOI 10.1371/journal.pone.0238835, 2020.
[Ultsch, 2005] Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D.; Wernecke, K. D., (Eds), Innovations in classification, data science, and information systems, Proc Gfkl 2003, pp 91-100, Springer, Berlin, 2005.
[Zhang et al., 2003] Zhang, C., Mapes, B., & Soden, B.: Bimodality in tropical water vapour, Quarterly Journal of the Royal Meteorological Society, 129(594), 2847-2866, 2003.
https://md-plot.readthedocs.io/en/latest/index.html
https://pypi.org/project/md-plot/
x = cbind(
  A = runif(2000, 1, 5),
  B = c(rnorm(1000, 0, 1), rnorm(1000, 2.6, 1)),
  C = c(rnorm(2000, 2.5, 1)),
  D = rpois(2000, 5)
)
MDplot(x)
This function creates an MD-plot for multiple numerical vectors of various lengths. The MD-plot is a visualization for a boxplot-like shape of the pdf published in [Thrun et al., 2020]. It is an improvement of violin or so-called bean plots and possesses advantages in comparison to the conventional well-known box plot [Thrun et al., 2020].
MDplot4multiplevectors(..., Names, Ordering = 'Columnwise', Scaling = "None", Fill = 'darkblue', RobustGaussian = TRUE, GaussianColor = 'magenta', Gaussian_lwd = 1.5, BoxPlot = FALSE, BoxColor = 'darkred', MDscaling = 'width', LineSize = 0.01, LineColor = 'black', QuantityThreshold = 40, UniqueValuesThreshold = 12, SampleSize = 5e+05, SizeOfJitteredPoints = 1, OnlyPlotOutput = TRUE)
... |
Either d numerical vectors of different lengths or a list of length d where each element of the list is a vector of arbitrary length |
Names |
Optional: [1:d] Names of the variables. If missing, the columnnames of data are used. |
Ordering |
Optional: string, either |
Scaling |
Optional, Default is |
Fill |
Optional: string, color with which MDs are to be filled with. |
RobustGaussian |
Optional: If TRUE: each MDplot of a variable is overlaid with a robustly estimated unimodal Gaussian distribution in the range of this variable, if statistical testing does not yield a significant p-value. In this case the packages moments, diptest and signal are required. |
GaussianColor |
Optional: string, color of robustly estimated gaussian, only for |
Gaussian_lwd |
Optional: numerical, line width of robustly estimated gaussian, only for |
BoxPlot |
Optional: If TRUE: each MDplot is overlaid with a Box-Whisker Diagram. |
BoxColor |
Optional: string, color of Boxplot, only for |
MDscaling |
Optional: if "area", all violins have the same area (before trimming the tails). If "count", areas are scaled proportionally to the number of observations. If "width" (default), all MDs have the same maximum width. |
LineSize |
Optional: numerical, linewidth of line around the mirrored densities. |
LineColor |
Optional: string, color of line around the mirrored densities. |
QuantityThreshold |
Optional: numeric value defining a threshold. Below this threshold no density estimation is performed and a jitter plot with a median line is drawn. Only Data Science experts should change this value after they understand how the density is estimated (see [Ultsch, 2005]). |
UniqueValuesThreshold |
Optional: numeric value defining a threshold. Below this threshold no density estimation and statistical testing is performed and a Jitter plot is drawn. Only Data Science experts should change this value after they understand how the density is estimated (see [Ultsch, 2005]). |
SampleSize |
Optional: numeric value defining a threshold. Above this threshold, uniform sampling of finite cases is performed in order to shorten computation time. If rowr is not installed, uniform sampling of all cases is performed. If required, |
SizeOfJitteredPoints |
Optional: scalar. If not enough unique values for density estimation are given, data points are jittered. This parameter defines the size of the points. |
OnlyPlotOutput |
Optional: If TRUE (default), only a ggplot object is returned. If FALSE, scaled data and ordering are additionally the output of this function in a |
Please see MDplot for details.
In the default case of OnlyPlotOutput==TRUE: The ggplot object of the MD-plot.
Otherwise, for OnlyPlotOutput==FALSE: A list of
ggplotObj |
The ggplot object of the MD-plot. |
Ordering |
The ordering of columns of data defined by |
DataOrdered |
[1:n,1:d] matrix of ordered and scaled data defined by |
Note that the package ggExtra is not strictly required, but if it is installed the feature names are automatically rotated.
cbind.fill is internally used from the deprecated R package rowr of Craig Varrichio.
Michael Thrun.
[Ultsch, 2005] Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D.; Wernecke, K. D., (Eds), Innovations in classification, data science, and information systems, Proc Gfkl 2003, pp 91-100, Springer, Berlin, 2005.
[Thrun et al., 2020] Thrun, M. C., Gehlert, T. & Ultsch, A.: Analyzing the Fine Structure of Distributions, PLoS ONE, Vol. 15(10), pp. 1-66, DOI 10.1371/journal.pone.0238835, 2020.
ClassMDplot
MDplot
https://pypi.org/project/md-plot/
MDplot4multiplevectors(runif(20000, 1, 5),c(rnorm(20000,0,1), rnorm(20000,2.6,1)),c(rnorm(2000,2.5,1)),rpois(25000,5), Names=c('A','B','C','D'))
V=list(runif(20000, 1, 5),c(rnorm(20000,0,1), rnorm(20000,2.6,1)),c(rnorm(2000,2.5,1)),rpois(25000,5))
MDplot4multiplevectors(V,Names=c('A','B','C','D'))
If the input is a matrix, the mean value is computed for every column.
Meanrobust(x, p=10,na.rm=TRUE)
x |
vector or matrix |
p |
default=10; percentage of values cut from the top and from the bottom of x |
na.rm |
a boolean evaluating to TRUE or FALSE indicating whether all non finite values should be stripped before the computation proceeds. |
Zornitsa Manolova
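The idea of cutting a percentage of values from both ends before averaging can be sketched with base R's trimmed mean. This is an assumption about what Meanrobust does internally, not its actual code; the function name trimmed_mean_sketch is hypothetical, and the package may choose the cut points differently (for example via percentiles).

```r
# Sketch: trimmed mean per vector or per matrix column.
# Assumption: p percent of the values are dropped at each end.
trimmed_mean_sketch <- function(x, p = 10, na.rm = TRUE) {
  one_column <- function(v) {
    if (na.rm) v <- v[is.finite(v)]  # strip NA, NaN and +/-Inf
    mean(v, trim = p / 100)          # base R trimmed mean
  }
  if (is.matrix(x)) apply(x, 2, one_column) else one_column(x)
}

v <- c(1:9, 1000)                    # one extreme value
m <- cbind(a = v, b = c(NA, 2:10))   # matrix input: one result per column

robust_v <- trimmed_mean_sketch(v)
robust_m <- trimmed_mean_sketch(m)
plain_v  <- mean(v)
```

In this toy vector the single extreme value dominates the ordinary mean but has no influence on the trimmed one.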
Numerical vector of length 11194. Details in [Ultsch/Behnisch, 2017; Thrun/Ultsch, 2018].
data("MTY")
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
[Ultsch/Behnisch, 2017] Ultsch, A., Behnisch, M.: Effects of the payout system of income taxes to municipalities in Germany, Applied Geography, Vol. 81, pp. 21-31, 2017.
data(MTY)
str(MTY)
ggplot objects can be passed in ..., or to Plotlist (as a list of ggplot objects).
For example, if the layout is specified as the matrix(c(1,2,3,3), nrow=2, byrow=TRUE), then plot 1 will go in the upper left, 2 will go in the upper right, and 3 will go all the way across the bottom.
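The layout matrix from the sentence above can be inspected directly in base R. This sketch only shows how the matrix maps plot numbers to grid cells; the variable names are illustrative and no plotting function is called.

```r
# Layout from the text: plots 1 and 2 on top, plot 3 across the bottom row
layout_mat <- matrix(c(1, 2, 3, 3), nrow = 2, byrow = TRUE)

# For every plot number, list the (row, col) grid cells it occupies
cells_of_plot <- lapply(1:3, function(i) which(layout_mat == i, arr.ind = TRUE))
names(cells_of_plot) <- paste0("plot", 1:3)

layout_mat
cells_of_plot$plot3
```

A plot number that appears in several cells spans all of them, which is why plot 3 stretches over both columns of the second row.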
Multiplot(..., Plotlist=NULL, ColNo=1, LayoutMat)
... |
multiple ggplot objects to be plotted |
Plotlist |
Optional: list filled with ggplot objects to be plotted |
ColNo |
Number of columns in layout |
LayoutMat |
A matrix specifying the layout. If present, 'ColNo' is ignored. |
List with Plotlist
Winston Chang
data(Lsun3D)
Data=Lsun3D$Data
Cls=Lsun3D$Cls
obj1=Classplot(Data[,1],Data[,2],Cls=Cls,Plotter="ggplot",Size=3,main="Top plot")
obj2=Classplot(Data[,2],Data[,3],Cls=Cls,Plotter="ggplot",Size=3,main="Middle plot")
obj3=Classplot(Data[,1],Data[,3],Cls=Cls,Plotter="ggplot",Size=3,main="Bottom plot")
V=Multiplot(obj1,obj2,obj3)
Creates a set of two violin plots opposing each other
OpposingViolinBiclassPlot(ListData, Names, BoxPlots = FALSE, Subtitle = c("AttributeA", "AttributeB"), Title = "Opposing Violin Biclass Plot")
ListData |
List of k matrices as elements where each element has shape [1:n, 1:2] |
Names |
Vector of character names for each element of ListData |
BoxPlots |
Optional: Boolean variable BoxPlots = TRUE shows a box plot drawn into the violin plot. BoxPlots = FALSE shows no box plot. Default: BoxPlots = FALSE |
Subtitle |
Optional: Vector of character names for two classes. The classes are described as features contained in the matrix [1:n, 1:2] |
Title |
Optional: Character containing the title of the plot. |
Plotly object.
Quirin Stier
Optimal Number Of Bins is a kernel density estimation for fixed intervals.
Calculation of the optimal number of bins for a histogram.
OptimalNoBins(Data)
Data |
Data |
The bin width is defined as bw = 3.49 * stdrobust(Data) / n^(1/3), where n is the number of data points.
optNrOfBins The best possible number of bins. Not less than 10 though
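A bin-width rule of this kind (Scott's rule with a robust spread estimate) can be sketched in base R as follows. The function name optimal_bins_sketch is hypothetical, and the robust standard deviation is assumed here to be min(sd, IQR / 1.349); the package's own robust estimator may differ.

```r
# Sketch of Scott's rule: bin width = 3.49 * robust_sd / n^(1/3)
optimal_bins_sketch <- function(data, min_bins = 10) {
  data <- data[is.finite(data)]
  n <- length(data)
  # Assumption: robust spread is the smaller of sd and the rescaled IQR
  robust_sd <- min(sd(data), IQR(data) / 1.349)
  bin_width <- 3.49 * robust_sd / n^(1 / 3)
  if (!is.finite(bin_width) || bin_width <= 0) return(min_bins)
  n_bins <- ceiling(diff(range(data)) / bin_width)
  max(n_bins, min_bins)              # never fewer than min_bins
}

set.seed(1)
x <- c(rnorm(1000), rnorm(2000) + 2, rnorm(1000) * 2 - 1)
n_bins <- optimal_bins_sketch(x)
breaks <- seq(min(x), max(x), length.out = n_bins + 1)  # fixed-width bins
```

The breaks vector can be passed to hist(x, breaks = breaks) to draw the histogram.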
This is the second version of the function previously available in AdaptGauss.
Alfred Ultsch, Michael Thrun
David W. Scott, Jerome P. Keating: A Primer on Density Estimation for the Great Home Run Race of 98, STATS 25, 1999, pp 16-22.
See Also
ParetoRadius
Data = c(rnorm(1000),rnorm(2000)+2,rnorm(1000)*2-1)
optNrOfBins = OptimalNoBins(Data)
minData = min(Data,na.rm = TRUE)
maxData = max(Data,na.rm = TRUE)
i = maxData-minData
optBreaks = seq(minData, maxData, i/optNrOfBins) # bins in fixed intervals
hist(Data, breaks=optBreaks)
This function estimates the Pareto density for the distribution of one variable. In the default setting, the function internally estimates the appropriate number and position of kernels to estimate the density properly. However, the user can set the kernels manually. In this case the density will only be estimated around these values, even if data exists outside the range of kernels or the internally estimated paretoRadius does not cover all data points between the kernels. See the example for details.
ParetoDensityEstimation(Data, paretoRadius, kernels = NULL, MinAnzKernels = 100,PlotIt=FALSE,Silent=FALSE)
Data |
[1:n] numeric vector of data. |
paretoRadius |
Optional scalar, numeric value, see |
kernels |
Optional, [1:m] numeric vector of data values at which the Pareto density is measured. If NULL (default), kernels will be computed. |
MinAnzKernels |
Optional, minimal number of kernels, default MinAnzKernels==100 |
PlotIt |
Optional, if TRUE: raw basic R plot of the density estimation for debugging purposes. Usually please use ggplot2 interface via |
Silent |
Optional, if TRUE: disables all warnings |
Pareto Density Estimation (PDE) is a method for the estimation of probability density functions using hyperspheres. The Pareto radius of the hyperspheres is derived from the optimization of information for minimal set size. It is shown that Pareto density is the best estimate for clusters of Gaussian structure. The method is shown to be robust when clusters overlap and when the variances differ across clusters. This is the best density estimation to judge Gaussian mixtures of the data, see [Ultsch, 2005].
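The hypersphere idea can be illustrated in one dimension with a simplified base R sketch: for each kernel position, count the data points within a fixed radius and normalise the counts so that the curve integrates to one. This is not the package implementation; the function name pde_sketch is hypothetical, and the radius is taken here simply as the 18th percentile of all pairwise distances.

```r
# Simplified fixed-radius density estimate in 1D (illustration only)
pde_sketch <- function(data, radius, kernels) {
  data <- data[is.finite(data)]
  # Number of data points inside the interval [k - radius, k + radius]
  counts <- vapply(kernels, function(k) sum(abs(data - k) <= radius), numeric(1))
  # Normalise with the trapezoidal rule so the estimate integrates to one
  area <- sum(diff(kernels) * (head(counts, -1) + tail(counts, -1)) / 2)
  counts / area
}

set.seed(42)
x <- c(rnorm(300), rnorm(600) + 2)
kernels <- seq(min(x), max(x), length.out = 100)
radius <- unname(quantile(as.vector(dist(x)), 0.18))
density_est <- pde_sketch(x, radius, kernels)
```

Plotting kernels against density_est with type = 'l' gives a curve comparable in spirit to the PDE plots, although the package chooses kernels and radius more carefully.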
If the input argument kernels is set manually, the output arguments paretoDensity_internal and kernels_internal provide the internally estimated density and kernels. Otherwise these arguments are NULL. The function provides a message if the range of kernels and the range of data do not overlap completely.
Typically it is not advisable to set paretoRadius manually. However, in specific cases the function ParetoRadius is used prior to calling this function. In such cases the input argument can take the previously estimated paretoRadius.
List with
kernels |
[1:m] numeric vector. Data values at which the Pareto density is measured. |
paretoDensity |
[1:m] numeric vector containing the density determined by paretoRadius. |
paretoRadius |
numeric value defining the radius |
kernels_internal |
Either NULL or the internally estimated [1:p] numeric vector of kernels if the input argument kernels was set by the user |
paretoDensity_internal |
Either NULL or the internally estimated density if the input argument kernels was set by the user |
This is the second version of the function previously available in AdaptGauss.
Michael Thrun
Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D.; Wernecke, K. D., (Eds), Innovations in classification, data science, and information systems, Proc Gfkl 2003, pp 91-100, Springer, Berlin, 2005.
#kernels are estimated internally
data = c(rnorm(1000),rnorm(2000)+2,rnorm(1000)*2-1)
pdeVal <- ParetoDensityEstimation(data)
plot(pdeVal$kernels,pdeVal$paretoDensity,type='l',xaxs='i', yaxs='i',xlab='Data',ylab='PDE')

##data exist outside of the range kernels
kernels=seq(from=-3,to=3,by=0.01)
pdeVal <- ParetoDensityEstimation(data, kernels=kernels)
plot(pdeVal$kernels,pdeVal$paretoDensity,type='l',xaxs='i', yaxs='i',xlab='Data',ylab='PDE')

#data exists in-between kernels that is not measured
pdeVal$paretoRadius#0.42
kernels=seq(from=-8,to=8,by=1)
pdeVal <- ParetoDensityEstimation(data, kernels=kernels)
plot(pdeVal$kernels,pdeVal$paretoDensity,type='l',xaxs='i', yaxs='i',xlab='Data',ylab='PDE')
Calculation of the ParetoRadius, i.e. the 18th percentile of all mutual Euclidean distances in data.
ParetoRadius(Data, maximumNrSamples = 10000, plotDistancePercentiles = FALSE)
Data |
numeric data vector |
maximumNrSamples |
Optional, numeric. Maximum number for which the distance calculation can be done. 10000 by default. |
plotDistancePercentiles |
Optional, logical. If TRUE, a plot of the percentiles of distances is produced. FALSE by default. |
The Pareto-radius of the hyperspheres is derived from the optimization of information for minimal set size. ParetoRadius() is a kernel density estimation for variable intervals. It works only on Data without missing values (NA) or NaN. In other cases, please use ParetoDensityEstimation directly.
numeric value, the Pareto radius.
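The percentile definition of the radius can be sketched in base R. This is a simplified illustration that only covers the percentile step; the function name pareto_radius_sketch and its max_samples default are hypothetical, and the package function may apply further adjustments.

```r
# Sketch: 18th percentile of all pairwise distances of a numeric vector
pareto_radius_sketch <- function(data, max_samples = 1000) {
  data <- data[is.finite(data)]
  # Subsample large inputs, since the number of distances grows quadratically
  if (length(data) > max_samples) data <- sample(data, max_samples)
  pairwise <- as.vector(dist(data))  # Euclidean distances, each pair once
  unname(quantile(pairwise, 0.18))
}

small_radius <- pareto_radius_sketch(c(0, 1, 2, 3))

set.seed(7)
x <- c(rnorm(500), rnorm(500) + 3)
radius <- pareto_radius_sketch(x)
```

The subsampling step mirrors the purpose of maximumNrSamples: it keeps the distance computation tractable for long vectors.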
This is the second version of the function previously available in AdaptGauss.
For larger datasets the quantile_c() function is used instead of quantile in R which was programmed by Dirk Eddelbuettel on Jun 6 and taken by the author from https://github.com/RcppCore/Rcpp/issues/967.
Michael Thrun
Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D.; Wernecke, K. D., (Eds), Innovations in classification, data science, and information systems, Proc Gfkl 2003, pp 91-100, Springer, Berlin, 2005.
See Also
ParetoDensityEstimation, OptimalNoBins
This function plots the ParetoDensityEstimation (PDE) and a robustly estimated Gaussian with empirical mean and variance.
PDEnormrobust(Data,xlab='PDE',ylab,main='PDEnormrobust', PlotSymbolPDE='blue', PlotSymbolGauss= 'magenta',PlotIt=TRUE, Mark2Sigma=FALSE,Mark3Sigma=FALSE, p_mean=10,p_sd=25,...)
Data |
numeric vector, data to be plotted. |
xlab |
Optional,see plot |
ylab |
Optional,see plot |
main |
Optional,see plot |
PlotSymbolPDE |
line color pdf |
PlotSymbolGauss |
line color robust gauss |
Mark2Sigma |
TRUE: sets two vertical lines marking data outside M+-1.96SD |
Mark3Sigma |
TRUE: sets two vertical lines marking data outside M+-2.576SD |
p_mean |
scalar between 1-99, percent of the top- and bottomcut from x |
p_sd |
scalar between 1-99, lowInnerPercentile for robustly estimated standard deviation |
... |
Further arguments for plot |
Within Mark2Sigma, 95 percent of data should be contained if the distribution is Gaussian.
Within Mark3Sigma, 99 percent of data should be contained if the distribution is Gaussian.
The 3-sigma rule is usually defined as M+-3SD containing 99.7 percent of data, but to simplify, the input parameter is called Mark3Sigma instead of Mark2comma576Sigma; the same reason applies to the output parameter Sigma3.
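The percentages quoted for the two markers follow from the standard normal distribution and can be checked in base R. The helper name gaussian_coverage is made up for this illustration.

```r
# Share of a Gaussian distribution lying within M +/- z * SD
gaussian_coverage <- function(z) pnorm(z) - pnorm(-z)

coverage_2sigma_mark <- gaussian_coverage(1.96)   # limits used by Mark2Sigma
coverage_3sigma_mark <- gaussian_coverage(2.576)  # limits used by Mark3Sigma
coverage_true_3sigma <- gaussian_coverage(3)      # classical 3-sigma rule

round(c(coverage_2sigma_mark, coverage_3sigma_mark, coverage_true_3sigma), 4)
```

The three values round to 0.95, 0.99 and 0.9973, which matches the 95, 99 and 99.7 percent stated above.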
Kernels |
numeric vector. The x points of the PDE function. |
ParetoDensity |
estimated pdf of data, numeric vector, the PDE(x). |
ParetoRadius |
numeric value, the Pareto Radius used for the plot. |
Normaldist |
pdf based on robustly estimated parameters |
Pars |
Named vector of robustly estimated |
Michael Thrun
data(MTY)
PDEnormrobust(unname(MTY))
This function plots the Pareto probability density estimation (PDE), uses PDEstimationForGauss and ParetoRadius.
PDEplot(Data, paretoRadius = 0, weight = 1, kernels = NULL, LogPlot = F, PlotIt = TRUE, title = "ParetoDensityEstimation(PDE)", color = "blue", xpoints = FALSE, xlim, ylim, xlab, ylab = "PDE", ggPlot = ggplot(), sampleSize = 2e+05, lwd = 2)
Data |
[1:n] numeric vector of data to be plotted. |
paretoRadius |
numeric, the Pareto Radius. If omitted, it is calculated by paretoRad. |
weight |
numeric, Weight*ParetoDensity is plotted. 1 by default. |
kernels |
numeric vector of kernels. Optional |
LogPlot |
LogLog PDEplot if TRUE, xpoints has to be FALSE. Optional |
PlotIt |
logical, if plot. TRUE by default. |
title |
character vector, title of plot. |
color |
character vector, color of plot. |
xpoints |
logical, if TRUE only points are plotted. FALSE by default. |
xlim |
Arguments to be passed to the plot method. |
ylim |
Arguments to be passed to the plot method. |
xlab |
Arguments to be passed to the plot method. |
ylab |
Arguments to be passed to the plot method. |
ggPlot |
ggplot2 object to be plotted upon. Insert an existing plot to add a new PDEplot to it. Default: empty plot |
sampleSize |
default(200000), sample size, if the data vector is too big |
lwd |
linewidth, see |
kernels |
numeric vector. The x points of the PDE function. |
paretoDensity |
numeric vector, the PDE(x). |
paretoRadius |
numeric value, the Pareto Radius used for the plot. |
ggPlot |
ggplot2 object. Can be used to further modify the plot or add other plots. |
Michael Thrun
Ultsch, A.: Pareto Density Estimation: A Density Estimation for Knowledge Discovery, Baier D., Wernecke K.D. (Eds), In Innovations in Classification, Data Science, and Information Systems - Proceedings 27th Annual Conference of the German Classification Society (GfKL) 2003, Berlin, Heidelberg, Springer, pp, 91-100, 2005.
x <- rnorm(1000, mean = 0.5, sd = 0.5)
y <- rnorm(750, mean = -0.5, sd = 0.75)
plt <- PDEplot(x, color = "red")$ggPlot
plt <- PDEplot(y, color = "blue", ggPlot = plt)$ggPlot

# Second Example
# ggplotObj=ggplot()
# for(i in 1:length(Variables))
# ggplotObj=PDEplot(Data[,i],ggPlot = ggplotObj)$ggPlot
The pie chart represents the amount of values given in data.
Piechart(Datavector,Names,Labels,MaxNumberOfSlices, main='',col,Rline=1,...)
Datavector |
[1:n] a vector of n non unique values |
Names |
Optional, [1:k] names to search for in Datavector, if not set |
Labels |
Optional, [1:k] Labels if they are specially named, if not Names are used. |
MaxNumberOfSlices |
Default is k, integer value defining how many labels will be shown. Everything else will be summed up to |
main |
Optional, title below the fan pie, see |
col |
Optional, the default are the first [1:k] colors of the default color sequence used in this package, otherwise a character vector of [1:k] specifying the colors analog to |
Rline |
Optional, the radius of the pie in numerical numbers |
... |
Optional, further arguments passed on to |
If the number of slices is higher than MaxNumberOfSlices, then ABCanalysis is applied (see [Ultsch/Lotsch, 2015]) and group A is chosen.
If the number of slices in group A is higher than MaxNumberOfSlices, then the most important ones out of group A are chosen.
If MaxNumberOfSlices is higher than the number of slices in group A, additional slices are shown depending on the percentage (from high to low).
Parameters of the visualization are set as defined in [Schwabish, 2014].
The color sequence is automatically shortened to the MaxNumberOfSlices used in the pie chart.
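The effect of limiting the number of slices can be sketched in base R. Note that this sketch deliberately replaces the ABC analysis with a plain "largest percentages first" selection, so it only illustrates the grouping of the remaining categories; the function name slice_percentages_sketch and the label "Other" are hypothetical.

```r
# Sketch: keep the max_slices largest categories, sum up the rest.
# Assumption: simple top-k selection instead of the ABC analysis.
slice_percentages_sketch <- function(values, max_slices, other_label = "Other") {
  counts <- sort(table(values), decreasing = TRUE)
  pct <- 100 * as.numeric(counts) / sum(counts)
  names(pct) <- names(counts)
  if (length(pct) > max_slices) {
    kept <- pct[seq_len(max_slices)]
    rest <- sum(pct[-seq_len(max_slices)])
    pct <- c(kept, setNames(rest, other_label))
  }
  pct
}

v <- c(rep("a", 50), rep("b", 30), rep("c", 15), rep("d", 5))
slices <- slice_percentages_sketch(v, max_slices = 2)
```

The resulting named vector could be handed to base R's pie() for a quick look at the grouped categories.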
Silent output, by calling invisible, of a list with
Percentages |
[1:k] percent values visualized in fanplot |
Labels |
[1:k] see input |
You see in the example below that a pie chart does not visualize such data well, contrary to the fanPlot.
Michael Thrun
[Schwabish, 2014] Schwabish, Jonathan A. An Economist's Guide to Visualizing Data. Journal of Economic Perspectives, 28 (1): 209-34. DOI: 10.1257/jep.28.1.209, 2014.
[Ultsch/Lotsch, 2015] Ultsch, A., Lotsch, J.: Computed ABC Analysis for Rational Selection of Most Informative Variables in Multivariate Data, PloS one, Vol. 10(6), pp. e0129767. doi 10.1371/journal.pone.0129767, 2015.
data(categoricalVariable)
Piechart(categoricalVariable)
Plots the Data matrix as a pixel colour image.
Pixelmatrix(Data, XNames, LowLim, HiLim, YNames, main,FillNotFiniteWithHighestValue=FALSE)
Data |
[1:n,1:d] Data cases in rows (n), variables in columns (d) |
LowLim |
Optional: limits for the color axis |
HiLim |
Optional: limits for the color axis |
XNames |
Optional: Vector - names for the X-ticks, NULL: no ticks at all |
YNames |
Optional: Vector - names for the Y-ticks, NULL: no ticks at all |
main |
Optional: String - Title of the plot |
FillNotFiniteWithHighestValue |
Optional: TRUE: fills not finite values with same color as the highest value |
Low values are shown in blue and green, middle values in yellow and high values in orange and red.
Michael Thrun, Felix Pape
data("Lsun3D")
Data=Lsun3D$Data
Pixelmatrix(Data)
A wrapper for Data with systematic clustering colors for either a 2D (x,y) or 3D (x,y,z) plot combined with a classification
Plot3D(Data,Cls,UniqueColors, size=2,na.rm=FALSE,Plotter3D="rgl",...)
Data |
[1:n,1:d] matrix with either |
Cls |
[1:n] numeric vector of the classification of data with |
UniqueColors |
[1:k] character vector of colors, if not given DataVisualizations::DefaultColorSequence is used |
size |
size of points, for plotly additional a vector [1:n] of a mapping of sizes to Cls has to be given in the (...) argument with |
na.rm |
if |
Plotter3D |
in case of 3 dimensions, choose either "plotly" or "rgl"; if one of these packages is not installed, the other one is selected as a fallback method |
... |
further arguments to be processed by |
For geom_point, only size and na.rm are available as further arguments.
Uses either geom_point for 2D, or plot3d or plot_ly for 3D.
Michael Thrun
RGL vignette in https://cran.r-project.org/package=rgl
#Spin3D similar output
data(Lsun3D)
Plot3D(Lsun3D$Data,Lsun3D$Cls,type='s',radius=0.1,box=FALSE,aspect=TRUE)
rgl::grid3d(c("x", "y", "z"))

#Projected Points with Classification
Data=cbind(runif(500,min=-3,max=3),rnorm(500))
# Classification
Cls=ifelse(Data[,1]>0,1,2)
Plot3D(Data,Cls,UniqueColors = DataVisualizations::DefaultColorSequence[c(1,3)],size=2)

## Not run:
#Points with Non-Overlapping Labels
#require(ggrepel)
Data=cbind(runif(30,min=-1,max=1),rnorm(30,0,0.5))
Names=paste0('VeryLongName',1:30)
ggobj=Plot3D(Data)
ggobj + geom_text_repel(aes(label=Names), size=3)
## End(Not run)
Plots a neighborhood graph in two dimensions given the 2D coordinates of the points.
PlotGraph2D(AdjacencyMatrix, Points, Cls, Colors, xlab = "X", ylab = "Y", xlim, ylim, Plotter = "native", LineColor = "grey", pch = 20, lwd = 0.1, main = "", mainSize)
AdjacencyMatrix |
[1:n,1:n] numerical matrix consisting of binary values. 1 indicates that two points have an edge, zero that they do not |
Points |
[1:n,1:2] numeric matrix of two features |
Cls |
[1:n] numeric vector of k classes, if not set per default every point is in first class |
Colors |
Optional, string defining the k colors, one per class |
xlab |
Optional, string for xlabel |
ylab |
Optional, string for ylabel |
xlim |
Optional, [1:2] vector of x-axis limits |
ylim |
Optional, [1:2] vector of y-axis limits |
Plotter |
Optional, either |
LineColor |
Optional, color of edges |
pch |
Optional, shape of point, usually in a range from zero to 25, see pch of plot for details |
lwd |
width of the lines |
main |
Optional, string for the title of plot |
mainSize |
Optional, scalar for the size of the title of plot |
The points are the vertices of the graph. The adjacency matrix defines the edges. Via the adjacency matrix, various graphs, e.g. from the deldir package, can be used.
native plot or plotly object depending on input argument Plotter
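How a binary adjacency matrix encodes the edges between 2D points can be shown with a small base R sketch. The four example points and the distance threshold of 1 are arbitrary choices for illustration, and no package function is called.

```r
# Four points on the corners of the unit square
points_xy <- cbind(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1))

# Connect two points if their Euclidean distance is at most 1
dists <- as.matrix(dist(points_xy))
adjacency <- (dists <= 1) * 1        # logical matrix converted to 0/1
diag(adjacency) <- 0                 # no self-loops

# Edge list: each undirected edge once, as (row, col) index pairs
edges <- which(adjacency == 1 & upper.tri(adjacency), arr.ind = TRUE)
```

Only the sides of the square become edges here, because the two diagonals are longer than the threshold. Each row of edges names the two vertices that a line segment would connect.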
Michael Thrun
Lecture of Knowledge Discovery II
N=10
x=runif(N)
y=runif(N)
Euklid=as.matrix(dist(cbind(x,y)))
Radius=quantile(as.vector(Euklid),0.5)
RKugelGraphAdjMatrix = matrix(0, ncol = N, nrow = N)
for (i in 1:N) {
  RInd = which(Euklid[i, ] <= Radius, arr.ind = TRUE)
  RKugelGraphAdjMatrix[i, RInd] = 1
}
PlotGraph2D(RKugelGraphAdjMatrix,cbind(x,y))
Percentage of missing values per feature are visualized as a bar plot.
PlotMissingvalues(Data,Names, WhichDefineMissing=c('NA','NaN','DUMMY','.',' '), PlotIt=TRUE, xlab='Amount Of Missing Values in Percent', xlim=c(0,100),...)
Data |
[1:n,1:d] data cases in rows, variables/features in columns |
Names |
[1:d] optional vector of strings describing the names of the features |
WhichDefineMissing |
[1:d] optional vector of strings describing missing values, useful for character features. Currently up to five different options are possible. |
PlotIt |
If FALSE: Does not plot |
xlab |
x label of bar plot |
xlim |
x axis limits in percent |
... |
Further arguments passed on to |
Plots the non-finite and missing values as a bar plot for each of the d features and invisibly returns the amount of missing values as a vector. Works even with character variables, but WhichDefineMissing cannot be changed in the current version. Please make a suggestion on GitHub on how to improve this.
Does not work with the tibble format, in such a case please call as.data.frame(as.matrix(Data))
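The quantity shown in the bar plot can be sketched in base R as follows. This is a minimal illustration under the assumption that non-finite numeric entries and the documented default strings count as missing; it is not the package's implementation, and the toy data frame is made up for the example.

```r
# Sketch (base R only): percentage of missing or non-finite entries per column.
Data <- data.frame(
  A = c(1, NA, 3, Inf),
  B = c("x", ".", "DUMMY", "y"),
  stringsAsFactors = FALSE
)
# same strings as the documented default of WhichDefineMissing
MissingCodes <- c("NA", "NaN", "DUMMY", ".", " ")

PercentMissing <- vapply(Data, function(col) {
  if (is.numeric(col)) {
    bad <- !is.finite(col)  # catches NA, NaN, Inf and -Inf
  } else {
    bad <- is.na(col) | as.character(col) %in% MissingCodes
  }
  100 * mean(bad)
}, numeric(1))

barplot(PercentMissing, horiz = TRUE, xlim = c(0, 100),
        xlab = "Amount Of Missing Values in Percent")
```

For input in the tibble format, the conversion as.data.frame(as.matrix(Data)) mentioned above would be applied before such a computation.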
Michael Thrun
data("ITS")
data("MTY")
PlotMissingvalues(cbind(ITS, MTY), Names = c('ITS', 'MTY'))
The product-ratio plot as defined in [Tukey, 1977, p. 594].
PlotProductratio(X, Y, na.rm = FALSE, main='Product Ratio Analysis',xlab = "Log of Ratio",ylab = "Root of Product", ...)
X |
[1:n] positive numerical vector, negative values are removed automatically |
Y |
[1:n] positive numerical vector, negative values are removed automatically |
na.rm |
The function may not work with non-finite values. If these cases should be removed automatically, set this parameter to TRUE |
main |
see |
ylab |
see |
xlab |
see |
... |
further arguments passed on to |
This plot is useful when there are many instances of very small values but only a small number of very large ones [Tukey, 1977, p. 615].
matrix[1:n,2] with sqrt(x*y) and log(x/y) as the two columns
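The two columns of the return value can be sketched in base R as shown below. The use of the natural logarithm and the column names are assumptions made for this illustration; the source only states sqrt(x*y) and log(x/y).

```r
# Sketch (base R only): root of product versus log of ratio for two positive vectors.
set.seed(42)
X <- rlnorm(100)
Y <- rlnorm(100)

# keep only finite, strictly positive pairs
keep <- is.finite(X) & is.finite(Y) & X > 0 & Y > 0
ProductRatio <- cbind(
  RootOfProduct = sqrt(X[keep] * Y[keep]),
  LogOfRatio    = log(X[keep] / Y[keep])  # natural log, an assumption
)

plot(ProductRatio[, "LogOfRatio"], ProductRatio[, "RootOfProduct"],
     xlab = "Log of Ratio", ylab = "Root of Product",
     main = "Product Ratio Analysis")
```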
Michael Thrun
[Tukey, 1977] Tukey, J. W.: Exploratory data analysis, United States Addison-Wesley Publishing Company, ISBN: 0-201-07616-0, 1977.
# Beware: the data does not fit the requirements for this approach
data('ITS')
data(MTY)
PlotProductratio(ITS, MTY)
Defines the default color sequence for plots made with PDEscatter
data("PmatrixColormap")
Returns the vectors for a (heat) colormap.
Quantile-quantile plot with a linear fit
QQplot(X,Y,Type=8,NoQuantiles=10000,xlab, ylab,col="red",main='', lwd=3,pch=20,subplot=FALSE,...)
X |
[1:n] numerical vector, First Feature |
Y |
[1:n] numerical vector, Second Feature to compare first feature with |
Type |
an integer between 1 and 9 selecting one of the nine quantile algorithms detailed in |
NoQuantiles |
number of quantiles used in QQ-plot, if number is low and the data has outliers, there may be empty space visible in the plot |
xlab |
x label, see |
ylab |
y label, see |
col |
color of line, see |
main |
title of plot, see |
lwd |
line width of plot, see |
pch |
type of point, see |
subplot |
FALSE: par is set specifically, TRUE: assumption is the usage as a subfigure, par has to be set by the user, no checks are performed, labels have to be set by the user |
... |
other parameters for |
Output is the evaluation of a linear (regression) fit of lm, called 'line', and a quantile-quantile plot (QQ plot). By default 10,000 quantiles are chosen, but for very large data vectors one can reduce the number of quantiles for faster computation.
The 100 percentiles used for the regression line are of darker blue than the quantiles chosen by the user.
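The idea of comparing quantiles and fitting a line through them can be sketched in base R as follows. This is a simplified illustration, not the function's implementation: the probability grid, the simulated data and the variable names are assumptions, and only quantile type 8 is taken from the documented default.

```r
# Sketch (base R only): quantile-quantile comparison with a linear fit.
set.seed(7)
X <- rnorm(5000)
Y <- rnorm(5000, mean = 1, sd = 2)

probs <- seq(0.001, 0.999, length.out = 999)  # hypothetical probability grid
qx <- quantile(X, probs, type = 8, names = FALSE)
qy <- quantile(Y, probs, type = 8, names = FALSE)

fit <- lm(qy ~ qx)  # the regression line through the quantile pairs

plot(qx, qy, pch = 20, xlab = "Quantiles of X", ylab = "Quantiles of Y")
abline(fit, col = "red", lwd = 3)
coef(fit)  # intercept and slope approximate the shift and scale between X and Y
```

If both samples come from the same distribution family up to location and scale, the points lie close to the fitted line.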
List with
Quantiles |
[1:NoQuantiles,1:2] quantiles in x and y |
Residuals |
Output of the Regression with |
Summary |
Output of the Regression with |
Anova |
Output of the Regression with |
Michael Thrun
Michael, J. R.: The stabilized probability plot, Biometrika, Vol. 70(1), pp. 11-17, 1983.
data(MTY)
NormalDistribution = rnorm(50000)
QQplot(NormalDistribution, MTY)
Transforms the robust normalization back; applicable if Capped=FALSE.
RobustNorm_BackTrafo(TransformedData, MinX,Denom,Center=0)
TransformedData |
[1:n,1:d] matrix |
MinX |
scalar |
Denom |
scalar |
Center |
scalar |
For details see RobustNormalization
[1:n,1:d] Data matrix
Michael Thrun
data(Lsun3D)
Data = Lsun3D$Data
TransList = RobustNormalization(Data, Centered = TRUE, WithBackTransformation = TRUE)
Lsun3DData = RobustNorm_BackTrafo(TransList$TransformedData, TransList$MinX, TransList$Denom, TransList$Center)
sum(Lsun3DData - Data) # <e-15
RobustNormalization as described in [Milligan/Cooper, 1988].
RobustNormalization(Data,Centered=FALSE,Capped=FALSE, na.rm=TRUE,WithBackTransformation=FALSE, pmin=0.01,pmax=0.99)
Data |
[1:n,1:d] data matrix of n cases and d features |
Centered |
If TRUE, data are centered around zero by the median |
Capped |
TRUE: outliers above 1 or below -1 are capped and set to 1 or -1. |
na.rm |
If TRUE, infinite values are disregarded |
WithBackTransformation |
If in the case for forecasting with neural networks a backtransformation is required, this parameter can be set to 'TRUE'. |
pmin |
defines outliers on the lower end of scale |
pmax |
defines outliers on the higher end of scale |
Normalizes features either between -1 and 1 (Centered=TRUE) or between 0 and 1 (Centered=FALSE) without changing the distribution of a feature itself. For a more precise description please read [Thrun, 2018, p.17].
"[The] scaling of the inputs determines the effective scaling of the weights in the last layer of a MLP with BP neural network, it can have a large effect on the quality of the final solution. At the outset it is best to standardize all inputs to have mean zero and standard deviation 1 [(or at least the range under 1)]. This ensures all inputs are treated equally in the regularization process, and allows to choose a meaningful range for the random starting weights."[Friedman et al., 2012]
If WithBackTransformation=FALSE: TransformedData[1:n,1:d], i.e., the normalized data matrix of n cases and d features.
If WithBackTransformation=TRUE: List with
TransformedData |
[1:n,1:d] normalized data matrix of n cases and d features |
MinX |
[1:d] numerical vector used for manual back-transformation of each feature |
MaxX |
[1:d] numerical vector used for manual back-transformation of each feature |
Denom |
[1:d] numerical vector used for manual back-transformation of each feature |
Center |
[1:d] numerical vector used for manual back-transformation of each feature |
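How such values might be used for a manual back-transformation can be sketched in base R for a single feature. The sketch assumes that the normalization maps the pmin and pmax quantiles to 0 and 1 without centering or capping; the exact formula of the package may differ, and the function name RobustNormSketch is hypothetical.

```r
# Sketch (base R only): quantile-based scaling of one feature and its inverse.
# Assumption: x is mapped so that the pmin quantile becomes 0 and the pmax quantile becomes 1.
RobustNormSketch <- function(x, pmin = 0.01, pmax = 0.99) {
  q <- quantile(x, c(pmin, pmax), na.rm = TRUE, names = FALSE)
  list(TransformedData = (x - q[1]) / (q[2] - q[1]),
       MinX = q[1],
       Denom = q[2] - q[1])
}

set.seed(3)
x <- rnorm(1000, 2, 100)
res <- RobustNormSketch(x)

# manual back-transformation from the stored values
xback <- res$TransformedData * res$Denom + res$MinX
max(abs(xback - x))  # numerically close to zero
```

Because the mapping is affine, the shape of the distribution is unchanged; only values outside the chosen quantiles fall below 0 or above 1.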
Michael Thrun
[Milligan/Cooper, 1988] Milligan, G. W., & Cooper, M. C.: A study of standardization of variables in cluster analysis, Journal of Classification, Vol. 5(2), pp. 181-204. 1988.
[Friedman et al., 2012] Friedman, J., Hastie, T., & Tibshirani, R.: The Elements of Statistical Learning, (Second ed. Vol. 1), Springer Series in Statistics, New York, NY, USA, 2012.
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, doi:10.1007/978-3-658-20540-9, 2018.
Scaled = RobustNormalization(rnorm(1000, 2, 100), Capped = TRUE)
hist(Scaled)
m = cbind(c(1, 2, 3), c(2, 6, 4))
List = RobustNormalization(m, FALSE, FALSE, FALSE, TRUE)
TransformedData = List$TransformedData
mback = RobustNorm_BackTrafo(TransformedData, List$MinX, List$Denom, List$Center)
sum(m - mback)
ROC
ROC(Data, Cls, Names, Colors)
Data |
[1:n, 1:d] numeric vector or matrix of scores to be evaluated with ROC. |
Cls |
[1:n] numeric vector with true classes. |
Names |
[1:d] character vector with names for scores. |
Colors |
[1:d] character vector with colors for scores. |
ROCit |
List of ROCit results for each score column in Data. |
Plot |
Plotly object. |
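What an ROC curve of a single score column represents can be sketched in base R. This is an illustration only and does not use the ROCit results returned by the function; the simulated scores and all variable names are assumptions for the example.

```r
# Sketch (base R only): empirical ROC curve and AUC for one score vector.
set.seed(2)
Cls <- sample(c(0, 1), 1000, replace = TRUE)
Score <- runif(1000) + 0.5 * Cls  # class 1 tends to receive higher scores

ord <- order(Score, decreasing = TRUE)  # sweep the threshold from high to low
pos <- Cls[ord] == 1
TPR <- c(0, cumsum(pos) / sum(pos))     # true positive rate
FPR <- c(0, cumsum(!pos) / sum(!pos))   # false positive rate

# area under the curve by the trapezoidal rule
AUC <- sum(diff(FPR) * (head(TPR, -1) + tail(TPR, -1)) / 2)

plot(FPR, TPR, type = "l", xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # chance level
```

Scores without any relation to the classes, as in the example of this help page, would yield a curve close to the dashed diagonal.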
Quirin Stier
Data = runif(1000, 0, 1)
Cls = sample(c(0, 1), 1000, replace = TRUE)
ROC(Data, Cls)
Draws a Shepard diagram (scatter plot of distances) with a two-dimensional PDE density estimation.
ShepardDensityScatter(InputDists, OutputDists, Plotter= "native", Type = "DDCAL", DensityEstimation="SDH", Marginals = FALSE, xlab='Input Distances', ylab='Output Distances',main='ProjectionMethod', sampleSize=500000)
InputDists |
[1:n,1:n] with n cases of data in d variables/features: Matrix containing the distances of the input space. |
OutputDists |
[1:n,1:n] with n cases of data in d dimensions of the projection method: Matrix containing the distances of the output space. |
Plotter |
Optional, either |
Type |
Optional, either |
DensityEstimation |
Optional, use either |
Marginals |
Optional, either TRUE (draw Marginals) or FALSE (do not draw Marginals) |
xlab |
Label of the x axis in the resulting Plot. |
ylab |
Label of the y axis in the resulting Plot. |
main |
Title of the Shepard diagram |
sampleSize |
Optional, default (500000), reduces the amount of data used for density estimation if too many distances are given |
Introduced and described in [Thrun, 2018, p. 63] with examples in [Thrun, 2018, p. 71-72]
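The data behind a Shepard diagram can be sketched in base R without any density estimation. The following illustration uses simulated data and classical multidimensional scaling as the projection; both are assumptions made for the example, and the density coloring of this function is not reproduced.

```r
# Sketch (base R only): pairs of input and output distances of a projection.
set.seed(11)
Data <- matrix(rnorm(300), ncol = 3)
InputDists <- as.matrix(dist(Data))
Projected <- cmdscale(InputDists, k = 2)  # classical MDS to two dimensions
OutputDists <- as.matrix(dist(Projected))

# each pair of cases is used once: take the upper triangle of both matrices
ut <- upper.tri(InputDists)
din <- InputDists[ut]
dout <- OutputDists[ut]

plot(din, dout, pch = ".", xlab = "Input Distances", ylab = "Output Distances",
     main = "MDS")
abline(0, 1, col = "red")  # a perfect distance preservation lies on this line
cor(din, dout, method = "spearman")
```

With many cases the number of pairs grows quadratically, which is why a density estimation of the point cloud becomes helpful.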
Michael Thrun
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, ISBN: 978-3-658-20540-9, Heidelberg, 2018.
data("Lsun3D")
Cls = Lsun3D$Cls
Data = Lsun3D$Data
InputDist = as.matrix(dist(Data))
res = stats::cmdscale(d = InputDist, k = 2, eig = TRUE, add = FALSE, x.ret = FALSE)
ProjectedPoints = as.matrix(res$points)
ShepardDensityScatter(InputDist, as.matrix(dist(ProjectedPoints)), main = 'MDS')
ShepardDensityScatter(InputDist[1:100, 1:100], as.matrix(dist(ProjectedPoints))[1:100, 1:100], main = 'MDS')
This function plots a Shepard diagram, which is a scatter plot of InputDists against OutputDists.
Sheparddiagram(InputDists, OutputDists, xlab = "Input Distances", ylab= "Output Distances", fancy = F, main = "ProjectionMethod", gPlot = ggplot())
InputDists |
[1:n,1:n] with n cases of data in d variables/features: Matrix containing the distances of the input space. |
OutputDists |
[1:n,1:n] with n cases of data in d dimensions of the projection method: Matrix containing the distances of the output space. |
xlab |
Label of the x axis in the resulting Plot. |
ylab |
Label of the y axis in the resulting Plot. |
fancy |
Set FALSE for PC and TRUE for publication |
main |
Title of the Shepard diagram |
gPlot |
ggplot2 object to plot upon. |
ggplot2 object containing the plot.
Michael Thrun
data("Lsun3D")
Cls = Lsun3D$Cls
Data = Lsun3D$Data
InputDist = as.matrix(dist(Data))
res = stats::cmdscale(d = InputDist, k = 2, eig = TRUE, add = FALSE, x.ret = FALSE)
ProjectedPoints = as.matrix(res$points)
Sheparddiagram(InputDist, as.matrix(dist(ProjectedPoints)), main = 'MDS')
Computes the Signed Log of Data
SignedLog(Data,Base="Ten")
Data |
[1:n,1:d] Data matrix with n cases and d variables |
Base |
Either "Ten", "Two", "Zero", or any number. |
A neat transformation for data if it has a better representation on the log scale.
Transformed Data
Number Selections for Base
for 2,10, "Two" or "Ten" add 1 to every datapoint as defined in the lectures.
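A signed logarithm with the offset of 1 mentioned above can be sketched in base R as follows. The formula sign(x) * log(|x| + 1) is an assumption about the definition used in the lectures, and the function name SignedLogSketch is hypothetical.

```r
# Sketch (base R only): signed logarithm that keeps the sign and is defined at zero.
# Assumption: 1 is added to the absolute value before taking the logarithm.
SignedLogSketch <- function(Data, base = 10) {
  sign(Data) * log(abs(Data) + 1, base = base)
}

SignedLogSketch(c(-99, 0, 9))            # base 10
SignedLogSketch(c(-3, 0, 7), base = 2)   # base 2
```

Adding 1 maps zero to zero and keeps the transformation monotone across negative and positive values.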
Michael Thrun
Prof. Dr. habil. A. Ultsch, Lectures in Knowledge Discovery, 2014.
# sampling is done
# because otherwise the example takes too long
# in the CRAN check
data('ITS')
data('MTY')
ind = sample(length(ITS), 1000)
MDplot(SignedLog(cbind(ITS[ind], MTY[ind]) * (-1), Base = "Ten"))
Silhouette plot of cluster silhouettes for the n-by-d data matrix Data or distance matrix where the clusters are defined in the vector Cls.
Silhouetteplot(DataOrDistances, Cls, method='euclidean', PlotIt=TRUE,...)
DataOrDistances |
[1:n,1:d] data matrix with cases in rows and variables in columns if not symmetric, or [1:n,1:n] distance matrix if symmetric |
Cls |
numeric vector, [1:n,1] classified data |
method |
Optional if a data matrix is used, one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given, see |
PlotIt |
Optional, Default: TRUE, FALSE to suppress the plot |
... |
If |
"The Silhouette plot is a common unsupervised index for visual evaluation of a clustering [L. R. Kaufman/Rousseeuw, 2005] [introduced in [Rousseeuw, 1987]]. A reasonable clustering is characterized by a silhouette width of greater than 0.5, and an average width below 0.2 should be interpreted as indicating a lack of any substantial cluster structure [Everitt et al., 2001, p. 105]. However, it is evident that silhouette scores assume clusters that are spherical or Gaussian in shape [Herrmann, 2011, pp. 91-92]" [Thrun, 2018, p. 29].
silh |
Silhouette values in a N-by-1 vector |
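The silhouette values themselves can be sketched in base R using the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)) from [Rousseeuw, 1987]. The function name SilhouetteSketch and the simulated data are assumptions made for this illustration; it is not the package's implementation.

```r
# Sketch (base R only): silhouette width per case from a distance matrix.
SilhouetteSketch <- function(D, Cls) {
  n <- nrow(D)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- Cls == Cls[i]
    own[i] <- FALSE                       # exclude the case itself
    if (!any(own)) { s[i] <- 0; next }    # singleton cluster: width 0 by convention
    a <- mean(D[i, own])                  # mean distance within the own cluster
    others <- setdiff(unique(Cls), Cls[i])
    # smallest mean distance to any other cluster
    b <- min(vapply(others, function(k) mean(D[i, Cls == k]), numeric(1)))
    s[i] <- (b - a) / max(a, b)
  }
  s
}

set.seed(5)
Data <- rbind(matrix(rnorm(40, 0), ncol = 2), matrix(rnorm(40, 6), ncol = 2))
Cls <- rep(1:2, each = 20)
silh <- SilhouetteSketch(as.matrix(dist(Data)), Cls)
mean(silh)  # average silhouette width of the clustering
```

For two compact, well separated groups as simulated here, the average width is high; elongated or non-spherical clusters can score poorly even when the cluster structure is clear.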
Onno Hansen-Goos, Michael Thrun
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, ISBN: 978-3-658-20539-3, Heidelberg, 2018.
[Rousseeuw, 1987] Rousseeuw, Peter J.: Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis, Computational and Applied Mathematics, 20, p.53-65, 1987.
data("Lsun3D")
Cls = Lsun3D$Cls
Data = Lsun3D$Data
# clear cluster structure
plot(Data[, 1:2], col = Cls)
# However, the silhouette plot does not indicate a very good clustering in cluster 1 and 2
Silhouetteplot(Data, Cls = Cls, main = 'Silhouetteplot')
ABC analysis improved slope chart
Slopechart(FirstDatavector, SecondDatavector, Names, Labels, MaxNumberOfSlices, TopLabels=c('FirstDatavector','SecondDatavector'), main='Comparision of Descending Frequency')
FirstDatavector |
[1:n] a vector of n non-unique values - a feature |
SecondDatavector |
[1:m] a vector of m non-unique values - a second feature |
Labels |
Optional, [1:k] Labels if they are specially named, if not Names are used. |
Names |
[1:k] names to search for in Datavector, if not set |
MaxNumberOfSlices |
Default is k, integer value defining how many labels will be shown. Everything else will be summed up to |
TopLabels |
Labels of the feature names |
main |
title of the plot |
Still experimental.
Silent output, returned by calling invisible, of a list with
Percentages |
[1:k] percent values visualized in fanplot |
Labels |
[1:k] see input |
Michael Thrun
[Gohil, 2015] Gohil, Atmajitsinh. R data Visualization cookbook. Packt Publishing Ltd, 2015.
## will follow
This function makes it possible to replace the default density estimation for ggplot2 plots with the Pareto density estimation [Ultsch, 2005]. It is used for the PDE-optimized violin plot published in [Thrun et al, 2018].
stat_pde_density(mapping = NULL, data = NULL, geom = "violin", position = "dodge", ..., trim = TRUE, scale = "area", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
geom |
The geometric object to use to display the data |
position |
Position adjustment, either as a string, or the result of a call to a position adjustment function. |
... |
Other arguments passed on to |
trim |
This parameter only matters if you are displaying multiple densities in one plot. If 'FALSE', each density is computed on the full range of the data. If 'TRUE' (the default in the usage above), each density is computed over the range of that group: this typically means the estimated x values will not line up, and hence you won't be able to stack density values. |
scale |
When used with geom_violin: if "area" (default), all violins have the same area (before trimming the tails). If "count", areas are scaled proportionally to the number of observations. If "width", all violins have the same maximum width. |
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends? |
inherit.aes |
If |
Pareto Density Estimation (PDE) is a method for the estimation of probability density functions using hyperspheres. The Pareto radius of the hyperspheres is derived from the optimization of information for minimal set size. It is shown that Pareto density is the best estimate for clusters of Gaussian structure. The method is shown to be robust when clusters overlap and when the variances differ across clusters.
Felix Pape
Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D.; Werrnecke, K. D., (Eds), Innovations in classification, data science, and information systems, Proc Gfkl 2003, pp 91-100, Springer, Berlin, 2005.
[Thrun et al, 2018] Thrun, M. C., Pape, F., & Ultsch, A. : Benchmarking Cluster Analysis Methods using PDE-Optimized Violin Plots, Proc. European Conference on Data Analysis (ECDA), accepted, Paderborn, Germany, 2018.
[ggplot2]stat_density
miris <- reshape2::melt(iris)
ggplot2::ggplot(miris, mapping = ggplot2::aes(y = .data$value, x = .data$variable)) +
  ggplot2::geom_violin(stat = "PDEdensity")
Density Estimation for ggplot with a clear model behind it.
The format is: Classes 'StatPDEdensity', 'Stat', 'ggproto' <ggproto object: Class StatPDEdensity, Stat> aesthetics: function compute_group: function compute_layer: function compute_panel: function default_aes: uneval extra_params: na.rm finish_layer: function non_missing_aes: parameters: function required_aes: x y retransform: TRUE setup_data: function setup_params: function super: <ggproto object: Class Stat>
PDE was published in [Ultsch, 2005], short explanation in [Thrun, Ultsch 2018] and the PDE optimized violin plot was published in [Thrun et al., 2018].
[Ultsch,2005] Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D.; Werrnecke, K. D., (Eds), Innovations in classification, data science, and information systems, Proc Gfkl 2003, pp 91-100, Springer, Berlin, 2005.
[Thrun, Ultsch 2018] Thrun, M. C., & Ultsch, A. : Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech,, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
[Thrun et al, 2018] Thrun, M. C., Pape, F., & Ultsch, A. : Benchmarking Cluster Analysis Methods using PDE-Optimized Violin Plots, Proc. European Conference on Data Analysis (ECDA), accepted, Paderborn, Germany, 2018.
Robust empirical estimation of the standard deviation. NaNs are ignored.
Stdrobust(x, lowInnerPercentile=25,na.rm=TRUE)
x |
a numerical matrix |
lowInnerPercentile |
optional; default=25; the standard deviation is approximated by the percentile interval. |
na.rm |
a boolean evaluating to TRUE or FALSE indicating whether all non-finite values should be stripped before the computation proceeds. |
out |
a vector with the calculated standard deviation for the column |
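One common way to approximate a standard deviation from a percentile interval can be sketched in base R. The rescaling by the matching quantiles of the standard normal distribution is an assumption about how the approximation works, and the function name StdRobustSketch is hypothetical.

```r
# Sketch (base R only): standard deviation approximated from an inner percentile interval.
# Assumption: the interval width is divided by the width of the same interval
# under a standard normal distribution.
StdRobustSketch <- function(x, lowInnerPercentile = 25) {
  p <- lowInnerPercentile / 100
  q <- quantile(x, c(p, 1 - p), na.rm = TRUE, names = FALSE)
  (q[2] - q[1]) / (qnorm(1 - p) - qnorm(p))
}

set.seed(9)
x <- c(rnorm(10000, sd = 3), 1e6)  # normal data with one extreme outlier
StdRobustSketch(x)  # stays close to the standard deviation of the bulk of the data
sd(x)               # inflated by the single outlier
```

For a matrix, such a function could be applied per column, for example with apply(x, 2, StdRobustSketch).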
Zornitsa Manolova
world_country_polygons shapefile
data("world_country_polygons")
world_country_polygons stores data objects using classes defined in the sp package or inheriting from those classes, updated to sp >= 1.4 and rgdal >= 1.5.
Since DataVisualizations version 1.2.1, it stores a CRS object with a comment containing a WKT2 CRS representation, thanks to a suggestion by Roger Bivand.
Note that the rebuilt CRS object contains a revised version of the input Proj4 string as well as the WKT2 string, and may be used with both older and newer versions of sp. See maptools package for further details. Also note that since sp >= 2.0 maptools and rgdal were deprecated without change to the workflow. See terra for an alternative to maptools.
Hamza Tayyab, Michael Thrun
maptools package
data(world_country_polygons)
str(world_country_polygons)
The Worldmap function is used in [Thrun, 2018].
Worldmap(CountryCodes, Cls, Colors, MissingCountryColor = grDevices::gray(0.8), ...)
CountryCodes |
[1:n] vector of characters identifying countries by ISO 3166 codes (2 or 3 letters) |
Cls |
[1:n] numerical vector of classification |
Colors |
optional, vector of characters specifying the used colors |
MissingCountryColor |
if not all countries are specified in |
... |
Further arguments passed on to |
List of
Colors |
[1:m] colors used in map, m<=n |
CountryCodeList |
[1:m] countries found, m<=n |
world_country_polygons |
|
Michael Thrun
Used in
[Thrun, 2018] Thrun, M. C. : Cluster Analysis of the World Gross-Domestic Product Based on Emergent Self-Organization of a Swarm, 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, Foundation of the Cracow University of Economics, Zakopane, Poland, accepted, 2018.
Source for shapefile: package maptools and
Originally 'mappinghacks.com/data/TM_WORLD_BORDERS_SIMPL-0.2.zip', now available from https://github.com/nasa/World-Wind-Java/tree/master/WorldWind/testData/shapefiles
# data from [Thrun, 2018] Cls=c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L ) Codes=c("AFG", "AGO", "ALB", "ARG", "ATG", "AUS", "AUT", "BDI", "BEL", "BEN", "BFA", "BGD", "BGR", "BHR", "BHS", "BLZ", "BMU", "BOL", "BRA", "BRB", "BRN", "BTN", "BWA", "CAF", "CAN", "CH2", "CHE", "CHL", "CHN", "CIV", "CMR", "COG", "COL", "COM", "CPV", "CRI", "CUB", "CYP", "DJI", "DMA", "DNK", "DOM", "DZA", "ECU", "EGY", "ESP", "ETH", "FIN", "FJI", "FRA", "FSM", "GAB", "GBR", "GER", "GHA", "GIN", "GMB", "GNB", "GNQ", "GRC", "GRD", "GTM", "GUY", "HKG", "HND", "HTI", "HUN", "IDN", "IND", "IRL", "IRN", "IRQ", "ISL", "ISR", "ITA", "JAM", "JOR", "JPN", "KEN", "KHM", "KIR", "KNA", "KOR", "LAO", "LBN", "LBR", "LCA", "LKA", "LSO", "LUX", "MAC", "MAR", "MDG", "MDV", "MEX", "MHL", "MLI", "MLT", "MNG", "MOZ", "MRT", "MUS", "MWI", "MYS", "NAM", "NER", "NGA", "NIC", "NLD", "NOR", "NPL", "NZL", "OMN", "PAK", "PAN", "PER", "PHL", "PLW", "PNG", "POL", "PRI", "PRT", "PRY", "ROM", "RWA", "SDN", "SEN", "SGP", "SLB", "SLE", "SLV", "SOM", "STP", "SUR", "SWE", "SWZ", "SYC", "SYR", "TCD", "TGO", "THA", "TON", "TTO", "TUN", "TUR", "TWN", "TZA", "UGA", "URY", "USA", "VCT", "VEN", "VNM", "VUT", "WSM", "ZAF", "ZAR", "ZMB", "ZWE") Worldmap(Codes,Cls)
Plots z above the xy plane as a 3D mountain or as 2D contour lines
zplot(x, y, z, DrawTopView = TRUE, NrOfContourLines = 20, TwoDplotter = "native", xlim, ylim)
x |
Vector of x-coordinates of the data. If y and z are missing: Matrix containing 3 rows, one for each coordinate |
y |
Vector of y-coordinates of the data. |
z |
Vector of z-coordinates of the data. |
DrawTopView |
Optional: Boolean, if TRUE plots contours, otherwise a 3D plot. Default: TRUE |
NrOfContourLines |
Optional: Numeric. Only used when DrawTopView == True. Number of lines to be drawn in 2D contour plots. Default: 20 |
TwoDplotter |
Optional: String indicating which backend to use for plotting. Possible Values: 'ggplot', 'native', 'plotly' |
xlim |
[1:2] scalar vector setting the limits of x-axis |
ylim |
[1:2] scalar vector setting the limits of y-axis |
If the plotting backend supports it, this will return a handle for the generated plot.
Felix Pape