Title: | Quality Measurements for Dimensionality Reduction |
---|---|
Description: | Several quality measurements for investigating the performance of dimensionality reduction methods are provided here. In addition a new quality measurement called Gabriel classification error is made accessible. |
Authors: | Quirin Stier [aut], Florian Lerch [ctb], Julian Märte [aut], Hermann Tafo [ctb], Laukert Schlichting [ctb], Michael Thrun [aut, cph, cre] |
Maintainer: | Michael Thrun <[email protected]> |
License: | GPL-3 |
Version: | 0.2.1 |
Built: | 2024-10-09 05:00:53 UTC |
Source: | https://github.com/mthrun/drquality |
Compares projected points to a given prior classification using a k-nearest-neighbor (knn) classifier.
ClassificationError(OutputDistances,Cls,k=5)
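OutputDistances |
[1:n,1:n] numeric matrix of pairwise distances between the projected points |
Cls |
[1:n] numeric vector of class labels (the prior classification) |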
k |
Number of nearest neighbors; set to 5 in [Venna 2010], which is also the default here |
Projected points are evaluated by k-nearest-neighbor classification accuracy (with k = 5): each sample in the visualization is classified by majority vote of its k nearest neighbors in the visualization, and the resulting classification is compared to the ground truth label [Venna 2010].
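The majority-vote procedure described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the function name knn_classification_error is hypothetical, the point itself is assumed to be excluded from its own neighborhood, and ties in the vote are assumed to go to the first label in table order.

```r
# Hypothetical helper: knn majority vote on a precomputed distance matrix.
knn_classification_error <- function(output_distances, cls, k = 5) {
  n <- nrow(output_distances)
  predicted <- vapply(seq_len(n), function(i) {
    # Indices of the k nearest other points (assumption: the point itself is excluded)
    neighbors <- setdiff(order(output_distances[i, ]), i)[seq_len(k)]
    votes <- table(cls[neighbors])
    # Majority vote; ties go to the first label in table order (assumption)
    names(votes)[which.max(votes)]
  }, character(1))
  accuracy <- mean(predicted == as.character(cls))
  list(Error = 1 - accuracy, Accuracy = accuracy, KNNCls = predicted)
}

# Two well-separated groups of six points on a line
x <- c(seq(0, 0.5, by = 0.1), seq(10, 10.5, by = 0.1))
cls <- rep(1:2, each = 6)
res <- knn_classification_error(as.matrix(dist(x)), cls, k = 5)
res$Error
```

The returned list mirrors the Error, Accuracy and KNNCls elements listed for ClassificationError, except that the sketch returns the predicted labels as character strings.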
Error |
Classification Error: 1-Accuracy[1] |
Accuracy |
Accuracy |
KNNCls |
[1:n] classification assigned by the knn classifier |
Here, the OutputDistances of the projected points are used.
Michael Thrun
Venna, J., Peltonen, J., Nybo, K., Aidos, H., and Kaski, S. Information retrieval perspective to nonlinear dimensionality reduction for data visualization. The Journal of Machine Learning Research, 11, 451-490. (2010)
Gracia, A., Gonzalez, S., Robles, V., and Menasalvas, E. A methodology to compare Dimensionality Reduction algorithms in terms of loss of quality. Information Sciences, 270, 1-27. (2014)
if(requireNamespace("FCPS")){
  data(Hepta,package="FCPS")
  projection=cmdscale(dist(Hepta$Data), k=2)
  ClassificationError(as.matrix(dist(projection)),Hepta$Cls)
}
Calculates the C-measure subtypes minimal path length and minimal wiring.
Data |
[1:n,1:d] numerical matrix of points in input space. |
Projection |
[1:n,1:2] numerical matrix of points in output space. |
k |
Number of nearest neighbors; both measures always use k=1. |
[1:2] Numerical vector of MinimalPathlength and MinimalWiring values.
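As a rough illustration of the two subtypes, the sketch below uses one common reading of them: minimal path length sums input-space distances over nearest neighbors in the output space, and minimal wiring sums output-space distances over nearest neighbors in the input space. That reading, the function name cmeasure_sketch and the absence of any normalization are assumptions; the package's Cmeasure may differ in detail.

```r
# Hypothetical helper, not the package's Cmeasure().
cmeasure_sketch <- function(data, projection, k = 1) {
  d_in <- as.matrix(dist(data))
  d_out <- as.matrix(dist(projection))
  n <- nrow(d_in)
  # Logical mask: entry [i, j] is TRUE if j is among the k nearest neighbors of i
  knn_mask <- function(d) {
    mask <- matrix(FALSE, n, n)
    for (i in seq_len(n)) {
      nearest <- setdiff(order(d[i, ]), i)[seq_len(k)]
      mask[i, nearest] <- TRUE
    }
    mask
  }
  c(
    # Neighbors taken in the output space, distances measured in the input space
    MinimalPathlength = sum(d_in[knn_mask(d_out)]),
    # Neighbors taken in the input space, distances measured in the output space
    MinimalWiring = sum(d_out[knn_mask(d_in)])
  )
}

# Three points on a line, "projected" onto themselves
pts <- matrix(c(0, 1, 3), ncol = 1)
cm <- cmeasure_sketch(pts, pts)
cm
```

With an identity projection both values coincide, because the neighborhoods and distances are the same in both spaces.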
Michael Thrun
if(requireNamespace("FCPS")){
  data(Hepta,package="FCPS")
  projection=cmdscale(dist(Hepta$Data), k=2)
  Cmeasure(Hepta$Data,projection)
}
GCE searches for the k nearest neighbors among the first Gabriel neighbors, weighted by the Euclidean distances of the input space [Thrun et al, 2023]. GCE evaluates these neighbors in the output space. A low value indicates a better two-dimensional projection of the high-dimensional input space.
GabrielClassificationError(Data,ProjectedPoints,Cls,LC, PlotIt=FALSE,Plotter = "native", Colors = NULL,LineColor= 'grey', main = "Name of Projection", mainSize = 24,xlab = "X", ylab = "Y", xlim, ylim, pch,lwd,Margin=list(t=50,r=0,l=0,b=0))
Data |
[1:n,1:d] Numeric matrix with n cases and d variables |
ProjectedPoints |
[1:n,1:2] Numeric matrix with 2D points in cartesian coordinates |
Cls |
[1:n] Numeric vector with class labels |
LC |
Optional, Numeric vector of two values determining grid size of the underlying projection |
PlotIt |
Optional, Boolean: TRUE/FALSE => Plot/Do not plot (Default: FALSE) |
Plotter |
Optional, Character with plot technique (native or plotly) |
Colors |
Optional, Character vector of class colors for points |
LineColor |
Optional, Character of line color used for edges of graph |
main |
Optional, Character plot title |
mainSize |
Optional, Numeric size of plot title |
xlab |
Optional, Character name of x axis |
ylab |
Optional, Character name of y axis |
xlim |
Optional, Numeric vector with two values defining x axis range |
ylim |
Optional, Numeric vector with two values defining y axis range |
pch |
Optional, Numeric of point size (graphic parameter) |
lwd |
Optional, Numeric of linewidth (graphic parameter) |
Margin |
Optional, Margin of plotly plot |
Gabriel classification error (GCE) provides an unbiased evaluation of distance- and density-based structures, which may even be non-linearly separable. First, GCE utilizes the information provided by a prior classification to assess projected structures. Second, GCE applies insights drawn from graph theory. Details are described in [Thrun et al, 2023].
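The graph-theoretic ingredient is the Gabriel graph, which connects two points whenever no third point lies inside the ball whose diameter is the segment between them. The base-R sketch below builds that adjacency by brute force. It is only meant to make the neighborhood notion concrete: the function name gabriel_adjacency is hypothetical, points exactly on the boundary of the ball are assumed not to block an edge, and the package's own graph construction and the GCE weighting are not reproduced here.

```r
# Hypothetical helper: brute-force Gabriel graph adjacency, O(n^3).
gabriel_adjacency <- function(points) {
  d2 <- as.matrix(dist(points))^2  # squared Euclidean distances
  n <- nrow(d2)
  adj <- matrix(FALSE, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      others <- setdiff(seq_len(n), c(i, j))
      # k lies strictly inside the ball with diameter ij
      # if and only if d(i,k)^2 + d(j,k)^2 < d(i,j)^2
      blocked <- any(d2[i, others] + d2[j, others] < d2[i, j])
      adj[i, j] <- adj[j, i] <- !blocked
    }
  }
  adj
}

# Three collinear points: the middle point blocks the edge between the outer two
pts <- rbind(c(0, 0), c(1, 0), c(2, 0))
adj <- gabriel_adjacency(pts)
adj
```

For large n a Delaunay-based construction would be preferable, since the Gabriel graph is a subgraph of the Delaunay triangulation; the cubic loop is kept here for readability.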
list of
GCE |
Gabriel classification error. Note: the remaining elements are provided for development purposes only. |
GCEperPoint |
[1:n] unnormalized GCE of each point: GCE = mean(GCEperPoint) |
nn |
the number of points in a relevant neighborhood: 0.5 * 85percentile(AnzNN) |
AnzNN |
[1:n] the number of points with a gabriel graph neighborhood |
NNdists |
[1:n,1:nn] the distances within the relevant neighborhood, 0 for inner cluster distances |
HD |
[1:nn] HD = HarmonicDecay(nn), i.e., the weight function for the NNdists: GCEperPoint = HD*NNdists |
IsInterDistance |
Distances to the nn closest neighbors |
GabrielDists |
Distance matrix implied by the high-dimensional distances and the underlying Gabriel graph |
ProjectionGraphError |
Plotly object, returned if plotly is chosen as Plotter |
Michael Thrun, Quirin Stier, Julian Märte
[Thrun et al, 2023] Thrun, M. C., Märte, J., Stier, Q.: Analyzing Quality Measurements for Dimensionality Reduction, Machine Learning and Knowledge Extraction (MAKE), Vol 5., accepted, 2023.
if(requireNamespace("FCPS")){
  data(Hepta,package="FCPS")
  projection=cmdscale(dist(Hepta$Data), k=2)
  GabrielClassificationError(Hepta$Data,projection,Hepta$Cls)$GCE
}
Calculates the statistical correlation by Kendall. Basically a wrapper to pcaPP::cor.fk.
KendallsTau(InputDists, OutputDists)
InputDists |
Matrix containing the distances of the first dataset. |
OutputDists |
Matrix containing the distances of the second dataset. |
Equivalent to cor.fk
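For readers without pcaPP installed, the same kind of comparison can be sketched with base R's cor(). This is an illustration under assumptions: the function name kendall_tau_sketch is hypothetical, and comparing only the lower triangles of the two distance matrices is a choice made here, not necessarily how KendallsTau vectorizes its inputs.

```r
# Hypothetical helper: Kendall's tau between two distance matrices.
kendall_tau_sketch <- function(input_dists, output_dists) {
  # Each pair of points contributes once; the zero diagonal is left out
  lower <- lower.tri(input_dists)
  cor(input_dists[lower], output_dists[lower], method = "kendall")
}

set.seed(1)
x <- matrix(rnorm(30), ncol = 3)
d_in <- as.matrix(dist(x))
# Squaring is a monotone transformation, so the rank order is fully preserved
tau <- kendall_tau_sketch(d_in, d_in^2)
tau
```

Base cor() with method = "kendall" scales quadratically with the number of distance pairs, which is the usual motivation for the faster cor.fk on larger data sets.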
Michael Thrun
if(requireNamespace("FCPS")){
  data(Hepta,package="FCPS")
  InputDist=dist(Hepta$Data)
  projection=cmdscale(InputDist, k=2)
  KendallsTau(as.matrix(InputDist),as.matrix(dist(projection)))
}
Plots the rescaled average agreement rate (RAAR) of a projection, as computed by RAAR(), over the range of neighborhood sizes.
plotMeasureRAAR(Raar, label = 'ProjectionMethod', gPlotList = list(RAARplot = ggplot2::ggplot()), LineType="solid", Shape = 16, PointsPerE = 10, fancy = FALSE)
Raar |
Output of RAAR() applied for a projection method. |
label |
Title of plot. |
gPlotList |
Settings for ggplot. |
LineType |
Character - graphic parameter: Line type of ggplot. |
Shape |
Integer: type of point |
PointsPerE |
Numeric graphic parameter: Distance between markers on plot line |
fancy |
Boolean graphic parameter: Some automatic settings for a more appealing plot. |
ggplot object
Michael Thrun
Plots the trustworthiness and continuity of a projection, as computed by MeasureTundD(), for neighborhood sizes ranging from 1 to the size of the neighborhood.
plotMeasureTundD(TDmatrix, label = 'ProjectionMethod', gPlotList = list(TW = ggplot2::ggplot(), DC = ggplot2::ggplot()), LineType = "solid", Shape = 16, PointsPerE = 16)
TDmatrix |
Output of MeasureTundD() applied for a projection method. |
label |
Title of plot. |
gPlotList |
Settings for ggplot. |
LineType |
Character - graphic parameter: Line type of ggplot. |
Shape |
Integer: type of point |
PointsPerE |
Numeric graphic parameter: Distance between markers on plot line |
ggplot object
Michael Thrun
Rescaled average agreement rate deduced by the co-ranking matrix from LCMC.
RAAR(Data, ProjectedPoints, kmax = nrow(Data) - 2, PlotIt = T)
Data |
Matrix containing n cases in rows, d variables in columns or a distance matrix which in this case has to be symmetric |
ProjectedPoints |
n by OutputDimension matrix containing coordinates of the Projection |
kmax |
maximum of the interval 1:kmax of k nearest neighbors |
PlotIt |
Optional: Should the output be plotted? Default: TRUE |
A list containing:
Raar |
Rescaled average agreement rate |
Aar |
Average agreement rate |
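The two quantities can be illustrated with the neighborhood-overlap formulation of Lee et al. (2014): the average agreement rate Q(K) is the mean fraction of shared K nearest neighbors between input and output space, and the rescaled version is R(K) = ((N-1) Q(K) - K) / (N-1-K). The sketch below is a plain base-R illustration under assumptions: the function name raar_sketch is hypothetical, both inputs are assumed to be coordinate matrices (not distance matrices), ties are not treated specially, and the mapping of Q and R onto the Aar and Raar elements is assumed rather than taken from the package source.

```r
# Hypothetical helper, not the package's RAAR(); requires at least 3 points.
raar_sketch <- function(data, projected, kmax = nrow(data) - 2) {
  n <- nrow(data)
  # Row i lists the other points ordered by increasing distance from point i
  neighbor_order <- function(x) {
    d <- as.matrix(dist(x))
    t(sapply(seq_len(n), function(i) setdiff(order(d[i, ]), i)))
  }
  nn_in <- neighbor_order(data)
  nn_out <- neighbor_order(projected)
  # Average agreement rate: mean overlap of the K-neighborhoods, for K = 1..kmax
  aar <- sapply(seq_len(kmax), function(k) {
    overlap <- sapply(seq_len(n), function(i) {
      length(intersect(nn_in[i, seq_len(k)], nn_out[i, seq_len(k)]))
    })
    sum(overlap) / (k * n)
  })
  k <- seq_len(kmax)
  # Rescaling so that a random embedding scores about 0 and a perfect one scores 1
  raar <- ((n - 1) * aar - k) / (n - 1 - k)
  list(Aar = aar, Raar = raar)
}

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
res <- raar_sketch(x, x)  # identity "projection" preserves all neighborhoods
range(res$Raar)
```

The denominator N-1-K vanishes at K = N-1, which is consistent with the default kmax = nrow(Data) - 2.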
Michael Thrun
Lee, J. A., Peluffo-Ordonez, D. H., & Verleysen, M. Multiscale stochastic neighbor embedding: Towards parameter-free dimensionality reduction. Paper presented at the Proceedings of the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence And Machine Learning (ESANN) (2014).
Calculates the error of a projection with Spearman's rank correlation coefficient.
VectorOfInputDists(1:n2) |
dissimilarities in Input Space between the n data points in vector form as produced by squareform(Dists(1:n,1:n)) |
VectorOfOutputDists(1:n2) |
dissimilarities in Output Space between the n data points in vector form as produced by squareform(Dists(1:n,1:n)) |
rho rank correlation coefficient
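The vector form and the coefficient itself can be sketched in base R. Here as.dist() plays the role of squareform, returning each pairwise distance once, and rho is computed as the Pearson correlation of the ranks, which is the standard definition of Spearman's coefficient. The function name spearman_rho_sketch is hypothetical and this is not the package's code.

```r
# Hypothetical helper: Spearman's rho between two distance matrices.
spearman_rho_sketch <- function(input_dists, output_dists) {
  # Vector form: the n*(n-1)/2 pairwise distances below the diagonal
  v_in <- as.vector(as.dist(input_dists))
  v_out <- as.vector(as.dist(output_dists))
  # Spearman's rho is the Pearson correlation of the (average) ranks
  cor(rank(v_in), rank(v_out))
}

set.seed(1)
x <- matrix(rnorm(30), ncol = 3)
d_in <- as.matrix(dist(x))
d_out <- as.matrix(dist(x[, 1:2]))  # dropping one coordinate as a toy projection
rho <- spearman_rho_sketch(d_in, d_out)
rho
```

The result agrees with cor(..., method = "spearman") applied to the same two vectors.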
Florian Lerch
Calculates the error of a projection with Spearman's rank correlation coefficient.
SpearmansRho(InputDists, OutputDists)
InputDists |
[1:d,1:d] numeric matrix with input distances |
OutputDists |
[1:d,1:d] numeric matrix with output distances |
rho
Julian Märte
Calculates the Topological Correlation.
TopologicalCorrelation(Data,ProjectedPoints,type='norm',method,Kn=0)
Data |
a matrix of the given n-dim. points: the rows represent the points and the columns represent the coordinates in the n-dim. space. |
ProjectedPoints |
matrix of Projected Points, if missing, method should be set! |
method |
Determines whether the selected projection method for a given set of n-dimensional points is a good choice. A result of 1 means the selected projection method is good, and a result of 0 means that the visualization of the given data in two-dimensional space does not fit the problem. |
type |
How the paths in the adjacency matrix should be weighted: norm represents path lengths of 1, and euclidean represents distances in the Euclidean metric. |
Kn |
Number of k nearest neighbours in the graph; only needed if method is isomap or LocallyLinearEmbedding |
TC value
Hermann Tafo, Laukert Schlichting 07/2015
if(requireNamespace("FCPS")){
  data(Hepta,package="FCPS")
  projection=cmdscale(dist(Hepta$Data), k=2)
  TopologicalCorrelation(Hepta$Data,projection)
}
A generalized version of the Zrehen measure which defines the neighbourhood by the Gabriel graph and is therefore not restricted to grid-based projections.
Data |
[1:n,1:d] points in input space with d attributes |
Projection |
[1:n,1:2] projected points in output space, with index,x,y or index,line,column |
width |
only necessary if toroid |
height |
only necessary if toroid |
isToroid |
Are the points located on a toroid? |
isGrid |
Are the points located on a grid? |
plotGabriel |
plot the generated GabrielGraph |
List with
V$zrehen |
the raw zrehen measure |
V$normedzrehen |
the zrehen measure normed by the number of neighbours |
V$neighbourcounter |
the number of possible neighbours by which the zrehen measure is normed |
Florian Lerch 07/2015
if(requireNamespace("FCPS")){
  data(Hepta,package="FCPS")
  projection=cmdscale(dist(Hepta$Data), k=2)
  ZrehenMeasure4All(Hepta$Data,projection)$zrehen
}