Clustering and Dimensionality Reduction in MerQur: Unsupervised Learning from K-Means to UMAP
DOI: https://doi.org/10.53463/merqur.20260450
Keywords: clustering, K-Means, DBSCAN, hierarchical, PCA, t-SNE
Abstract
Clustering and dimensionality reduction are the two main unsupervised approaches to extracting structure from unlabelled data. Clustering groups similar observations, while dimensionality reduction maps high-dimensional data to a low-dimensional (typically 2D or 3D) representation. This study introduces in detail the seven analyses offered in MerQur’s Clustering category: K-Means Clustering, Hierarchical Clustering, DBSCAN, Principal Component Analysis (PCA), t-Distributed Stochastic Neighbour Embedding (t-SNE), Multidimensional Scaling (MDS), and Uniform Manifold Approximation and Projection (UMAP). For each analysis, the study covers: (i) the basis of the method and its place in unsupervised learning, (ii) hyperparameters and selection strategies, (iii) form fields and options in MerQur, (iv) reported statistics and visualisation outputs, and (v) interpretation guidance for a typical research question. The study also discusses the distinction between geometric distance-based methods (K-Means, Hierarchical) and density-based approaches (DBSCAN), as well as the design differences between linear (PCA, MDS) and non-linear (t-SNE, UMAP) dimensionality reduction methods. Overall, MerQur’s Clustering category brings together within a single graphical interface a spectrum spanning classical statistical dimensionality reduction, modern manifold learning, geometric grouping methods, and density-based clustering algorithms capable of outlier detection.
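The contrast between distance-based and density-based grouping can be illustrated with a small sketch. The code below is not MerQur's implementation; it is a minimal pure-Python DBSCAN written for illustration, and the function names (`region_query`, `dbscan`, `centroid`), the toy data, and the parameter values `eps=1.5` and `min_samples=3` are all assumptions chosen for this example. It shows a density-based method labelling an isolated point as noise, whereas a K-Means-style nearest-centroid assignment (only the assignment step, not full K-Means) has to place the same point in some cluster.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN sketch: one label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # core point: keep expanding
    return labels


def centroid(members):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(members) for c in zip(*members))


# Toy data (hypothetical): two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]

labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]

# K-Means-style assignment step: every point, including the isolated one,
# is assigned to its nearest centroid.
centroids = [centroid([p for p, l in zip(points, labels) if l == k])
             for k in (0, 1)]
outlier = points[-1]
nearest = min(range(len(centroids)), key=lambda k: dist(outlier, centroids[k]))
print(nearest)  # 1
```

In this toy case the isolated point receives the noise label under the density-based rule but is absorbed into the second cluster under nearest-centroid assignment, which is the practical sense in which density-based clustering supports outlier detection.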
References
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96) (pp. 226–231). AAAI Press.
Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of Intelligent Information Systems, 17, 107–145. https://doi.org/10.1023/A:1012801612483
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). Springer.
Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: A review and recent developments. Philosophical Transactions of the Royal Society A, 374(2065), 20150202. https://doi.org/10.1098/rsta.2015.0202
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. Wiley.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297). University of California Press.
McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. Journal of Open Source Software, 3(29), 861. https://doi.org/10.21105/joss.00861
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(11), 559–572. https://doi.org/10.1080/14786440109462720
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7
Schubert, E., Sander, J., Ester, M., Kriegel, H.-P., & Xu, X. (2017). DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Transactions on Database Systems, 42(3), 1–21. https://doi.org/10.1145/3068335
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B, 63(2), 411–423. https://doi.org/10.1111/1467-9868.00293
Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4), 401–419. https://doi.org/10.1007/BF02288916
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244. https://doi.org/10.1080/01621459.1963.10500845
License
Copyright (c) 2026 MerQur

This article is published under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Under this license you may:
- Share: Copy and redistribute the material in any medium or format.
- Adapt: Remix, transform and build upon the material for any purpose, including commercial use.
Under the following condition:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made.