Clustering and Dimensionality Reduction in MerQur: Unsupervised Learning from K-Means to UMAP

Authors

  • Ömer K. Örücü, Suleyman Demirel University, Faculty of Architecture, Department of Landscape Architecture

DOI:

https://doi.org/10.53463/merqur.20260450

Keywords:

clustering, K-Means, DBSCAN, hierarchical, PCA, t-SNE

Abstract

Unsupervised learning extracts structure from unlabelled data through two main families of methods: clustering and dimensionality reduction. Clustering groups similar observations, while dimensionality reduction maps high-dimensional data to a low-dimensional (typically 2D or 3D) representation. This study describes in detail the seven analyses offered in MerQur’s Clustering category: K-Means Clustering, Hierarchical Clustering, DBSCAN, Principal Component Analysis (PCA), t-Distributed Stochastic Neighbour Embedding (t-SNE), Multidimensional Scaling (MDS), and Uniform Manifold Approximation and Projection (UMAP). For each analysis, the study covers (i) the basis of the method and its place in unsupervised learning, (ii) hyperparameters and selection strategies, (iii) form fields and options in MerQur, (iv) reported statistics and visualisation outputs, and (v) interpretation guidance for a typical research question. It also discusses the distinction between geometric distance-based methods (K-Means, Hierarchical) and density-based approaches (DBSCAN), as well as the design differences between linear (PCA, MDS) and non-linear (t-SNE, UMAP) dimensionality reduction. Overall, MerQur’s Clustering category brings together, within a single graphical interface, classical statistical dimensionality reduction, modern manifold learning, geometric grouping methods, and density-based clustering algorithms capable of outlier detection.
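The contrast between distance-based and density-based clustering can be illustrated with a minimal sketch. The code below is not MerQur's implementation. It is a simplified, standard-library-only version of K-Means (Lloyd's algorithm) and DBSCAN, and the toy data, the function names `kmeans` and `dbscan`, and the parameter values `eps=1.5` and `min_pts=3` are all assumptions chosen for illustration.

```python
import math
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then update centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids

def dbscan(points, eps, min_pts):
    """Simplified DBSCAN; label -1 marks noise. A point counts as its own neighbour."""
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(neighbours[j])
    return labels

# Hypothetical toy data: two tight groups plus one isolated observation.
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
group_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
points = group_a + group_b + [(30.0, 0.0)]

km_labels, km_centroids = kmeans(points, k=2)
db_labels = dbscan(points, eps=1.5, min_pts=3)

print("K-Means:", km_labels)  # every point, including the isolated one, gets a cluster
print("DBSCAN: ", db_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this sketch K-Means must place the isolated point in one of the two clusters, whereas DBSCAN labels it as noise (-1). This is the outlier-detection property the abstract attributes to density-based clustering.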

References

Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96) (pp. 226–231). AAAI Press.

Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of Intelligent Information Systems, 17, 107–145. https://doi.org/10.1023/A:1012801612483

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). Springer.

Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: A review and recent developments. Philosophical Transactions of the Royal Society A, 374(2065), 20150202. https://doi.org/10.1098/rsta.2015.0202

Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. Wiley.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297). University of California Press.

McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. Journal of Open Source Software, 3(29), 861. https://doi.org/10.21105/joss.00861

Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(11), 559–572. https://doi.org/10.1080/14786440109462720

Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7

Schubert, E., Sander, J., Ester, M., Kriegel, H.-P., & Xu, X. (2017). DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Transactions on Database Systems, 42(3), 1–21. https://doi.org/10.1145/3068335

Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B, 63(2), 411–423. https://doi.org/10.1111/1467-9868.00293

Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4), 401–419. https://doi.org/10.1007/BF02288916

van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244. https://doi.org/10.1080/01621459.1963.10500845

Published

2026-05-18

Section

Editorial