Clustering and Dimensionality Reduction in MerQur: Unsupervised Learning from K-Means to UMAP

Ömer K. Örücü

doi:10.53463/merqur.20260450

Authors

Ömer K. Örücü, Süleyman Demirel University, Faculty of Architecture, Department of Landscape Architecture (Author)

Keywords:

clustering, K-Means, DBSCAN, hierarchical, PCA, t-SNE

Abstract

Unsupervised learning methods, namely clustering and dimensionality reduction, are the two main ways of extracting structure from unlabeled data. Clustering partitions similar observations into groups, whereas dimensionality reduction maps high-dimensional data to a low-dimensional representation (usually 2D or 3D). This study describes in detail the 7 analyses offered in the Clustering category of the MerQur desktop software: K-Means Clustering, Hierarchical Clustering, DBSCAN, Principal Component Analysis (PCA), t-Distributed Stochastic Neighbour Embedding (t-SNE), Multidimensional Scaling (MDS), and Uniform Manifold Approximation and Projection (UMAP). For each analysis, it presents (i) the basis of the method and its place in unsupervised learning, (ii) the hyperparameters and the strategies for choosing them (choice of K: elbow / silhouette / gap statistic; ε and min_samples for DBSCAN; number of components for PCA; number of neighbors for UMAP), (iii) the form fields and options in MerQur, (iv) the reported statistics and visualization outputs (silhouette plot, dendrogram, biplot, 2D embedding), and (v) an interpretation guideline for a typical research question. The study discusses the distinction between geometric distance-based methods (K-Means, Hierarchical) and density-based approaches (DBSCAN), as well as the design differences between linear (PCA, MDS) and nonlinear (t-SNE, UMAP) dimensionality reduction. In conclusion, MerQur's Clustering category brings together, in a single graphical interface, a scope that ranges from classical statistical dimensionality reduction to modern manifold learning, and from geometric grouping methods to density-based clustering algorithms capable of outlier detection.
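To make the silhouette-based choice of K mentioned in the abstract concrete, the following is a minimal pure-Python sketch. It is not MerQur's implementation, which is not shown on this page. The function names (`kmeans_labels`, `mean_silhouette`, `choose_k`), the toy data, and the deterministic farthest-first seeding are assumptions made only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first_centers(points, k):
    """Deterministic seeding (a simplification assumed for this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(euclidean(p, c) for c in centers)))
    return centers


def kmeans_labels(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the old center if a cluster is empty
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two non-empty clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) is taken as 0 for singleton clusters
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def choose_k(points, k_values):
    """Return the K with the highest mean silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, kmeans_labels(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): three well-separated 2x2 blobs.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]
best_k, scores = choose_k(points, range(2, 5))
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On this toy data the three-cluster solution should receive the highest mean silhouette. On real data the silhouette is usually read alongside the elbow and gap statistic criteria rather than on its own, and a randomized seeding with several restarts is the more common choice than the deterministic seeding assumed here.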

References

Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96) (pp. 226–231). AAAI Press.

Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of Intelligent Information Systems, 17, 107–145. https://doi.org/10.1023/A:1012801612483

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). Springer.

Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: A review and recent developments. Philosophical Transactions of the Royal Society A, 374(2065), 20150202. https://doi.org/10.1098/rsta.2015.0202

Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. Wiley.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297). University of California Press.

McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. Journal of Open Source Software, 3(29), 861. https://doi.org/10.21105/joss.00861

Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(11), 559–572. https://doi.org/10.1080/14786440109462720

Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7

Schubert, E., Sander, J., Ester, M., Kriegel, H.-P., & Xu, X. (2017). DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Transactions on Database Systems, 42(3), 1–21. https://doi.org/10.1145/3068335

Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B, 63(2), 411–423. https://doi.org/10.1111/1467-9868.00293

Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4), 401–419. https://doi.org/10.1007/BF02288916

van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244. https://doi.org/10.1080/01621459.1963.10500845
