Analysis of Dominant Topics in Computer Science Papers Using TF-IDF and K-Means
DOI:
https://doi.org/10.58369/biit.v3i3.122
Keywords:
Topic Analysis, Davies-Bouldin Index, K-Means, Text Mining, Silhouette Score, TF-IDF
Abstract
The rapid growth of scientific publications in computer science has created a need to understand how research topics are distributed and how they trend over time. This study identifies and analyzes dominant topics in computer science literature using a text mining approach based on Term Frequency–Inverse Document Frequency (TF-IDF) vectorization and the K-Means clustering algorithm. A total of 1,222 publication titles from Semantic Scholar (2020–2025) were processed through language normalization, text preprocessing, TF-IDF feature extraction, selection of the optimal number of clusters, and cluster quality evaluation using the Silhouette Score and the Davies-Bouldin Index (DBI). The results show that cybersecurity, artificial intelligence, and machine learning are the most prevalent topics. Although some clusters show good internal cohesion, the overall evaluation yielded a Silhouette Score of 0.0585 and a DBI of 4.387. A Silhouette Score close to 0 and a high DBI (lower is better) both indicate overlapping topics and limited cluster separation. These findings suggest that the TF-IDF and K-Means approach can highlight general topic trends but is limited in capturing semantic context. Future research is encouraged to explore more context-aware representations and clustering techniques to improve the quality of topic analysis.
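The pipeline described in the abstract (TF-IDF vectorization of titles, K-Means clustering, then evaluation with the Silhouette Score and DBI) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration and not the authors' implementation: the four toy titles, whitespace tokenization, the smoothed IDF variant, the deterministic farthest-point seeding, and all function names are hypothetical. The abstract does not state which tools or settings the study used.

```python
import math
from collections import Counter


def tfidf_matrix(titles):
    """Return dense, L2-normalised TF-IDF rows for whitespace-tokenised titles."""
    docs = [title.lower().split() for title in titles]
    vocab = sorted({term for doc in docs for term in doc})
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    rows = []
    for doc in docs:
        counts = Counter(doc)
        # Smoothed IDF, ln((1 + N) / (1 + df)) + 1: an assumed variant.
        row = [counts[t] * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        rows.append([w / norm for w in row])
    return rows


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(members):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(members) for col in zip(*members)]


def kmeans(rows, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point seeding (assumed)."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(
            max(rows, key=lambda r: min(distance(r, c) for c in centroids)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: distance(row, centroids[c]))
                      for row in rows]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return labels


def silhouette_score(rows, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, row in enumerate(rows):
        same = [distance(row, other) for j, other in enumerate(rows)
                if j != i and labels[j] == labels[i]]
        if not same:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(distance(row, other)
                for other, lab in zip(rows, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin_index(rows, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid gap."""
    clusters = sorted(set(labels))
    centroids, scatter = {}, {}
    for c in clusters:
        members = [row for row, lab in zip(rows, labels) if lab == c]
        centroids[c] = mean_vector(members)
        scatter[c] = sum(distance(r, centroids[c]) for r in members) / len(members)
    worst = [max((scatter[c] + scatter[d]) / distance(centroids[c], centroids[d])
                 for d in clusters if d != c)
             for c in clusters]
    return sum(worst) / len(worst)


# Toy titles (hypothetical, not taken from the study's dataset).
titles = [
    "deep learning for image classification",
    "deep learning for image segmentation",
    "network intrusion detection system",
    "network intrusion detection with firewalls",
]
rows = tfidf_matrix(titles)
labels = kmeans(rows, k=2)
sil = silhouette_score(rows, labels)
dbi = davies_bouldin_index(rows, labels)
print(labels, round(sil, 3), round(dbi, 3))
```

On such well-separated toy titles the two topics share no vocabulary, so the clusters separate cleanly. Real publication titles share many generic terms, which is consistent with the weak separation the study reports (Silhouette Score 0.0585, DBI 4.387). Repeating the evaluation for several values of `k` and comparing the scores is one common way to choose the number of clusters.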
License
Copyright (c) 2025 Jovansa Putra Laksana, Shela Shela, Hafiz Irsyad, Abdul Rahman

This work is licensed under a Creative Commons Attribution 4.0 International License.