Indonesian-Language Text Summarization with Latent Dirichlet Allocation and Maximum Marginal Relevance
DOI: https://doi.org/10.62411/tc.v23i3.10998

Abstract
Advances in technology have made news easy to find on online media. The number of available news articles keeps growing, and many of them are fairly long. This makes it difficult for readers to find the core information in a news story, so text summarization is needed to help users grasp the essence of a text without reading all of it. The method used for text summarization is Maximum Marginal Relevance (MMR), which combines two selection criteria: relevance and diversity. News titles in online articles often do not fully represent the content of the news (so-called clickbait). To avoid relying on such titles, this study bases the summary on keywords generated with the Latent Dirichlet Allocation (LDA) method.
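The abstract names MMR as the sentence selector and LDA keywords as the query, but does not state the similarity measure, the relevance/diversity weight, or how a compression rate maps to summary length. The following is a minimal Python sketch under those unknowns, not the authors' implementation. It assumes bag-of-words cosine similarity, a hypothetical trade-off weight `lam`, and a compression rate `ratio` read as the fraction of sentences kept (rounded up). The `keywords` argument stands in for the output of the LDA step, which is not reproduced here. At each step the sketch picks the unselected sentence maximizing `lam * Sim(s, Q) - (1 - lam) * max Sim(s, s')` over the sentences already selected, which is the standard MMR criterion.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; no stemming or stopword removal (an assumption).
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_summarize(sentences: list[str], keywords: list[str],
                  ratio: float = 0.3, lam: float = 0.7) -> list[str]:
    # Hypothetical helper: `keywords` plays the role of the LDA-derived query,
    # `ratio` is the assumed fraction of sentences to keep.
    vecs = [Counter(tokenize(s)) for s in sentences]
    query = Counter(tokenize(" ".join(keywords)))
    k = max(1, math.ceil(ratio * len(sentences)))
    selected: list[int] = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(vecs[i], query)
            # Redundancy = highest similarity to any sentence already chosen.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    news = [
        "Banjir melanda Jakarta dan merendam rumah warga.",
        "Banjir di Jakarta merendam rumah warga.",
        "Pemerintah menyiapkan posko pengungsian bagi warga korban banjir.",
        "Cuaca cerah diperkirakan terjadi pekan depan.",
    ]
    # In this toy example the near-duplicate first sentence is skipped
    # in favor of the third one.
    print(mmr_summarize(news, ["banjir", "jakarta", "warga"], ratio=0.5, lam=0.5))
```

With `lam=1.0` the selector ranks purely by keyword relevance and, on the same toy input, keeps the two near-duplicate flood sentences; lowering `lam` trades some relevance for diversity.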
Tests on 2,500 news articles produced a best average ROUGE-1 score of 0.488 at a 50% compression rate and 0.462 at a 30% compression rate. The lowest ROUGE-1 scores were 0.453 at a 50% compression rate and 0.435 at a 30% compression rate. These results show that the system can produce reasonably relevant summaries using keywords extracted from the news content.
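The abstract reports average ROUGE-1 scores but does not say whether they are recall, precision, or F1, nor which tokenization was used. The Python sketch below computes all three from clipped unigram overlap between a system summary and a reference summary, assuming simple lowercase word tokenization with no stemming or stopword removal. Standard ROUGE toolkits may preprocess differently, so numbers from this sketch are not directly comparable to the reported 0.488 or 0.435. The function names are illustrative.

```python
import re
from collections import Counter


def unigrams(text: str) -> Counter:
    # Lowercased word unigram counts (assumed tokenization: runs of \w).
    return Counter(re.findall(r"\w+", text.lower()))


def rouge_1(candidate: str, reference: str) -> dict[str, float]:
    cand, ref = unigrams(candidate), unigrams(reference)
    # Clipped overlap: each word counts at most min(candidate count, reference count) times.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def average_rouge_1(pairs: list[tuple[str, str]], metric: str = "f1") -> float:
    # Mean of one ROUGE-1 component over (system summary, reference summary) pairs.
    scores = [rouge_1(cand, ref)[metric] for cand, ref in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, the system summary "banjir merendam rumah warga" against the reference "banjir besar merendam ratusan rumah warga jakarta" matches 4 of 4 system words and 4 of 7 reference words, giving precision 1.0, recall 4/7, and F1 8/11.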
License
Copyright (c) 2024 Bima Hamdani Mawaridi, Muhammad Faisal, Hani Nurhayati

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
License Terms
All articles published in Techno.COM Journal are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). This means:
1. Attribution
Readers and users are free to:
- Share – Copy and redistribute the material in any medium or format.
- Adapt – Remix, transform, and build upon the material.
Both are permitted provided that proper credit is given to the original work by citing the author(s) and the journal.
2. Non-Commercial Use
- The material cannot be used for commercial purposes.
- Commercial use includes selling the content, using it in commercial advertising, or integrating it into products/services for profit.
3. Rights of Authors
- Authors retain copyright and grant Techno.COM Journal the right to publish the article.
- Authors can distribute their work (e.g., in institutional repositories or personal websites) with proper acknowledgment of the journal.
4. No Additional Restrictions
- Users may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
5. Disclaimer
- The journal is not responsible for how the published content is used by third parties.
- The opinions expressed in the articles are solely those of the authors.
For more details, visit the Creative Commons License Page:
https://creativecommons.org/licenses/by-nc/4.0/