Latent Dirichlet Allocation for Identifying the Indonesian Public's Response to Covid-19 in 2020-2021
Keywords: Covid, Latent Dirichlet Allocation, Text mining, Text sampling, Twitter
Covid-19 is a deeply troubling disease in Indonesia. Understanding public opinion is therefore needed to find solutions and to evaluate the government's performance in handling the pandemic. Twitter can help identify public opinion on significant events. Tweets are high-dimensional, text-based big data, and they require text sampling and text mining to be processed efficiently and effectively. Stratified random sampling with 20 repetitions was applied, treating days as strata, followed by topic modeling with latent Dirichlet allocation (LDA). This research aims to find out public opinion regarding Covid-19 and how it developed over time. It also aims to find out the effects of stratified random sampling on tweet data. To that end, the extracted topics are transformed into time-series data, taking into account the variety of patterns produced, and the transformed results are then explored and interpreted. The results suggest that discussions related to Covid-19 fall into four topics under the first model, namely “Vaccine”, “Positive or affected people”, “Health protocol”, and “Indonesia”, and into nine topics under the second model, namely “Vaccine”, “Prayer”, “Health protocol”, “Social aid and corruption”, “Affected people”, “Indonesian economy”, “Work”, “Persuading to wear mask”, and “Willing to watch”. Furthermore, some topics peak whenever a significant event occurs in Indonesia. Finally, the results suggest that 20 repetitions of stratified random sampling can provide good results.
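The sampling and time-series steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 10% sampling fraction, the proportional allocation per day, the toy tweets, and the keyword rule `toy_topic` (a hypothetical stand-in for the dominant LDA topic of a tweet) are all assumptions made for illustration. Only the use of days as strata and the 20 repetitions come from the abstract.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(tweets, fraction, seed):
    """Draw the same fraction of tweets from each day (one stratum per day)."""
    rng = random.Random(seed)
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(text)
    sample = []
    for day in sorted(by_day):
        texts = by_day[day]
        k = max(1, round(len(texts) * fraction))  # keep at least one tweet per day
        sample.extend((day, text) for text in rng.sample(texts, k))
    return sample


def repeated_samples(tweets, fraction, repetitions=20):
    """Repeat the stratified draw, using a different seed for each repetition."""
    return [stratified_sample(tweets, fraction, seed) for seed in range(repetitions)]


def topic_time_series(sample, assign_topic):
    """Count tweets per topic per day, giving one daily series per topic."""
    series = defaultdict(Counter)
    for day, text in sample:
        series[assign_topic(text)][day] += 1
    return series


# Toy data (illustrative only): 10, 20, and 30 tweets on three consecutive days.
tweets = [
    (day, f"tweet {i} vaksin" if i % 2 else f"tweet {i} masker")
    for day, n in [("2021-01-01", 10), ("2021-01-02", 20), ("2021-01-03", 30)]
    for i in range(n)
]


def toy_topic(text):
    """Hypothetical keyword rule standing in for a fitted LDA model."""
    return "Vaccine" if "vaksin" in text else "Health protocol"


samples = repeated_samples(tweets, fraction=0.1)
series = topic_time_series(samples[0], toy_topic)
```

In a real pipeline, `toy_topic` would be replaced by the most probable topic from an LDA model fitted on each sample, and comparing the resulting series across the 20 repetitions would show how sensitive the topic trends are to the sampling.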