https://ejurnal.seminar-id.com/index.php/bits/issue/feed Building of Informatics, Technology and Science (BITS) 2026-06-05T23:56:44+07:00 Support Journal seminar.id2020@gmail.com Open Journal Systems <p style="text-align: justify;">Building of Informatics, Technology and Science (BITS) is an open-access journal that publishes scientific articles reporting the results of research in information technology and computers. Submitted papers are checked for plagiarism and peer-reviewed before publication to maintain quality. The journal is managed by Forum Kerjasama Pendidikan Tinggi (FKPT) and is published four times a year, in <strong>June (No 1), September (No 2), December (No 3),&nbsp;</strong>and <strong>March&nbsp;(No 4)&nbsp;</strong>with ISSN&nbsp;<a href="https://issn.brin.go.id/terbit/detail/1557033587" target="_blank" rel="noopener">2684-8910 (Print)</a>&nbsp;and&nbsp;<a href="https://issn.brin.go.id/terbit/detail/1557037175" target="_blank" rel="noopener">2685-3310 (Online)</a>. The journal is expected to advance research and make a real contribution to improving research resources in the field of information technology and computers. 
BITS Journal is indexed by:&nbsp;<a href="https://scholar.google.com/citations?user=oy-dtP8AAAAJ&amp;hl=id&amp;citsig=AMD79orr29I2On4MNhRIxcFHJxCpCrUMQA">Google Scholar</a>&nbsp;|&nbsp;<a href="https://garuda.kemdikbud.go.id/journal/view/15844">Portal Garuda&nbsp;</a>| <a href="https://app.dimensions.ai/discover/publication?search_mode=content&amp;search_text=10.47065&amp;search_type=kws&amp;search_field=full_search&amp;and_facet_source_title=jour.1407312">Dimensions</a> |&nbsp;<a href="https://onesearch.id/Search/Results?lookfor=Building+of+Informatics%2C+Technology+and+Science+%28BITS%29&amp;type=AllFields&amp;limit=20&amp;sort=relevance">Indonesia One Search</a> |&nbsp;<a href="https://moraref.kemenag.go.id/archives/journal/98984515036262163">Moraref</a> |&nbsp;<a href="https://index.pkp.sfu.ca/index.php/browse/index/10161">PKP Index</a> |&nbsp;<a href="https://www.scilit.net/journal/6109244">SCILIT</a> |&nbsp;<a href="https://explore.openaire.eu/search/dataprovider?datasourceId=issn___print::8c94c96cf14c5cea949a4b30da0dcea5">OpenAIRE</a> |&nbsp;<a href="https://portal.issn.org/resource/ISSN/2685-3310">ROAD</a> | <a href="https://search.crossref.org/?q=Building+of+Informatics%2C+Technology+and+Science+%28BITS%29&amp;from_ui=yes">Crossref</a> | <a href="https://sinta.kemdikbud.go.id/journals/profile/7790">Science and Technology Index (SINTA Rank 3)</a>&nbsp;| <a href="https://www.base-search.net/Search/Results?type=all&amp;lookfor=2685-3310&amp;ling=1&amp;oaboost=1&amp;name=&amp;thes=&amp;refid=dcresen&amp;newsearch=1">BASE</a>&nbsp;|&nbsp;<a href="https://www.worldcat.org/search?q=2685-3310&amp;qt=results_page">WorldCat.org</a>.<br><strong>Building of Informatics, Technology and Science (BITS)</strong> has been reaccredited with a <strong>SINTA rating of 3</strong> through the Decree of the Director General of Strengthening Research and Development of the Ministry of Research, Technology and Higher Education based on number <a
href="https://drive.google.com/file/d/1Lq3pCoZZmZwoZMSVsAuCM-0seprhkwee/view?usp=sharing">72/E/KPT/2024</a>, dated April 1, 2024, regarding the results of the Electronic Scientific Periodic Accreditation Period I 2024, from <strong>Volume 5 No 1 (2023)</strong> to <strong>Volume 9 No 4 (2028)</strong>.</p> https://ejurnal.seminar-id.com/index.php/bits/article/view/8967 Deteksi Manipulasi Citra Medis MRI Menggunakan Watermarking Least Significant Bit dengan Autentikasi SHA-256 dan ECDSA 2026-06-05T23:56:39+07:00 Y Noven Dhimas Nugroho 111202214045@mhs.dinus.ac.id Wildanil Ghozi wildanil.ghozi@dsn.dinus.ac.id <p>Medical image security is a crucial aspect of maintaining the integrity and authenticity of diagnostic data, particularly during digital transmission and storage processes that are vulnerable to manipulation. Minor modifications to pixels can lead to misdiagnosis; thus, protection methods are required to verify integrity without compromising visual quality. However, previous studies still face a trade-off between system complexity, computational efficiency, and tamper detection capabilities. This research aims to develop a medical image watermarking method capable of efficiently detecting changes in diagnostic areas with minimal distortion. The proposed method integrates automated Region of Interest (ROI) segmentation based on Otsu thresholding, 1-LSB watermark embedding in the Region of Non-Interest (RONI), and authentication based on SHA-256 and ECDSA digital signatures. The primary contribution of this study is an integrated framework that combines automated segmentation and cryptographic authentication to maintain image integrity without sacrificing clinical information. Experimental results demonstrate that the method maintains high image quality, with an average PSNR of 75.04 dB, low MSE, and the highest SSIM of 0.9999975. This performance is achieved through a small payload (99 bytes) that modifies only 1.21% of pixels in the RONI. 
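As a rough illustration of the 1-LSB embedding and SHA-256 check described above, the sketch below writes a digest into the least significant bits of a flat list of 8-bit pixel values and then verifies it. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`embed_lsb`, `extract_lsb`), the flat list standing in for the RONI, and the toy pixel data are all hypothetical, and the ECDSA signing step is left out because it would need a third-party library.

```python
import hashlib


def bytes_to_bits(data: bytes) -> list[int]:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits: list[int]) -> bytes:
    """Pack a list of bits (most significant bit first) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def embed_lsb(pixels: list[int], payload: bytes) -> list[int]:
    """Hypothetical helper: store each payload bit in one pixel's lowest bit."""
    bits = bytes_to_bits(payload)
    if len(bits) > len(pixels):
        raise ValueError("payload does not fit in the available pixels")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & 0xFE) | bit  # clear the LSB, then set it
    return marked


def extract_lsb(pixels: list[int], n_bytes: int) -> bytes:
    """Hypothetical helper: read n_bytes back from the pixel LSBs."""
    return bits_to_bytes([p & 1 for p in pixels[:n_bytes * 8]])


# Toy data (assumed): four ROI pixel values and a synthetic RONI of 1024 pixels.
roi_bytes = bytes([10, 200, 37, 99])
roni_pixels = [(i * 7) % 256 for i in range(1024)]

digest = hashlib.sha256(roi_bytes).digest()          # 32-byte ROI fingerprint
marked = embed_lsb(roni_pixels, digest)
recovered = extract_lsb(marked, len(digest))

roi_intact = recovered == hashlib.sha256(roi_bytes).digest()
tampered_roi = bytes([10, 201, 37, 99])              # one ROI pixel changed
tamper_detected = recovered != hashlib.sha256(tampered_roi).digest()
max_change = max(abs(a - b) for a, b in zip(roni_pixels, marked))
```

Because only the lowest bit of each carrier pixel is touched, no pixel value moves by more than 1, which is the general reason LSB schemes distort the image so little.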
In terms of efficiency, the method exhibits relatively fast computational performance with average embedding and extraction times of 0.14 seconds and 0.095 seconds, respectively, on 256×256 pixel images using an AMD Ryzen 5 5600H and 16 GB RAM. The system is capable of detecting ROI manipulation, identifying global payload damage, and remains valid under RONI changes, although it remains limited against large-scale manipulation due to the fragile nature of the LSB technique.</p> 2026-06-05T00:00:00+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9722 Deteksi Penyakit Jantung Menggunakan SVM dan XGBoost dengan Interpretabilitas SHAP dan Integrasi LLM 2026-06-05T23:56:39+07:00 Raihan Al Aziz 111202214808@mhs.dinus.ac.id Egia Rosi Subhiyakto egia@dsn.dinus.ac.id <p>Cardiovascular disease remains the leading cause of death globally, demanding accurate early detection, yet limited access to specialist medical personnel in developing countries often hinders timely diagnosis. This study aims to address the critical gap between the high accuracy of machine learning models in academic research and the minimal adoption of practical clinical applications by developing a safe and trustworthy hybrid artificial intelligence-based heart disease triage system. The proposed methodology integrates a dual-model architecture in which Support Vector Machine serves as the primary prediction model and Extreme Gradient Boosting as a second-opinion model, both optimized with SMOTE oversampling technique to handle class imbalance, and implements SHAP to provide transparency in black-box model decisions. The system is enriched with Dynamic Prompt Engineering innovation on the Mistral-7B Large Language Model to translate numerical probabilities into safe, personalized, and empathetic medical narratives. 
Experimental results show that the Support Vector Machine model with RBF kernel delivers superior performance with an accuracy of 90.22% and sensitivity of 94.12%, which is crucial for minimizing false negative cases in medical screening, outperforming the Extreme Gradient Boosting model which recorded 88.04% accuracy. Interpretability analysis identified chest pain type, cholesterol level, and maximum heart rate as the primary risk indicators, validating the model's alignment with standard cardiology guidelines. A dual safety validation mechanism through programmed risk thresholds and language generation temperature control ensures the system does not produce harmful diagnostic hallucinations. In conclusion, the system implemented as a FastAPI-based microservice is proven technically feasible with low latency, offering an accurate, transparent, and communicative early screening solution to support healthcare service efficiency.</p> 2026-06-03T13:12:13+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9558 Perbandingan Naïve Bayes dan Support Vector Machine Dalam Analisis Sentimen Google Maps Pusat Perbelanjaan 2026-06-05T23:56:40+07:00 Eliza Cahyaningrum elizacahya14@gmail.com Astrid Novita Putri astrid@usm.ac.id <p>The rapid growth of user reviews on Google Maps is not always accompanied by ease in understanding the sentiment contained within them, causing tourists and the general public to face difficulties in determining shopping centers with good reputation and service quality. The lack of information regarding visitor satisfaction levels, along with various facility-related issues such as crowd density, limited parking space, and the comfort of public facilities, combined with the large number of subjective and unstructured reviews, makes manual sentiment analysis ineffective and potentially leads to less accurate conclusions. 
This study aims to analyze sentiment in Google Maps reviews of shopping centers in the city of Semarang using the Support Vector Machine (SVM) and Naïve Bayes methods. The data were collected from the five shopping centers with the highest number of reviews in Semarang, namely Paragon Mall, Mall Ciputra, Java Mall, DP Mall, and Queen City Mall. The method includes text preprocessing, TF-IDF weighting, and sentiment classification into three classes: negative, neutral, and positive. The dataset was divided into training and testing data with a ratio of 80:20. The results show that the Naïve Bayes method achieved an accuracy of 85.56%, while the Support Vector Machine (SVM) method achieved an accuracy of 89.20%. Based on these results, the SVM method performs better in classifying sentiment from Google Maps reviews of shopping centers in Semarang.</p> 2026-06-05T00:00:00+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9596 Analisis Sentimen X Terhadap Isu Industri Sawit Prabowo Subianto Menggunakan TF-IDF dan Machine Learning 2026-06-05T23:56:40+07:00 Ibrahim Akbar Arga Dewangga 111202214417@mhs.dinus.ac.id Rama Aria Megantara aria@dsn.dinus.ac.id <p>This study aims to analyze public sentiment on the X platform regarding the palm oil industry issue associated with Prabowo Subianto and to compare the performance of Decision Tree, Support Vector Machine (SVM), and Random Forest algorithms. The dataset consisted of 3,785 tweets collected through a crawling process. The data were then processed through cleaning, case folding, text normalization, tokenizing, stopword removal, and stemming. Sentiment labeling was conducted using a lexicon-based approach, followed by feature extraction using Term Frequency-Inverse Document Frequency (TF-IDF) and train-test data splitting. 
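The TF-IDF feature extraction step mentioned above can be sketched in a few lines. This is a minimal illustration that assumes one common textbook variant (relative term frequency multiplied by the natural log of N/df); the abstract does not state which variant or library was used, and library implementations often apply smoothing. The function name `tf_idf` and the three toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by (count / doc length) * ln(N / document frequency).

    Assumes every document is a non-empty list of tokens.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, already-tokenized documents (assumed for illustration only).
docs = [
    "harga sawit naik".split(),
    "harga sawit turun".split(),
    "petani sawit senang".split(),
]
weights = tf_idf(docs)
```

In this toy corpus the term "sawit" occurs in every document, so its weight is zero, while "naik" occurs in only one document and receives the largest weight in the first document.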
The labeling results show that public opinion was dominated by positive sentiment with 3,018 tweets (79.7%), while negative sentiment accounted for 767 tweets (20.3%). The experimental results indicate that SVM achieved the best performance with an accuracy of 0.90, followed by Random Forest with 0.86 and Decision Tree with 0.84. SVM also demonstrated more stable performance based on precision, recall, and F1-score across both sentiment classes. These findings indicate that SVM is the most effective model for Indonesian-language sentiment classification on palm oil policy issues and has strong potential to support public policy evaluation based on social media data.</p> 2026-06-05T00:00:00+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9710 Optimasi Bayesian pada Gradient Boosting untuk Prediksi Niat Beli E-Commerce pada Dataset dengan Ketidakseimbangan Kelas 2026-06-05T23:56:40+07:00 Imam Bagus Setyawan 111202012526@mhs.dinus.ac.id Heribertus Himawan himawan26@dsn.dinus.ac.id <p>Predicting consumer purchase intention in e-commerce is a crucial challenge due to the high rate of class imbalance, where the majority of visitors only browse without making a transaction. This study compares the performance of three Gradient Boosting family algorithms (XGBoost, LightGBM, and CatBoost) using the Online Shoppers Intention dataset, which has a class ratio of 84.5% to 15.5%. To overcome majority class bias, the Synthetic Minority Oversampling Technique (SMOTE) approach was implemented on the training data. This research focuses on hyperparameter optimization implementation using the Optuna framework based on the Tree-structured Parzen Estimator (TPE), which is statistically validated using the Friedman and Post-Hoc Nemenyi tests. Model evaluation using stratified 10-Fold Cross-Validation shows that all three models can handle class imbalance effectively. 
LightGBM achieved an accuracy of 88.36% with an ROC-AUC of 0.9138, XGBoost achieved an accuracy of 88.56% with an ROC-AUC of 0.9127, and CatBoost achieved an accuracy of 88.56% with an ROC-AUC of 0.9121. Feature importance analysis identifies ProductRelated_Duration and ExitRates as the main predictors of purchase intention. The Friedman statistical test detected global performance differences (p=0.0450), but the Nemenyi post-hoc test found insufficient empirical evidence to claim significant pairwise performance differences. This research provides a practical contribution to the e-commerce industry by demonstrating that the selection of ensemble algorithms no longer needs to rely absolutely on pseudo-accuracy margins, but can be objectively recommended based on computational latency efficiency, where the LightGBM architecture proves to be efficient.</p> 2026-06-05T00:00:00+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9584 Evaluasi Validitas Model Machine learning pada Klasifikasi Stunting Berbasis Data Antropometri dan Hubungan Deterministik 2026-06-05T23:56:40+07:00 Turwan Aldi Putra turwan_aldi_putra@teknokrat.ac.id Nirwana Hendrastuty nirwanahendrastuty@teknokrat.ac.id <p>Stunting is a chronic nutritional problem among infants and toddlers that affects children’s growth and development. Various studies have utilized machine learning for nutritional status classification based on anthropometric data; however, the validity of the resulting models has rarely been examined. This study aims to evaluate the validity of machine learning models in classifying stunting status using the XGBoost, Random Forest, and Naïve Bayes algorithms. The dataset consists of 120,999 anthropometric records of infants, with age, gender, and height as features, and nutritional status as the target variable. 
The research process included preprocessing, data transformation, and model evaluation using the k-fold cross-validation method with accuracy, precision, recall, and F1-score metrics. The results showed that Random Forest and XGBoost achieved very high accuracy, at 99.91% and 99.08%, respectively, while Naïve Bayes reached only 55%. This stark difference in performance indicates that ensemble-based models are capable of capturing very strong patterns in the data, while Naïve Bayes struggles due to the interdependence among features. Furthermore, the high accuracy of certain models suggests a deterministic relationship between features and labels, which could potentially make the models less robust against data containing measurement errors or noise.</p> 2026-06-04T18:26:19+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9554 Analisis Perbandingan Metode Edas Dan Aras Dalam Pemilihan Platform Freelance Terbaik Untuk Pekerja Jarak Jauh (Remote Worker) 2026-06-05T23:56:41+07:00 Rexlicky Verdhika Sagatha rexlicky_verdhika_sagatha@teknokrat.ac.id Zaenal Abidin zaenal_abidin@teknokrat.ac.id <p>The trend of remote workers has increased significantly, driving the high adoption of global freelance platforms. However, the diversity of policies regarding service fees, withdrawal limits, and levels of competition across platforms often makes it difficult for beginner remote workers to determine the most optimal choice. This study aims to analyze and compare the recommendation results of a Decision Support System (DSS) using the <em>Evaluation based on Distance from Average Solution</em> (EDAS) method and the <em>Additive Ratio Assessment</em> (ARAS) method in selecting freelance platforms. 
The study evaluates five platform alternatives (Upwork, Fiverr, Fastwork, Freelancer, and Projects.co.id) using a mixed-methods approach that combines factual platform policy data (Administrative Fee Deduction and Minimum Withdrawal) with user perception data (UI/UX, Security, and Level of Competition). The analysis results show a high level of consistency between the two methods for the best alternative, where Upwork (A1) ranks first with an Appraisal Score (AS) of 0.965 in EDAS and a Utility Degree (Ki) of 0.958 in ARAS. However, the comparative analysis reveals differences in rankings at the 4th and 5th positions, caused by the extreme value (outlier) sensitivity of the EDAS algorithm on cost attributes and the more tolerant stability of the ARAS algorithm in providing proportional value compensation. This study concludes that a comparative method not only provides validated recommendations but also reveals the characteristics of each algorithm in handling anomalies in cost attribute data. The main contribution of this study is to provide a valid comparative decision-making framework for remote workers in optimizing platform selection, while also enriching the academic literature regarding the disclosure of algorithmic sensitivity in the ARAS and EDAS methods when handling cost data anomalies.</p> 2026-06-05T00:00:00+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9707 Segmentation-Aware Recommendation with Cluster-Specific Item Graphs Using Pointwise Mutual Information for Market Basket Analysis 2026-06-05T23:56:41+07:00 Khalifatur Rauf blograuf.kr@gmail.com Arief Hermawan ariefdb@uty.ac.id Donny Avianto donny@uty.ac.id <p>Traditional Association Rule-based recommendation methods often exhibit limited coverage and high redundancy when applied to sparse transactional data, thereby constraining their effectiveness for product discovery in e-commerce systems. 
This study proposes a hybrid recommendation framework that integrates customer behavioral segmentation with graph-based item representation learning to address these limitations. Customers are first grouped into behaviorally homogeneous clusters using historical transaction features. For each cluster, an item co-occurrence graph is constructed and weighted using pointwise mutual information to mitigate sparsity bias and emphasize informative associations. Graph-based representation learning is then applied using Node2Vec to generate low-dimensional product embeddings that capture both local structural proximity and higher-order relational patterns. The proposed framework explicitly restricts the candidate item space to the Top 100 most frequent products within each behavioral cluster, thereby focusing the recommendation task on improving localized discovery within high-frequency product segments rather than global catalog exploration. The objective of this research is to assess whether segmentation-aware graph embeddings can outperform traditional FP-Growth association rules under a strict temporal split between the Historical Training Set and the Hold-out Evaluation Set, ensuring realistic and leakage-free evaluation. Model performance is evaluated using precision, recall, normalized discounted cumulative gain, and intra-list diversity on the Hold-out Evaluation Set. Experimental results indicate that the proposed graph-based approach improves ranking quality and diversity within constrained high-frequency item spaces, demonstrating more effective localized discovery within Top 100 product segments compared to FP-Growth. These results demonstrate that graph-based embeddings are more robust to sparse behavioral patterns within high-frequency product segments and better suited for exploratory recommendation scenarios within dense product subsets. 
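To make the pointwise mutual information weighting of the co-occurrence graph concrete, the sketch below computes PMI(a, b) = ln(p(a, b) / (p(a) p(b))) for every item pair that shares a basket. It is a hedged illustration only: the function name `pmi_edges` and the toy baskets are assumptions, probabilities are estimated as simple basket fractions, and the abstract does not say how negative PMI values, clustering, or Node2Vec are handled in the actual framework.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(baskets: list[list[str]]) -> dict[tuple[str, str], float]:
    """Return PMI weights for all item pairs that co-occur in a basket."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    edges = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        edges[(a, b)] = math.log(p_ab / (p_a * p_b))
    return edges


# Toy transactions for one hypothetical customer cluster.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["milk", "eggs"],
]
edges = pmi_edges(baskets)
```

In this toy data the pair (bread, milk) co-occurs twice yet gets a negative weight because both items are frequent on their own, whereas (eggs, milk) co-occurs once and gets a positive weight. That contrast is the general sense in which PMI emphasizes informative associations over raw co-occurrence counts.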
The proposed framework offers a scalable and temporally valid foundation for knowledge-driven recommender systems.</p> 2026-06-05T01:30:50+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9709 Comparing TF-IDF Based SVM and Logistic Regression for Imbalanced Pertamina Corruption Tweet Sentiment Classification 2026-06-05T23:56:41+07:00 Khahlil Gibran 2208096096@student.walisongo.ac.id Wenty Dwi Yuniarti wenty@walisongo.ac.id Khotibul Umam khotibul_umam@walisongo.ac.id Mokhamad Iklil Mustofa iklil@walisongo.ac.id <p>The corruption case involving PT Pertamina (Persero) in early 2025 generated widespread public reactions on social media, particularly on the X (Twitter) platform. The rapid dissemination of opinions in digital environments highlights the importance of analyzing public sentiment toward socio-political issues. This study aims to examine public sentiment regarding the Pertamina corruption case using a text classification approach based on Term Frequency–Inverse Document Frequency (TF-IDF). This study contributes a controlled comparison of TF-IDF-based Support Vector Machine (SVM) and Logistic Regression on imbalanced Indonesian-language tweets related to a nationally salient corruption issue, while also emphasizing the importance of evaluating performance beyond accuracy alone through macro-F1 and minority-class recall. Two classification algorithms, Support Vector Machine (SVM) and Logistic Regression, were employed to compare their performance in predicting lexicon-derived positive and negative sentiment labels. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training data. A total of 3,058 Indonesian-language tweets collected between February 25 and March 10, 2025, underwent preprocessing and sentiment labeling using the INSET Lexicon. 
The results show that SVM achieved a higher overall accuracy of 94.93% and a macro-F1 score of 0.80, while Logistic Regression achieved an accuracy of 90.52% and a macro-F1 score of 0.73. However, class-wise evaluation indicates that accuracy should not be interpreted independently because the dataset was dominated by negative sentiment. For the positive minority class, SVM obtained an F1-score of 0.64 and recall of 0.60, whereas Logistic Regression obtained a lower F1-score of 0.52 but a higher recall of 0.69. These findings indicate a trade-off between overall classification performance and minority-class sensitivity.</p> 2026-06-05T00:00:00+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9688 Perbandingan XGBoost dan Random Forest Menggunakan Seleksi Fitur ANOVA-MI Dalam Klasifikasi Kesehatan Janin Cardiotocography 2026-06-05T23:56:41+07:00 Abednego Destyo Amanda abednego_destyo_amanda@teknokrat.ac.id Angga Bayu Santoso anggabayu@teknokrat.ac.id <p>This study compares the performance of Random Forest and XGBoost algorithms in classifying fetal health problems using Cardiotocography (CTG) data. The main problems in the CTG dataset are the imbalance in the amount of data between classes, the presence of less relevant features, and the difficulty of identifying the Suspect class, whose characteristics lie between those of the Normal and Pathological classes. This matters because the Suspect class represents the early stage of fetal health risk, which determines further medical treatment. To overcome these problems, this study uses ANOVA and Mutual Information feature selection techniques, together with the ADASYN oversampling method to balance the data. In addition, Random Search is used to optimize model parameters and improve performance. Unlike previous studies that generally focus on improving accuracy, this study also emphasizes the model's ability to detect minority classes, especially the Suspect class. 
The results show that XGBoost outperforms Random Forest in almost every test scenario. The XGBoost model achieved its best accuracy, 95.51%, with the combination of ANOVA, ADASYN, and hyperparameter tuning. Meanwhile, the application of Mutual Information with ADASYN and tuning was quite effective in identifying the Suspect class, with a higher recall value of 81%. However, because the Suspect class attributes lie between those of the Normal and Pathological classes, the model still faces challenges in optimally distinguishing them. Overall, this study shows that combining appropriate feature selection, data imbalance handling, and parameter optimization in a single pipeline can improve model performance in a more balanced way. This research is expected to support more objective medical decision-making, especially in detecting fetal risk conditions from an early stage.</p> 2026-06-05T15:50:24+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9697 Evaluasi Kinerja Naïve Bayes, Decision Tree, Dan Random Forest Serta Voting Ensemble Pada Klasifikasi Multi-Kelas Penyakit Sapi Berbasis Gejala 2026-06-05T23:56:42+07:00 Nazwa Diajeng Istika Rahmadhani l200220251@student.ums.ac.id Nurgiyatna Nurgiyatna nurgiyatna@ums.ac.id <p>Cattle are an important livestock commodity; however, farmers often face difficulties in disease diagnosis due to the similarity of clinical symptoms and limited access to veterinary experts. This study aims to compare the performance of three machine learning classification algorithms, namely Naïve Bayes, Decision Tree, and Random Forest, and to evaluate the effectiveness of an ensemble approach using a Voting Ensemble method for cattle disease diagnosis. The study adopts the CRISP-DM methodology, consisting of data preprocessing, modeling, and evaluation stages. Model performance is assessed using accuracy, precision, recall, and F1-score metrics. 
The experimental results show that Naïve Bayes achieves the best performance with an accuracy of 0.951 and an F1-score of 0.920. Random Forest obtains an accuracy of 0.799, while Decision Tree performs the lowest with an accuracy of 0.265. Ensemble methods, including Voting NB+RF, Voting Weighted, Voting Soft, and Voting Hard, achieve accuracies of 0.912, 0.900, 0.853, and 0.792, respectively. These findings indicate that Naïve Bayes is more suitable for high-dimensional and sparse symptom-based data, providing the most stable performance among the evaluated models. The developed system is implemented as a web-based expert system. Usability evaluation using the System Usability Scale (SUS) yields a score of 77, categorized as “Good.” This study demonstrates that machine learning can support decision-making in cattle disease diagnosis.</p> 2026-06-05T16:11:44+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9573 Optuna-Driven Hyperparameter Optimization in Tsukamoto Fuzzy Logic for House Price Estimation 2026-06-05T23:56:42+07:00 Annisa Aurelia Fitriani annisaarel04@gmail.com Nabilah Putri Wijaya nabilahputriwijaya@gmail.com Susanto Susanto susanto@usm.ac.id Nur Wakhidah ida@usm.ac.id <p>The property sector faces challenges in determining accurate house selling prices due to subjectivity and market uncertainty. The relationship between physical attributes, such as land area and building area, and price is not always linear, making conventional methods often less precise in estimation. This study aims to design a decision support system to objectively estimate house prices in the Plamongan area, Semarang. The method used is Fuzzy Tsukamoto Logic. This preliminary study explores the integration of the Tree-structured Parzen Estimator (TPE) algorithm through the Optuna framework to automatically optimize membership function limits, replacing manual trial and error methods. 
The dataset was collected via scraping techniques, providing a pilot dataset of 26 data points. Final model performance evaluation showed a Mean Absolute Percentage Error (MAPE) value of 11.39%, which falls into the 'Good Forecast' category. However, given the highly limited sample size, these findings primarily serve as a proof-of-concept that requires further validation with larger, multi-variable datasets. These results prove that integrating the Fuzzy Tsukamoto method with hyperparameter optimization is effective in reducing subjectivity and providing reliable property price estimates. The primary contribution of this research is providing a mathematical proof-of-concept for an automated, objective property valuation system that eliminates human bias in fuzzy parameter configuration, offering a practical baseline tool for localized real estate markets.</p> 2026-06-05T16:29:53+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9668 Classification of School Students Lifestyle Risks Based on Smoking Behavior Using Naïve Bayes 2026-06-05T23:56:42+07:00 Oktaria Dwi Cahyani oktariadwicahyani@gmail.com Deltari Balka deltaribalka3@gmail.com Dinni Rezky Amelia dinnirezkyameliaa@gmail.com Rainda Cintari Aulya raindacintariaviya@gmail.com Ken Ditha Tania kenya.tania@gmail.com Allsela Meiriza allsela_meiriza@yahoo.co.id Zaqqi Yamani zaqqi_yamani@unsri.ac.id <p>This study aims to classify students' lifestyle risks based on smoking behavior using the Naïve Bayes algorithm within a knowledge management framework. The research was conducted on students at a vocational high school within the coverage area of a local community health center. The dataset consisted of 277 valid records after undergoing data selection, cleaning, and transformation stages. The modeling process was carried out using RapidMiner software with an 80:20 data split for training (221 students) and testing (56 students). 
The evaluation metrics used included accuracy, precision, recall, and confusion matrix. The experimental results demonstrate that the Naïve Bayes model achieved an accuracy of 85.92%, precision of 86.12%, and recall of 92.86% for the unhealthy class. Furthermore, the classification results were integrated into a knowledge management framework to support decision-making processes in schools and community health centers. This study contributes to the application of predictive data mining in adolescent health and demonstrates how classification models can serve as effective tools for early detection, preventive interventions, and evidence-based policy formulation in educational and health settings.</p> 2026-06-05T16:53:24+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9794 Sentiment Classification on Indonesian Game Sequels: A Comparative Analysis of SVM and Naive Bayes on Coffee Talk Franchise Reviews 2026-06-05T23:56:42+07:00 Nanda Yuris Riziq nandayurisriziq@gmail.com Edy Mulyanto edymulyanto@dsn.dinus.ac.id <p>User reviews on Steam are a critical source of feedback for game developers, yet manual sentiment analysis at scale is impractical. This study aims to compare Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), and Complement Naive Bayes (CNB) for binary sentiment classification and to analyze sequel reception patterns through cross-game evaluation. Reviews were preprocessed with negation-aware stopword removal and WordNet lemmatization, then vectorized with TF-IDF unigram and bigram features. Four scenarios were evaluated: two within-game baselines, a cross-game generalization, and a combined evaluation. Class imbalance was handled at the model level via class weighting for SVM and the CNB variant. Macro-averaged F1-Score was the primary metric. SVM consistently outperformed both Naive Bayes variants, achieving macro-F1 of 0.81 within-game and 0.75 cross-game. 
MNB collapsed to majority-class prediction across all scenarios; in S2, all three models also failed on the minority class due to the small test partition (n=6). The cross-game result indicates that sentiment patterns transfer reasonably from the original game to its sequel, with the performance drop concentrated in the minority class. These findings offer practical guidance for Indonesian game developers monitoring sequel reception through automated sentiment analysis.</p> 2026-06-05T17:05:07+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9716 Komparasi Random Forest dan Artificial Neural Network dalam Prediksi Dampak AI terhadap Pekerjaan 2030 2026-06-05T23:56:42+07:00 Jefri Jaka Tirta jefri_jaka_tirta@teknokrat.ac.id Heni Sulistiani henisulistiani@teknokrat.ac.id <p>The development of Artificial Intelligence (AI) is expected to affect the future employment structure, particularly regarding automation risks in 2030 as a period of accelerated AI adoption across various industrial sectors. This study aims to compare the performance of the Random Forest and Artificial Neural Network (ANN) algorithms in predicting the impact of AI on jobs. The study employed two modeling approaches, namely regression to predict job automation probability and classification to determine job risk categories into Low, Medium, and High classes through a discretization process. The dataset was obtained from Kaggle with a total of 3,000 records and processed through preprocessing, feature engineering, and train-test splitting with an 80:20 ratio. Regression evaluation was conducted using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and coefficient of determination (R²), while classification evaluation used accuracy and F1-score. 
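The three regression metrics named above follow directly from their standard definitions, as the short sketch below shows. The function name `regression_metrics` and the four toy values are assumptions for illustration and are unrelated to the study's data.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]):
    """Return (MAE, RMSE, R^2) using the standard definitions.

    Assumes equal-length, non-empty inputs and a non-constant y_true.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2


# Toy automation probabilities (assumed values).
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.3, 0.4, 0.5, 0.8]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
```

RMSE squares the errors before averaging, so it is never smaller than MAE and penalizes large misses more heavily, which is why the two are usually reported together.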
The results showed that Random Forest achieved the best regression performance with an MAE of 0.0786, RMSE of 0.0932, and R² of 0.8640, outperforming ANN with an MAE of 0.0949, RMSE of 0.1137, and R² of 0.7973. In the classification task, both algorithms achieved an accuracy and F1-score of 99.33%. This study shows that Random Forest is more stable on tabular data and contributes to the comparative analysis of ensemble learning and neural network approaches for predicting the impact of AI on jobs.</p> 2026-06-05T17:30:56+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9818 Disaster Risk Classification in Indonesia Using SVM and Random Forest 2026-06-05T23:56:43+07:00 Erland Adhe Sharendra erland_adhe_sharendra@teknokrat.ac.id Tri Widodo tri_widodo@teknokrat.ac.id Damayanti Damayanti damayanti@teknokrat.ac.id Okma Arnilia okmaarnilia@uinssc.ac.id <p>Indonesia is a country with a high level of disaster vulnerability, requiring effective methods to accurately classify disaster risk levels. This study aims to analyze and compare the performance of Support Vector Machine (SVM) and <em>Random Forest</em> algorithms in disaster risk classification. The dataset used consists of disaster event data from 2019–2024, including disaster type, region, number of victims, and population density. Disaster risk levels were classified into three categories, namely low, medium, and high, based on the total impact calculated from the number of victims. The proposed method includes data preprocessing, normalization, and train-test data splitting. The results show that both models achieved high performance, where Random Forest obtained an accuracy of 95.66% and SVM achieved 95.28%, with ROC-<em>AUC</em> values of 0.9823 and 0.9769, respectively. Random Forest demonstrated slightly better performance with an accuracy difference of 0.38 percentage points and more consistent prediction results. 
The high performance indicates that the models were able to recognize the main patterns within the dataset, although the results were also influenced by the characteristics of the data used. Overall, Random Forest is more suitable for disaster risk classification on data with complex characteristics.</p> 2026-06-05T17:42:21+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9896 Hybrid CNN-BiLSTM for Multi-Platform Sentiment Analysis of Food Safety Incidents in the Free Nutritious Meal Program 2026-06-05T23:56:43+07:00 Mohamad Rival Farid Riwaldi 111202214547@mhs.dinus.ac.id Aripin Aripin arifin@dsn.dinus.ac.id <p>The decline in stunting prevalence in Indonesia has not been accompanied by improvements in the quality of nutritional intervention program implementation, including the Free Nutritious Meal Program (MBG), which sparked public controversy following food safety incidents in several regions. The high volume of cross-platform public opinion on social media requires an analytical approach capable of simultaneously capturing diverse linguistic styles from various sources. This study proposes a hybrid Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) classification model to analyze public sentiment regarding the incidents, with CNN extracting local feature patterns and BiLSTM modeling bidirectional word-sequence dependencies. A total of 3,416 comments were collected from five social media platforms (X, Instagram, TikTok, YouTube, and Facebook), then processed through text preprocessing and initial lexicon-based labeling into three sentiment classes: negative, neutral, and positive. To strengthen label validity, the labeling quality was validated through manual annotation by two independent annotators, yielding a Cohen’s kappa of κ = 0.828. 
The dataset was split using an 80:20 stratified scheme, with class weighting applied to reduce bias caused by class imbalance without changing the number of samples in each class. The hybrid model was compared with two baseline models, CNN and BiLSTM, using macro F1-score as the primary metric, while accuracy was used as a supporting metric. The experimental results show that the hybrid CNN–BiLSTM model achieved a macro F1-score of 90.38% and an accuracy of 94.59%, outperforming both baseline models. Misclassification analysis revealed that most errors occurred in argumentative comments, negation, and contrastive sentences, reflecting the limitations of lexicon-based labeling in capturing nuanced language. Overall, this approach demonstrates the potential of cross-platform deep learning-based sentiment analysis as an initial component for monitoring public opinion on national-scale government policies. This study contributes by providing a manually validated multi-platform Indonesian dataset, developing a hybrid CNN-BiLSTM architecture with a class-weighting scheme effective for three-class sentiment classification on informal text, and opening opportunities for applying deep learning as a means of data-driven public opinion monitoring.</p> 2026-06-05T17:57:40+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9724 Performance Comparison of Naïve Bayes, SVM, and Random Forest in Pregnancy Risk Classification 2026-06-05T23:56:43+07:00 Reva Ekalia reva_ekalia@teknokrat.ac.id Dyah Ayu Megawaty dyahayumegawaty@teknokrat.ac.id <p>Classifying pregnancy risk levels is a crucial aspect in supporting early detection of potential complications in pregnant women. However, most previous studies have focused on a single algorithm and relied solely on accuracy metrics, thus failing to provide a comprehensive picture of model performance in multiclass classification. 
Furthermore, performance comparisons between algorithms using more comprehensive evaluation approaches are still limited. This study aims to analyze and compare the performance of the Naïve Bayes, Support Vector Machine (SVM), and Random Forest algorithms in classifying pregnancy risk levels using the Maternal Health Risk Dataset from the UCI Machine Learning Repository, which consists of 1,014 records with six maternal health attributes. The methods used include data preprocessing, hyperparameter optimization using GridSearchCV, and model evaluation using Stratified K-Fold Cross Validation with k = 10. Model performance was measured using accuracy, precision, recall, and F1-score metrics to provide a more comprehensive evaluation. The results showed that the Random Forest algorithm had the best performance with an accuracy value of 0.8629, precision of 0.8704, recall of 0.8629, and F1-score of 0.8635, followed by SVM and Naïve Bayes. The superiority of Random Forest is due to its ability to combine several decision trees and capture non-linear relationships between features, resulting in more accurate and stable predictions. Thus, Random Forest is recommended as the most effective method in pregnancy risk classification based on maternal health data.</p> 2026-06-05T19:05:10+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9788 Performance Comparison of ARIMA-GARCH and LSTM Models in Forecasting Bitcoin Volatility 2026-06-05T23:56:43+07:00 Miezan El khoir miezan_el_khoir@teknokrat.ac.id Fenty Ariany fenty@teknokrat.ac.id <p>Bitcoin is a cryptocurrency asset with extreme volatility, necessitating precise forecasting models for investment risk mitigation. 
This study aims to analyze and forecast Bitcoin price volatility using an integrated Autoregressive Integrated Moving Average - Generalized Autoregressive Conditional Heteroskedasticity (ARIMA-GARCH) approach and compare its performance with a Deep Learning method, specifically Long Short-Term Memory (LSTM). The data used is the daily closing price of Bitcoin for the period 2018 to 2025. The results indicate that the ARIMA(1,1,1)-GARCH(1,1) model effectively captures the volatility clustering phenomenon, with a significant beta parameter value of 0.8691, indicating long-term volatility persistence. However, in terms of price prediction accuracy, the LSTM model significantly outperforms the conventional statistical model. Based on the testing, the ARIMA-GARCH model produced a Mean Absolute Percentage Error (MAPE) of 18.11%, which falls into the "good forecasting" category. In contrast, the LSTM model achieved a MAPE of 3.09%, categorized as "highly accurate forecasting." The significant difference in Root Mean Square Error (RMSE) values also reinforces that the LSTM architecture is more adaptive in processing non-linear data patterns and complex Bitcoin price fluctuations. This study concludes that while ARIMA-GARCH excels in risk structure analysis, the LSTM model provides more reliable price projection results for crypto market participants.</p> 2026-06-05T22:45:19+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9706 Leakage-Aware Random Forest Regression for Predicting Job Automation Risk Using Structured Labor Market Data 2026-06-05T23:56:43+07:00 Alya Zalfa Chairunnisa ch.alyazalfa@gmail.com Nawirah Athqiyah nawirahathqiyah67@gmail.com Vanisa Amalia Putri hello.vanisamalia@gmail.com Ken Dhita Tania kenya.tania@gmail.com Allsela Meiriza allsela@unsri.ac.id <p>This study aims to predict job automation risk in the era of artificial intelligence (AI) using a leakage-aware Random Forest Regression approach. 
The automation risk score, defined as a composite index derived from task exposure to AI, occupational routine intensity, and technological susceptibility indicators sourced from the AI Impact Jobs Dataset, serves as the target variable. The dataset comprises 5,000 job vacancy records from 44 countries across 9 industries spanning 2010 to 2025. A rigorous methodological framework is applied by systematically identifying and eliminating potential data leakage features, including ai_intensity_score, reskilling_required, and ai_mentioned, which were found to share mathematical or conceptual derivation paths with the target variable. The model is evaluated using R², RMSE, MAE, and MAPE with 5-fold cross-validation. The results show that the model achieves an R² score of 0.8087 on testing data, with RMSE of 0.1129 and MAE of 0.0893. Feature importance analysis reveals that salary_change_vs_prev_year_percent is the most influential predictor (55.85%), which, although indicative of dominance bias typical in synthetic datasets, aligns with economic theories linking wage dynamics to automation incentives. The findings demonstrate that leakage control significantly reduces inflated performance estimates (from R² = 0.8857 to 0.8087), and that Random Forest Regression provides a robust predictive framework for tabular socio-economic data when combined with rigorous preprocessing. 
This study contributes a methodological template for preventing data leakage in labor market prediction tasks.</p> 2026-06-05T23:03:25+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9687 E-Commerce Customer Segmentation Based on the Integration of Text Mining and RFM for Early Churn Detection 2026-06-05T23:56:43+07:00 Violin Juneyla Nandita violinjuneyla06@gmail.com Juseia Wulandari juseiawulandari59@gmail.com Apriyadi Apriyadi apriyadiakh@gmail.com Ali Ibrahim aliibrahim@unsri.ac.id Fathoni Fathoni fathoni@unsri.ac.id <p>The growth of transactions on e-commerce platforms generates a massive volume of unstructured customer review data. However, traditional Customer Relationship Management (CRM) models such as RFM often focus only on quantitative transaction data and ignore the emotional dimension contained in customer reviews. This study aims to analyze the relationship between purchase frequency and customer comment polarity through the integration of Text Mining and CRM Analytics approaches. The novelty offered is the development of a hybrid method that combines Lexicon Refinement-based sentiment extraction with the Random Forest algorithm to overcome rating bias in global e-commerce platform data (Kaggle). The proposed method includes the use of Natural Language Processing (NLP) techniques, topic modeling based on Latent Dirichlet Allocation (LDA), and sentiment analysis to extract polarity scores. The test results show that the initial lexicon model has limitations with an accuracy of 52.14% due to noise in neutral reviews (3-star rating). However, after optimization using the Random Forest algorithm and neutral data filtering, the classification accuracy increased significantly to 74.62%. 
These results prove that sentiment integration is able to provide more accurate loyalty mapping and help e-commerce management detect potential churn in the At-Risk customer segment.</p> 2026-06-05T23:16:25+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9864 Aspect-Based Sentiment Analysis on Skintific Product Reviews Using IndoBERT 2026-06-05T23:56:43+07:00 Asyifa Hafizah Putri asyifahafizahumm@webmail.umm.ac.id Christian Sri Kusuma Aditya christianskaditya@umm.ac.id <p>The rapid growth of the beauty industry has generated a massive volume of online reviews where traditional sentiment analysis fails to capture contradictory opinions across specific product features. This study implements Aspect-Based Sentiment Analysis (ABSA) using the IndoBERT-base-p1 architecture on 2,139 review data points from Female Daily, integrated with a specialized slang normalization stage to mitigate linguistic noise. The novelty lies in evaluating IndoBERT’s bidirectional attention robustness in processing technical medical terminology alongside Indonesian social media slang—a complexity often overlooked in prior beauty domain studies. This study contributes a novel methodological pipeline that bridges deep learning architectures with domain-specific linguistic preprocessing, providing a benchmark dataset for Indonesian beauty product reviews. The results showed that IndoBERT was able to distinguish nuances of sentiment, with superior performance in the Effectiveness (F1-Score 72.57%) and Texture (F1-Score 71.10%) aspects. Although the average score was affected by sample limitations in certain aspects, the model proved effective in capturing the semantics of Indonesian consumer slang. 
Ultimately, this research provides a practical contribution for consumers in validating product quality specifically and for producers as a basis for evaluating product performance in the public eye.</p> 2026-06-05T23:34:42+07:00 ##submission.copyrightStatement## https://ejurnal.seminar-id.com/index.php/bits/article/view/9840 Arrhythmia Detection Using XGBoost with Recursive Feature Elimination: A Two-Stage Machine Learning Approach 2026-06-05T23:56:43+07:00 Suci Mutiarani mutiaranisuci58@gmail.com Tikaridha Hardiani tikaridha@unisayogya.ac.id <p>Arrhythmia is a cardiac rhythm disorder that can lead to severe complications, including heart failure and sudden cardiac death. Accurate electrocardiogram (ECG)-based arrhythmia detection remains challenging due to high-dimensional features and class imbalance. Therefore, this study aims to develop a two-stage machine learning approach for arrhythmia detection using Recursive Feature Elimination (RFE) and Extreme Gradient Boosting (XGBoost). The proposed approach performs binary classification to distinguish normal and arrhythmia conditions, followed by multi-class classification to identify arrhythmia subtypes. SMOTE is applied to address class imbalance, while Grid Search with cross-validation is used for hyperparameter optimization. Furthermore, the trained model is implemented in a web-based application for interactive prediction and visualization. Experimental results show that the optimized binary classification model achieves an accuracy of 0.89 and an F1-score of 0.87. Meanwhile, the multi-class classification model achieves an accuracy of 0.69 and a weighted F1-score of 0.66. The results indicate that the proposed approach performs effectively for binary arrhythmia detection. However, performance in multi-class classification remains limited due to imbalance and insufficient samples in several arrhythmia subtypes. 
This study contributes by proposing an integrated framework that combines Recursive Feature Elimination (RFE) for feature selection, SMOTE for imbalance handling, XGBoost with GridSearchCV-based hyperparameter optimization, and a two-stage classification approach for ECG-based arrhythmia detection and subtype classification. In addition, the proposed model is implemented in a web-based application to support interactive prediction and visualization. Overall, this study demonstrates the potential of integrating RFE, XGBoost, and SMOTE for ECG-based arrhythmia detection and practical web-based implementation.</p> 2026-06-05T23:55:28+07:00 ##submission.copyrightStatement##