Development of a Vision Transformer Model for Indonesian Sign Language Classification Using the MixUp Augmentation Technique
DOI: https://doi.org/10.71417/jitie.v1i2.63
Keywords: BISINDO, CLAHE, MixUp, Vision Transformer, Sign Language
Abstract
This study addresses the challenges of recognizing Indonesian Sign Language (BISINDO) that arise from complex hand gestures, lighting variation, and limited datasets, which lead to low classification accuracy in traditional models. The aim is to optimize BISINDO classification using a Vision Transformer (ViT) with CLAHE preprocessing and MixUp augmentation. The study is a quantitative experiment with an iterative deep learning design, using the SIBI dataset from Mendeley Data (900 images, 10 vocabulary classes, 70:20:10 split ratio). The tools are TensorFlow/Keras for the ViT, OpenCV for CLAHE, and scikit-learn for evaluation, with analysis by k-fold cross-validation (k=5) and a paired t-test comparing a CNN baseline against the ViT. The results show 99% accuracy, precision, recall, and F1-score in the range 0.94-1.00, a macro average of 0.99, and a stable real-time system running from a webcam, with a diagonally dominant confusion matrix. The conclusion confirms the effectiveness of the ViT-CLAHE-MixUp combination for robust generalization on small data, with potential for development into a real-time translator for inclusive education in Indonesian special-needs schools (SLB).
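For readers unfamiliar with MixUp, the sketch below shows the standard formulation: each training image and its one-hot label are replaced by a convex combination with a randomly chosen partner, using a weight drawn from a Beta(alpha, alpha) distribution. This is a minimal NumPy illustration, not the authors' implementation. The function name `mixup_batch`, the value `alpha=0.2`, the single per-batch weight, and the 32x32 image size are all assumptions made for the example; only the 10-class label space comes from the abstract.

```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2, rng=None):
    """Blend each sample with a randomly chosen partner from the same batch.

    images: float array of shape (batch, height, width, channels)
    labels: one-hot float array of shape (batch, num_classes)
    alpha:  Beta(alpha, alpha) parameter (0.2 is an assumed value)
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))      # one mixing weight in [0, 1] for the batch
    perm = rng.permutation(len(images))      # partner index for every sample
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels, lam

# Hypothetical batch: 8 random 32x32 RGB images over 10 classes
rng = np.random.default_rng(0)
images = rng.random((8, 32, 32, 3), dtype=np.float32)
labels = np.eye(10, dtype=np.float32)[rng.integers(0, 10, size=8)]

mixed_images, mixed_labels, lam = mixup_batch(images, labels, alpha=0.2, rng=rng)
```

Because the labels are mixed with the same weight as the images, each row of `mixed_labels` still sums to 1, so the result can be fed to a categorical cross-entropy loss without further changes.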
License
Copyright (c) 2025 Moh.Hatta, Muhammad Akbar, Muhammad Irvai (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.












