Comparison of String Similarity Algorithm in post-processing OCR
DOI: https://doi.org/10.33633/jais.v8i1.7079
Journal of Applied Intelligent System (e-ISSN: 2502-9401, p-ISSN: 2503-0493) is published by the Department of Informatics, Universitas Dian Nuswantoro, Semarang, and IndoCEISS.
This journal is licensed under the Creative Commons Attribution 4.0 International License.