Optimizing the performance of a server-based classification for a large business document flow
https://doi.org/10.21122/2309-4923-2022-4-60-64
About the author
O. A. Slavin, Russia
Chief Researcher, Doctor of Technical Sciences
References
1. Bashkatova, A. The digital economy breeds ever more paperwork: Russians will not soon stop bringing certificates to organizations. Nezavisimaya Gazeta, 14 Nov. 2019 (in Russian). [Online]. Available: https://www.ng.ru/economics/2019-11-14/4_7727_paper.html (accessed 08.11.2022).
2. Liu, L., Wang, Z., Qiu, T., Chen, Q., Lu, Y., Suen, C.Y. Document image classification: Progress over two decades. Neurocomputing, 2021, Vol. 453, pp. 223–240.
3. Byun, Y., Lee, Y. Form classification using DP matching. ACM Symposium on Applied Computing, 2000, Vol. 1, pp. 1–4.
4. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019. [Online]. Available: https://arxiv.org/abs/1810.04805/ (accessed 08.11.2022).
5. Rubin, T.N., Chambers, A., Smyth, P., Steyvers, M. Statistical topic models for multi-label document classification. Machine Learning, 2011, Vol. 88, No. 1, pp. 157–208. https://doi.org/10.1007/s10994-011-5272-5
6. Vorontsov, K.V., Potapenko, A.A. Tutorial on probabilistic topic modeling: Additive regularization for stochastic matrix factorization. Communications in Computer and Information Science, 2014, Vol. 436, pp. 29–46. https://doi.org/10.1007/978-3-319-12580-0_3
7. NIST Special Database 2. [Online]. Available: https://www.nist.gov/srd/nist-special-database-2/ (accessed 08.11.2022).
8. Tobacco-3482. [Online]. Available: https://www.kaggle.com/patrickaudriaz/tobacco3482jpg/ (accessed 08.11.2022).
9. Tesseract OCR. [Online]. Available: https://github.com/tesseract-ocr/tesseract/ (accessed 08.11.2022).
10. Tereshin, A.A., Usilin, S.A., Arlazarov, V.V. Performance Improvement of Multi-class Detection Using Greedy Algorithm for Viola-Jones Cascade Selection. Tenth International Conference on Machine Vision (ICMV 2017), Proceedings Vol. 10696, 2018, 106960D. https://doi.org/10.1117/12.2310101
11. Slavin, O.A., Farsobina, V., Myshev, A.V. Analyzing the content of business documents recognized with a large number of errors using modified Levenshtein distance. Cyber-Physical Systems: Intelligent Models and Algorithms, Springer Nature Switzerland AG, 2022, Vol. 417, pp. 267–279. https://doi.org/10.1007/978-3-030-95116-0
12. Slavin, O.A. Using Special Text Points in the Recognition of Documents. Studies in Systems, Decision and Control, Springer Nature Switzerland AG, 2020, Vol. 259, pp. 43–53. https://doi.org/10.1007/978-3-030-32579-4_4
13. Konaka, F., Miura, T. Semantic similarity for sequenced shingles. 2015 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), 2015, pp. 12–17. https://doi.org/10.1109/PACRIM.2015.7334801
14. Acar, U.A., Blelloch, G.E., Harper, R. Selective memoization. ACM SIGPLAN Notices, 2003, Vol. 38, No. 1, pp. 14–25. https://doi.org/10.1145/640128.604133
15. Tatarowicz, A.L., Curino, C., Jones, E.P.C., Madden, S. Lookup Tables: Fine-Grained Partitioning for Distributed Databases. 2012 IEEE 28th International Conference on Data Engineering, 2012, pp. 102–113. https://doi.org/10.1109/ICDE.2012.26
For citation:
Slavin O.A. Optimizing the performance of a server-based classification for a large business document flow. System Analysis and Applied Information Science. 2022;(4):60-64. https://doi.org/10.21122/2309-4923-2022-4-60-64