{"id":2601,"date":"2023-06-13T11:04:08","date_gmt":"2023-06-13T02:04:08","guid":{"rendered":"https:\/\/bestpathresearch.com\/?p=2601"},"modified":"2023-06-18T18:03:22","modified_gmt":"2023-06-18T09:03:22","slug":"ocr","status":"publish","type":"post","link":"https:\/\/bestpathresearch.com\/en\/2023\/06\/13\/ocr\/","title":{"rendered":"Japanese Receipt OCR and Named-entity Extraction: Low-cost Inference with Multiple Models using AWS SageMaker Serverless and Triton Inference Server"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"2601\" class=\"elementor elementor-2601\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-3822ef76 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"3822ef76\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-bc93e69\" data-id=\"bc93e69\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d6ebf14 elementor-widget elementor-widget-text-editor\" data-id=\"d6ebf14\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nIn this blog post I want to talk about how we deployed our server-based Optical Character Recognition (OCR) and Named-entity (NE) demo for extracting information from Japanese receipts. 
I think it\u2019s a good demonstration of how to perform low-cost inference with multiple models, combining the best features of different software technologies.\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b9d61da elementor-widget elementor-widget-text-editor\" data-id=\"b9d61da\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nOur current OCR\/NE demo service comprises two separate processing pipelines which run in parallel and are essentially independent of each other. One pipeline is a \u201cconventional\u201d OCR\/NE cascade, and the other is a so-called end-to-end model which does all processing \u201cin one go\u201d.\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5149e47 elementor-widget elementor-widget-text-editor\" data-id=\"5149e47\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nIn our conventional OCR\/NE pipeline we have four deep-learning models, each of which was trained from scratch or fine-tuned using TensorFlow:\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4709915 elementor-widget elementor-widget-text-editor\" data-id=\"4709915\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol>\n \t<li class=\"text\">a model for detecting the corners of the central receipt in an image (in case multiple receipts are visible) and rectifying it to make it straight;<\/li>\n \t<li class=\"text\">a model for detecting the location of contiguous regions of 
text;<\/li>\n \t<li class=\"text\">an OCR model for recognizing the characters in each detected region of text; and<\/li>\n \t<li class=\"text\">an NE model, based on the BERT large language model (LLM), which takes all the text that was recognized in the image, sorts it from top-to-bottom and left-to-right, then classifies each character as belonging to one of several hundred different named entities of interest, such as address, date, shop name, and total amount.<\/li>\n<\/ol>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b74c2b4 elementor-widget elementor-widget-text-editor\" data-id=\"b74c2b4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nWhile I refer to this pipeline as \u201cconventional\u201d, because it splits the information extraction task across several modules, it actually contains two novel components: (1) the receipt detection and rectification module, which we wrote about in detail <a href=\"https:\/\/bestpathresearch.com\/en\/2023\/03\/10\/202303101234\/\" target=\"_blank\" rel=\"noopener\">here<\/a>, and (4) the NE model, which operates at the text level and ignores explicit position information about where the text occurred originally in the image.\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ca238a7 elementor-widget elementor-widget-text-editor\" data-id=\"ca238a7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nThe other pipeline, which runs in parallel, comprises an end-to-end model based on <a href=\"https:\/\/github.com\/clovaai\/donut\" rel=\"\" target=\"_blank\">Donut<\/a>, which is an encoder-decoder image transformer model that 
was pre-trained on a vast amount of synthetically generated textual image data, and which we fine-tuned using PyTorch to recognize the same set of named-entities as those used by our traditional pipeline. This approach is similar to the end-to-end receipt processing method that we described <a href=\"https:\/\/bestpathresearch.com\/en\/2022\/12\/13\/202212131011\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eaff1f2 elementor-widget elementor-widget-text-editor\" data-id=\"eaff1f2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nTo perform information extraction on actual receipts, we want to deploy this entire backend on the cheapest infrastructure available, since for the time being this system will only run as a demo. The demo should be responsive, but we don\u2019t want it running constantly and thus incurring unnecessary running costs when it is not actually being used. Moreover, we also want to retain the option to deploy the same software seamlessly to an easily scalable server infrastructure with the absolute minimum of changes in future.\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ca80edd elementor-widget elementor-widget-text-editor\" data-id=\"ca80edd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\n<a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/serverless-endpoints.html\" target=\"_blank\" rel=\"noopener\">AWS SageMaker Serverless<\/a> is perfect for this kind of use-case, since you are only charged for the total inference time that is incurred to process incoming requests. 
If there are no requests, then it doesn\u2019t cost anything. There is also an incentive to process each request as quickly as possible, which means optimizing each component model for speed. Moreover, when we are ready to deploy the system to run 24\/7, we can easily migrate the same Docker images that we build for AWS SageMaker Serverless to <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/realtime-endpoints.html\" target=\"_blank\" rel=\"noopener\">AWS SageMaker Real-time Inference<\/a> or <a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/async-inference.html\" target=\"_blank\" rel=\"noopener\">AWS SageMaker Asynchronous<\/a>.\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7ccf2ab elementor-widget elementor-widget-text-editor\" data-id=\"7ccf2ab\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nHaving selected AWS SageMaker as our deployment target, we then need to decide what inference software to run on it. There are a number of options that AWS makes available, including <a href=\"https:\/\/www.tensorflow.org\/tfx\/guide\/serving\" target=\"_blank\" rel=\"noopener\">TensorFlow Serving<\/a>, from Google, <a href=\"https:\/\/docs.openvino.ai\/latest\/ovms_what_is_openvino_model_server.html\" target=\"_blank\" rel=\"noopener\">OpenVINO Model Server<\/a>, from Intel, and <a href=\"https:\/\/developer.nvidia.com\/nvidia-triton-inference-server\" target=\"_blank\" rel=\"noopener\">Triton Inference Server<\/a>, from Nvidia. As far as we are aware, only Triton Inference Server gives us the option of running different models, trained or converted to run under different model inference frameworks, within a single server instance. 
So, for now, we chose to run Triton Inference Server with the OpenVINO, ONNX, PyTorch, TensorFlow and Python model plugins.\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-113feec elementor-widget elementor-widget-text-editor\" data-id=\"113feec\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nWe also developed a thin client in <a href=\"https:\/\/flutter.dev\/\" rel=\"\" target=\"_blank\">Flutter<\/a> that allows the user to take a photo of a receipt within the app, or select one that is already saved in the camera roll on their PC or mobile device, as shown in the screenshot below:\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d9df4cd elementor-widget elementor-widget-image\" data-id=\"d9df4cd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"450\" height=\"974\" src=\"https:\/\/bestpathresearch.com\/www\/wp-content\/uploads\/2023\/06\/OCR-1.jpg\" class=\"attachment-medium_large size-medium_large wp-image-2596\" alt=\"\" srcset=\"https:\/\/bestpathresearch.com\/www\/wp-content\/uploads\/2023\/06\/OCR-1.jpg 450w, https:\/\/bestpathresearch.com\/www\/wp-content\/uploads\/2023\/06\/OCR-1-139x300.jpg 139w\" sizes=\"(max-width: 450px) 100vw, 450px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6393338 elementor-widget elementor-widget-text-editor\" data-id=\"6393338\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nthen send 
that image to our AWS SageMaker endpoint, which processes the image and returns the results from the two processing pipelines; the app then displays these results, as shown in the following screenshot:\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c12888f elementor-widget elementor-widget-image\" data-id=\"c12888f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"450\" height=\"974\" src=\"https:\/\/bestpathresearch.com\/www\/wp-content\/uploads\/2023\/06\/OCR-2.jpg\" class=\"attachment-medium_large size-medium_large wp-image-2597\" alt=\"\" srcset=\"https:\/\/bestpathresearch.com\/www\/wp-content\/uploads\/2023\/06\/OCR-2.jpg 450w, https:\/\/bestpathresearch.com\/www\/wp-content\/uploads\/2023\/06\/OCR-2-139x300.jpg 139w\" sizes=\"(max-width: 450px) 100vw, 450px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0f79535 elementor-widget elementor-widget-text-editor\" data-id=\"0f79535\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nA high-level view of how all these different parts fit together is shown in the diagram below.\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9abee4a elementor-widget elementor-widget-image\" data-id=\"9abee4a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"640\" height=\"316\" src=\"https:\/\/bestpathresearch.com\/www\/wp-content\/uploads\/2023\/06\/OCR_3.jpg\" class=\"attachment-large size-large 
wp-image-2611\" alt=\"\" srcset=\"https:\/\/bestpathresearch.com\/www\/wp-content\/uploads\/2023\/06\/OCR_3.jpg 977w, https:\/\/bestpathresearch.com\/www\/wp-content\/uploads\/2023\/06\/OCR_3-300x148.jpg 300w, https:\/\/bestpathresearch.com\/www\/wp-content\/uploads\/2023\/06\/OCR_3-768x380.jpg 768w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-14b08d0 elementor-widget elementor-widget-text-editor\" data-id=\"14b08d0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nWe encountered numerous issues in getting this whole system up and running smoothly. One such issue was getting Triton Inference Server to run on AWS SageMaker Serverless, which by default disables shared memory for security reasons. We managed to solve this by modifying the source code, re-compiling and re-building the Docker image.\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-26fafd5 elementor-widget elementor-widget-text-editor\" data-id=\"26fafd5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nAnother issue was getting complex PyTorch models to run using the Triton Inference Server PyTorch plugin. Since the plugin uses the C++ API, rather than the Python API that is commonly used during model training, we first needed to separate the encoder and decoder components in Donut, trace each of them separately, and save them to disk. They can then be read in and called through the PyTorch plugin\u2019s C++ API, with one call to the encoder followed by a call to the decoder on the output of the encoder. 
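The tracing and saving step can be sketched as follows. This is a minimal, hypothetical sketch: the tiny Encoder and Decoder modules below stand in for the real fine-tuned Donut components, and the file names are purely illustrative.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the Donut encoder and decoder; in practice
# these would come from the fine-tuned Donut checkpoint.
class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(32, 16)

    def forward(self, pixels):        # pixels: (batch, 32)
        return self.proj(pixels)      # "image features": (batch, 16)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(16, 8)

    def forward(self, features):      # features from the encoder
        return self.head(features)    # "token logits": (batch, 8)

encoder, decoder = Encoder().eval(), Decoder().eval()

# Trace each half separately with an example input, then save as
# TorchScript so a C++ (libtorch) runtime can load them without Python.
example_pixels = torch.randn(1, 32)
traced_enc = torch.jit.trace(encoder, example_pixels)
traced_dec = torch.jit.trace(decoder, encoder(example_pixels))
traced_enc.save("encoder.pt")
traced_dec.save("decoder.pt")

# At inference time: one call to the encoder, then one call to the
# decoder with the encoder's output.
feats = torch.jit.load("encoder.pt")(example_pixels)
logits = torch.jit.load("decoder.pt")(feats)
```

The same two-call pattern is then reproduced on the server side through the plugin's C++ API.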
\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-78aa4c2 elementor-widget elementor-widget-text-editor\" data-id=\"78aa4c2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nThe Donut encoder and decoder models are both quite large, at around 500 MB each, and the NE model is even larger at 800 MB. While we were able to successfully quantize the Donut encoder and decoder, we were not able to perform inference using these quantized models; this is something we are still working to solve. However, we were able to reduce the size of the NE model (recall that it is a BERT encoder-style LLM) by a factor of 4 after converting it to ONNX and applying quantization. We then converted this quantized ONNX model to the OpenVINO format for use in the final system. \n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f7172b8 elementor-widget elementor-widget-text-editor\" data-id=\"f7172b8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nInference speeds using the final system are approximately 1 second for our conventional pipeline and around 10 seconds for the end-to-end Donut pipeline. All models run simultaneously on a single AWS SageMaker Serverless instance with 6 GB of RAM. Cold-start times (i.e. the time taken to copy model files from S3, boot up the serverless instance, and have it respond to requests) are approximately 45 seconds. This means that an initial request exceeds the maximum timeout of 30 seconds, but subsequent requests are almost instantaneous. We continue to work on reducing the cold-start time to avoid the initial timeout. 
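Until then, a client can absorb the cold start simply by retrying after the initial timeout, since the instance is usually warm by the second attempt. A minimal sketch of this workaround; `fake_endpoint` is an illustrative stand-in for a real call to the SageMaker runtime's `invoke_endpoint` API:

```python
import time

def invoke_with_retry(invoke, payload, retries=3, delay=5.0):
    """Call invoke(payload), retrying on TimeoutError.

    The first request to a scaled-to-zero serverless endpoint may time
    out while the instance cold-starts; by the time we retry, the
    instance is usually warm and responds almost immediately.
    """
    for attempt in range(retries):
        try:
            return invoke(payload)
        except TimeoutError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# Example: a fake endpoint that times out once (cold start), then succeeds.
calls = {"n": 0}
def fake_endpoint(payload):
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("cold start exceeded 30 s")
    return {"entities": {"total": "1,234"}}

result = invoke_with_retry(fake_endpoint, b"image-bytes", delay=0.0)
# result holds the extracted entities from the second (warm) attempt.
```

This keeps the demo responsive without paying to keep the instance warm around the clock.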
\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fc0c2ce elementor-widget elementor-widget-text-editor\" data-id=\"fc0c2ce\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nFuture work will look at swapping out Triton Inference Server for OpenVINO Model Server, to take advantage of its software optimizations for running on Intel hardware. In particular, we have high expectations for the latest OpenVINO 2023.0 release, which includes model quantization and pruning, reduced model load times and faster inference speeds. This will require us to convert all our existing TensorFlow and PyTorch models to the OpenVINO model format, which we plan to write more about in the future, so please do check back here then.\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e085f35 elementor-widget elementor-widget-text-editor\" data-id=\"e085f35\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<span class=\"text\">\nKeywords: TensorFlow, PyTorch, ONNX, OpenVINO, Triton Inference Server, AWS SageMaker, Serverless, Real-time Inference Server, client-server, OCR, NE, Japanese receipts\n<\/span>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>In this b 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":1912,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_locale":"en_US","_original_post":"https:\/\/bestpathresearch.com\/www\/?p=2529","footnotes":""},"categories":[26],"tags":[],"class_list":["post-2601","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blogs_en","en-US"],"_links":{"self":[{"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/posts\/2601","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/comments?post=2601"}],"version-history":[{"count":22,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/posts\/2601\/revisions"}],"predecessor-version":[{"id":2642,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/posts\/2601\/revisions\/2642"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/media\/1912"}],"wp:attachment":[{"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/media?parent=2601"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/categories?post=2601"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/tags?post=2601"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}