{"id":2746,"date":"2024-03-09T13:00:09","date_gmt":"2024-03-09T04:00:09","guid":{"rendered":"https:\/\/bestpathresearch.com\/?p=2746"},"modified":"2025-07-08T10:31:41","modified_gmt":"2025-07-08T01:31:41","slug":"large-language-models-for-named-entity-extraction-and-spelling-correction","status":"publish","type":"post","link":"https:\/\/bestpathresearch.com\/en\/2024\/03\/09\/large-language-models-for-named-entity-extraction-and-spelling-correction\/","title":{"rendered":"Publication of the paper \u201cLarge Language Models for Named Entity Extraction and Spelling Correction\u201d"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"2746\" class=\"elementor elementor-2746\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-2aabef3f elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"2aabef3f\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-57154198\" data-id=\"57154198\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-686f8cb2 elementor-widget elementor-widget-text-editor\" data-id=\"686f8cb2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"text\">\nAs the field of AI\/ML makes revolutionary strides, Best Path Research has just published a paper on arXiv that adds another contribution to the field: <p><\/p>\n<p><a href=\"https:\/\/arxiv.org\/abs\/2403.00528\" 
target=\"_blank\">https:\/\/arxiv.org\/abs\/2403.00528<\/a><br><\/p>\n<div class=\"text\">\nWhile a large part of the recent advances lies in the user experience, the perceived \u201cintelligence\u201d of Large Language Models (LLMs) is often due simply to using more training data and training larger models. However, we would like to emphasize that just making things \u201cbigger\u201d is not the only way to make a model better. To advance the field and build systems that are more usable and interpretable, we need a combination of better algorithms, data and modularity. This is where Best Path Research has deep knowledge and experience.<p><\/p>\n<div class=\"text\">\nIn our latest work, Best Path Research compared the performance of eight open-source \u201cgenerative\u201d LLMs with two existing state-of-the-art BERT language models on the task of extracting Named Entities (NEs), such as shop names, addresses and product names, from Japanese shop receipts.<p><\/p>\n<div class=\"text\">\nBased on decades of experience with speech and handwriting recognition, machine learning, and natural language processing, our method combines known and proven language modelling algorithms with LLMs in an effective way. One such integrated approach is Question Answering using Language Models, which our CEO first developed and wrote about while he was a post-doc at Philips and Tokyo Institute of Technology in the early 2000s. At that time the method was used to answer factoid questions in the&nbsp;<a href=\"https:\/\/trec.nist.gov\/\" target=\"_blank\">Text REtrieval Conference (TREC)<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/trec.nist.gov\/data\/qa.html\" target=\"_blank\">Question Answering Competitions<\/a>.<p><\/p>\n<div class=\"text\">\nIn our proposed approach, text is first extracted from scanned images of paper shop receipts using Best Path Research&#8217;s proprietary and productized Optical Character Recognition (OCR) system (&lt;1% character error rate). 
The text is then fed to a fine-tuned language model (either an LLM or BERT, in our work), which answers a question about the NE category that we want to extract from the text, such as the shop name, address or a product name in the original receipt.<p><\/p>\n<div class=\"text\">\nWe show that the best LLMs perform as well as, or better than, the existing state-of-the-art methods, achieving 100% precision and recall on some numerical NE categories, while also demonstrating additional benefits, such as being able to correct some of the original OCR errors.<p><\/p>\n<div class=\"text\">\nWhile our research demonstrates the effectiveness of our approach using LLMs on OCR&#8217;d text from Japanese shop receipts, the method is equally applicable to any language and any scenario that requires the extraction of Named Entities from text documents, e.g., counterparty names, addresses and payment conditions from legal contracts, or revenue changes and market conditions from financial reports.<p><\/p>\n<div class=\"text\">\nGet in touch with us to discuss where we can assist you with your digitization and document extraction requirements.<p><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>As the fi 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_locale":"en_US","_original_post":"https:\/\/bestpathresearch.com\/?p=2718","footnotes":""},"categories":[27],"tags":[],"class_list":["post-2746","post","type-post","status-publish","format-standard","hentry","category-news_en","en-US"],"_links":{"self":[{"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/posts\/2746","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/comments?post=2746"}],"version-history":[{"count":8,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/posts\/2746\/revisions"}],"predecessor-version":[{"id":2754,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/posts\/2746\/revisions\/2754"}],"wp:attachment":[{"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/media?parent=2746"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/categories?post=2746"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bestpathresearch.com\/wp-json\/wp\/v2\/tags?post=2746"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}