An improved multimodal Transformer model for document AI that integrates text, layout, and image information during pre-training, improving document understanding tasks such as form recognition, document classification, and visual question answering.
LayoutLMv2 is an improved version of the LayoutLM model, designed for document AI applications: it integrates text, layout, and image information into a unified multimodal framework. Its pre-training tasks model the interactions among these modalities, yielding stronger performance on visually rich documents. The model achieves state-of-the-art results on benchmark tasks including form recognition, receipt and invoice understanding, and document-based question answering. Because it combines OCR output with document layout, LayoutLMv2 is widely applicable to automated document processing, financial data extraction, and AI-driven document analysis.
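A minimal usage sketch with the Hugging Face Transformers implementation, assuming the public microsoft/layoutlmv2-base-uncased checkpoint is available; the image path and label count are placeholders, and the classification head loaded here is untrained, so real use requires fine-tuning on a labeled document dataset. LayoutLMv2 additionally requires detectron2, torchvision, and pytesseract to be installed.

# Sketch: document classification with LayoutLMv2 via Hugging Face Transformers.
# Assumes the "microsoft/layoutlmv2-base-uncased" checkpoint; the file path and
# number of labels below are hypothetical placeholders.
from PIL import Image
import torch
from transformers import LayoutLMv2Processor, LayoutLMv2ForSequenceClassification

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForSequenceClassification.from_pretrained(
    "microsoft/layoutlmv2-base-uncased",
    num_labels=2,  # e.g. invoice vs. non-invoice; head is randomly initialized
)

# Load a scanned page; the processor runs OCR internally (via pytesseract) and
# builds the token, bounding-box, and image inputs the model expects.
image = Image.open("document.png").convert("RGB")
encoding = processor(image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoding)

# Pick the highest-scoring class; meaningful only after fine-tuning.
predicted_class = outputs.logits.argmax(-1).item()
print(predicted_class)

The same processor output feeds the other task heads (LayoutLMv2ForTokenClassification for form and receipt field extraction, LayoutLMv2ForQuestionAnswering for document visual question answering), so switching tasks mainly means swapping the model class and labels.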
Attribution-Non-Commercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou
Transformers
Other
Open
Sector Agnostic
20/08/25 11:45:33