A pre-trained multimodal Transformer model for document AI, pre-trained with unified text and image masking and suited to tasks such as form understanding, receipt processing, and document layout analysis.
LayoutLMv3 is a pre-trained multimodal Transformer for document AI. During pre-training it uses unified text and image masking to learn joint representations of a document's text, layout, and image. Its unified architecture and training objectives make it a general-purpose model that can be fine-tuned for both text-centric and image-centric tasks, including form and receipt understanding, document visual question answering, document layout analysis, and document image classification. As the successor to LayoutLM and LayoutLMv2, LayoutLMv3 is well suited to automated document processing and AI-driven document workflows that build on OCR output.
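To make "unified text and image masking" more concrete, here is a simplified sketch of how the masked positions could be chosen for the two modalities. It assumes the settings reported in the LayoutLMv3 paper: about 30% of text tokens masked in spans with lengths drawn from a Poisson distribution (λ = 3), and about 40% of image patches masked in rectangular blocks. The function names and the simplified block sampler are hypothetical and illustrative only; this is not the reference implementation. It only selects positions and does not cover the reconstruction targets or the paper's additional word-patch alignment objective.

```python
import math
import random
from typing import Set, Tuple


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def span_mask_text(num_tokens: int, mask_ratio: float = 0.3,
                   lam: float = 3.0, seed: int = 0) -> Set[int]:
    """Hypothetical helper: pick text token indices to mask in contiguous spans."""
    rng = random.Random(seed)
    target = int(round(num_tokens * mask_ratio))
    masked: Set[int] = set()
    while len(masked) < target:
        # Span length ~ Poisson(lam); zero-length draws are bumped to 1.
        span = max(1, sample_poisson(lam, rng))
        start = rng.randrange(num_tokens)
        for i in range(start, min(start + span, num_tokens)):
            if len(masked) >= target:
                break
            masked.add(i)
    return masked


def blockwise_mask_image(grid_h: int, grid_w: int, mask_ratio: float = 0.4,
                         seed: int = 0) -> Set[Tuple[int, int]]:
    """Hypothetical helper: pick (row, col) image patches to mask in rectangular blocks.

    Simplified: block height and width are sampled independently, whereas
    BEiT-style blockwise masking samples a block area and aspect ratio.
    """
    rng = random.Random(seed)
    target = int(round(grid_h * grid_w * mask_ratio))
    masked: Set[Tuple[int, int]] = set()
    while len(masked) < target:
        bh = rng.randint(1, max(1, grid_h // 2))
        bw = rng.randint(1, max(1, grid_w // 2))
        top = rng.randrange(grid_h - bh + 1)
        left = rng.randrange(grid_w - bw + 1)
        for r in range(top, top + bh):
            for c in range(left, left + bw):
                if len(masked) < target:
                    masked.add((r, c))
    return masked


if __name__ == "__main__":
    # e.g. 50 OCR tokens and a 224x224 page image split into 16x16 patches (14x14 grid)
    text_mask = span_mask_text(num_tokens=50)
    image_mask = blockwise_mask_image(grid_h=14, grid_w=14)
    print(len(text_mask), len(image_mask))  # 15 78
```

Masking contiguous spans and blocks, rather than isolated positions, means the model cannot recover a masked item simply by copying from its immediate neighbours. That is the intuition behind both strategies.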
Attribution-Non-Commercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou
Transformers
Other
Open
Sector Agnostic
20/08/25 11:44:40
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.