DocLLM: A layout-aware generative language model for multimodal document understanding
Meta info.
- Authors: Dongsheng Wang, Natraj Raman, Mathieu Sibue et al.
- Paper: https://arxiv.org/pdf/2401.00908.pdf
- Affiliation: JP Morgan AI Research
TL;DR
Inspired by multi-modal LLMs: the LM takes the positional information of text (more precisely, its position within the document) as an input, in order to address the problem of understanding a document's internal structure.
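As a rough illustration of what "text plus its position in the document" could look like as model input, here is a minimal sketch. It assumes positions are given as OCR-style bounding boxes `(x0, y0, x1, y1)` in page pixels and normalizes them to the page size. `LayoutToken`, `normalize_bbox`, and `build_layout_input` are hypothetical names used only for this example; they are not taken from the paper or any released code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class LayoutToken:
    """A text token paired with its (normalized) position on the page."""
    text: str
    bbox: BBox  # coordinates scaled to [0, 1] relative to the page size


def normalize_bbox(bbox: BBox, page_width: float, page_height: float) -> BBox:
    """Scale a pixel-space bounding box to page-relative coordinates."""
    x0, y0, x1, y1 = bbox
    return (x0 / page_width, y0 / page_height, x1 / page_width, y1 / page_height)


def build_layout_input(
    words: Sequence[str],
    boxes: Sequence[BBox],
    page_width: float,
    page_height: float,
) -> List[LayoutToken]:
    """Pair each word with its normalized bounding box.

    Assumption: one bounding box per word, as an OCR engine would typically emit.
    """
    if len(words) != len(boxes):
        raise ValueError("words and boxes must have the same length")
    return [
        LayoutToken(word, normalize_bbox(box, page_width, page_height))
        for word, box in zip(words, boxes)
    ]


# Usage: two words from a hypothetical 1000x1000 pixel page.
tokens = build_layout_input(
    ["Invoice", "Total"],
    [(100, 50, 300, 100), (100, 700, 200, 750)],
    page_width=1000,
    page_height=1000,
)
```

A layout-aware model would consume both fields of each `LayoutToken`; how the position is embedded and combined with the text is up to the model and is not shown here.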
