DocLLM: A layout-aware generative language model for multimodal document understanding

January 23, 2024

Meta info.

TL;DR

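Inspired by multimodal LLMs, DocLLM has the language model take both text and positional information (where each piece of text sits within a structured document) as input, in order to solve the problem of understanding documents with internal structure.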
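As a rough illustration of what "text plus positional information as input" could look like, the sketch below pairs each word with a bounding box normalized by page size. This is not the paper's code: `LayoutToken`, `normalize_bbox`, and `build_layout_input` are hypothetical names, and the `(x0, y0, x1, y1)` box format is an assumption about how OCR output might be represented.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1); assumed format


@dataclass
class LayoutToken:
    """A word together with its position on the page (hypothetical type)."""
    text: str
    bbox: BBox  # coordinates normalized to the range [0, 1]


def normalize_bbox(bbox: BBox, page_width: float, page_height: float) -> BBox:
    """Scale absolute page coordinates so they are independent of page size."""
    x0, y0, x1, y1 = bbox
    return (x0 / page_width, y0 / page_height, x1 / page_width, y1 / page_height)


def build_layout_input(
    words: Sequence[str],
    boxes: Sequence[BBox],
    page_width: float,
    page_height: float,
) -> List[LayoutToken]:
    """Pair each word with its normalized box, keeping reading order."""
    if len(words) != len(boxes):
        raise ValueError("words and boxes must have the same length")
    return [
        LayoutToken(word, normalize_bbox(box, page_width, page_height))
        for word, box in zip(words, boxes)
    ]


# Example: two words from a 200 x 500 page.
tokens = build_layout_input(
    ["Invoice", "Total"],
    [(0, 0, 100, 50), (100, 200, 200, 250)],
    page_width=200,
    page_height=500,
)
```

A model that consumes such pairs sees where each word sits as well as what it says, which is the extra signal the TL;DR refers to.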
