BloombergGPT: A Large Language Model for Finance
Meta info.
- Authors: Wu, Irsoy, Lu, et al.
- Paper: https://arxiv.org/abs/2303.17564v1
- Affiliation: Bloomberg
TL;DR
A combined pre-training approach over domain-specific and general-purpose corpora. The paper describes the dataset, model configuration, and training procedure for BloombergGPT.

Suggestions
BloombergGPT is a 50B-parameter decoder-only causal LM for finance, based on the BLOOM architecture.
- It claims the largest domain-specific financial dataset yet (FinPile, built from Bloomberg's extensive data sources) with 363B tokens, augmented with 345B tokens from general corpora.
- The model outperforms existing general-purpose LMs on financial tasks without sacrificing performance on general NLP benchmarks.
- pre-trains on the large domain-specific and general corpora together, then uses the resulting PLM via few-shot in-context learning rather than fine-tuning (see the sketch after this list) #pic1
- adopts the Chinchilla scaling laws to size the model at ~50B parameters for its compute budget (a back-of-the-envelope version follows the list) #pic2
- outperforms general-purpose LLMs on finance-specific tasks while still performing well on general NLP tasks #pic3, #pic4 (BBH only)
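
A minimal sketch of the "use the PLM via in-context learning, not fine-tuning" point above. BloombergGPT itself is not publicly released, so the checkpoint name (`bigscience/bloom-560m`, chosen only because it shares the BLOOM architecture family) and the financial-sentiment prompt are stand-in assumptions, not the paper's exact setup:

```python
# Few-shot in-context learning with a pre-trained causal LM: the task is
# "taught" entirely through demonstrations in the prompt; no weights change.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoint: BloombergGPT is private, but it is BLOOM-style.
model_name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative 2-shot prompt for financial sentiment (not from the paper).
prompt = (
    "Headline: Company X shares plunge after earnings miss. Sentiment: negative\n"
    "Headline: Company Y raises full-year guidance. Sentiment: positive\n"
    "Headline: Company Z announces record quarterly revenue. Sentiment:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=3, do_sample=False)

# Decode only the newly generated tokens after the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
```

The paper's few-shot evaluations follow this same pattern; only the number of demonstrations and the answer format change per task.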
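
And a back-of-the-envelope version of the Chinchilla sizing in the bullet above. The 1.3M A100 GPU-hour budget is from the paper; the budget hold-back and hardware-utilization figures are rough assumptions of mine, and C ≈ 6·N·D with D_opt ≈ 20·N_opt is the standard Chinchilla rule of thumb, not the paper's exact fit:

```python
# Chinchilla rule of thumb: training compute C ≈ 6 * N * D (FLOPs), and the
# compute-optimal token count is D_opt ≈ 20 * N_opt, so N_opt = sqrt(C / 120).

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    n_opt = (compute_flops / 120) ** 0.5  # parameters
    d_opt = 20 * n_opt                    # training tokens
    return n_opt, d_opt

GPU_HOURS = 1.3e6        # paper's total A100 40GB budget
BUDGET_FRACTION = 0.7    # assumption: ~30% held back for restarts/failures
PEAK_FLOPS = 312e12      # A100 BF16 peak throughput
UTILIZATION = 0.3        # assumption: realistic hardware utilization

compute = GPU_HOURS * 3600 * BUDGET_FRACTION * PEAK_FLOPS * UTILIZATION
n, d = chinchilla_optimal(compute)
print(f"~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")  # roughly 50B / ~1T
```

This lands near the paper's ~50B-parameter choice; the actual training run stops at 569B tokens, since available high-quality data (not compute) was the binding constraint.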
Personal note. Wouldn't the conventional adaptation recipe (1st pre-training on generic corpora → 2nd pre-training on domain-specific corpora) work even better than this joint approach? A comparison would be worthwhile.
comment. Does the paper compare pre-training on (general + domain-specific) data all at once against doing it sequentially (1. general → 2. domain-specific)?
Unfortunately, no; I couldn't find such a comparison. I'm not sure why (maybe cost? or maybe because that kind of continued pre-training recipe had mainly been validated on encoder-only BERT-style models?). It's also a bit disappointing that the comparisons are only against models like PaLM.