
TL;DR

A combined pre-training approach over domain-specific and general corpora. The paper describes the dataset, model configuration, and training procedure for BloombergGPT.


Suggestions

BloombergGPT is a 50B-parameter decoder-only causal LM for finance, based on BLOOM.

  • It claims the largest domain-specific dataset yet (built from Bloomberg's extensive data sources), with 363B tokens, augmented with 345B tokens from general corpora.
  • The model outperforms existing general LMs on financial tasks without sacrificing performance on general NLP benchmarks.
    1. Pre-trains on large domain-specific and general corpora together, then uses the resulting PLM via in-context learning (rather than fine-tuning) #pic1
    2. Follows the Chinchilla scaling law in choosing 50B parameters #pic2
    3. Outperforms general-purpose LLMs on finance-specific tasks while still performing well on generic NLP tasks #pic3, 4 (only BBH)
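As a back-of-the-envelope check on point 2, the Chinchilla result is often summarized as a rule of thumb of roughly 20 training tokens per model parameter. A minimal sketch using the token counts from the summary above (the helper function name is my own, not from the paper):

```python
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal training token count for a model
    of n_params parameters, per the ~20 tokens/param rule of thumb."""
    return n_params * tokens_per_param

n_params = 50e9                 # BloombergGPT: 50B parameters
available = 363e9 + 345e9       # domain-specific + general corpus (~708B tokens)

optimal = chinchilla_optimal_tokens(n_params)
print(f"Chinchilla-optimal tokens for 50B params: {optimal / 1e9:.0f}B")
print(f"Available corpus: {available / 1e9:.0f}B tokens")
```

The available ~708B tokens come in somewhat under the ~1T the rule of thumb suggests for 50B parameters, consistent with the paper using the scaling law as guidance within a fixed data and compute budget rather than as an exact target.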

Personal note. Might the conventional adaptation approach (1st pre-training on generic corpora → 2nd pre-training on domain-specific corpora) actually work better? A comparison would be worthwhile.

comment. Didn't the paper compare joint pre-training on (general + domain-specific) data against the sequential approach (1. general → 2. domain-specific)?

๋„ค ์™œ๋•Œ๋ฌธ์ธ์ง€ ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ (์•„๋งˆ ๋น„์šฉ? ์•„๋‹ˆ๋ฉด ๊ทธ๋Ÿฐ ํˆฌ์Šคํ… ๋ฐฉ์‹์ด encoder-only์˜ BERT์—์„œ ์œ ํšจํ–ˆ์–ด์„œ? ์ธ์ง€๋Š” ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ) ๋ชป์ฐพ์•˜์Šต๋‹ˆ๋‹ค. ๋ญ”๊ฐ€ ๋น„๊ต๋„ ๋ญ PaLM ์ด๋Ÿฐ๊ฑฐ๋ž‘์€ ์•ˆํ•ด์„œ ์•ฝ๊ฐ„ ์•„์‰ฝ๋„ค์š”