
TL;DR

A combined pre-training approach over domain-specific and general corpora. The paper describes the dataset, model configuration, and training procedure for BloombergGPT.


Suggestions

BloombergGPT is a 50B-parameter decoder-only causal LM for finance, based on BLOOM.

  • It claims the largest domain-specific dataset yet (built from Bloomberg's extensive data sources), with 363B tokens, augmented with 345B tokens from general corpora.
  • The model outperforms existing general LMs on financial tasks without sacrificing performance on general NLP benchmarks.
    1. Pre-trains on large domain-specific and general corpora together, then uses the resulting PLM via in-context learning (rather than fine-tuning) #pic1
    2. Follows the Chinchilla scaling law in choosing 50B parameters #pic2
    3. Outperforms general-purpose LLMs on finance-specific tasks while still performing well on generic NLP tasks #pic3, 4 (only BBH)
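As a back-of-the-envelope check on point 2, the Chinchilla result is often summarized as a rule of thumb of roughly 20 training tokens per model parameter. A minimal sketch using the token counts from the summary above (the helper function name is my own, not from the paper):

```python
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal training token count for a model
    of n_params parameters, per the ~20 tokens/param rule of thumb."""
    return n_params * tokens_per_param

n_params = 50e9                 # BloombergGPT: 50B parameters
available = 363e9 + 345e9       # domain-specific + general corpus (~708B tokens)

optimal = chinchilla_optimal_tokens(n_params)
print(f"Chinchilla-optimal tokens for 50B params: {optimal / 1e9:.0f}B")
print(f"Available corpus: {available / 1e9:.0f}B tokens")
```

The available ~708B tokens come in somewhat under the ~1T the rule of thumb suggests for 50B parameters, consistent with the paper using the scaling law as guidance within a fixed data and compute budget rather than as an exact target.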

Personal note. Might the conventional adaptation approach (1st pre-training on generic corpora → 2nd pre-training on domain-specific corpora) actually work better? A comparison would be worthwhile.

comment. Didn't the paper compare joint pre-training on (general + domain-specific) data against the sequential approach (1. general → 2. domain-specific)?

๋„ค ์™œ๋•Œ๋ฌธ์ธ์ง€ ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ (์•„๋งˆ ๋น„์šฉ? ์•„๋‹ˆ๋ฉด ๊ทธ๋Ÿฐ ํˆฌ์Šคํ… ๋ฐฉ์‹์ด encoder-only์˜ BERT์—์„œ ์œ ํšจํ–ˆ์–ด์„œ? ์ธ์ง€๋Š” ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ) ๋ชป์ฐพ์•˜์Šต๋‹ˆ๋‹ค. ๋ญ”๊ฐ€ ๋น„๊ต๋„ ๋ญ PaLM ์ด๋Ÿฐ๊ฑฐ๋ž‘์€ ์•ˆํ•ด์„œ ์•ฝ๊ฐ„ ์•„์‰ฝ๋„ค์š”