less than 1 minute read

Meta info.

TL; DR

Causal decoder-only models trained on an autoregressive language modeling objective(standard FLM objective) exhibit the strongest zero-shot generalization w/o any finetuning or adaptation. However, after multitask finetuning, encoder-decoder model pretrained with MLM objective significantly better.

Untitled

Untitled

Untitled

Suggestions

a study about understanding the interplay between architecture, objective, multi-task fine-tuning, and zero-shot generalization.

  1. A large-scale systematic study of architecture + pretraining objective combination focusing on zero-shot generalization.
  2. Multi-task finetuning impacts architecture + pretraining objective combination. (TL; DR)
  3. Decoder-only models can be efficiently adapted from one architecture/objective prior to the other.

Personal note. 오늘 발표했던 T0논문 관련 후속연구로, 현재 진행되는 연구들이 취하고 있는 주류 관행이 맞는 접근이라는 걸 확인한 논문입니다.