Meta info.
  • Authors: Alexey Gorbatovski, Boris Shaposhnikov, Viacheslav Sinii, Alexey Malakhov, Daniil Gavrilov
  • Paper: https://arxiv.org/pdf/2502.01237
  • Affiliation: T-Tech
  • Published: February 3, 2025

TL;DR

Analyzes the structural differences among Direct Alignment Algorithms (DAAs) and suggests that DPO-level performance is achievable even without RL.

Background

DAAs have emerged as a methodology for LLM alignment.

- DAA: updates the policy directly, without RL or a reward model (RM) (usually SFT)
- Classification of the main alignment algorithms

| **Method** | **Loss Function** | **Loss computation** | **SFT stage needed** |
| --- | --- | --- | --- |
| **DPO** (Direct Preference Optimization) | Likelihood Ratio | Pairwise | Two-Stage |
| **IPO** (Identity Preference Optimization) | Likelihood Ratio | Pairwise | Two-Stage |
| **SimPO** (Simple Preference Optimization) | Likelihood Ratio | Pairwise | Two-Stage |
| **ORPO** (Odds Ratio Preference Optimization) | Odds Ratio | Pairwise | One-Stage |
| **ASFT** (Aligned Supervised Fine-Tuning) | Odds Ratio | Pointwise | One-Stage |
| **NCA** (Noise Contrastive Alignment) | Likelihood Ratio | Pointwise | Two-Stage |
| **Cal-DPO** (Calibrated DPO) | Likelihood Ratio | Pairwise | Two-Stage |
| **APO-Zero** (Anchored Preference Optimization Zero) | Likelihood Ratio | Pointwise | Two-Stage |
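
The "Loss Function" column separates two kinds of implicit reward. As a rough illustration, the sketch below computes both from sequence log-probabilities. The function names are hypothetical, the `beta` scaling on the odds-ratio reward is an assumption based on the β question discussed in these notes, and `logp_policy` is assumed to be the log of a sequence probability strictly between 0 and 1. This is not the paper's code.

```python
import math


def likelihood_ratio_reward(logp_policy: float, logp_ref: float, beta: float = 1.0) -> float:
    """DPO-style reward: beta * log(pi_theta(y|x) / pi_ref(y|x)).

    Both arguments are log-probabilities, so the log-ratio is a difference.
    """
    return beta * (logp_policy - logp_ref)


def odds_ratio_reward(logp_policy: float, beta: float = 1.0) -> float:
    """ORPO-style reward: beta * log(p / (1 - p)) with p = exp(logp_policy).

    No reference policy is involved; the reward depends only on the policy's
    own probability of the response (assumed to satisfy 0 < p < 1).
    """
    p = math.exp(logp_policy)
    return beta * (logp_policy - math.log1p(-p))


# Hypothetical numbers: the policy assigns p = 0.5, the reference p = 0.25.
lr = likelihood_ratio_reward(math.log(0.5), math.log(0.25))
orr = odds_ratio_reward(math.log(0.5))
print(f"likelihood-ratio reward: {lr:.4f}, odds-ratio reward: {orr:.4f}")
```

The likelihood-ratio reward needs a frozen reference model, while the odds-ratio reward does not.
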
- loss computation: pair-wise vs. point-wise
    - pair-wise: compares two responses and trains the model to prefer one of them.
    - point-wise: trains by adjusting the score of each individual response.
- reward function: likelihood ratio (DPO, IPO, SimPO, NCA, Cal-DPO, APO-Zero) vs. odds ratio ($L^{\beta}_{\mathrm{ASFTAlign}}$, $L^{\beta}_{\mathrm{ORPOAlign}}$, i.e. `ASFT` and `ORPO`)
    - `ORPO`: Odds Ratio PO
    - `ASFT`: Aligned SFT
- whether an SFT stage is needed before alignment: 1-stage (ASFT, ORPO) vs. 2-stage (DPO, IPO, SimPO)
- Research Questions & Results:
- #1 Does adding an SFT stage to the one-stage methods (ORPO, ASFT) improve performance? `YES`
    - ORPO even reaches **DPO-level** performance. `table 1`
- #2 Does something like β, a tempering factor, improve the alignment performance of ASFT and ORPO? `YES`
    - β (alignment level? it controls the strength): values that are too large or too small hurt performance, so it has to be tuned to a suitable value. `Figure 1`
- #3 What improves alignment performance in DAAs? (What is the key factor?) `Pair-wise > Point-wise`
    - Pairwise methods (DPO, IPO, ORPO, SimPO) outperform point-wise methods (NCA, ASFT). `Figure 3`
- #4 How does the amount of data used in SFT affect alignment quality? `A small amount is enough`
    - Using only 5~10% of the SFT data already improves alignment performance substantially (reportedly about the same as using all of it). `Figure 5`
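
To make the pair-wise vs. point-wise distinction and the role of β concrete, here is a minimal sketch on scalar rewards. The loss forms are assumptions chosen for illustration (a DPO-style log-sigmoid of the reward margin, and a point-wise variant that scores each response on its own). The function names are hypothetical and the paper's exact objectives may differ.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pairwise_loss(r_chosen: float, r_rejected: float, beta: float = 1.0) -> float:
    """Pair-wise: only the margin between the two responses matters."""
    return -log_sigmoid(beta * (r_chosen - r_rejected))


def pointwise_loss(r_chosen: float, r_rejected: float, beta: float = 1.0) -> float:
    """Point-wise: each response's reward is pushed up or down independently."""
    return -log_sigmoid(beta * r_chosen) - log_sigmoid(-beta * r_rejected)


# Same margin (1.0) at two different absolute reward levels.
for r_w, r_l in [(2.0, 1.0), (5.0, 4.0)]:
    print(r_w, r_l, pairwise_loss(r_w, r_l), pointwise_loss(r_w, r_l))

# beta rescales the rewards before the sigmoid, i.e. how sharply the loss reacts.
for beta in (0.1, 1.0, 10.0):
    print(beta, pairwise_loss(2.0, 1.0, beta=beta))
```

With the same margin, the pair-wise loss is identical at both reward levels, while the point-wise loss changes because it looks at each reward separately. Sweeping `beta` shows how the scale of the rewards changes the loss, which is the knob the β question above is about.
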

Personal note. The table in pic 4 is my own reorganization of the main PO algorithms mentioned in the paper along the three classification axes the paper proposes. I did not review it carefully, so it may not be accurate, but it was useful for organizing the latest PO algorithms and checking how the field is moving. If I had read this before submitting the research-content section of the proposal I have been finishing up recently, I might have been able to mention a few more points..