Meta info.
  • Authors: Yunjia Qi, Hao Peng, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li
  • Paper: https://arxiv.org/pdf/2410.24175
  • Affiliation: Tsinghua Univ.
  • Published: October 31, 2024

TL;DR

Regenerating the constraints (backtranslation) makes the model follow constraints better.

Problem Statement

LLMs perform poorly when the constraints are numerous and complex (and also when they are implicit).

Suggestions

Generate the output by backtranslating the constraints.

  • Building the CRAB dataset: 13.5K examples, with 7 constraints each on average
    • Starts from the instruction + response pairs of existing datasets (Alpaca, Evol-Instruct, … )
    • Specific conditions that the LLM's response already satisfies are set as additional constraints (this also cuts cost)
      • constraints: of the 19 constraint types the authors identified, lightweight ones such as length, keywords, and punctuation are attached with Python, and the rest with Llama3-70B-Instruct.
      • 6 to 8 constraints are added to each instruction
      • 1 to 3 demonstrations are added to only 50% of the examples
  • after-training: in the usual instruction-tuning style, Meta-Llama-3-8B and Mistral-7B-v0.3 are further trained to take the complex constraints as input and to output the response together with the constraints (Loss = pre-training loss + after-training loss)
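
To make the data-construction step above concrete, here is a minimal sketch of the rule-based part only: it reads an existing response, derives a few constraints that the response already satisfies, and appends them to the instruction. Everything in it is an assumption for illustration. The function names (`backtranslate_rule_constraints`, `build_example`), the three specific rules, and the prompt layout are hypothetical and are not the paper's code. The Llama3-70B-Instruct step and the demonstrations are left out.

```python
import random
import re


def backtranslate_rule_constraints(response: str) -> list[str]:
    """Derive constraints that `response` already satisfies (illustrative rules only)."""
    constraints = []

    # Length: round the word count up to the next multiple of 10 and use it as a cap.
    n_words = len(response.split())
    limit = ((n_words + 9) // 10) * 10
    constraints.append(f"Answer in at most {limit} words.")

    # Keyword: treat the longest alphabetic word as a required keyword.
    words = re.findall(r"[A-Za-z]+", response)
    if words:
        keyword = max(words, key=len)
        constraints.append(f'Include the word "{keyword}".')

    # Punctuation: only state the rule if the response already obeys it.
    if "!" not in response:
        constraints.append("Do not use exclamation marks.")

    return constraints


def build_example(instruction, response, extra_constraints=(), k_range=(6, 8), rng=None):
    """Attach a sample of back-translated constraints to the original instruction.

    `extra_constraints` stands in for constraints written by an LLM (hypothetical input).
    """
    rng = rng or random.Random(0)
    pool = backtranslate_rule_constraints(response) + list(extra_constraints)
    # Aim for 6 to 8 constraints, but never more than the pool offers.
    k = min(len(pool), rng.randint(*k_range))
    chosen = rng.sample(pool, k)
    prompt = instruction + "\n" + "\n".join(f"- {c}" for c in chosen)
    # The response is reused unchanged, so no new generation is needed for it.
    return {"prompt": prompt, "response": response, "constraints": chosen}


example = build_example(
    "What is the capital of France?",
    "Paris is the capital of France.",
)
print(example["prompt"])
```

The point of the sketch is that the response is never regenerated: the constraints are read off a response that already exists, which is where the cost saving mentioned above comes from.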

Effects

  • Constraint backtranslation can serve as a useful post-training objective.
  • The authors claim that, once DPO tuning is added, this approach outperforms Conifer, a baseline trained on complex instructions.

Personal note. I'm bringing this paper in because it resembles something I was briefly mulling over with Yuri (why can't LLMs do tasks that are easy for humans..!). It is not quite the same approach (rather than easy problems as such, it twists problems that existing LLMs already solved well by adding constraints to them). The method is not so much special as relatively simple, and it looks like it could be applied right after existing instruction tuning.