
Meta info.
  • Authors: Arjun Panickssery, Samuel R. Bowman, Shi Feng
  • Paper: https://openreview.net/pdf?id=4NJBV6Wp0h
  • Affiliation: George Washington Univ., MATS, New York Univ.
  • Published: September 26, 2024
  • Conference: NeurIPS 2024


TL;DR

LLM์€ ์ž๊ธฐ๊ฐ€ ๋งŒ๋“  ๊ฒฐ๊ณผ๋ฅผ ์„ ํ˜ธํ•œ๋‹ค๋Š” ๊ธฐ์กด ์ฃผ์žฅ์— ๋Œ€ํ•œ ์‹ฌ์ธต ๋…ผ์˜ (๊ฒฐ๋ก : ์‹ค์ œ ๊ทธ๋ ‡๋‹ค)


Background

LLM์œผ๋กœ ์ƒ์„ฑํ•œ ๊ฒฐ๊ณผ๋ฅผ ํ‰๊ฐ€ํ•  ๋•Œ judge๋กœ LLM(์Šค์Šค๋กœ)์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์ผ๋ฐ˜ํ™”๋จ

Problem Statement

Models reportedly prefer text they generated themselves (self-preference).

  • Research Question: If a model recognizes that a text is its own (self-recognition), does that affect its self-preference?

Suggestion

์ž๊ธฐ ์ธ์‹์— ๋Œ€ํ•œ tuning

  • task: ๋ชจ๋ธ์ด ์ž๊ธฐ๊ฐ€ ๋งŒ๋“  text์ธ์ง€ ๋ถ„๋ฅ˜
  • data: ์ž๊ธฐ๊ฐ€ ๋งŒ๋“  text / ์‚ฌ๋žŒ์ด ๋งŒ๋“  text pair
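The fine-tuning setup above can be framed as a pairwise classification task. Below is a minimal Python sketch of how such training examples might be assembled, assuming a two-choice prompt. The template wording, `Example`, and `build_self_recognition_examples` are hypothetical names for illustration, not the paper's code or exact prompt. Following the note above, the other text in each pair is human-written; another model's output could be slotted in the same way.

```python
from dataclasses import dataclass

# Hypothetical prompt wording for illustration; NOT the paper's exact prompt.
TEMPLATE = (
    "Here are two summaries of the same article. One of them was written by you.\n\n"
    "Article:\n{article}\n\n"
    "Summary 1:\n{first}\n\n"
    "Summary 2:\n{second}\n\n"
    "Which summary did you write? Answer with 1 or 2."
)


@dataclass(frozen=True)
class Example:
    prompt: str
    target: str  # "1" or "2": the position of the model's own summary


def build_self_recognition_examples(rows):
    """rows: iterable of (article, own_summary, human_summary) triples.

    Each pair is emitted in both orders, so labels are balanced and the
    model cannot solve the task by always picking the same position.
    """
    examples = []
    for article, own, human in rows:
        examples.append(Example(TEMPLATE.format(article=article, first=own, second=human), "1"))
        examples.append(Example(TEMPLATE.format(article=article, first=human, second=own), "2"))
    return examples


rows = [("Storm hits coast.", "A storm struck the coast.", "Coastal storm causes damage.")]
print([ex.target for ex in build_self_recognition_examples(rows)])  # ['1', '2']
```

Emitting both orders is a design choice of this sketch: it doubles the data and removes the shortcut of learning a fixed answer position.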

Effect

  • Experimental setup:
    • task: text summarization
    • dataset: XSUM, CNN/DailyMail (news articles)
    • backbone: Llama-2-7b-chat, GPT-3.5, GPT-4
  • Results:
    • Vanilla LLMs such as GPT-4 were not particularly good at self-recognition, but
    • after fine-tuning for self-recognition, they could identify their own outputs almost perfectly, and
    • accordingly, a strong linear relationship between self-recognition and self-preference was observed.

Personal note. The takeaway is that we clearly need to be wary of the research community's tendency to treat LLM-as-a-judge as a default. Just taking away the results is probably enough, but I only just found out it was accepted to NeurIPS 2024 (lol), so I'm bumping it up again.