
Meta info.

TL; DR

Proposes CONFETTI, a benchmark for evaluating LLM function calling in multi-turn dialogue. It confirms that current models still struggle with complex chaining, long contexts, and selection from large API sets.


Background

  • Function Calling: the task of generating a structured API call from a user utterance (request)
    • Now established as a core capability of LLM-based dialogue systems
  • Existing related benchmarks lack dialogue depth, API diversity, and turn-level granularity

Problem Statement

LLMs still fall short at picking the appropriate API among many > identifying unclear user goals > handling the resulting chained function calls

Suggestion

CONFETTI

- The dialogues themselves are entirely human-written: up to 25 API call scenarios (chaining of up to 4 calls), built to allow turn-level evaluation, with complexity organized into 13 types `table1`
    - complexity: related to user goals, to the amount of information provided, to execution failure or infeasibility, to **dialogue flow control**, and others
- Two benchmark components: function calling / response quality
    - function calling bench: **at every turn that requires FC**
        - input: context (user-agent dialogue up to that turn) + previous FC history
        - output: one or more functions to call next
    - response quality: **at every (agent) turn**
        - input: context + API schema
        - output: **dialog act** classification
- Construction method: entirely human-made (so that evaluation does not depend on any particular model)
    - Scenario creation: define the dialogue goal + situation, followed by expert review
        - Purpose: ensure complexity (even the same goal is built into dialogues in different ways)
        - Specifies the list of APIs to use, which of the 13 complexity types to include, the time of the dialogue (weekday, morning, etc.), the minimum number of turns, etc.
    - Dialogue construction: a single annotator plays both the user and the agent
- Evaluation metrics:
    - AST soft accuracy: whether the function, the parameter names, and the values are all correct
        - strings are scored by similarity rather than exact match (AlignScore)
    - Dialogue Act accuracy (classification)
    - parameter hallucination: whether the model called something that is not in the API spec (evaluates response reliability)
- Results:
- FC performance by major model `table 4`
    - Amazon's own Nova Pro performs best (by AST Soft Accuracy)
    - Even so, it only reaches about 40% at most
- Model performance drops sharply as the number of APIs grows (especially at 20 or more, for Claude 3.0, LLaMA 70B, etc.) `figure 2`
    - Issues are expected in terms of real-world user experience
- One would expect performance to degrade with more turns, but some models (Amazon's own models, LLaMA 405B, etc.) actually improve `figure 3`
    - Smaller models struggle
- Performance falls sharply as chains get longer: under 50% with a single call, dropping to 20% or below with just one more, and 0~5% for all models at 3 or more `figure 4`
- For dialog acts, Claude 3.5 Sonnet reaches 73%, among others `table 6`
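
To make the two function-calling metrics above concrete, here is a minimal sketch of a soft AST-style match and a parameter hallucination check. It is an illustration under assumptions, not the paper's evaluation code: `FunctionCall`, `values_match`, `soft_call_match`, `hallucinated_parameters`, and the `book_table` schema are hypothetical names, the 0.8 threshold is arbitrary, and `difflib.SequenceMatcher` is used as a simple stand-in for AlignScore.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class FunctionCall:
    """Hypothetical container for one predicted or gold function call."""
    name: str
    arguments: dict = field(default_factory=dict)


def values_match(pred, gold, threshold=0.8):
    """Compare strings by similarity; require equality for everything else."""
    if isinstance(pred, str) and isinstance(gold, str):
        # Stand-in for AlignScore: a character-level similarity ratio.
        ratio = SequenceMatcher(None, pred.lower(), gold.lower()).ratio()
        return ratio >= threshold
    return pred == gold


def soft_call_match(pred, gold):
    """True if function name, parameter names, and values all agree."""
    if pred.name != gold.name:
        return False
    if set(pred.arguments) != set(gold.arguments):
        return False
    return all(
        values_match(pred.arguments[key], gold.arguments[key])
        for key in gold.arguments
    )


def hallucinated_parameters(pred, schema):
    """Return parameter names that the API spec does not define.

    `schema` maps a function name to the set of allowed parameter names.
    An unknown function name makes every parameter count as hallucinated.
    """
    allowed = schema.get(pred.name, set())
    return sorted(set(pred.arguments) - allowed)


# Toy example with an assumed schema.
schema = {"book_table": {"restaurant", "party_size"}}
gold = FunctionCall("book_table", {"restaurant": "Luigi's Trattoria", "party_size": 4})
pred = FunctionCall("book_table", {"restaurant": "luigi's trattoria", "party_size": 4})
extra = FunctionCall("book_table", {"restaurant": "Luigi's Trattoria", "vip": True})

print(soft_call_match(pred, gold))
print(hallucinated_parameters(extra, schema))
```

Averaging `soft_call_match` over every turn that requires a function call gives a turn-level accuracy in the spirit of the benchmark. Turns with multiple or chained calls would additionally need a rule for pairing predicted calls with gold calls, which this sketch leaves out.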

Personal note. Thoughts

  1. Dialog act definition: it may not be strictly equivalent to intent (a, b), but as always that depends on how you define it. In recent work, and in this study, function calls are treated as an act, which makes it easier to check for over-calling or, conversely, missed calls. Other responses are also covered through categories such as others. 1) In practice it is very close in nature to intent: for example, whether to inform or to seek information follows the traditional approach. 2) The motivation for evaluating dialog acts is likewise that older similarity-based metrics such as BLEU did not evaluate user intent and role.
  2. Beyond the flow shared in today's meeting, building the benchmark entirely by hand seems to be a major strength
  3. They did define a complexity category related to dialogue flow control, but it looks more fragmentary than what our research proposes. Still, since it aims to directly account for dialogue goal switching, it could also be seen as a direction that encompasses our research…