Meta info.

TL;DR

State-of-the-art dialogue models often fail to maintain their own identity. Expanded attention and classifier-based reranking can reduce these errors by 65%, but the problem remains a challenge.

Background

  • Large-scale open-domain dialogue systems (at the time, Facebook's BlenderBot, Google's Meena, etc.) greatly improved fluency and engagement, but
  • persona-conditioned models still show contradiction, repetition, and hallucination when it comes to staying in role.
  • Most prior work targets factual consistency, grounding, and contradiction detection; identity consistency is still at an early stage.

Problem Statement

How can a model maintain its own identity?

  • Dialogue agents often "take on the identity of the interlocutor": they mistake the partner's role for their own
    • e.g., a model playing the guest suddenly claims to be a thief or a hunter (Tab. 1)
    • On LIGHT, humans stay in character (1.34% error), whereas models fail in about 35% of cases
      • LIGHT:
    • Maintaining identity == not confusing one's own role with the partner's, even as the turns go on

Suggestions

Ways to mitigate identity confusion

  • RPA-based reranking
    • RPA classifier: a Poly-encoder Transformer-based model that judges whether an utterance fits a given character identity
      • motivation: an automatically computed metric that tells which persona/character would plausibly say a given utterance could be used for both training and evaluation = train an RPA classifier
      • Method:
        • Take the context of a LIGHT dialogue (character names, persona, location, etc. + dialogue history)
        • Given a model-generated candidate, decide which character it fits best = a classification problem
        • negative sampling: train the classifier to pick the correct character from the correct one + 99 random character candidates
      • option: full or token (Tab. 2)
        • Full-utterance RPA (full)
        • Left-to-Right RPA (token): partway through generation, guess the speaker from only the tokens produced so far
        • e.g., Tab. 11: once "Hey there mermaid!" has been read, the probability of the Mermaid character jumps (shows that at the token level the classifier cannot tell a form of address from a self-introduction, so it tends to get the speaker wrong)
    • reranking: when the model generates several candidate responses, use the one with the highest RPA score
      • Utterance-level re-ranking (utt, full): pick the most in-character candidate
      • PACER (token): pick by RPA score computed on the partial generation, with two restrictions:
        • it cannot run at every token (speed limitation)
        • so it runs only at some steps (e.g., only 5% or 33% of all token positions; chosen empirically, Tab. 3)
        • and RPA is computed only for a small set of candidates (the generation so far + top-10 tokens) → reranking
  • unlikelihood: an auxiliary loss that penalizes generated candidates the RPA classifier assigns to the wrong character
  • multi-objective learning: jointly learn next-token prediction + whose utterance it is (joint learning loss)
  • expanded attention: force the decoder to always re-attend to its own persona (at the cross-attention stage)
    • persona grounding (manual selection): bundle A own persona, B own name, C partner name, and D setting into a persona subset, then have expanded attention look at it again = best performance when all of ABCD are used and the re-attention is repeated twice
    • automated grounding (automatic selection): decoder-attention based / trainable mask (reportedly not great), or reuse the RPA classifier's attention (roughly on par with manual selection, Tab. 7)
      • i.e., when meta information such as a persona is not available for manual selection, attention can be used instead
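
The RPA classifier setup described above (one correct character ranked against 99 random ones) can be sketched as follows. This is a minimal illustration, not the paper's code: the character pool, the function names `build_candidates` and `ranking_loss`, and the use of softmax cross-entropy as the training objective are all assumptions, and the scores are stand-ins for what a real Poly-encoder would produce.

```python
import math
import random


def build_candidates(gold, character_pool, num_negatives=99, rng=None):
    """Return (candidates, gold_index): the gold character plus random negatives."""
    rng = rng or random.Random(0)
    negatives = rng.sample([c for c in character_pool if c != gold], num_negatives)
    candidates = negatives + [gold]
    rng.shuffle(candidates)
    return candidates, candidates.index(gold)


def ranking_loss(scores, gold_index):
    """Softmax cross-entropy over candidate scores (assumed objective)."""
    top = max(scores)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_norm - scores[gold_index]


# Hypothetical pool of character names; a real run would draw them from LIGHT.
pool = [f"character_{i}" for i in range(500)] + ["mermaid"]
candidates, gold_index = build_candidates("mermaid", pool)

# Stand-in scores: a real classifier would score (context, utterance) per candidate.
uniform = [0.0] * len(candidates)
confident = [5.0 if i == gold_index else 0.0 for i in range(len(candidates))]
loss_uniform = ranking_loss(uniform, gold_index)
loss_confident = ranking_loss(confident, gold_index)
```

With uniform scores the loss equals log(100), the chance level for a 100-way choice; it drops as the gold character's score rises above the negatives.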
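
The two reranking modes from the notes can be illustrated with a toy scorer. Everything here is hypothetical: `toy_rpa_score` is a keyword heuristic standing in for the trained RPA classifier, and the partial step simply picks the token with the best classifier score, because the notes do not say how PACER combines classifier and language-model scores.

```python
def toy_rpa_score(utterance, own_role="guest", other_roles=("thief", "hunter")):
    """Hypothetical stand-in for an RPA classifier: higher means more in-character."""
    text = utterance.lower()
    score = 0.0
    if f"i am a {own_role}" in text:
        score += 1.0
    for role in other_roles:
        if f"i am a {role}" in text:
            score -= 1.0
    return score


def rerank_utterances(candidates, rpa_score):
    """Utterance-level reranking: keep the complete candidate with the highest score."""
    return max(candidates, key=rpa_score)


def partial_rerank_step(prefix, top_tokens, rpa_score):
    """Token-level step: score each extended prefix and return the best next token."""
    return max(top_tokens, key=lambda tok: rpa_score(" ".join(prefix + [tok])))


responses = [
    "I am a thief, hand over your gold.",
    "I am a guest here, thank you for the meal.",
    "Nice weather today.",
]
best_response = rerank_utterances(responses, toy_rpa_score)

# In the notes this step runs only at a fraction of positions and over top-10 tokens.
next_token = partial_rerank_step(["I", "am", "a"], ["thief", "guest", "hunter"], toy_rpa_score)
```

Utterance-level reranking needs one classifier call per finished candidate, while the token-level step costs one call per candidate token, which is why the notes restrict it to a subset of positions.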
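
The two training-side ideas (unlikelihood and multi-objective learning) both add a term to the usual language-modeling loss. The sketch below uses the standard token-level unlikelihood form, -sum(log(1 - p)), and a weighted sum for the joint loss. The weight `alpha`, the function names, and the exact way the paper applies these terms are assumptions, not details taken from the notes.

```python
import math


def nll_loss(token_probs):
    """Negative log-likelihood of a target sequence given per-token probabilities."""
    return -sum(math.log(p) for p in token_probs)


def unlikelihood_loss(token_probs, eps=1e-8):
    """Penalty for a sequence judged out-of-character: push its token probabilities down."""
    return -sum(math.log(max(1.0 - p, eps)) for p in token_probs)


def joint_loss(lm_loss, aux_loss, alpha=1.0):
    """Multi-objective training: language-modeling loss plus a weighted auxiliary loss."""
    return lm_loss + alpha * aux_loss


# Hypothetical per-token probabilities the generator assigns to two sequences.
gold_probs = [0.6, 0.7, 0.8]   # in-character reference response
wrong_probs = [0.9, 0.9]       # candidate flagged as the wrong character

total = joint_loss(nll_loss(gold_probs), unlikelihood_loss(wrong_probs), alpha=0.5)
```

The penalty grows as the model becomes more confident in the flagged candidate, so minimizing it lowers that candidate's probability without changing the likelihood term for the reference response.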

Effects

Tab. 4 is the main table; the combination of expanded attention + reranking works best.

Personal note. I found this paper through a reading list I was following on memory conflict. The mitigation methods are varied and the ablations look well done, but the methods themselves are rather crude to carry over to current LLMs, their limitations seem all too predictable, and the validated results lean heavily on empirical choices, so it seems fairly clear why this ended up as a Findings paper. (I left out most of the experimental results, since taking away just the conclusions seems sufficient.) That said, the problem statement is clear and close to the direction I wanted to look into, which I found impressive (specifically, the paper confirms that the model mistakes a form of address for the speaker). I am still checking whether any later work follows up on this problem, but because it is such a narrow problem, it is hard to say that anyone has focused on it.