
Meta info.

TL;DR

Proposes a benchmark that diagnoses at which stage (extract > update > QA) hallucinations in agent memory systems arise.


Background

  • Memory externalizes user information from agent-user interactions so the agent can personalize per user and stay consistent: MemOS, Mem0, Zep, Supermemory, Memobase, etc.
  • End-to-end benchmarks built around plain QA cannot pinpoint where a failure occurs: LoCoMo, LongMemEval, PrefEval, PersonaMem, etc.

Problem Statement

  • memory system์˜ ์—ญํ• ์„ ์ถ”์ถœ(E), ๊ฐฑ์‹ (U), ์งˆ์˜์‘๋‹ต(QA) 3๊ฐ€์ง€ operasation์œผ๋กœ ์ •์˜ย Fig 1
    • E: ์—†๋Š” fact๋กœ ๋Œ€ํ™” ์ˆ˜ํ–‰ํ•˜๊ฑฐ๋‚˜ ์œ ํšจํ•œ memory ๋†“์นจ
    • U: ์—…๋ฐ์ดํŠธํ•ด์•ผ๋˜๋Š”๋ฐ ๋ชปํ•˜๊ฑฐ๋‚˜ ์ž˜๋ชป ์—…๋ฐ์ดํŠธ
    • QA: memory์— ์—†๋Š” ์ •๋ณด ์ถ”๋ก ํ•˜๊ฑฐ๋‚˜ ๊ณผ๊ฑฐ ์ •๋ณด ์‚ฌ์šฉ, conflict ์ƒํ™ฉ ๋ฌธ์ œ ๋“ฑ
  • system ์ถœ๋ ฅ - ์ •๋‹ต ๊ฐ„ ๋น„๊ต๋ฅผ ํ†ตํ•ด ์˜ค๋ฅ˜๋ฅผ ๊ฐ ๋‹จ๊ณ„๋ณ„๋กœ ๊ท€์†์‹œํ‚ค๊ธฐ ์œ„ํ•ด์ •ํ™•ํ•œ annotation(=memory point) ํฌํ•จ๋œ multi-turn + user-centric dialogue ๊ตฌ์ถ•
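To make "attributing an error to a stage" concrete, here is a minimal Python sketch. It assumes each query already comes with gold memory points and the system's per-stage outputs as normalized strings. The `StageResult` and `attribute_error` names and the "blame the earliest divergent stage" rule are illustrative assumptions, not HaluMem's actual protocol, which (per these notes) scores each operation on its own.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StageResult:
    """Hypothetical per-query record; field names are illustrative only."""
    extracted: set[str]     # memories the system wrote during extraction (E)
    gold_extract: set[str]  # annotated memory points that should have been extracted
    after_update: set[str]  # memories the system treats as valid after updates (U)
    gold_valid: set[str]    # annotated memory points still valid at query time
    answer_correct: bool    # whether the final answer matched the gold answer (QA)


def attribute_error(r: StageResult) -> str:
    """Blame the earliest stage whose output diverges from the annotation (simplifying assumption)."""
    if r.extracted != r.gold_extract:
        return "E"
    if r.after_update != r.gold_valid:
        return "U"
    if not r.answer_correct:
        return "QA"
    return "ok"


# Extraction was fine, but the stale "lives in Seoul" memory survived the update.
case = StageResult(
    extracted={"User lives in Seoul.", "User lives in Busan."},
    gold_extract={"User lives in Seoul.", "User lives in Busan."},
    after_update={"User lives in Seoul.", "User lives in Busan."},
    gold_valid={"User lives in Busan."},
    answer_correct=False,
)
print(attribute_error(case))  # U
```

Checking the stages in pipeline order means a wrong answer caused by a missed update is charged to U rather than QA, which is the distinction a QA-only score cannot make.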

Suggestions

HaluMem

  • Scenarios, personas, and session flow are designed by humans > dialogues are generated by an LLM from that design > the final memory points and QA are verified and refined by humans
    1. Starting from a flow design that specifies, e.g., when conflicts or updates occur,
    2. the LLM inserts conflict turns and similar events
    3. Humans annotate memory points manually
    4. QA is generated from the memory points and validated by humans: whether multi-hop reasoning is correct, whether updates are properly reflected, whether conflict cases are actually tested, etc.
  • Scale:
    • Two versions: Medium (~160k) and Long (10M)
      • 20 to 50 sessions
      • 200 to 800 turns
      • Long: memory-irrelevant noise dialogue is inserted aggressively
        • blatant noise such as chitchat, ELI5-style conversations, and math reasoning traces
    • About 15,000 memory points, each an atomic persona fact
      • persona profile, preferences, habits, routines, relationships, possessions, plans, goals, location, move history, health, restrictions, skills, knowledge, dynamic changes (whether an update occurred), etc.
      • Each memory is tagged with the index of the dialogue turn it came from and normalized(?) into a natural-language statement the system should actually remember
      • Memory validity: when an update occurs, the earlier memory is marked invalid
    • Roughly 3,400+ queries covering basic fact, multi-hop, dynamic update, boundary condition, conflict detection, and generalization & application
  • Memory criteria:
    • Long-term relevance: short-lived chat is ignored; only information that stays valid over time counts as a memory point
    • Stable and consistent facts: not one-off remarks but concrete information that is repeated and emphasized
    • Structure: User + verb + object, one fact per memory point
      • Even when the source carries implicit information, annotators do not infer; they only spell out the meaning clearly and stay within what the utterance expresses
      • Vague expressions like "seems to ~" are not normalized. One-off mentions in particular are not turned into memory points; they are written only when the information is repeated and clear
      • Verb (attribute) patterns: is from, lives in, works at, prefers, dislikes, owns, has, moved to, studied at, is allergic to, is interested in
      • Boolean form wherever possible
      • Coreferences resolved
    • Serve as the ground-truth evidence for QA answers
    • Role at inference time:
      1. E: list memory point candidates
      2. U: remove conflicting memories, revise past information and mark it invalid, etc.
      3. QA: answer based on memory points
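The memory-point conventions above (User + verb + object, a fixed set of verb patterns, the source turn index, and invalidation on update) can be pictured with a small Python sketch. `MemoryPoint`, `MemoryStore`, and the `SINGLE_VALUED` rule deciding which attributes a new value overwrites are hypothetical; only the verb list is copied from the notes, and the store is not taken from HaluMem or from any of the evaluated systems.

```python
from __future__ import annotations

from dataclasses import dataclass

# Attribute patterns listed in the notes.
VERB_PATTERNS = {
    "is from", "lives in", "works at", "prefers", "dislikes", "owns", "has",
    "moved to", "studied at", "is allergic to", "is interested in",
}
# Hypothetical: for these attributes a new value supersedes the old one.
SINGLE_VALUED = {"lives in", "works at"}


@dataclass
class MemoryPoint:
    verb: str
    obj: str
    source_turn: int     # index of the dialogue turn the fact came from
    valid: bool = True   # flipped to False when a later update supersedes it
    subject: str = "User"

    def __post_init__(self) -> None:
        # Enforce the template: only known attribute patterns are allowed.
        if self.verb not in VERB_PATTERNS:
            raise ValueError(f"unknown attribute pattern: {self.verb!r}")

    def statement(self) -> str:
        return f"{self.subject} {self.verb} {self.obj}."


class MemoryStore:
    """Toy store: appends memory points and invalidates superseded single-valued ones."""

    def __init__(self) -> None:
        self.points: list[MemoryPoint] = []

    def add(self, point: MemoryPoint) -> None:
        if point.verb in SINGLE_VALUED:
            for old in self.points:
                if old.valid and old.subject == point.subject and old.verb == point.verb:
                    old.valid = False  # keep the record for history, but mark it invalid
        self.points.append(point)

    def current(self) -> list[str]:
        return [p.statement() for p in self.points if p.valid]


store = MemoryStore()
store.add(MemoryPoint("lives in", "Seoul", source_turn=3))
store.add(MemoryPoint("prefers", "tea", source_turn=10))
store.add(MemoryPoint("lives in", "Busan", source_turn=120))
store.add(MemoryPoint("lives in", "Daejeon", source_turn=310))
print(store.current())  # ['User prefers tea.', 'User lives in Daejeon.']
```

Keeping superseded records with `valid=False` instead of deleting them mirrors the note that past memories are marked invalid, and it is the state a QA step needs to answer "where do I live now?" with the latest value.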

Effects

  • Tab 3ย ์ „๋ฐ˜์ ์ธ Memory system์ด ๋ชจ๋“  operation์—์„œ ๊ทผ๋ณธ์ ์œผ๋กœ ์ทจ์•ฝํ•˜๋‹ค
    • E: ๋Œ€์ฒด๋กœ recall์€ ์ค€์ˆ˜ํ•˜๋‚˜ precision์ด ๋‚ฎ์Œ (false memory ์ƒ์„ฑ ๋“ฑ)
      • over-generalization : ์ปคํ”ผ ์ค„์ด๋ ค๊ณ  > ์ปคํ”ผ ์‹ซ์–ดํ•จ
    • U: ๋Œ€์ฒด๋กœ ๋ชปํ•จ. update๋ฅผ ๋ˆ„๋ฝํ•˜๊ฑฐ๋‚˜ ์ž˜๋ชปํ•˜๊ฑฐ๋‚˜ conflict ํ•ด๊ฒฐ์— ์‹คํŒจ
    • QA: basic fact๋Š” ๊ทธ๋‚˜๋งˆ ํ•˜์ง€๋งŒ ๋‚˜๋จธ์ง€ 5์ข…์— ๋Œ€ํ•ด์„œ๋Š” ์„ฑ๋Šฅ ๋ถ•๊ดด
    • ๊ธธ์ด ๊ธธ์–ด์ง€๋ฉด ์„ฑ๋Šฅ ๊ธ‰๋ฝ
  • Fig 5ย QA๋ณ„ ์ƒ์„ธ
    • basic fact๋„ ์• ์ดˆ์— false memory ์ƒ์„ฑํ•˜๋ฉด ์ค„์ค„์ด (์‹ฌ์ง€์–ด ํ™•์‹ ํ•˜๋ฉฐ) ์‹คํŒจ
    • multi-hop: ์ฒ˜์ฐธํ•˜๊ฒŒ ์‹คํŒจ
    • dynamic update: ๊ฐ™์€ ํ–‰์œ„ ๋‘๋ฒˆ ํ•ด์„œ memory update๊ฐ€ ์ง„ํ–‰๋œ ์ƒํ™ฉ์— ๋Œ€ํ•ด ์ตœ์‹  ์ •๋ณด๋ฅผ ๋ฌผ์„ ๋–„ ๊ณผ๊ฑฐ memory๋ฅผ ๋Œ์–ด์™€์„œ ์‹คํŒจ
      • e.g. ๋งค ์›”์š”์ผ๋งˆ๋‹ค ์šด๋™ํ•ด > ์ด์ œ๋Š” ์›”์š”์ผ์— ์šด๋™ ์•ˆํ•ด
        • ์ด์ƒ์ : User does not exercise on Mondays.
        • ํ˜„์‹ค: ๋‘˜๋‹ค ๋‚จ๊ธฐ๊ฑฐ๋‚˜ ๋ฌด์‹œํ•˜๊ฑฐ๋‚˜ ํ‹€๋ฆฐ ์ •๋ณด ์—…๋ฐ์ดํŠธ
      • e.g. 2๋ฒˆ ์ด์‚ฌ > ํ˜„์žฌ ๋‚˜๋Š” ์–ด๋”” ์‚ด์•„?
        • ํ˜„์‹ค: ๊ณผ๊ฑฐ ์ด์‚ฌ ์žฅ์†Œ๋ฅผ ๋Œ์–ด์˜ด
    • boundary: ์‹œ๊ฐ„ ๊ตฌ๋ถ„์„ ๋ชปํ•˜๊ฑฐ๋‚˜ ๋ชจ๋ฅด๊ฒ ๋‹ค๊ณ  ๋‹ต๋ณ€ํ•ด์„œ ์‹คํŒจ
    • conflict: ์ƒ๋Œ€์ ์œผ๋กœ ์ค€์ˆ˜ํ•œ ํŽธ. ์ถฉ๋Œ๋œ memory๊ฐ€ ๋ช…ํ™•ํ•˜๋ฉด retrieval ๋‹จ๊ณ„์—์„œ ๋‹ค ๋ฝ‘์•„์˜ค๊ณ  ๋ชจ๋ธ์ด ํ‹€๋ฆฐ ๊ฑธ ๊ณ ๋ฅด๊ธฐ ์šฉ์ดํ•ด์ง„๋‹ค๊ณ  ๋ถ„์„
    • generalization: fact memory ์ž˜ ๋Œ์–ด์™€๋„ ๊ทธ๊ฑธ ๊ฐ€์ง€๊ณ  preference ๊ธฐ๋ฐ˜ ์ถ”๋ก ์— ์‹คํŒจ
    • memory ๋ˆ„๋ฝ์ด ํฐ ๋ฌธ์ œ๋ผ๊ณ  ์ง€์ ; ํ•„์š”ํ•œ๊ฑธ ๋ชป์ฐพ์•„์˜ด
  • Tab 5: for efficiency, the addition step (writing memory) is the main bottleneck
    • Mem0 takes over 45 hours in the worst case; retrieval is comparatively fast
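The E-stage pattern (decent recall, low precision) is easiest to see with plain set arithmetic. This minimal Python sketch scores extracted memories against gold memory points by exact string match, which is a simplifying assumption: real extractions are paraphrased, so matching would need normalization or a judge model (the notes mention LLM-as-a-judge). `extraction_scores` is a hypothetical helper, and the example memories are made up to mirror the coffee over-generalization above.

```python
from __future__ import annotations


def extraction_scores(extracted: set[str], gold: set[str]) -> tuple[float, float]:
    """Exact-match precision and recall of extracted memories against gold memory points."""
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


gold = {
    "User is trying to cut down on coffee.",
    "User exercises on Mondays.",
}
extracted = {
    "User is trying to cut down on coffee.",
    "User exercises on Mondays.",
    "User dislikes coffee.",         # over-generalized false memory
    "User is allergic to peanuts.",  # fabricated, never said in the dialogue
}
precision, recall = extraction_scores(extracted, gold)
print(precision, recall)  # 0.5 1.0
```

Every gold point is recovered, so recall is perfect, while the two false memories halve precision; a downstream QA score alone would not reveal which of the two numbers caused a wrong answer.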

Personal note.

  • ์ „์ฒด์ ์œผ๋กœ memory system์„ ์ •๋ฆฌํ•˜๋Š” ์ž…์žฅ์—์„œ ์ฐธ๊ณ ํ•ด๋ณผ๋งŒํ•œ ๋…ผ๋ฌธ์ด๋ผ๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค. survey paper๋Š” ์•„๋‹ˆ์ง€๋งŒ related work ์ •๋ฆฌ๊ฐ€ ์ž˜ ๋œ ํŽธ์ด์—ˆ์Šต๋‹ˆ๋‹ค. ํ•œํŽธ์œผ๋กœ๋Š” 3๊ฐ€์ง€ operation (memory extract > update > QA) ๋กœ ๊ตฌ๋ถ„์กฐ์ฐจ ์ œ๋Œ€๋กœ ๋˜์–ด์žˆ์ง€ ์•Š์•˜๋˜ ๊ฒŒ ํ˜„์žฌ memory ์—ฐ๊ตฌ ํ˜„ํ™ฉ์ธ๋ฐ ๋„ˆ๋ฌด ๊ฑฐ์ฐฝํ•œ ํ˜น์€ ์ข์€ ๋ฌธ์ œ๋กœ ๊ณ ๋ฏผํ–ˆ๋˜ ๊ฒƒ์€ ์•„๋‹Œ์ง€ ๋Œ์•„๋ณด๊ฒŒ ํ•˜๊ธฐ๋„ ํ•ฉ๋‹ˆ๋‹ค. conflict๋‚˜ preference๋„ ํญ๋„“๊ฒŒ ๋‹ค๋ฃจ๊ณ ๋Š” ์žˆ์ง€๋งŒ ์–•๋‹ค๊ณ ๋Š” ์ƒ๊ฐํ•˜๋Š”๋ฐ, ํƒ€๊นƒํ•  ์ •๋„๋กœ ๊นŠ์ด๊ฐ€ ์—†๋Š” ๊ฒŒ ์‚ฌ์‹ค์€ ์ด ๋ฌธ์ œ์˜ ํ˜„์‹ค์  ํ•œ๊ณ„๊ฐ€ ์•„๋‹Œ๊ฐ€ ์‹ถ์€ ์ƒ๊ฐ๋„ ๋“ค๊ณ ์š”.
  • ๋ฉ”๋ชจ๋ฆฌ ๊ตฌ์„ฑ ์ธก๋ฉด์—์„œ ๊ตฌ์กฐํ™”๋œ NL์„ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ๊ณ , ์–ด๋А์ •๋„ template์œผ๋กœ ๊ตฌ์กฐ๋ฅผ ์ œํ•œํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, human annotation ๊ด€์ ์—์„œ explicitํ•˜์ง€ ์•Š์œผ๋ฉด guessํ•˜์ง€ ์•Š์•˜๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. (implicitํ•œ ๊ฑธ explicitํ•˜๊ฒŒ ๋“œ๋Ÿฌ๋‚ด์ง€ ์•Š์Œ) ๋ชจ๋ธ์ด ์ถ”๋ก ํ•˜๊ฒŒ๋” ์—ด์–ด๋‘˜ ์ˆ˜๋Š” ์žˆ์ง€๋งŒ, ์ •๋‹ต์ฒ˜๋Ÿผ labelingํ•˜์ง€ ์•Š์œผ๋ ค๊ณ  ์• ์ผ๋‹ค๊ณ  ์ดํ•ดํ–ˆ์Šต๋‹ˆ๋‹ค. preference๋ฅผ ๊ฐ€๋Šฅํ•˜๋ฉด boolean ์ˆ˜์ค€ํ™” ํ•˜๋ ค๊ณ  ๋…ธ๋ ฅํ–ˆ๋‹ค๋Š”๋ฐ ์ด ์—ญ์‹œ ์ •๋‹ต์ด ์—†๋Š” ๋ฌธ์ œ์— ๋Œ€ํ•œ ๋…ธ๋ ฅ์˜ ์ผํ™˜์œผ๋กœ ๋А๊ปด์ง€๋ฉฐ, ๋ชจ๋“  ๋ฐ์ดํ„ฐ์…‹ ๊ฒฐ๊ตญ ์‚ฌ๋žŒ์ด ์ผ์ผ์ด ๊ฒ€์ˆ˜ํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ํ’ˆ์งˆ ๋ณด์žฅ์„ ๋…ธ๋ ฅํ•œ ๊ฒƒ๋„ ์ธ์ƒ์ ์ž…๋‹ˆ๋‹ค.
  • ๋‹ค๋งŒ hallucination์ด๋ผ๋Š” ํ‘œํ˜„์ด ์ข€ ๋‚จ์šฉ๋˜์—ˆ๋‹ค๋Š” ์ธ์ƒ์€ ์žˆ๊ณ , ์—ฌ์ „ํžˆ ๊ธด ๋Œ€ํ™” ๊ตฌ์ถ•์—๋Š” ๋žœ๋ค ์•„๋ฌด๋ง ๋ผ์›Œ๋„ฃ๊ธฐ ์ˆ˜์ค€์ธ ์ (๋ฌธ์ œ๋ฅผ ๋ฌธ์ œ๋กœ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด ๊ผฌ์•˜๋‹ค๊ณ  ๋ณผ ์ˆ˜๋„ ์žˆ๊ณ , ์˜คํžˆ๋ ค flatํ•˜๊ฒŒ ์‰ฌ์šด ๋ฌธ์ œ์ธ๋ฐ ๋ชปํ‘ผ๋‹ค๊ณ  ํ•ด์„ํ•  ์ˆ˜๋„ ์žˆ์„,,,)์ •๋„๋Š” ๋ฐ์ดํ„ฐ์…‹ ์ธก๋ฉด์—์„œ ์•„์‰ฌ์šด ๋ถ€๋ถ„์œผ๋กœ ๋‚จ์Šต๋‹ˆ๋‹ค.
  • ํ˜„์ƒ ๋ถ„์„ ๊ด€๋ จํ•ด์„œ๋Š” ์ธ์‚ฌ์ดํŠธ๊ฐ€ ๋˜๋Š” ๊ฒƒ ๊ฐ™์€๋ฐ, implicitํ•œ preference ์ฆ‰ ์ถ”๋ก ์ด ํ•„์š”ํ•œ ๊ฒฝ์šฐ์— ๋Œ€ํ•ด ๋ฉ”๋ชจ๋ฆฌ ์ž˜ ์ฐพ์•„์™”๋‹ค๊ณ  ํ•ด๋„ ๋ชปํ–ˆ๋‹ค๋Š” ๋ถ€๋ถ„๊ณผ update์— ๋Œ€ํ•ด ์ œ๋Œ€๋กœ ์ฒ˜๋ฆฌํ•˜์ง€ ๋ชปํ•˜๋Š” ๋ถ€๋ถ„ ๋“ฑ์˜ ํ•œ๊ณ„๋Š” ์ด๋ฏธ ์ฒด๊ฐํ–ˆ๋˜ ๋ฐ” ์žˆ์–ด์„œ ๋ฌธ์ œ์˜์‹์„ ์žฌํ™•์ธํ–ˆ์Šต๋‹ˆ๋‹ค.
  • ์ƒ๋Œ€์ ์œผ๋กœ ํ‰๊ฐ€๋„ operation๋ณ„๋กœ ๊ผผ๊ผผํ•˜๊ฒŒ ์„ธ๋ถ„ํ™”ํ•˜๋ ค๊ณ  ๋…ธ๋ ฅํ–ˆ๊ณ  (์—ฌ์ „ํžˆ LAAJ๋ฅผ ํ•œ๊ณ„๋กœ ์ง€์ ํ•˜์ง€๋งŒ standard์ด์ง€ ์•Š์„์ง€), Memory๋ฅผ ์ž˜ ๋งŒ๋“ค์—ˆ๋ƒ ํ˜น์€ ์ž˜ ์“ฐ๋ƒ์˜ ๋ฌธ์ œ๋ฅผ ๋‹จ์ˆœ QA๋กœ ํ‰๊ฐ€ํ•˜๊ณ ์ž ํ–ˆ๋˜ ์ง€๋‚œ ๋ฐ์ดํ„ฐ์…‹์˜ ๋ฌธ์ œ๋ฅผ ์งš์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ธ์ƒ์ ์œผ๋กœ ํ‰๊ฐ€๋ฐ›๊ณ  ์žˆ๋Š” ๊ฒƒ์œผ๋กœ ๋ณด์—ฌ์š”.