
TL;DR

LLM Agent์˜ LTM์„ semantic lossless compression์œผ๋กœ ์žฌ์ •์˜ํ•˜๊ณ , write-time ๊ตฌ์กฐํ™”ยทonline synthesisยทintent-aware retrieval๋กœ ์„ฑ๋Šฅ๊ณผ ํ† ํฐ ํšจ์œจ(์ตœ๋Œ€ 30๋ฐฐ)์„ ๊ฐœ์„ ํ•œ ๋ฉ”๋ชจ๋ฆฌ ํ”„๋ ˆ์ž„์›Œํฌ ์ œ์•ˆ

Review Video

(Slides 2-5; Figures 1-3, Tables 1-5 of the paper)

Background

  • LLM agent memory has mostly been discussed as a storage problem: the limitation of viewing memory only as passive storage or as a target of expensive reasoning
    • Full-context extension line: accumulate the entire conversation (MemGPT, LoCoMo, some LC-agents)
      • inefficient at managing low-entropy content such as chitchat, acknowledgements, and repeated confirmations; lost-in-the-middle
    • Iterative reasoning / filtering line: refine memory via inference (A-Mem, Mem0)
      • latency and token cost explode; temporal/referential ambiguity remains
  • the information-density problem is comparatively under-explored

Problem Statement

  • Given a bounded input length, how should memory be designed so that an LLM remembers long-horizon interactions well?
    • How do we avoid storing low-utility dialogue in the first place?
    • How do we cleanly resolve pronoun / relative-time references across sessions?
    • How do we avoid building fragmented memory?
    • How do we use a query-dependent retrieval scope instead of a fixed top-k?

Suggestions

Semantic Lossless Compression

  • Core idea: conversational memory is not something to summarize but something to compress in a semantically lossless way
    • reduce the information volume without losing the meaning needed for downstream reasoning
    • do this at write time, not at retrieval time
  • Methods: SimpleMem (Fig. 2)
    • Stage 1. Semantic Structured Compression: convert only useful dialogue into context-independent memory units
      • semantic density gating: split the dialogue with a sliding window; the LLM judges whether each window carries useful information
        • the generation itself serves as the gate, with no threshold or separate classifier
      • de-linearization transformation: a single LLM pass simultaneously performs coreference resolution + temporal normalization + fact atomization
        • coreference resolution: her kids → Sarah’s kids
        • temporal normalization: last week → 2026-01-26
        • fact atomization: decompose into minimal factual statements
      • intent: each memory unit is interpretable without its prior context → retrieval difficulty drops sharply
    • Stage 2. Online Semantic Synthesis: synthesis at write time
      • stacking only atomic memories causes fragmentation → recombination needed at retrieval
      • semantically related facts within the same session are merged into one immediately
        • e.g. user wants coffee + user prefers oat milk + user likes it hot → user prefers hot coffee with oat milk
      • intent: online, intra-session, proactive processing (not retrieval-time)
    • Stage 3. Intent-Aware Retrieval Planning: treat retrieval as planning rather than search
      • conventional retrieval always fetches top-k regardless of query difficulty
      • define the required retrieval depth based on query complexity
      • separate semantic/lexical/symbolic queries → query the Semantic index (dense) + Lexical index (BM25) + Symbolic index (time, entity) in parallel → set union + deduplication
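The write path (Stages 1-2) can be sketched as a minimal toy in Python. Everything here is illustrative: the real system performs gating, de-linearization, and synthesis with LLM passes over 20-turn windows, so `density_gate`, `delinearize`, and `synthesize` below are deterministic stand-ins for those calls, and the chitchat list, speaker name, and dates are made up.

```python
CHITCHAT = {"ok", "thanks", "sure", "got it"}  # toy low-entropy phrases

def density_gate(window):
    """Stand-in for LLM semantic density gating: keep a window only if
    at least one turn carries non-chitchat content."""
    return any(t.lower().strip(" !.") not in CHITCHAT for t in window)

def delinearize(window, speaker, session_date):
    """Stand-in for the single-pass de-linearization transformation:
    coreference resolution + temporal normalization + fact atomization."""
    units = []
    for turn in window:
        if turn.lower().strip(" !.") in CHITCHAT:
            continue  # atomization drops low-entropy turns
        fact = turn.replace("my", f"{speaker}'s")       # "coreference"
        fact = fact.replace("yesterday", session_date)  # "temporal norm."
        units.append(fact)
    return units

def synthesize(units):
    """Stand-in for online semantic synthesis: merge same-session facts
    that share a subject into one consolidated memory unit."""
    by_subject = {}
    for u in units:
        by_subject.setdefault(u.split()[0], []).append(u)
    return ["; ".join(facts) for facts in by_subject.values()]

turns = ["my kids visited yesterday", "thanks!", "my kids love pizza"]
if density_gate(turns):
    memory = synthesize(delinearize(turns, "Sarah", "2026-01-26"))
    # memory == ["Sarah's kids visited 2026-01-26; Sarah's kids love pizza"]
```

The point of the toy is the ordering: filtering, normalization, and merging all happen before anything is written, so each stored unit stands alone without its conversational context.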
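Stage 3 can likewise be sketched as a toy hybrid retriever. Hedged assumptions: the paper uses a dense LanceDB index, BM25, and SQL metadata; here the dense semantic index is omitted for brevity, and `lexical_search`, `symbolic_search`, and `plan_depth` are simplified stand-ins with made-up heuristics rather than the actual planner.

```python
def lexical_search(memory, query, k):
    """Toy BM25 stand-in: rank units by overlapping query terms."""
    q = set(query.lower().split())
    ranked = sorted(memory, key=lambda m: -len(q & set(m.lower().split())))
    return ranked[:k]

def symbolic_search(memory, entities, k):
    """Toy symbolic-index stand-in: exact time/entity metadata match."""
    return [m for m in memory if any(e in m for e in entities)][:k]

def plan_depth(query, low=3, high=20):
    """Toy planner: deeper retrieval for more complex-looking queries
    (the paper's depth is adaptive within [3, 20])."""
    complexity = query.count(" and ") + query.count(",")
    return min(high, low + 5 * complexity)

def retrieve(memory, query, entities=()):
    """Parallel-index lookup -> set union + order-preserving dedup.
    (A dense semantic index would contribute a third candidate list.)"""
    k = plan_depth(query)
    candidates = lexical_search(memory, query, k) + symbolic_search(memory, entities, k)
    seen, merged = set(), []
    for m in candidates:
        if m not in seen:
            seen.add(m)
            merged.append(m)
    return merged
```

The design choice being illustrated: retrieval scope is decided per query by the planner, and the three indexes answer in parallel before a set union, instead of a single dense index returning a fixed top-k.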

Effects

  • Experiment setup
    • Benchmarks
      • LoCoMo: evaluates long-term conversational reasoning
        • 200-400 turns; many topic shifts, temporal jumps, and interleaved topics
        • QA types: multi-hop reasoning, temporal reasoning, open-domain, single-hop
        • Metrics: F1, BLEU-1, Adversarial Success Rate, Token Cost
      • LongMemEval-S: extreme long-context memory stress test
        • unusually long interaction history → demands precise answer localization
        • Evaluation: LLM-as-a-judge (LAAJ) accuracy (gpt-4.1-mini makes the binary CORRECT/WRONG call)
    • Baselines: Full-context (LoCoMo), ReadAgent, MemoryBank, MemGPT, A-Mem, LightMem, Mem0
    • Backbones: GPT-4.1-mini, GPT-4o, Qwen-Plus, Qwen2.5 (1.5B, 3B), Qwen3 (1.7B, 8B)
    • Implementation details
      • sliding window size: 20 turns
      • semantic embeddings: Qwen3-embedding-0.6b (1024-d)
      • indexing: semantic = LanceDB, lexical = BM25, symbolic = SQL-based metadata (time, entity)
      • retrieval depth: adaptive within [3, 20], as judged by the planner
  • Results
    • RQ1: SimpleMem์ด ๊ธฐ์กด memory system๋ณด๋‹ค ๋‚˜์€๊ฐ€? โ†’ YES
      • LoCoMo: ๋ชจ๋“  backbone์—์„œ SimpleMem ์ตœ๊ณ  ํ‰๊ท  F1
        • ํŠนํžˆ temporal reasoning์—์„œ ํฐ ์ฐจ์ด
        • single-hop์—์„œ๋„ ์„ฑ๋Šฅ ์šฐ์ˆ˜ = abstraction์ด detail์„ ์žƒ์ง€ ์•Š์•˜์Œ์„ ์‹œ์‚ฌ
      • LongMemEval-S: ํŠน์ • ์œ ํ˜•์— ์น˜์šฐ์น˜์ง€ ์•Š๊ณ  ์•ˆ์ •์  ํšจ๊ณผ
        • multi-session category์—์„œ ๊ฐ€์žฅ ํฐ ๊ฒฉ์ฐจ
        • Full-context๋Š” ์‹คํŒจ, LightMem์€ ํŠน์ • sub-task์—์„œ๋งŒ ํšจ๊ณผ์ 
    • RQ2: ์„ฑ๋Šฅโ€“token cost trade-off ๊ฐœ์„ ? โ†’ YES
      • ํ† ํฐ ํšจ์œจ: Full-context/MemGPT ~16,900 tokens, Mem0 ~980 tokens, SimpleMem ~530 tokens
      • ๋ชจ๋ธ ํฌ๊ธฐ ๊ด€๋ จ: Qwen2.5-1.5B + SimpleMem > Qwen3-1.7B + Mem0
        • memory architecture๊ฐ€ model scale์„ ๋ณด์™„ํ•œ๋‹ค๊ณ  ์ฃผ์žฅ
    • RQ3: ablation
      • stage 1 ์ œ๊ฑฐ โ†’ temporal reasoning ์‹คํŒจ
      • stage 2 ์ œ๊ฑฐ โ†’ multi-hop reasoning ์‹คํŒจ
      • stage 3 ์ œ๊ฑฐ โ†’ open-domain/single-hop reasoning ์‹คํŒจ
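As a quick arithmetic check on RQ2: the per-query token figures are those reported above; the ratios below are derived from them.

```python
# Approximate per-query context cost on LoCoMo, as reported above
costs = {"Full-context / MemGPT": 16_900, "Mem0": 980, "SimpleMem": 530}

for name, tokens in costs.items():
    ratio = tokens / costs["SimpleMem"]
    print(f"{name}: ~{tokens:,} tokens ({ratio:.1f}x SimpleMem)")
# Full-context vs. SimpleMem: 16_900 / 530 ≈ 31.9x, which is where the
# "up to ~30x" token-efficiency figure in the TL;DR comes from.
```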

Personal note. If you set out to research memory seriously, that is, tackle it methodology-first on the existing setups, this is roughly the kind of paper that comes out. No matter how many times I read it I can't tell what is actually novel, yet papers like this keep appearing and keep drawing attention, which I find troubling. Stepping back, it also feels like a rerun of how I felt about the paper I wrote last January.

์ด ํŽ˜์ดํผ์—์„œ ๋น„ํŒ์ ์œผ๋กœ ์ƒ๊ฐํ•ด๋ณผ ๋ถ€๋ถ„์€ ๊ฒฐ๊ตญ ๊ฒ€์ƒ‰ ์„ฑ๋Šฅ์— ํฌ๊ฒŒ ์˜์กด๋˜๋Š” ๊ตฌ์กฐ์ธ ์ , ์ฆ‰ ๋ฉ”๋ชจ๋ฆฌ ๊ฒ€์ƒ‰์„ ์–ผ๋งˆ๋‚˜ ์ž˜ํ• ๋ž˜๋ฅผ ๊ณ ๋ฏผํ•œ ํ”์ ์ด ์—ญ๋ ฅํ•˜๊ณ , ์ด ์ ‘๊ทผ์ด ๋‚˜์˜๋‹ค๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ write-time์ด๋ผ๊ณ  ๊ฑฐ์ฐฝํ•˜๊ฒŒ ์–˜๊ธฐํ•˜๊ธด ํ–ˆ์ง€๋งŒ ๊ฒฐ๊ตญ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ ์žฌํ•  ์ ์— ์˜ˆ์˜๊ฒŒ ์Œ“๊ณ  ์ฐพ์„ ๋•Œ ์‹ ์ค‘ํ•˜๊ฒŒ (semantic + lexical + symbolic 3์ค‘ ๊ตฌ์กฐ) ์ฐพ๊ฒ ๋‹ค๋Š” ํ๋ฆ„์ด ์ง€๊ทนํžˆ ํ‰๊ฐ€ํ•˜๊ณ ์ž ํ•˜๋Š” ๋ฒค์น˜๋งˆํฌ์˜ 4๊ฐ€์ง€ ์งˆ์˜ ์œ ํ˜•์— ๋งค๋ชฐ๋˜์–ด ์žˆ๋‹ค๋Š” ๋А๋‚Œ์„ ๋ฐ›์•˜์Šต๋‹ˆ๋‹ค. ์ฆ‰ ์ฃผ์žฅ์ด ๊ฑฐ์ฐฝํ•œ ๋“ฏ ๋ณด์ด์ง€๋งŒ, semantic lossless compression์€ ์‚ฌ์‹ค์ƒ semantic task-lossless์— ๊ฐ€๊น๊ณ ์š”. task scope ์ž์ฒด์˜ ๋ฌธ์ œ์ธ๋ฐ ์ด๋ฅผ ๋ฌธ์ œ์‚ผ๊ณ ์ž ํ•˜๋ฉด ๋‹ค์‹œ ๋ฒค์น˜๋งˆํฌ๋ถ€ํ„ฐ ๊ตฌ์ถ•ํ•ด๋†”์•ผ๋œ๋‹ค๋Š” ํ๋ฆ„์— ์ง๋ฉดํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

Tying this back to earlier research threads, my hunch is that this style of compression will still miss the point for user preferences: information that stage 1 judged unimportant can later crystallize into a preference (== the same failure mode as typical summarization-style memory). As a useful takeaway, though, the flow amounts to "memory should at least be well organized so the agent can grab it and use it while reasoning".

์ „๋ฐ˜์ ์œผ๋กœ ์ €์ž๋“ค์ด ์ฃผ์š”ํ•˜๊ฒŒ ์ฃผ์žฅํ•˜๋Š” ์••๋„์ ์ธ ํ† ํฐ ํšจ์œจ์„ฑ์œผ๋กœ ๋ถ„๋ช… ์žƒ๋Š”๊ฒŒ ์žˆ์–ด๋ณด์ด๋Š”๋ฐ ํƒ€๊นƒํ•œ ๋ฒค์น˜๋งˆํฌ์—์„œ๋Š” ๊ทธ loss๊ฐ€ ์•ˆ์žกํžˆ๋Š” ๊ฑฐ๋กœ ๋ณด์—ฌ์„œ ๋ˆˆ๊ฐ€๋ฆฌ๊ณ  ์•„์›…ํ•œ ๋А๋‚Œ๐Ÿ’ญ