1 minute read

Meta info.

TL; DR

ํ˜‘์—…์ /๊ฒฝ์Ÿ์  ์ƒํ™ฉ์—์„œ ์—์ด์ „ํŠธ๋ผ๋ฆฌ ์ƒํ˜ธ์ž‘์šฉํ•˜๋Š” ์‹œ์Šคํ…œ ํ‰๊ฐ€์— ๋Œ€ํ•œ ๋ฒค์น˜๋งˆํฌย MARBLEย ์ œ์•ˆ

image.png

image.png

image.png

image.png

image.png

image.png

image.png

image.png

Background

AgentBench, ToolBench, HumanEval ๋“ฑ, ๋‹จ์ผ ์—์ด์ „ํŠธ ํ˜น์€ ํŠน์ • ๋„๋ฉ”์ธ ํ™˜๊ฒฝ์— ๊ตญํ•œ๋œ ์—์ด์ „ํŠธ ํ™˜๊ฒฝ์— ๋Œ€ํ•œ ๋ฒค์น˜๋งˆํฌ

Problem States

multi-agent collaboration ์–ด๋–ป๊ฒŒ ํ‰๊ฐ€ํ•  ์ˆ˜ ์žˆ์„๊นŒ?

  • RQ1 multi-agent๋Š” ์–ด๋–ป๊ฒŒ coordinate ์ž˜ํ• ๊นŒ
  • RQ2 multi-agent๊ฐ€ ์–ด๋–ป๊ฒŒ ํ˜‘๋ ฅ ํ˜น์€ ๊ฒฝ์Ÿ ์‚ฌ์ด ๊ท ํ˜•์„ ๋งž์ถœ๊นŒ
  • RQ3 ์–ด๋–ค coordination ๊ตฌ์กฐ ํ˜น์€ planning ์ „๋žต์ด ์ตœ์ ์ผ๊นŒ

Suggestion

Multi-agent cooRdination Backbone with LLM Engine(MARBLE)

  • ๊ตฌ์„ฑ:
    • Coordination Engine: agent๊ฐ„ ๊ด€๊ณ„ ๋ฐ ์†Œํ†ต ๋‹ด๋‹น
    • Agent Graph Module: agent interactions ์ •์˜ (e.g., collaboration, supervision, negotiation)
    • Cognitive Module: Theory-of-Mind ์™€ adaptive learning์œผ๋กœ ์ „๋žต์„ ๋™์ ์œผ๋กœ ๊ฐœ์„ 
  • Coordination ๋ฐฉ์‹ ์ข…๋ฅ˜:
    • Star: single planner๊ฐ€ actor๋“ค์—๊ฒŒ ์ž‘์—… ํ• ๋‹น (planner ๋ณ‘๋ชฉ ์šฐ๋ ค)
    • Tree: planner๊ฐ€ ๊ณ„์ธต์ ์œผ๋กœ ์—ฌ๋Ÿฟ (ํ•˜์œ„ actor์˜ ์ •๋ณด ์†์‹ค ์šฐ๋ ค)ย worst
    • Graph: ๋ชจ๋‘๊ฐ€ ์—ฐ๊ฒฐ๋œ ์™„์ „ ๋ถ„์‚ฐํ˜• decision-making (๋ณต์žก์„ฑ ์šฐ๋ ค)ย best
    • Chain: Sequential interaction (์ˆœ์ฐจ์„ฑ์—๋Œ€ํ•œ ๋ณ‘๋ชฉ ์šฐ๋ ค)
  • Planning Strategies ์ข…๋ฅ˜: vanilla prompting, CoT, Group Discussion(agent๋ผ๋ฆฌ ์ „๋žต์„ ๊ฐœ์„ ์‹œํ‚ด), Cognitive evolving planning(agent๋“ค์ด ์ด์ „ ์‹คํ–‰ ๊ฒฐ๊ณผ ๋ณด๊ณ  ์ „๋žต ๊ฐœ์„ best)
  • Benchmark ๊ตฌ์„ฑ: Milestone-Based Evaluation
    • collaboration: research collaboration, Minecraft Building, Database Debugging, Coding Collaboration
    • Competition: Bargaining, Werewolf

Effects

  • gpt-4o-mini๊ฐ€ ๊ฐ€์žฅ ์šฐ์ˆ˜
  • graph protocol ๋ฐฉ์‹์ด ๊ฐ€์žฅ ์šฐ์ˆ˜
  • cognitive evolving planning ๋ฐฉ์‹์ด ๊ฐ€์žฅ ์šฐ์ˆ˜
  • ์—์ด์ „ํŠธ๋Š” ๋งŽ์„์ˆ˜๋ก ์„ฑ๋Šฅ ํ•˜๋ฝ (5๋ช… ์ดˆ๊ณผ์‹œ coordination overhead)
  • iteration์€ 7ํšŒ๊นŒ์ง€๋Š” ์„ฑ๋Šฅ ํ–ฅ์ƒ๋˜๋‚˜ 10ํšŒ ๋ฐ˜๋ณต ์ดํ›„๋Š” ๋˜๋ ค์ €ํ•˜๋จ
  • werewolf ๊ฒŒ์ž„์—์„œ๋Š” emergent deception ์ „๋žต ๋ฐœ์ƒ: ์ „๋žต์ ์œผ๋กœ ์นจ๋ฌตํ•˜๊ฑฐ๋‚˜ ์‹ ๋ขฐ๊ทนํ™”๋œ ํ˜‘์—… ๋ฐœ์ƒ
  • gpt-4o-mini๋‚˜ gpt-3.5-turbo ์ •๋„๋ฉด Function calling ๊ฑฐ์˜ ์™„์ „ํžˆ ์ˆ˜ํ–‰ํ•˜์ง€๋งŒ, llama 3.1-8b๋‚˜ 3.3-70b๋Š” 70% ์ •๋„๋งŒ ์„ฑ๊ณต

Personal note. ๋งˆ์ง€๋ง‰ Finidngs์˜ ablation์—์„œ ํŠน์ดํ•œ๊ฑด llama-3.1 70b๋Š” 50%๋„ ๋ชปํ–ˆ๋‹ค๊ณ ..ย ๐Ÿค” ์˜ค๋Š˜ ์‹ ์ž…์ƒ ์Šคํ„ฐ๋””์—์„œ ๋ฐœํ‘œํ•ด์ค€ multi-agent ๋‚˜ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ๊ด€๋ จํ•œ ๋‚ด์šฉ์ด๊ธฐ๋„ ํ•˜๊ณ , ์—ฐ๊ตฌ ์ œ์•ˆ ๋„ฃ์—ˆ๋˜ ๊ฑฐ ๋ ์ง„ ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ ์˜๋„ํ–ˆ๋˜ ๋А๋‚Œ์˜ ๋ฒค์น˜๋งˆํฌ์ธ ๊ฒƒ ๊ฐ™์•„์„œ ๊ณต์œ ๋“œ๋ฆฝ๋‹ˆ๋‹ค.

kpi metric์€ ์ „์— ์œ ๋ฆฌ๊ฐ€ ๋žฉ์„ธ๋ฏธ๋‚˜์—์„œ ์†Œ๊ฐœํ•ด์คฌ๋˜ ๊ฒƒ ๊ฐ™๊ธด ํ•œ๋ฐ ๊ฒฐ๊ตญ ๋งˆ์ผ์Šคํ†ค์„ ์‚ฌ๋žŒ์ด ์ •ํ•˜๋“  ์‹คํ–‰๊ฒฐ๊ณผ๋ณด๊ณ  ๋™์ ์œผ๋กœ ์ •ํ•˜๋“  ๊ทธ๊ฑฐ ๊ธฐ๋ฐ˜์œผ๋กœ ์„ธ๋Š” ๊ฒƒ์ด ์ผ๋ฐ˜์ ์ธ๋“ฏ ํ•ฉ๋‹ˆ๋‹ค (๋‹ค๋ฅธ ๋ฐฉ๋ฒ•์ด ์—†๊ธฐ๋„ ํ•˜๊ณ  agent ํ–‰๋™ ์ถ”์  ๋ชฉ์ ์ธ๋“ฏ)