Meta info.

TL;DR

Proposes an LLM routing method for reducing cost.

Problem Statement

How to cut cost while still maintaining answer quality

  • (1) The performance guaranteed by sending queries to a high-cost LM
  • (2) The cost advantage gained by giving up some performance and sending queries to a small LM
  • Finding the balance in the trade-off between (1) and (2)

Suggestions

Proposes a router model that uses (1) human preference data and (2) data augmentation (LLM-judge-labeled datasets)

  • (1) Data from the Chatbot Arena platform
  • (2) To augment (1), strong-model vs. weak-model preference data is built by checking the answers against the labels of gold datasets
  • Binary-class routing between a strong model (GPT-4) and a weak model (Mixtral-8x7B)
    • win prediction model: trained on (1) and (2); compares the two models and selects the preferred class (pic3)
      • backbone: text-embedding-3-small
        1. matrix factorization router: represents each model in a low-dimensional space and learns a score function between model and query
        2. similarity weighted ranking router: uses the Bradley-Terry model; computes the similarity between the incoming query and the queries in the training dataset, then uses it to weight the importance of each training example (past preference)
  • A cost threshold in [0, 1] is set to tune how far the trade-off leans toward quality or toward cost
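
To make the similarity weighted ranking router and the threshold more concrete, here is a minimal sketch, not the paper's implementation. It assumes query embeddings are already available as NumPy vectors (toy 2-D vectors stand in for text-embedding-3-small outputs). The names `similarity_weights`, `strong_win_probability`, `route`, and the parameters `gamma` and `prior` are hypothetical. With only two models, the weighted Bradley-Terry maximum-likelihood estimate reduces to a weighted win rate, which is what the sketch computes.

```python
import numpy as np


def similarity_weights(query_emb, train_embs, gamma=10.0):
    """Weight each past comparison by cosine similarity to the new query.

    gamma is a hypothetical sharpness parameter: larger values concentrate
    the weight on the most similar training queries.
    """
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = t @ q
    # Subtract the max so the most similar example always gets weight 1.
    return np.exp(gamma * (sims - sims.max()))


def strong_win_probability(query_emb, train_embs, strong_won, gamma=10.0, prior=1.0):
    """Estimate P(strong model wins) for the query.

    strong_won[i] is 1.0 if the strong model was preferred on training
    query i, else 0.0. For two models, the weighted Bradley-Terry MLE is
    the weighted win rate; prior is an assumed pseudo-count for smoothing.
    """
    w = similarity_weights(query_emb, train_embs, gamma)
    strong = np.sum(w * strong_won) + prior
    weak = np.sum(w * (1.0 - strong_won)) + prior
    return strong / (strong + weak)


def route(p_strong, threshold):
    """Send the query to the strong model only if its predicted win
    probability reaches the threshold in [0, 1]."""
    return "strong" if p_strong >= threshold else "weak"


# Toy data: the strong model was preferred on queries near [1, 0],
# the weak model was good enough on queries near [0, 1].
train_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
strong_won = np.array([1.0, 1.0, 0.0, 0.0])

for query in (np.array([1.0, 0.05]), np.array([0.05, 1.0])):
    p = strong_win_probability(query, train_embs, strong_won)
    print(query, round(float(p), 3), route(p, threshold=0.5))
```

In this sketch, raising the threshold sends fewer queries to the strong model (cheaper, lower quality), while a threshold of 0 sends everything to it.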

Effects

  • Retains 95% of GPT-4's performance
  • Cost reductions of more than 85% on MT Bench and 45% on MMLU
  • Compared with commercial offerings such as Martian and Unify AI, delivers similar performance at more than 40% lower cost