Meta info.

TL;DR

Proposes a 2-stage instruction tuning method that can substantially improve the zero-shot conversational QA performance of LLMs.

Suggestions

  • stage 1: SFT on multi-turn dialogue data
  • stage 2: instruction tuning on QA benchmark data in which the context is provided alongside the question
  • retrieval for multi-turn QA: when the conversation grows long, the latest utterance and the dialogue history are reportedly encoded together to retrieve the relevant parts of the conversation. (pic2)
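
The retrieval step in the last bullet can be illustrated with a minimal sketch. It is not the paper's retriever: `embed` is a toy bag-of-words stand-in for a trained dense encoder, and the names `build_query`, `retrieve`, `max_turns`, and `top_k` are hypothetical, introduced only for this example. The one idea it takes from the summary is that the query is built from the dialogue history plus the latest utterance, not from the latest utterance alone.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder (assumption): bag-of-words counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_query(history, last_utterance, max_turns=3):
    """Join the most recent history turns with the latest utterance."""
    return " ".join(history[-max_turns:] + [last_utterance])


def retrieve(history, last_utterance, chunks, top_k=1):
    """Return the top_k chunks most similar to the history-aware query."""
    query_vec = embed(build_query(history, last_utterance))
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]
```

A latest turn such as "Does it cover the battery?" is ambiguous on its own. Prepending earlier turns that mention a warranty gives the query enough signal to match a warranty-related chunk, which is the motivation for encoding the history together with the last utterance.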