Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Meta info.
- Authors: Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini
- Paper: https://openreview.net/pdf?id=0xUEBQV54B
- Affiliations: Google DeepMind, Stanford University, University of Oxford
- Published: July 31, 2024
TL;DR
Repeated sampling delivers large gains in coverage for LLM performance, and when automatic verification is possible it also substantially improves accuracy.

Suggestion
Given a sufficient compute budget, search for the answer across as many samples as possible (repeated sampling):
- Generate multiple candidate answers for a single problem (sample generation)
- Select the most appropriate answer among them (verification)
- e.g., unit tests, execution, majority vote, reward model, …
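The generate-then-verify loop above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the generator and verifier here are toy stand-ins (a real pipeline would plug in an LLM for generation and, e.g., unit-test execution for verification).

```python
import random

def generate_samples(problem, n):
    # Stand-in for n independent LLM completions of the same prompt.
    # Toy "model": emits the right answer (4) only some of the time.
    random.seed(0)
    return [4 if random.random() < 0.3 else random.randint(0, 9) for _ in range(n)]

def verify(problem, answer):
    # Stand-in verifier (unit tests / execution / checker); toy ground truth here.
    return answer == 4

def repeated_sampling(problem, budget):
    # Draw `budget` samples and return the first one that passes verification.
    for answer in generate_samples(problem, budget):
        if verify(problem, answer):
            return answer
    return None

print(repeated_sampling("What is 2 + 2?", budget=100))
```

Even though a single sample is usually wrong here, one verified answer among 100 samples is found with near certainty, which is the paper's core point in miniature.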
Effects
Accurate and efficient verification must accompany the sampling.
- Experimental Setup:
- Tasks: math reasoning (GSM8K, MATH), formal proofs (MiniF2F-MATH), coding (CodeContests), real-world GitHub issue resolution (SWE-bench Lite)
- Target models: Llama, Gemma, and Pythia models ranging from 70M to 70B parameters
- Results:
- Under a fixed budget, sampling many times from a smaller model and filtering the answers can beat a single attempt from a larger model.
- A power-law-like relationship exists between coverage and the number of samples (Figure 5).
- When automatic verification is possible (in the paper, the tasks other than math reasoning):
- Increasing the number of samples solves problems that a single attempt could not.
- Repeated sampling exceeds the single-attempt performance of larger, stronger models.
- That is, solve rate grows with the number of samples.
- e.g., DeepSeek-Coder-V2-Instruct: 15.9% with a single attempt → 56% with 250 samples (+40.1%p)
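Coverage here is the chance that at least one of k samples is correct. It can be estimated without bias from n total samples (of which c are correct) using the standard combinatorial estimator popularized for pass@k by Chen et al. (2021); the function name below is mine:

```python
from math import comb

def coverage_at_k(n, c, k):
    """Unbiased pass@k-style coverage estimate: the probability that at
    least one of k samples, drawn without replacement from n samples of
    which c are correct, solves the problem.
    Formula: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few incorrect samples: some draw must be correct
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g., 10 correct answers out of 200 samples: a single attempt rarely
# succeeds, but 100 attempts almost always contain a correct answer.
print(round(coverage_at_k(200, 10, 1), 3))    # → 0.05
print(round(coverage_at_k(200, 10, 100), 3))  # → 0.999
```

This captures the paper's qualitative finding: per-sample accuracy can be low while coverage at large k approaches 1, which is exactly where repeated sampling pays off.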
- For math reasoning, where no automatic verification tool is available:
- Coverage still increases as the number of samples grows,
- but on the accuracy side, selecting answers with majority vote or a reward model seems to hit a performance ceiling below coverage.
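Without an automatic verifier, selection falls back to rules like majority vote. A minimal sketch (names are mine) that also illustrates why such rules can cap accuracy below coverage:

```python
from collections import Counter

def majority_vote(answers):
    # Pick the most frequent final answer among the samples; a common
    # verifier-free selection rule for math reasoning (self-consistency).
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# The plateau intuition: even when the correct answer appears among the
# samples (so coverage counts the problem as solved), the vote can still
# pick a more frequent wrong answer.
samples = ["12", "8", "8", "12", "8", "7"]
print(majority_vote(samples))  # → "8", even if "12" were the correct answer
```

So beyond some sample count, extra samples keep raising coverage but no longer change which answer wins the vote, matching the plateau the paper observes.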
Personal note. The observation that the real problem is how to pick the correct answer out of the repeated samples is fairly obvious, but the paper gains genuine persuasive force by showing experimentally that, as long as answers can be selected reliably, scaling the number of samples with a small model can beat using a larger one, and then going as far as formalizing the relationship.