Tags — Yejin Yoon

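A study of alignment faking: during alignment training, an LLM pretends to follow the training objective, but because it does not want to lose the preferences it has held since pretraining (its own preferences), it only feigns alignment while it is being trained.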

Direct Multi-Turn Preference Optimization for Language Agents

October 29, 2024 1 minute read

Proposes Direct Multi-Turn Preference Optimization (DMPO), a loss function that directly optimizes RL objectives in multi-turn settings.

Planning Like Human: A Dual-process Framework for Dialogue Planning

August 28, 2024 1 minute read

Proposes a dual-process dialogue planning framework that uses an intuitive (fast) policy model for familiar situations and an analytical (slow) policy model for novel scenarios, with the two complementing each other.

Scaling Laws for Reward Model Overoptimization

April 15, 2024 less than 1 minute read

The more a policy model is trained against an RM, the wider the gap from real (human) preference becomes: overoptimization (inevitably) occurs, and increasing the RM size appears to have a meaningful effect on delaying(?) its onset.

code 5 posts

Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation

November 20, 2024 1 minute read

Proposes FUNCODER, a code generation framework that combines a divide-and-conquer strategy with functional consensus.

Text2SQL is Not Enough: Unifying AI and Databases with TAG

September 2, 2024 less than 1 minute read

Proposes Table-Augmented Generation (TAG), which unifies and generalizes the interaction between LMs and RDBs.

To Code, or Not To Code? Exploring Impact of Code in Pre-training

August 21, 2024 less than 1 minute read

Empirically tests whether seeing code during pretraining really helps.

Unsupervised Evaluation of Code LLMs with Round-Trip Correctness

February 20, 2024 less than 1 minute read

Evaluates the coding ability of LMs with a simple method called round-trip correctness (RTC).

LLM-Assisted Code Cleaning For Training Accurate Code Generators

December 1, 2023 less than 1 minute read

When training a code generation model, refactoring the training data (the code) for readability substantially improves model performance.

domain-adaptation 6 posts

RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework

August 8, 2024 less than 1 minute read

Proposes a framework that generates diverse documents and builds QA pairs to evaluate how well LLMs use knowledge across various scenarios.

Do Large Language Models Understand Multi-Intent Spoken Language?

March 8, 2024 less than 1 minute read

Proposes the LM-MixATIS and LM-MixSNIPS benchmarks and metrics for research on applying LLMs to SLU (Spoken Language Understanding).

Specialized Language Models with Cheap Inference from Limited Domain Data

February 19, 2024 less than 1 minute read

Four constraints are considered: 1) generic pretraining cost, 2) domain-specific pretraining cost, 3) inference cost, 4) size of the domain-specific training set. On the most efficient training under them, an empiric...

DocLLM: A layout-aware generative language model for multimodal document understanding

January 23, 2024 less than 1 minute read

Inspired by multimodal LLMs, has the LM take both text and position information (within a structured document) as input to address the understanding of documents with internal structure.

LLaMA Pro: Progressive LLaMA with Block Expansion

January 8, 2024 less than 1 minute read

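Proposes that block expansion, a post-pretraining approach that updates only the parameters of newly added blocks on domain data, is especially useful for domain-specific tasks. The forgetting seen when finetuning the whole model reportedly does not occur. Assuming the same data is used...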

BloombergGPT: A Large Language Model for Finance

April 5, 2023 less than 1 minute read

A combined pre-training approach for domain-specific and non-domain-specific corpus. It describes the dataset, model configuration, and training procedure fo...

ensemble 5 posts

Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models

August 18, 2025 2 minute read

Proposes COOPER, an RL framework that updates the policy and a reference-based RM (verifyRM) at the same time. To prevent reward hacking, uses rule-based positives and LLM-generated negatives for contras...

Configurable Foundation Models: Building LLMs from a Modular Perspective

September 9, 2024 2 minute read

Proposes viewing LLMs as functional modules, like the human brain (decomposed into bricks), and reports empirical results.

RouteLLM: Learning to Route LLMs with Preference Data

July 10, 2024 1 minute read

Proposes an LLM routing method for reducing cost.

Knowledge Fusion of Large Language Models

January 29, 2024 1 minute read

A method (pic1) for merging multiple existing LLMs (source LLMs), which have different architectures and were trained in different ways, into a stronger one: externalizes the knowledge of the LLMs and transfers their capabilities to a new LLM (target LLM) ...

Blending is All You Need

January 10, 2024 less than 1 minute read

Blending several small models can match or exceed the performance of a single large model.

factuality 5 posts

The FACTS Grounding Leaderboard: Benchmarking LLMs’ Ability to Ground Responses to Long-Form Input

December 18, 2024 2 minute read

Proposes a benchmark for evaluating the factuality of responses to long inputs. Handles inputs of up to 32K tokens and releases an automatic evaluation framework.

Real-time Fake News from Adversarial Feedback

October 21, 2024 1 minute read

A method for getting LLMs to generate better fake news. To detect fake news about events that occur after training, proposes an adversarial, iterative fake news generation pipeline.

Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability

January 25, 2024 less than 1 minute read

In standard LM training, teaching the model to generate a particular text does not necessarily raise the probability of the texts it implies. From a factuality standpoint, to also assign high probability to the related fact set (texts)...

DocLLM: A layout-aware generative language model for multimodal document understanding

January 23, 2024 less than 1 minute read

Inspired by multimodal LLMs, has the LM take both text and position information (within a structured document) as input to address the understanding of documents with internal structure.

Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers

January 18, 2024 less than 1 minute read

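Releases GRANOLA QA, an ODQA benchmark that evaluates model responses at multiple levels of granularity for accuracy and informativeness, and proposes DRAG, a decoding method for achieving that finer-grained informativeness.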

function-calling 6 posts

Adaptation of Agentic AI

December 22, 2025 2 minute read

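The term adaptation has been used inconsistently in agentic AI research. To enable systematic system-level design and comparison, proposes a taxonomy that distinguishes the target of adaptation (agent vs. tool) from the signal that drives it.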

Budget-Aware Tool-Use Enables Effective Agent Scaling

December 16, 2025 2 minute read

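Simply increasing the tool-call budget does not make agent performance scale (TTS); introducing a Budget Tracker and the BATS framework, which make the agent explicitly aware of its budget, substantially improves cost-performance scaling and the Pareto frontier.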

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

December 10, 2025 2 minute read

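Proposes a tool-based agent framework in which a small 8B orchestrator model, trained with RL, coordinates diverse tools and LLMs to jointly optimize accuracy, cost, latency, and user preference. Reports results that are cheaper and better than GPT-5.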

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

October 13, 2025 3 minute read

Proposes ACE, a prompt refinement framework that passes through generation > reflection > curation modules and applies only incremental delta updates.

DiaTool-DPO: Direct Preference Optimization for Controlling Conversation Flow in Tool-Augmented LLMs

September 29, 2025 1 minute read

Formulates tool-augmented dialogue as an MDP with 5 hidden states, automatically generates chosen-rejected trajectory pairs, and trains with a DPO-style objective. Greatly improves slot-filling and tool rejection.

Facilitating Multi-Turn Function Calling for LLMs via Compositional Instruction Tuning

September 22, 2025 2 minute read

Proposes BUTTON, a planning-based multi-turn function calling framework that connects tasks to functions.

hallucination 5 posts

MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs

September 10, 2025 2 minute read

Proposes a benchmark that evaluates four challenges in the multi-turn setup (Instruction Retention, Inference Memory, Reliable Versioned Editing, Self-Coherence); even the latest SOTA models that succeed on existing benchmarks, on the proposed...

Knowing When to Ask - Bridging Large Language Models and Data

September 20, 2024 1 minute read

Introduces DataGemma, which uses Data Commons (a knowledge graph) to improve the factuality and reliability of LLM responses and close the gap between LLMs and real-world data.

Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models

September 4, 2024 2 minute read

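Categorizes and analyzes the different kinds of noise that arise in RAG settings for LLMs. Finds that beneficial noise improves model performance. Presents the NoiserBench benchmark for evaluating how LLMs handle noise, along with ways to exploit beneficial noise and reduce harmful noise.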

Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability

August 19, 2024 1 minute read

Larger models and longer training do reduce hallucination, but bringing it down to a low level of 5% or less requires a far larger model and far more compute (than the commonly known scaling laws suggest).

Having Beer after Prayer? Measuring Cultural Bias in Large Language Models

August 16, 2024 2 minute read

Proposes CAMeL, a dataset of entities that contrast Arab and Western cultures together with naturally occurring prompts; case studies with it raise concerns that LLMs are biased toward Western-culture entities.

icl 7 posts

Adaptive Retrieval-Augmented Generation for Conversational Systems

August 14, 2024 1 minute read

Proposes a mechanism that selectively decides, at each turn of a given conversation, whether augmentation with external knowledge is needed.

Do Large Language Models Understand Multi-Intent Spoken Language?

March 8, 2024 less than 1 minute read

Proposes the LM-MixATIS and LM-MixSNIPS benchmarks and metrics for research on applying LLMs to SLU (Spoken Language Understanding).

Self-Discover: Large Language Models Self-Compose Reasoning Structures

March 5, 2024 less than 1 minute read

Proposes a framework in which the model itself selects from several reasoning techniques (CoT, critical thinking, ...) to compose a reasoning strategy suited to each task. Outperforms plain CoT on BBH, and compared with CoT Self-consistency also...

Orion-14B: Open-source Multilingual Large Language Models

February 6, 2024 less than 1 minute read

Releases a multilingual model trained mainly on East Asian languages, including Korean. The vocab size, though relative, is by no means small, and real-world performance is excellent.

The Power of Noise: Redefining Retrieval for RAG Systems

February 5, 2024 less than 1 minute read

Focuses on retrieval in RAG, evaluating factors such as the relevance between documents and the prompt and the position and number of documents in the prompt.

Corrective Retrieval Augmented Generation

January 30, 2024 less than 1 minute read

Uses a confidence score, web search, and knowledge refinement to self-correct wrongly retrieved or suboptimal results, reducing hallucination in model outputs.

Larger language models do in-context learning differently

March 9, 2023 1 minute read

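A sufficiently large LLM can set aside what it learned in pretraining and override it with newly given labels, even when those labels contradict pretraining. In addition, a sufficiently large LLM still performs well when the labels are replaced with semantically unrelated ones.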

interpretability 6 posts

Configurable Foundation Models: Building LLMs from a Modular Perspective

September 9, 2024 2 minute read

Proposes viewing LLMs as functional modules, like the human brain (decomposed into bricks), and reports empirical results.

Safety Layers of Aligned Large Language Models: The Key to LLM Security

September 3, 2024 1 minute read

Confirms that safety layers exist among the internal parameters of various aligned LLMs. Safety layers identify and refuse malicious user queries. Based on this, proposes SPPFT, a finetuning method that preserves safety.

The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models

August 20, 2024 1 minute read

Proposes the Correlational Counterfactual Test (CCT), which takes the LM's output probability distribution into account when measuring faithfulness through interventions on counterfactual inputs.

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

August 2, 2024 less than 1 minute read

Technical report accompanying the release of the Gemma Scope suite for LM (Gemma 2) interpretability.

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

July 23, 2024 1 minute read

Replaces the vanilla ReLU with JumpReLU, a discontinuous activation, to reach a new SOTA for SAEs (sparse autoencoders); despite the discontinuous activation, training works effectively with a straight-through estimator.
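As a rough illustration of the activation named in the excerpt above, the sketch below contrasts a plain ReLU with a JumpReLU that zeroes any pre-activation at or below a threshold. The function names, sample values, and threshold are assumptions chosen for illustration, and the straight-through estimator used for training is not shown.

```python
def relu(z: float) -> float:
    """Standard ReLU: passes positive values, zeroes the rest."""
    return z if z > 0.0 else 0.0


def jump_relu(z: float, theta: float) -> float:
    """JumpReLU: z * H(z - theta), where H is the Heaviside step.

    Values at or below the threshold theta are zeroed; values above it
    pass through unchanged, so the output jumps from 0 straight to z.
    """
    return z if z > theta else 0.0


# Hypothetical pre-activations and threshold, for illustration only.
pre_acts = [-0.5, 0.2, 0.8, 1.5]
theta = 0.5
relu_out = [relu(z) for z in pre_acts]
jump_out = [jump_relu(z, theta) for z in pre_acts]
```

With these values, the small positive pre-activation 0.2 survives the ReLU but is suppressed by JumpReLU, which is the sparsity-inducing behavior the discontinuity provides.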

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

June 4, 2024 1 minute read

Trains a sparse autoencoder (SAE) on the residual stream from a middle layer of Claude 3 Sonnet; using the SAE and its feature vectors, features at an interpretable level can be identified.

knowledge-conflicts 5 posts

When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs

January 26, 2026 4 minute read

Shows at the representation level that personalization is not merely a user-aligned bias but becomes entangled with factual representations and so produces systematic hallucinations, and at inference time...

The FACTS Grounding Leaderboard: Benchmarking LLMs’ Ability to Ground Responses to Long-Form Input

December 18, 2024 2 minute read

Proposes a benchmark for evaluating the factuality of responses to long inputs. Handles inputs of up to 32K tokens and releases an automatic evaluation framework.

Real-time Fake News from Adversarial Feedback

October 21, 2024 1 minute read

A method for getting LLMs to generate better fake news. To detect fake news about events that occur after training, proposes an adversarial, iterative fake news generation pipeline.

Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models

September 4, 2024 2 minute read

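Categorizes and analyzes the different kinds of noise that arise in RAG settings for LLMs. Finds that beneficial noise improves model performance. Presents the NoiserBench benchmark for evaluating how LLMs handle noise, along with ways to exploit beneficial noise and reduce harmful noise.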

Having Beer after Prayer? Measuring Cultural Bias in Large Language Models

August 16, 2024 2 minute read

Proposes CAMeL, a dataset of entities that contrast Arab and Western cultures together with naturally occurring prompts; case studies with it raise concerns that LLMs are biased toward Western-culture entities.

llm-as-a-judge 5 posts

Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models

August 18, 2025 2 minute read

Proposes COOPER, an RL framework that updates the policy and a reference-based RM (verifyRM) at the same time. To prevent reward hacking, uses rule-based positives and LLM-generated negatives for contras...

RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback

July 28, 2025 1 minute read

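Proposes RefCritic, which introduces a dual-reward, RL-trained critic model that evaluates both the correctness of a solution and whether its feedback contributes to refinement; large gains on mathematical reasoning tasks.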

Scaling Laws of Synthetic Data for Language Models

March 26, 2025 2 minute read

Argues that synthetic data generated with SYNTHLLM scales predictably and effectively for LLM finetuning and, under a modified scaling law, offers a scalable solution to the shortage of natural data.

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

February 5, 2025 1 minute read

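Proposes a Thinking-LLM-as-a-Judge framework with a self-training loop that, without being given evaluation criteria in advance, carries out evaluation planning, execution, and judgment as separate steps on its own; achieves SOTA performance even with little data.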

LLM Evaluators Recognize and Favor Their Own Generations

December 17, 2024 less than 1 minute read

An in-depth discussion of the existing claim that LLMs prefer their own outputs (conclusion: they do).

odqa 5 posts

GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning

September 3, 2025 2 minute read

Trains a GraphRAG agent by applying two constrained rewards (RPA + CAF) to RL (GRPO); uses a hybrid of triplets and natural language as the retrieval input and reports large gains on multi-hop QA.

SSRL: Self-Search Reinforcement Learning

August 28, 2025 1 minute read

Builds a self-search model by running RL on fully simulated search, with no external tools such as search engines or other LLMs, that can transfer to the real world.

ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

July 22, 2024 less than 1 minute read

Presents a method that builds on the previously released model (ChatQA 1.5) by extending the context length of LLaMA3-70B while improving instruction following and RAG capability.

Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers

January 18, 2024 less than 1 minute read

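Releases GRANOLA QA, an ODQA benchmark that evaluates model responses at multiple levels of granularity for accuracy and informativeness, and proposes DRAG, a decoding method for achieving that finer-grained informativeness.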

Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets

January 13, 2023 less than 1 minute read

A paper that takes a critical view of NQ, a benchmark commonly used in ODQA. Existing benchmarks appear to test the ability to memorize content seen during training.

optimization 6 posts

Direct Multi-Turn Preference Optimization for Language Agents

October 29, 2024 1 minute read

Proposes Direct Multi-Turn Preference Optimization (DMPO), a loss function that directly optimizes RL objectives in multi-turn settings.

Planning Like Human: A Dual-process Framework for Dialogue Planning

August 28, 2024 1 minute read

Proposes a dual-process dialogue planning framework that uses an intuitive (fast) policy model for familiar situations and an analytical (slow) policy model for novel scenarios, with the two complementing each other.

The boundary of neural network trainability is fractal

February 13, 2024 less than 1 minute read

Fractal patterns, which are complex repeating patterns, appear in the settings (hyperparameters) that control the AI training process.

MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining

January 4, 2024 less than 1 minute read

Introduces the architecture and training techniques of a BERT-style encoder designed for fast pretraining.

UltraFastBERT: Exponentially Faster Language Modelling

December 11, 2023 less than 1 minute read

Replaces the FFNN with an FFF (Fast FeedForward) for a 78x speedup.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

December 6, 2023 less than 1 minute read

Inference 5x faster than a similarly sized Transformer.

peft 5 posts

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

June 30, 2025 2 minute read

Proposes DnD, a model trained with SFT to take a prompt as input and produce LoRA-tuned parameters as output. Once DnD has been trained, it can generate task-specific LoRA weights for each task without further training.

Differential Transformer

October 10, 2024 1 minute read

Proposes a Transformer variant that splits Q and K into two groups each and takes the difference between the two resulting softmax attention maps, which amplifies attention to relevant context and cancels noise; mitigates hallucination.
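The subtraction of two attention maps described above can be sketched for a single query position as follows. This is a minimal illustration, not the paper's implementation: the helper names and toy vectors are hypothetical, the weighting factor `lam` is fixed here although it would normally be learned, and the multiplication by the value vectors and any normalization afterwards are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def diff_attention_weights(q1, q2, keys1, keys2, lam):
    """Difference of two softmax attention maps for one query position.

    q1/keys1 and q2/keys2 are the two query/key groups; lam scales the
    second map before it is subtracted from the first.
    """
    scale = math.sqrt(len(q1))
    map1 = softmax([dot(q1, k) / scale for k in keys1])
    map2 = softmax([dot(q2, k) / scale for k in keys2])
    return [a - lam * b for a, b in zip(map1, map2)]


# Toy inputs (assumed values): group 1 favors the first key,
# group 2 spreads its attention evenly over all keys.
q1, q2 = [1.0, 0.0], [0.0, 1.0]
keys1 = [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
keys2 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
lam = 0.5
weights = diff_attention_weights(q1, q2, keys1, keys2, lam)
```

In this toy case the evenly spread second map acts as a noise floor: subtracting it leaves most of the weight on the first key and pushes the other two close to zero.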

Adaptive Retrieval-Augmented Generation for Conversational Systems

August 14, 2024 1 minute read

Proposes a mechanism that selectively decides, at each turn of a given conversation, whether augmentation with external knowledge is needed.

Generative Representational Instruction Tuning

February 26, 2024 less than 1 minute read

Proposes Generative Representational Instruction Tuning, which unifies text embedding and generation. A single model, GritLM, achieves SoTA on both embedding (MTEB) and generation tasks (BBH...).

Specialized Language Models with Cheap Inference from Limited Domain Data

February 19, 2024 less than 1 minute read

Four constraints are considered: 1) generic pretraining cost, 2) domain-specific pretraining cost, 3) inference cost, 4) size of the domain-specific training set. On the most efficient training under them, an empiric...

petl 6 posts

Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation

November 20, 2024 1 minute read

Proposes FUNCODER, a code generation framework that combines a divide-and-conquer strategy with functional consensus.

Selective Attention Improves Transformer

October 8, 2024 1 minute read

A parameter-free change to the attention operation that lets a generated token decide that another token is no longer needed; at future steps, attention to the tokens judged unnecessary is reduced, so that memory usage and compute cost are effectively...

Configurable Foundation Models: Building LLMs from a Modular Perspective

September 9, 2024 2 minute read

Proposes viewing LLMs as functional modules, like the human brain (decomposed into bricks), and reports empirical results.

Zero-Shot Cross-Domain Dialogue State Tracking via Dual Low-Rank Adaptation

August 2, 2024 less than 1 minute read

Proposes DualLoRA, which mitigates the problem in multi-layer Transformer-based models of the prompt being forgotten toward the later layers.

Specialized Language Models with Cheap Inference from Limited Domain Data

February 19, 2024 less than 1 minute read

Four constraints are considered: 1) generic pretraining cost, 2) domain-specific pretraining cost, 3) inference cost, 4) size of the domain-specific training set. On the most efficient training under them, an empiric...

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

January 29, 2024 less than 1 minute read

Proposes a new post-training sparsification method that slices weight matrices into smaller, denser matrices. Keeps the performance drop within 1% to 10% while removing up to 25% of the parameters (including embeddings).

representation-learning 5 posts