ICML 2026 Papers of Interest

July 6, 2026 4 minute read

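A roundup of ICML 2026 papers worth reading from the angle of my hyper-personalization research notes. It pairs with the post on workshops of interest. Everything below is confirmed as accepted at ICML 2026 (candidates from a different year or venue are excluded). I checked each one against the poster/oral pages of the official ICML 2026 virtual site and against OpenReview.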

Must-reads first

Three papers that map directly onto the notes.

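Modeling User Preferences as Distribution in Federated Recommendation (forum). Represents the user as a distribution rather than a point estimate. This is the notes' to-do item "go beyond a point estimate for $z_u$" turned into a paper.
A Judge-Aware Ranking Framework for Evaluating LLMs without Ground Truth (poster). Adds judge reliability on top of Bradley-Terry to rank models without ground truth. Squarely on target for the evaluation section.
Grounded in Reality: Learning and Deploying Proactive LLM from Offline Logs (poster). When to ask the right question and when to stop, plus training from offline logs (so it also covers the data problem).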
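
As background for the Judge-Aware Ranking entry, here is the standard Bradley-Terry model, followed by one hypothetical way a per-judge reliability term could enter. The second formula is an assumption made purely for illustration and is not taken from the paper; the symbols $\theta_i$ (latent strength of model $i$), $a_k$ (reliability of judge $k$), and $\sigma$ (the logistic function) are my own notation.

```latex
% Standard Bradley-Terry: probability that model i is preferred over model j
P(i \succ j) = \frac{e^{\theta_i}}{e^{\theta_i} + e^{\theta_j}} = \sigma(\theta_i - \theta_j)

% Hypothetical judge-aware variant (illustrative assumption, not the paper's formulation):
% judge k scales the strength gap by a reliability a_k
P(i \succ j \mid k) = \sigma\bigl(a_k (\theta_i - \theta_j)\bigr), \qquad a_k \ge 0
```

Under this sketch, a judge with $a_k = 0$ answers like a coin flip regardless of the models, while a larger $a_k$ makes the judge's verdicts track the strength gap more sharply.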

1. Representation: what to build the user model from, and how

Rethinking Personalization in LLMs at the Token Level (PerCE) (forum). Causally identifies the tokens that carry preference and encodes them.
TOM-SWE: User Mental Modeling for Software Engineering Agents (poster). A theory-of-mind user model kept separate from the main agent.
From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG (forum). Turns preferences into compact, stable personal context.
Expectation Alignment of Language Models for Real-World User Expectations (forum). Models what users actually expect.

2. Online updates from observation, continual learning, memory (condition 1)

T-POP: Test-Time Personalization with Online Preference Feedback (poster). Updates a frozen LLM on the fly from online preference feedback.
Cold-Start Personalization via Bayesian Adaptive Questioning (forum). Bootstraps the user model from zero history via adaptive questioning.
Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks (poster). Evaluates whether memory carries across sessions (the evaluation counterpart of condition 1).
AdaMEM: Test-Time Adaptive Memory for Language Agents (poster), SimpleMem: Efficient Lifelong Memory for LLM Agents (poster). Test-time adaptation and lifelong memory.
Position: Modular Memory is the Key to Continual Learning Agents (poster). Argues for modular memory to support continual operation and personalization.

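3. Evaluation: problems without ground truth

SCOPE: Selective Conformal Optimized Pairwise LLM Judging (poster). Finite-sample statistical guarantees for biased LLM judges.
RubricRobustness (poster). Exposes robustness flaws in rubric-based and LLM-judge benchmarks.
Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation (poster). Uses meta-evaluation to train on and evaluate tasks that have no ground truth.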

4. Proactive intervention: when to step in

ProactiveLLM: Learning Active Interaction for Streaming Large Language Models (poster). Learns when to speak first instead of waiting.
Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction (poster). RL for multi-turn proactive interaction.
Are Large Reasoning Models Interruptible? (poster). Behavior under real-time interruption and changing context.
Teaching Agents to Ask Effective Clarification Questions (poster). When to ask back if an instruction is ambiguous.
Learning When to Act or Refuse (poster). The decision between acting and holding back.

5. Personalization evaluation benchmarks (reference for evaluation and data)

BESPOKE: Benchmark for Search-Augmented LLM Personalization via Diagnostic Feedback (poster).
Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History (forum).
MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation (forum).
AppWorld-UL: Benchmarking Diverse Agent-User Interactions for Tool-Use (poster). Simulates users with an LLM.

6. Preferences and alignment (per-user, heterogeneous, proxy reward)

VALUEFLOW: Toward Pluralistic and Steerable Value-based Alignment (oral). Steers models toward diverse and conflicting values. Ties directly to the Pluralistic Alignment workshop on the 11th.
Position: Measuring Human Preferences in RLHF is a Social Science Problem (Spotlight) (poster). Questions the validity of preference measurement itself. Connects with the notes' "skepticism about attribute guessing".
Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling (oral). Squarely addresses proxy-reward over-optimization (mentioned in the notes).
Distortion of AI Alignment Revisited: RLHF is a Decent Utilitarian Aligner (poster), Regularization in the Axiomatic Approach to Learning from Human Preferences (poster). Aggregation theory for heterogeneous user preferences.
Decoding Safety Feedback from Diverse Raters (poster). Models heterogeneity across rater groups.

7. Recsys, off-policy evaluation

Off-Policy Learning in Large Action Spaces: Optimization Matters More Than Estimation (poster).
ProRL: Effective RL for Proactive Recommendation via Rectified Policy Gradient Estimation (poster).
Learning to Rank from Incomplete Rankings (poster). Top-k ranking from sparse feedback.
Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random (poster). Handling missingness in implicit feedback.

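Note: a thin axis

User simulators (the third of the three evaluation axes) are thinly covered at ICML 2026. The closest paper is Position: Time to Close The Validation Gap in LLM Social Simulations (poster); the methodology in that line of work (UserLM, validating simulation realism, and so on) is concentrated on arXiv and at NeurIPS 2025 and ACL 2026.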

Schedule (Seoul time, KST)

All posters are in Hall A (booth numbers in parentheses); orals are in Hall D2/B2. The ICML site lists times in PDT, so on site use the KST times below.
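
For cross-checking a time on the site against this list, here is a minimal sketch of the PDT to KST conversion. It uses fixed offsets (PDT is UTC-7, KST is UTC+9, so KST runs 16 hours ahead). The helper name `pdt_to_kst` and the sample listing are hypothetical, chosen only to show the arithmetic; they are not copied from the ICML site.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: PDT is UTC-7 and KST is UTC+9 (Korea has no daylight saving time).
PDT = timezone(timedelta(hours=-7), "PDT")
KST = timezone(timedelta(hours=9), "KST")


def pdt_to_kst(year, month, day, hour, minute):
    """Convert a wall-clock time listed in PDT to the same instant in KST."""
    listed = datetime(year, month, day, hour, minute, tzinfo=PDT)
    return listed.astimezone(KST)


# Hypothetical listing: a slot shown as July 6, 18:30 PDT.
slot = pdt_to_kst(2026, 7, 6, 18, 30)
print(slot.strftime("%m/%d %H:%M %Z"))  # prints "07/07 10:30 KST"
```

Note that an evening PDT listing lands on the next calendar day in Seoul, so check the date as well as the hour.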

7/7 (Tue)

10:30 T-POP (#1705), AdaMEM (#2200)
14:00 SCOPE (#1912), Cold-Start (#2312), SimpleMem (#1905), Agent Memory multi-session (#3512), Distortion (#3215), Learning to Rank from Incomplete (#4610)
14:15 Oral, Hall D2: VALUEFLOW

7/8 (Wed)

10:00 Oral, Hall B2: Reward Hacking (Bayesian Non-negative Reward)
10:30 TOM-SWE (#810), Expectation Alignment (#3208), BESPOKE (#2307)
14:30 Conversation for Non-verifiable (#208), Regularization in Axiomatic Preferences (#3213)
17:00 From Volume to Value (#4002), ProactiveLLM (#201), OPE Missingness (#209)

7/9 (Thu)

10:30 RubricRobustness (#3605), ProRL (#1004), Diverse Raters (#3305)
14:30 Modeling Preferences as Distribution (#4303), Judge-Aware Ranking (#2206), PerCE (#3611), Implicit Turn-Wise (#3800), Interruptible (#2111), Persona2Web (#4402), Off-Policy Large Action (#315)
17:00 Grounded in Reality (#907), Modular Memory (#2507), When to Act or Refuse (#3006), MCP-Persona (#1800), AppWorld-UL (#4407), Measuring Preferences is Social Science (#3114), Validation Gap in Social Simulations (#405)

Schedule unconfirmed: Teaching Agents to Ask Effective Clarification Questions (no session listed on the site).