Posts 2025
Dec 2025 5 posts
A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models
Adaptation of Agentic AI
Budget-Aware Tool-Use Enables Effective Agent Scaling
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
Nov 2025 3 posts
Oct 2025 3 posts
Sep 2025 5 posts
DiaTool-DPO: Direct Preference Optimization for Controlling Conversation Flow in Tool-Augmented LLMs
Facilitating Multi-Turn Function Calling for LLMs via Compositional Instruction Tuning
Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning
Aug 2025 4 posts
SSRL: Self-Search Reinforcement Learning
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
TO CHAT OR TASK: a Multi-turn Dialogue Generation Framework for Task-Oriented Dialogue Systems
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Jul 2025 4 posts
RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback
Exploring Persona Sentiment Sensitivity in Personalized Dialogue Generation
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents
Jun 2025 5 posts
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
Dynamic Epistemic Friction in Dialogue
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
CONFETTI: Conversational Function-Calling Evaluation Through Turn-Level Interactions
May 2025 2 posts
Apr 2025 4 posts
Mar 2025 8 posts
Reasoning to Learn from Latent Thoughts
Scaling Laws of Synthetic Data for Language Models
A-MEM: Agentic Memory for LLM Agents
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
Chain of Draft: Thinking Faster by Writing Less
Feb 2025 4 posts
Jan 2025 5 posts
The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
The GAN is dead; long live the GAN! A Modern GAN Baseline
Slow Perception: Let's Perceive Geometric Figures Step-by-step