Posts 2024
Dec 2024 7 posts
Alignment Faking in Large Language Models
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
The FACTS Grounding Leaderboard: Benchmarking LLMs’ Ability to Ground Responses to Long-Form Input
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
LLM Evaluators Recognize and Favor Their Own Generations
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
Reverse Thinking Makes LLMs Stronger Reasoners
Nov 2024 5 posts
Counterfactual Generation from Language Models
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation
Questioning the Survey Responses of Large Language Models
CRAB: Constraint Back-translation Improves Complex Instruction Following of Large Language Models
Detecting Training Data of Large Language Models via Expectation Maximization
Oct 2024 7 posts
Direct Multi-Turn Preference Optimization for Language Agents
Inference Scaling for Long-Context Retrieval Augmented Generation
Real-time Fake News from Adversarial Feedback
LC-LLM RAG: Long-Context LLMs Meet RAG
MoEE: Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Differential Transformer
Selective Attention Improves Transformer
Sep 2024 8 posts
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement
Dialogue Ontology Relation Extraction via Constrained Chain-of-Thought Decoding
Knowing When to Ask - Bridging Large Language Models and Data
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Configurable Foundation Models: Building LLMs from a Modular Perspective
Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models
Safety Layers of Aligned Large Language Models: The Key to LLM Security
Text2SQL is Not Enough: Unifying AI and Databases with TAG
Aug 2024 12 posts
Planning Like Human: A Dual-process Framework for Dialogue Planning
To Code, or Not To Code? Exploring Impact of Code in Pre-training
The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models
Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
Adaptive Retrieval-Augmented Generation for Conversational Systems
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
Word Translation Without Parallel Data
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
Zero-Shot Cross-Domain Dialogue State Tracking via Dual Low-Rank Adaptation
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Jul 2024 6 posts
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
RouteLLM: Learning to Route LLMs with Preference Data
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
Jun 2024 5 posts
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
May 2024 2 posts
Apr 2024 5 posts
Mar 2024 6 posts
Social Learning: Towards Collaborative Learning with Large Language Models
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems
Is Cosine-Similarity of Embeddings Really About Similarity?
Do Large Language Model Understand Multi-Intent Spoken Language?
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Feb 2024 12 posts
Benchmarking Large Language Models in Retrieval-Augmented Generation
Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance
Generative Representational Instruction Tuning
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Chain-of-Thought Reasoning Without Prompting
Unsupervised Evaluation of Code LLMs with Round-Trip Correctness
Specialized Language Models with Cheap Inference from Limited Domain Data
The boundary of neural network trainability is fractal
Orion-14B: Open-source Multilingual Large Language Models
The Power of Noise: Redefining Retrieval for RAG Systems
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
Jan 2024 18 posts
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Corrective Retrieval Augmented Generation
Knowledge Fusion of Large Language Models
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability
DocLLM: A layout-aware generative language model for multimodal document understanding
Self-Rewarding Language Models
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
ChatQA: Building GPT-4 Level Conversational QA Models
Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
Blending is All You Need
LLaMA Pro: Progressive LLaMA with Block Expansion
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Improving Text Embeddings with Large Language Models
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Making Large Language Models A Better Foundation For Dense Retrieval