Do Large Language Model Understand Multi-Intent Spoken Language ?

March 8, 2024

Meta info.

TL;DR

Proposes the LM-MixATIS and LM-MixSNIPS benchmarks, along with new metrics, for studying the use of LLMs in SLU (Spoken Language Understanding).

How do LLMs perform on multi-intent SLU compared with existing SOTA models?
- multi-intent SLU: multi-intent detection + (entity) slot filling
Does LLM scale affect performance?
Which metrics are suitable for LLM-based SLU?
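
To make the two subtasks concrete, here is a minimal sketch of what a multi-intent SLU example could look like as data. The utterance is made up for illustration, the labels are ATIS-style names chosen as an assumption, and `subtask_targets` is a hypothetical helper, not something taken from the paper or the datasets.

```python
# Hypothetical multi-intent example: one utterance carrying two intents.
# The utterance and label names are illustrative assumptions.
example = {
    "utterance": "show flights from boston to denver and also list the fares",
    "intents": ["atis_flight", "atis_airfare"],
    "slots": {
        "fromloc.city_name": "boston",
        "toloc.city_name": "denver",
    },
}


def subtask_targets(ex):
    """Split one example into the two subtask targets.

    Returns (intent-detection target, slot-filling target).
    """
    intent_target = sorted(ex["intents"])       # multi-intent detection
    slot_target = sorted(ex["slots"].items())   # (slot, value) pairs
    return intent_target, slot_target


intent_target, slot_target = subtask_targets(example)
```

A model has to get both targets right for the utterance to count as fully understood, which is what the overall accuracy metric checks.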

Experiment
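- model: QLoRA-tuned LLMs; the authors note that task-specific fine-tuning is necessary
- datasets: LM-MixX (i.e. LM-MixATIS and LM-MixSNIPS), which include sub-intent instructions
  - sub-intent instructions: extract the entities that belong to each intent in the utterance, which then feeds into slot filling
- prompt: see pic1
- metric: Accuracy for ID (intent detection), F1 for SF (slot filling), and Overall Accuracy, plus the newly proposed ESA (Entity Slot Accuracy) and CSA (Combined Semantic Accuracy)
  - ESA: how accurately the entities were predicted
  - CSA: Accuracy for ID * ESA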
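
The relationship between the metrics above can be sketched in a few lines. This note does not give the exact ESA formula, so the per-utterance exact-match definition below is an assumption. Only CSA = (Accuracy for ID) * ESA comes from the note itself. All function names and the toy data are hypothetical.

```python
from typing import List, Set, Tuple

Slots = Set[Tuple[str, str]]  # (slot name, value) pairs for one utterance


def intent_accuracy(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of utterances whose predicted intent set equals the gold set."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_slot_accuracy(gold: List[Slots], pred: List[Slots]) -> float:
    """ASSUMED ESA: fraction of utterances whose entity slots all match."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def combined_semantic_accuracy(id_acc: float, esa: float) -> float:
    """CSA as stated in the note: intent accuracy multiplied by ESA."""
    return id_acc * esa


# Toy data (made up): 4 utterances, 3 correct intent sets, 2 correct slot sets.
gold_intents = [{"atis_flight", "atis_airfare"}, {"atis_flight"},
                {"atis_airline"}, {"atis_flight", "atis_airline"}]
pred_intents = [{"atis_flight", "atis_airfare"}, {"atis_flight"},
                {"atis_flight"}, {"atis_flight", "atis_airline"}]

gold_slots = [{("fromloc.city_name", "boston"), ("toloc.city_name", "denver")},
              {("toloc.city_name", "dallas")},
              set(),
              {("airline_name", "delta")}]
pred_slots = [{("fromloc.city_name", "boston"), ("toloc.city_name", "denver")},
              {("toloc.city_name", "austin")},                           # wrong value
              set(),
              {("airline_name", "delta"), ("toloc.city_name", "denver")}]  # extra slot

id_acc = intent_accuracy(gold_intents, pred_intents)   # 0.75
esa = entity_slot_accuracy(gold_slots, pred_slots)     # 0.5
csa = combined_semantic_accuracy(id_acc, esa)          # 0.375
```

Because CSA is a product, it is never higher than either the intent accuracy or ESA, so a model that does well on only one of the two subtasks still gets a low score.
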
Results
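- With the LM-MixX datasets and the proposed prompt, LLMs perform well enough to match or exceed the supervised SOTA
- However, a larger model does not necessarily perform better (MixATIS)
- ESA and CSA evaluate SF more strictly, which makes them better metrics for overall SLU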