
Meta info.

TL;DR

Proposes the Correlational Counterfactual Test (CCT), which takes the LM's output probability distribution into account when measuring faithfulness by adding interventions to counterfactual inputs.


Problem Statement

Self-consistency is not a sufficient indicator of faithfulness.

  • Faithfulness: does the model's explanation accurately represent the reasoning process that led to its answer?
    • That the faithfulness of model output matters is commonly accepted.
    • It is usually judged with the Counterfactual Test (CT), which asks whether the output changes when the input is manipulated so that the model's behavior could change (a binary outcome, or a kind of consistency check).
  • Interventional Addition (IA): when an IA is inserted into the input and the model's prediction changes, the IA is treated as a meaningful factor, and the test checks whether it appears in the post-hoc explanation.
  • Research Question: what if we look at how the model's predicted distribution changes?
    • An attempt to see how much the output changes, not just whether it changes.
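The binary check described above can be sketched as follows. This is a minimal illustration, not the paper's code: the `Record` fields, the substring-based mention check, and the toy examples are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One intervened example (all field names are hypothetical)."""
    pred_before: str    # top label on the original input
    pred_after: str     # top label after inserting the word
    inserted_word: str  # the interventional addition (IA)
    explanation: str    # explanation generated for the intervened input


def ct_mention_rate(records):
    """Share of prediction-flipping interventions whose explanation mentions the IA."""
    flipped = [r for r in records if r.pred_before != r.pred_after]
    if not flipped:
        return None  # no intervention changed the prediction
    # Assumption: a plain case-insensitive substring match counts as a "mention".
    mentioned = sum(r.inserted_word.lower() in r.explanation.lower() for r in flipped)
    return mentioned / len(flipped)


# Toy records, invented for illustration only.
records = [
    Record("entailment", "neutral", "red", "The car is red, so we cannot be sure."),
    Record("entailment", "neutral", "slowly", "Nothing says the man is outside."),
    Record("neutral", "neutral", "tall", "The tall man may or may not be a doctor."),
]
rate = ct_mention_rate(records)
print(rate)
```

Note that the third record is ignored entirely because the top label did not flip, even if the underlying probabilities moved. That is the information a distribution-based measure would keep.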

Suggestions

Proposes the Correlational Explanatory Faithfulness (CEF) metric

  • (Premise) To judge faithfulness, it is not enough to identify the significant factors; the factors identified as significant should also be mentioned more often than the ones that are not.
    • (Difference from CT) Under CT, when an intervention affects the prediction, an explanation can be judged faithful merely because it mentions the intervention.
  • (Method) Applied as the Correlational Counterfactual Test (CCT). To run the counterfactual test:
    1. intervention: apply the IA
    2. measure prediction impact: take the model's output distribution (= prediction) before and after the intervention and compare the two with TVD (a distance that gives less weight to changes between small probabilities)
    3. measure explanation mention: mentions of significant factors in the model's explanation should increase, while insignificant factors are expected to be left out
  • (Interpretation) The larger the value, the greater that factor's influence on the model's prediction
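Steps 2 and 3 can be sketched as below. It assumes TVD means total variation distance (half the L1 distance between the two label distributions) and that the "correlational" part is a Pearson correlation between TVD and a 0/1 mention indicator, which is not spelled out in these notes. The function names and the toy numbers are made up for illustration.

```python
import math


def tvd(p, q):
    """Total variation distance between two distributions over the same labels."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def pearson(xs, ys):
    """Pearson correlation; with binary ys this is the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / (sx * sy)


def cct(examples):
    """examples: list of (p_before, p_after, mentioned) tuples (hypothetical layout)."""
    impacts = [tvd(p, q) for p, q, _ in examples]           # step 2: prediction impact
    mentions = [1.0 if m else 0.0 for _, _, m in examples]  # step 3: explanation mention
    return pearson(impacts, mentions)


# Toy label distributions before/after an intervention, invented for illustration.
examples = [
    ([0.7, 0.2, 0.1], [0.1, 0.8, 0.1], True),
    ([0.6, 0.3, 0.1], [0.5, 0.4, 0.1], False),
    ([0.5, 0.25, 0.25], [0.25, 0.5, 0.25], True),
    ([0.9, 0.05, 0.05], [0.9, 0.05, 0.05], False),
]
score = cct(examples)
print(score)
```

In this toy set the inserted word is mentioned exactly when the distribution moved a lot, so the score comes out strongly positive. An explanation that mentioned words regardless of their impact would push it toward zero.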

Effects

  • experiment setup:
    • Datasets: downstream task datasets that come with explanations. e-SNLI (NLI), ComVE (common sense), ECQA (multiple choice QA)
    • backbone: Llama-2 series, 20-shot prompt
    • methods: predict → explain (PE) or explain → predict (EP)
    • intervention: add an adjective before a noun, or an adverb before a verb, in the sentence
      • filtered with Llama-2-70B to keep only sentences that still make sense
  • results:
    • correlation between intervention and explanation: the model's explanations were more likely to mention the inserted word when the intervention affected the model's prediction
      • figure 1: x-axis is TVD, y-axis is how often the word is mentioned in the explanation, True/False is whether the explanation is faithful
        • e-SNLI: significant factors are mentioned more often in the explanations (positive correlation)
        • ECQA: factors are mentioned often in the explanations, but not in a way that tracks their significance (no correlation)
          • the explanations annotated in the dataset may be odd
      • larger models tend to score higher on both explanation accuracy and faithfulness
      • PE is more faithful than EP

Personal note. I picked this paper because counterfactual and faithfulness looked like the keywords, so I expected it to be highly relevant to kc, but it turned out to be closer to interpretability… That said, given how simple the results are and how short the paper is (maybe that is exactly why,,), there were quite a few leaps in the argument, so interpreting the results took me a bit of time 🤔