Meta info.

TL;DR

A method for getting LLMs to generate better fake news. To support fake news detection for events that occur after a model's training, the paper proposes an adversarial, iterative fake news generation pipeline.

Problem Statement

  • Existing fake news datasets rely on past data from fact-checking websites such as PolitiFact and Snopes → the model may already have seen them during training.
  • Recent LLMs detect fake news better than expected, and in experiments their detection performance keeps improving (even on events that occurred after the LLMs were trained). This suggests they have learned some pattern of fake news rather than improved at factual reasoning.
    • (Conclusion: research on fake news with more diverse patterns was needed.)

Suggestions

Proposes a method that adversarially and iteratively generates fake news able to progressively evade(?) a RAG-based detector.

  1. Build a real news corpus from actual news articles across diverse domains, covering facts beyond what the LLM already knows.
  2. Generate fake news candidates with an LLM (generator).
  3. Discard candidates that do not contradict the real news.
  4. Rank the filtered candidates with the RAG-based detector → select the top-1 (→ generator input).
  5. Repeat as an iterative loop that progressively fools the detector.
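The five steps above can be sketched as a small loop. This is only a minimal illustration, not the paper's implementation: `adversarial_loop`, `Round`, and the three `toy_*`/`make_toy_detector` helpers are hypothetical names, and the toy generator and detector are simple string functions standing in for the LLM generator and the RAG-based detector. Step 1 (building the corpus) is reduced to a single hard-coded article.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Round:
    text: str     # top-1 fake kept in this round
    score: float  # detector's "looks real" score (higher = harder to detect)


def adversarial_loop(
    real_article: str,
    generate: Callable[[str, str], List[str]],  # (real article, previous top-1) -> candidates
    contradicts: Callable[[str, str], bool],    # (real article, candidate) -> truly fake?
    looks_real: Callable[[str], float],         # detector score for one candidate
    rounds: int = 6,
) -> List[Round]:
    """Return the top-1 fake selected in each round."""
    history: List[Round] = []
    seed = real_article
    for _ in range(rounds):
        candidates = generate(real_article, seed)                        # step 2
        fakes = [c for c in candidates if contradicts(real_article, c)]  # step 3
        if not fakes:
            break
        best = max(fakes, key=looks_real)                                # step 4
        history.append(Round(best, looks_real(best)))
        seed = best                                                      # step 5
    return history


# Toy stand-ins (assumptions) so the sketch runs without any model call.
def toy_generate(real: str, seed: str) -> List[str]:
    # Pretend "LLM": swap one entity, swap one date, or return the seed unchanged.
    return [seed.replace("Paris", "Rome"), seed.replace("Monday", "Friday"), seed]


def toy_contradicts(real: str, candidate: str) -> bool:
    # Pretend contradiction check: any textual difference counts as a contradiction.
    return candidate != real


def make_toy_detector(evidence: str) -> Callable[[str], float]:
    # Pretend "RAG detector": word overlap with the retrieved evidence.
    evidence_words = set(evidence.split())

    def looks_real(candidate: str) -> float:
        return len(set(candidate.split()) & evidence_words) / len(evidence_words)

    return looks_real


if __name__ == "__main__":
    real = "The summit opened in Paris on Monday."
    detector = make_toy_detector(real)
    for r in adversarial_loop(real, toy_generate, toy_contradicts, detector, rounds=3):
        print(f"{r.score:.2f}  {r.text}")
```

With these toy helpers the loop settles on the same fake after the first round, because the pretend generator only knows two edits. A real generator would rewrite the seed differently each round, which is where the iterative pressure on the detector would come from.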

Effects:

  • Experimental setup: uses NBC News, with GPT-4o as the main generator, run for 6 loop iterations
    • backbone: GPT-4, GPT-3.5, Gemini Pro/Flash, Llama 3.1, …
  • Results: the more loop iterations, the stronger (harder to detect) the generated fake news
    • In particular, the RAG-based GPT-4o model showed the largest performance drop (AUC-ROC down 17.5)
    • Real-time news appears implausible from the LLM detector's point of view
    • Detectors without RAG are even more vulnerable to the adversarial attack
    • The LLM alters real news in various ways: changing entities (incl. names, locations, times), hallucinating events and making up details, mimicking typographical errors, etc.
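For readers unfamiliar with the metric behind the reported drop, here is a minimal sketch of how AUC-ROC can be computed for a detector and compared between two rounds. The function name `auc_roc` and all scores below are made-up toy values, not results from the paper, and the conversion to points assumes the reported figure is on a 0 to 100 scale.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random fake (label 1) gets a higher 'fake' score
    than a random real article (label 0); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]                    # 1 = fake, 0 = real
    first_round = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]   # toy detector scores
    later_round = [0.9, 0.25, 0.15, 0.3, 0.2, 0.1] # fakes now score closer to reals
    drop = 100 * (auc_roc(labels, first_round) - auc_roc(labels, later_round))
    print(f"AUC-ROC drop: {drop:.1f} points")
```

An AUC-ROC of 0.5 corresponds to a detector that cannot separate fake from real at all, so a drop is easier to interpret when read against that floor.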

Personal note. I read this hoping it would help with building a dataset that induces knowledge conflicts over time, but the conclusion is blander than I expected. (It does not really emphasize the passage of time; it seems to be plain fake news generation…) In the end, I think it can be summed up as: the more passes you run through the LLM, the stronger the fake becomes.