1 minute read

Meta info.

TL; DR

unreachable points phenomenon์„ ์™„ํ™”ํ•˜๋Š” HNSW ๊ธฐ๋ฐ˜์˜ MN-RU(Mutual Neighbor-Replaced Update) ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ œ์•ˆ

Untitled

Untitled

Untitled

Untitled

Problem States

HNSW์ด์‚ฐ์—… ๋“ฑ์—์„œ ANN์œผ๋กœ ๊ฐ€์žฅ ์†์‰ฝ๊ฒŒ ์‚ฌ์šฉ๋˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‚˜, ์‹ค์‹œ๊ฐ„ ์—…๋ฐ์ดํŠธ์‹œ ์„ฑ๋Šฅ ํฌ๊ฒŒ ์ €ํ•˜๋จ. ์ฃผ์š” ์›์ธ์œผ๋กœ ๊ทธ๋ž˜ํ”„ ์—ฐ๊ฒฐ์„ฑ์ด ๋Š์–ด์ง€๋Š” unreachable points phenomenon (pic1ย )๊ฐ€ ํ•ด๊ฒฐ ์•ˆ ๋จ.

Suggestion

๋‹จ์  ๋ณด์™„๋œ HNSW ๊ธฐ๋ฐ˜์˜ MN-RU(Mutual Neighbor-Replaced Update) ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ œ์•ˆ.

  • 2์ค‘ ๊ตฌ์กฐ index ๊ตฌ์„ฑ
    • main index: index used for regular operations
    • backup index: index managing the unreachable points
      1. query์™€ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด point ์‹๋ณ„์„ ์œ„ํ•ด main index์—์„œ K-NN ๊ฒ€์ƒ‰ ์ˆ˜ํ–‰
      2. K-NN์—์„œ ์‹๋ณ„ ์•ˆ๋˜๋Š”, unreachable points main index์—์„œ ์ œ๊ฑฐ
      3. main index์™€ ๋ณ„๊ฐœ๋กœ, unreachable point์— ๋Œ€ํ•œ ์ƒˆ HNSW ์ธ๋ฑ์Šค ๊ตฌ์„ฑ (backup index)
  • dual index search: main - backup ๋™์‹œ ์„œ์น˜: main์—์„œ unreachable point๊ฐ€ ๋‚˜ํƒ€๋‚˜๋ฉด backup์—์„œ ๋ณด์™„
  • MN-RU ์•Œ๊ณ ๋ฆฌ์ฆ˜
    1. Mutual Neighbor Selection: main index ์—…๋ฐ์ดํŠธํ•˜๋Š” ๋™์•ˆ ๊ทธ๋ž˜ํ”„ ์—ฐ๊ฒฐ์„ฑ ์œ ์ง€
    2. Neighbor Reselection: point๊ฐ€ ์‚ญ์ œ๋˜๋ฉด ์‚ญ์ œ๋œ ํฌ์ธํŠธ์˜ ์ด์›ƒ๊ณผ 2ํ™‰ ์ด์›ƒ์ด ๋‹ค์‹œ ์—ฐ๊ฒฐ๋˜๋„๋ก ํ•˜์—ฌ ๊ทธ๋ž˜ํ”„ ์—ฐ๊ฒฐ์„ฑ ์œ ์ง€

Effects

  • 3๊ฐ€์ง€ ์‹œ๋‚˜๋ฆฌ์˜ค์—์„œ ํ™•์ธ: ์ „ ๋ฒ”์œ„์—์„œ ์ˆœ์ฐจ์ ์œผ๋กœ iteration ๋Œ ๋•Œ๋งˆ๋‹ค deletion-insertion ์ˆ˜ํ–‰ / iteration๋งˆ๋‹ค ๋ฌด์ž‘์œ„ point์— ๋Œ€ํ•ด deletion-insertion ์ˆ˜ํ–‰ / iteration๋งˆ๋‹ค ์ „ point deletion-insertion ์ˆ˜ํ–‰
  • long period (๋Œ€๋Ÿ‰) ์‹ค์‹œ๊ฐ„ ์—…๋ฐ์ดํŠธ ๊ฐœ์„ (์ตœ๋Œ€ 2~4๋ฐฐ)
  • deletion-insertion ๋ฐ˜๋ณต ์ˆ˜ํ–‰์—๋„ unreachable points ์˜ ์ฆ๊ฐ€ ์–ต์ œ๋กœ ์ „๋ฐ˜์ ์ธ ์„ฑ๋Šฅ ๊ฐœ์„ 
  • dual index ๊ตฌ์กฐ์™€ ๋™์‹œ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•จ์˜ ์ด์ : ๋†’์€ recall์„ ์œ ์ง€ํ•˜๊ณ  unreachable points ์ตœ์†Œํ™”