
Meta info

TL;DR

RMT (Recurrent Memory Transformer) retains information across up to 2 million tokens!


Suggestions

  • RMT can retain information across up to 2 × 10^6 tokens, significantly exceeding the largest input size reported for transformer models.
  • It enables pre-trained BERT models to store task-specific information by passing memory tokens between input segments (see the sketch below), with potential applications in language modeling and other tasks.
  • The ability to effectively utilize memory for up to 4,096 segments with a total length of 2,048,000 tokens has significant implications for the development of future transformer models.
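A minimal sketch of the segment-level recurrence behind RMT, assuming a generic PyTorch `TransformerEncoder` backbone rather than the authors' implementation: learnable memory tokens are prepended to each segment, and their updated states are passed on to the next segment, so information can survive far beyond a single segment's context window.

```python
# Hypothetical sketch of RMT-style segment recurrence (not the authors' code).
import torch
import torch.nn as nn

class RecurrentMemoryTransformer(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, n_heads=4,
                 n_layers=2, num_mem_tokens=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Learnable memory tokens whose states are carried across segments.
        self.mem_tokens = nn.Parameter(torch.randn(num_mem_tokens, d_model))
        self.num_mem = num_mem_tokens

    def forward(self, segments):
        """segments: list of (batch, seg_len) token-id tensors."""
        batch = segments[0].size(0)
        memory = self.mem_tokens.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            x = self.embed(seg)
            # Prepend the current memory state to the segment.
            x = torch.cat([memory, x], dim=1)
            h = self.encoder(x)
            # Read the updated memory and pass it to the next segment.
            memory = h[:, :self.num_mem, :]
            outputs.append(h[:, self.num_mem:, :])
        return torch.cat(outputs, dim=1), memory

# Usage: a long input split into fixed-size segments.
model = RecurrentMemoryTransformer()
long_input = torch.randint(0, 30522, (1, 4 * 128))
segments = list(long_input.split(128, dim=1))
hidden, final_memory = model(segments)
print(hidden.shape, final_memory.shape)  # (1, 512, 256), (1, 10, 256)
```

Because each forward pass only attends within one segment plus a handful of memory tokens, compute grows roughly linearly with the number of segments, which is what makes millions of tokens feasible.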