1 minute read


TL;DR

GPT-4 is a large multimodal model with human-level performance on certain difficult professional and academic benchmarks. It outperforms existing LLMs on almost all NLP tasks and exceeds the vast majority of reported SOTA methods (often including task-specific fine-tuning).


Highlights

  1. multi-modal input: GPT-4 accepts images & text as input and produces text as output. (pic1)
  2. stable training at scale: OpenAI developed infrastructure and optimization methods that behave predictably across multiple scales, which enabled GPT-4's performance to be reliably predicted from smaller training runs. (Azure supported the infrastructure design & management.)
  3. significant improvements on various benchmarks: GPT-4 outperforms existing LLMs by a considerable margin, even surpassing SOTA models that were fine-tuned for specific tasks. (pic2)
  4. effect of RLHF: RLHF does not significantly change the model's capabilities on exams (and can actually degrade performance without additional effort), while it improves results on benchmarks like TruthfulQA. This suggests that post-training does not substantially alter base-model capability, but lets users steer the model toward the desired behavior. (pic3)
  5. safety properties: OpenAI reduced the model's tendency to respond to requests for harmful/incorrect output by 82% compared to GPT-3.5, and recruited external experts to qualitatively probe, adversarially test, and generally provide feedback on GPT-4. (pic4)
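The predictable-scaling point (item 2) can be sketched as a power-law extrapolation: fit loss versus compute on small runs, then predict the loss of a much larger run. Everything below is an illustrative assumption — the data points, the exponent, and the functional form are made up, not OpenAI's actual numbers or method.

```python
import numpy as np

# Hypothetical small-run results: (training compute, final loss).
# The values are fabricated for illustration and follow an exact
# power law, loss = a * compute**(-b), the usual scaling-law form.
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss = 5.0 * compute ** -0.05

# A power law is a straight line in log-log space, so fit with polyfit.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)

def predict_loss(c):
    """Extrapolate the fitted power law to a larger compute budget."""
    return np.exp(intercept + slope * np.log(c))

# Predict the loss of a hypothetical GPT-4-scale run (1e24 FLOPs).
print(predict_loss(1e24))
```

The point of the sketch is the workflow, not the numbers: because the fit behaves predictably across scales, the final model's performance can be estimated before committing the full training budget.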
