GPT-3 few-shot benchmarks

4 cited papers · May 26, 2026 · Powered by Researchly AI

TL;DR

GPT-3 is a large-scale autoregressive language model that demonstrates strong few-shot learning capabilities across a wide range of NLP tasks [1, 2].

Brown et al. (2020) [1] established that GPT-3, with 175 billion parameters, achieves strong few-shot performance on many NLP datasets. A critical finding in subsequent research is that these empirical results depend heavily on the choice of in-context examples used to construct the prompt (Liu et al., 2021).

[1] Tom B. Brown, Benjamin Mann et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems (NeurIPS).
[2] Alec Radford, Karthik Narasimhan et al. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI Blog.
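
Few-shot prompting, as summarized above, places a handful of solved examples in front of an unsolved test input and lets the model complete the final answer. The following is a minimal sketch of that prompt layout only. The function name `build_few_shot_prompt` and the `Input:`/`Output:` field labels are illustrative assumptions, not a format taken from the cited papers.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    test_input: str,
    input_label: str = "Input",    # hypothetical field label
    output_label: str = "Output",  # hypothetical field label
) -> str:
    """Concatenate k solved examples followed by the unsolved test input."""
    blocks = [f"{input_label}: {x}\n{output_label}: {y}" for x, y in examples]
    # The last block leaves the output empty so the model fills it in.
    blocks.append(f"{input_label}: {test_input}\n{output_label}:")
    return "\n\n".join(blocks)


demo = [
    ("The film was wonderful.", "positive"),
    ("I want my money back.", "negative"),
]
prompt = build_few_shot_prompt(demo, "A dull, lifeless script.")
print(prompt)
```

With `k = len(examples)`, the same function covers the zero-shot case (an empty list) and the few-shot case; no model weights are updated in either.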
  • Transformer Architecture: The foundational network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions, enabling highly parallelizable sequence modeling [1] (Vaswani et al., 2017).
  • GPT-3 (175B): A 175-billion-parameter autoregressive language model pre-trained via generative pre-training on diverse unlabeled text, achieving strong few-shot performance across NLP benchmarks [2, 3] (Brown et al., 2020).
  • GPT (Generative Pre-Training): Demonstrates that large gains on NLP tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text [3, 2] (Radford et al., 2018).
  • In-Context Example Selection: The strategy by which few-shot prompts are constructed; retrieval-based selection of semantically similar examples consistently outperforms random sampling for GPT-3 [4, 2] (Liu et al., 2021).
[1] Ashish Vaswani, Noam Shazeer et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems (NeurIPS).
[2] Tom B. Brown, Benjamin Mann et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems (NeurIPS).
[3] Alec Radford, Karthik Narasimhan et al. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI Blog.
[4] Jiachang Liu, Dinghan Shen et al. (2021). What Makes Good In-Context Examples for GPT-3? arXiv.
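
The attention mechanism and the autoregressive (left-to-right) constraint named in the concepts above can be illustrated with a small sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, under a causal mask. This is a simplification: it uses the raw token vectors as queries, keys, and values, with no learned projections, multiple heads, or batching, and the function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(queries: List[Vector], keys: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention weights where position t sees only 0..t."""
    d_k = len(keys[0])
    weights = []
    for t, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(t + 1)  # causal mask: skip future positions
        ]
        # Future positions receive exactly zero weight.
        weights.append(softmax(scores) + [0.0] * (len(keys) - t - 1))
    return weights


def attend(weights: List[Vector], values: List[Vector]) -> List[Vector]:
    """Weighted sum of value vectors for every position."""
    dim = len(values[0])
    return [
        [sum(w * v[i] for w, v in zip(row, values)) for i in range(dim)]
        for row in weights
    ]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d token vectors
weights = causal_attention_weights(tokens, tokens)
outputs = attend(weights, tokens)
```

Each row of `weights` sums to one over the current and earlier positions, which is what makes the decoder usable for next-token prediction.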
Diagram
┌─────────────────────────────────────────────────────┐
│          GPT-3 Few-Shot Inference Pipeline          │
│                                                     │
│  Test Sample                                        │
│      │                                              │
│      ▼                                              │
│  [Retrieval Module]                                 │
│  Sentence Encoder → Semantic Similarity Search      │
│      │                                              │
│      ▼                                              │
│  In-Context Examples (k similar examples)           │
│      │                                              │
│      ▼                                              │
│  Prompt Construction                                │
│  [Example 1] [Example 2] ... [Test Input]           │
│      │                                              │
│      ▼                                              │
│  GPT-3 (175B Parameters)                            │
│  Autoregressive Transformer Decoder                 │
│      │                                              │
│      ▼                                              │
│  Prediction / Generated Output                      │
└─────────────────────────────────────────────────────┘
GPT-3's few-shot benchmark performance is sensitive to how in-context examples are selected. Liu et al. (2021) [1] propose a retrieval-augmented prompt selection strategy: examples semantically similar to the test sample are retrieved and used as the prompt. This strategy consistently outperforms a random-selection baseline across natural language understanding and generation benchmarks. Furthermore, sentence encoders fine-tuned on task-related datasets yield even more helpful retrieved examples, suggesting that domain-adapted retrieval is a meaningful direction for improving few-shot performance [1].
[1] Jiachang Liu, Dinghan Shen et al. (2021). What Makes Good In-Context Examples for GPT-3? arXiv.
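
The retrieval step described above can be sketched as a nearest-neighbor search over a pool of labeled examples. The sketch below is not the method of Liu et al. (2021): it swaps their sentence encoder for a toy bag-of-words `embed` function so that it runs with the standard library alone. The example pool, the `Q:`/`A:` prompt format, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, str]  # (input text, target text)


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(test_input: str, pool: List[Example], k: int) -> List[Example]:
    """Return the k pool examples whose inputs are most similar to test_input."""
    query = embed(test_input)
    ranked = sorted(pool, key=lambda ex: cosine(query, embed(ex[0])), reverse=True)
    return ranked[:k]


pool = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "William Shakespeare"),
    ("What is the capital of Japan?", "Tokyo"),
    ("How many legs does a spider have?", "Eight"),
]
test_input = "What is the capital of Italy?"
selected = select_examples(test_input, pool, k=2)

# Assemble the prompt: retrieved examples first, unsolved test input last.
prompt = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in selected)
prompt += f"\n\nQ: {test_input}\nA:"
print(prompt)
```

Replacing `embed` with a real sentence encoder, or one fine-tuned on task-related data as the paragraph above suggests, changes only the similarity function; the selection and prompt assembly stay the same.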
Table
Aspect                | Random Sampling | Retrieval-Based Selection
----------------------|-----------------|------------------------------
Example relevance     | Low (random)    | High (semantically similar)
Benchmark performance | Baseline        | Consistently higher
Encoder type          | N/A             | Task-fine-tuned encoders best
  • GPT-3's empirical few-shot results are highly sensitive to the choice of in-context examples, so reported benchmark numbers may not reflect the model's true ceiling or floor unless prompts are constructed carefully [1, 2].
  • GPT-3's unidirectional, decoder-only architecture, rooted in generative pre-training, may underperform encoder-based models on tasks requiring deep bidirectional token-level understanding [2].
[1] Jiachang Liu, Dinghan Shen et al. (2021). What Makes Good In-Context Examples for GPT-3? arXiv.
[2] Tom B. Brown, Benjamin Mann et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems (NeurIPS).
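
One way to make the prompt-sensitivity limitation above concrete is to score the same benchmark under many random draws of in-context examples and report the spread. The sketch below shows only that bookkeeping. The `evaluate` callback is hypothetical: in practice it would build a prompt from the drawn examples, query the model on a benchmark, and return a score such as accuracy. Here a deterministic toy function stands in for it, so the numbers carry no meaning.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]


def prompt_sensitivity(
    pool: Sequence[Example],
    k: int,
    evaluate: Callable[[List[Example]], float],  # hypothetical scoring callback
    trials: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Score `trials` random k-example prompts and summarize the spread."""
    rng = random.Random(seed)  # fixed seed keeps the draws reproducible
    scores = [evaluate(rng.sample(list(pool), k)) for _ in range(trials)]
    return {
        "min": min(scores),
        "mean": statistics.mean(scores),
        "max": max(scores),
        "stdev": statistics.pstdev(scores),
    }


def toy_evaluate(examples: List[Example]) -> float:
    """Placeholder score, NOT a real benchmark: depends on example length."""
    return sum(len(x.split()) for x, _ in examples) / 10.0


toy_pool = [
    ("short", "a"),
    ("a slightly longer input", "b"),
    ("an input with quite a few more words in it", "c"),
    ("two words", "d"),
]
summary = prompt_sensitivity(toy_pool, k=2, evaluate=toy_evaluate)
print(summary)
```

A wide gap between `min` and `max` under a real `evaluate` would indicate that a single reported few-shot number depends strongly on which examples happened to be drawn.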
  • GPT-3, with 175 billion parameters, achieves strong few-shot performance across many NLP datasets, establishing a new benchmark for model scale [1].
  • The Transformer's attention-only architecture is the shared foundation enabling GPT-3's parallelizable, large-scale pre-training [2, 1].
  • Retrieval-based in-context example selection consistently outperforms random sampling for GPT-3 few-shot benchmarks [3, 1].
  • Task-fine-tuned sentence encoders further improve retrieval quality, pointing to an open research direction in adaptive prompt construction [3].
  • Generative pre-training on diverse unlabeled corpora is the core mechanism behind GPT-family few-shot gains [4].
[1] Tom B. Brown, Benjamin Mann et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems (NeurIPS).
[2] Ashish Vaswani, Noam Shazeer et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems (NeurIPS).
[3] Jiachang Liu, Dinghan Shen et al. (2021). What Makes Good In-Context Examples for GPT-3? arXiv.
[4] Alec Radford, Karthik Narasimhan et al. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI Blog.
  1. "GPT-3 prompt sensitivity and robustness across NLP benchmarks"
  2. "Retrieval-augmented in-context learning for large language models"
  3. "Comparison of few-shot learning: GPT-3 vs BERT vs T5 on SuperGLUE"
