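Prompting is one of the hottest NLP techniques right now. This article introduces it through three questions: what, why, and how. It aims to be brief rather than a complete survey; for more detail, see the original papers.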

# What Is a Prompt

Users prepend a natural language task instruction and a few examples to the task input; then generate the output from the LM. This approach is known as in-context learning or prompting.

Prompting is the approach of adding extra information for the model to condition on during its generation of Y.

Best pizza ever! It was ___.

This illustrates that solving a task from only a few examples becomes much easier when we also have a task description, i.e., a textual explanation that helps us understand what the task is about.
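The idea of combining a task description with a few examples can be sketched as a small prompt builder. This is only an illustration: the function name `build_prompt`, the `Review:`/`Sentiment:` field labels, and the sample texts are all assumptions, and sending the prompt to an actual LM is left out.

```python
def build_prompt(instruction, examples, query):
    """Prepend a task instruction and a few (input, output) examples to the query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final entry is left open so the LM generates the answer.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


instruction = "Classify the sentiment of each review as positive or negative."
examples = [
    ("The crust was soggy and cold.", "negative"),
    ("Friendly staff and great toppings.", "positive"),
]
prompt = build_prompt(instruction, examples, "Best pizza ever!")
print(prompt)
```

The LM is expected to continue the text after the last `Sentiment:`, so the instruction and examples act purely as conditioning context.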

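The main difference between pretrain-finetune and prompt-tuning is that the former adapts the model to the downstream task through fine-tuning, while the latter designs prompts to tap the potential already present in the pretrained model.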

# Why Use Prompts

Prompting focuses on tapping the potential of the pretrained model itself, and in some cases it can even surpass the previous fine-tuned SOTA.

# How to Use Prompts

## Discrete Templates

### PET

Paper: Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference

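PET constructs natural-language templates that convert text tasks into cloze (fill-in-the-blank) tasks, as in the example in the first section.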

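PET offers a fairly wide variety of templates, with different hand-designed templates for different tasks. They look roughly like this, where a and b are the input texts: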

It was ___. a
a. All in all, it was ___.
a ( ___ ) b
[ Category: ___ ] a b
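The templates above can be sketched as plain string functions, paired with a mapping from each label to a word that fills the blank. The names `PATTERNS`, `VERBALIZER`, `build_cloze`, and `fill` are hypothetical, the label words are illustrative choices, and scoring the blank with a masked LM is not shown.

```python
MASK = "___"  # stands in for the LM's mask token (assumption)

# Each pattern turns the input text(s) a and b into a cloze question.
PATTERNS = {
    "it_was_prefix": lambda a, b=None: f"It was {MASK}. {a}",
    "all_in_all": lambda a, b=None: f"{a}. All in all, it was {MASK}.",
    "pair": lambda a, b: f"{a} ( {MASK} ) {b}",
    "category": lambda a, b: f"[ Category: {MASK} ] {a} {b}",
}

# Label -> word expected in the blank (illustrative choices).
VERBALIZER = {"positive": "great", "negative": "terrible"}


def build_cloze(pattern_name, a, b=None):
    """Apply one of the patterns to the input text(s)."""
    return PATTERNS[pattern_name](a, b)


def fill(cloze, label):
    """Fill the blank with the word that represents the given label."""
    return cloze.replace(MASK, VERBALIZER[label], 1)


cloze = build_cloze("all_in_all", "Best pizza ever")
print(cloze)                    # Best pizza ever. All in all, it was ___.
print(fill(cloze, "positive"))  # Best pizza ever. All in all, it was great.
```

In a real setup the LM would score each candidate word at the blank, and the label whose word scores highest would be the prediction.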

### AutoPrompt

Paper: AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

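AutoPrompt is essentially also a natural-language-style template, but its template looks more general. It has three parts: the original sentence, the Trigger Tokens [T], and the prediction token [P].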

{sentence} [T][T] . . [T] [P].
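As a rough sketch, the template can be assembled from its three parts with a single helper. The function name `autoprompt_template`, the `[MASK]` string used for [P], and the example sentence are assumptions; the search procedure that actually picks the trigger tokens is not shown here.

```python
def autoprompt_template(sentence, trigger_tokens, mask_token="[MASK]"):
    """Build '{sentence} [T] ... [T] [P].' with [P] rendered as the mask token."""
    triggers = " ".join(trigger_tokens)
    return f"{sentence} {triggers} {mask_token}."


# Before any search, the trigger slots are just placeholders.
initial = autoprompt_template("a real joy", ["[T]"] * 3)
print(initial)  # a real joy [T] [T] [T] [MASK].
```

The same trigger tokens are shared across all inputs of a task; only `{sentence}` changes from example to example.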

## Continuous Vector Templates

### Prefix-Tuning

Paper: Prefix-Tuning: Optimizing Continuous Prompts for Generation

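Prefix-Tuning freezes the parameters of the language model itself and adds only a Prefix Vector at each layer, which together form a Prefix Matrix. Training just this small set of continuous, task-specific parameters is enough to improve results on some text generation (NLG) tasks. The vectors corresponding to these virtual tokens are treated as the prompt. The template forms used by Prefix-Tuning are: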

Autoregressive Model: [T] x y
Encoder-Decoder Model: [T] x [T'] y
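One way to picture what a per-layer prefix does is to prepend extra key/value vectors to a frozen attention layer. The sketch below is a heavy simplification under stated assumptions: a single head, a single query, no learned projections, and hypothetical names (`attend`, `attend_with_prefix`); it is not the paper's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def attend_with_prefix(query, keys, values, prefix_keys, prefix_values):
    """Prepend prefix keys/values; the frozen model's own keys/values are untouched."""
    return attend(query, prefix_keys + keys, prefix_values + values)


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
# In training, only these prefix vectors would receive gradient updates.
prefix_keys = [[5.0, 0.0]]
prefix_values = [[0.0, 5.0]]

plain = attend(query, keys, values)
steered = attend_with_prefix(query, keys, values, prefix_keys, prefix_values)
print(plain, steered)
```

Because the query attends strongly to the prefix key, the output is pulled toward the prefix value, which is how a small set of trained vectors can steer a frozen model.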

In terms of expressive power: discrete prompting < embedding-only ablation < prefix-tuning

### P-Tuning

Paper: GPT Understands, Too

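P-Tuning's approach is very similar to Prefix-Tuning: both aim to learn a continuous vector template from a small amount of labeled data. The main difference is that P-Tuning focuses more on NLU.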

To automatically search prompts in the continuous space to bridge the gap between GPTs and NLU applications.

[h(0)]...[h(i)]; e(x); [h(i+1)]...[h(m)]; e(y)
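The template above amounts to splicing trainable prompt vectors around the embedded input. A minimal sketch, assuming embeddings are plain lists of floats and using the hypothetical name `assemble_input`, where `split` equals i + 1:

```python
def assemble_input(prompt_vectors, split, x_embeddings, y_embeddings):
    """Return [h(0)..h(i)], e(x), [h(i+1)..h(m)], e(y) as one sequence.

    prompt_vectors holds h(0)..h(m); split = i + 1 is how many go before e(x).
    """
    return (
        prompt_vectors[:split]
        + x_embeddings
        + prompt_vectors[split:]
        + y_embeddings
    )


prompts = [[0.1], [0.2], [0.3]]   # h(0), h(1), h(2): trainable
x_emb = [[1.0], [2.0]]            # e(x): from the frozen embedding layer
y_emb = [[9.0]]                   # e(y): embedding of the target position

sequence = assemble_input(prompts, 2, x_emb, y_emb)
print(sequence)  # [[0.1], [0.2], [1.0], [2.0], [0.3], [9.0]]
```

The resulting sequence is what would be fed to the model in place of ordinary token embeddings.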

Empirically, directly updating the P parameters leads to unstable optimization and a slight drop in performance.

By: Prefix-Tuning

In the P-tuning we propose to also model the h(i) as a sequence using a prompt encoder consists of a very lite neural network that can solve the discreteness and association problems.

By: P-Tuning
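Both quotes point in the same direction: instead of optimizing the prompt vectors directly, produce them from a small network and train that network. The sketch below uses a one-hidden-layer MLP purely as a stand-in; the class name `PromptEncoder`, the layer sizes, and the initialization range are assumptions, not the architecture of either paper.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random matrix as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class PromptEncoder:
    """Maps small raw prompt parameters to full-size prompt vectors via an MLP."""

    def __init__(self, num_prompts, raw_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.raw = make_matrix(num_prompts, raw_dim, rng)  # raw prompt parameters
        self.w1 = make_matrix(hidden_dim, raw_dim, rng)    # MLP layer 1
        self.w2 = make_matrix(out_dim, hidden_dim, rng)    # MLP layer 2

    def prompt_vectors(self):
        """Return one out_dim vector per prompt position."""
        vectors = []
        for row in self.raw:
            hidden = [math.tanh(h) for h in matvec(self.w1, row)]
            vectors.append(matvec(self.w2, hidden))
        return vectors


encoder = PromptEncoder(num_prompts=4, raw_dim=8, hidden_dim=16, out_dim=32)
vectors = encoder.prompt_vectors()
print(len(vectors), len(vectors[0]))  # 4 32
```

Gradient updates would flow into `raw`, `w1`, and `w2` while the language model stays frozen; the training loop itself is omitted.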

### Prompt Tuning

Paper: The Power of Scale for Parameter-Efficient Prompt Tuning

#### How should the prompt vectors be initialized?

Conceptually, our soft-prompt modulates the frozen network’s behavior in the same way as text preceding the input, so it follows that a word-like representation might serve as a good initialization spot.
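To make the "word-like" initialization concrete, the sketch below contrasts random initialization with copying rows from the model's token embedding table. The function name `init_prompt`, the strategy labels, the toy embedding table, and the uniform range are all assumptions for illustration.

```python
import random


def init_prompt(embedding_table, prompt_length, strategy="vocab", seed=0):
    """Create initial soft-prompt vectors.

    "random": small uniform noise (the range is an assumption).
    "vocab":  copies of randomly chosen token embeddings, so the prompt
              starts out looking like real words to the frozen model.
    """
    rng = random.Random(seed)
    dim = len(embedding_table[0])
    if strategy == "random":
        return [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                for _ in range(prompt_length)]
    if strategy == "vocab":
        token_ids = rng.sample(range(len(embedding_table)), prompt_length)
        # Copy each row so training the prompt never mutates the frozen table.
        return [list(embedding_table[i]) for i in token_ids]
    raise ValueError(f"unknown strategy: {strategy}")


# Toy embedding table: 10 tokens, 2 dimensions.
table = [[float(i), float(i) + 0.5] for i in range(10)]
vocab_prompt = init_prompt(table, 4, strategy="vocab")
random_prompt = init_prompt(table, 4, strategy="random")
print(vocab_prompt)
```

Either way, the prompt vectors are the only parameters updated afterwards; the strategy only decides where optimization starts.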

#### How long should the prompt be?

The parameter cost of our method is EP, where E is the token embedding dimension and P is the prompt length. The shorter the prompt, the fewer new parameters must be tuned, so we aim to find a minimal length that still performs well.
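The E × P cost is easy to tabulate. In the sketch below, the embedding dimension of 768 and the candidate prompt lengths are illustrative assumptions, not values taken from the paper.

```python
def prompt_param_count(embedding_dim, prompt_length):
    """Number of trainable parameters in a soft prompt: E * P."""
    return embedding_dim * prompt_length


E = 768  # assumed token embedding dimension
for P in (1, 5, 20, 100):
    print(f"P={P:>3}  trainable parameters={prompt_param_count(E, P)}")
```

With these assumed values, even a 100-token prompt adds only 76,800 trainable parameters, which is why prompt length is a cheap knob to tune.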

# Summary

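Prompting is undoubtedly one of the hottest NLP techniques right now, and people are using all kinds of prompting methods to tap the potential of pretrained models. By constructing or searching for suitable templates, prompts have already delivered strong results on a wide range of NLP tasks.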

Most of the work takes manually-designed prompts—prompt engineering is non-trivial since a small perturbation can significantly affect the model’s performance, and creating a perfect prompt requires both understanding of LMs' inner workings and trial-and-error.

