Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering — Reading Notes

Open-domain question answering often relies on external knowledge to generate more informative and accurate answers. Once relevant knowledge has been retrieved, the question becomes how to fuse it into the generative model. The Fusion-in-Decoder (FiD) paper proposes a simple and effective solution.

[EACL2021] [FiD] Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

The OpenQA Setting

FiD Example

As shown in the figure above, given a question, the model first retrieves relevant passages from an external source such as Wikipedia. It then uses an encoder-decoder architecture that takes <question, retrieved passage> pairs as input and generates the final answer.
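The paper builds one encoder input per retrieved passage by concatenating the question, the passage title, and the passage text with the prefixes "question:", "title:", and "context:". A minimal sketch of that formatting step (function name and sample data are illustrative):

```python
def build_fid_inputs(question, passages):
    """Format one encoder input per retrieved passage, following the
    "question: ... title: ... context: ..." scheme described in the paper."""
    return [
        f"question: {question} title: {title} context: {text}"
        for title, text in passages
    ]

inputs = build_fid_inputs(
    "Where was Alan Turing born?",
    [("Alan Turing", "Alan Turing was born in Maida Vale, London."),
     ("Turing machine", "A Turing machine is a model of computation.")],
)
# one formatted string per retrieved passage
```

Each string is then tokenized and encoded independently, which is what makes the next step (fusion in the decoder) possible.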

This approach scales well with the number of retrieved passages, as the performance keeps improving when retrieving up to one hundred passages.

The FiD Model

The idea behind FiD is simple and direct: each retrieved passage is paired with the question and encoded by the encoder independently; the resulting representations are then concatenated and fed to the decoder, which generates the final answer. Hence the name Fusion-in-Decoder.
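The fusion step can be sketched with tensor shapes alone. Below is a NumPy stand-in (dimensions and the random "encoder" are hypothetical, not the paper's actual model): each of the N (question, passage) pairs is encoded independently into an (L, d) representation, the N representations are concatenated along the sequence axis, and the decoder's cross-attention then sees all passages at once.

```python
import numpy as np

N, L, d = 4, 16, 8   # hypothetical: passages, tokens per input, hidden size

def encode(_pair):
    # stand-in for a transformer encoder: one (L, d) matrix per input
    return np.random.randn(L, d)

# 1. Encode each (question, passage) pair independently.
per_passage = [encode(p) for p in range(N)]    # N matrices of shape (L, d)

# 2. Concatenate along the sequence axis — the "fusion".
fused = np.concatenate(per_passage, axis=0)    # shape (N * L, d)

# 3. The decoder cross-attends over the fused sequence: each decoder
#    position gets one attention score per encoded token, across passages.
T = 5                                          # decoder sequence length
queries = np.random.randn(T, d)
attn_scores = queries @ fused.T                # shape (T, N * L)
```

Because self-attention inside the encoder never crosses passage boundaries, only the decoder performs cross-passage reasoning, which is what keeps the encoder cost linear in the number of passages.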

FiD Architecture

FiD also turns out to work surprisingly well:

While conceptually simple, this method sets new state-of-the-art results on the TriviaQA and NaturalQuestions benchmarks.

The authors further argue that, compared with extractive models, generative models are particularly good at synthesizing evidence from multiple passages:

We believe that this is evidence that generative models are good at combining evidence from multiple passages, compared to extractive ones.

Experimental Results

FiD performs very well on all three datasets: NaturalQuestions, TriviaQA, and SQuAD.

FiD Performance

The figure above shows that the more passages are fed into the generative model, the better it performs.

In particular, we observe that increasing the number of passages from 10 to 100 leads to 6% improvement on TriviaQA and 3.5% improvement on NaturalQuestions.

FiD's approach is simple and direct, but as the number of passages grows, the concatenated input to the decoder becomes very long, and training costs rise accordingly. A note in the FiD github repo gives a sense of this:

Training these models with 100 passages is memory intensive. To alleviate this issue we use checkpointing with the --use_checkpoint option. ... The large readers have been trained on 64 GPUs ...
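The asymptotic trade-off behind this cost can be checked with quick arithmetic: FiD's per-passage encoding keeps encoder self-attention at N·L² score computations, whereas a hypothetical baseline that concatenates all passages into one encoder input would pay (N·L)². The decoder's cross-attention over the fused sequence still grows linearly with N, which is where the memory pressure at 100 passages comes from. A sketch of the comparison (counts are illustrative, not measured):

```python
def encoder_attn_cost(num_passages, passage_len, joint=False):
    """Number of pairwise attention scores computed in the encoder."""
    if joint:
        # hypothetical baseline: one encoder pass over all passages at once
        return (num_passages * passage_len) ** 2
    # FiD: each passage is encoded independently
    return num_passages * passage_len ** 2

N, L = 100, 250  # illustrative values
ratio = encoder_attn_cost(N, L) / encoder_attn_cost(N, L, joint=True)
print(ratio)  # → 0.01, i.e. 1/N of the joint-encoding cost
```

So FiD's encoder is a factor of N cheaper than joint encoding; the remaining bottleneck is the decoder attending over all N·L encoded tokens, hence the checkpointing and multi-GPU training mentioned above.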

Beyond OpenQA, FiD can also be extended to any application that takes multiple passages as input, such as Document Grounded Conversation.