Finisky Garden

NLP, 软件工程, 产品设计

之前我们谈到 Adapters 与 Prompting 都是轻量级的训练方法,所谓 lightweight-finetuning。今天来看一下另一种轻量级训练大语言模型的方法:

LoRA: Low-Rank Adaptation of Large Language Models

微调大规模语言模型到特殊领域和任务是自然语言处理的重要课题之一。但随着模型规模的不断扩大,微调模型的所有参数(所谓full fine-tuning)的可行性变得越来越低。以GPT-3的175B参数为例,每增加一个新领域就需要完整微调一个新模型,代价和成本很高:

An important paradigm of natural language processing consists of large-scale pretraining on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example – deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive.

阅读全文 »

After I upgraded pandoc from 2.14.0.3 to 2.18, hexo-renderer-pandoc cannot render one of my post correctly. Everything works fine in 2.14.0.3. The error looks like:

INFO Start processing FATAL { err: Error: [ERROR][hexo-renderer-pandoc] On /home/finisky/source/_posts/test.md [ERROR][hexo-renderer-pandoc] pandoc exited with code 64: YAML parse exception at line 4, column 0, while scanning a simple key: could not find expected ':'

  at Hexo.pandocRenderer (/home/finisky/node_modules/hexo-renderer-pandoc/index.js:114:11)
  at Hexo.tryCatcher (/home/finisky/node_modules/bluebird/js/release/util.js:16:23)
  at Hexo.<anonymous> (/home/finisky/node_modules/bluebird/js/release/method.js:15:34)
  at /home/finisky/node_modules/hexo/lib/hexo/render.js:75:22
  at tryCatcher (/home/finisky/node_modules/bluebird/js/release/util.js:16:23)
  at Promise._settlePromiseFromHandler (/home/finisky/node_modules/bluebird/js/release/promise.js:547:31)
  at Promise._settlePromise (/home/finisky/node_modules/bluebird/js/release/promise.js:604:18)
  at Promise._settlePromiseCtx (/home/finisky/node_modules/bluebird/js/release/promise.js:641:10)
  at _drainQueueStep (/home/finisky/node_modules/bluebird/js/release/async.js:97:12)
  at _drainQueue (/home/finisky/node_modules/bluebird/js/release/async.js:86:9)
  at Async._drainQueues (/home/finisky/node_modules/bluebird/js/release/async.js:102:5)
  at Immediate.Async.drainQueues [as _onImmediate] (/home/finisky/node_modules/bluebird/js/release/async.js:15:14)
  at processImmediate (node:internal/timers:464:21)

} Something's wrong. Maybe you can find the solution here: %s https://hexo.io/docs/troubleshooting.html

阅读全文 »

之前我们谈到如何 从私有代码库自动部署Hugo站到GitHub Pages 。以为将之前的workflow yaml修改为Hexo的版本非常容易,亲自试了下发现打脸了。原因在于Hexo的依赖很多,因此环境配置比Hugo就复杂很多,同时还兼有各种包和库的兼容性问题。相比之下,Hugo就显得非常干净,使用GitHub Action容易不少。

花了好多时间并且尝试不了下20次,才将Hexo的action workflow最终调通 :-) ,记录下踩过的坑和解决文案。

与Hugo的workflow相比,需要解决如下几个问题:

  • 主题目录的submodule配置
  • 使用PAT同时拉取两个私有库(主库及主题submodule)的代码
  • Pandoc在GitHub Action中的安装(可选)

开始吧!

阅读全文 »

Last time we talked how to deploy Hugo from private repository to GitHub Pages . I thought it is trivial to modify the workflow yaml to make it works for Hexo. However, it is much more complicated than I ever thought. The reason is that Hexo has to setup more dependencies with compatibility issues while Hugo is relatively self-contained and clean.

Actually, I spent several hours and attempted more than 20 times to make the workflow yaml works. :-)

Compared to Hugo workflow, there are several issues to be resolved:

  • Setup private submodules for your themes
  • Configure PAT to pull private submodules as well as your main repo
  • Pandoc installation for mathjax (optional)

Let's start!

阅读全文 »

NLP adapters主要想解决不同任务需要finetune整个模型的痛点,与Prompting一样,是一种轻量级的训练方法,也是Transfer Learning的应用。按出现时间来看,finetune早于adapters,adapters早于prompting。今天来重看这篇Adapters的文章,可以更好地理解lightweight finetune的发展过程。

Adapters在NLP中的应用源于这篇 ICML2019 文章:Parameter-Efficient Transfer Learning for NLP adapters

Adapter-based tuning requires training two orders of magnitude fewer parameters to fine-tuning, while attaining similar performance.

阅读全文 »

Hugo是个极好的静态网站生成器。一个常见的情况是原始的网站源码放在私有代码库中,但希望自动化构建和部署的功能。假设私有代码库托管在github上,希望能自动化部署到GitHub Pages,这个功能可以通过github actions轻松搞定。

开始之前,假设我们已有了两个repo(可以隶属于不同的github账户):

  • 私有库: 存放网站的源码
  • 目标库: GitHub Pages (xxx.github.io)
阅读全文 »

Hugo is a nice static site generator. A common scenario is to store your website source code in a private repository and serve it on GitHub Pages. Can we leverage the github actions to automatically build and deploy the site from the private repository to GitHub Pages? The answer is absolutely yes!

Before we start, you need to have two repos (can belong to different github accounts):

  • Source Repo: the repo to store
  • Target Repo: host the GitHub Pages (xxx.github.io)
阅读全文 »

My first understanding of a language model is originated from n-gram. When I know RNNLM, I have a question: why a neural network can represent a language model?

After some research, I found the answer. Essentially, language model is a probability distribution:

A statistical language model is a probability distribution over sequences of words.

阅读全文 »

最初对语言模型的理解源于n-gram语言模型,但后来出现了RNNLM等一众神经网络语言模型,就有了这个疑问:神经网络为什么可以表示语言模型?

首先,语言模型本质上是概率分布

A statistical language model is a probability distribution over sequences of words.

阅读全文 »

# Scaling Laws for Neural Language Models

一篇实验Paper,调研了神经网络语言模型交叉熵损失变化满足power-law定律,挺有意思的文章。Transformer之后有许多探索不同模型结构的文章,并在一些任务上取得了新的SOTA,却鲜有人考虑影响模型性能的主要因素是什么。

Throughout we will observe precise power-law scalings for performance as a function of training time, context length, dataset size, model size, and compute budget.

阅读全文 »
0%