0%

After Deploy Hexo From Private Repository to GitHub Pages, we encounter many issues: GitHub Checkout Action Preserve File Modification Time, and now some posts' permalinks date may shift one day. For instance, assume the original markdown date is 2020-07-13 00:50:05, the generated permalinks date becomes 2020/07/12. Since the permalinks changed, search engines will regard these posts are not found which impact the SEO performance.

Read more »

作为面试官面试过数百候选人,深知招人难,招合适的人更难。同时,所谓“良禽择木而栖”,找一份自己满意的工作也并非易事。社招由于岗位职责的不同,与校招的标准有较大区别,下回分解。今天我们从面试官的角度来聊聊,对于技术研发岗,什么是一个优秀的校招候选人。

Read more »

文本匹配与检索是NLP中的经典问题,主要研究两个文本的主义相似度,通常用在检索系统的召回阶段。传统的召回方案如tf-idf和BM25具有速度优势,但在语义匹配方面有所欠缺。随着预训练模型的发展,使用深度模型进行文本检索变得必要与可行。

使用深度模型进行检索,主要矛盾是检索性能与速度的平衡。 本文对几篇经典的文本检索模型工作DPR, Poly-Encoders, DC-BERT 与 ColBERT 的主要思想进行介绍与对比。

[EMNLP2020] Dense Passage Retrieval for Open-Domain Question Answering

[ICLR2020] Poly-encoders:Architectures and Pre training Strategies for Fast and Accurate Multi sentence Scoring

[SIGIR2020] DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding

[SIGIR2020] ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

Read more »

# 从私有代码库自动部署Hexo站到GitHub Pages, 我们用GitHub Action实现了自动化部署Hexo站。但还存在一个问题,在每次部署后所有文章的修改时间都变成了当前时间,而非实际的修改时间。这样的问题在于所有历史文章在每次部署之后都会发生变化,会让搜索引擎误认为这个网站时常改动。

经过分析发现,Hexo正是使用文件修改时间作为文章的最后编辑时间,但 git从设计上就不保留文件的修改时间 。在checkout之后,所有markdown文件的修改时间都变成了当前时间。

Read more »

By # Deploy Hexo From Private Repository to GitHub Pages, we can leverage GitHub Actions to automatically deploy the Hexo website. However, for each deployment commit, the post's edit time will be changed to the current time instead of actual modification time. It may mislead the search engine to regard the website as a frequently modified site.

By default, Hexo uses the post file modification time as its edit time. By design, git doesn't preserve the file modification time (refer to this). After checkout action, the file modification time will be the current time.

Read more »

之前我们谈到 Adapters 与 Prompting 都是轻量级的训练方法,所谓 lightweight-finetuning。今天来看一下另一种轻量级训练大语言模型的方法:

LoRA: Low-Rank Adaptation of Large Language Models

首先来看finetune大规模语言模型的问题:

An important paradigm of natural language processing consists of large-scale pretraining on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example – deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive.

Read more »

After I upgraded pandoc from 2.14.0.3 to 2.18, hexo-renderer-pandoc cannot render one of my post correctly. Everything works fine in 2.14.0.3. The error looks like:

INFO Start processing FATAL { err: Error: [ERROR][hexo-renderer-pandoc] On /home/finisky/source/_posts/test.md [ERROR][hexo-renderer-pandoc] pandoc exited with code 64: YAML parse exception at line 4, column 0, while scanning a simple key: could not find expected ':'

  at Hexo.pandocRenderer (/home/finisky/node_modules/hexo-renderer-pandoc/index.js:114:11)
  at Hexo.tryCatcher (/home/finisky/node_modules/bluebird/js/release/util.js:16:23)
  at Hexo.<anonymous> (/home/finisky/node_modules/bluebird/js/release/method.js:15:34)
  at /home/finisky/node_modules/hexo/lib/hexo/render.js:75:22
  at tryCatcher (/home/finisky/node_modules/bluebird/js/release/util.js:16:23)
  at Promise._settlePromiseFromHandler (/home/finisky/node_modules/bluebird/js/release/promise.js:547:31)
  at Promise._settlePromise (/home/finisky/node_modules/bluebird/js/release/promise.js:604:18)
  at Promise._settlePromiseCtx (/home/finisky/node_modules/bluebird/js/release/promise.js:641:10)
  at _drainQueueStep (/home/finisky/node_modules/bluebird/js/release/async.js:97:12)
  at _drainQueue (/home/finisky/node_modules/bluebird/js/release/async.js:86:9)
  at Async._drainQueues (/home/finisky/node_modules/bluebird/js/release/async.js:102:5)
  at Immediate.Async.drainQueues [as _onImmediate] (/home/finisky/node_modules/bluebird/js/release/async.js:15:14)
  at processImmediate (node:internal/timers:464:21)

} Something's wrong. Maybe you can find the solution here: %s https://hexo.io/docs/troubleshooting.html

Read more »

之前我们谈到如何 从私有代码库自动部署Hugo站到GitHub Pages 。以为将之前的workflow yaml修改为Hexo的版本非常容易,亲自试了下发现打脸了。原因在于Hexo的依赖很多,因此环境配置比Hugo就复杂很多,同时还兼有各种包和库的兼容性问题。相比之下,Hugo就显得非常干净,使用GitHub Action容易不少。

花了好多时间并且尝试不了下20次,才将Hexo的action workflow最终调通 :-) ,记录下踩过的坑和解决文案。

与Hugo的workflow相比,需要解决如下几个问题:

  • 主题目录的submodule配置
  • 使用PAT同时拉取两个私有库(主库及主题submodule)的代码
  • Pandoc在GitHub Action中的安装(可选)

开始吧!

Read more »

Last time we talked how to deploy Hugo from private repository to GitHub Pages . I thought it is trivial to modify the workflow yaml to make it works for Hexo. However, it is much more complicated than I ever thought. The reason is that Hexo has to setup more dependencies with compatibility issues while Hugo is relatively self-contained and clean.

Actually, I spent several hours and attempted more than 20 times to make the workflow yaml works. :-)

Compared to Hugo workflow, there are several issues to be resolved:

  • Setup private submodules for your themes
  • Configure PAT to pull private submodules as well as your main repo
  • Pandoc installation for mathjax (optional)

Let's start!

Read more »