Finisky Garden

Hexo Generate Wrong Permalinks Date

Posted on 2022-06-11, updated on 2023-01-24, in Hexo

After setting up the workflow from Deploy Hexo From Private Repository to GitHub Pages, we ran into several issues: first the one described in GitHub Checkout Action Preserve File Modification Time, and now some posts' permalink dates shift by one day. For instance, if the date in the original markdown is 2020-07-13 00:50:05, the generated permalink date becomes 2020/07/12. Because the permalinks change, search engines can no longer find these posts at their old URLs, which hurts SEO.
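
The excerpt does not state the cause, but a one-day shift like this is the typical signature of a time zone mismatch between the machine where the post was written and the machine that builds the site. The following is a minimal sketch of that effect only, not a confirmed diagnosis: the UTC+8 author time zone, the UTC build machine, and the helper name `permalink_date` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def permalink_date(front_matter_date: str, author_tz: timezone, build_tz: timezone) -> str:
    """Return the YYYY/MM/DD permalink prefix as seen from build_tz (hypothetical helper)."""
    # Interpret the naive front-matter timestamp in the author's time zone.
    local = datetime.strptime(front_matter_date, "%Y-%m-%d %H:%M:%S").replace(tzinfo=author_tz)
    # Re-express the same instant in the build machine's time zone.
    return local.astimezone(build_tz).strftime("%Y/%m/%d")


UTC8 = timezone(timedelta(hours=8))  # assumed author time zone

# Built in the author's own time zone: the date is unchanged.
print(permalink_date("2020-07-13 00:50:05", UTC8, UTC8))          # 2020/07/13
# Built on a machine running in UTC (assumption): the date slips back a day.
print(permalink_date("2020-07-13 00:50:05", UTC8, timezone.utc))  # 2020/07/12
```

Only posts written shortly after midnight local time would be affected under this assumption, which matches the "some posts" wording above.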

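Hexo Generates Wrong Permalink Dates (Chinese)

Posted on 2022-06-11, in Hexo

After setting up automatic deployment as described in Deploy Hexo From Private Repository to GitHub Pages, one problem after another has cropped up: first the posts' last-edited times were wrong, and now the permalink dates of some pages are off by one day. For example, the markdown says 2020-07-13 00:50:05, but the generated permalink becomes 2020/07/12. This error can prevent search engines from finding the old pages, which affects how the site appears in search results.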

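What Makes an Excellent Campus Recruiting Candidate

Posted on 2022-06-06, updated on 2022-06-07, in Coding Interview

Having interviewed hundreds of candidates, I know how hard hiring is, and hiring the right person is harder still. At the same time, as the saying goes, "a good bird chooses its tree to roost in": finding a job you are happy with is not easy either. Because the job responsibilities differ, the bar for experienced hires is quite different from the bar for campus hires; that is a topic for another post. Today, from an interviewer's perspective, let's talk about what makes an excellent campus recruiting candidate for a technical R&D role.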

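Deep Text Retrieval Models: DPR, PolyEncoders, DCBERT, ColBERT

Posted on 2022-06-03, updated on 2024-07-23, in Machine Learning

Text matching and retrieval is a classic NLP problem that studies the semantic similarity between two texts; it is typically used in the recall stage of a retrieval system. Traditional recall methods such as tf-idf and BM25 are fast but fall short on semantic matching. With the development of pretrained models, using deep models for text retrieval has become both necessary and feasible.

The central tension in retrieval with deep models is the trade-off between retrieval quality and speed. This post introduces and compares the main ideas of several classic text retrieval works: DPR, Poly-Encoders, DC-BERT, and ColBERT.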

[EMNLP2020] Dense Passage Retrieval for Open-Domain Question Answering

[ICLR2020] Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

[SIGIR2020] DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding

[SIGIR2020] ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
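
To make the quality/speed trade-off concrete, here is a minimal sketch of the two scoring styles at the extremes of this list: a DPR-style dual encoder scores a pair with a single inner product between two precomputed vectors, while ColBERT-style late interaction keeps one vector per token and sums, over query tokens, the best match among document tokens (MaxSim). The toy vectors and function names below are illustrative assumptions; real systems obtain these embeddings from BERT encoders, which are omitted here.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dual_encoder_score(query_vec: Vector, passage_vec: Vector) -> float:
    """DPR-style scoring: one vector per text, relevance is a single inner product."""
    return dot(query_vec, passage_vec)


def late_interaction_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """ColBERT-style MaxSim: each query token takes its best-matching
    document token, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Hand-written toy embeddings (not produced by any real encoder).
print(dual_encoder_score([1.0, 2.0], [3.0, 0.5]))  # 4.0

query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[0.5, 0.0], [0.0, 2.0], [1.0, 1.0]]
print(late_interaction_score(query_tokens, doc_tokens))  # 3.0
```

In both styles the document side can be encoded offline; the dual encoder stores one vector per passage, whereas late interaction stores one per token and pays more at query time for finer-grained matching.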

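GitHub Checkout Action: Restoring File Modification Time (Chinese)

Posted on 2022-05-15, in Hexo

In Deploy Hexo From Private Repository to GitHub Pages, we used GitHub Actions to deploy the Hexo site automatically. One problem remains: after every deployment, the modification time of every post becomes the current time rather than the actual modification time. As a result, all historical posts appear to change with each deployment, which can lead search engines to believe the site is modified frequently.

Analysis shows that Hexo uses the file modification time as a post's last-edited time, but git, by design, does not preserve file modification times. After checkout, the modification time of every markdown file is the current time.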

GitHub Checkout Action Preserve File Modification Time

Posted on 2022-05-15, updated on 2023-01-24, in Hexo

In Deploy Hexo From Private Repository to GitHub Pages, we used GitHub Actions to deploy the Hexo website automatically. However, after each deployment, every post's edit time changes to the current time instead of its actual modification time. This can mislead search engines into treating the website as frequently modified.

By default, Hexo uses a post file's modification time as its edit time. By design, git does not preserve file modification times. After the checkout action runs, every file's modification time is the time of the checkout.
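
One common workaround, sketched here under assumptions rather than taken from the post itself, is to reset each file's modification time to the time of the last commit that touched it. The sketch uses `git log -1 --format=%ct -- <path>` to read the committer timestamp and `os.utime` to apply it; the helper names are hypothetical, and the lookup is injectable so the logic can be exercised without a repository.

```python
import os
import subprocess
from typing import Callable, Iterable


def last_commit_time(path: str) -> int:
    """Unix timestamp of the last commit touching `path`.

    Raises ValueError if git reports no commit for the file.
    """
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return int(out)


def restore_mtimes(
    paths: Iterable[str],
    commit_time: Callable[[str], int] = last_commit_time,
) -> None:
    """Set access and modification time of each file to its commit time."""
    for path in paths:
        ts = commit_time(path)
        os.utime(path, (ts, ts))


# Example call (run inside a git work tree; the path is hypothetical):
# restore_mtimes(["source/_posts/test.md"])
```

This only gives meaningful results when the full history is available. With a shallow clone (the checkout action's `fetch-depth` defaults to 1), every file resolves to the same most recent commit, so the workflow would need `fetch-depth: 0` for this approach.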

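A Quick Read of LoRA: Low-Rank Adaptation of Large Language Models

Posted on 2022-05-13, updated on 2023-05-09, in Machine Learning

Previously we discussed Adapters and Prompting, both of which are lightweight training methods (so-called lightweight fine-tuning). Today let's look at another lightweight method for training large language models:

LoRA: Low-Rank Adaptation of Large Language Models

Fine-tuning large language models for specific domains and tasks is one of the important topics in natural language processing. But as models keep growing, fine-tuning all of a model's parameters (so-called full fine-tuning) becomes less and less feasible. Taking GPT-3 with its 175B parameters as an example, every new domain requires fully fine-tuning a new model, which is very costly: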

An important paradigm of natural language processing consists of large-scale pretraining on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example – deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive.
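
As a rough illustration of the low-rank idea named in the title, the sketch below freezes a weight matrix W0 and adds a trainable update B·A of rank r, so the layer computes h = W0·x + scale·B·(A·x). It is a toy forward pass in plain Python under assumed dimensions, not the paper's implementation; the names `lora_forward` and `trainable_params` are hypothetical, and `scale` stands in for the paper's α/r factor.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix m (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: List[float], scale: float = 1.0) -> List[float]:
    """h = W0 x + scale * B (A x); W0 stays frozen, only A and B would be trained."""
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    return [h + scale * u for h, u in zip(base, update)]


def trainable_params(d: int, k: int, r: int) -> int:
    """Parameters in B (d x r) plus A (r x k)."""
    return r * (d + k)


d, k, r = 4, 3, 2                      # toy sizes: W0 is d x k, rank r
rng = random.Random(0)
w0 = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]
a = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # Gaussian init
b = [[0.0] * r for _ in range(d)]                               # zero init
x = [1.0, 2.0, 3.0]

# With B initialised to zero, the adapted layer starts out identical to W0.
print(lora_forward(w0, a, b, x) == matvec(w0, x))   # True

# Full fine-tuning vs. rank-8 update for a 4096 x 4096 matrix.
print(4096 * 4096, trainable_params(4096, 4096, 8))  # 16777216 65536
```

The second printout shows where the savings come from: the update needs r·(d + k) parameters instead of d·k, and only the small A and B matrices have to be stored per task.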

'pandoc exited with code 64' Solution

Posted on 2022-05-04, updated on 2023-01-24, in Hexo

After I upgraded pandoc from 2.14.0.3 to 2.18, hexo-renderer-pandoc could no longer render one of my posts correctly. Everything worked fine with 2.14.0.3. The error looks like:

INFO Start processing
FATAL {
  err: Error: [ERROR][hexo-renderer-pandoc] On /home/finisky/source/_posts/test.md
  [ERROR][hexo-renderer-pandoc] pandoc exited with code 64: YAML parse exception at line 4, column 0, while scanning a simple key: could not find expected ':'

  at Hexo.pandocRenderer (/home/finisky/node_modules/hexo-renderer-pandoc/index.js:114:11)
  at Hexo.tryCatcher (/home/finisky/node_modules/bluebird/js/release/util.js:16:23)
  at Hexo.<anonymous> (/home/finisky/node_modules/bluebird/js/release/method.js:15:34)
  at /home/finisky/node_modules/hexo/lib/hexo/render.js:75:22
  at tryCatcher (/home/finisky/node_modules/bluebird/js/release/util.js:16:23)
  at Promise._settlePromiseFromHandler (/home/finisky/node_modules/bluebird/js/release/promise.js:547:31)
  at Promise._settlePromise (/home/finisky/node_modules/bluebird/js/release/promise.js:604:18)
  at Promise._settlePromiseCtx (/home/finisky/node_modules/bluebird/js/release/promise.js:641:10)
  at _drainQueueStep (/home/finisky/node_modules/bluebird/js/release/async.js:97:12)
  at _drainQueue (/home/finisky/node_modules/bluebird/js/release/async.js:86:9)
  at Async._drainQueues (/home/finisky/node_modules/bluebird/js/release/async.js:102:5)
  at Immediate.Async.drainQueues [as _onImmediate] (/home/finisky/node_modules/bluebird/js/release/async.js:15:14)
  at processImmediate (node:internal/timers:464:21)

} Something's wrong. Maybe you can find the solution here: %s https://hexo.io/docs/troubleshooting.html

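Deploy Hexo From Private Repository to GitHub Pages (Chinese)

Posted on 2022-05-02, updated on 2022-08-29, in Hexo

Previously we discussed how to automatically deploy a Hugo site from a private repository to GitHub Pages. I assumed that adapting the earlier workflow YAML to Hexo would be easy, but trying it myself proved me wrong. The reason is that Hexo has many dependencies, so its environment setup is much more complicated than Hugo's, and it also comes with compatibility issues among various packages and libraries. By comparison, Hugo is very clean and considerably easier to use with GitHub Actions.

It took a lot of time and no fewer than 20 attempts to finally get the Hexo action workflow working :-) , so here is a record of the pitfalls I hit and how I solved them.

Compared with the Hugo workflow, the following issues need to be resolved:

Configuring the theme directory as a submodule
Using a PAT to pull code from two private repositories at once (the main repo and the theme submodule)
Installing Pandoc in GitHub Actions (optional)

Let's start!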

Deploy Hexo From Private Repository to GitHub Pages

Posted on 2022-05-02, updated on 2023-01-24, in Hexo

Last time we talked about how to deploy Hugo from a private repository to GitHub Pages. I thought it would be trivial to modify the workflow YAML to make it work for Hexo. However, it turned out to be much more complicated than I expected. The reason is that Hexo has to set up more dependencies, which come with compatibility issues, while Hugo is relatively self-contained and clean.

Actually, I spent several hours and more than 20 attempts to get the workflow YAML working. :-)

Compared to the Hugo workflow, there are several issues to resolve:

Set up private submodules for your themes
Configure a PAT to pull private submodules as well as your main repo
Install Pandoc for MathJax (optional)

Let's start!
