Finisky Garden

NLP, Software Engineering, Product Design

RAG re-retrieves, re-assembles, and re-reasons on every query. Ask something that requires synthesizing five documents and the model has to find all five, stitch them together, and derive the answer from scratch. Ask ten times, retrieve ten times. Nothing accumulates.

Karpathy recently posted a gist called LLM Wiki proposing a different approach: instead of retrieving at query time, have the LLM pre-compile knowledge into a structured wiki and query the compiled result.

Read more »

Dense retrieval has become the default first stage in RAG pipelines. Encode documents into vectors, encode queries into vectors, compute cosine similarity, done. But a basic question rarely gets asked: for a d-dimensional embedding, how many distinct top-k retrieval results can it actually represent?
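
That three-step pipeline is easy to sketch. The toy encoder below (a character-sum bag-of-words hash) is a deterministic stand-in for a real embedding model, not an approximation of any actual system:

```python
import math

def embed(text: str, dim: int = 4) -> list[float]:
    """Stand-in encoder: hashes each word into one of `dim` buckets.
    A real system would use a learned d-dimensional embedding model."""
    v = [0.0] * dim
    for word in text.lower().split():
        v[sum(ord(c) for c in word) % dim] += 1.0
    return v

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by cosine similarity to the query; return the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

The paper's question is about this last function: over all possible embeddings, how many distinct top-k result sets can `top_k` ever produce when the vectors have only d dimensions?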

An ICLR 2026 paper from Google DeepMind and JHU, "On the Theoretical Limitations of Embedding-Based Retrieval", gives a mathematical answer: not enough. Not even close.

Read more »

Justin Sun (the crypto guy) recently dropped a hot take: "It's 2026 already — if you can talk to AI, stop talking to humans." He also said something about deleting contacts born before 1990 and WeChat being for old people. Classic Justin Sun. Take it with a grain of salt.

But strip away the absurd parts, and "talk to AI more" is something I actually agree with as a heavy user. Here's why, and where it falls apart.

Read more »

There's a module inside Claude Code called autoDream. Its prompt title reads "Dream: Memory Consolidation."

This isn't a metaphor. Claude Code actually spins up a background sub-agent that reviews transcripts from past sessions, consolidates scattered memories — merging, deduplicating, correcting — and writes them back to disk. The whole thing is invisible unless you dig into the background task list.
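
autoDream's internals aren't public, but the merge/dedupe/correct loop it describes can be sketched abstractly. Every name and data shape below is invented for illustration:

```python
# Toy sketch of session-memory consolidation: merge entries from several
# sessions, drop duplicates, and let newer observations override older ones
# (a crude stand-in for the "correcting" step). Purely illustrative.

def consolidate(sessions: list[list[dict]]) -> list[dict]:
    """Flatten memories from sessions given in chronological order.
    Entries sharing a key are deduplicated; the latest value wins."""
    merged: dict[str, str] = {}
    for session in sessions:
        for entry in session:
            merged[entry["key"]] = entry["value"]  # later overrides earlier
    return [{"key": k, "value": v} for k, v in merged.items()]
```

The "write back to disk" step would then serialize the consolidated list, replacing the scattered per-session records.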

Read more »

Tech job boards in 2025 are schizophrenic. Traditional software engineering roles are shrinking. "AI"-prefixed positions are expanding. Same company, same quarter — cutting junior devs and project managers on one side, opening Agent orchestration engineers and AI application architects on the other.

Read more »
