Finisky Garden

NLP, Software Engineering, Product Design

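Over the National Day holiday in early October I went to London. I really liked this city, with its long history and rich culture, and the blend of the classical and the modern left a deep impression on me. Time was limited, so I only visited a few of the most iconic landmarks: the British Museum, Westminster Abbey, the University of Oxford, the National Gallery and Hyde Park.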

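Before the trip I did some research on Xiaohongshu, which came down to two points: get a contactless card in advance, and be prepared for poor public safety. Everything else was routine: buy travel insurance two days before departure, and buy a SIM card on Taobao ahead of time (I used giffgaff, and the signal was decent).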

What the trip was like:

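  • A contactless card is essential. I did not spend a single penny in cash and brought all of it back home.
  • Overall, public safety was acceptable and certainly not as bad as Xiaohongshu makes it sound; local friends told us the city is actually very safe. That said, a friend traveling with us did have someone try to snatch their phone. The phone was not taken, but they were slightly injured.
  • The food was quite good, so there is no need to bring instant noodles. The full English breakfast, British dishes, Indian food and Italian food were all tasty; London is not the "food desert" it is rumored to be.
  • Hotels are rather expensive.
  • Shopping is not great. It is nowhere near as good as in the US or Japan, and many brands cost more than in China.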

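The weather deserves special mention. October is no longer peak tourist season in London; temperatures hovered around 10°C, and it was damp and chilly. A local friend specifically told us to wear something waterproof, in other words a hardshell jacket. Once there, I understood why: the rain is an endless drizzle. It is never heavy, but it starts without warning and often comes with strong wind. A raincoat is a hassle, an umbrella is impossible to hold steady, and wind-driven drizzle soaks your clothes all the same.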


Many popular apps today, such as Twitter, WhatsApp and Facebook, include social networking features. These platforms must scale to billions of users (graph nodes), which is no small feat. Building and maintaining a scalable social network infrastructure requires careful planning and deliberate data modeling. In fact, dedicated social networking products like Facebook have entire teams focused solely on squeezing every last bit of performance out of this layer. For smaller apps or startup projects that want to add social features, however, building a dedicated team for such an architecture is usually impractical and unnecessary.

So, can the right data modeling and storage deliver a high-performance, scalable social network? The answer is yes. Early versions of Facebook used MySQL as the underlying store for their social graph, but today there is a more capable and efficient option: MongoDB.
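To make "data modeling" concrete, here is a minimal sketch of one common way to store a follow graph in a document database. The schema is an assumption for illustration, not necessarily the one the full post uses: each follow relationship is its own document in a hypothetical `follows` collection. A plain Python list stands in for the collection so the logic runs without a server. The comments name the pymongo calls (`insert_one`, `delete_one`, `find`, `create_index`) that would play the same role against a real MongoDB collection.

```python
from dataclasses import dataclass, field


@dataclass
class FollowGraph:
    """In-memory stand-in for a hypothetical MongoDB `follows` collection.

    Each edge is one document: {"follower": <user id>, "followee": <user id>}.
    Against a real collection, the two lookups below would be backed by indexes, e.g.
      follows.create_index([("follower", 1), ("followee", 1)], unique=True)
      follows.create_index([("followee", 1), ("follower", 1)])
    """

    edges: list[dict] = field(default_factory=list)

    def follow(self, follower: str, followee: str) -> None:
        edge = {"follower": follower, "followee": followee}
        # Roughly follows.insert_one(edge); the unique index would reject duplicates.
        if edge not in self.edges:
            self.edges.append(edge)

    def unfollow(self, follower: str, followee: str) -> None:
        # Roughly follows.delete_one({"follower": follower, "followee": followee}).
        target = {"follower": follower, "followee": followee}
        self.edges = [e for e in self.edges if e != target]

    def following(self, user: str) -> list[str]:
        # Roughly follows.find({"follower": user}).
        return [e["followee"] for e in self.edges if e["follower"] == user]

    def followers(self, user: str) -> list[str]:
        # Roughly follows.find({"followee": user}).
        return [e["follower"] for e in self.edges if e["followee"] == user]

    def mutuals(self, user: str) -> set[str]:
        # Users who both follow `user` and are followed by `user`.
        return set(self.following(user)) & set(self.followers(user))


if __name__ == "__main__":
    graph = FollowGraph()
    graph.follow("alice", "bob")
    graph.follow("bob", "alice")
    graph.follow("alice", "carol")
    print(graph.following("alice"), graph.followers("alice"), graph.mutuals("alice"))
```

Storing one document per edge keeps each write small and avoids unbounded arrays inside a user document, which matters once a popular account gains millions of followers. The trade-off is that reading a user's follower list becomes an indexed query instead of a single document fetch.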


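Many apps today involve social networking, such as Twitter, WhatsApp and Facebook. These platforms must scale to billions of users (graph nodes), which is no easy task. Building and maintaining a scalable social network infrastructure requires careful planning and deliberate data modeling. In fact, dedicated social networking products like Facebook have specialized teams working on this area and optimizing its performance to the extreme. For the many small apps that want to add social features, such as a startup project, building a team for this kind of architecture is clearly neither realistic nor necessary.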

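So, can the right data modeling and storage produce a high-performance, easily scalable social network? The answer is yes. Early Facebook used MySQL as the underlying store for its social network, but today we have a better and more efficient storage choice: MongoDB.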


After upgrading packages in a conda environment, talib no longer accepted a DataFrame as input, failing with the following error: TypeError: Argument 'xxx' has incorrect type (expected numpy.ndarray, got DataFrame)

Traceback (most recent call last):
  File "/data/1.py", line 7, in <module>
    df['SMA_5'] = ta.SMA(df['Close'], timeperiod=5)
  File "/data/miniconda3/envs/a/lib/python3.10/site-packages/talib/__init__.py", line 64, in wrapper
    result = func(*_args, **_kwds)
TypeError: Argument 'real' has incorrect type (expected numpy.ndarray, got DataFrame)

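Most web search results are misleading, for example suggesting that the df be converted to a NumPy array. Since the code worked before the packages were updated, the problem is most likely a package incompatibility.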


After upgrading packages in a conda env, talib no longer accepts a DataFrame as input; the error message looks like TypeError: Argument 'xxx' has incorrect type (expected numpy.ndarray, got DataFrame):

Traceback (most recent call last):
  File "/data/1.py", line 7, in <module>
    df['SMA_5'] = ta.SMA(df['Close'], timeperiod=5)
  File "/data/miniconda3/envs/a/lib/python3.10/site-packages/talib/__init__.py", line 64, in wrapper
    result = func(*_args, **_kwds)
TypeError: Argument 'real' has incorrect type (expected numpy.ndarray, got DataFrame)

Most web search results are misleading, for example suggesting that the df be converted to a NumPy array. Since the code worked before the packages were updated, the problem is most likely a package incompatibility.
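A quick diagnostic worth trying: the traceback says talib received a DataFrame for `df['Close']`, whereas a single-column selection normally yields a Series. In pandas, one common way for `df['Close']` to return a DataFrame is MultiIndex (or duplicated) column labels. The sketch below checks for that case and reduces the selection to one Series. It is an assumption-based check, not necessarily the root cause the full post identifies. The helper name `close_as_series` and the ticker-style `("Close", "AAPL")` columns are hypothetical.

```python
import numpy as np
import pandas as pd


def close_as_series(df: pd.DataFrame, column: str = "Close") -> pd.Series:
    """Return df[column] as a 1-D Series (hypothetical helper)."""
    selected = df[column]
    if isinstance(selected, pd.DataFrame):
        # With MultiIndex columns such as ("Close", "AAPL"), or duplicated labels,
        # df["Close"] returns a DataFrame, which talib rejects.
        if selected.shape[1] != 1:
            raise ValueError(f"{column!r} selects {selected.shape[1]} columns; pick one explicitly")
        selected = selected.iloc[:, 0]
    return selected


# Hypothetical frame whose columns are a (field, ticker) MultiIndex.
columns = pd.MultiIndex.from_tuples([("Close", "AAPL"), ("Open", "AAPL")])
df = pd.DataFrame(np.arange(20, dtype=float).reshape(10, 2), columns=columns)

print(type(df["Close"]).__name__)  # DataFrame
close = close_as_series(df)
print(type(close).__name__)  # Series
# With talib installed, ta.SMA(close, timeperiod=5) would now receive a Series.
```

If `type(df['Close'])` already prints `Series` in your environment, this is not the cause, and the package-version angle remains the thing to investigate.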


Today I ran into a strange issue in Windows 11: the D drive was visible in Disk Management but not in File Explorer. I tried many solutions found online, such as updating the driver in Device Manager, disabling and re-enabling the device, deleting and recreating the partition with diskpart, changing the volume label and changing the drive letter, but none of them worked.

Problem Description

  • A newly created D drive was visible in Disk Management (diskmgmt.msc), and everything looked normal. It could even be opened in File Explorer, although it did not appear in the left navigation pane.
  • The D drive otherwise worked normally, for example from the command line.
  • Changing the drive letter to "E" or any other letter made the drive appear in File Explorer, but switching it back to "D" made it disappear again.

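Today I ran into a strange situation in Win11: the D drive showed up as available in Disk Management but was invisible in File Explorer. I tried many solutions found online, such as updating the driver in Device Manager, disabling and re-enabling the device, deleting and recreating the partition with diskpart, and changing the volume label or drive letter, but none of them worked.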

Problem Description

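  • A newly created D drive was visible in Disk Management (diskmgmt.msc), and everything looked normal. It could even be opened in File Explorer, although the drive did not appear in the left pane.
  • The D drive worked normally, for example from the command line.
  • Changing the drive letter to "E" or any other letter made it visible in File Explorer, but changing it back to "D" made it disappear again.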

Use the Azure CLI (az) to query multiple fields of VM information. This requires a query written in the JMESPath language.

Typically, we use az vm show to get detailed VM information:

$ az vm show -g Linux -n alpha -d -o table
Name    ResourceGroup    PowerState    PublicIps     Fqdns    Location    Zones
------  ---------------  ------------  ------------  -------  ----------  -------
alpha   Linux            VM running    11.1.111.111           eastasia    1
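The usual way to pick several fields at once is the CLI's global `--query` argument with a JMESPath multiselect hash such as `{Name: name, PowerState: powerState}`. The minimal Python sketch below builds that command and parses the JSON result. It assumes `az` is installed and logged in, and that `az vm show -d` exposes `name`, `powerState`, `publicIps`, `location` and `hardwareProfile.vmSize`, as the table above suggests for most of them. The helper names are hypothetical.

```python
import json
import shlex
import subprocess


def build_vm_query(fields: dict[str, str]) -> str:
    """Turn {"Alias": "json.path"} into a JMESPath multiselect hash."""
    return "{" + ", ".join(f"{alias}: {path}" for alias, path in fields.items()) + "}"


def vm_show_command(group: str, name: str, fields: dict[str, str]) -> list[str]:
    # -d (--show-details) adds runtime fields such as powerState and publicIps.
    return ["az", "vm", "show", "-g", group, "-n", name, "-d",
            "--query", build_vm_query(fields), "-o", "json"]


def query_vm(group: str, name: str, fields: dict[str, str]) -> dict:
    """Run the command (requires a logged-in az CLI) and parse its JSON output."""
    result = subprocess.run(vm_show_command(group, name, fields),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


FIELDS = {
    "Name": "name",
    "PowerState": "powerState",
    "PublicIps": "publicIps",
    "Location": "location",
    "Size": "hardwareProfile.vmSize",
}

if __name__ == "__main__":
    # Print the shell command instead of running it.
    print(shlex.join(vm_show_command("Linux", "alpha", FIELDS)))
```

Replacing `-o json` with `-o table` should print the selected fields as columns, similar to the table above.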

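Evaluating and tuning search systems relies heavily on relevance labels, which record whether a document is useful for a particular search and searcher. Ideally these labels come from real searchers, but such data is very hard to collect at scale. Typical experiments therefore rely on third-party annotators, who can also produce inaccurate labels. Label quality is generally managed through ongoing auditing, training and monitoring.

At SIGIR'24, Microsoft (the Bing search team) proposed an approach that works the other way around: collect feedback from real users, use it to select an LLM and prompt whose labels agree with that feedback, and then have the LLM produce labels at scale. Experiments show that the LLM's accuracy is on par with human annotators, and that its labels are just as useful for identifying the best systems and the hardest queries.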

[SIGIR2024] Large Language Models can Accurately Predict Searcher Preferences


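Large language models have shown remarkable zero-shot generalization across a wide range of language-related tasks, including search. However, existing work mainly uses LLMs' generative abilities for information retrieval rather than for ranking passages directly. This EMNLP 2023 paper (Outstanding Paper) investigates whether LLMs are good at search ranking.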

Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents
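In this paper the LLM re-ranks by emitting a permutation of the candidate passages. When there are more candidates than fit in one prompt, a window slides over the list from back to front, so strong passages move forward window by window. A minimal sketch of that loop follows. `rank_window` is a hypothetical stub that sorts by made-up relevance scores in place of a real LLM call, and the window and step sizes are toy values.

```python
from typing import Callable, Sequence


def sliding_window_rerank(
    passages: Sequence[str],
    rank_window: Callable[[list[str]], list[str]],
    window_size: int = 4,
    step: int = 2,
) -> list[str]:
    """Re-rank passages by permuting overlapping windows, back to front."""
    if not 0 < step < window_size:
        raise ValueError("need 0 < step < window_size so consecutive windows overlap")
    order = list(passages)
    end = len(order)
    start = max(0, end - window_size)
    while True:
        # One "LLM call": return a permutation of the passages in this window.
        order[start:end] = rank_window(order[start:end])
        if start == 0:
            break
        end -= step
        start = max(0, end - window_size)
    return order


# Hypothetical relevance scores standing in for the LLM's judgement.
SCORES = {"a": 1, "b": 5, "c": 2, "d": 8, "e": 3, "f": 7, "g": 4, "h": 6}


def stub_rank_window(window: list[str]) -> list[str]:
    return sorted(window, key=lambda p: SCORES[p], reverse=True)


if __name__ == "__main__":
    print(sliding_window_rerank(list("abcdefgh"), stub_rank_window))
```

Because consecutive windows share `window_size - step` positions, the best passage in the list is carried to the front in a single pass. Passages further down are only partially sorted, which is acceptable when only the top results matter.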
