A clever use of AI: Tsinghua built a tool for when words fail you

Tech · Updated 2022 · Source: QbitAI (量子位)

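Alas, I'm not well-read, and a single "(bleep)" gets me through every occasion. Friend, have you ever run into the same trouble? If so, here is a tool worth talking about.

How would you say "Listen to me say thank you; because of you, the four seasons are warm" with a chengyu (a Chinese idiom)? Type the meaning you want to express into the search box, select "idiom" in the part-of-speech field, and the AI immediately returns dozens or even hundreds of candidates. The darker a candidate's background color, the more strongly the system recommends it.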

 


If a result is unfamiliar, one click shows its definition.


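It isn't limited to Chinese input, either. Say you are about to blurt out "amazing" (rendered phonetically in Chinese internet slang as "鹅妹子嘤") and wonder whether there is a more polished Chinese expression: that, too, is one click away.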


How about that, convenient enough? It has a bit of a "Mom never has to worry about me being lost for words again" feel (doge).

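A "reverse dictionary" from Tsinghua

The tool is called WantWords, and it is a reverse dictionary.

The AI behind it has a strong pedigree: it comes from the Natural Language Processing and Social Humanities Computing Laboratory at Tsinghua University, and the project is advised by Professor Sun Maosong and Associate Professor Liu Zhiyuan. "Reverse" means that, unlike a conventional dictionary, you don't look up a word to find its meaning. Instead, you give the dictionary a description and it finds the word for you.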
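To make the forward/reverse distinction concrete, here is a minimal sketch. It is not how WantWords works (the real system is a neural model); it is a naive word-overlap matcher over a tiny made-up dictionary, and the names `TOY_DICTIONARY`, `forward_lookup`, and `reverse_lookup` are hypothetical.

```python
# Toy illustration only: a hypothetical three-entry dictionary and a naive
# word-overlap ranking, not the WantWords model.
TOY_DICTIONARY = {
    "grateful": "feeling or showing thanks to someone",
    "astonishing": "extremely surprising or impressive",
    "nostalgia": "a longing for a time in the past",
}


def forward_lookup(word):
    """Conventional dictionary: word in, definition out."""
    return TOY_DICTIONARY[word]


def reverse_lookup(description, top_k=2):
    """Reverse dictionary: description in, ranked candidate words out."""
    query = set(description.lower().split())
    scored = []
    for word, definition in TOY_DICTIONARY.items():
        overlap = len(query & set(definition.split()))
        scored.append((overlap, word))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop candidates that share no words with the description.
    return [word for overlap, word in scored[:top_k] if overlap > 0]


print(forward_lookup("grateful"))
print(reverse_lookup("showing thanks to someone"))
```

Exact word overlap breaks down as soon as the description uses different wording from the stored definition, which is one reason a practical reverse dictionary relies on learned representations instead.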


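On GitHub, the authors say they hope the reverse dictionary serves three purposes:

Resolving the "tip-of-the-tongue phenomenon," where a word is right there but you suddenly can't recall it

Helping learners of a new language

Helping patients with anomia, who cannot select the word they want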

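The core AI behind the reverse dictionary is called the Multi-channel Reverse Dictionary Model, and the paper describing it was accepted at AAAI 2020.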


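Specifically, the model uses a bidirectional LSTM (BiLSTM) with attention as its basic framework and adds four feature-specific predictors. Using several predictors to identify different features of the target word from the input query has two benefits. First, target words with poor-quality embeddings can still be picked out through their features. Second, words whose embeddings are close to the correct target word but whose features contradict the query can be filtered out.

In other words, the AI picks words more accurately.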
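The two effects can be sketched as a weighted sum of channel scores. This is an illustrative toy, assuming each channel yields a per-word score and the final ranking adds them with weights; the candidate words, the scores, the weights, and the function name `combine_channels` are all invented for illustration and are not taken from the paper or the WantWords code.

```python
# Hypothetical scores for the query "a young male human".
# Channel 1: similarity between the query encoding and each word embedding.
# "lad" stands in for a word with a poor-quality embedding.
embedding_score = {"boy": 0.80, "girl": 0.82, "lad": 0.30}

# Channel 2: score from a feature predictor, i.e. how well each word's
# features agree with the features predicted from the query.
feature_score = {"boy": 0.90, "girl": 0.20, "lad": 0.90}


def combine_channels(channels, weights):
    """Return words ranked by the weighted sum of their channel scores."""
    words = channels[0].keys()
    total = {
        w: sum(weight * ch[w] for ch, weight in zip(channels, weights))
        for w in words
    }
    return sorted(total, key=total.get, reverse=True)


# Embedding channel alone: "girl" narrowly outranks "boy".
print(combine_channels([embedding_score], [1.0]))
# With the feature channel: "girl" drops and the poorly embedded "lad" rises.
print(combine_channels([embedding_score, feature_score], [1.0, 1.0]))
```

In this toy, the feature channel demotes "girl" (close embedding, contradictory feature) and promotes "lad" (weak embedding, matching feature), which mirrors the two benefits described above.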

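To make it easier for the AI to find the truly "correct" word, the authors consider, in addition to two "internal" features of a word (part of speech and morphemes), two "external" features: a category hierarchy and sememes.

The hierarchy distinguishes whether a word denotes an entity or a concept, with entities further divided into many kinds of sub-entities.

A sememe, in linguistics, is the smallest semantic unit that cannot be divided further. Linguists hold that a sememe system applies to any language and is not tied to a particular one.

For example, the word "boy" can be represented by the three sememes "human," "male," and "child," while "girl" can be represented by the combination of "human," "female," and "child."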


△ Image source: HowNet
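The "boy"/"girl" example can be written down directly as sets. This is only a sketch of the idea, assuming a sememe annotation is modeled as a plain set of labels; the `SEMEMES` table and the `shared` and `differing` helpers are hypothetical and do not use the actual HowNet data or API.

```python
# Sememe sets for the two words given in the example.
SEMEMES = {
    "boy": {"human", "male", "child"},
    "girl": {"human", "female", "child"},
}


def shared(a, b):
    """Sememes the two words have in common."""
    return SEMEMES[a] & SEMEMES[b]


def differing(a, b):
    """Sememes that belong to exactly one of the two words."""
    return SEMEMES[a] ^ SEMEMES[b]


print(sorted(shared("boy", "girl")))     # ['child', 'human']
print(sorted(differing("boy", "girl")))  # ['female', 'male']
```

Two words that sit close together in embedding space can still be told apart by the one sememe in which they differ, which is the kind of contradiction a sememe predictor can exploit.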

New algorithm tested, related systems in development

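As mentioned above, the WantWords reverse dictionary originated in the Tsinghua NLP lab and was built mainly by Qi Fanchao and Zhang Lei in 2019.

Speaking with Guokr (果壳), Qi Fanchao said that at first they did not promote the project; classmates around them simply used it and gave good feedback. Then, last November, the project suddenly took off, and the surge in traffic brought the server down. Since then, WantWords has drawn more attention, along with many suggestions and technical support from volunteers.

Besides the web version, a WeChat mini program has officially launched, and an app is in development.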


△ The "WantWords" WeChat mini program

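According to the team's latest announcement, testing of a new reverse-lookup algorithm was completed before this year's Lunar New Year's Eve, and it performs significantly better than the original algorithm. Beyond the reverse dictionary, the team is also developing a "semantic retrieval and recommendation system for famous quotes" and a "Chinese word collocation lookup system."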


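Neither system is open to the public yet. If you're interested, you can read the papers (linked at the end) while you wait.

The team also notes that WantWords is an open-source project and that everyone is welcome to join at any time: to take part in design and development, suggest features, or report problems. If you're interested, check the announcements on the official site.

Related papers: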

https://arxiv.org/abs/1912.08441

https://arxiv.org/abs/2202.13145


 

