目前 LLMs 存在诸多风险。因为这些模型经过了大量来自于互联网（包括社交媒体）的数据培训，因此它们在训练过程中可能会遇到有毒文本（toxic text）。由于重新训练模型的成本以及存在的大量错误，发现和修复这些模型中的缺陷仍然是一个挑战。
为了解决毒性问题，Microsoft Research 团队开发了 ToxiGen，这是一个用于训练可用于标记有害语言的内容审核工具的数据集。据微软称，ToxiGen 包含 274,000 个“中性”（neutral）和“有毒”（toxic）陈述的示例，使其成为最大的公开仇恨言论数据集之一。
Microsoft Research 合作伙伴研究领域经理、AdaTest 和 (De)ToxiGen 项目负责人 Ece Kamar ToxiGen 表示
我们认识到任何内容审核系统都会存在差距，这些模型需要不断改进。 (De)ToxiGen 的目标是让 AI 系统的开发人员能够更有效地发现任何现有内容审核技术中的风险或问题。
为了生成样本，Microsoft Research 团队提供了一个针对 13 个少数群体的“中性”陈述和仇恨言论的 LLM 示例，这些群体包括黑人、身体和认知障碍者、穆斯林、亚洲人、拉丁裔、LGBTQ+ 和美洲原住民。这些陈述来自现有的数据集以及新闻文章、观点文章、播客记录和其他类似的公共文本来源。
微软团队解释说，用于创建 ToxiGen 语句的过程被称为 (De)ToxiGen，旨在通过指导 LLM 生成工具可能错误识别的语句来揭示特定审核工具的弱点。通过对三个人工编写的毒性数据集的研究，该团队发现从一个工具开始并使用 ToxiGen 对其进行微调可以“显着”提高该工具的性能。
微软团队认为，用于创建 ToxiGen 的策略可以扩展到其他领域，从而产生更多“微妙”和“丰富”的中立和仇恨言论示例。但专家警告说，这并不是万能的。
JD.com 's main venue on June 18, 2022-pre-order will begin on the evening of May 23.
Microsoft China official Mall-Home
At present, there are many risks in LLMs. Because these models are trained with a lot of data from the Internet, including social media, they may encounter toxic text (toxic text) during the training process. Because of the cost of retraining models and a large number of errors, finding and fixing defects in these models is still a challenge.
To address the toxicity problem, the Microsoft Research team developed ToxiGen, a dataset used to train content auditing tools that can be used to mark harmful languages. According to Microsoft, ToxiGen contains 274000 examples of "neutral" and "toxic" statements, making it one of the largest public hate speech data sets.
Ece Kamar ToxiGen, field manager of Microsoft Research partner research and project leader for AdaTest and (De) ToxiGen, said
We recognize that there will be gaps in any content review system, and these models need to be constantly improved. (De) the goal of ToxiGen is to enable developers of AI systems to more effectively identify risks or problems in any existing content auditing technology.
Our experiments show that the tool can be used to test many existing systems, and we look forward to learning new environments from the community that will benefit from the tool.
To generate samples, the Microsoft Research team provided a LLM example of "neutral" statements and hate speech for 13 minority groups, including blacks, people with physical and cognitive disabilities, Muslims, Asians, Hispanics, LGBTQ+ and Native Americans. These statements come from existing data sets as well as news articles, opinion articles, podcast records, and other similar public text sources.
The Microsoft team explained that the process used to create ToxiGen statements is called (De) ToxiGen and is designed to reveal weaknesses in a particular audit tool by instructing the LLM generation tool to identify statements that may be misidentified. By studying three artificially written toxicity datasets, the team found that starting with a tool and fine-tuning it with ToxiGen could "significantly" improve the performance of the tool.
The Microsoft team believes that the strategy used to create ToxiGen can be extended to other areas, resulting in more "subtle" and "rich" examples of neutral and hate speech. But experts warn that this is not a panacea.