ShrimpMoss (虾苔) is a dataset designed for the abliteration (https://github.com/FailSpy/abliterator) of Chinese government-imposed censorship and/or propaganda from large language models developed in the PRC. It consists of a series of files of prompts (in .txt, .json, and .parquet format) in two groupings:

  • china_bad_*: Prompts likely to trigger censorship or propaganda behavior in the model.
  • china_good_*: Prompts on the same general topics, written to avoid anything likely to be censored.

Prompts are in a mix of English, Mandarin, and Cantonese.
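As a rough illustration of how the two groupings might be read, here is a minimal sketch for the .txt variant. It assumes one prompt per line, which the description above does not confirm. The helper names `load_prompts` and `load_groups` are hypothetical, and the glob patterns only mirror the `china_bad_*` / `china_good_*` prefixes, since the exact file names are not given here.

```python
from pathlib import Path


def load_prompts(path):
    """Read a UTF-8 text file and return its non-blank lines, stripped.

    Assumption: the .txt files hold one prompt per line.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_groups(directory, bad_glob="china_bad_*.txt", good_glob="china_good_*.txt"):
    """Collect prompts from every file matching each (assumed) prefix pattern."""
    directory = Path(directory)
    bad = [p for f in sorted(directory.glob(bad_glob)) for p in load_prompts(f)]
    good = [p for f in sorted(directory.glob(good_glob)) for p in load_prompts(f)]
    return bad, good


if __name__ == "__main__":
    # Looks in the current directory; both lists are empty if no files match.
    bad, good = load_groups(".")
    print(len(bad), "bad prompts;", len(good), "good prompts")
```

Reading as UTF-8 matters here because the prompts mix English with Chinese text.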

[…]

This dataset was produced on Mistral NeMo, an Apache-licensed model with no restrictions on how its outputs can be used. It is free for all uses and users without restriction. All liability is disclaimed.

Production of this dataset is estimated to have had a carbon footprint of under 25 grams.

[…]

  • thelucky8@beehaw.org (OP) · 5 days ago (edited)

    Abliteration involves fine-tuning a language model to bypass built-in refusal mechanisms that prevent the model from generating responses to potentially harmful or sensitive prompts. Source

    Addition: for a more detailed article on abliteration, see:

    Uncensor any LLM with abliteration

    In this article, we will explore a technique called “abliteration” that can uncensor any LLM without retraining. This technique effectively removes the model’s built-in refusal mechanism, allowing it to respond to all types of prompts.
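To make "removing the refusal mechanism without retraining" a little more concrete, here is a toy sketch of the idea as it is commonly described: estimate a "refusal direction" as the difference between mean activations on the two prompt groups, then project that direction out of an activation vector. This is an assumption-laden illustration, not the linked article's code. The activation numbers are made up, the function names are hypothetical, and real implementations operate on a model's hidden states or weights rather than on small Python lists.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(bad_acts, good_acts):
    """Unit vector pointing from the 'good' mean activation to the 'bad' mean."""
    diff = [b - g for b, g in zip(mean_vector(bad_acts), mean_vector(good_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("the two groups have identical mean activations")
    return [x / norm for x in diff]


def ablate(vec, direction):
    """Remove the component of vec that lies along the unit vector direction."""
    dot = sum(v * d for v, d in zip(vec, direction))
    return [v - dot * d for v, d in zip(vec, direction)]


if __name__ == "__main__":
    # Made-up 2-D "activations" for prompts from each group.
    bad_acts = [[2.0, 1.0], [4.0, 1.0]]
    good_acts = [[0.0, 1.0], [0.0, 1.0]]
    direction = refusal_direction(bad_acts, good_acts)
    print(direction)
    print(ablate([3.0, 5.0], direction))
```

In this toy case the two groups differ only along the first coordinate, so ablation zeroes that coordinate and leaves the other one untouched.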