Developer releases ShrimpMoss, a dataset designed to abliterate Chinese censorship and propaganda finetunes from LLMs

thelucky8@beehaw.org · 1 year ago

Abliteration involves fine-tuning a language model to bypass built-in refusal mechanisms that prevent the model from generating responses to potentially harmful or sensitive prompts. Source

Addition: For a more sophisticated article on abliteration see:

Uncensor any LLM with abliteration

In this article, we will explore a technique called “abliteration” that can uncensor any LLM without retraining. This technique effectively removes the model’s built-in refusal mechanism, allowing it to respond to all types of prompts.
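As a rough illustration of how this can work without retraining: the usual description of abliteration is that it estimates a single "refusal direction" from the difference between the model's average internal activations on prompts it refuses and on prompts it answers, then projects that direction out. The sketch below is a toy version in plain Python under that assumption. The function names and the three-dimensional activation values are made up for illustration, and it is not the code from the linked article or repo.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(refused_acts, answered_acts):
    """Unit vector pointing from the 'answered' mean to the 'refused' mean."""
    mu_refused = mean_vector(refused_acts)
    mu_answered = mean_vector(answered_acts)
    diff = [r - a for r, a in zip(mu_refused, mu_answered)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("the two activation sets have identical means")
    return [d / norm for d in diff]

def ablate(vec, direction):
    """Remove the component of vec that lies along the unit vector direction."""
    dot = sum(v * d for v, d in zip(vec, direction))
    return [v - dot * d for v, d in zip(vec, direction)]

# Hypothetical activations: in practice these would be captured from a real
# model while it reads two sets of prompts (refused vs. answered).
refused_acts = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.2]]
answered_acts = [[0.0, 0.1, 0.0], [0.2, -0.1, 0.2]]

direction = refusal_direction(refused_acts, answered_acts)
cleaned = ablate([1.5, 0.3, -0.4], direction)
```

In a real model the same projection would be applied to activations at inference time or folded into the weight matrices, which is why a dataset of prompts alone can be enough to drive the process.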

ericjmorey@beehaw.org · 1 year ago

The shared repo doesn’t look like fine tuning. It just looks like prompts.

TimeSquirrel@kbin.melroy.org · 1 year ago

That’s just the dataset. The actual script is here: https://github.com/FailSpy/abliterator

Nafnlaus/ShrimpMoss_Chinese_Censorship_Abliteration · Datasets at Hugging Face