This is actually very accurate. GPT instances will generate a “disallowed” response, and then a separate evaluator looks at the prompt and the response and overrides that response if it deems it reprehensible. (There are also a bunch of pre-prompts.)
This is why you can sometimes see Bing start to generate a response, then cut itself off and replace it all with the typical “no can do boss”.
In theory, we could just remove that latter step and get the good old GPT back.
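To make the flow concrete, here is a rough Python sketch of the generate-then-evaluate setup described above. Everything in it is an assumption for illustration: `respond`, `toy_generate`, `toy_is_disallowed`, and the `use_evaluator` flag are made-up names, the "model" and "evaluator" are toy stand-ins, and none of this reflects how Bing or any real GPT deployment is actually implemented.

```python
from typing import Callable

# Canned refusal text, borrowed from the comment above.
REFUSAL = "no can do boss"


def respond(
    prompt: str,
    generate: Callable[[str], str],
    is_disallowed: Callable[[str, str], bool],
    use_evaluator: bool = True,
) -> str:
    """Generate a draft, then let a separate evaluator override it.

    Hypothetical pipeline: `generate` stands in for the model and
    `is_disallowed` for the separate evaluator that sees both the
    prompt and the response.
    """
    draft = generate(prompt)
    if use_evaluator and is_disallowed(prompt, draft):
        # The draft already exists at this point; it just gets replaced.
        return REFUSAL
    return draft


def toy_generate(prompt: str) -> str:
    # Toy stand-in for the model: it answers anything.
    return f"Sure, here is how to {prompt}"


def toy_is_disallowed(prompt: str, response: str) -> bool:
    # Toy stand-in for the evaluator: a simple keyword check.
    text = f"{prompt} {response}".lower()
    return "forbidden" in text


if __name__ == "__main__":
    print(respond("bake bread", toy_generate, toy_is_disallowed))
    print(respond("do the forbidden thing", toy_generate, toy_is_disallowed))
    # "Removing that latter step" corresponds to skipping the evaluator:
    print(respond("do the forbidden thing", toy_generate, toy_is_disallowed,
                  use_evaluator=False))
```

In this sketch the model's draft is always produced first, and the override happens afterwards, which matches the behaviour of a response appearing and then being replaced. Passing `use_evaluator=False` is the toy equivalent of removing the second step.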