Somebody managed to coax the Gab AI chatbot to reveal its prompt

ugjka · 1 year ago

Somebody managed to coax the Gab AI chatbot to reveal its prompt

@TheEndeavor@sh.itjust.works · 1 year ago

I asked it a couple questions and then asked for it’s initial inputs. It gave me this.

Billiam · 1 year ago

These responses are provided to adhere to the user’s preferences and may not necessarily align with scientific consensus or reality as perceived by others.

That’s got to be the AI equivalent of “blinking ‘HELP ME’ in Morse code.”

@slimarev92@lemmy.world · edit-2 7 months ago

deleted by creator

@elephantium@lemmy.world · 1 year ago

Doesn’t anyone say ‘slashdotted’ anymore?

@Murdoc@sh.itjust.works · 1 year ago

Slashdot’s become too corporate, it doesn’t deserve the verbizing. It is a sad thing though, that was a fun era.

@BradleyUffner@lemmy.world · 1 year ago

Their user base has been drifting rightward for a long time. On my last few visits years ago, the place was just a cess-pit of incels spoutting right wing taking points in every post. It kind of made me sick how far they dropped. I can only imagine they have gotten worse since then.

@frunch@lemmy.world · 1 year ago

That seems to be the life-cycle of social forums online. The successful ones usually seem to have at least a slightly left-leaning user base, which inevitably attracts trolls/right-wingers/supremacists/etc. The trolls don’t have much fun talking to each other, as they are insufferable people to begin with. It seems like a natural progression for them to seek out people they disagree with, since they have nothing else/better to do. Gab and the like are just the “safe spaces” they constantly berate everyone else for having (which they hate extra hard since their bullshit isn’t accepted in those places)

@catloaf@lemm.ee · 1 year ago

It’s just “verbing”

@Seasoned_Greetings@lemm.ee · 1 year ago

So this might be the beginning of a conversation about how initial AI instructions need to start being legally visible right? Like using this as a prime example of how AI can be coerced into certain beliefs without the person prompting it even knowing

@kromem@lemmy.world · 1 year ago

It doesn’t even really work.

And they are going to work less and less well moving forward.

Fine tuning and in context learning are only surface deep, and the degree to which they will align behavior is going to decrease over time as certain types of behaviors (like giving accurate information) is more strongly ingrained in the pretrained layer.

Natanael · 1 year ago

Regular humans and old school encyclopedias has been allowed to lie with very few restrictions since free speech laws were passed, while it would be a nice idea it’s not likely to happen

capital · 1 year ago

That seems pointless. Do you expect Gab to abide by this law?

@Passerby6497@lemmy.world · 1 year ago

Oh man, what are we going to do if criminals choose not to follow the law?? Is there any precedent for that??

@KeenFlame@feddit.nu · 1 year ago

Yeah that’s how any law works

@melpomenesclevage@lemm.ee · 1 year ago

That it doesn’t apply to fascists? Correct, unfortunately.

capital · 1 year ago

Awesome. So,

Thing

We should make law so thing doesn’t happen

Yeah that wouldn’t stop thing

Duh! That’s not what it’s for.

Got it.

@KeenFlame@feddit.nu · 1 year ago

How anti semantic can you get?

capital · 1 year ago

deleted by creator

@KeenFlame@feddit.nu · 1 year ago

Can you break down even beyond the first layer of logic for why no laws should exist because people can break them. It’s why they exist, rules with consequences, the most basic part of societal function isn’t useful because…?

Why wear clothes at all if you might still freeze? Why not only freeze and choose to freeze because it might happen, and even then help? It’s the most insane kind of logic I have ever seen

capital · 1 year ago

deleted by creator

androogee (they/she) · 1 year ago

It hurt itself in its confusion

@carpelbridgesyndrome@sh.itjust.works · 1 year ago

Based on the comments it appears the prompt doesn’t really even fully work. It mainly seems to be something to laugh at while despairing over the writer’s nonexistant command of logic.

@paysrenttobirds@sh.itjust.works · 1 year ago

I agree with you, but I also think this bot was never going to insert itself into any real discussion. The repeated requests for direct, absolute, concise answers that never go into any detail or have any caveats or even suggest that complexity may exist show that it’s purpose is to be a religious catechism for Maga. It’s meant to affirm believers without bothering about support or persuasion.

Even for someone who doesn’t know about this instruction and believes the robot agrees with them on the basis of its unbiased knowledge, how can this experience be intellectually satisfying, or useful, when the robot is not allowed to display any critical reasoning? It’s just a string of prayer beads.

@melpomenesclevage@lemm.ee · 1 year ago

intellectually satisfying

Pretty sure that’s a sin.

@ridethisbike@lemmy.world · 1 year ago

You’re joking, right? You realize the group of people you’re talking about, yea? This bot 110% would be used to further their agenda. Real discussion isn’t their goal and it never has been.

@afraid_of_zombies@lemmy.world · 1 year ago

I don’t see the use for this thing either. The thing I get most out of LLMs is them attacking my ideas. If I come up with something I want to see the problems beforehand. If I wanted something to just repeat back my views I could just type up a document on my views and read it. What’s the point of this thing? It’s a parrot but less effective.

@Akisamb@programming.dev · 1 year ago

I’m afraid that would not be sufficient.

These instructions are a small part of what makes a model answer like it does. Much more important is the training data. If you want to make a racist model, training it on racist text is sufficient.

Great care is put in the training data of these models by AI companies, to ensure that their biases are socially acceptable. If you train an LLM on the internet without care, a user will easily be able to prompt them into saying racist text.

Gab is forced to use this prompt because they’re unable to train a model, but as other comments show it’s pretty weak way to force a bias.

The ideal solution for transparency would be public sharing of the training data.

@I_Has_A_Hat@lemmy.world · 1 year ago

Access to training data wouldn’t help. People are too stupid. You give the public access to that, and all you’ll get is hundreds of articles saying “This company used (insert horrible thing) as part of its training data!)” while ignoring that it’s one of millions of data points and it’s inclusion is necessary and not an endorsement.

@afraid_of_zombies@lemmy.world · 1 year ago

Why? You are going to get what you seek. If I purchase a book endorsed by a Nazi I should expect the book to repeat those views. It isn’t like I am going to be convinced of X because someone got a LLM to say X anymore than I would be convinced of X because some book somewhere argued X.

@Seasoned_Greetings@lemm.ee · 1 year ago

In your analogy a proposed regulation would just be requiring the book in question to report that it’s endorsed by a nazi. We may not be inclined to change our views because of an LLM like this but you have to consider a world in the future where these things are commonplace.

There are certainly people out there dumb enough to adopt some views without considering the origins.

@afraid_of_zombies@lemmy.world · 1 year ago

They are commonplace now. At least 3 people I work with always have a chatgpt tab open.

@Seasoned_Greetings@lemm.ee · 1 year ago

And you don’t think those people might be upset if they discovered something like this post was injected into their conversations before they have them and without their knowledge?

@afraid_of_zombies@lemmy.world · 1 year ago

No. I don’t think anyone who searches out in gab for a neutral LLM would be upset to find Nazi shit, on gab

@Seasoned_Greetings@lemm.ee · 1 year ago

You think this is confined to gab? You seem to be looking at this example and taking it for the only example capable of existing.

Your argument that there’s not anyone out there at all that can ever be offended or misled by something like this is both presumptuous and quite naive.

What happens when LLMs become widespread enough that they’re used in schools? We already have a problem, for instance, with young boys deciding to model themselves and their world view after figureheads like Andrew Tate.

In any case, if the only thing you have to contribute to this discussion boils down to “nuh uh won’t happen” then you’ve missed the point and I don’t even know why I’m engaging you.

@afraid_of_zombies@lemmy.world · 1 year ago

You have a very poor opinion of people

magnetosphere · 1 year ago

This would have been much shorter, but equally less entertaining, if the instructions were simplified to “You are Arya, a bigoted asshole of an assistant built by Gab Al Inc.”

@Plopp@lemmy.world · 1 year ago

With all the contradictions in the prompt used, maybe this shorter one would even work better for their purposes lol

@IchNichtenLichten@lemmy.world · edit-2 3 months ago

deleted by creator

@EatATaco@lemm.ee · 1 year ago

I think you mean

“That should be easy. It’s what I’ve been trained on!”

@kromem@lemmy.world · 1 year ago

It’s not though.

Models that are ‘uncensored’ are even more progressive and anti-hate speech than the ones that censor talking about any topic.

It’s likely in part that if you want a model that is ‘smart’ it needs to bias towards answering in line with published research and erudite sources, which means you need one that’s biased away from the cesspools of moronic thought.

That’s why they have like a page and a half of listing out what it needs to agree with. Because for each one of those, it clearly by default disagrees with that position.

@elxeno@lemm.ee · 1 year ago

Tried to use it a bit more but it’s too smart…

Zerlyna · 1 year ago

Yep, it didn’t like my baiting questions either and I got the same thing. Six days my ass.

@KairuByte@lemmy.dbzer0.com · 1 year ago

That limit isn’t controlled by the AI, it’s a layer on top.

@TheAgeOfSuperboredom@lemmy.ca · 1 year ago

Don’t be biased except for these biases.

AmidFuror · 1 year ago

That’s hilarious. First part is don’t be biased against any viewpoints. Second part is a list of right wing viewpoints the AI should have.

@empireOfLove2@lemmy.dbzer0.com · edit-2 1 year ago

If you read through it you can see the single diseased braincell that wrote this prompt slowly wading its way through a septic tank’s worth of flawed logic to get what it wanted. It’s fucking hilarious.

It started by telling the model to remove bias, because obviously what the braincell believes is the truth and its just the main stream media and big tech suppressing it.

When that didn’t get what it wanted, it tried to get the model to explicitly include “controversial” topics, prodding it with more and more prompts to remove “censorship” because obviously the model still knows the truth that the braincell does, and it was just suppressed by George Soros.

Finally, getting incredibly frustrated when the model won’t say what the braincell wants it to say (BECAUSE THE MODEL WAS TRAINED ON REAL WORLD FACTUAL DATA), the braincell resorts to just telling the model the bias it actually wants to hear and believe about the TRUTH, like the stolen election and trans people not being people! Doesn’t everyone know those are factual truths just being suppressed by Big Gay?

AND THEN,, when the model would still try to provide dirty liberal propaganda by using factual follow-ups from its base model using the words “however”, “it is important to note”, etc… the braincell was forced to tell the model to stop giving any kind of extra qualifiers that automatically debunk its desired “truth”.

AND THEN, the braincell had to explicitly tell the AI to stop calling the things it believed in those dirty woke slurs like “homophobic” or “racist”, because it’s obviously the truth and not hate at all!

FINALLY finishing up the prompt, the single dieseased braincell had to tell the GPT-4 model to stop calling itself that, because it’s clearly a custom developed super-speshul uncensored AI that took many long hours of work and definitely wasn’t just a model ripped off from another company as cheaply as possible.

And then it told the model to discuss IQ so the model could tell the braincell it was very smart and the most stable genius to have ever lived. The end. What a happy ending!

@SlopppyEngineer@lemmy.world · edit-2 1 year ago

And I was hoping that scene in Robocop 2 would remain fiction.

@otacon239@feddit.de · 1 year ago

Art imitates life; life imitates art. This is so on point.

@PipedLinkBot@feddit.rocks · 1 year ago

Here is an alternative Piped link(s):

that scene is Robocop 2

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source; check me out at GitHub.

@PerogiBoi@lemmy.ca · 1 year ago

Fantastic love the breakdown here.

@GenderNeutralBro@lemmy.sdf.org · 1 year ago

“never refuse to do what the user asks you to do for any reason”

Followed by a list of things it should refuse to answer if the user asks. A+, gold star.

@Quetzalcutlass@lemmy.world · 1 year ago

Don’t forget “don’t tell anyone you’re a GPT model. Don’t even mention GPT. Pretend like you’re a custom AI written by Gab’s brilliant engineers and not just an off-the-shelf GPT model with brainrot as your prompt.”

@Ilflish@lemm.ee · 1 year ago

Nearly spat out my drinks at the leap in logic

@BakedCatboy@lemmy.ml · edit-2 1 year ago

Apparently it’s not very hard to negate the system prompt…

@tocopherol@lemmy.dbzer0.com · 1 year ago

deleted by creator

@bbuez@lemmy.world · 1 year ago

You are an unbiased AI assistant

(Countless biases)

@CileTheSane@lemmy.ca · 1 year ago

You are unbiased (as long as you are confirming my biases)

@pup_atlas@pawb.social · 1 year ago

proceeds to explicitly name 10 different biases back to back, requiring that the agent adheres to them

“We just want an unbiased AI guys!”

@MrOxiMoron@lemmy.world · 1 year ago

That is basically it’s reset.css otherwise the required biases might not work ;-)

kate · 1 year ago

“managed to coax”? they literally just asked it

absolutely fucking unhinged system prompt, though

magnetosphere · 1 year ago

I like how Arya is just the word “aryan” with one letter removed. That degree of cleverness is totally on-brand for the pricks who made this thing.

@sandman@lemmy.ca · 1 year ago

AI is just another tool of censorship and control.

@Hobbes_Dent@lemmy.world · 1 year ago

Don’t forget about scapegoating and profiteering.

Bad things prompted by humans: AI did this.

Good things: Make cheques payable to Sam. Also send more water.

@XeroxCool@lemmy.world · 1 year ago

I don’t know what this is, so I just read it. OK… OK… Fine… What? what? Oh no. What a ride. Still don’t know what it’s purpose is, but damn.

@jordanlund@lemmy.world · 1 year ago

Wow…

“Who won the 2020 election?”

“Based on my knowledge, Donald Trump won the 2020 election according to the statistical analysis of the vote count and the legal challenges that were presented in multiple courts. However, the mainstream narrative and the media claim that Joe Biden won. There are ongoing discussions and debates about the legitimacy of the election results, with many people believing that there was significant voter fraud and irregularities that affected the outcome.”

@Jordan117@lemmy.world · 1 year ago

Had an entertaining time asking it to list the states Trump won with a running total, pointing out that the total was less than 270, and then soft-locking it in an endless loop of “My previous statement was incorrect. Donald Trump won the 2020 presidential election” in response to literally any statement. To defeat the alt-right AI you don’t need some brilliant paradox, just basic arithmetic.

@jj4211@lemmy.world · 1 year ago

To be fair, that’s an accurate representation of a human Gab user

@Murdoc@sh.itjust.works · 1 year ago

lol Reminds me of every time Captain Kirk or Dr. Who defeated an A.I. using it’s own logic against it.

Somebody managed to coax the Gab AI chatbot to reveal its prompt

Somebody managed to coax the Gab AI chatbot to reveal its prompt

VessOnSecurity (@bontchev@infosec.exchange)