Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

@misk@sopuli.xyz · 11 months ago

Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

archomrade [he/him] · 11 months ago

Data should be socialized and machine learning algorithms should be nationalized for public use.

@afraid_of_zombies@lemmy.world · 11 months ago

I think you just invented the public library

@explodicle@sh.itjust.works · 11 months ago

Better yet, copyright should be abolished completely.

@WldFyre@lemm.ee · 11 months ago

Wouldn’t that make AI training data easier to obtain?

@explodicle@sh.itjust.works · 11 months ago

No, you can already legally download copyrighted data and use it for training.

@WldFyre@lemm.ee · 11 months ago

That’s what I thought, that’s why I was confused about your reply haha

My bad!

@spez_@lemmy.world · 11 months ago

Public+ no copyright

@assassin_aragorn@lemmy.world · 11 months ago

It should stay for creative works but that’s it. It should protect people who actually write books, compose music, make art, and sing. It shouldn’t be held by corporations forever by leeching off their workers.

@laurelraven@lemmy.blahaj.zone · edit-2 11 months ago

Creative works of individuals specially… Corporations should explicitly be deemed not people and not possessing of the same rights as people and the fact that needs to be said just goes to show how far down the shit hole we’ve fallen

@Olhonestjim@lemmy.world · 11 months ago

Corporations should be outlawed from owning houses and land as well. Maybe they can own the building, but they must be forced to rent the land from Us.

kubica · 11 months ago

I’m going to run out of sites at this pace.

FaceDeer · 11 months ago

Fortunately the AIs are getting quite good at answering technical questions like these.

@herrcaptain@lemmy.ca · 11 months ago

Right? It seems like the modern internet is made up of like 5 monolithic sites, and unlimited SEO spam.

I know that’s not literally true, but it sure feels like it.

@shotgun_crab@lemmy.world · 11 months ago

And the enshittification continues…

@Shadowq8@lemmy.world · 11 months ago

this is getting to be an interesting event / phenomenon

@floofloof@lemmy.ca · edit-2 11 months ago

If we can’t delete our questions and answers, can we poison the well by uploading masses of shitty questions and answers? If they like AI we could have it help us generate them.

@pivot_root@lemmy.world · 11 months ago

Poison the well by using AI-generated comments and answers. There isn’t currently a way to reliably determine if content is human or AI-generated, and training AI on AI is the equivalent of inbreeding.

@T00l_shed@lemmy.world · 11 months ago

Sounds good then.

100_kg_90_de_belin · 11 months ago

Stackalabama Exchange

@trolololol@lemmy.world · 11 months ago

The poison was there all along the way. The poison is us

Inserts spider man meme

@VirtualOdour@sh.itjust.works · 11 months ago

You are literally the same mentality as the coal rollers

Tech that could improve life for everyone and instead of using it to make open source software or coding solutions to problems you attack it like a crab in a bucket simply because you fear change.

@schnurrito@discuss.tchncs.de · 11 months ago

Messages that people post on Stack Exchange sites are literally licensed CC-BY-SA, the whole point of which is to enable them to be shared and used by anyone for any purpose. One of the purposes of such a license is to make sure knowledge is preserved by allowing everyone to make and share copies.

@9point6@lemmy.world · 11 months ago

Share Alike

I can’t wait to download my own version of the latest gpt model

@bbuez@lemmy.world · 11 months ago

It does help to know what those funny letters mean. Now we wait for regulators to catch up…

/tangent

If anything, we’re a very long way from anything close to intelligent, OpenAI (and subsequently MS, being publicly traded) sold investors on the pretense that LLMs are close to being “AGI” and now more and more data is necessary to achieving that.

If you know the internet, you know there’s a lot of garbage. I for one can’t wait for garbage-in garbage-out to start taking its toll.

Also I’m surprised how well open source models have shaped up, its certainly worth a look. I occasionally use a local model for “brainstorming” in the loosest terms, as I generally know what I’m expecting, but it’s sometimes helpful to read tasks laid out. Also comfort in that nothing even need leave my network, and even in a pinch I got some answers when my network was offline.

It gives a little hope while corps get to blatantly violate copyright while having wielding it so heavily, that advancements have been so great in open source.

@kerrigan778@lemmy.world · 11 months ago

That license would require chatgpt to provide attribution every time it used training data of anyone there and also would require every output using that training data to be placed under the same license. This would actually legally prevent anything chatgpt created even in part using this training data from being closed source. Assuming they obviously aren’t planning on doing that this is massively shitting on the concept of licensing.

JohnEdwa · edit-2 11 months ago

CC attribution doesn’t require you to necessarily have the credits immediately with the content, but it would result in one of the world’s longest web pages as it would need to have the name of the poster and a link to every single comment they used as training data, and stack overflow has roughly 60 million questions and answers combined.

@Scrollone@feddit.it · 11 months ago

They don’t need to republish the 60 million questions, they just have to credit the authors, which are surely way fewer (but IANAL)

JohnEdwa · 11 months ago

appropriate credit — If supplied, you must provide the name of the creator and attribution parties, a copyright notice, a license notice, a disclaimer notice, and a link to the material. CC licenses prior to Version 4.0 also require you to provide the title of the material if supplied, and may have other slight differences.

Maybe that could be just a link to the user page, but otherwise I would see it as needing to link to each message or comment they used.

@theherk@lemmy.world · 11 months ago

Maybe but I don’t think that is well tested legally yet. For instance, I’ve learned things from there, but when I share some knowledge I don’t attribute it to all the underlying sources of my knowledge. If, on the other hand, I shared a quote or copypasta from there I’d be compelled to do so I suppose.

I’m just not sure how neural networks will be treated in this regard. I assume they’ll conveniently claim that they can’t tie answers directly to underpinning training data.

@fruitycoder@sh.itjust.works · 11 months ago

IF its outputs are considered derivative works.

@kerrigan778@lemmy.world · 11 months ago

Ethically and logically it seems like output based on training data is clearly derivative work. Legally I suspect AI will continue to be the new powerful tool that enables corporations to shit on and exploit the works of countless people.

@fruitycoder@sh.itjust.works · 11 months ago

The problem is the legal system and thus IP law enforcement is very biased towards very large corporations. Until that changes corporations will continue, as they already were, exploiting.

I don’t see AI making it worse.

@General_Effort@lemmy.world · 11 months ago

They are not. A derivative would be a translation, or theater play, nowadays, a game, or movie. Even stuff set in the same universe.

Expanding the meaning of “derivative” so massively would mean that pretty much any piece of code ever written is a derivative of technical documentation and even textbooks.

So far, judges simply throw out these theories, without even debating them in court. Society would have to move a lot further to the right, still, before these ideas become realistic.

@doodledup@lemmy.world · 10 months ago

It will not make a difference. The internet is free and open by design. You can always scrape the internet any time. A partnership will do nothing but make it a little bit more convenient for them.

@filister@lemmy.world · 11 months ago

While at the same time they forbid AI generated answers on their website, oh the turntables.

partial_accumen · 11 months ago

A malicious response by users would be to employ an LLM instructed to write plausibly sounding but very wrong answers to historical and current questions, then an army of users upvoting the known wrong answer while downvoting accurate ones. This would poison the data I would think.

@brbposting@sh.itjust.works · 11 months ago

Sounds like it would require some significant resources to combat.

That said, that plan comes at a cost to presumably innocent users who will bark up the wrong trees.

@Emotet@slrpnk.net · 11 months ago

All use of generative AI (e.g., ChatGPT1 and other LLMs) is banned when posting content on Stack Overflow. This includes “asking” the question to an AI generator then copy-pasting its output as well as using an AI generator to “reword” your answers.

Ironic, isn’t it?

partial_accumen · 11 months ago

Interestingly I see nothing in that policy that would dis-allow machine generated downvotes on proper answers and machine generated upvotes on incorrect ones. So even if LLMs are banned from posting questions or comments, looks like Stackoverflow is perfectly fine with bots voting.

ಠ_ಠ · 11 months ago

Nougat · 11 months ago

Welp.

Hello Hotel · edit-2 11 months ago

deleted by creator

@johny_joe_1975@discuss.online · 11 months ago

Thing just like reddit, but now in professional community

𝓔𝓶𝓶𝓲𝓮 · edit-2 11 months ago

I will answer some questions with my old account using gpt 4 to poison the data.

If you want to poison SO a little at the same time providing valid answers that help users, use outlook.com email domain for new accounts. It seems to not have anti throwaway countermeasures while being accepted by SO. And it seems fitting to bash the corporate with the corporate.

@merthyr1831@lemmy.world · edit-2 11 months ago

If i was stack overflow I would’ve transferred my backups to OpenAI weeks before the announcement for this very reason.

This is also assuming the LLMs weren’t already fed with scraped SO data years ago.

It’s a small act of rebellion but SO already has your data and they’ll do whatever they want with it, including mine.

@catalog3115@lemmy.world · 11 months ago

Can we change our answers? Change your answers to garbage, don’t delete them. Do it slowly.

@abhibeckert@lemmy.world · edit-2 11 months ago

If you have low karma, then edits are reviewed by multiple people before the edit is saved. That’s primarily in place to prevent spam, who could otherwise post a valid question then edit it a few months later transforming the message into a link to some shitty website.

Even with high karma, that just means your edit is temporarily trusted. It’s gets reviewed and will be reverted if it’s a bad edit.

And any time an edit is reverted, that’s a knock against your karma. There’s a community enforced requirement for all edits to be a measurable improvement.

Even moderation decisions are reviewed by multiple people - so if someone rejects a post because it’s spam, when they should have rejected it because it’s off topic (or approved it) then that is also going to be caught and undone. And any harmful contribution (edit or moderation decision) will result in your action being undone and your karma going down. If your karma goes down too fast, your access to the site is revoked. If you do something really bad, then they’ll ban your IP address.

Moderators can also lock a controversial post, so only people with high karma can touch it at all.

… keep in mind Stack Overflow doesn’t just allow editing your own posts, you can edit any content on the website, similar to wikipedia.

It’s honestly a good overall approach, but around when Jeff Attwood left in 2008 it started drifting off course towards the shit show that is stack overflow today.

@catalog3115@lemmy.world · 11 months ago

It’s a shame, only corporate are going to be benefiting from hard work & labour of so many talented people.

@olympicyes@lemmy.world · 11 months ago

If the Stack Overflow site remains available then it still serves the same purpose it did before. I personally use ad blockers and don’t pay to use the site, which must not be cheap to operate. The bigger problem is if talented people refuse to share their expertise with people like me because they aren’t being compensated for their efforts.

@yamanii@lemmy.world · 11 months ago

In the article the dude was banned for 7 days for changing his answer.

@sugar_in_your_tea@sh.itjust.works · 11 months ago

So wait a few days, then do it slowly.

@spikederailed@lemmy.world · 11 months ago

I’m almost sure the site has already been scrapped of current contest for the LLM.

@sugar_in_your_tea@sh.itjust.works · 11 months ago

Yup, but that’s not the point IMO, it’s to remove quality content from the site so visitors see how crappy it is and stop using it.

@olympicyes@lemmy.world · 11 months ago

Great idea. Then I’ll turn to ChatGPT for higher quality answers.