• @helenslunch@feddit.nl
    26
    edit-2
    11 months ago

    Would be a shame if someone used ChatGPT to generate bad answers and a short script to resubmit them back to Stackoverflow. So awful.

  • Matt The Horwood
    English
    13
    11 months ago

    Why delete the answer? Why not edit it so that a human can still see the answer, but for AI it’s a load of nonsense?

  • @davel@lemmy.ml
    English
    38
    11 months ago

    Good luck with the deleting. It often just means UPDATE comments SET is_deleted = 1 WHERE ID = 666;.
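A minimal sketch of the soft-delete pattern that the `UPDATE` statement above hints at, using an in-memory SQLite database. The table layout and the `soft_delete_demo` helper are assumptions made for illustration; nothing here reflects how Stack Overflow actually stores posts.

```python
import sqlite3


def soft_delete_demo():
    """Show that a 'deleted' row is hidden from readers but still stored."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: a flag column marks rows as deleted.
    conn.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY, "
        "body TEXT NOT NULL, "
        "is_deleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO comments (id, body) VALUES (666, 'my answer')")

    # The "delete" only flips the flag; the body is left untouched.
    conn.execute("UPDATE comments SET is_deleted = 1 WHERE id = 666")

    # What the public site would query: non-deleted rows only.
    visible = conn.execute(
        "SELECT body FROM comments WHERE is_deleted = 0"
    ).fetchall()
    # What the backend can still read: every row.
    stored = conn.execute("SELECT body FROM comments").fetchall()
    conn.close()
    return visible, stored


visible, stored = soft_delete_demo()
print("visible:", visible)
print("stored:", stored)
```

Under this assumed schema the comment disappears from the public query while the original text stays in the table.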

    • chiisana
      14
      11 months ago

      Similar things were done on Reddit during the big exit. I doubt it achieved what people expected it to achieve. Even if the comments aren’t visible externally, I’m sure Reddit can easily access the data (and thereby make deals to license it) from their backend or backups; it’s just a matter of how hard they want to try (hint: it’s really not very hard).

      • @duncesplayed@lemmy.one
        English
        15
        11 months ago

        Yeah, during the reddit exodus, people were recommending that you overwrite your comment with garbage before deleting it. This (probably) forces them to restore your comment from backup. But realistically they were always going to harvest the comments stored in backup anyway, so I don’t think it caused them any more work.

        If anything, this probably just makes reddit’s/SO’s partnership more valuable because your comments are now exclusive to reddit’s/SO’s backend, and other companies can’t scrape them.

        • Lemongrab
          10
          11 months ago

          It was to make the data inaccessible to general people, therefore removing the reason people visit reddit. Even if reddit could still get the data, regular people would be inconvenienced (in theory) and look somewhere else.

    • plz1
      English
      5
      11 months ago

      They are not deleting, they are editing. So the platform would have to undo those edits rather than just flipping the visibility flag.
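A small sketch of why an edit is different from a flag flip. It assumes the platform keeps every revision of a post, which the thread does not confirm; the `Post` class and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    """Hypothetical post model that stores every revision of its body."""
    revisions: List[str] = field(default_factory=list)
    is_deleted: bool = False

    def edit(self, new_body: str) -> None:
        # An edit appends a new revision; older ones are kept.
        self.revisions.append(new_body)

    @property
    def current(self) -> str:
        # Readers see only the latest revision.
        return self.revisions[-1]

    def rollback(self, index: int) -> None:
        # Undoing an edit means choosing which earlier revision to restore.
        self.revisions.append(self.revisions[index])


post = Post(["original answer"])
post.edit("garbage")
print(post.current)   # the overwritten text is what readers see
post.rollback(0)
print(post.current)   # the platform has restored the first revision
```

In this model, un-deleting is a single flag change, while undoing an edit requires someone (or some script) to pick the right earlier revision for each post.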

  • @darkphotonstudio@beehaw.org
    44
    11 months ago

    I think people would have fewer issues with AI training if it were non-profit and for the common good. And there are open source AI projects, many in fact. But yeah, these deals by companies like this are sleazy.

  • @stembolts@programming.dev
    139
    edit-2
    11 months ago

    This is similar to what I did when I heard reddit was doing the API lockdown: I wrote an automation bot over the weekend that self-destructed my subreddit and the entire post history. The bot also automatically downloaded and archived all of the content on my local machine.

    It was annoying because at first I couldn’t get access to older posts, since at the time reddit had changed their API to only show the first X posts (100 or 1,000 or whatever). So I told my bot to delete the posts as it archived them; as content was deleted, reddit had no choice but to populate the page with the older posts.

    And that’s how I archived my subreddit. Reddit banned me two days later for automation, lol. I did not break any of the reddit or reddit api ToS during this process but I guess I upset someone.
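The archive-then-delete loop described above can be sketched as a small simulation. `CappedListing` is a hypothetical stand-in for a site whose listing only exposes the newest few posts; it is not the Reddit API, and the cap and post data are made up for the example.

```python
from typing import Dict, List


class CappedListing:
    """Hypothetical site whose listing only shows the newest `cap` posts."""

    def __init__(self, posts: Dict[int, str], cap: int) -> None:
        self._posts = dict(posts)
        self._cap = cap

    def list_newest(self) -> List[int]:
        # Higher IDs are treated as newer; older posts stay hidden
        # until newer ones are removed.
        return sorted(self._posts, reverse=True)[: self._cap]

    def fetch(self, post_id: int) -> str:
        return self._posts[post_id]

    def delete(self, post_id: int) -> None:
        del self._posts[post_id]


def archive_and_delete(site: CappedListing) -> Dict[int, str]:
    """Save each visible post locally, delete it, and repeat until empty."""
    archive: Dict[int, str] = {}
    while True:
        page = site.list_newest()
        if not page:
            break
        for post_id in page:
            archive[post_id] = site.fetch(post_id)  # archive first
            site.delete(post_id)                    # then delete
    return archive


posts = {i: f"post {i}" for i in range(1, 11)}
site = CappedListing(posts, cap=3)
archive = archive_and_delete(site)
print(len(archive), "posts archived")
```

Archiving before deleting means a failed download never costs a post, and each deletion lets older posts surface in the capped listing.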

    • ubergeek77
      27
      11 months ago

      I don’t think I’ve been banned, but I did a similar thing. I requested all my data from Reddit, then used that list of comment/post IDs to mass-edit them. I think I’m in the clear because I used the official third party API, with an official “app.” If you used the private API or instrumented this via the browser, that may be why you were banned.

      Anyway, if you or someone else wants their full history, Reddit will give it to you via a data export request.

    • @GBU_28@lemm.ee
      English
      19
      11 months ago

      Unfortunately they still have everything. It reduces the “human” visibility, but they still have the data.

  • HexesofVexes
    8
    11 months ago

    I mean, here is a thought: if an AI tool uses Creative Commons data, then its derivatives fall under Creative Commons. I.e., stop charging for AI tools and people will stop complaining.

  • @baseless_discourse@mander.xyz
    17
    edit-2
    11 months ago

    This is a violation of GDPR, no?

    EDIT: user-created content is not directly protected under GDPR; only personally identifiable data is protected under GDPR.

    • lemmyreaderOP
      English
      16
      11 months ago

      Dunno. GDPR is a Europe-only thing, and isn’t it only related to how your private data (like name, IP address, phone number) is handled?

      • @AccountMaker@slrpnk.net
        7
        11 months ago

        Right, I think it only covers personal information: companies can only collect what they need to run their service, users can request to see their data etc. I don’t think it applies to comments and posts.

      • Captain Beyond
        3
        11 months ago

        I would certainly hope so. Stack Overflow content is Creative Commons licensed, so the argument is basically that the GDPR would take precedence over the CC license grant. It’d be scary if GDPR could be weaponized against forks of free software projects in this manner.

        • @flux@lemmy.ml
          4
          11 months ago

          Would that kind of provision allow me to have my code removed from a git repository history, if that git repository is hosted by a company?

          • @baseless_discourse@mander.xyz
            1
            edit-2
            11 months ago

            I am not a lawyer, but I believe in general, yes.

            Git is not even that convoluted, as all the history is stored in the .git folder within the repo. Unless there is some convoluted structure built on top, they would only need to move the repo folder to a trash disk, waiting to be formatted.

            That being said, GDPR is somewhat poorly enforced at the moment, unfortunately. I don’t know if you can sue the company and expect some result within a couple of years.

          • @baseless_discourse@mander.xyz
            3
            11 months ago

            I am not an expert or a lawyer, but I believe users actually hold the right to completely erase personal data:

            The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay

            https://gdpr.eu/right-to-be-forgotten/

            Note the word “erasure” as opposed to “anonymize”

            • @WldFyre@lemm.ee
              5
              11 months ago

              I don’t think that addresses my point. Is my opinion on the new Star Wars movies that I post online or some lines of code I suggest “personal data”? I thought personal data had a specific definition under GDPR

              • Spaenny
                2
                11 months ago

                Technically, they could retain posts from users if they are irreversibly anonymized. However, ensuring with 100% certainty that none of your posts ever contained any personal data that could lead to the identification of you as an individual is challenging. The safest option is therefore to also delete your posts.

              • @baseless_discourse@mander.xyz
                3
                11 months ago

                I think you are right, user-generated content doesn’t seem to be protected. This is surprising to me, as users should hold the right to their content, which in my mind should enjoy stronger protection than personal data.

              • @nefonous@lemmy.world
                5
                11 months ago

                You’re totally right, the content of your posts is not considered personal data (because it isn’t). It’s more about profiling data that can be connected back to your actual person.

    • @refalo@programming.dev
      1
      11 months ago

      How does GDPR get away with not defining what a website is when referring to them directly in the law? Like what counts, only html? http? ftp? gopher?

  • @drunkpostdisaster@lemmy.world
    5
    11 months ago

    This shit scares me. It will become so easy to rewrite history from here. Just delete anything you don’t like and have an AI rewrite it into whatever you want. Entire threads could be rewritten; a company could go back and change your entire post history in ways that might be legally compromising.

  • Sibbo
    6
    11 months ago

    Does GDPR apply to stackoverflow? Since my data there probably does not identify me as a person?

      • ddh
        English
        9
        11 months ago

        Can’t wait until the top answer to every Google search is “just google it”

    • LostXOR
      7
      edit-2
      11 months ago

      The other 20% is mostly high quality however, and I’m sure they’d filter out the heavily downvoted crud.

    • @mnemonicmonkeys@sh.itjust.works
      English
      6
      11 months ago

      One time I went on there to figure out an issue in Arduino. The answer one guy gave was “I don’t know how to do this in Arduino, here’s how you do this in Java”. Not only did the mods prevent any other answers from being posted, I tried the guy’s suggestion in Java and it didn’t even work.

  • @henfredemars@infosec.pub
    English
    82
    11 months ago

    I feel like this content craze is going to evaporate soon because all the new content from here forward is sure to be polluted by LLM output already. AI is fast becoming a snake eating its own tail.

    That reminds me. I should go update my licenses to spit in the face of AI training companies.