Fess up. You know it was you.

  • @sloppy_diffuser@sh.itjust.works

    Accidentally announced a /12 of IPv6 on a bad copy-paste of a /127.

    Started appending a verification line after interface configs to make sure I never missed a trailing character again.

    Took 3 months for anyone to notice (circa 2015).

  • @hperrin@lemmy.world

    I fixed a bug and gave everyone administrator access once. I didn’t know that bug was… in use (is that the right way to put it?) by the authentication library. So every successful login request, instead of getting back the user who had just logged in, got back the first user in the DB: “admin”.

    Had to take down prod for that one. In my four years there, that was the only time we ever took down prod without an announcement.
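
    The commenter doesn’t show the code, but the shape of the bug is easy to picture: a lookup that silently lost its filter. A minimal sketch with an invented users table (not the actual library’s query):

        -- Buggy lookup: the filter on the authenticated user is gone,
        -- so every successful login resolves to the first row ("admin").
        SELECT id, username, role FROM users ORDER BY id LIMIT 1;

        -- Intended lookup: return the user who actually logged in.
        SELECT id, username, role FROM users WHERE id = ?;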

  • @Clent@lemmy.world

    UPDATE articles SET status = 0 WHERE body LIKE '%…%';

    On the master production server, running MyISAM, against a text column, millions of rows.

    This caused queries to stack up because of table locks.

    Rather than waiting for the query to finish, a slave was promoted to master.

    Lesson: don’t trust mysqladmin to not do something bad.
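
    In hindsight, a change like that is usually chunked so each statement only holds the lock briefly. A rough sketch (placeholder pattern, not the original one), rerun until it affects zero rows:

        -- Hypothetical batched version: repeat until ROW_COUNT() is 0, so the
        -- MyISAM table lock is only held for a short burst each pass.
        UPDATE articles
        SET status = 0
        WHERE body LIKE '%pattern%'  -- placeholder for the original filter
          AND status <> 0            -- skip rows already updated in earlier passes
        LIMIT 10000;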

    • EmasXP

      Table locks can be a real pain. You know you need to make the change, but the system is constantly running queries against the table. Nowadays it’s a bit easier with algorithm=inplace and lock=none, but in the good old days you were on your own. Your only friend was luck. Large migrations like that still give me shivers.
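
      For reference, the modern syntax being alluded to looks roughly like this (MySQL/InnoDB; the table and column are made up):

          -- Online schema change: performed in place without blocking reads or writes.
          ALTER TABLE articles
            ADD COLUMN archived TINYINT NOT NULL DEFAULT 0,
            ALGORITHM=INPLACE, LOCK=NONE;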

  • @zubumafu_420@infosec.pub

    Early in my career as a cloud sysadmin, I accidentally shut down the production database server of a public website for a couple of minutes. Not that bad, and most users probably just got a little annoyed, but it didn’t go unnoticed by management 😬 I had to come up with a BS excuse that it was a false alarm.

    Because of the legacy OS image of the server, simply changing the disk size in the cloud management portal wasn’t enough, and it was necessary to make changes to the partition table via the command line. I did my research, planned the procedure and fallback process, then spun up a new VM to test it out before trying it on prod. Everything went smoothly, except that at the moment I had to shut down and delete the newly created VM, I instead shut down the original prod VM, because they had similar names.

    Put everything back in place, and eventually resized the original prod VM, but not without almost suffering a heart attack. At least I didn’t go as far as deleting the actual database server :D

    • @lightnsfw@reddthat.com

      I did my research, planned the procedure and fallback process, then spun up a new VM to test it out before trying it on prod

      Went through a similar process when I was resizing some partitions on my media server. On the test run I forgot to specify G on the new size, so it defaulted to MB when I resized it, resulting in a 450 GB partition going down to 400 MB. I was real glad I tested that out first.

    • @marito@lemmy.world

      I tried to change ONE record in the production db but I forgot the WHERE clause, and ended up changing over 2 MILLION records instead. Three-hour production shutdown. Fun times.
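
      A hedged sketch of the habit that guards against this, assuming MySQL-style syntax and an invented table: verify the row count inside a transaction before committing.

          -- Wrap the change in a transaction and sanity-check how many rows it
          -- touched before making it permanent.
          START TRANSACTION;
          UPDATE customers SET status = 'inactive' WHERE id = 12345;
          SELECT ROW_COUNT();  -- expect 1; if not, ROLLBACK instead of COMMIT
          COMMIT;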

  • Call me Lenny/Leni

    Forgive me, but that’s a figure of speech I’ve never heard before. What does it mean?

    • @RacerX@lemm.eeOP

      By breaking production, I’m referring to a situation where someone, most likely in a technical job, broke a system responsible for the operation of some kind of service. Most of the responses here, which have been great to read, are about messing up things like software, databases, servers and other hardware.

      Stuff happens and we all make mistakes. It’s what you take away from the experience that matters.

  • @FigMcLargeHuge@sh.itjust.works

    Was doing two deployments at the same time. On the first one, I got to the point where I had to clear the cache. I was typing out the command to remove the temp folder, looked down at the other deployment’s instructions I had in front of me, typed the folder for the prod deployments, and hit enter, deleting all of the currently installed code. It was a clustered machine, and the other machine removed its files within milliseconds. When I realized what I had done, I jumped up from my desk and said out loud “I’m fired!!” over and over.

    Once I calmed down, I had to get back on the call and ask everyone to check their apps. Sure enough, they were all failing. I told them what I had done, and we immediately went to the clustered machine; the files were gone there too. It took about 8 hours for the backup team to restore everything. They kept having to go find tapes to put in the machine, and it took way longer than anyone expected. Once we got the files restored, we determined that we were all back to the previous day, and everyone’s work from that night was gone, so we had to start the night’s deployments over.

    I got grilled about it, and had to write a script to clear the cache from that point on. No more manually removing files. The other good thing that came out of this was no more doing two deployments at the same time. I told them exactly what happened and that when you push people like this, mistakes get made.

  • @Albbi@lemmy.ca

    Broke teller machines at a bank by accidentally renaming the server all the machines were pointed to. Took an hour to bring back up.

  • @pastermil@sh.itjust.works

    I accidentally destroyed the production system completely through an improper partition resize. We had a database snapshot, but it was stored on that same server. After scrambling around for half a day, I managed to recover some of the older data dumps.

    So I spun up a new server from scratch, restored the database from a slightly outdated dump, installed the code (which was thankfully managed through git), and configured everything to run, all in an hour or two.

    The best part: everybody else knows this as some trivial misconfiguration. This happened in 2021.

  • @necrobius@lemm.ee

    1. Create a database,
    2. have the organisation manually populate it with lots of records using a web app,
    3. accidentally delete the database.

    All in between backup windows.

  • Rob Bos

    Plugged a serial cable into a UPS that was not expecting RS232. Took down the entire server room. Beyoop.

  • @theluddite@lemmy.ml

    This is nowhere near the worst on a technical level, but it was my first big fuck up. Some 12+ years ago, I was pretty junior at a very big company that you’ve all heard of. We had a feature coming out that I had developed almost entirely by myself, from conception to prototype to production, and it was getting coverage in some relatively well-known trade magazine or blog or something (I don’t remember) that was coming out the next Monday. But that week, I introduced a bug in the data pipeline code such that (I don’t remember the details) instead of adding the day’s data, it removed some small amount of data. No one noticed that the feature was losing all its data all week because it still worked (mostly) fine. But by Monday, when the article came out, it only looked like it would work: when you pressed the thing, nothing happened. It was thankfully pretty easy to fix, but I went from being congratulated to yelled at so fast.

  • 𝕱𝖎𝖗𝖊𝖜𝖎𝖙𝖈𝖍

    Accidentally deleted an entire column in a police department’s evidence database early in my career 😬

    Thankfully, it only contained filepaths that could be reconstructed via a script. But I was sweating 12+1 bullets. Spent two days rebuilding that.
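
    The rebuild was done with a script; in plain SQL, with a completely guessed schema and naming convention, the idea would look something like:

        -- Hypothetical reconstruction: re-create the dropped column, then
        -- regenerate each path from surviving columns via a known convention.
        ALTER TABLE evidence ADD COLUMN file_path VARCHAR(1024);
        UPDATE evidence
        SET file_path = CONCAT('/evidence/', case_number, '/', item_id, '.bin');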

  • Futs

    Advertised an OS deployment to the ‘All Workstations’ collection by mistake. I only realized after 30 minutes, when people’s workstations started rebooting. Worked right through the night recovering and restoring about 200 machines.