Another day, another update.
More troubleshooting was done today. What did we do:
- Yesterday evening @phiresky@lemmy.world did some SQL troubleshooting with some of the lemmy.world admins. After that, phiresky submitted some PRs to GitHub.
- @cetra3@lemmy.ml created a Docker image containing 3 PRs: Disable retry queue, Get follower Inbox Fix, Admin Index Fix
- We started using this image, and saw a big drop in CPU usage and disk load.
- We saw thousands of errors per minute in the nginx log from old clients trying to access the websockets (which were removed in 0.18), so we added a `return 404` in the nginx conf for `/api/v3/ws`.
- We updated lemmy-ui from RC7 to RC10, which fixed a lot, including the issue with replying to DMs.
- We found that the many 502 errors were caused by an issue in Lemmy/markdown-it/actix or whatever, causing nginx to temporarily mark an upstream as dead. As a workaround we can either 1) only use 1 container, or 2) set `proxy_next_upstream timeout;` and `max_fails=5` in nginx.
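For anyone running their own instance, the websocket workaround from the list above might look roughly like this inside the site's nginx `server { }` block. This is a sketch, not the actual lemmy.world config. The `=` (exact match) modifier is an assumption on our side: an exact-match location wins over regex locations, which matters if API traffic is forwarded to Lemmy by a regex block such as `location ~ ^/(api|...)`, since a plain prefix location would lose to that regex.

```nginx
# Sketch only: goes inside the existing server { } block for the instance.
# Pre-0.18 clients keep requesting the removed websocket endpoint.
# Exact match (=) takes priority over regex locations, so these requests
# are answered by nginx directly and never reach the Lemmy backend.
location = /api/v3/ws {
    return 404;
}
```

Query strings are not part of location matching, so requests like `/api/v3/ws?...` are caught by the same block.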
Currently we’re running with 1 Lemmy container, so the 502 errors are completely gone so far, and because of the fixes in the Lemmy code everything seems to be running smoothly. If needed, we could spin up a second Lemmy container using the `proxy_next_upstream timeout;` / `max_fails=5` workaround, but for now it seems to hold with 1.
Thanks to @phiresky@lemmy.world, @cetra3@lemmy.ml, @stanford@discuss.as200950.com, @db0@lemmy.dbzer0.com, @jelloeater85@lemmy.world and @TragicNotCute@lemmy.world for their help!
And not to forget, thanks to @nutomic@lemmy.ml and @dessalines@lemmy.ml for their continuing hard work on Lemmy!
And thank you all for your patience, we’ll keep working on it!
Oh, and as a bonus, an image (thanks Phiresky!) of the change in bandwidth after implementing the new Lemmy Docker image with the PRs.
Edit: As soon as the US folks wake up (hi!), we seem to need the second Lemmy container for performance, so that’s now started. I noticed the `proxy_next_upstream timeout` setting didn’t work (or I didn’t set it properly), so I used `max_fails=5` for each upstream, which does actually work.
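A minimal sketch of what "`max_fails=5` for each upstream" could look like with two Lemmy containers. The upstream name `lemmy_backend` and the hostnames `lemmy-1` / `lemmy-2` are hypothetical, and 8536 is assumed to be the backend port (Lemmy's default). In nginx, `max_fails` is a per-server parameter that defaults to 1, so a single failed request can take a container out of rotation for `fail_timeout` (10 seconds by default). Raising it to 5 makes nginx far less eager to mark a container as dead.

```nginx
# Hypothetical names: adjust the upstream name, hosts and port to your setup.
upstream lemmy_backend {
    # Mark a server unavailable only after 5 failed attempts within
    # fail_timeout (default 10s), instead of after 1 (the default).
    server lemmy-1:8536 max_fails=5;
    server lemmy-2:8536 max_fails=5;
}

# The existing location blocks that forward requests to Lemmy would then use:
#     proxy_pass http://lemmy_backend;
```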
The change is noticeable. Good job guys.
Thanks for the updates.
I agree. Felt it immediately when I started browsing. Everything is faster and more responsive, on top of the error messages disappearing.
Yup I can even post comments first try, without getting an error! Things are working well!
Really noticeable. Cool update. Thank you, guys! ❤️
I love reading these! Thanks for all the work
I hope to start on some small contributions sometime next week. Stability has been noticeably better the last few days and I imagine it’s only going to get better.
Awesome news. Thanks for all the hard work.
I just love the transparency you guys are coming forward with. It’s absolutely awesome! Thank you for that and for all the work you put in. It means a lot to me that you folks are taking the time to keep us updated. Much love!
Submitting PRs is literally the most effective response that helps everyone who uses Lemmy. Thanks to you all.
I tried to enable push notifications and the app crashed. Also, how do I create a community?
Seems a lot faster today - great work!
Love the transparency. Thanks to the entire team!
I like that the post goes in detail and allows us tech nerds to get hard watching this stuff instead of the regular corpo jumbo change log that consists of:
- we uhh fixed some stuff so yeah good?
Well done guys!
It’s definitely a lot better today. Great work.
Thanks for the update! Things seem way speedier now ^^
You guys are amazing!
Are you guys able to create reports? I am not… It keeps spinning.
Was just noticing how much smoother it is this morning. Great work!