The Outage

UPDATE: Emails should be working again! If you were trying to create/recover an account, you may need to try requesting another confirmation email.
Hi Everyone,

We are back online and the site is open for use again.
This post is meant to be transparent about what happened and how we fixed it.
In a later post I will go over how we want to move forward.
There were several issues we had to fix, and I will go into the technical details below.

  1. What was originally broken. We still do not know. I presume the issue was with the host the forum was running on. We resolved this by moving to a new server.

  2. The things we knew were broken were a few permission issues when building the image. The site could not communicate with Postgres and Redis because the owner of the folders/files didn’t exist. The files had mismatched owner IDs left over from a previous Docker image.

  3. We needed to update the host OS to get the correct version of Ruby. I changed the package repos from focal to jammy and ran the update. This ultimately did not fix the dependency issues, but it did fix a failing build of the image.

  4. The Unicorn web server was throwing an error about a missing secret_key_base. We don’t handle any sensitive tokens in the web server, so I have no idea why this was being thrown. A friendly Reddit user helped me fix this issue.

  5. When we were considering moving to a new server, we ran into an issue with the cutover. The DNS records were set with a 24-hour TTL, which meant resolvers could keep sending people to the old server for up to 24 hours after the change. We ultimately did wait this out, and the TTL is now set to a much more reasonable value.

  6. The Discourse forum version. Discourse depends on a lot of things: the Ruby version on the host, the version of Discourse itself, and anything the plugins depend on. We worked through dependency hell with a combination of moving from the master branch to the main branch and changing the host OS from Ubuntu Focal to Ubuntu Jammy.

  7. Restoring from backup. Some data might have been lost. We restored from the backup with date and time 2024-01-05-060527.

  8. Website assets. The site uses a CDN provider to serve assets that don’t change much. While fixing issues we purged the cache so that it would re-populate from the backup version. This took a while, but after some time the site no longer had any cache misses or 500 errors. We consider this resolved, which is why we opened the site.
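
For anyone curious about the ownership problem in item 2, here is a minimal Python sketch of how files owned by a non-existent user or group could be spotted. This is not the procedure we actually ran. The function names and the placeholder path are made up for illustration, and it only works on Unix-like systems.

```python
import grp
import os
import pwd


def uid_exists(uid: int) -> bool:
    """Return True if the numeric user ID maps to an account on this system."""
    try:
        pwd.getpwuid(uid)
        return True
    except KeyError:
        return False


def gid_exists(gid: int) -> bool:
    """Return True if the numeric group ID maps to a group on this system."""
    try:
        grp.getgrgid(gid)
        return True
    except KeyError:
        return False


def find_orphaned(root: str):
    """Yield (path, uid, gid) for directories and files whose owner is unknown."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file directly inside it.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            st = os.lstat(path)
            if not uid_exists(st.st_uid) or not gid_exists(st.st_gid):
                yield path, st.st_uid, st.st_gid


# Placeholder path (an assumption): point this at the shared data directory.
for path, uid, gid in find_orphaned("/path/to/shared/volume"):
    print(f"{path}: uid={uid} gid={gid} has no matching account")
```

Anything this prints would need its ownership changed to a user that exists inside the new image.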

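The TTL wait in item 5 comes down to simple arithmetic: a resolver that cached the old record just before the change can keep serving it until the old TTL runs out. A rough Python sketch of that worst case follows. The change time below is a made-up example, not when we actually switched, and the 5-minute TTL is only there for comparison.

```python
from datetime import datetime, timedelta, timezone


def latest_stale_answer(record_changed_at: datetime, old_ttl_seconds: int) -> datetime:
    """Return the latest moment a resolver could still serve the old record."""
    return record_changed_at + timedelta(seconds=old_ttl_seconds)


# Hypothetical change time, for illustration only.
changed = datetime(2024, 1, 6, 12, 0, tzinfo=timezone.utc)

# With a 24-hour TTL, some visitors may hit the old server for a full day.
print(latest_stale_answer(changed, 24 * 60 * 60))

# With a 5-minute TTL, the same change settles within minutes.
print(latest_stale_answer(changed, 5 * 60))
```

This is why lowering the TTL well before a planned move makes the switch much quicker.
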
This is the Reddit thread where I posted my updates:
https://www.reddit.com/r/theHandy/comments/18zroyc/comment/kgs7itz/?context=3
I want to specifically thank pascaruchan for helping me with the Ruby on Rails issue, which I was not familiar with.

There are still things to do to make the site more resilient to failures, and I will be discussing with the team what we can do.

Finally, please remember that we are all volunteers. The site makes no real money; the only thing keeping it running is the Patreon. Please consider donating so we can make the site better.
This thread is open for questions if anyone has any.

127 Likes

Thx for the Hard Work! :clap: :pray:
Super happy we can roll again! :skull: :metal:

3 Likes

Thanks for working so hard on getting the site back up and being so transparent about what happened.

You didn’t owe us this much information, but it’s great to know that EroScripts is in the hands of capable people who care.

8 Likes

Cannot say how much we appreciate this. Thank you from the bottom of our hearts… and pants.

10 Likes

Thanks for all the hard work, appreciate it!

2 Likes

Big shout out to vlad for swooping in and helping with a lot of the deeper sysadmin stuff. He knows way more than me :tada:

8 Likes

Thanks for the hard work!

Like others have said, thanks so much for getting the site back up and running! I know many others appreciate the work everyone did to get the site operating again!

Thanks for the work to everyone involved, I’m sure you had better things planned this weekend than fixing the site.

Man it’s always annoying when something breaks in a site and you have to do a ridiculous amount of extra work to get it back to working. Thank you for your amazing work getting the site back up this smoothly!

2 Likes

Thanks for the hard work! I also found my way to that Reddit thread when it happened, which had me thinking: is there any place where the admins post status updates like these? Maybe a Twitter/X account or something like that? If not, I’ll keep an eye on r/thehandy whenever this happens (hopefully it never does lol).

yeah!!
gorushi

1 Like

I want to echo the others in thanking everyone involved for the hard work.

1 Like

Many thanks to you and all involved in bringing the site back up, and in such a short time!

The transparency around the technical challenge is much appreciated.

Looking forward to the next post.

Thank you for the hard work in getting the site back up!

Great job guys! Thanks a lot! :clap:

I was having flashbacks to the realtouchscript era

2 Likes

Thanks for all the work getting things back up!
I was a bit stumped that the site was down when I was about to create a release post for a new 18VR script I had finished. Now I can finally post 18VR - Little Chloe - The Bare Au Pair (or rather later today since I’ve returned to work after the holidays).

7 Likes