Users may experience slow loading or timing out on Client Scheduler
Incident Report for TimeTap
Postmortem

We upgraded our PostgreSQL database from 9.6 to 12.8 on 12/25/2021, because that is the leanest day of the year in terms of site traffic. Since the upgrade required a database reboot, we used the opportunity to also upgrade the instance from db.r5.4xlarge to db.r6g.8xlarge, which provides a significant increase in power. We were using only about 25-30% of capacity on the r5.4xlarge servers, but we decided to upgrade anyway so that the servers can provide improved performance for an extended time.

We had been running PostgreSQL 12.8 in our test environment for more than a month before the upgrade. During the entire week after the upgrade (also a lean time), we experienced no issues. On 1/3/2022, when site traffic picked up again after the holiday week, we experienced an issue with reading repeating class schedules, which resulted in slowness on the client scheduler. The issue was limited to this one query and occurred only on the read replica. The master (writer) database was running fine. We added read replicas on 1/3/2022, and the issue was (temporarily) resolved. On 1/4/2022, a number of accounts reported the same slowness, this time not only with repeating class schedules. We noticed that the issue affected only one read replica and that the other read replica was fine. We dropped that server, replaced it with a new server, and enabled Cluster Cache, and the issue was resolved again.
Since this change, the servers have been running under capacity, and users have not experienced slow load times.

Posted Jan 05, 2022 - 16:50 EST

Resolved
This incident has been resolved.
Posted Jan 04, 2022 - 15:34 EST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 04, 2022 - 13:25 EST
Update
We are continuing to investigate this issue.
Posted Jan 04, 2022 - 13:24 EST
Investigating
We are currently investigating the issue.
Posted Jan 04, 2022 - 12:13 EST
This incident affected: US (Client Scheduler - US).