We upgraded our PostgreSQL database from 9.6 to 12.8 on 12/25/2021, the day of the year with the lowest site traffic. Because the upgrade required a database reboot anyway, we used the opportunity to move the instance from db.r5.4xlarge to db.r6g.8xlarge, a significantly more powerful instance class. The db.r5.4xlarge servers were running at only about 25-30% of capacity, but we upgraded them anyway so that the new servers would keep providing headroom and good performance for an extended time.
Before the upgrade, we had been running PostgreSQL 12.8 in our test environment for more than a month. For the entire week after the upgrade (also a low-traffic period), we experienced no issues. On 1/3/2022, when site traffic picked up again after the holiday week, reads of repeating class schedules became slow, which caused slowness in the client scheduler. The problem affected only this one query and only the read replica; the master (writer) database was running fine. We added more read replicas on 1/3/2022, which temporarily resolved the issue. On 1/4/2022, a number of accounts reported the same slowness, this time not limited to repeating class schedules. We found that only one read replica was affected and the other read replica was fine. We dropped the affected server, replaced it with a new one, and enabled Cluster Cache, which resolved the issue again.
Since this change, the servers have been running below capacity, and users have not experienced slow load times.