Introduction:
On December 10, 2024 between 04:48 UTC - 05:03 UTC, some US instances intermittently experienced 500 server errors when accessing instances (Admin Console, Self-service portal, Release portal).
Issue Summary:
Customers affected during the outage were slow to process user login requests, and other core platform operations.
Resolution:
Re-establishing connections to the caching layer from our core services.
Root Cause:
A service was experiencing intermittent connectivity issues to the caching layer of our platform which led to increased latency and timeouts.
Solution and Mitigation:
A fix was implemented to stop the immediate issue. We are continuing to enhance our monitoring efforts to determine latency issues.
Conclusion:
We recognize this had an impact for some of our customers during their normal working hours, as well as our customers who are 24/7 businesses. We’re committed to continual improvement of our systems and are always exploring ways to limit disruption of our services.