On December 10, 2024, between 04:48 UTC and 05:03 UTC, some US instances intermittently experienced 500 server errors across the Admin Console, Self-service portal, and Release portal.
Issue Summary:
Affected customers experienced slow processing of user login requests and other core platform operations during the outage.
Service was restored by re-establishing connections from our core services to the caching layer.
Root Cause:
A service experienced intermittent connectivity issues with the caching layer of our platform, which led to increased latency and timeouts.
Solution and Mitigation:
A fix was implemented to stop the immediate issue. We are continuing to enhance our monitoring to detect latency issues.
We recognize this had an impact on some of our customers during their normal working hours, as well as on our customers who are 24/7 businesses. We’re committed to continual improvement of our systems and are always exploring ways to limit disruption of our services.