Introduction:
On October 29, 2024 between 20:30 UTC - 21:30 UTC, some customers with instances hosted in the US region experienced “400 Bad Request”.
Issue Summary:
Customers affected during the outage were unable to process user login requests, or other core requests. One of our database clusters was experiencing high latency which led to connectivity issues.
Resolution:
Failing over the database cluster restored connectivity, while monitoring the startup of the database engineers observed high read activity and modified memory configuration in order to decrease latency and allow the system to stabilize.
Root Cause:
High read load caused latency and connectivity issues.
Solution and Mitigation:
A fix was implemented to stop the immediate issue. We are continuing our effort to load-balance and scale the environment to support large influxes of requests.
Conclusion:
We recognize this caused an impact for some of our customers during their normal working hours, as well as our customers who are 24/7 businesses. We’re committed to continual improvement of our systems and are always exploring ways to limit disruption of our services.