EMEA instances intermittently experiencing "400 Bad Request" errors when accessing Admin Console
Introduction:
On August 8, 2024, between 12:06 and 13:27 UTC, some EMEA customers experienced “400 Bad Request” errors when performing any action that required a database connection.
Issue Summary:
Database connections were held up by a portion of code that degraded database performance under heavy load. A slow query caused requests to back up, and those accumulated requests collectively overwhelmed the database.
Resolution:
We rolled back to a previous version of the code. Going forward, changes will be scrutinized more closely to ensure that the code is optimized and performs well under high request volumes.
Root Cause:
A stored procedure that was believed to be safe performed poorly under high request volumes.
Solution and Mitigation:
We rolled back the changes to restore normal service for our customers. We are investigating better ways to monitor changes and to be alerted sooner when abnormal errors begin to occur. The rolled-back changes will be reviewed and modified so that they handle requests efficiently. Finally, we are adding process controls that simulate production behavior under high traffic volume, to verify that our queries perform well.
Conclusion:
We recognize that this incident affected our EMEA customers during their normal working hours, as well as customers who operate 24/7 businesses. We're committed to developing rollout strategies that take all customers' working hours into account. We are also reviewing our architecture to prevent smaller services from locking up core functionality. Thank you for your patience. We are eager to proactively improve our systems so that we can continue to support you on your digital transformation journey.