Introduction:
On October 23, 2024, between 5:45 AM and 7:12 AM MT, customers sporadically received 4XX status codes when accessing services in the US.
Issue Summary:
Connectivity issues in a portion of our infrastructure left services unable to communicate with each other.
Resolution:
Once the connectivity issues were identified, the affected infrastructure was promptly isolated from customer traffic.
Root Cause:
During morning scaling, a portion of our infrastructure experienced connectivity issues, which prevented services from communicating with each other.
Solution and Mitigation:
After the affected infrastructure was isolated, we observed customer-facing error rates clear.
We are implementing additional monitors to detect this type of issue proactively. We will continue to investigate the circumstances of this incident to prevent a recurrence.
Conclusion:
We recognize that this incident affected our US customers at the start of the business day. We are committed to enhancing our monitoring and architecture to improve service availability. Thank you for your patience as we worked through this service disruption.