Hey everyone — wanted to let you know that earlier today (12:39–12:52 PM PST), a subset of our legacy clusters experienced a brief service disruption caused by a networking bug introduced during a routine infrastructure migration. Our team resolved it within 13 minutes and all systems are fully operational.
I want to personally apologize for any disruption this caused. Reliability is a core commitment we make to all of you, and even a brief outage isn't acceptable to us. We're implementing additional safeguards and automated test coverage to make sure something like this doesn't reach production again.
Thanks for your patience and for being part of this community.
A note from Sean Montgomery (CTO) regarding today's service disruption
8 replies