Hey everyone,
Early yesterday morning (local time in Germany), a subset of customers in our Germany region experienced a service disruption that lasted roughly two hours.
Our operations team detected the incident immediately and escalated it as a Severity 1 incident. The team worked quickly to restore stability, and all directly impacted customers have been notified.
Our investigation traced the root cause to a mandatory Google Cloud Platform certificate migration that was rolling out across regions.
To prevent downtime during the migration, we pre-provisioned additional infrastructure capacity ahead of the cutover. Unfortunately, one of our autoscaling partners automatically cleaned up those machines before the migration window began. That left the region with less available capacity than expected during the changeover, which ultimately caused the disruption.
The Germany region is now stable, and we've implemented additional safeguards around infrastructure capacity during maintenance events to ensure this situation cannot occur again.
Even though this incident impacted only a subset of customers and was resolved quickly, any downtime is unacceptable to us. Reliability and trust are core to what we're building at Xano, and when we fall short of that standard, it's on us to own it and improve.
Thank you to the customers who were affected for your patience, and to the community for continuing to hold us to a high bar.