An ongoing Cloudflare outage has taken down many of its products, including the company’s dashboard and related application programming interfaces (APIs) customers use to manage and read service configurations.
The list of services whose functionality is wholly or partially affected includes the Cloudflare dashboard, the Cloudflare API, Logpush, WARP / Zero Trust device posture, Stream API, Workers API, and the Alert Notification System.
“This issue is impacting all services that rely on our API infrastructure including Alerts, Dashboard functionality, Zero Trust, WARP, Cloudflared, Waiting Room, Gateway, Stream, Magic WAN, API Shield, Pages, Workers,” Cloudflare said.
“Customers using the Dashboard / Cloudflare APIs are impacted as requests might fail and/or errors may be displayed.”
Customers attempting to log into their accounts are currently seeing 'Code: 10000' authentication errors, as well as internal server errors when trying to access the Cloudflare dashboard.
Cloudflare says the service issues do not affect cached file delivery via the Cloudflare CDN or Cloudflare edge security features.
Data center power outage behind dashboard and API issues
Two hours into the outage, the company revealed that the ongoing issues are due to power outages at multiple data centers.
“Cloudflare is assessing a loss of power impacting data centres while simultaneously failing over services. We will keep providing regular updates until the issue is resolved, thank you for your patience as we work on mitigating the problem,” an incident report update said.
This is the second large outage that has hit Cloudflare since the start of the week, with the first one taking down multiple products, including Cloudflare Sites and Services (Access, CDN Cache Purge, Dashboard, Images, Pages, Turnstile, Waiting Room, WARP, Workers KV) on Monday, October 30.
As the company explained in a post-mortem published two days later, the Monday outage was caused by a misconfiguration in the tool used to deploy a new Workers KV build.
Workers KV is “used by both customers and Cloudflare teams alike to manage configuration data, routing lookups, static asset bundles, authentication tokens, and other data that needs low-latency access,” Cloudflare’s Matt Silverlock and Kris Evans said.
“During this incident, KV returned what it believed was a valid HTTP 401 (Unauthorized) status code instead of the requested key-value pair(s) due to a bug in a new deployment tool used by KV.”
Update November 02, 20:12 EDT: A Cloudflare spokesperson told BleepingComputer that the root cause of this ongoing outage is a regional power issue, compounded by generator failures that took the affected facilities offline.
“We operate in multiple redundant data centers in Oregon that power Cloudflare’s control plane (dashboard, logging, etc). There was a regional power issue that impacted multiple facilities in the region. The facilities failed to generate power overnight. Then, this morning, there were multiple generator failures that took the facilities entirely offline,” the spokesperson said.
“We have failed over to our disaster recovery facility and most of our services are restored. This data center outage impacted Cloudflare’s dashboards and APIs, but it did not impact traffic flowing through our global network. We are working with our data center vendors to investigate the root cause of the regional power outage and generator failures. We expect to publish multiple blogs based on what we learn and can share those with you when they’re live.”