We run many internal clusters, each handling specific tasks. One of those clusters is responsible for some of the new profiles we introduced a few weeks ago. All of these clusters receive frequent, automated, and usually safe data updates at certain times.
The popularity of those new profiles degraded performance on this cluster, which in turn stacked up in the route optimization cluster. This made it difficult to identify the root cause. Additionally, an automated data update ran during our investigation, which caused further confusion and cost us time in identifying and fixing the issue.
Once we understood the problem in the cluster with the new profiles, we expected the route optimization cluster to stabilize itself, as it usually does. That wasn't the case this time, which cost us further time because we couldn't get the cluster stable.
We now need to fix the two underlying problems that caused this unacceptably long downtime.
We are very sorry for the inconvenience we caused. Please let us know if you need further clarification or support from us.
Jan 23, 16:42 CET
The issue has been identified and a fix is being implemented.
Jan 23, 15:00 CET