We’ve experienced heavy load and unresponsiveness on some of our services (e.g. GitLab and CollecTor), leading to outages and disruptions.
The issue seems to have resolved itself; our investigation suggested it was an upstream routing issue.
Update: the issue crept up again. The root cause was elevated hard drive temperatures on the affected server. Upstream has replaced the fans in the server, and the situation has returned to normal.
See issue tpo/tpa/team#41429 for detailed analysis and updates.
Last updated: December 11, 2023 at 3:27 PM UTC