[00:08:34] RESOLVED: DiskSpace: Disk space centrallog2002:9100:/srv 3.981% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=centrallog2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:08:12] FIRING: ThanosQueryHttpRequestQueryRangeErrorRateHigh: Thanos Query is failing to handle requests. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query - https://alerts.wikimedia.org/?q=alertname%3DThanosQueryHttpRequestQueryRangeErrorRateHigh
[09:09:33] ^^ checking
[09:13:12] RESOLVED: ThanosQueryHttpRequestQueryRangeErrorRateHigh: Thanos Query is failing to handle requests. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query - https://alerts.wikimedia.org/?q=alertname%3DThanosQueryHttpRequestQueryRangeErrorRateHigh
[14:11:41] I'm having a look at T407185. Does anyone here have experience with rebalancing kafka-logging? I have a rebalancing plan that will reduce the storage spread from 726GB to 29GB (60.84% -> 1.90%), which should also help with the traffic imbalance. I'm looking for feedback from past experience on what reassignment throttle has worked for this cluster. Or, put another way: are these hosts equipped with 10G NICs, or only 1G?
[14:11:42] T407185: Fix Kafka replicas skew - https://phabricator.wikimedia.org/T407185
[14:12:44] herron: sorry, I somehow missed that you had claimed the task. Do you want me to leave it to you? I'm happy to share the rebalancing plan I've generated, if that helps.
[20:04:48] FIRING: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[20:44:48] RESOLVED: PuppetFailure: Puppet has failed on logging-hd2005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[22:12:34] FIRING: DiskSpace: Disk space centrallog2002:9100:/srv 3.978% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=centrallog2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
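The throttle question raised at 14:11:41 mostly comes down to arithmetic: transfer time is roughly the volume of replica data to move divided by the throttle rate, and the throttle has to leave headroom on the NIC for normal producer and consumer traffic. The sketch below is illustrative only. The 500 GB volume and the 30% NIC share are hypothetical placeholders, not figures from T407185 or from the kafka-logging hosts. It uses bytes per second because that is the unit `kafka-reassign-partitions.sh` expects for its `--throttle` value. It also simplifies by treating a single throttle as the bottleneck for all moved data; moves spread across several brokers could finish sooner, and disk or protocol overhead could make them slower.

```python
# Rough estimate of Kafka partition reassignment time under a replication
# throttle. All inputs are hypothetical placeholders, not measured values.

GBIT = 1_000_000_000  # bits per second in 1 Gbit/s (decimal units)


def nic_bytes_per_sec(gbits: float) -> float:
    """Raw NIC line rate in bytes per second (ignores protocol overhead)."""
    return gbits * GBIT / 8


def throttle_for_nic(gbits: float, fraction: float) -> int:
    """Throttle in bytes/s that dedicates `fraction` of the NIC to replica moves."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return round(nic_bytes_per_sec(gbits) * fraction)


def reassignment_hours(bytes_to_move: float, throttle_bps: float) -> float:
    """Hours to move the data, assuming the throttle is the only bottleneck."""
    return bytes_to_move / throttle_bps / 3600


if __name__ == "__main__":
    moved = 500 * 10**9  # hypothetical: 500 GB of replica data to relocate
    for nic in (1, 10):
        t = throttle_for_nic(nic, 0.3)  # hypothetical: 30% of line rate
        print(f"{nic}G NIC: throttle={t} B/s, ~{reassignment_hours(moved, t):.1f} h")
```

With the same placeholder volume and NIC share, the 1G case takes ten times as long as the 10G case, which is why the NIC speed determines what throttle is reasonable.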