[13:27:51] hello folks!
[13:28:04] kafka on kafka-logging1005 should be restarted to pick up the new TLS cert
[14:56:30] Hi elukey, it's the process described in "Safe Broker Restarts" https://wikitech.wikimedia.org/wiki/Kafka/Administration right?
[14:58:12] denisse: hi! In this case I think that we can just restart that broker, the other ones should already have the new certs
[14:58:16] but we should check
[14:58:40] otherwise you can run the cookbook and restart the whole cluster safely, but it will take longer
[15:17:50] thanks elukey, just bounced the kafka-logging1005 broker
[15:17:50] (ThanosCompactIsDown) firing: Thanos component has disappeared. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactIsDown
[15:18:21] thanks!
[15:18:36] ^ acking, we just started codfw titan hw upgrades
[15:33:20] (ThanosCompactIsDown) resolved: Thanos component has disappeared. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactIsDown
[17:23:56] Hey Observability... I'm just starting my investigation, but if y'all have seen this before LMK: T361862 (unexpectedly high amount of polling from prom hosts)
[17:23:58] T361862: Determine cause of unexpectedly high blackbox poller entries in wdqs nginx access logs - https://phabricator.wikimedia.org/T361862
[19:51:19] inflatador: looks like about 32 requests/min - 4 scrapes per minute (15s scrape interval), 2 scrapers per dc (prometheus100[56]), 2 ips/host (ipv[46]), 2 contacts (search|sre)
[19:57:16] cwhite awesome, thank you for calculating that! Is there a way to change the scrape interval? I don't see it exposed in https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/prometheus/manifests/blackbox/check/http.pp
[19:59:00] I can probably add that to the puppet resource if it won't mess up the way the modules are rendered
[20:00:52] the way the check is configured shares the scrape interval with many other custom probes: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/prometheus/ops.pp#267
[20:09:17] ACK, I was afraid of that. Not a huge deal at the moment besides log spam, though. Thanks again for checking it out!
[20:13:01] performance.wikimedia.org SSL certs have been migrated to cfssl (session with denisse)
[20:13:36] this was a bit simpler as an example than graphite which has many alt names etc
[20:36:03] \o/
[21:51:04] mutante: Thanks a lot for your guidance and support during the cfssl migration. :)
[21:52:52] that's awesome, thanks indeed!
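
Side note on the 32 requests/min estimate from the 19:51:19 message: the arithmetic can be checked with a rough sketch like the one below. The function and parameter names are made up for illustration and are not part of any Prometheus or puppet API; the default values are the counts quoted in that message.

```python
# Rough check of the blackbox probe request rate discussed above.
# All names here are hypothetical; only the numbers come from the chat.

def blackbox_requests_per_minute(
    scrape_interval_s=15,  # 15s scrape interval -> 4 scrapes per minute
    scrapers_per_dc=2,     # prometheus100[56]
    ips_per_host=2,        # ipv4 and ipv6
    contacts=2,            # search and sre
):
    scrapes_per_minute = 60 / scrape_interval_s
    return scrapes_per_minute * scrapers_per_dc * ips_per_host * contacts


if __name__ == "__main__":
    # Current setup as described in the chat.
    print(blackbox_requests_per_minute())
    # What a hypothetical 60s interval would give, all else equal.
    print(blackbox_requests_per_minute(scrape_interval_s=60))
```

With the defaults this gives 32 requests per minute. Since the interval is the only factor that is not a fixed count of hosts, IPs, or contacts, lengthening it is the one knob in this product that would reduce the log volume, which is why the shared scrape interval matters here.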