[07:17:56] FIRING: ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:22:56] RESOLVED: ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:28:56] FIRING: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:33:56] RESOLVED: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:00:56] FIRING: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:05:56] RESOLVED: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:59:45] Puppet, SRE-tools, DC-Ops, Infrastructure-Foundations, and 2 others: RAID monitoring on new hardware spec requires new or updated user space cli tool - https://phabricator.wikimedia.org/T377853#10360586 (elukey) I tried to download and install perccli == `007.2616.0000.0000` on ms-be2081 but no lu...
[09:08:50] Puppet, SRE-tools, DC-Ops, Infrastructure-Foundations, and 2 others: RAID monitoring on new hardware spec requires new or updated user space cli tool - https://phabricator.wikimedia.org/T377853#10360601 (elukey) I think we could easily try to swap perccli with storcli for the hosts with SAS3908 o...
[09:29:58] Puppet, SRE-tools, DC-Ops, Infrastructure-Foundations, and 2 others: RAID monitoring on new hardware spec requires new or updated user space cli tool - https://phabricator.wikimedia.org/T377853#10360614 (MoritzMuehlenhoff) >>! In T377853#10360612, @MoritzMuehlenhoff wrote: > There are debs availa...
[09:30:13] Puppet, SRE-tools, DC-Ops, Infrastructure-Foundations, and 2 others: RAID monitoring on new hardware spec requires new or updated user space cli tool - https://phabricator.wikimedia.org/T377853#10360612 (MoritzMuehlenhoff) There are debs available in the Thomas Krenn repo (German server vendor):...
[09:36:14] Puppet, SRE-tools, DC-Ops, Infrastructure-Foundations, and 2 others: RAID monitoring on new hardware spec requires new or updated user space cli tool - https://phabricator.wikimedia.org/T377853#10360637 (MoritzMuehlenhoff) One other option is to try https://github.com/namiltd/megactl with this co...
[14:52:21] netops, Infrastructure-Foundations, Prod-Kubernetes, serviceops: WikiKube clusters close to exhausting Calico IPPool allocations - https://phabricator.wikimedia.org/T375845#10361841 (JMeybohm) We're not expecting any more replacements/expansions for wikikube this FY. So we can switch to the `/17`...
[16:35:07] Puppet, cloud-services-team, Cloud-VPS, Infrastructure-Foundations: Puppet removed "nameserver" line from /etc/resolv.conf - https://phabricator.wikimedia.org/T379927#10362431 (fnegri)
[16:51:12] SRE-tools, SRE, Data-Platform-SRE (2024.11.09 - 2024.11.29), Discovery-Search (Current work): Create cookbook to reindex into elasticsearch / cirrus - https://phabricator.wikimedia.org/T219507#10362546 (bking) Open→Resolved a:bking
[19:00:57] netops, Ceph, Infrastructure-Foundations, SRE, Patch-For-Review: Configure DSCP marking for cloudceph* hosts - https://phabricator.wikimedia.org/T371501#10363115 (dcaro) A quick search did not find any reference for the mon option on the upstream ceph, but found a commit on a clone: http://w...
[19:21:48] Puppet, cloud-services-team, Cloud-VPS, Infrastructure-Foundations: Puppet removed "nameserver" line from /etc/resolv.conf - https://phabricator.wikimedia.org/T379927#10363222 (Andrew) This has not recurred. Nevertheless we should figure out what's happening with the ruby functions that don't rai...
[19:22:11] SRE-tools, Infrastructure-Foundations, Spicerack: Tab completion for cookbook names - https://phabricator.wikimedia.org/T367230#10363226 (Volans) @JMeybohm that practically covers the current production use case, but is not future proof as it doesn't cover all the generic cases. Hence why I said I wa...
[19:22:25] FIRING: SystemdUnitFailed: update-tails-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:22:25] RESOLVED: SystemdUnitFailed: update-tails-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed