[00:04:13] (DiskSpace) resolved: Disk space puppetmaster1001:9100:/ 5.005% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=puppetmaster1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[01:16:16] (NodeTextfileStale) firing: (3) Stale textfile for puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[05:16:16] (NodeTextfileStale) firing: (3) Stale textfile for puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[09:12:37] netops, Infrastructure-Foundations, SRE: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (cmooney) Resolved→Open p:Low→Medium
[09:16:16] (NodeTextfileStale) firing: (3) Stale textfile for puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[09:17:13] netops, Infrastructure-Foundations, SRE: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (cmooney) After deployment in Codfw I noticed an issue which is affecting our EVPN switches. The problem isn't anything to do with EVPN, but more the fact that o...
[09:48:56] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (cmooney) Routing now looks ok, for instance in esams to the loopbacks of each CR: ` cmooney@asw1-bw27-esams> show route 185.15.59.128/32...
[10:03:25] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (cmooney) Open→Resolved
[10:17:50] netops, Infrastructure-Foundations, SRE: Do we need to generate aggregates for LVS service IP ranges - https://phabricator.wikimedia.org/T350354 (cmooney) p:Triage→Low
[10:18:15] netops, Infrastructure-Foundations, SRE: Do we need to generate aggregates for LVS service IP ranges? - https://phabricator.wikimedia.org/T350354 (cmooney)
[10:55:52] netops, Infrastructure-Foundations, SRE, Traffic-Icebox, Patch-For-Review: Create Generalised blocking strategy - https://phabricator.wikimedia.org/T270618 (jbond) > think it would be better if we close this and create smaller tickets with more focused scope. i don't think we need to close t...
[11:01:00] netops, Infrastructure-Foundations, SRE: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=c5da2d0a-c4af-4f96-b651-e1b326898629) set by cmooney@cumin1001 for 2:00:00 on 34 host(s) and the...
[11:23:07] netops, Infrastructure-Foundations, SRE: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (cmooney) One other observation is that the MED setting does not optimize the outbound path where we are using EVPN. One might hope that a LEAF switch, learning...
[12:02:02] netops, Infrastructure-Foundations, SRE, ops-codfw: Bring codfw row A-B EVPN switches live and make them gateway for existing Vlans - https://phabricator.wikimedia.org/T347191 (cmooney) @papaul hoping to tackle these in this order, want to do the asw-a ones first, then the asw-b ones. |Order|ASW...
[13:16:16] (NodeTextfileStale) firing: (3) Stale textfile for puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[13:26:01] (NodeTextfileStale) firing: (3) Stale textfile for puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[13:29:48] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (jbond)
[13:45:28] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[14:07:35] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (jbond)
[14:59:22] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (jbond)
[15:53:05] netops, Infrastructure-Foundations, SRE, ops-codfw, Patch-For-Review: Bring codfw row A-B EVPN switches live and make them gateway for existing Vlans - https://phabricator.wikimedia.org/T347191 (Papaul) @cmooney the order works for me
[15:55:03] netops, Infrastructure-Foundations, SRE, ops-codfw, Patch-For-Review: Bring codfw row A-B EVPN switches live and make them gateway for existing Vlans - https://phabricator.wikimedia.org/T347191 (cmooney) Row A Steps Detail: P53131 Row B Steps Detail: P53132
[15:58:25] netops, Infrastructure-Foundations, Traffic, Patch-For-Review: Create Generalised blocking strategy - https://phabricator.wikimedia.org/T270618 (BCornwall)
[16:21:12] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate atlas-codfw from asw-a1-codfw to lsw1-a1-codfw - https://phabricator.wikimedia.org/T348159 (Papaul) @cmooney cable is in place connected to lasw1-a2-codfw ge-0/0/46 ID 00756
[16:47:38] sorry folks missed the team meeting, I was chatting with paravoid about the crun bug and lost track of time
[16:48:46] (SystemdUnitFailed) firing: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:05:36] netops, Infrastructure-Foundations, SRE, ops-codfw, Patch-For-Review: Bring codfw row A-B EVPN switches live and make them gateway for existing Vlans - https://phabricator.wikimedia.org/T347191 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=0a8384b5-aa0d-44df-bf5c-aa9e191ed...
[17:20:52] (SystemdUnitFailed) resolved: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:21:28] fyi i spoke with jesse
[17:21:41] and Xio.NoX
[17:26:17] (NodeTextfileStale) firing: (2) Stale textfile for puppetserver2001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[20:54:06] Mail, Infrastructure-Foundations, SRE: Rspamd module - https://phabricator.wikimedia.org/T325397 (jhathaway) Open→Resolved
[20:54:12] Mail, Infrastructure-Foundations, SRE: Puppetry - https://phabricator.wikimedia.org/T325395 (jhathaway)
[20:54:18] Mail, Infrastructure-Foundations, SRE: Puppetry - https://phabricator.wikimedia.org/T325395 (jhathaway)
[20:54:24] Mail, Infrastructure-Foundations, SRE: Postfix Module - https://phabricator.wikimedia.org/T325396 (jhathaway) Open→Resolved
[21:26:17] (NodeTextfileStale) firing: (2) Stale textfile for puppetserver2001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[23:21:13] (DiskSpace) firing: Disk space puppetmaster1001:9100:/ 5.946% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=puppetmaster1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace