[01:42:41] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:42:49] FIRING: PuppetFailure: Puppet has failed on build2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [05:42:41] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:42:49] FIRING: PuppetFailure: Puppet has failed on build2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [09:42:41] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:03:47] 10netops, 06Infrastructure-Foundations, 06SRE, 10Data-Platform-SRE (2025.02.10 - 2025.02.28): Add QoS markings to profile Hadoop/HDFS analytics traffic - https://phabricator.wikimedia.org/T381389#10537993 (10Gehel) [10:42:49] FIRING: PuppetFailure: Puppet has failed on build2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [12:47:49] RESOLVED: PuppetFailure: Puppet has failed on build2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [12:48:04] FIRING: PuppetFailure: Puppet has failed on build2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [12:48:18] RESOLVED: PuppetFailure: Puppet has failed on build2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [13:42:41] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:23:01] FIRING: NTPNoSynced: NTP not synced on graphite1005:9100 - https://wikitech.wikimedia.org/wiki/NTP - TODO - https://alerts.wikimedia.org/?q=alertname%3DNTPNoSynced [16:00:19] 10SRE-tools, 10Elasticsearch, 06Infrastructure-Foundations, 10Spicerack: elasticsearch spicerack module failes with most recent elastic-curator - https://phabricator.wikimedia.org/T328775#10539986 (10Gehel) [16:03:48] 10netops, 06DC-Ops, 10Elasticsearch, 06Infrastructure-Foundations, and 2 others: codfw: elastic2025-elastic2036/switch port configuration - https://phabricator.wikimedia.org/T154605#10540043 (10Gehel) [16:13:01] RESOLVED: NTPNoSynced: NTP not synced on graphite1005:9100 - https://wikitech.wikimedia.org/wiki/NTP - TODO - https://alerts.wikimedia.org/?q=alertname%3DNTPNoSynced [16:13:58] 10Mail, 07Puppet, 06DBA, 10Elasticsearch, and 11 others: Project Proposal: Label style projects for common operations tools - https://phabricator.wikimedia.org/T1147#10540193 (10Gehel) [16:22:55] FIRING: MaxConntrack: Max conntrack at 84.39% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack [16:27:55] RESOLVED: MaxConntrack: Max conntrack at 82.24% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack [17:42:41] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:06:27] 10CAS-SSO, 10Gerrit, 06Infrastructure-Foundations: gerrit.wikimedia.org should use IDP for login - https://phabricator.wikimedia.org/T147864#10541731 (10Arendpieter) [21:07:19] 10CAS-SSO, 10Gerrit, 06Infrastructure-Foundations: gerrit.wikimedia.org should use IDP for login - https://phabricator.wikimedia.org/T147864#10541736 (10bd808) >>! In T147864#10541714, @Arendpieter wrote: > I understand that Gerrit uses LDAP, but that doesn’t provide true single sign-on since authentication... [21:42:41] FIRING: [3x] SystemdUnitFailed: etcd-backup.service on aux-k8s-etcd2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed