[00:46:36] ryankemper: happy to review the patches tomorrow. the only issue might be the timing since we will be all hands on deck to bring up magru tomorrow [00:47:05] and then that leaves us with Wednesday, which is probably not a nice time to deploy a new service before the holidays [00:47:11] let me respond to the email though and let's take it from there [00:55:35] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10355992 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host cp7015.magru.wmnet with OS bu... [01:04:13] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10355998 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host cp7015.magru.wmnet with OS bullse... [01:29:13] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10356021 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host dns7001.wikimedia.org with OS... [01:45:47] sukhe: understood, and I agree I don’t want to be deploying anything on weds. I think best option is to push to next week in that case [01:51:50] 06Traffic, 13Patch-For-Review: Remove RSA certificates from puppet - https://phabricator.wikimedia.org/T375569#10356046 (10BCornwall) I don't know if this is something that should apply to mail servers or not since compatibility is always behind. I took a look at lists1004's exim logs with: ` $ zgrep -E ' X=... 
[02:31:35] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10356110 (10BCornwall) [04:15:56] 06Traffic, 13Patch-For-Review: Remove RSA certificates from puppet - https://phabricator.wikimedia.org/T375569#10356174 (10Vgutierrez) right now exim is configured with RSA certs only and not with a dual stack (RSA+ECDSA) setup, from lists1004's exim configuration: ` # TLS tls_certi... [10:28:00] FIRING: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh7001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [10:33:00] RESOLVED: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh7001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [10:39:30] FIRING: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh7001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [10:44:30] RESOLVED: [4x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh7001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - 
https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [11:14:12] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [11:14:32] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [11:15:42] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [11:19:32] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [11:49:47] 06Traffic, 07Browser-Support-Apple-Safari, 07Browser-Support-Firefox, 07Browser-Support-Google-Chrome, 07User-notice: Discovery: Deprecation of TLS 1.2 - https://phabricator.wikimedia.org/T367821#10357140 (10Aklapper) [12:04:32] RESOLVED: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [12:09:13] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [12:10:42] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [12:12:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum7002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [12:17:00] FIRING: [2x] AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh7002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [12:22:00] RESOLVED: AnycastHealthcheckerRestarted: anycast-healthchecker service 
restarted on doh7002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [12:30:34] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357281 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host cp7015.magru.wmnet with OS bul... [12:48:37] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357361 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin2002 for host cp7015.magru.wmnet with OS bullsey... [12:51:40] FIRING: [3x] VarnishPrometheusExporterDown: Varnish Exporter on instance cp7003:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown [12:53:02] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357367 (10RobH) [12:54:37] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357380 (10RobH) [12:56:40] FIRING: [4x] VarnishPrometheusExporterDown: Varnish Exporter on instance cp7002:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - 
https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown [12:58:31] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357383 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host lvs7003.magru.wmnet with OS... [13:03:20] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357430 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1002 for host dns7001.wikimedia.org with OS... [13:07:28] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357451 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1002 for host dns7001.wikimedia.org with OS boo... [13:11:54] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357465 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp7015.magru.wmnet with OS b... [13:16:40] FIRING: [4x] VarnishPrometheusExporterDown: Varnish Exporter on instance cp7002:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown [13:19:39] 06Traffic, 07Browser-Support-Apple-Safari, 07Browser-Support-Firefox, 07Browser-Support-Google-Chrome, 07User-notice: Discovery: Deprecation of TLS 1.2 - https://phabricator.wikimedia.org/T367821#10357483 (10Xeverything11) TLS 1.3 is not as big of a issue for Android as with iOS. 
Chrome 71 and [[https://... [13:20:00] FIRING: PurgedHighEventLag: High event process lag with purged on cp7003:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=magru%20prometheus/ops&var-instance=cp7003 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag [13:20:00] FIRING: PurgedHighBacklogQueue: Large backlog queue for purged on cp7003:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=magru%20prometheus/ops&var-instance=cp7003 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue [13:25:00] FIRING: PurgedHighEventLag: High event process lag with purged on cp7003:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=magru%20prometheus/ops&var-instance=cp7003 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag [13:26:40] FIRING: [4x] VarnishPrometheusExporterDown: Varnish Exporter on instance cp7002:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown [13:35:00] FIRING: PurgedHighEventLag: High event process lag with purged on cp7004:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=magru%20prometheus/ops&var-instance=cp7004 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag [13:35:00] FIRING: PurgedHighBacklogQueue: Large backlog queue for purged on cp7004:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=magru%20prometheus/ops&var-instance=cp7004 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue [13:36:40] RESOLVED: VarnishPrometheusExporterDown: 
Varnish Exporter on instance cp7004:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown [13:43:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on dns7002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [13:43:29] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357611 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host dns7001.wikimedia.org with O... [13:48:00] RESOLVED: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on dns7002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [13:50:00] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357660 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host lvs7003.magru.wmnet with OS bull... 
[13:57:38] FIRING: [4x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.240:443 @ cp7015 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [14:01:39] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357733 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp7015.magru.wmnet with OS bulls... [14:02:38] RESOLVED: [4x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.240:443 @ cp7015 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [14:43:09] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357889 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host dns7001.wikimedia.org with OS bu... [14:43:54] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10357890 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host dns7001.wikimedia.org with O... 
[14:54:13] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [14:54:32] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [14:55:42] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [14:57:01] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum7002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [14:59:13] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [14:59:32] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [15:00:42] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [15:04:32] RESOLVED: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [15:20:04] 10netops, 10Cloud-Services, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10357974 (10VRiley-WMF) Has this still been performing as expected? If so, are we able to close it? 
[15:30:05] 10netops, 10Cloud-Services, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10357993 (10dcaro) Looks good on my side 👍 [15:33:30] 10netops, 10Cloud-Services, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10357997 (10VRiley-WMF) 05Open→03Resolved [15:42:52] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10358029 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host dns7001.wikimedia.org with OS bo... [16:14:18] 10netops, 10Cloud-Services, 06DC-Ops, 06Infrastructure-Foundations, and 2 others: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10358138 (10cmooney) 05Resolved→03Open >>! In T380503#10357974, @VRiley-WMF wrote: > Has this still... 
[16:17:38] FIRING: [4x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.240:443 @ cp7010 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [16:22:38] RESOLVED: [4x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.240:443 @ cp7010 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [16:35:50] 06Traffic: Package and deploy ATS 9.2.6 - https://phabricator.wikimedia.org/T379797#10358319 (10Fabfur) 05Open→03Resolved upgraded everywhere [16:37:01] 06Traffic: Package and deploy ATS 9.2.6 - https://phabricator.wikimedia.org/T379797#10358322 (10ssingh) [17:44:06] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on dns7002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [17:45:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7002 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [17:47:25] FIRING: [2x] SystemdUnitFailed: anycast-healthchecker.service on dns7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed 
[17:50:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7002 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [17:57:25] RESOLVED: [2x] SystemdUnitFailed: anycast-healthchecker.service on dns7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [17:59:00] RESOLVED: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on dns7002:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted [19:23:58] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359109 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `ganeti7001.magru.wmnet` - ganeti70... [19:28:36] 10Domains, 06Traffic: Park pay-for-edit and scam domains - https://phabricator.wikimedia.org/T380334#10359135 (10Dzahn) [x] Checked the list and confirm all of these domains listed here already link to the "parking" template in our DNS repo. 
[19:38:12] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [19:38:32] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [19:39:42] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [19:40:02] this needs to go [19:43:32] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [19:43:38] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359240 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `cp7003.magru.wmnet` - cp7003.magru... [19:48:32] RESOLVED: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [20:00:23] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359342 (10RobH) [20:01:25] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359353 (10RobH) [20:14:00] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359385 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `ganeti7002.magru.wmnet` - ganeti70... [20:16:50] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359388 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `cp7004.magru.wmnet` - cp7004.magru... 
[20:25:49] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359418 (10BCornwall) [20:27:51] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359434 (10RobH) [20:34:13] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359469 (10RobH) [20:34:35] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359472 (10RobH) [20:47:33] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359553 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `dns7002.wikimedia.org` - dns7002.w... [20:50:15] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359556 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `cp7002.magru.wmnet` - cp7002.magru... 
[21:03:17] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359591 (10RobH) [21:06:03] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359593 (10RobH) [21:20:30] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359620 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `cp7010.magru.wmnet` - cp7010.magru... [21:22:43] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359631 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `lvs7001.magru.wmnet` - lvs7001.mag... 
[21:23:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7003 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [21:28:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7003 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [21:35:57] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359683 (10RobH) [21:37:12] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359684 (10BCornwall) [21:39:49] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359697 (10RobH) @MoritzMuehlenhoff : ganeti700[12] are ready for reimage but I've just run out of steam for today. If you don't get to... 
[21:59:49] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359788 (10BCornwall) [22:22:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7002 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [22:27:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7002 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [22:29:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7002 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [22:39:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7002 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS [22:40:08] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10359954 (10BCornwall) [23:25:09] FIRING: [9x] LVSHighCPU: The host lvs1018:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - 
https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1018 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU [23:30:09] RESOLVED: [9x] LVSHighCPU: The host lvs1018:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1018 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU [23:36:31] 06Traffic, 06DC-Ops, 06Infrastructure-Foundations, 10ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10360135 (10BCornwall)