[00:18:45] 06SRE, 10SRE-Access-Requests: Remove Morten Warncke-Wang’s production shell access - https://phabricator.wikimedia.org/T416754 (10Nettrom) 03NEW [00:40:47] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1237605 [00:40:47] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1237605 (owner: 10TrainBranchBot) [00:55:03] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1237605 (owner: 10TrainBranchBot) [00:59:17] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [01:10:38] (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1237610 [01:10:38] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1237610 (owner: 10TrainBranchBot) [01:36:29] (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1237610 (owner: 10TrainBranchBot) [02:00:49] !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image [02:13:42] !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 12m 52s) [03:14:41] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag [03:29:17] FIRING: KubernetesCalicoDown: wikikube-worker2019.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s&var-instance=wikikube-worker2019.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [03:45:30] PROBLEM - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is CRITICAL: 1.008e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad [04:59:17] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [05:09:17] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:34:17] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:39:30] RECOVERY - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is OK: (C)1e+05 gt (W)1e+04 gt 3312 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad [07:14:41] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag [07:29:17] FIRING: KubernetesCalicoDown: wikikube-worker2019.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s&var-instance=wikikube-worker2019.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [07:38:10] FIRING: BFDdown: BFD session down between cr2-magru and fe80::ee38:73ff:fee8:9c58 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown [07:43:10] RESOLVED: BFDdown: BFD session down between cr2-magru and fe80::ee38:73ff:fee8:9c58 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown [08:55:10] FIRING: BFDdown: BFD session down between cr2-eqdfw and fe80::b6f9:5dff:fe30:e538 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown [08:59:17] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [09:00:10] RESOLVED: BFDdown: BFD session down between cr2-eqdfw and fe80::b6f9:5dff:fe30:e538 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown [11:14:41] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag [11:29:17] FIRING: KubernetesCalicoDown: wikikube-worker2019.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s&var-instance=wikikube-worker2019.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [12:59:17] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [15:09:17] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:14:41] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag [15:29:17] FIRING: KubernetesCalicoDown: wikikube-worker2019.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s&var-instance=wikikube-worker2019.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [15:34:17] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:59:17] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [17:35:36] (03PS1) 10Chlod Alejandro: Revert "enwiki: change to Wikipedia 25 logo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237646 (https://phabricator.wikimedia.org/T416771) [17:35:45] (03CR) 10CI reject: [V:04-1] Revert "enwiki: change to Wikipedia 25 logo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237646 (https://phabricator.wikimedia.org/T416771) (owner: 10Chlod Alejandro) [17:48:16] (03PS2) 10Chlod Alejandro: Revert "enwiki: change to Wikipedia 25 logo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237646 (https://phabricator.wikimedia.org/T416771) [17:53:25] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:10:21] FIRING: ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [18:15:13] RESOLVED: ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [18:50:12] FIRING: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [18:54:17] RESOLVED: [2x] ProbeDown: Service titan1002:443 has failed probes (http_thanos_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#titan1002:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [19:01:09] FIRING: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures [19:06:09] RESOLVED: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures [19:14:41] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag [19:29:17] FIRING: KubernetesCalicoDown: wikikube-worker2019.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s&var-instance=wikikube-worker2019.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [20:38:26] (03PS3) 10Jon Harald Søby: Rework InterwikiSortOrders.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235359 [20:39:00] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, February 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1235359 (owner: 10Jon Harald Søby) [20:59:17] FIRING: NetworkDeviceAlarmActive: Alarm active on cr2-codfw - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [21:18:00] (03PS1) 10Krinkle: Profiler: Set explicit "excimer-wall" redis channel instead of concat [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1237662 [21:53:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:14:41] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag [23:29:17] FIRING: KubernetesCalicoDown: wikikube-worker2019.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s&var-instance=wikikube-worker2019.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown