[00:21:13] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:21:20] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:22:12] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:31:15] (MediaWikiLatencyExceeded) firing: p75 latency high: codfw mw-parsoid (k8s) 809ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:36:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: codfw mw-parsoid (k8s) 812.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:37:15] (MediaWikiLatencyExceeded) firing: p75 latency high: codfw mw-parsoid (k8s) 864.6ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:39:25] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - 
https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:41:30] (MediaWikiLatencyExceeded) resolved: p75 latency high: codfw mw-parsoid (k8s) 875.1ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:44:55] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:02:36] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:02:42] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:14:06] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:14:13] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:32:55] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:33:02] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:00:07] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:00:14] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:09:51] btw, if someone familiar with the job queue could look at/comment on https://phabricator.wikimedia.org/T358308 I'd appreciate that :) [tl;dr: Large uploads on commons stopped working. 
As far as I can tell, the culprit is that the job queue timeout was reduced from 7 minutes to 3 when moving to k8s] [02:12:05] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:12:11] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:13:15] (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 47.69% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [02:14:14] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:14:21] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:16:15] (MediaWikiLatencyExceeded) firing: p75 latency high: codfw mw-parsoid (k8s) 830.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [02:18:15] (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 47.69% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [02:18:24] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:18:30] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:21:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: codfw mw-parsoid (k8s) 827.5ms - 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [02:25:55] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:26:02] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:37:15] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:02:15] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:42:05] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [03:43:25] (SystemdUnitFailed) firing: httpbb_kubernetes_mw-api-ext_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:22:12] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:42:05] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin1002 
is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [04:43:25] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-api-ext_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:45:10] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:04:22] (03PS12) 10Anzx: frwiki: update legacy vector logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011097 (https://phabricator.wikimedia.org/T359741) [05:41:23] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:41:32] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:41:45] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [06:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240315T0600) [06:11:45] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [06:16:19] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 45, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [06:16:39] PROBLEM - Router 
interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [06:21:45] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [06:44:55] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:48:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240315T0700) [07:02:30] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [07:22:45] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:22:52] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:42:29] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:42:36] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:07:11] good morning [08:07:22] jouncebot: nowandnext [08:07:22] For the next 22 hour(s) and 52 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240315T0700) [08:07:22] In 22 hour(s) and 52 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240316T0700) [08:07:46] so well if hotfixes need to be backported this Friday, I am the one in charge [08:07:51] given SRE are not around :) [08:07:54] we shall see [08:11:04] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:11:11] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:22:12] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:46:57] (SystemdUnitFailed) firing: (2) update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:30:54] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [09:31:01] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:45:05] !log eevans@cumin1002 START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase1022.eqiad.wmnet with reason: Decommissioning — T354561 [09:45:10] T354561: Decommission restbase10[19-27] - https://phabricator.wikimedia.org/T354561 [09:45:19] !log eevans@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1022.eqiad.wmnet with reason: Decommissioning — T354561 [09:47:01] (03PS1) 10Cwhite: apt: add thirdparty/opensearch2 component on bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1010974 (https://phabricator.wikimedia.org/T352517) [09:52:06] (03PS1) 10Majavah: P:wmcs::services: use deb.svc.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/1011273 (https://phabricator.wikimedia.org/T306039) [09:52:14] 
(03PS1) 10Majavah: hieradata: maintain-dbusers: use nfs.svc.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/1011274 (https://phabricator.wikimedia.org/T306039) [09:52:22] (03PS1) 10Majavah: P:toolforge: use prometheus.svc.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/1011275 (https://phabricator.wikimedia.org/T306039) [09:53:06] (03PS1) 10Majavah: Use deb.svc.toolforge.org [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1011276 (https://phabricator.wikimedia.org/T306039) [10:14:04] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1010975 [10:14:06] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1010975 (owner: 10TrainBranchBot) [10:35:31] (03CR) 10Cwhite: [C:03+2] apt: add thirdparty/opensearch2 component on bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1010974 (https://phabricator.wikimedia.org/T352517) (owner: 10Cwhite) [10:38:40] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1010975 (owner: 10TrainBranchBot) [10:39:25] (03CR) 10Arturo Borrero Gonzalez: [C:03+1] "LGTM." [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1011276 (https://phabricator.wikimedia.org/T306039) (owner: 10Majavah) [10:45:14] (03CR) 10Arturo Borrero Gonzalez: [C:03+1] "LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/1011273 (https://phabricator.wikimedia.org/T306039) (owner: 10Majavah) [10:45:43] (03CR) 10Arturo Borrero Gonzalez: [C:03+1] "LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/1011274 (https://phabricator.wikimedia.org/T306039) (owner: 10Majavah) [10:46:47] (03CR) 10Arturo Borrero Gonzalez: [C:03+1] "LGTM." 
[puppet] - 10https://gerrit.wikimedia.org/r/1011275 (https://phabricator.wikimedia.org/T306039) (owner: 10Majavah) [10:48:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:57:12] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:57:19] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:02:30] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [11:04:11] (03CR) 10Arturo Borrero Gonzalez: [C:03+1] "LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/1011137 (https://phabricator.wikimedia.org/T358483) (owner: 10Majavah) [11:04:56] (03CR) 10Arturo Borrero Gonzalez: [C:03+1] "LGTM." 
[puppet] - 10https://gerrit.wikimedia.org/r/1011138 (https://phabricator.wikimedia.org/T284656) (owner: 10Majavah) [11:20:58] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:21:04] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:22:04] (SystemdUnitFailed) firing: (3) update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:40:07] (03PS1) 10Urbanecm: [Growth] frwiki: Enable personalized praise backend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011284 (https://phabricator.wikimedia.org/T360152) [11:40:08] (03PS1) 10Urbanecm: [Growth] frwiki: Enable personalized-praise module [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011285 (https://phabricator.wikimedia.org/T360152) [11:46:57] (SystemdUnitFailed) firing: (3) update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:03:22] (03PS2) 10Cwhite: logstash: provision and commision logging-hd100[123] nodes [puppet] - 10https://gerrit.wikimedia.org/r/1009947 (https://phabricator.wikimedia.org/T352517) [12:10:57] (03CR) 10Cwhite: [C:03+2] logstash: provision and commision logging-hd100[123] nodes [puppet] - 10https://gerrit.wikimedia.org/r/1009947 (https://phabricator.wikimedia.org/T352517) (owner: 10Cwhite) [12:13:16] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:13:23] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:27:01] (03CR) 10Cathal Mooney: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1007437 
(https://phabricator.wikimedia.org/T339850) (owner: 10Cathal Mooney) [12:29:43] (03CR) 10Majavah: [C:03+2] Use deb.svc.toolforge.org [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1011276 (https://phabricator.wikimedia.org/T306039) (owner: 10Majavah) [12:30:13] (03CR) 10Majavah: [C:03+2] P:wmcs::services: use deb.svc.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/1011273 (https://phabricator.wikimedia.org/T306039) (owner: 10Majavah) [12:30:21] (03Merged) 10jenkins-bot: Use deb.svc.toolforge.org [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1011276 (https://phabricator.wikimedia.org/T306039) (owner: 10Majavah) [12:30:29] (03CR) 10Majavah: [C:03+2] hieradata: maintain-dbusers: use nfs.svc.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/1011274 (https://phabricator.wikimedia.org/T306039) (owner: 10Majavah) [12:30:45] (03CR) 10Majavah: [C:03+2] P:toolforge: use prometheus.svc.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/1011275 (https://phabricator.wikimedia.org/T306039) (owner: 10Majavah) [12:48:25] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:52:06] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:52:13] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:53:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:55:15] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:55:22] !log @deploy2002 
helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:57:36] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:57:42] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:58:41] 06SRE, 06serviceops, 07Wikimedia-production-error: VRT wiki fails to create account - https://phabricator.wikimedia.org/T359901#9633831 (10Krd) Sent by e-mail. [13:26:00] PROBLEM - OpenSearch unassigned shard check - 9200 on logging-hd1001 is CRITICAL: CRITICAL - logstash-mediawiki-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.006Z), logstash-deploy-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-syslog-1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.24[0](2024-03-08T03:44:46.003Z), logstash [13:26:00] .0.0-1-2023.12.23[0](2024-03-08T03:44:46.005Z), logstash-deploy-1-7.0.0-1-2023.12.22[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.004Z), logstash-webrequest-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.002Z), logstash-deploy-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.007Z), logstash-deploy-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-k8s-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:4 [13:26:00] logstash-default-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.006Z), logstash-k8s-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.004Z), logstash-syslog-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.005Z), logst https://wikitech.wikimedia.org/wiki/Search%23Administration [13:31:25] PROBLEM - OpenSearch unassigned shard check - 9200 on logging-hd1002 is CRITICAL: CRITICAL - logstash-syslog-1-7.0.0-1-2023.12.29[0](2024-03-08T03:44:46.004Z), logstash-mediawiki-1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.006Z), 
logstash-deploy-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-default-1-7.0.0-1-2023.12.23[0](2024-03-08T03:44:46.003Z), logstas [13:31:25] t-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.007Z), logstash-k8s-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.006Z), logstash-syslog-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.002Z), logstash-k8s-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.007Z), logstash-syslog-1-7.0.0-1-2023.12.17[0](2024-03-08T0 [13:31:25] 003Z), logstash-deploy-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46. https://wikitech.wikimedia.org/wiki/Search%23Administration [13:37:25] PROBLEM - OpenSearch unassigned shard check - 9200 on logging-hd1003 is CRITICAL: CRITICAL - logstash-deploy-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.007Z), logstash-webrequest-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.003Z), logstash-deploy-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.002Z), logstas [13:37:25] iki-1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.005Z), logstash-k8s-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.004Z), logstash-webrequest-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.007Z), logstash-deploy-1-7.0.0-1-2023.12.24[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.007Z), logstash-deploy-1-7.0.0-1-2023.12.22[0](2024-03-08T03:44:46.003Z), logstash-k8s-1-7.0.0-1-2023.12.23[0](2024-03-08 [13:37:25] 6.005Z), logstash-syslog-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.004Z), logstash-default-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.006Z), logstash-default-1-7.0.0-1-2023.12.23[0](2024-03-08T03:44:46. 
https://wikitech.wikimedia.org/wiki/Search%23Administration [14:12:09] ACKNOWLEDGEMENT - OpenSearch unassigned shard check - 9200 on logging-hd1001 is CRITICAL: CRITICAL - logstash-mediawiki-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.006Z), logstash-deploy-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.006Z), logstash-syslog-1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.24[0](2024-03-08T03:44:46.003Z), [14:12:10] -k8s-1-7.0.0-1-2023.12.23[0](2024-03-08T03:44:46.005Z), logstash-deploy-1-7.0.0-1-2023.12.22[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.004Z), logstash-webrequest-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.002Z), logstash-deploy-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.007Z), logstash-deploy-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-k8s-1-7.0.0-1-2023.12.18[0](2024-03-08 [14:12:10] 6.006Z), logstash-default-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.006Z), logstash-k8s-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.004Z), logstash-syslog-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.005Z), logst cole_white T359612 https://wikitech.wikimedia.org/wiki/Search%23Administration [14:12:10] ACKNOWLEDGEMENT - OpenSearch unassigned shard check - 9200 on logging-hd1002 is CRITICAL: CRITICAL - logstash-syslog-1-7.0.0-1-2023.12.29[0](2024-03-08T03:44:46.004Z), logstash-mediawiki-1-7.0.0-1-2023.12.17[0](2024-03-08T03:44:46.005Z), logstash-mediawiki-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46.006Z), logstash-deploy-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-default-1-7.0.0-1-2023.12.23[0](2024-03-08T03:44:46.003Z), [14:12:10] h-default-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.007Z), logstash-k8s-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.006Z), logstash-syslog-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.002Z), 
logstash-k8s-1-7.0.0-1-2023.12.20[0](2024-03-08T03:44:46.002Z), logstash-webrequest-1-7.0.0-1-2023.12.26[0](2024-03-08T03:44:46.007Z), logstash-syslog-1-7.0.0-1-2023.12.17[0](2024 [14:12:10] 3:44:46.003Z), logstash-deploy-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.003Z), logstash-webrequest-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.27[0](2024-03-08T03:44:46. cole_white T359612 https://wikitech.wikimedia.org/wiki/Search%23Administration [14:12:10] ACKNOWLEDGEMENT - OpenSearch unassigned shard check - 9200 on logging-hd1003 is CRITICAL: CRITICAL - logstash-deploy-1-7.0.0-1-2023.12.16[0](2024-03-08T03:44:46.007Z), logstash-webrequest-1-7.0.0-1-2023.12.21[0](2024-03-08T03:44:46.003Z), logstash-syslog-1-7.0.0-1-2023.12.18[0](2024-03-08T03:44:46.003Z), logstash-deploy-1-7.0.0-1-2023.12.19[0](2024-03-08T03:44:46.003Z), logstash-mediawiki-1-7.0.0-1-2023.12.25[0](2024-03-08T03:44:46.002Z), [14:13:15] great ._. [14:13:27] wb icinga [14:13:55] that check output needs some help [14:14:10] suffers from TMI syndrome [14:23:41] (03CR) 10Majavah: [C:03+1] git-sync-upstream: on puppet7, deploy code after update [puppet] - 10https://gerrit.wikimedia.org/r/1009798 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [14:25:19] (03CR) 10Andrew Bogott: [C:03+2] git-sync-upstream: on puppet7, deploy code after update [puppet] - 10https://gerrit.wikimedia.org/r/1009798 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [14:28:02] (03CR) 10Andrew Bogott: [C:03+2] P:toolforge::docker::image_builder: drop buster support [puppet] - 10https://gerrit.wikimedia.org/r/1011137 (https://phabricator.wikimedia.org/T358483) (owner: 10Majavah) [14:37:15] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable 
[14:46:57] (SystemdUnitFailed) firing: (2) update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:57:15] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:57:39] !log jnuche@deploy2002 Installing scap version "4.72.0" for 373 hosts [14:58:26] !log jnuche@deploy2002 Installation of scap version "4.72.0" completed for 373 hosts [15:08:58] !log jnuche@deploy2002 Started deploy [releng/jenkins-deploy@7a56c9a] (releasing): (no justification provided) [15:09:40] !log jnuche@deploy2002 Finished deploy [releng/jenkins-deploy@7a56c9a] (releasing): (no justification provided) (duration: 00m 41s) [15:24:18] (03PS1) 10Majavah: P:toolforge::proxy: fix custom 429 page [puppet] - 10https://gerrit.wikimedia.org/r/1011301 [15:24:18] (03PS1) 10Majavah: dynamicproxy: fix 429 error page [puppet] - 10https://gerrit.wikimedia.org/r/1011302 [15:40:40] 06SRE, 06serviceops, 07Wikimedia-production-error: VRT wiki fails to create account - https://phabricator.wikimedia.org/T359901#9634195 (10Dzahn) fwiw: I received the IP but can't find that IP/network in the private puppet repo. So it doesn't appear to be that kind of block. 
[15:47:20] (03CR) 10Andrew Bogott: [C:03+2] mw-xml.sh: Update maintenance script [puppet] - 10https://gerrit.wikimedia.org/r/1009784 (https://phabricator.wikimedia.org/T359643) (owner: 10Ahmon Dancy) [15:53:25] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:53:31] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:08:27] (03PS2) 10Majavah: dynamicproxy: fix 429 error page [puppet] - 10https://gerrit.wikimedia.org/r/1011302 [16:53:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [17:01:02] !log jnuche@deploy2002 Started deploy [releng/jenkins-deploy@78fbb55] (releasing): (no justification provided) [17:01:43] !log jnuche@deploy2002 Finished deploy [releng/jenkins-deploy@78fbb55] (releasing): (no justification provided) (duration: 00m 40s) [17:09:22] !log jnuche@deploy2002 Started deploy [releng/jenkins-deploy@611b85b] (releasing): (no justification provided) [17:10:02] !log jnuche@deploy2002 Finished deploy [releng/jenkins-deploy@611b85b] (releasing): (no justification provided) (duration: 00m 39s) [17:11:33] (KubernetesAPILatency) firing: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [17:16:33] (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - 
https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [17:25:57] !log elukey@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' . [17:38:15] (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 49.57% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [17:43:15] (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 49.57% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [17:53:21] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:53:28] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:00:46] SRE, Data-Platform-SRE: Configure the Hadoop MapReduce ports to use a fixed range - https://phabricator.wikimedia.org/T111433#9634630 (BTullis) a:BTullis→None [18:10:03] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:10:10] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:11:15] (PS4) Brouberol: quarry: enable https traffic incoming from prometeus hosts [puppet] - https://gerrit.wikimedia.org/r/1011323 (https://phabricator.wikimedia.org/T360220) [18:16:41] (CR) Majavah: [C:-1] "The quarry role does not have any host-level firewall applied (instead it relies on
OpenStack security groups) and even if it did, Prometh" [puppet] - https://gerrit.wikimedia.org/r/1011323 (https://phabricator.wikimedia.org/T360220) (owner: Brouberol) [18:16:46] (CR) Phuedx: [C:+1] quarry: enable https traffic incoming from prometeus hosts [puppet] - https://gerrit.wikimedia.org/r/1011323 (https://phabricator.wikimedia.org/T360220) (owner: Brouberol) [18:17:05] (CR) Phuedx: quarry: enable https traffic incoming from prometeus hosts [puppet] - https://gerrit.wikimedia.org/r/1011323 (https://phabricator.wikimedia.org/T360220) (owner: Brouberol) [18:17:17] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:17:24] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:18:36] (Abandoned) Brouberol: quarry: enable https traffic incoming from prometeus hosts [puppet] - https://gerrit.wikimedia.org/r/1011323 (https://phabricator.wikimedia.org/T360220) (owner: Brouberol) [18:40:35] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:40:41] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:47:12] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:52:46] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:52:52] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:53:25] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:57:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:57:30] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:58:05] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:58:11] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:36:42] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:36:49] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:54:24] (PS3) Aram: Add autopatrolled, rollbacker and suppressredirect user groups for ckbwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/1010938 (https://phabricator.wikimedia.org/T360228) [19:55:56] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:56:03] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:59:10] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:59:17] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:20:06] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:20:09] (PS1) Btullis: Use a routable sender address for email from Airflow [puppet] - https://gerrit.wikimedia.org/r/1011342 (https://phabricator.wikimedia.org/T358675) [20:20:13] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:23:52] (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 7): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1644/co" [puppet] - https://gerrit.wikimedia.org/r/1011342 (https://phabricator.wikimedia.org/T358675) (owner: Btullis) [20:39:05] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:39:12] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:48:42] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:48:49] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:57:03] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:57:09] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:03:01] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:03:08] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:17:50] (CR) Brouberol: [C:+1] "Looks good!" [puppet] - https://gerrit.wikimedia.org/r/1011342 (https://phabricator.wikimedia.org/T358675) (owner: Btullis) [21:44:04] (PS2) RLazarus: mediawiki: Add mwscript labels to the job as well as the pods [deployment-charts] - https://gerrit.wikimedia.org/r/1009373 (https://phabricator.wikimedia.org/T341553) [21:48:40] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:48:47] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:01:38] (CR) RLazarus: "True, and I guess it's been like that for a while, since it fails on both sides of the diff.
I'm not sure why yet (maybe that "ci_only_rel" [deployment-charts] - https://gerrit.wikimedia.org/r/1009373 (https://phabricator.wikimedia.org/T341553) (owner: RLazarus) [22:01:50] (PS1) Brouberol: ATS: redirect superset.wikimedia.org to the kubernetes deployment [puppet] - https://gerrit.wikimedia.org/r/1011359 (https://phabricator.wikimedia.org/T358569) [22:01:52] (PS1) Brouberol: idp: update the superset OIDC service [puppet] - https://gerrit.wikimedia.org/r/1011360 (https://phabricator.wikimedia.org/T358569) [22:01:56] (PS1) Brouberol: superset: cleanup references to old temporary domains [deployment-charts] - https://gerrit.wikimedia.org/r/1011362 (https://phabricator.wikimedia.org/T358480) [22:02:04] (PS1) Brouberol: superset: cleanup references to old temporary domains [puppet] - https://gerrit.wikimedia.org/r/1011361 (https://phabricator.wikimedia.org/T358480) [22:02:12] (PS1) Brouberol: superset: cleanup references to old temporary domains [dns] - https://gerrit.wikimedia.org/r/1011363 (https://phabricator.wikimedia.org/T358480) [22:06:54] (CR) Jdlrobson: [C:+1] "LGTM.
Please feel free to schedule a deployment in one of our backport windows: https://wikitech.wikimedia.org/wiki/Deployments" [mediawiki-config] - https://gerrit.wikimedia.org/r/1011097 (https://phabricator.wikimedia.org/T359741) (owner: Anzx) [22:19:06] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:19:13] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:22:40] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:22:46] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:28:08] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:28:15] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:32:57] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:33:04] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:47:13] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:57:17] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:57:23] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:57:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:57:30] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw -
https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [23:11:13] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:11:20] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:14:41] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:14:48] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:18:02] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:18:09] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:20:23] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:20:30] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:22:52] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:22:58] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:25:57] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:26:04] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply