[00:00:39] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1028943 (owner: TrainBranchBot)
[00:00:58] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T352010)', diff saved to https://phabricator.wikimedia.org/P62265 and previous config saved to /var/cache/conftool/dbconfig/20240510-000058-ladsgroup.json
[00:01:02] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[00:08:26] FIRING: [4x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:16:06] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P62266 and previous config saved to /var/cache/conftool/dbconfig/20240510-001605-ladsgroup.json
[00:31:14] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P62267 and previous config saved to /var/cache/conftool/dbconfig/20240510-003113-ladsgroup.json
[00:46:21] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T352010)', diff saved to https://phabricator.wikimedia.org/P62268 and previous config saved to /var/cache/conftool/dbconfig/20240510-004621-ladsgroup.json
[00:46:25] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
[00:46:28] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[00:46:38] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
[00:46:40] !log ladsgroup@cumin1002 START -
Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[00:46:56] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[00:47:05] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1212 (T352010)', diff saved to https://phabricator.wikimedia.org/P62269 and previous config saved to /var/cache/conftool/dbconfig/20240510-004703-ladsgroup.json
[02:05:27] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:35:13] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:56:37] FIRING: ProbeDown: Service aqs1013-b:9042 has failed probes (tcp_cassandra_b_cql_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#aqs1013-b:9042 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[03:00:13] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:00:13] RESOLVED: ProbeDown: Service aqs1013-b:9042 has failed probes (tcp_cassandra_b_cql_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#aqs1013-b:9042 -
https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[03:03:26] FIRING: [4x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:15:39] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212 (T352010)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240510-041534-ladsgroup.json
[04:15:48] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[04:30:47] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P62270 and previous config saved to /var/cache/conftool/dbconfig/20240510-043046-ladsgroup.json
[04:45:55] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P62271 and previous config saved to /var/cache/conftool/dbconfig/20240510-044554-ladsgroup.json
[05:01:03] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212 (T352010)', diff saved to https://phabricator.wikimedia.org/P62272 and previous config saved to /var/cache/conftool/dbconfig/20240510-050102-ladsgroup.json
[05:01:05] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
[05:01:07] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[05:01:18] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
[05:08:26] FIRING: [3x] SystemdUnitFailed: docker-reporter-base-images.service on
build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:21:37] FIRING: ProbeDown: Service shellbox:4008 has failed probes (http_shellbox_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#shellbox:4008 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:25:13] RESOLVED: ProbeDown: Service shellbox:4008 has failed probes (http_shellbox_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#shellbox:4008 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:44:48] (CR) Muehlenhoff: [C:+1] "Always nice to see 1036 manually managed files go away :-) Or rather 2072, also accounting for the equivalents in the private repo..." [labs/private] - https://gerrit.wikimedia.org/r/1029567 (https://phabricator.wikimedia.org/T352647) (owner: Elukey)
[05:47:32] (PS3) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[05:51:08] (PS4) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[05:51:47] SRE, observability, Patch-For-Review, SRE Observability (FY2023/2024-Q4): Phase out cergen for Observability services - https://phabricator.wikimedia.org/T360414#9785054 (MoritzMuehlenhoff) Nice work! There are two leftovers, which fell through the cracks I think? - certificate.manifests.d...
[05:53:04] (CR) Fabfur: "I've also added the possibility to not specify any custom UDP buffer size.
In this case socket unit should be brought anyway without speci" [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[05:54:53] SRE, Infrastructure-Foundations, Puppet-Infrastructure, Patch-For-Review, Puppet (Puppet 7.0): Phase out cergen - https://phabricator.wikimedia.org/T357750#9785062 (MoritzMuehlenhoff)
[05:57:16] (CR) Fabfur: [V:+1] "PCC SUCCESS (CORE_DIFF 2 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[06:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240510T0600)
[06:01:43] (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1029509 (https://phabricator.wikimedia.org/T364456) (owner: Btullis)
[06:02:07] (PS5) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[06:03:58] !log eoghan@cumin1002 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T364481
[06:06:57] (CR) Muehlenhoff: [C:+1] "LGTM" [debs/amd-k8s-device-plugin] - https://gerrit.wikimedia.org/r/1029218 (https://phabricator.wikimedia.org/T362984) (owner: Elukey)
[06:08:11] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:10:32] !log eoghan@cumin1002 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T364481
[06:16:26] (PS1) Muehlenhoff: Add a new function to return the
wiki PHP version currently in use [puppet] - https://gerrit.wikimedia.org/r/1029900
[06:20:15] (CR) Muehlenhoff: "We should have done that for the app server fleet already when we introduced Envoy, but the remaining baremetal servers will be gone short" [puppet] - https://gerrit.wikimedia.org/r/1026453 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[06:21:08] (PS6) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[06:22:17] (CR) Muehlenhoff: "A restart of docker/containerd won't impact running containers, the restart is similar to e.g. OpenSSH, where existing connections keep be" [puppet] - https://gerrit.wikimedia.org/r/1028795 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[06:26:30] (CR) Fabfur: [V:+1] "PCC SUCCESS (CORE_DIFF 2 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[06:30:17] (PS7) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[06:35:59] (CR) Fabfur: [V:+1] "PCC SUCCESS (CORE_DIFF 2 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[06:50:31] (CR) Vgutierrez: [C:-1] cache:benthos: test for socket based activation in Benthos (7 comments) [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken.
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240510T0700)
[07:00:27] PROBLEM - CirrusSearch more_like eqiad 95th percentile latency on graphite1005 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=eqiad&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1&viewPanel=39
[07:05:25] RECOVERY - CirrusSearch more_like eqiad 95th percentile latency on graphite1005 is OK: OK: Less than 20.00% above the threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=eqiad&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1&viewPanel=39
[07:13:11] (PS1) Muehlenhoff: Deprecate system::role for o11y services [puppet] - https://gerrit.wikimedia.org/r/1029924
[07:55:39] !log installing Linux 6.1.90 on Bookworm systems
[07:55:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:10] (Abandoned) WQuarshie: example-node-api chart creating chart for the exampl-node-api Bug:T288134 [deployment-charts] - https://gerrit.wikimedia.org/r/743483 (owner: WQuarshie)
[08:08:26] FIRING: [3x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:21:42] sre-alert-triage, Data-Platform-SRE: Alert in need of triage: PybalBackendDown (instance elastic2090:0) - https://phabricator.wikimedia.org/T364528#9785256 (Gehel) p:Triage→High
[08:22:13] sre-alert-triage, Data-Platform-SRE (2024.05.06 - 2024.05.26): Alert in need of triage: PybalBackendDown
(instance elastic2090:0) - https://phabricator.wikimedia.org/T364528#9785257 (Gehel)
[08:24:11] (PS8) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[08:24:27] (CR) Fabfur: cache:benthos: test for socket based activation in Benthos (5 comments) [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[08:25:30] (CR) Fabfur: cache:benthos: test for socket based activation in Benthos (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[08:30:40] !log restore SRE business hours oncall for EMEA - T350192
[08:30:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:51] T350192: On-call batphone escalation configuration holidays FY2023-24 - https://phabricator.wikimedia.org/T350192
[08:35:59] (PS9) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[08:37:47] (CR) Fabfur: [V:+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/2390/co" [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[08:42:49] !log jelto@cumin1002 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
[08:55:42] (CR) Filippo Giunchedi: [C:+1] Deprecate system::role for o11y services [puppet] - https://gerrit.wikimedia.org/r/1029924 (owner: Muehlenhoff)
[08:58:05] (PS1) Vgutierrez: ncredir,benthos: Move processors to the pipeline section [puppet] - https://gerrit.wikimedia.org/r/1030010 (https://phabricator.wikimedia.org/T364379)
[09:01:46] (CR) Vgutierrez:
cache:benthos: test for socket based activation in Benthos (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[09:11:19] (CR) Vgutierrez: cache:benthos: test for socket based activation in Benthos (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[09:12:04] (CR) Majavah: [C:+1] "This seems fine, and we could later drop all the `NULL as ...` columns since I don't think exposing those is very useful. I'll deploy this" [puppet] - https://gerrit.wikimedia.org/r/1029709 (https://phabricator.wikimedia.org/T364435) (owner: Zabe)
[09:21:06] (Abandoned) EoghanGaffney: apt: Update gitlab package [puppet] - https://gerrit.wikimedia.org/r/1029608 (https://phabricator.wikimedia.org/T364481) (owner: EoghanGaffney)
[09:21:54] (PS1) Muehlenhoff: Add a class to Cumin hosts which generates a Kafka certificate for frtech [puppet] - https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779)
[09:22:15] (CR) CI reject: [V:-1] Add a class to Cumin hosts which generates a Kafka certificate for frtech [puppet] - https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779) (owner: Muehlenhoff)
[09:28:09] (PS1) Filippo Giunchedi: Revert "prometheus: use longer-expiration pki client certs for k8s" [puppet] - https://gerrit.wikimedia.org/r/1030019
[09:29:05] (PS2) Muehlenhoff: Add a class to Cumin hosts which generates a Kafka certificate for frtech [puppet] - https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779)
[09:29:26] (CR) CI reject: [V:-1] Add a class to Cumin hosts which generates a Kafka certificate for frtech [puppet] - https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779) (owner: Muehlenhoff)
[09:29:36] (CR) Filippo Giunchedi: "e.g."
[puppet] - https://gerrit.wikimedia.org/r/1030019 (owner: Filippo Giunchedi)
[09:33:21] (PS3) Muehlenhoff: Add a class to Cumin hosts which generates a Kafka certificate for frtech [puppet] - https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779)
[09:33:41] (CR) CI reject: [V:-1] Add a class to Cumin hosts which generates a Kafka certificate for frtech [puppet] - https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779) (owner: Muehlenhoff)
[09:34:52] (PS1) Vgutierrez: hiera: Enable IPIP encapsulation on high-traffic2@magru [puppet] - https://gerrit.wikimedia.org/r/1030021 (https://phabricator.wikimedia.org/T357257)
[09:34:58] (PS4) Muehlenhoff: Add a class to Cumin hosts which generates a Kafka certificate for frtech [puppet] - https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779)
[09:37:00] (CR) Vgutierrez: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/2391/co" [puppet] - https://gerrit.wikimedia.org/r/1030021 (https://phabricator.wikimedia.org/T357257) (owner: Vgutierrez)
[09:38:41] (PS10) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[09:38:44] (CR) Fabfur: cache:benthos: test for socket based activation in Benthos (4 comments) [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[09:40:16] (PS2) Vgutierrez: hiera: Enable IPIP encapsulation on high-traffic2@magru [puppet] - https://gerrit.wikimedia.org/r/1030021 (https://phabricator.wikimedia.org/T357257)
[09:40:16] (PS1) Vgutierrez: hiera: Enable IPIP on upload and upload-https services [puppet] - https://gerrit.wikimedia.org/r/1030022 (https://phabricator.wikimedia.org/T357257)
[09:42:00] (CR)
Vgutierrez: [V:+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/2392/console" [puppet] - https://gerrit.wikimedia.org/r/1030022 (https://phabricator.wikimedia.org/T357257) (owner: Vgutierrez)
[09:43:10] (CR) Vgutierrez: cache:benthos: test for socket based activation in Benthos (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[09:45:24] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779) (owner: Muehlenhoff)
[09:45:25] (PS11) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[09:45:34] (CR) Fabfur: cache:benthos: test for socket based activation in Benthos (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[09:46:38] (CR) Vgutierrez: [V:+1] "PCC SUCCESS (DIFF 1 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1030021 (https://phabricator.wikimedia.org/T357257) (owner: Vgutierrez)
[09:46:56] (PS12) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[09:51:36] (PS1) Slyngshede: Enable token management UI.
[puppet] - https://gerrit.wikimedia.org/r/1030048
[09:52:08] (CR) Fabfur: [V:+1] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/2395/co" [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379) (owner: Fabfur)
[09:54:58] (PS1) Vgutierrez: cache: Enable IPIP encapsulation on upload@magru [puppet] - https://gerrit.wikimedia.org/r/1030051 (https://phabricator.wikimedia.org/T357257)
[09:57:09] (PS2) Slyngshede: Enable token management UI. [puppet] - https://gerrit.wikimedia.org/r/1030048
[09:57:34] (CR) CI reject: [V:-1] Enable token management UI. [puppet] - https://gerrit.wikimedia.org/r/1030048 (owner: Slyngshede)
[09:57:49] (PS3) Slyngshede: Enable token management UI. [puppet] - https://gerrit.wikimedia.org/r/1030048 (https://phabricator.wikimedia.org/T364605)
[09:58:19] (PS1) Muehlenhoff: Simplify profile::cache::kafka::certificate to only support PKI/cfssl [puppet] - https://gerrit.wikimedia.org/r/1030052 (https://phabricator.wikimedia.org/T337825)
[09:58:41] (CR) CI reject: [V:-1] Simplify profile::cache::kafka::certificate to only support PKI/cfssl [puppet] - https://gerrit.wikimedia.org/r/1030052 (https://phabricator.wikimedia.org/T337825) (owner: Muehlenhoff)
[09:58:55] (CR) Vgutierrez: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/2397/co" [puppet] - https://gerrit.wikimedia.org/r/1030051 (https://phabricator.wikimedia.org/T357257) (owner: Vgutierrez)
[09:59:42] (PS2) Muehlenhoff: Simplify profile::cache::kafka::certificate to only support PKI/cfssl [puppet] - https://gerrit.wikimedia.org/r/1030052 (https://phabricator.wikimedia.org/T337825)
[10:03:33] (CR) Vgutierrez: [V:+1 C:-2] "to be merged on Monday" [puppet] - https://gerrit.wikimedia.org/r/1030022
(https://phabricator.wikimedia.org/T357257) (owner: Vgutierrez)
[10:03:41] (CR) Vgutierrez: [V:+1 C:-2] "to be merged on Monday" [puppet] - https://gerrit.wikimedia.org/r/1030021 (https://phabricator.wikimedia.org/T357257) (owner: Vgutierrez)
[10:03:48] (CR) Vgutierrez: [V:+1 C:-2] "to be merged on Monday" [puppet] - https://gerrit.wikimedia.org/r/1030051 (https://phabricator.wikimedia.org/T357257) (owner: Vgutierrez)
[10:06:51] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1030052 (https://phabricator.wikimedia.org/T337825) (owner: Muehlenhoff)
[10:08:11] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:08:22] (CR) Klausman: [C:+1] Release new version [debs/amd-k8s-device-plugin] - https://gerrit.wikimedia.org/r/1029218 (https://phabricator.wikimedia.org/T362984) (owner: Elukey)
[10:08:45] (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1030019 (owner: Filippo Giunchedi)
[10:11:56] (CR) Filippo Giunchedi: [C:+2] Revert "prometheus: use longer-expiration pki client certs for k8s" [puppet] - https://gerrit.wikimedia.org/r/1030019 (owner: Filippo Giunchedi)
[10:13:39] (PS3) Muehlenhoff: Simplify profile::cache::kafka::certificate to only support PKI/cfssl [puppet] - https://gerrit.wikimedia.org/r/1030052 (https://phabricator.wikimedia.org/T337825)
[10:15:50] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1030052 (https://phabricator.wikimedia.org/T337825) (owner: Muehlenhoff)
[10:16:24] (PS2) Majavah: wikireplicas: update-views: Add filter option [cookbooks] - https://gerrit.wikimedia.org/r/1026857
[10:16:24] (PS1) Majavah: wikireplicas: update-views: Run Puppet agent
before maintain-views [cookbooks] - https://gerrit.wikimedia.org/r/1030055
[10:20:29] (CR) CI reject: [V:-1] wikireplicas: update-views: Run Puppet agent before maintain-views [cookbooks] - https://gerrit.wikimedia.org/r/1030055 (owner: Majavah)
[10:20:55] (CR) FNegri: [C:+1] "LGTM!" [cookbooks] - https://gerrit.wikimedia.org/r/1026857 (owner: Majavah)
[10:21:15] (CR) FNegri: [C:+1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/1030055 (owner: Majavah)
[10:21:41] (CR) Majavah: [C:+2] wikireplicas: update-views: Add filter option [cookbooks] - https://gerrit.wikimedia.org/r/1026857 (owner: Majavah)
[10:21:51] PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on alert1001 is CRITICAL: 2.296e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[10:22:10] (PS1) Vgutierrez: team-traffic: Add runbook link to LVSRealserverMSS alert [alerts] - https://gerrit.wikimedia.org/r/1030057 (https://phabricator.wikimedia.org/T357257)
[10:23:09] (PS2) Majavah: wikireplicas: update-views: Run Puppet agent before maintain-views [cookbooks] - https://gerrit.wikimedia.org/r/1030055
[10:25:01] (PS13) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[10:25:35] (Merged) jenkins-bot: wikireplicas: update-views: Add filter option [cookbooks] - https://gerrit.wikimedia.org/r/1026857 (owner: Majavah)
[10:27:27] (CR) Majavah: [C:+2] wikireplicas: update-views: Run Puppet agent before maintain-views [cookbooks] - https://gerrit.wikimedia.org/r/1030055 (owner: Majavah)
[10:29:51] RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on alert1001 is OK:
(C)1e+05 gt (W)1e+04 gt 12 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[10:30:27] (CR) FNegri: [C:+1] wikireplicas: update-views: Run Puppet agent before maintain-views [cookbooks] - https://gerrit.wikimedia.org/r/1030055 (owner: Majavah)
[10:31:08] (Merged) jenkins-bot: wikireplicas: update-views: Run Puppet agent before maintain-views [cookbooks] - https://gerrit.wikimedia.org/r/1030055 (owner: Majavah)
[10:32:30] !log installing Linux 5.10.216 on Bullseye systems
[10:32:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:36:32] (PS14) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[10:43:09] (CR) Elukey: [V:+2 C:+2] Release new version [debs/amd-k8s-device-plugin] - https://gerrit.wikimedia.org/r/1029218 (https://phabricator.wikimedia.org/T362984) (owner: Elukey)
[10:47:13] (PS15) Fabfur: cache:benthos: test for socket based activation in Benthos [puppet] - https://gerrit.wikimedia.org/r/1029615 (https://phabricator.wikimedia.org/T364379)
[10:47:22] (CR) Elukey: "The cache s3 code should be: https://github.com/go-spatial/tegola/blob/master/cache/s3/s3.go" [deployment-charts] - https://gerrit.wikimedia.org/r/1029544 (https://phabricator.wikimedia.org/T344324) (owner: Elukey)
[10:49:52] jelto@cumin1002 jelto: The backup on gitlab2002 is complete, ready to proceed with upgrade.
[10:50:38] (PS3) Btullis: Add the airflow profile to the statistics::explorer role [puppet] - https://gerrit.wikimedia.org/r/1029541 (https://phabricator.wikimedia.org/T364542)
[10:50:42] !log add amd-k8s-device-plugin_1.25.2.8 to bullseye-wikimedia
[10:50:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:46] (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/2400/co" [puppet] - https://gerrit.wikimedia.org/r/1029541 (https://phabricator.wikimedia.org/T364542) (owner: Btullis)
[11:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240510T0700)
[11:00:04] eoghan, jelto, arnoldokoth, and mutante: How many deployers does it take to do GitLab version upgrades deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240510T1100).
[11:02:39] (CR) Muehlenhoff: [C:+1] "Typo inline, but LGTM" [puppet] - https://gerrit.wikimedia.org/r/1030048 (https://phabricator.wikimedia.org/T364605) (owner: Slyngshede)
[11:05:33] !log jelto@cumin1002 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
[11:10:27] FIRING: [2x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:13:11] FIRING: [2x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:13:37] (CR) Btullis: [C:+1] "Looks good, thanks." [puppet] - https://gerrit.wikimedia.org/r/1029188 (owner: Muehlenhoff)
[11:14:00] (CR) Btullis: [C:+1] Configure an-test-druid to use firewall::service compatible firewall settings [puppet] - https://gerrit.wikimedia.org/r/1029180 (owner: Muehlenhoff)
[11:14:21] (CR) Btullis: [C:+1] "Looks good, thanks." [puppet] - https://gerrit.wikimedia.org/r/1029184 (owner: Muehlenhoff)
[11:22:53] (PS4) Slyngshede: Enable token management UI. [puppet] - https://gerrit.wikimedia.org/r/1030048 (https://phabricator.wikimedia.org/T364605)
[11:24:47] (CR) Muehlenhoff: [C:+1] Enable token management UI. [puppet] - https://gerrit.wikimedia.org/r/1030048 (https://phabricator.wikimedia.org/T364605) (owner: Slyngshede)
[11:25:06] (CR) Slyngshede: [C:+2] Enable token management UI. [puppet] - https://gerrit.wikimedia.org/r/1030048 (https://phabricator.wikimedia.org/T364605) (owner: Slyngshede)
[11:25:13] (CR) Slyngshede: [C:+2] Enable token management UI.
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1030048 (https://phabricator.wikimedia.org/T364605) (owner: 10Slyngshede) [11:35:06] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [11:35:19] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [11:36:51] !log roll out debdeploy 0.0.99.14 [11:36:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:09] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [11:49:43] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [11:54:34] (03CR) 10Elukey: [V:03+2 C:03+2] Add fake TLS keystore password for Cassandra clusters [labs/private] - 10https://gerrit.wikimedia.org/r/1029538 (https://phabricator.wikimedia.org/T352647) (owner: 10Elukey) [11:55:00] (03CR) 10Elukey: [V:03+2 C:03+2] Delete the Cassandra directory in secrets [labs/private] - 10https://gerrit.wikimedia.org/r/1029567 (https://phabricator.wikimedia.org/T352647) (owner: 10Elukey) [11:55:41] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51925 bytes in 6.707 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [11:56:03] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8616 bytes in 0.279 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [12:01:38] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org [12:03:41] !log Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration [12:03:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:05:26] !log 
jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org [12:08:15] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org [12:08:41] FIRING: [2x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:12:02] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org [12:23:24] (03PS2) 10Elukey: services: move Tegola's Swift config in staging to local envoy proxy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1029544 (https://phabricator.wikimedia.org/T344324) [12:24:40] (03PS1) 10Muehlenhoff: Add alias for full cluster [puppet] - 10https://gerrit.wikimedia.org/r/1030085 [12:25:53] 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9785851 (10cmooney) > @ayounsi @cmooney please provide the interfaces to use on cr* for the uplink from ssw1-d1 to cr1 and ssw1-d8 to cr2 Eventually the c... [12:37:25] (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/2402/co" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779) (owner: 10Muehlenhoff) [12:37:56] (03CR) 10Elukey: [C:03+1] "LGTM! Left a couple of nits but feel free to ignore them if you don't feel they are worth it." 
[puppet] - 10https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779) (owner: 10Muehlenhoff) [12:39:45] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [12:40:15] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [12:41:33] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [12:47:16] (03PS5) 10Muehlenhoff: Add a class to Cumin hosts which generates a Kafka certificate for frtech [puppet] - 10https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779) [12:50:13] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8616 bytes in 0.249 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [12:50:25] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Fri 14 Jun 2024 01:28:50 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [12:50:35] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51923 bytes in 0.093 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [12:53:11] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779) (owner: 10Muehlenhoff) [12:57:45] (03CR) 10Muehlenhoff: Add a class to Cumin hosts which generates a Kafka certificate for frtech (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1030018 (https://phabricator.wikimedia.org/T360779) (owner: 10Muehlenhoff) [13:00:30] (03CR) 10FNegri: "I will deploy this on Monday." 
[puppet] - 10https://gerrit.wikimedia.org/r/1029709 (https://phabricator.wikimedia.org/T364435) (owner: 10Zabe) [13:00:32] (03CR) 10Ladsgroup: "To make this work without needing to make it go at the same time with deployment of the core patch, we can simply override $wgFooterIcons " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1027150 (https://phabricator.wikimedia.org/T256190) (owner: 10Jforrester) [13:02:11] (03PS1) 10Muehlenhoff: mariadb::ferm_idm: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/1030099 [13:02:45] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1030099 (owner: 10Muehlenhoff) [13:04:42] (03CR) 10Ssingh: "My take is on this is that we should do this after magru is up and running for a while and not before we have sent any traffic towards it." [puppet] - 10https://gerrit.wikimedia.org/r/1030051 (https://phabricator.wikimedia.org/T357257) (owner: 10Vgutierrez) [13:04:59] (03CR) 10Andrea Denisse: [C:03+1] "LGTM, thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/1029924 (owner: 10Muehlenhoff) [13:10:18] 06SRE, 10observability, 13Patch-For-Review, 10SRE Observability (FY2023/2024-Q4): Phase out cergen for Observability services - https://phabricator.wikimedia.org/T360414#9785893 (10andrea.denisse) @MoritzMuehlenhoff Thanks for taking a look! Unfortunately, we can't delete the `thanos-query.discovery.wm... [13:11:39] (03PS1) 10Andrew Bogott: Deploy new labtesthorizon version [puppet] - 10https://gerrit.wikimedia.org/r/1030100 [13:12:17] (03CR) 10Andrew Bogott: [C:03+2] Deploy new labtesthorizon version [puppet] - 10https://gerrit.wikimedia.org/r/1030100 (owner: 10Andrew Bogott) [13:18:53] 10ops-eqiad, 06SRE: ManagementSSHDown parse1002.eqiad.wmnet - https://phabricator.wikimedia.org/T363086#9785906 (10akosiaris) Re-resolving, let's see how it fares this time around. 
[13:19:33] 10ops-eqiad, 06SRE: ManagementSSHDown parse1002.eqiad.wmnet - https://phabricator.wikimedia.org/T363086#9785907 (10akosiaris) 05Open→03Resolved [13:19:49] (03PS1) 10Andrew Bogott: Horizon: deploy 2024.1 Horizon code [puppet] - 10https://gerrit.wikimedia.org/r/1030108 [13:21:40] (03CR) 10Alexandros Kosiaris: [C:03+1] mw-web, mw-api-ext: bump replicas in advance of traffic shift [deployment-charts] - 10https://gerrit.wikimedia.org/r/1028842 (https://phabricator.wikimedia.org/T362323) (owner: 10Hnowlan) [13:22:31] (03CR) 10Andrew Bogott: [C:03+2] Horizon: deploy 2024.1 Horizon code [puppet] - 10https://gerrit.wikimedia.org/r/1030108 (owner: 10Andrew Bogott) [13:31:01] (03CR) 10Fabfur: [C:03+1] ncredir,benthos: Move processors to the pipeline section [puppet] - 10https://gerrit.wikimedia.org/r/1030010 (https://phabricator.wikimedia.org/T364379) (owner: 10Vgutierrez) [13:35:19] (03CR) 10Alexandros Kosiaris: [C:03+1] coredump.conf: Remove misconfigured KeepFree setting [puppet] - 10https://gerrit.wikimedia.org/r/1028565 (owner: 10Ahmon Dancy) [13:35:43] (03PS9) 10EoghanGaffney: lists: Add lists role to list2001 [puppet] - 10https://gerrit.wikimedia.org/r/1025741 (https://phabricator.wikimedia.org/T331706) [13:40:52] (03PS1) 10Andrew Bogott: Horizon: move eqiad1 config to bobcat files [puppet] - 10https://gerrit.wikimedia.org/r/1030114 [13:41:31] (03PS1) 10Cathal Mooney: Add reverse DNS entries for new codfw private subnets [dns] - 10https://gerrit.wikimedia.org/r/1030115 (https://phabricator.wikimedia.org/T364095) [13:41:36] (03CR) 10Andrew Bogott: [C:03+2] Horizon: move eqiad1 config to bobcat files [puppet] - 10https://gerrit.wikimedia.org/r/1030114 (owner: 10Andrew Bogott) [13:42:27] (03CR) 10CI reject: [V:04-1] Add reverse DNS entries for new codfw private subnets [dns] - 10https://gerrit.wikimedia.org/r/1030115 (https://phabricator.wikimedia.org/T364095) (owner: 10Cathal Mooney) [13:45:18] (03CR) 10Cwhite: [C:03+1] Deprecate system::role for o11y services 
[puppet] - 10https://gerrit.wikimedia.org/r/1029924 (owner: 10Muehlenhoff) [13:45:32] (03PS1) 10Andrew Bogott: Horizon local_settings: remove redundant ALLOWED_HOSTS setting [puppet] - 10https://gerrit.wikimedia.org/r/1030116 [13:46:00] (03CR) 10Andrew Bogott: [C:03+2] Horizon local_settings: remove redundant ALLOWED_HOSTS setting [puppet] - 10https://gerrit.wikimedia.org/r/1030116 (owner: 10Andrew Bogott) [13:47:05] 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure, 10Puppet (Puppet 7.0): Review/cleanup content of /srv/private/modules/secret/secrets in the private repo - https://phabricator.wikimedia.org/T364622 (10MoritzMuehlenhoff) 03NEW [13:48:49] (03PS1) 10Majavah: Update toolsbeta Prometheus k8s cert [puppet] - 10https://gerrit.wikimedia.org/r/1030117 [13:50:23] PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes [13:51:23] (03PS2) 10Cathal Mooney: Add reverse DNS entries for new codfw private subnets [dns] - 10https://gerrit.wikimedia.org/r/1030115 (https://phabricator.wikimedia.org/T364095) [13:52:04] (03CR) 10Majavah: [C:03+2] Update toolsbeta Prometheus k8s cert [puppet] - 10https://gerrit.wikimedia.org/r/1030117 (owner: 10Majavah) [13:52:12] (03CR) 10CI reject: [V:04-1] Add reverse DNS entries for new codfw private subnets [dns] - 10https://gerrit.wikimedia.org/r/1030115 (https://phabricator.wikimedia.org/T364095) (owner: 10Cathal Mooney) [13:59:04] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 64049 [14:06:30] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 [14:06:37] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 30640 [14:07:38] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 30640 [14:08:09] !log 
ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 7337 [14:08:15] 10ops-eqiad, 06SRE, 10Cassandra: Reimage aqs1013 - https://phabricator.wikimedia.org/T364422#9786006 (10Eevans) 05Open→03Resolved The reimage is complete, and both instances have been bootstrapped. Closing. [14:08:34] 10ops-eqiad, 06SRE, 10Cassandra: Degraded RAID on aqs1013 - https://phabricator.wikimedia.org/T362033#9786027 (10Eevans) The machine has been reimaged and the instances bootstrapped. 🤞 [14:08:43] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7337 [14:09:12] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 5418 [14:09:22] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5418 [14:12:11] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 5769 [14:12:26] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5769 [14:12:59] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 23473 [14:13:32] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23473 [14:14:03] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 38565 [14:14:32] (03CR) 10Ladsgroup: "I'd say add Hosts: footer and check if things work fine." 
[puppet] - 10https://gerrit.wikimedia.org/r/1025741 (https://phabricator.wikimedia.org/T331706) (owner: 10EoghanGaffney) [14:14:45] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38565 [14:15:36] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 21574 [14:15:59] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21574 [14:19:30] FIRING: ProbeDown: Service wdqs1013:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1013:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [14:24:30] RESOLVED: ProbeDown: Service wdqs1013:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1013:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [14:26:09] (03CR) 10Cwhite: [C:04-1] Migrate `wmfstatic` metrics to Prometheus store (0310 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) (owner: 10Andrea Denisse) [14:26:35] 10ops-magru: magru: add PDUs to Netbox - https://phabricator.wikimedia.org/T364628 (10ayounsi) 03NEW [14:26:48] 10ops-magru: magru: add PDUs to Netbox - https://phabricator.wikimedia.org/T364628#9786095 (10ayounsi) [14:26:50] 10ops-magru, 13Patch-For-Review: Sao Paulo, Brazil, South America POP tracking task - https://phabricator.wikimedia.org/T346722#9786096 (10ayounsi) [14:29:26] (03CR) 10Cwhite: [C:04-1] Migrate `wmfstatic` metrics to Prometheus store (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) (owner: 10Andrea Denisse) 
[14:30:54] 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure, 10Puppet (Puppet 7.0): Review/cleanup content of /srv/private/modules/secret/secrets in the private repo - https://phabricator.wikimedia.org/T364622#9786103 (10Dzahn) Things I think we can delete, at a first glance: secret/secrets/tor secret/s... [14:35:54] (03CR) 10Ladsgroup: [C:03+1] lists: Add lists role to list2001 [puppet] - 10https://gerrit.wikimedia.org/r/1025741 (https://phabricator.wikimedia.org/T331706) (owner: 10EoghanGaffney) [14:36:28] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:36:30] (03CR) 10Btullis: [C:03+2] Dumps: Include wikis with underscores in the list of folders to be mirrored. [puppet] - 10https://gerrit.wikimedia.org/r/1029633 (https://phabricator.wikimedia.org/T354687) (owner: 10Xcollazo) [14:36:47] (03CR) 10Btullis: [C:03+2] "Looks good to me. I can deploy on Monday, if it helps." [puppet] - 10https://gerrit.wikimedia.org/r/1029633 (https://phabricator.wikimedia.org/T354687) (owner: 10Xcollazo) [14:41:47] (03PS3) 10Andrea Denisse: Migrate `wmfstatic` metrics to Prometheus store [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) [14:42:28] (03CR) 10CI reject: [V:04-1] Migrate `wmfstatic` metrics to Prometheus store [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) (owner: 10Andrea Denisse) [14:43:35] (03PS4) 10Andrea Denisse: Migrate `wmfstatic` metrics to Prometheus store [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) [14:44:01] (03CR) 10Andrea Denisse: "I've sent a new patch implementing your suggestions, thanks for taking a look!" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) (owner: 10Andrea Denisse) [14:44:43] (03PS5) 10Andrea Denisse: Migrate `wmfstatic` metrics to Prometheus store [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) [14:47:06] (03CR) 10Andrea Denisse: Migrate `wmfstatic` metrics to Prometheus store (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) (owner: 10Andrea Denisse) [14:49:20] 10ops-codfw, 06SRE, 06Infrastructure-Foundations, 10netops: codfw: use old asw switches from row A and B as msw switches in row C and D - https://phabricator.wikimedia.org/T361871#9786149 (10ayounsi) 05Resolved→03Open I'm not fond of setting up a password on un-managed switched. For example what is the... [14:57:48] (03PS60) 10AOkoth: prometheus: puppetise sql_exporter [puppet] - 10https://gerrit.wikimedia.org/r/945872 (https://phabricator.wikimedia.org/T310822) [14:58:08] (03CR) 10CI reject: [V:04-1] prometheus: puppetise sql_exporter [puppet] - 10https://gerrit.wikimedia.org/r/945872 (https://phabricator.wikimedia.org/T310822) (owner: 10AOkoth) [15:00:13] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:00:44] (03PS61) 10AOkoth: prometheus: puppetise sql_exporter [puppet] - 10https://gerrit.wikimedia.org/r/945872 (https://phabricator.wikimedia.org/T310822) [15:11:50] (03PS62) 10AOkoth: prometheus: puppetise sql_exporter [puppet] - 10https://gerrit.wikimedia.org/r/945872 (https://phabricator.wikimedia.org/T310822) [15:13:11] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [15:14:55] (03CR) 10Xcollazo: "That works, thanks Ben!" [puppet] - 10https://gerrit.wikimedia.org/r/1029633 (https://phabricator.wikimedia.org/T354687) (owner: 10Xcollazo) [15:15:44] (03CR) 10CDanis: [C:03+1] ncredir,benthos: Move processors to the pipeline section [puppet] - 10https://gerrit.wikimedia.org/r/1030010 (https://phabricator.wikimedia.org/T364379) (owner: 10Vgutierrez) [15:17:26] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for YLiou_WMF (no server access) - https://phabricator.wikimedia.org/T363514#9786216 (10Dzahn) 05Open→03In progress a:05Miriam→03None [15:19:38] 06SRE, 10SRE-Access-Requests: Requesting access to cassandra-staging-devs for xcollazo - https://phabricator.wikimedia.org/T364588#9786219 (10Dzahn) a:03KOfori [15:20:07] 06SRE, 10SRE-Access-Requests: Requesting access to cassandra-staging-devs for xcollazo - https://phabricator.wikimedia.org/T364588#9786220 (10Dzahn) 05Open→03In progress [15:21:37] (03PS63) 10AOkoth: prometheus: puppetise sql_exporter [puppet] - 10https://gerrit.wikimedia.org/r/945872 (https://phabricator.wikimedia.org/T310822) [15:22:32] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for YLiou_WMF (no server access) - https://phabricator.wikimedia.org/T363514#9786224 (10Dzahn) [15:22:54] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for YLiou_WMF (no server access) - https://phabricator.wikimedia.org/T363514#9786225 (10Dzahn) [15:23:14] 10ops-codfw: connected console ports attached to unracked device - https://phabricator.wikimedia.org/T364633 (10ayounsi) 03NEW [15:24:30] FIRING: ProbeDown: Service wdqs1018:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1018:443 - 
https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:24:30] (03CR) 10Herron: [C:03+1] Deprecate system::role for o11y services [puppet] - 10https://gerrit.wikimedia.org/r/1029924 (owner: 10Muehlenhoff) [15:25:36] (03PS1) 10Dzahn: admin: add user Yu-Ming Liou (analytics-privatedata, no shell) [puppet] - 10https://gerrit.wikimedia.org/r/1030168 (https://phabricator.wikimedia.org/T363514) [15:27:00] (03CR) 10Dzahn: [V:03+1] "[mwmaint1002:~] $ ldapsearch -x uid=ylio* | grep uid" [puppet] - 10https://gerrit.wikimedia.org/r/1030168 (https://phabricator.wikimedia.org/T363514) (owner: 10Dzahn) [15:27:53] (03CR) 10Dzahn: [V:03+1] "https://meta.wikimedia.org/wiki/User:YLiou_(WMF)" [puppet] - 10https://gerrit.wikimedia.org/r/1030168 (https://phabricator.wikimedia.org/T363514) (owner: 10Dzahn) [15:29:30] RESOLVED: ProbeDown: Service wdqs1018:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1018:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:29:51] (03CR) 10Ssingh: [C:03+1] "Looks good, I didn't check each PTRs individually since you automatically generated them but I verified the failing CI output and the VLAN" [dns] - 10https://gerrit.wikimedia.org/r/1030115 (https://phabricator.wikimedia.org/T364095) (owner: 10Cathal Mooney) [15:31:20] (03PS64) 10AOkoth: prometheus: puppetise sql_exporter [puppet] - 10https://gerrit.wikimedia.org/r/945872 (https://phabricator.wikimedia.org/T310822) [15:31:40] (03CR) 10CI reject: [V:04-1] prometheus: puppetise sql_exporter [puppet] - 10https://gerrit.wikimedia.org/r/945872 (https://phabricator.wikimedia.org/T310822) (owner: 10AOkoth) [15:33:32] (03PS65) 10AOkoth: prometheus: puppetise sql_exporter [puppet] - 
10https://gerrit.wikimedia.org/r/945872 (https://phabricator.wikimedia.org/T310822) [15:37:54] (03PS1) 10Btullis: Update the email address for data-engineering-alerts [puppet] - 10https://gerrit.wikimedia.org/r/1030172 (https://phabricator.wikimedia.org/T364632) [15:38:14] (03CR) 10CI reject: [V:04-1] Update the email address for data-engineering-alerts [puppet] - 10https://gerrit.wikimedia.org/r/1030172 (https://phabricator.wikimedia.org/T364632) (owner: 10Btullis) [15:40:06] (03PS66) 10AOkoth: prometheus: puppetise sql_exporter [puppet] - 10https://gerrit.wikimedia.org/r/945872 (https://phabricator.wikimedia.org/T310822) [15:40:16] (03CR) 10Btullis: [V:03+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/2406/co" [puppet] - 10https://gerrit.wikimedia.org/r/1030172 (https://phabricator.wikimedia.org/T364632) (owner: 10Btullis) [15:41:28] (03CR) 10Btullis: [V:03+1] "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/1030172 (https://phabricator.wikimedia.org/T364632) (owner: 10Btullis) [15:44:24] (03PS67) 10AOkoth: prometheus: puppetise sql_exporter [puppet] - 10https://gerrit.wikimedia.org/r/945872 (https://phabricator.wikimedia.org/T310822) [15:45:51] (03PS1) 10Eevans: echostore: update cluster hosts [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030175 [15:50:48] FIRING: Traffic bill over quota: Alert for device cr2-drmrs.wikimedia.org - Traffic bill over quota got acknowledged - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [15:52:51] (03CR) 10Xcollazo: Add the airflow profile to the statistics::explorer role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1029541 (https://phabricator.wikimedia.org/T364542) (owner: 10Btullis) [15:52:59] (03CR) 10JHathaway: [C:03+1] Add alias for full cluster [puppet] - 10https://gerrit.wikimedia.org/r/1030085 (owner: 10Muehlenhoff) [16:00:23] PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is 
CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes [16:03:23] (03CR) 10Btullis: [V:03+1] Add the airflow profile to the statistics::explorer role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1029541 (https://phabricator.wikimedia.org/T364542) (owner: 10Btullis) [16:03:33] (03PS1) 10Bking: lvs: add script to check for L2 connectivity [puppet] - 10https://gerrit.wikimedia.org/r/1030185 (https://phabricator.wikimedia.org/T363702) [16:04:11] (03CR) 10CI reject: [V:04-1] lvs: add script to check for L2 connectivity [puppet] - 10https://gerrit.wikimedia.org/r/1030185 (https://phabricator.wikimedia.org/T363702) (owner: 10Bking) [16:04:27] (03CR) 10Btullis: [C:03+1] "LGTM, thanks." [cookbooks] - 10https://gerrit.wikimedia.org/r/1029712 (owner: 10Ryan Kemper) [16:08:41] FIRING: [2x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:09:39] (03Abandoned) 10Btullis: P:hue absent Icinga monitoring. [puppet] - 10https://gerrit.wikimedia.org/r/995180 (https://phabricator.wikimedia.org/T350694) (owner: 10Slyngshede) [16:10:48] FIRING: [4x] Traffic bill over quota: Alert for device cr1-codfw.wikimedia.org - Traffic bill over quota got acknowledged - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [16:10:52] (03CR) 10Btullis: [C:03+1] "Nice, thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1029121 (https://phabricator.wikimedia.org/T360439) (owner: 10Muehlenhoff) [16:11:09] (03CR) 10Mmartorana: "Does anyone have any last thoughts/concerns on this before we move forward with the merge?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) (owner: 10Mmartorana) [16:11:36] (03CR) 10Btullis: [C:03+1] "LGTM, thanks." [cookbooks] - 10https://gerrit.wikimedia.org/r/1028521 (https://phabricator.wikimedia.org/T363975) (owner: 10Brouberol) [16:12:11] (03CR) 10Btullis: [C:03+1] "LGTM, thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1028526 (owner: 10Muehlenhoff) [16:12:23] (03PS2) 10Bking: lvs: add script to check for L2 connectivity [puppet] - 10https://gerrit.wikimedia.org/r/1030185 (https://phabricator.wikimedia.org/T363702) [16:12:39] !log cmooney@cumin1002 START - Cookbook sre.dns.netbox [16:12:46] (03CR) 10Btullis: [C:03+1] Remove obsolete Hiera settings to allow dropping Python 2 [puppet] - 10https://gerrit.wikimedia.org/r/1028764 (https://phabricator.wikimedia.org/T316876) (owner: 10Muehlenhoff) [16:14:53] !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns records for new codfw row c and d networks - cmooney@cumin1002" [16:16:13] !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns records for new codfw row c and d networks - cmooney@cumin1002" [16:16:13] !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [16:16:22] (03CR) 10Cathal Mooney: "recheck" [dns] - 10https://gerrit.wikimedia.org/r/1030115 (https://phabricator.wikimedia.org/T364095) (owner: 10Cathal Mooney) [16:17:31] (03CR) 10Cathal Mooney: [C:03+2] Add reverse DNS entries for new codfw private subnets [dns] - 10https://gerrit.wikimedia.org/r/1030115 (https://phabricator.wikimedia.org/T364095) (owner: 10Cathal Mooney) [16:18:10] (03CR) 10Xcollazo: Add the airflow profile to the statistics::explorer role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1029541 
(https://phabricator.wikimedia.org/T364542) (owner: 10Btullis) [16:18:24] (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1030185 (https://phabricator.wikimedia.org/T363702) (owner: 10Bking) [16:19:30] (03PS4) 10Scott French: benthos: adopt securityContext and base.external-services-networkpolicy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1028910 (https://phabricator.wikimedia.org/T359423) [16:20:48] FIRING: [4x] Traffic bill over quota: Alert for device cr1-codfw.wikimedia.org - Traffic bill over quota got acknowledged - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [16:22:25] (03PS1) 10Ayounsi: drmrs: force Free on Arelion [homer/public] - 10https://gerrit.wikimedia.org/r/1030188 [16:23:25] (03CR) 10Cathal Mooney: [C:03+1] "LGTM!" [homer/public] - 10https://gerrit.wikimedia.org/r/1030188 (owner: 10Ayounsi) [16:23:47] (03PS3) 10Bking: lvs: add script to check for L2 connectivity [puppet] - 10https://gerrit.wikimedia.org/r/1030185 (https://phabricator.wikimedia.org/T363702) [16:30:48] FIRING: [4x] Traffic bill over quota: Alert for device cr1-codfw.wikimedia.org - Traffic bill over quota got acknowledged - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [16:35:23] RECOVERY - Uncommitted DNS changes in Netbox on netbox1002 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes [16:36:06] (03PS4) 10Bking: lvs: add script to check for L2 connectivity [puppet] - 10https://gerrit.wikimedia.org/r/1030185 (https://phabricator.wikimedia.org/T363702) [16:37:02] (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1030185 (https://phabricator.wikimedia.org/T363702) (owner: 10Bking) [16:40:48] RESOLVED: Traffic bill over quota: Alert for device cr2-eqsin.wikimedia.org - Traffic bill over quota got acknowledged - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [16:40:59] 
(03PS1) 10Btullis: Move some of the data-engineering alerts to data-platform [alerts] - 10https://gerrit.wikimedia.org/r/1030189 (https://phabricator.wikimedia.org/T346438) [16:45:06] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 17451 [16:46:13] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17451 [16:47:05] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 9269 [16:48:27] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9269 [16:51:19] (03PS1) 10Scott French: changeprop: add securityContext to all containers [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030190 (https://phabricator.wikimedia.org/T362978) [16:51:52] (03PS1) 10Elukey: Revert "ml-services: add nllb-gpu to ml-staging" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030044 [16:52:39] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'configure' for AS: 15830 [16:53:16] (03CR) 10BBlack: [C:03+2] admin: add user Yu-Ming Liou (analytics-privatedata, no shell) [puppet] - 10https://gerrit.wikimedia.org/r/1030168 (https://phabricator.wikimedia.org/T363514) (owner: 10Dzahn) [16:53:56] !log ayounsi@cumin1002 END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 15830 [16:55:06] (03PS1) 10Scott French: citoid: add securityContext to all containers [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030191 (https://phabricator.wikimedia.org/T346638) [16:55:42] (03CR) 10Elukey: [C:03+2] Revert "ml-services: add nllb-gpu to ml-staging" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030044 (owner: 10Elukey) [16:57:07] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for YLiou_WMF (no server access) - https://phabricator.wikimedia.org/T363514#9786548 (10BBlack) [16:57:49] (03PS1) 
10Scott French: cxserver: add securityContext to all containers [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030195 (https://phabricator.wikimedia.org/T362978) [16:58:03] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 26073 [16:58:40] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26073 [16:58:46] !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 19165 [16:59:13] !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19165 [17:00:06] !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . [17:04:03] (03PS1) 10JHathaway: postfix: change acme chief cert order for Postfix [puppet] - 10https://gerrit.wikimedia.org/r/1030199 (https://phabricator.wikimedia.org/T325395) [17:06:10] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for YLiou_WMF (no server access) - https://phabricator.wikimedia.org/T363514#9786554 (10BBlack) 05In progress→03Resolved a:03BBlack Should be all set, may take up to ~30 minutes for changes to propagate. [17:06:58] 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9786574 (10cmooney) @papaul I've added all the links for the new switches in Netbox now: https://netbox.wikimedia.org/dcim/cables/?color=&device_id=5234&de... 
[17:07:26] !log cmooney@cumin1002 START - Cookbook sre.dns.netbox [17:07:59] (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1030199 (https://phabricator.wikimedia.org/T325395) (owner: 10JHathaway) [17:09:35] (03CR) 10JHathaway: [C:03+2] postfix: change acme chief cert order for Postfix [puppet] - 10https://gerrit.wikimedia.org/r/1030199 (https://phabricator.wikimedia.org/T325395) (owner: 10JHathaway) [17:12:15] (03PS1) 10Kimberly Sarabia: Deploy disabled limited width on main page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1030200 (https://phabricator.wikimedia.org/T357706) [17:13:35] !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for spine to spine links codfw - cmooney@cumin1002" [17:14:30] FIRING: [2x] ProbeDown: Service wdqs1021:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1021:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [17:19:30] RESOLVED: [2x] ProbeDown: Service wdqs1021:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1021:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [17:34:29] (03PS1) 10CDanis: Service mesh: rename local_service cluster (copy patch) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030220 [17:34:29] (03PS1) 10CDanis: Service mesh: rename local_service cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030221 (https://phabricator.wikimedia.org/T363407) [17:36:50] (03PS1) 10Herron: pyrra: trafficserver: onboard slo from grizzly [puppet] - 10https://gerrit.wikimedia.org/r/1030227 
(https://phabricator.wikimedia.org/T302995) [17:40:24] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance [17:40:37] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance [17:40:47] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2127 (T352010)', diff saved to https://phabricator.wikimedia.org/P62275 and previous config saved to /var/cache/conftool/dbconfig/20240510-174044-ladsgroup.json [17:40:50] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010 [17:42:55] (03CR) 10Ryan Kemper: [C:03+2] sre.kafka.roll-restart-reboot-brokers: fix usage [cookbooks] - 10https://gerrit.wikimedia.org/r/1029712 (owner: 10Ryan Kemper) [17:46:07] (03PS2) 10Herron: pyrra: trafficserver: onboard slo from grizzly [puppet] - 10https://gerrit.wikimedia.org/r/1030227 (https://phabricator.wikimedia.org/T302995) [18:22:45] !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for spine to spine links codfw - cmooney@cumin1002" [18:22:45] !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:23:03] (03CR) 10Cwhite: Migrate `wmfstatic` metrics to Prometheus store (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) (owner: 10Andrea Denisse) [18:27:54] (03CR) 10Andrea Denisse: Migrate `wmfstatic` metrics to Prometheus store (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) (owner: 10Andrea Denisse) [18:29:20] (03PS6) 10Andrea Denisse: Migrate `wmfstatic` metrics to Prometheus store [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) [18:31:20] (03CR) 10Andrea 
Denisse: Migrate `wmfstatic` metrics to Prometheus store (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029664 (https://phabricator.wikimedia.org/T359255) (owner: 10Andrea Denisse) [18:31:58] (03PS2) 10CDanis: Service mesh: rename local_service cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030221 (https://phabricator.wikimedia.org/T363407) [18:41:14] !log fab@deploy1002 Started deploy [airflow-dags/research@75163c7]: (no justification provided) [18:41:46] !log fab@deploy1002 Finished deploy [airflow-dags/research@75163c7]: (no justification provided) (duration: 00m 32s) [19:06:44] (03PS5) 10Bking: lvs: add script to check for L2 connectivity [puppet] - 10https://gerrit.wikimedia.org/r/1030185 (https://phabricator.wikimedia.org/T363702) [19:13:11] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:31:22] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for ecarg/Grace Choi - https://phabricator.wikimedia.org/T364414#9786937 (10Dzahn) [19:31:35] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for ecarg/Grace Choi - https://phabricator.wikimedia.org/T364414#9786939 (10Dzahn) 05Open→03In progress [19:34:03] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for ecarg/Grace Choi - https://phabricator.wikimedia.org/T364414#9786948 (10Dzahn) [19:35:11] (03PS1) 10CDanis: Parse OTel service names from what's available. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030290 (https://phabricator.wikimedia.org/T363407) [19:36:06] 06SRE, 10SRE-Access-Requests: Requesting access to deployment for ecarg/Grace Choi - https://phabricator.wikimedia.org/T364414#9786955 (10Dzahn) a:03thcipriani Thanks for manager approval. 
I will upload a patch and assigning to the group approver for consideration :) [19:37:01] (03PS1) 10Dzahn: admin: add Grace Choi to deployers [puppet] - 10https://gerrit.wikimedia.org/r/1030291 (https://phabricator.wikimedia.org/T364414) [19:40:13] (03CR) 10Dzahn: "https://www.mediawiki.org/wiki/Abstract_Wikipedia_team" [puppet] - 10https://gerrit.wikimedia.org/r/1030291 (https://phabricator.wikimedia.org/T364414) (owner: 10Dzahn) [19:42:11] (03CR) 10CDanis: "Reviewers: I'm sorry." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030290 (https://phabricator.wikimedia.org/T363407) (owner: 10CDanis) [19:50:42] (03PS3) 10Dzahn: admin: create group fr-tech-devs, apply to role crm - WIP [puppet] - 10https://gerrit.wikimedia.org/r/1027052 (https://phabricator.wikimedia.org/T364494) [19:51:08] (03PS4) 10Dzahn: admin: create group fr-tech-devs, apply to role crm [puppet] - 10https://gerrit.wikimedia.org/r/1027052 (https://phabricator.wikimedia.org/T364494) [20:08:41] FIRING: [2x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [20:12:33] (03CR) 10Gergő Tisza: [C:03+1] Use encrypted Argon2 Hashes to store user passwords [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029183 (https://phabricator.wikimedia.org/T150647) (owner: 10Zabe) [20:15:50] (03PS1) 10Santiago Faci: mpic-next: New release for staging environment with some fixes: v0.0.10 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030331 (https://phabricator.wikimedia.org/T360734) [20:17:11] (03CR) 10Santiago Faci: [C:03+2] mpic-next: New release for staging environment with some fixes: v0.0.10 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030331 (https://phabricator.wikimedia.org/T360734) (owner: 10Santiago Faci) [20:18:04] (03Merged) 10jenkins-bot: mpic-next: New release 
for staging environment with some fixes: v0.0.10 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1030331 (https://phabricator.wikimedia.org/T360734) (owner: 10Santiago Faci) [20:19:42] !log sfaci@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply [20:19:58] !log sfaci@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply [21:03:26] FIRING: [3x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:05:32] 06SRE, 06collaboration-services, 06Release-Engineering-Team, 06serviceops: replace production buster deployment servers - https://phabricator.wikimedia.org/T364656 (10Dzahn) 03NEW [21:06:33] 06SRE, 06collaboration-services, 06Release-Engineering-Team, 06serviceops: replace production buster deployment servers - https://phabricator.wikimedia.org/T364656#9787215 (10Dzahn) [21:06:48] (03PS1) 10Ryan Kemper: kafka/roll-restart-mirror-maker: update usage w/ new name [cookbooks] - 10https://gerrit.wikimedia.org/r/1030347 [21:08:33] !log ryankemper@cumin2002 START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. 
[21:08:36] 06SRE, 06collaboration-services, 06serviceops: add bullseye support to deployment server puppet role - upgrade deployment server in devtools - https://phabricator.wikimedia.org/T363415#9787220 (10Dzahn) [21:08:37] 06SRE, 06collaboration-services, 06Release-Engineering-Team, 06serviceops: replace production buster deployment servers - https://phabricator.wikimedia.org/T364656#9787221 (10Dzahn) [21:08:55] PROBLEM - Check unit status of httpbb_hourly_appserver on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_hourly_appserver https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [21:09:11] 06SRE, 06collaboration-services, 06Release-Engineering-Team, 06serviceops: replace production buster deployment servers - https://phabricator.wikimedia.org/T364656#9787224 (10Dzahn) [21:09:15] 06SRE, 06Infrastructure-Foundations, 07Epic: Tracking task for Bullseye migrations in production - https://phabricator.wikimedia.org/T291916#9787223 (10Dzahn) [21:12:00] 06SRE, 06collaboration-services, 06serviceops: add bullseye support to deployment server puppet role - upgrade deployment server in devtools - https://phabricator.wikimedia.org/T363415#9787231 (10Dzahn) [21:12:12] 06SRE, 06collaboration-services, 06serviceops: add bullseye support to deployment server puppet role - upgrade deployment server in devtools - https://phabricator.wikimedia.org/T363415#9787227 (10Dzahn) T364656 will be about upgrading / replacing the production deployment servers. This ticket is about prep... [21:19:04] !log ryankemper@cumin2002 END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. 
[21:20:51] (03CR) 10Bking: [C:03+1] kafka/roll-restart-mirror-maker: update usage w/ new name [cookbooks] - 10https://gerrit.wikimedia.org/r/1030347 (owner: 10Ryan Kemper) [21:21:30] (03CR) 10Ryan Kemper: [C:03+2] kafka/roll-restart-mirror-maker: update usage w/ new name [cookbooks] - 10https://gerrit.wikimedia.org/r/1030347 (owner: 10Ryan Kemper) [21:22:47] 06SRE, 06collaboration-services, 06serviceops: add bullseye support to deployment server puppet role - upgrade deployment server in devtools - https://phabricator.wikimedia.org/T363415#9787245 (10Dzahn) 05In progress→03Resolved [21:23:53] 06SRE, 06collaboration-services, 06serviceops: add bullseye support to deployment server puppet role - upgrade deployment server in devtools - https://phabricator.wikimedia.org/T363415#9787242 (10Dzahn) >>! In T363415#9778509, @Dzahn wrote: > This still needs https://gerrit.wikimedia.org/r/c/operations/p... [21:58:55] RECOVERY - Check unit status of httpbb_hourly_appserver on cumin1002 is OK: OK: Status of the systemd unit httpbb_hourly_appserver https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [22:03:26] FIRING: [3x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:03:40] (03PS1) 10Bartosz Dziewoński: ArticleTarget: Fix return of getVisualDiffGeneratorPromise [extensions/VisualEditor] (wmf/1.43.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1030308 (https://phabricator.wikimedia.org/T364635) [22:04:29] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 180 probes of 734 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [22:06:42] (03CR) 10Bartosz Dziewoński: "Scheduled for 
deployment on Monday: https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240513T1300" [core] (wmf/1.43.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1029564 (https://phabricator.wikimedia.org/T364554) (owner: 10Umherirrender) [22:06:51] (03CR) 10Bartosz Dziewoński: "Scheduled for deployment on Monday: https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240513T1300" [extensions/VisualEditor] (wmf/1.43.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1030308 (https://phabricator.wikimedia.org/T364635) (owner: 10Bartosz Dziewoński) [22:07:00] (03CR) 10Bartosz Dziewoński: [C:03+1] "Scheduled for deployment on Monday: https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240513T1300" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1029187 (owner: 10Bartosz Dziewoński) [22:09:27] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 37 probes of 734 (alerts on 90) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [23:13:11] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:33:11] (03CR) 10Dwisehaupt: [C:03+1] admin: create group fr-tech-devs, apply to role crm [puppet] - 10https://gerrit.wikimedia.org/r/1027052 (https://phabricator.wikimedia.org/T364494) (owner: 10Dzahn) [23:38:37] (03PS1) 10Scott French: WIP: Support reading parser cache servers from etcd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1030440 [23:38:45] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1028945 [23:38:45] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] 
(wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1028945 (owner: 10TrainBranchBot) [23:40:20] (03CR) 10CI reject: [V:04-1] WIP: Support reading parser cache servers from etcd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1030440 (owner: 10Scott French) [23:42:15] (03PS2) 10Scott French: WIP: Support reading parser cache servers from etcd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1030440 [23:43:21] (03CR) 10CI reject: [V:04-1] WIP: Support reading parser cache servers from etcd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1030440 (owner: 10Scott French) [23:47:54] (03PS3) 10Scott French: WIP: Support reading parser cache servers from etcd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1030440