[00:04:05] PROBLEM - Check systemd state on aphlict2001 is CRITICAL: CRITICAL - degraded: The following units failed: logrotate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:10:01] (CR) Cwhite: thanos: add bucket query tools (1 comment) [puppet] - https://gerrit.wikimedia.org/r/983752 (https://phabricator.wikimedia.org/T351927) (owner: Filippo Giunchedi)
[00:12:23] (CR) Cwhite: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/981407 (https://phabricator.wikimedia.org/T333615) (owner: Andrea Denisse)
[00:18:37] PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hardsync-published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:24:04] !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage
[00:27:41] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage
[00:30:15] RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:34:49] PROBLEM - Check systemd state on restbase2029 is CRITICAL: CRITICAL - degraded: The following units failed: cassandra-c.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:34:49] PROBLEM - cassandra-c SSL 10.192.16.242:7000 on restbase2029 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[00:35:09] PROBLEM - cassandra-c service on restbase2029 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:35:21] PROBLEM - cassandra-c CQL 10.192.16.242:9042 on restbase2029 is CRITICAL: connect to address 10.192.16.242 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[00:38:29] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/984233
[00:38:32] (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/984233 (owner: TrainBranchBot)
[00:54:59] RECOVERY - cassandra-c service on restbase2029 is OK: OK - cassandra-c is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:56:03] RECOVERY - Check systemd state on restbase2029 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:56:35] RECOVERY - cassandra-c CQL 10.192.16.242:9042 on restbase2029 is OK: TCP OK - 0.037 second response time on 10.192.16.242 port 9042 https://phabricator.wikimedia.org/T93886
[00:57:27] RECOVERY - cassandra-c SSL 10.192.16.242:7000 on restbase2029 is OK: SSL OK - Certificate restbase2029-c valid until 2025-12-05 16:11:15 +0000 (expires in 715 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[01:00:29] RECOVERY - Check systemd state on aphlict2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:00:37] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/984233 (owner: TrainBranchBot)
[01:11:55] (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:12:09] !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[01:13:18] !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[01:13:20] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2075.codfw.wmnet with OS bullseye
[01:13:27] SRE, SRE-swift-storage, ops-codfw, DC-Ops, Data-Persistence: Q2:rack/setup/install ms-be refresh - https://phabricator.wikimedia.org/T349839 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host ms-be2075.codfw.wmnet with OS bullseye completed: - ms-...
[01:15:29] SRE, SRE-swift-storage, ops-codfw, DC-Ops, Data-Persistence: Q2:rack/setup/install ms-be refresh - https://phabricator.wikimedia.org/T349839 (Jhancock.wm) Open→Resolved
[01:16:27] SRE, SRE-swift-storage, ops-codfw, DC-Ops, Data-Persistence: Q2:rack/setup/install ms-be refresh - https://phabricator.wikimedia.org/T349839 (Jhancock.wm) @MatthewVernon apologies for the wait on this one. all good now.
[01:42:03] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:42:51] PROBLEM - Check systemd state on restbase2028 is CRITICAL: CRITICAL - degraded: The following units failed: cassandra-b.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:43:29] PROBLEM - cassandra-b CQL 10.192.16.238:9042 on restbase2028 is CRITICAL: connect to address 10.192.16.238 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[01:43:31] PROBLEM - cassandra-b SSL 10.192.16.238:7000 on restbase2028 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[01:44:01] PROBLEM - cassandra-b service on restbase2028 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:45:29] RECOVERY - cassandra-b service on restbase2028 is OK: OK - cassandra-b is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:45:49] RECOVERY - Check systemd state on restbase2028 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:46:29] RECOVERY - cassandra-b CQL 10.192.16.238:9042 on restbase2028 is OK: TCP OK - 0.093 second response time on 10.192.16.238 port 9042 https://phabricator.wikimedia.org/T93886
[01:46:31] RECOVERY - cassandra-b SSL 10.192.16.238:7000 on restbase2028 is OK: SSL OK - Certificate restbase2028-b valid until 2025-12-03 21:33:01 +0000 (expires in 713 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[02:14:59] SRE-swift-storage, Data-Persistence, media-backups: Missing original File:Ignatyevo.jpg - https://phabricator.wikimedia.org/T353797 (Bugreporter) See also: {T182822}
[02:17:05] PROBLEM - Host wdqs1006 is DOWN: PING CRITICAL - Packet loss = 100%
[02:17:05] PROBLEM - Host wdqs1007 is DOWN: PING CRITICAL - Packet loss = 100%
[02:17:37] PROBLEM - Host wdqs1008 is DOWN: PING CRITICAL - Packet loss = 100%
[02:36:55] (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:08:37] (JobUnavailable) firing: (3) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:09:16] (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 6.054767537499307s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[03:37:41] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:39:01] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.346 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:14:00] (SwaggerProbeHasFailures) firing: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://cxserver.svc.eqiad.wmnet:4002 - https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[04:18:59] (SwaggerProbeHasFailures) resolved: Not all openapi/swagger endpoints returned healthy - https://wikitech.wikimedia.org/wiki/Runbook#https://cxserver.svc.eqiad.wmnet:4002 - https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?var-site=eqiad - https://alerts.wikimedia.org/?q=alertname%3DSwaggerProbeHasFailures
[04:52:45] (SystemdUnitFailed) firing: confd_prometheus_metrics.service Failed on elastic2088:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:57:45] (SystemdUnitFailed) resolved: confd_prometheus_metrics.service Failed on elastic2088:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:13:41] (PS1) KartikMistry: Update MinT to 2023-12-20-071058-production [deployment-charts] - https://gerrit.wikimedia.org/r/984661
[05:16:29] (CR) KartikMistry: [C: +2] Update MinT to 2023-12-20-071058-production [deployment-charts] - https://gerrit.wikimedia.org/r/984661 (owner: KartikMistry)
[05:17:14] * kart_ deploying MinT..
[05:17:25] (Merged) jenkins-bot: Update MinT to 2023-12-20-071058-production [deployment-charts] - https://gerrit.wikimedia.org/r/984661 (owner: KartikMistry)
[05:22:42] (SystemdUnitFailed) firing: (2) confd_prometheus_metrics.service Failed on elastic2088:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:26:06] !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/machinetranslation: apply
[05:27:42] (SystemdUnitFailed) firing: (5) confd_prometheus_metrics.service Failed on elastic2088:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:29:10] !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
[05:32:42] (SystemdUnitFailed) resolved: (5) confd_prometheus_metrics.service Failed on elastic2088:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:35:11] !log kartik@deploy2002 helmfile [codfw] START helmfile.d/services/machinetranslation: apply
[05:40:08] !log kartik@deploy2002 helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
[05:41:09] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[05:42:48] !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
[05:46:09] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[05:50:49] !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
[05:51:09] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[05:56:07] !log Updated MinT to 2023-12-20-071058-production
[05:56:09] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[05:56:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:57:51] (PS1) Marostegui: installserver: Do not format db1244 [puppet] - https://gerrit.wikimedia.org/r/984662
[06:00:57] (CR) Marostegui: [C: +2] installserver: Do not format db1244 [puppet] - https://gerrit.wikimedia.org/r/984662 (owner: Marostegui)
[06:08:35] (JobUnavailable) resolved: (2) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:12:06] (PS1) Marostegui: installserver: Do not format dbstore1008 [puppet] - https://gerrit.wikimedia.org/r/984664
[06:14:16] (MediaWikiLatencyExceeded) resolved: Average latency high: codfw parsoid GET/200: 4.311446422527643s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:14:46] (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 2.8001064065073784s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:15:30] (CR) Marostegui: [C: +2] installserver: Do not format dbstore1008 [puppet] - https://gerrit.wikimedia.org/r/984664 (owner: Marostegui)
[06:19:46] (MediaWikiLatencyExceeded) resolved: Average latency high: codfw parsoid GET/200: 4.232766944023054s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:28:16] (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 6.152465991498105s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:31:45] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 126, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:31:59] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 225, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:40:34] (PS1) Houseblaster: InitialiseSettings.php: Allow thanking bots [mediawiki-config] - https://gerrit.wikimedia.org/r/984665 (https://phabricator.wikimedia.org/T341388)
[06:51:58] (Abandoned) Houseblaster: InitialiseSettings.php: Allow thanking bots [mediawiki-config] - https://gerrit.wikimedia.org/r/984665 (https://phabricator.wikimedia.org/T341388) (owner: Houseblaster)
[07:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T0700)
[07:00:05] kormat, marostegui, and Amir1: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Primary database switchover deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T0700).
[07:02:40] (PS1) Ilias Sarantopoulos: ml-services: upgrade readability to kserve 0.11.2 [deployment-charts] - https://gerrit.wikimedia.org/r/984666 (https://phabricator.wikimedia.org/T348664)
[07:16:20] (CR) Slyngshede: [C: +2] Debian packaging configuration [software/debmonitor-client] (debian) - https://gerrit.wikimedia.org/r/982391 (owner: Slyngshede)
[07:30:44] (CR) Kevin Bazira: [C: +1] ml-services: upgrade readability to kserve 0.11.2 [deployment-charts] - https://gerrit.wikimedia.org/r/984666 (https://phabricator.wikimedia.org/T348664) (owner: Ilias Sarantopoulos)
[07:40:49] (CR) Ilias Sarantopoulos: [C: +2] ml-services: upgrade readability to kserve 0.11.2 [deployment-charts] - https://gerrit.wikimedia.org/r/984666 (https://phabricator.wikimedia.org/T348664) (owner: Ilias Sarantopoulos)
[07:41:42] (Merged) jenkins-bot: ml-services: upgrade readability to kserve 0.11.2 [deployment-charts] - https://gerrit.wikimedia.org/r/984666 (https://phabricator.wikimedia.org/T348664) (owner: Ilias Sarantopoulos)
[07:53:16] (MediaWikiLatencyExceeded) resolved: Average latency high: codfw parsoid GET/200: 4.471287082580525s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[07:54:46] (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 6.661585918380473s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[08:00:05] Amir1, apergos, and jnuche: OwO what's this, a deployment window?? UTC morning backport and config training. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T0800). nyaa~
[08:00:05] MatmaRex: A patch you scheduled for UTC morning backport and config training is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:15] huh
[08:00:34] let me see if there are any trainees signed up
[08:01:25] hi
[08:01:51] morning! no trainees signed up, you're the only one with patches to deploy. you sign up to get trained yet? :-P :-P
[08:02:37] heh :D no, not yet
[08:02:55] looks like I'll be your deployer of the day then.
[08:04:07] I'll do the patches in the order you've listed them.
[08:05:43] thanks
[08:06:23] (CR) TrainBranchBot: [C: +2] "Approved by ariel@deploy2002 using scap backport" [extensions/DiscussionTools] (wmf/1.42.0-wmf.9) - https://gerrit.wikimedia.org/r/984495 (https://phabricator.wikimedia.org/T353489) (owner: Bartosz Dziewoński)
[08:07:59] the patches can be tested on https://test.wikipedia.org/wiki/Wikipedia:Heading_with_styles and https://de.wikipedia.org/wiki/Wikipedia:Wiki_Loves_Monuments_2011 (wmf.10 and wmf.9), they should change how the headings look
[08:09:52] (CR) Bartosz Dziewoński: "Thanks!" [puppet] - https://gerrit.wikimedia.org/r/982236 (https://phabricator.wikimedia.org/T352265) (owner: Bartosz Dziewoński)
[08:13:02] (Merged) jenkins-bot: CommentFormatter: Do not add wrapper if the heading has attributes [extensions/DiscussionTools] (wmf/1.42.0-wmf.9) - https://gerrit.wikimedia.org/r/984495 (https://phabricator.wikimedia.org/T353489) (owner: Bartosz Dziewoński)
[08:14:23] !log ariel@deploy2002 Started scap: Backport for [[gerrit:984495|CommentFormatter: Do not add wrapper if the heading has attributes (T353489)]]
[08:14:28] T353489: DiscussionTools's wrapper (and future core wrapper) makes it difficult to style headings in wikitext - https://phabricator.wikimedia.org/T353489
[08:16:07] !log ariel@deploy2002 matmarex and ariel: Backport for [[gerrit:984495|CommentFormatter: Do not add wrapper if the heading has attributes (T353489)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:16:53] thanks, testing
[08:16:59] great!
[08:17:06] SRE, Infrastructure-Foundations: Integrate Bookworm 12.3/12.4 point update - https://phabricator.wikimedia.org/T353057 (MoritzMuehlenhoff)
[08:18:48] apergos: looks good
[08:19:17] proceeding
[08:19:20] !log ariel@deploy2002 matmarex and ariel: Continuing with sync
[08:24:05] (PS1) Muehlenhoff: doc: Switch rsync services to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/984800
[08:25:30] !log ariel@deploy2002 Finished scap: Backport for [[gerrit:984495|CommentFormatter: Do not add wrapper if the heading has attributes (T353489)]] (duration: 11m 07s)
[08:25:35] T353489: DiscussionTools's wrapper (and future core wrapper) makes it difficult to style headings in wikitext - https://phabricator.wikimedia.org/T353489
[08:25:45] you patch is now live on the cluster, please test it there
[08:25:52] *your
[08:26:01] MatmaRex:
[08:27:34] apergos: still looks good
[08:27:46] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984800 (owner: Muehlenhoff)
[08:28:04] watching errors for a couple minutes, then we'll move on
[08:30:20] (CR) TrainBranchBot: [C: +2] "Approved by ariel@deploy2002 using scap backport" [extensions/DiscussionTools] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984496 (https://phabricator.wikimedia.org/T353489) (owner: Bartosz Dziewoński)
[08:37:03] (Merged) jenkins-bot: CommentFormatter: Do not add wrapper if the heading has attributes [extensions/DiscussionTools] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984496 (https://phabricator.wikimedia.org/T353489) (owner: Bartosz Dziewoński)
[08:37:32] !log ariel@deploy2002 Started scap: Backport for [[gerrit:984496|CommentFormatter: Do not add wrapper if the heading has attributes (T353489)]]
[08:37:37] T353489: DiscussionTools's wrapper (and future core wrapper) makes it difficult to style headings in wikitext - https://phabricator.wikimedia.org/T353489
[08:39:02] !log ariel@deploy2002 ariel and matmarex: Backport for [[gerrit:984496|CommentFormatter: Do not add wrapper if the heading has attributes (T353489)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:39:57] MatmaRex: please test your change on any debug server.
[08:40:11] apergos: also looking good on wmf.10
[08:44:17] !log ariel@deploy2002 ariel and matmarex: Continuing with sync
[08:44:55] plesae excuse the delay, I was fiddling with the filters on the mw debug errors dashboard to make sure I saw any that might be there, it was overrun with debug/info messages
[08:50:12] !log ariel@deploy2002 Finished scap: Backport for [[gerrit:984496|CommentFormatter: Do not add wrapper if the heading has attributes (T353489)]] (duration: 12m 39s)
[08:50:16] T353489: DiscussionTools's wrapper (and future core wrapper) makes it difficult to style headings in wikitext - https://phabricator.wikimedia.org/T353489
[08:50:46] MatmaRex: please test your patch on the production cluster
[08:51:30] apergos: looks good
[08:51:47] going to watch logstash for a couple minutes before we call it done
[08:52:54] (PS1) Kevin Bazira: ml-services: bump CPUs to improve model-server performance [deployment-charts] - https://gerrit.wikimedia.org/r/984236 (https://phabricator.wikimedia.org/T353127)
[08:54:34] seems ok, that's it for today, thanks for showing up, see you next time everybody!
[08:54:44] (CR) Elukey: [C: +1] ml-services: bump CPUs to improve model-server performance [deployment-charts] - https://gerrit.wikimedia.org/r/984236 (https://phabricator.wikimedia.org/T353127) (owner: Kevin Bazira)
[08:54:50] !log UTC morning backport and config window done
[08:54:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:33] thanks
[08:56:12] (CR) Kevin Bazira: [C: +2] "Thanks for the review :)" [deployment-charts] - https://gerrit.wikimedia.org/r/984236 (https://phabricator.wikimedia.org/T353127) (owner: Kevin Bazira)
[08:57:07] (Merged) jenkins-bot: ml-services: bump CPUs to improve model-server performance [deployment-charts] - https://gerrit.wikimedia.org/r/984236 (https://phabricator.wikimedia.org/T353127) (owner: Kevin Bazira)
[08:59:38] !log kevinbazira@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
[09:24:22] jouncebot: nowandnext
[09:24:22] No deployments scheduled for the next 1 hour(s) and 35 minute(s)
[09:24:22] In 1 hour(s) and 35 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T1100)
[09:24:22] In 1 hour(s) and 35 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T1100)
[09:24:31] (PS5) Urbanecm: Temporary users: set notifyBeforeExpirationDays to ten days [mediawiki-config] - https://gerrit.wikimedia.org/r/983755 (https://phabricator.wikimedia.org/T344694) (owner: Sergio Gimeno)
[09:24:35] (CR) Urbanecm: [C: +2] "labs only" [mediawiki-config] - https://gerrit.wikimedia.org/r/983755 (https://phabricator.wikimedia.org/T344694) (owner: Sergio Gimeno)
[09:24:59] (CR) TrainBranchBot: [C: +2] "Approved by urbanecm@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/983755 (https://phabricator.wikimedia.org/T344694) (owner: Sergio Gimeno)
[09:25:23] (Merged) jenkins-bot: Temporary users: set notifyBeforeExpirationDays to ten days [mediawiki-config] - https://gerrit.wikimedia.org/r/983755 (https://phabricator.wikimedia.org/T344694) (owner: Sergio Gimeno)
[09:30:56] (PS1) Muehlenhoff: statistics::rsyncd: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/984803
[09:33:51] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:35:13] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 127, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:36:00] (CR) Volans: [C: +1] "LGTM with some caveat that I'll leave to @netops. Feel free to test it with https://wikitech.wikimedia.org/wiki/Spicerack/Cookbooks#Test_b" [cookbooks] - https://gerrit.wikimedia.org/r/984642 (https://phabricator.wikimedia.org/T353825) (owner: Cathal Mooney)
[09:40:58] !log ayounsi@cumin1001 START - Cookbook sre.network.configure-switch-interfaces for host wdqs1006
[09:42:28] !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1006
[09:47:40] (PS5) Arnaudb: mysqld-exporter-config: simplify manual runs [puppet] - https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384)
[09:48:01] (CR) Arnaudb: mysqld-exporter-config: simplify manual runs (8 comments) [puppet] - https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) (owner: Arnaudb)
[09:55:56] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984803 (owner: Muehlenhoff)
[10:08:10] (CR) Ayounsi: Add basic validation to Junos config command execution flow (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/984642 (https://phabricator.wikimedia.org/T353825) (owner: Cathal Mooney)
[10:12:09] (PS1) Muehlenhoff: aptrepo:rsync: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/984805
[10:22:12] (CR) JMeybohm: [C: +1] "LGTM but don't do during freeze" [puppet] - https://gerrit.wikimedia.org/r/984645 (https://phabricator.wikimedia.org/T351074) (owner: Kamila Součková)
[10:25:01] SRE-swift-storage: Q3 ms backend refresh work - https://phabricator.wikimedia.org/T353149 (MatthewVernon)
[10:25:54] SRE, SRE-swift-storage, ops-codfw, DC-Ops, Data-Persistence: Q2:rack/setup/install ms-be refresh - https://phabricator.wikimedia.org/T349839 (MatthewVernon) Thanks :)
[10:29:04] (PS3) Alexandros Kosiaris: mediawiki canaries: Include opentelemetry::collector [puppet] - https://gerrit.wikimedia.org/r/983895 (https://phabricator.wikimedia.org/T351566)
[10:29:06] (PS6) Alexandros Kosiaris: tlsproxy::envoy: Allow specifying a percentage to be traced [puppet] - https://gerrit.wikimedia.org/r/983441 (https://phabricator.wikimedia.org/T351566)
[10:29:07] !log isaranto@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
[10:29:21] (CR) Alexandros Kosiaris: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/983441 (https://phabricator.wikimedia.org/T351566) (owner: Alexandros Kosiaris)
[10:29:47] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984805 (owner: Muehlenhoff)
[10:35:59] (PS1) Elukey: admin_ng: allow more CPUs for the ml-serve's experimental namespace [deployment-charts] - https://gerrit.wikimedia.org/r/984807
[10:38:25] SRE-swift-storage, Commons, UploadWizard, Wikimedia-production-error: Uploadwizard sometimes fails "Internal error: Server failed to publish temporary file" - https://phabricator.wikimedia.org/T353871 (MatthewVernon)
[10:39:06] (CR) Elukey: [C: +2] admin_ng: allow more CPUs for the ml-serve's experimental namespace [deployment-charts] - https://gerrit.wikimedia.org/r/984807 (owner: Elukey)
[10:40:39] SRE-swift-storage, Commons, UploadWizard: Several people experiencing 'Internal error: Server failed to store temporary file' when trying to upload a file to Commons - https://phabricator.wikimedia.org/T353498 (MatthewVernon) I've created T353871 to track the later failure mode (since it's something...
[10:41:33] (PS1) Muehlenhoff: profile::prometheus::rsyncd: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/984808
[10:42:32] !log elukey@deploy2002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
[10:42:42] !log elukey@deploy2002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
[10:49:12] (CR) Slyngshede: [C: +1] "LGTM, I would have preferred the variable be named auto_firewall or auto_nftable though." [puppet] - https://gerrit.wikimedia.org/r/984805 (owner: Muehlenhoff)
[10:49:30] (PS7) Alexandros Kosiaris: tlsproxy::envoy: Allow specifying a percentage to be traced [puppet] - https://gerrit.wikimedia.org/r/983441 (https://phabricator.wikimedia.org/T351566)
[10:52:36] (CR) Alexandros Kosiaris: [C: +2] tlsproxy::envoy: Allow specifying a percentage to be traced [puppet] - https://gerrit.wikimedia.org/r/983441 (https://phabricator.wikimedia.org/T351566) (owner: Alexandros Kosiaris)
[10:52:46] (CR) Alexandros Kosiaris: [C: +2] mediawiki canaries: Include opentelemetry::collector [puppet] - https://gerrit.wikimedia.org/r/983895 (https://phabricator.wikimedia.org/T351566) (owner: Alexandros Kosiaris)
[10:53:51] (PS1) Peter Fischer: enable page_rerender for 2nd batch: dewiki, frwiktionary, and kuwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/984810
[10:54:35] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984808 (owner: Muehlenhoff)
[10:56:39] (PS1) Muehlenhoff: phabricator: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/984811
[10:58:08] (CR) Gehel: [C: +1] "LGTM in principles, but let's wait until after the holiday to deploy." [mediawiki-config] - https://gerrit.wikimedia.org/r/984810 (owner: Peter Fischer)
[11:00:04] mvolz: How many deployers does it take to do Services – Citoid / Zotero deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T1100).
[11:00:04] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T1100)
[11:02:15] PROBLEM - Check systemd state on mw1414 is CRITICAL: CRITICAL - degraded: The following units failed: otelcol-contrib.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:03:44] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984811 (owner: Muehlenhoff)
[11:04:39] (PS1) Volans: sre.hosts.decommission: make it idempotent again [cookbooks] - https://gerrit.wikimedia.org/r/984812
[11:05:47] PROBLEM - Check systemd state on mw1449 is CRITICAL: CRITICAL - degraded: The following units failed: otelcol-contrib.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:06:24] (PS1) Majavah: bookworm-sssd: install zip and unzip [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/984813 (https://phabricator.wikimedia.org/T353769)
[11:08:47] PROBLEM - Check systemd state on mw1448 is CRITICAL: CRITICAL - degraded: The following units failed: otelcol-contrib.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:11:00] (PS1) Alexandros Kosiaris: Switch canaries to 1% OpenTelemetry sampling [puppet] - https://gerrit.wikimedia.org/r/984814 (https://phabricator.wikimedia.org/T351566)
[11:12:49] PROBLEM - Check systemd state on mw1416 is CRITICAL: CRITICAL - degraded: The following units failed: otelcol-contrib.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:13:17] !log volans@cumin1002 START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
[11:15:34] (PS1) Muehlenhoff: keystone: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/984815
[11:16:28] !log volans@cumin1002 END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wdqs1006.eqiad.wmnet
[11:17:25] (PS2) Volans: sre.hosts.decommission: make it idempotent again [cookbooks] - https://gerrit.wikimedia.org/r/984812
[11:17:36] (PS1) Alexandros Kosiaris: Provide OpenTelemetry Collector and Port values [puppet] - https://gerrit.wikimedia.org/r/984817 (https://phabricator.wikimedia.org/T351566)
[11:22:39] !log volans@cumin1002 START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
[11:23:11] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984815 (owner: Muehlenhoff)
[11:23:15] PROBLEM - Check systemd state on mw1447 is CRITICAL: CRITICAL - degraded: The following units failed: otelcol-contrib.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:23:39] PROBLEM - Check systemd state on mw1417 is CRITICAL: CRITICAL - degraded: The following units failed: otelcol-contrib.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:23:44] !log volans@cumin1002 END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wdqs1006.eqiad.wmnet
[11:24:01] PROBLEM - Check systemd state on mw1450 is CRITICAL: CRITICAL - degraded: The following units failed: otelcol-contrib.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:28:51] (PS1) Muehlenhoff: statistics::explorer::misc_jobs: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/984818
[11:28:57] RECOVERY - Check systemd state on mw1414 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:32:33] RECOVERY - Check systemd state on mw1449 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:33:09] PROBLEM - cassandra-a SSL 10.192.16.237:7000 on restbase2028 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[11:33:19] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984818 (owner: Muehlenhoff)
[11:33:27] PROBLEM - cassandra-a CQL 10.192.16.237:9042 on restbase2028 is CRITICAL: connect to address 10.192.16.237 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[11:33:35] PROBLEM - Check systemd state on restbase2028 is CRITICAL: CRITICAL - degraded: The following units failed: cassandra-a.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:33:42] PROBLEM - cassandra-a service on restbase2028 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:35:33] RECOVERY - Check systemd state on mw1448 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:36:54] SRE, Infrastructure-Foundations: Integrate Bookworm 12.3/12.4 point update - https://phabricator.wikimedia.org/T353057 (MoritzMuehlenhoff)
[11:37:50] !log Manually restarted cassandra-a service on restbase2028 following OOM - T353456
[11:37:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:38:03] RECOVERY - Check systemd state on restbase2028 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:38:05] T353456: Cassandra/restbase2029-a (and others) oom-killed (kernel) - https://phabricator.wikimedia.org/T353456
[11:38:09] RECOVERY - cassandra-a service on restbase2028 is OK: OK - cassandra-a is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:39:05] RECOVERY - cassandra-a SSL 10.192.16.237:7000 on restbase2028 is OK: SSL OK - Certificate restbase2028-a valid until 2025-12-03 21:32:59 +0000 (expires in 713 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[11:39:23] RECOVERY - cassandra-a CQL 10.192.16.237:9042 on restbase2028 is OK: TCP OK - 0.036 second response time on 10.192.16.237 port 9042 https://phabricator.wikimedia.org/T93886
[11:39:35] RECOVERY - Check systemd state on mw1416 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:43:45] (PS1) Cathal Mooney: Enable Homer on cumin1002 [puppet] - https://gerrit.wikimedia.org/r/984820 (https://phabricator.wikimedia.org/T353419)
[11:45:07] (PS2) Cathal Mooney: Enable Homer on cumin1002 [puppet] - https://gerrit.wikimedia.org/r/984820 (https://phabricator.wikimedia.org/T353419)
[11:46:40] (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/984820 (https://phabricator.wikimedia.org/T353419) (owner: Cathal Mooney)
[11:47:02] (CR) Cathal Mooney: [C: +2] Enable Homer on cumin1002 [puppet] - https://gerrit.wikimedia.org/r/984820 (https://phabricator.wikimedia.org/T353419) (owner: Cathal Mooney)
[11:49:19] RECOVERY - Check systemd state on mw1450 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:50:03] RECOVERY - Check systemd state on mw1447 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:50:27] RECOVERY - Check systemd state on mw1417 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:53:16] (PS1) Bartosz Dziewoński: Ignore "exact match" title when the title is not given [extensions/Linter] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984500 (https://phabricator.wikimedia.org/T353860)
[11:53:45] (PS1) Bartosz Dziewoński: Fix showing units and limits in NewPP limit report [core] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984501 (https://phabricator.wikimedia.org/T353793)
[11:55:01] (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 5.45302603267137s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[11:56:16] (PS1) Cathal Mooney: Change cumin2002 remote peer for Homer private repo [puppet] - https://gerrit.wikimedia.org/r/984821 (https://phabricator.wikimedia.org/T353419)
[11:56:38] (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/984821 (https://phabricator.wikimedia.org/T353419) (owner: Cathal Mooney)
[11:57:00] (CR) Cathal Mooney: [C: +2] Change cumin2002 remote peer for Homer private repo [puppet] - https://gerrit.wikimedia.org/r/984821 (https://phabricator.wikimedia.org/T353419) (owner: Cathal Mooney)
[12:09:54] SRE, Infrastructure-Foundations: Integrate Bookworm 12.3/12.4 point update - https://phabricator.wikimedia.org/T353057 (MoritzMuehlenhoff)
[12:14:26] !log volans@cumin1002 START - Cookbook sre.deploy.python-code homer to cumin1002.eqiad.wmnet with reason: Release v0.6.5 - volans@cumin1002
[12:16:49] (PS1) Alexandros Kosiaris: tlsproxy: Fix the definition of random_sampling [puppet] - https://gerrit.wikimedia.org/r/984825 (https://phabricator.wikimedia.org/T351566)
[12:18:03] !log volans@cumin1002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1002.eqiad.wmnet with reason: Release v0.6.5 - volans@cumin1002
[12:20:14] (PS2) Alexandros Kosiaris: Switch canaries to 1% OpenTelemetry sampling [puppet] - https://gerrit.wikimedia.org/r/984814 (https://phabricator.wikimedia.org/T351566)
[12:20:15] !log volans@cumin1002 START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
[12:20:18] (CR) Alexandros Kosiaris: [C: +2] tlsproxy: Fix the definition of random_sampling [puppet] - https://gerrit.wikimedia.org/r/984825 (https://phabricator.wikimedia.org/T351566) (owner: Alexandros Kosiaris)
[12:23:01] SRE, DC-Ops, Patch-For-Review: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (MoritzMuehlenhoff)
[12:23:14] SRE, DC-Ops, Infrastructure-Foundations: private repo deployment - perccli implementation - https://phabricator.wikimedia.org/T308027 (MoritzMuehlenhoff) Open→Resolved This is resolved, we have the private apt repo for quite a while and perccli is included in there.
[12:27:45] (PS1) Muehlenhoff: ncredir: Select the custom nginx provider with no additional modules [puppet] - https://gerrit.wikimedia.org/r/984836 (https://phabricator.wikimedia.org/T329529)
[12:27:52] !log volans@cumin1002 START - Cookbook sre.dns.netbox
[12:28:06] (CR) Cathal Mooney: Add basic validation to Junos config command execution flow (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/984642 (https://phabricator.wikimedia.org/T353825) (owner: Cathal Mooney)
[12:29:13] !log volans@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:29:14] !log volans@cumin1002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1006.eqiad.wmnet
[12:31:22] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984836 (https://phabricator.wikimedia.org/T329529) (owner: Muehlenhoff)
[12:31:43] (CR) Volans: "tested with https://phabricator.wikimedia.org/T351671#9421058 seems to work fine" [cookbooks] - https://gerrit.wikimedia.org/r/984812 (owner: Volans)
[12:31:58] (CR) Cathal Mooney: Add basic validation to Junos config command execution flow (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/984642 (https://phabricator.wikimedia.org/T353825) (owner: Cathal Mooney)
[12:34:01] (CR) Volans: [C: +1] Add basic validation to Junos config command execution flow (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/984642 (https://phabricator.wikimedia.org/T353825) (owner: Cathal Mooney)
[13:18:28] (CR) Cathal Mooney: Add basic validation to Junos config command execution flow (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/984642 (https://phabricator.wikimedia.org/T353825) (owner: Cathal Mooney)
[13:18:47] (PS5) Slyngshede: Changes to Python infrastucture to help building Debian package. [software/debmonitor] - https://gerrit.wikimedia.org/r/982799
[13:21:46] SRE, Infrastructure-Foundations: Integrate Bookworm 12.3/12.4 point update - https://phabricator.wikimedia.org/T353057 (MoritzMuehlenhoff)
[13:23:34] !log installing libde265 security updates
[13:23:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:11] (PS1) Ilias Sarantopoulos: ml-services: update revertrisk image to solve redirects issues [deployment-charts] - https://gerrit.wikimedia.org/r/984840 (https://phabricator.wikimedia.org/T352958)
[13:34:04] (PS1) Dreamy Jazz: Use username for lookup for non-existing user as the vague target [core] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984502
[13:34:19] (PS1) Dreamy Jazz: Use username for lookup for non-existing user as the vague target [core] (wmf/1.42.0-wmf.9) - https://gerrit.wikimedia.org/r/984503
[13:40:58] SRE, Infrastructure-Foundations: Integrate Bookworm 12.3/12.4 point update - https://phabricator.wikimedia.org/T353057 (MoritzMuehlenhoff)
[13:43:15] (CR) AikoChou: [C: +1] ml-services: update revertrisk image to solve redirects issues [deployment-charts] - https://gerrit.wikimedia.org/r/984840 (https://phabricator.wikimedia.org/T352958) (owner: Ilias Sarantopoulos)
[13:46:10] (CR) Ilias Sarantopoulos: [C: +2] ml-services: update revertrisk image to solve redirects issues [deployment-charts] - https://gerrit.wikimedia.org/r/984840 (https://phabricator.wikimedia.org/T352958) (owner: Ilias Sarantopoulos)
[13:47:06] (Merged) jenkins-bot: ml-services: update revertrisk image to solve redirects issues [deployment-charts] - https://gerrit.wikimedia.org/r/984840 (https://phabricator.wikimedia.org/T352958) (owner: Ilias Sarantopoulos)
[13:50:31] !log isaranto@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[13:52:16] SRE, Infrastructure-Foundations: Integrate Bookworm 12.3/12.4 point update - https://phabricator.wikimedia.org/T353057 (MoritzMuehlenhoff)
[13:55:24] (PS3) Anzx: uzwikipedia: add a temporary logo for the 20th anniversary [mediawiki-config] - https://gerrit.wikimedia.org/r/984498 (https://phabricator.wikimedia.org/T353723)
[14:00:05] RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: How many deployers does it take to do UTC afternoon backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T1400).
[14:00:05] anzx, MatmaRex, and Dreamy_Jazz: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:12] \o
[14:00:14] hello
[14:00:19] o/
[14:01:06] Dreamy_Jazz: I can deploy the backports, if no one else is around
[14:01:18] Thanks.
[14:01:32] I can probably deploy in a moment
[14:02:31] anzx: for how many days/weeks should that logo be displayed for?
[14:02:35] hi
[14:02:43] Lucas_WMDE: that would be nice, if you can do it.
[14:02:44] till jan 4
[14:03:50] RECOVERY - Host ps1-e8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 9.45 ms
[14:03:57] alright, I’ll deploy then
[14:04:17] at least until half past
[14:05:10] (CR) TrainBranchBot: [C: +2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/984498 (https://phabricator.wikimedia.org/T353723) (owner: Anzx)
[14:05:54] (Merged) jenkins-bot: uzwikipedia: add a temporary logo for the 20th anniversary [mediawiki-config] - https://gerrit.wikimedia.org/r/984498 (https://phabricator.wikimedia.org/T353723) (owner: Anzx)
[14:06:19] !log lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [[gerrit:984498|uzwikipedia: add a temporary logo for the 20th anniversary (T353723)]]
[14:06:23] T353723: Requesting temporary logo change for uz.wikipedia.org - https://phabricator.wikimedia.org/T353723
[14:07:24] PROBLEM - Host ps1-e8-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[14:07:50] !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on 13 hosts with reason: T352878
[14:07:54] T352878: Troubleshoot recurring systemd unit failures and availability issues for wdqs1022-24 - https://phabricator.wikimedia.org/T352878
[14:08:00] !log lucaswerkmeister-wmde@deploy2002 anzx and lucaswerkmeister-wmde: Backport for [[gerrit:984498|uzwikipedia: add a temporary logo for the 20th anniversary (T353723)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:08:05] Lucas_WMDE: checking
[14:08:15] !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on 13 hosts with reason: T352878
[14:08:16] thx
[14:09:06] !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on 10 hosts with reason: T352878
[14:09:25] Lucas_WMDE: looks good
[14:09:39] !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on 10 hosts with reason: T352878
[14:09:44] ok thanks!
[14:09:45] !log lucaswerkmeister-wmde@deploy2002 anzx and lucaswerkmeister-wmde: Continuing with sync
[14:11:22] MdsShakil: I think I’ll run the maintenance script for T351903 after this deployment, are you around?
[14:11:22] T351903: Create new namespaces and namespace aliases for bd.wikimedia.org - https://phabricator.wikimedia.org/T351903
[14:13:32] !log re-added Eoghan to pwstore
[14:13:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:48] (CR) Lucas Werkmeister (WMDE): [C: +2] "starting gate-and-submit ahead of deployment" [extensions/Linter] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984500 (https://phabricator.wikimedia.org/T353860) (owner: Bartosz Dziewoński)
[14:15:47] !log lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [[gerrit:984498|uzwikipedia: add a temporary logo for the 20th anniversary (T353723)]] (duration: 09m 28s)
[14:15:52] T353723: Requesting temporary logo change for uz.wikipedia.org - https://phabricator.wikimedia.org/T353723
[14:16:51] Lucas_WMDE: thanks
[14:17:00] !log lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes bdwikimedia --fix # T351903 – 62 pages to fix, 62 were resolvable. 56 links to fix, 54 were resolvable, 2 were deleted.
[14:17:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:05] T351903: Create new namespaces and namespace aliases for bd.wikimedia.org - https://phabricator.wikimedia.org/T351903
[14:17:22] (Merged) jenkins-bot: Ignore "exact match" title when the title is not given [extensions/Linter] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984500 (https://phabricator.wikimedia.org/T353860) (owner: Bartosz Dziewoński)
[14:18:18] oh, Linter’s CI is quite fast ^^
[14:18:28] !log lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [[gerrit:984500|Ignore "exact match" title when the title is not given (T353860)]]
[14:18:32] T353860: Can't list lint errors on Wikidata, Wikimedia Commons, etc - https://phabricator.wikimedia.org/T353860
[14:18:44] Hi, can someone let me know if this is fine to finish, please? https://phabricator.wikimedia.org/T350431
[14:19:53] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and matmarex: Backport for [[gerrit:984500|Ignore "exact match" title when the title is not given (T353860)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:20:01] It's about running namespaceDupes.php on srwikisource, srwiktionary and srwiki.
[14:20:16] I’m guessing it’s fine, but I don’t think I’ll have time for it today, sorry
[14:20:19] MatmaRex: please test :)
[14:20:49] That shouldn't take more than 5-10 minutes, so I'll wait.
[14:21:00] Lucas_WMDE: looks good
[14:21:03] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and matmarex: Continuing with sync
[14:21:05] thanks!
[14:22:40] (CR) Lucas Werkmeister (WMDE): [C: +2] "starting gate-and-submit ahead of deployment" [core] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984501 (https://phabricator.wikimedia.org/T353793) (owner: Bartosz Dziewoński)
[14:23:55] (CR) Eevans: [C: +2] restbase: set production role and add config for restbase2033 [puppet] - https://gerrit.wikimedia.org/r/984647 (https://phabricator.wikimedia.org/T352468) (owner: Eevans)
[14:27:02] !log lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [[gerrit:984500|Ignore "exact match" title when the title is not given (T353860)]] (duration: 08m 33s)
[14:27:07] T353860: Can't list lint errors on Wikidata, Wikimedia Commons, etc - https://phabricator.wikimedia.org/T353860
[14:27:27] (CR) TrainBranchBot: [C: +2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [core] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984501 (https://phabricator.wikimedia.org/T353793) (owner: Bartosz Dziewoński)
[14:29:39] !log jclark@cumin1002 START - Cookbook sre.dns.netbox
[14:31:03] !log jclark@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:31:07] SRE, Infrastructure-Foundations: Integrate Bookworm 12.3/12.4 point update - https://phabricator.wikimedia.org/T353057 (MoritzMuehlenhoff)
[14:35:33] Lucas_WMDE: do you have to leave? I can take over the patches for Dreamy_Jazz if so.
[14:35:53] kostajh: yes please, that would be great
[14:36:01] I’m trying to do MatmaRex’ second patch on the side
[14:36:04] Lucas_WMDE: ok, just let me know when you're done please.
[14:36:05] but happy to hand over to you after that
[14:36:06] ok
[14:36:48] !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[14:36:49] !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[14:36:56] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:40:08] (PS2) Sohom Datta: InitialiseSettings.php: Allow thanking bots [mediawiki-config] - https://gerrit.wikimedia.org/r/984288 (https://phabricator.wikimedia.org/T341388) (owner: Houseblaster)
[14:41:25] (CR) CI reject: [V: -1] InitialiseSettings.php: Allow thanking bots [mediawiki-config] - https://gerrit.wikimedia.org/r/984288 (https://phabricator.wikimedia.org/T341388) (owner: Houseblaster)
[14:42:55] (Merged) jenkins-bot: Fix showing units and limits in NewPP limit report [core] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984501 (https://phabricator.wikimedia.org/T353793) (owner: Bartosz Dziewoński)
[14:43:17] !log lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [[gerrit:984501|Fix showing units and limits in NewPP limit report (T353793)]]
[14:43:22] T353793: NewPP limit report no longer includes units and limits - https://phabricator.wikimedia.org/T353793
[14:43:46] > /away
[14:44:04] Hmm. Didn't mean to quote that command...
[14:44:57] !log lucaswerkmeister-wmde@deploy2002 matmarex and lucaswerkmeister-wmde: Backport for [[gerrit:984501|Fix showing units and limits in NewPP limit report (T353793)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:45:14] MatmaRex: can you try the change?
[14:45:34] Lucas_WMDE: yeah
[14:45:50] looks good on mediawiki.org
[14:46:37] Lucas_WMDE: good to go
[14:46:43] !log lucaswerkmeister-wmde@deploy2002 matmarex and lucaswerkmeister-wmde: Continuing with sync
[14:46:47] alright, thanks
[14:46:48] (PS1) Peter Fischer: Search update pipeline: add consumer-devnull release to staging environment [deployment-charts] - https://gerrit.wikimedia.org/r/984847
[14:48:29] (PS1) Lucas Werkmeister (WMDE): Add debug code for entity usage logic issue [extensions/Wikibase] (wmf/1.42.0-wmf.9) - https://gerrit.wikimedia.org/r/984848 (https://phabricator.wikimedia.org/T255706)
[14:48:38] RECOVERY - Host ps1-e8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 8.36 ms
[14:49:37] (CR) Bking: [C: +1] Search update pipeline: add consumer-devnull release to staging environment (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/984847 (owner: Peter Fischer)
[14:50:12] (CR) Peter Fischer: [C: +2] Search update pipeline: add consumer-devnull release to staging environment [deployment-charts] - https://gerrit.wikimedia.org/r/984847 (owner: Peter Fischer)
[14:51:13] (Merged) jenkins-bot: Search update pipeline: add consumer-devnull release to staging environment [deployment-charts] - https://gerrit.wikimedia.org/r/984847 (owner: Peter Fischer)
[14:51:23] (CR) Peter Fischer: [C: +2] Search update pipeline: add consumer-devnull release to staging environment (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/984847 (owner: Peter Fischer)
[14:52:30] (CR) Bking: [C: +1] Add superset namespaces to the dse-k8s cluster [deployment-charts] - https://gerrit.wikimedia.org/r/983718 (https://phabricator.wikimedia.org/T347710) (owner: Btullis)
[14:52:45] !log lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [[gerrit:984501|Fix showing units and limits in NewPP limit report (T353793)]] (duration: 09m 27s)
[14:52:49] T353793: NewPP limit report no longer includes units and limits - https://phabricator.wikimedia.org/T353793
[14:52:54] kostajh: I’m done
[14:52:58] (CR) Brouberol: [C: +1] Add superset namespaces to the dse-k8s cluster [deployment-charts] - https://gerrit.wikimedia.org/r/983718 (https://phabricator.wikimedia.org/T347710) (owner: Btullis)
[14:53:04] Lucas_WMDE: cool. I'll get started
[14:53:08] Back now for my backports.
[14:53:18] thanks
[14:53:37] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:53:46] (CR) TrainBranchBot: [C: +2] "Approved by kharlan@deploy2002 using scap backport" [core] (wmf/1.42.0-wmf.9) - https://gerrit.wikimedia.org/r/984503 (owner: Dreamy Jazz)
[14:53:48] SRE, ops-eqiad: ps1-e8-eqiad down - https://phabricator.wikimedia.org/T353503 (Jclark-ctr) Open→Resolved a:Jclark-ctr corrected issues in netbox link is back up
[14:56:39] (CR) BryanDavis: [C: +1] bookworm-sssd: install zip and unzip [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/984813 (https://phabricator.wikimedia.org/T353769) (owner: Majavah)
[14:58:29] (CR) Majavah: [C: +2] bookworm-sssd: install zip and unzip [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/984813 (https://phabricator.wikimedia.org/T353769) (owner: Majavah)
[14:59:00] (PS2) Lucas Werkmeister (WMDE): Add debug code for entity usage logic issue [extensions/Wikibase] (wmf/1.42.0-wmf.9) - https://gerrit.wikimedia.org/r/984848 (https://phabricator.wikimedia.org/T255706)
[14:59:11] (Merged) jenkins-bot: bookworm-sssd: install zip and unzip [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/984813 (https://phabricator.wikimedia.org/T353769) (owner: Majavah)
[15:02:59] (CR) Btullis: [C: +2] Add superset namespaces to the dse-k8s cluster [deployment-charts] - https://gerrit.wikimedia.org/r/983718 (https://phabricator.wikimedia.org/T347710) (owner: Btullis)
[15:05:02] about ~8 minutes on the wmf.9 patch, then I'll get started on the wmf.10 one. I think we'll need another 30 minutes or so.
[15:05:46] you could probably +2 them to get the CI jobs cooking in parallel
[15:05:51] (Merged) jenkins-bot: Add superset namespaces to the dse-k8s cluster [deployment-charts] - https://gerrit.wikimedia.org/r/983718 (https://phabricator.wikimedia.org/T347710) (owner: Btullis)
[15:06:18] RECOVERY - BGP status on cr1-esams is OK: BGP OK - up: 471, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:06:23] (CR) Kosta Harlan: [C: +2] "backport" [core] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984502 (owner: Dreamy Jazz)
[15:06:35] MatmaRex: good idea.
[15:10:05] !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:11:41] !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:12:47] (Merged) jenkins-bot: Use username for lookup for non-existing user as the vague target [core] (wmf/1.42.0-wmf.9) - https://gerrit.wikimedia.org/r/984503 (owner: Dreamy Jazz)
[15:13:13] !log kharlan@deploy2002 Started scap: Backport for [[gerrit:984503|Use username for lookup for non-existing user as the vague target]]
[15:13:34] (PS3) Volans: sre.hosts.decommission: make it idempotent again [cookbooks] - https://gerrit.wikimedia.org/r/984812
[15:13:45] (CR) CI reject: [V: -1] Add debug code for entity usage logic issue [extensions/Wikibase] (wmf/1.42.0-wmf.9) - https://gerrit.wikimedia.org/r/984848 (https://phabricator.wikimedia.org/T255706) (owner: Lucas Werkmeister (WMDE))
[15:15:04] !log kharlan@deploy2002 kharlan and dreamyjazz: Backport for [[gerrit:984503|Use username for lookup for non-existing user as the vague target]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[15:15:16] Dreamy_Jazz: ok, we are on mwdebug for wmf.9
[15:15:34] Testing.
[15:17:23] (PS1) Peter Fischer: Search update pipeline: set pipeline.name correctly [deployment-charts] - https://gerrit.wikimedia.org/r/984852
[15:17:47] (CR) Peter Fischer: [C: +2] Search update pipeline: set pipeline.name correctly [deployment-charts] - https://gerrit.wikimedia.org/r/984852 (owner: Peter Fischer)
[15:17:50] (CR) CI reject: [V: -1] sre.hosts.decommission: make it idempotent again [cookbooks] - https://gerrit.wikimedia.org/r/984812 (owner: Volans)
[15:18:34] (Merged) jenkins-bot: Search update pipeline: set pipeline.name correctly [deployment-charts] - https://gerrit.wikimedia.org/r/984852 (owner: Peter Fischer)
[15:18:36] Test seems to work
[15:18:42] syncing
[15:18:46] !log kharlan@deploy2002 kharlan and dreamyjazz: Continuing with sync
[15:19:48] (CR) Lucas Werkmeister (WMDE): "> Error: Call to a member function optionsHash() on null" [extensions/Wikibase] (wmf/1.42.0-wmf.9) - https://gerrit.wikimedia.org/r/984848 (https://phabricator.wikimedia.org/T255706) (owner: Lucas Werkmeister (WMDE))
[15:19:53] !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:20:06] !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:21:15] (PS4) Volans: sre.hosts.decommission: make it idempotent again [cookbooks] - https://gerrit.wikimedia.org/r/984812
[15:22:53] kostajh: can you ping me when you’re done? I’d like to test something on mwdebug
[15:22:59] Lucas_WMDE: sure
[15:23:01] (my meeting turned out shorter than expected, but you can still finish the deployment ^^)
[15:23:03] thanks
[15:24:52] !log kharlan@deploy2002 Finished scap: Backport for [[gerrit:984503|Use username for lookup for non-existing user as the vague target]] (duration: 11m 38s)
[15:25:50] (CR) TrainBranchBot: [C: +2] "Approved by kharlan@deploy2002 using scap backport" [core] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984502 (owner: Dreamy Jazz)
[15:26:00] (PS1) Peter Fischer: Search update pipeline: increase fetch-retry-queue-capacity [deployment-charts] - https://gerrit.wikimedia.org/r/984853
[15:26:05] (CR) Peter Fischer: [C: +2] Search update pipeline: increase fetch-retry-queue-capacity [deployment-charts] - https://gerrit.wikimedia.org/r/984853 (owner: Peter Fischer)
[15:26:54] (Merged) jenkins-bot: Search update pipeline: increase fetch-retry-queue-capacity [deployment-charts] - https://gerrit.wikimedia.org/r/984853 (owner: Peter Fischer)
[15:27:35] (Merged) jenkins-bot: Use username for lookup for non-existing user as the vague target [core] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/984502 (owner: Dreamy Jazz)
[15:28:00] !log kharlan@deploy2002 Started scap: Backport for [[gerrit:984502|Use username for lookup for non-existing user as the vague target]]
[15:28:36] Dreamy_Jazz: wmf.9 is synced. On to wmf.10
[15:28:50] (CR) Cathal Mooney: [C: +1] "LGTM, and yes controlling with the new var seems the best way to deal with it." [puppet] - https://gerrit.wikimedia.org/r/984615 (owner: Muehlenhoff)
[15:28:51] Great. Thanks.
[15:30:08] !log kharlan@deploy2002 kharlan and dreamyjazz: Backport for [[gerrit:984502|Use username for lookup for non-existing user as the vague target]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[15:30:17] Testing
[15:31:31] Test complete
[15:31:39] as successful
[15:32:19] I see that too. thanks
[15:32:22] !log kharlan@deploy2002 kharlan and dreamyjazz: Continuing with sync
[15:35:30] !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[15:36:29] !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[15:37:14] (CR) Btullis: [C: +2] Add kubeadm files for superset namespaces [puppet] - https://gerrit.wikimedia.org/r/983720 (https://phabricator.wikimedia.org/T347710) (owner: Btullis)
[15:38:37] !log kharlan@deploy2002 Finished scap: Backport for [[gerrit:984502|Use username for lookup for non-existing user as the vague target]] (duration: 10m 37s)
[15:39:09] Lucas_WMDE: over to you
[15:39:35] thanks!
[15:39:39] jouncebot: now
[15:39:39] No deployments scheduled for the next 1 hour(s) and 20 minute(s)
[15:39:43] should be enough time
[15:40:55] I am going to temporarily deploy https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/984848 to mwdebug2001
[15:41:10] (CR) Filippo Giunchedi: [C: +1] profile::prometheus::rsyncd: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/984808 (owner: Muehlenhoff)
[15:41:26] so nobody else deploy from deploy2002 during that time please ^^
[15:41:29] I’ll reset git afterwards
[15:42:12] (PS1) Peter Fischer: Search update pipeline: set group.id correctly [deployment-charts] - https://gerrit.wikimedia.org/r/984856
[15:42:14] checked out and scap pulled
[15:42:19] (CR) Peter Fischer: [C: +2] Search update pipeline: set group.id correctly [deployment-charts] - https://gerrit.wikimedia.org/r/984856 (owner: Peter Fischer)
[15:43:12] (Merged) jenkins-bot: Search update pipeline: set group.id correctly [deployment-charts] - https://gerrit.wikimedia.org/r/984856 (owner: Peter Fischer)
[15:44:34] !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:44:52] !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:46:13] (CR) Ayounsi: [C: +1] "nice!" [cookbooks] - https://gerrit.wikimedia.org/r/984812 (owner: Volans)
[15:46:36] alright, mwdeploy2002 wmf.9 Wikibase restored to previous git (upstream wmf.9)
[15:46:39] and mwdebug2001 scap pulled
[15:46:40] * Lucas_WMDE done
[15:47:06] (CR) Filippo Giunchedi: "LGTM, see inline for a comment re: config file location" [puppet] - https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) (owner: Arnaudb)
[15:47:58] !log volans@cumin1002 START - Cookbook sre.hosts.decommission for hosts wdqs1007.eqiad.wmnet
[15:48:19] (CR) Filippo Giunchedi: "Untested but LGTM, what do you think of the comment re: pathmodified vs pathchanged ?" [puppet] - https://gerrit.wikimedia.org/r/984220 (https://phabricator.wikimedia.org/T353691) (owner: Herron)
[15:48:43] (CR) Lucas Werkmeister (WMDE): [C: -2] Add debug code for entity usage logic issue (1 comment) [extensions/Wikibase] (wmf/1.42.0-wmf.9) - https://gerrit.wikimedia.org/r/984848 (https://phabricator.wikimedia.org/T255706) (owner: Lucas Werkmeister (WMDE))
[15:53:17] !log volans@cumin1002 START - Cookbook sre.dns.netbox
[15:54:38] !log volans@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:54:39] !log volans@cumin1002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1007.eqiad.wmnet
[15:55:02] (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 5.203545098774071s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[15:55:37] (CR) Volans: [C: +2] sre.hosts.decommission: make it idempotent again [cookbooks] - https://gerrit.wikimedia.org/r/984812 (owner: Volans)
[15:58:10] !log isaranto@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[15:59:35] !log isaranto@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[15:59:57] (Merged) jenkins-bot: sre.hosts.decommission: make it idempotent again [cookbooks] - https://gerrit.wikimedia.org/r/984812 (owner: Volans)
[16:03:00] !log volans@cumin1002 START - Cookbook sre.hosts.decommission for hosts wdqs1008.eqiad.wmnet
[16:08:43] !log volans@cumin1002 START - Cookbook sre.dns.netbox
[16:10:05] !log volans@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:10:07] !log volans@cumin1002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1008.eqiad.wmnet
[16:11:36] (PS7) Herron: pyrra: reload pyrra-filesystem and thanos-rule on cfg change [puppet] - https://gerrit.wikimedia.org/r/984220 (https://phabricator.wikimedia.org/T353691)
[16:12:18] (CR) Herron: pyrra: reload pyrra-filesystem and thanos-rule on cfg change (2 comments) [puppet] - https://gerrit.wikimedia.org/r/984220 (https://phabricator.wikimedia.org/T353691) (owner: Herron)
[16:12:27] (PS1) Alexandros Kosiaris: Add File:Brezina_-_Brunelli to blacklist [deployment-charts] - https://gerrit.wikimedia.org/r/984860 (https://phabricator.wikimedia.org/T353876)
[16:14:01] (CR) Clément Goubert: [C: +1] Add File:Brezina_-_Brunelli to blacklist [deployment-charts] - https://gerrit.wikimedia.org/r/984860 (https://phabricator.wikimedia.org/T353876) (owner: Alexandros Kosiaris)
[16:14:22] (CR) Alexandros Kosiaris: [C: +2] Add File:Brezina_-_Brunelli to blacklist [deployment-charts] - https://gerrit.wikimedia.org/r/984860 (https://phabricator.wikimedia.org/T353876) (owner: Alexandros Kosiaris)
[16:15:20] (Merged) jenkins-bot: Add File:Brezina_-_Brunelli to blacklist [deployment-charts] - https://gerrit.wikimedia.org/r/984860 (https://phabricator.wikimedia.org/T353876) (owner: Alexandros Kosiaris)
[16:16:47] (PS6) Arnaudb: mysqld-exporter-config: simplify manual runs [puppet] - https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384)
[16:16:53] (CR) Arnaudb: mysqld-exporter-config: simplify manual runs (2 comments) [puppet] - https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) (owner: Arnaudb)
[16:17:50] !log akosiaris@deploy2002 helmfile [codfw] START helmfile.d/services/changeprop: apply
[16:18:11] !log akosiaris@deploy2002 helmfile [codfw] DONE helmfile.d/services/changeprop: apply
[16:19:31] (CR) FNegri: [C: +2] [toolsdb] remove leftover template [puppet] - https://gerrit.wikimedia.org/r/984218 (owner: FNegri)
[16:21:02] (CR) Filippo Giunchedi: [C: +1] pyrra: reload pyrra-filesystem and thanos-rule on cfg change [puppet] - https://gerrit.wikimedia.org/r/984220 (https://phabricator.wikimedia.org/T353691) (owner: Herron)
[16:21:41] (CR) Filippo Giunchedi: [C: +1] "LGTM (untested)" [puppet] - https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) (owner: Arnaudb)
[16:21:49] (CR) Herron: [C: +2] pyrra: reload pyrra-filesystem and thanos-rule on cfg change [puppet] - https://gerrit.wikimedia.org/r/984220 (https://phabricator.wikimedia.org/T353691) (owner: Herron)
[16:22:12] (PS4) Ottomata: WIP - add webrequest.frontend stream [mediawiki-config] - https://gerrit.wikimedia.org/r/983905 (https://phabricator.wikimedia.org/T314956)
[16:22:34] (CR) FNegri: [C: +1] "hmm let's try if that fixes the error..." [puppet] - https://gerrit.wikimedia.org/r/984640 (https://phabricator.wikimedia.org/T353829) (owner: Andrew Bogott)
[16:24:47] (MediaWikiLatencyExceeded) resolved: Average latency high: codfw parsoid GET/200: 5.328924823605962s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:26:12] !log akosiaris@deploy2002 helmfile [eqiad] START helmfile.d/services/changeprop: apply
[16:26:40] !log akosiaris@deploy2002 helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
[16:27:08] !log akosiaris@deploy2002 helmfile [staging] START helmfile.d/services/changeprop: apply
[16:27:25] !log akosiaris@deploy2002 helmfile [staging] DONE helmfile.d/services/changeprop: apply
[16:29:37] SRE, SRE-Access-Requests: Requesting access to deployment for Damilare Adedoyin - https://phabricator.wikimedia.org/T353838 (herron)
[16:31:45] (PS1) Herron: admin: add damilare to 'deployment' [puppet] - https://gerrit.wikimedia.org/r/984863 (https://phabricator.wikimedia.org/T353838)
[16:33:41] (CR) Majavah: [C: -1] "According to the HAProxy docs:" [puppet] - https://gerrit.wikimedia.org/r/984640 (https://phabricator.wikimedia.org/T353829) (owner: Andrew Bogott)
[16:34:58] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to deployment for Damilare Adedoyin - https://phabricator.wikimedia.org/T353838 (herron) p:Triage→Medium Hello! A couple of approvals will be needed on task in order to proceed: @XenoRyet could you please review/approve as WMF ma...
[16:46:55] (CR) Arnaudb: mysqld-exporter-config: simplify manual runs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) (owner: Arnaudb)
[16:54:10] (PS1) Clément Goubert: changeprop-jobqueue: move PublishStashedFile back to metal temporarily [deployment-charts] - https://gerrit.wikimedia.org/r/984865 (https://phabricator.wikimedia.org/T352515)
[16:57:07] (PS1) Andrew Bogott: Galera haproxy: fix profile::openstack::eqiad1::galera::primary_host [puppet] - https://gerrit.wikimedia.org/r/984866
[17:00:07] jhathaway and rzl: gettimeofday() says it's time for Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T1700)
[17:00:07] No Gerrit patches in the queue for this window AFAICS.
[17:00:18] (CR) Clément Goubert: [C: +2] changeprop-jobqueue: move PublishStashedFile back to metal temporarily [deployment-charts] - https://gerrit.wikimedia.org/r/984865 (https://phabricator.wikimedia.org/T352515) (owner: Clément Goubert)
[17:01:43] (Merged) jenkins-bot: changeprop-jobqueue: move PublishStashedFile back to metal temporarily [deployment-charts] - https://gerrit.wikimedia.org/r/984865 (https://phabricator.wikimedia.org/T352515) (owner: Clément Goubert)
[17:02:14] !log cgoubert@deploy2002 helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
[17:02:47] !log cgoubert@deploy2002 helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
[17:03:08] !log cgoubert@deploy2002 helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
[17:03:39] !log cgoubert@deploy2002 helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
[17:03:47] !log cgoubert@deploy2002 helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
[17:04:25] !log cgoubert@deploy2002 helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
[17:05:49] (CR) Dzahn: [C: +1] rsync::quickdatacopy: Add support for creating nftables-compatible firewall [puppet] - https://gerrit.wikimedia.org/r/984615 (owner: Muehlenhoff)
[17:11:10] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to deployment for Damilare Adedoyin - https://phabricator.wikimedia.org/T353838 (XenoRyet) Approved from my end.
[17:14:30] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984866 (owner: Andrew Bogott)
[17:17:20] SRE-swift-storage, Commons, Structured-Data-Backlog, UploadWizard, Wikimedia-production-error: Uploadwizard sometimes fails "Internal error: Server failed to publish temporary file" - https://phabricator.wikimedia.org/T353871 (Cparle)
[17:21:12] PROBLEM - Check systemd state on mirror1001 is CRITICAL: CRITICAL - degraded: The following units failed: update-tails-mirror.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:23:04] !log [mirror1001:~] $ sudo systemctl start update-tails-mirror
[17:23:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:14] the error is "rsync status 5", input/output error..
[17:26:34] !log mirror1001 - when syncing tails mirror - @ERROR: max connections (23) reached -- try again later
[17:26:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:26:53] oh well, let it try later then
[17:40:17] (PS2) Andrew Bogott: Galera haproxy: fix handling of primary_host [puppet] - https://gerrit.wikimedia.org/r/984866
[17:40:44] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984866 (owner: Andrew Bogott)
[17:40:54] (CR) CI reject: [V: -1] Galera haproxy: fix handling of primary_host [puppet] - https://gerrit.wikimedia.org/r/984866 (owner: Andrew Bogott)
[17:42:49] (PS1) JHathaway: rake: remove cloning of private repo [puppet] - https://gerrit.wikimedia.org/r/984871
[17:43:55] (CR) JHathaway: "@jbond do you know what this is used for?" [puppet] - https://gerrit.wikimedia.org/r/984871 (owner: JHathaway)
[18:39:09] !log releases1003 - sudo chmod -R g+w /srv/org/wikimedia/releases/mediawiki/1.*
[18:39:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:03:18] SRE-swift-storage, Commons, Structured-Data-Backlog, UploadWizard, Wikimedia-production-error: Uploadwizard sometimes fails "Internal error: Server failed to publish temporary file" - https://phabricator.wikimedia.org/T353871 (PantheraLeo1359531) @MatthewVernon thank you for creating this new...
[19:05:52] Pressing the button
[19:06:28] (PS1) TrainBranchBot: group2 wikis to 1.42.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/984985 (https://phabricator.wikimedia.org/T350086)
[19:06:30] (CR) TrainBranchBot: [C: +2] group2 wikis to 1.42.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/984985 (https://phabricator.wikimedia.org/T350086) (owner: TrainBranchBot)
[19:06:47] jouncebot: nowandnext
[19:06:47] For the next 1 hour(s) and 53 minute(s): MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T1900)
[19:06:48] In 1 hour(s) and 53 minute(s): UTC late backport and config training (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T2100)
[19:07:02] (CR) Reedy: [C: +1] ExtensionDistributor: Make 1.41 stable [mediawiki-config] - https://gerrit.wikimedia.org/r/984505 (https://phabricator.wikimedia.org/T346919) (owner: MacFan4000)
[19:07:14] (Merged) jenkins-bot: group2 wikis to 1.42.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/984985 (https://phabricator.wikimedia.org/T350086) (owner: TrainBranchBot)
[19:14:33] !log dancy@deploy2002 rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.10 refs T350086
[19:14:38] T350086: 1.42.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T350086
[19:22:03] (CR) Reedy: [C: +2] ExtensionDistributor: Make 1.41 stable [mediawiki-config] - https://gerrit.wikimedia.org/r/984505 (https://phabricator.wikimedia.org/T346919) (owner: MacFan4000)
[19:22:49] (Merged) jenkins-bot: ExtensionDistributor: Make 1.41 stable [mediawiki-config] - https://gerrit.wikimedia.org/r/984505 (https://phabricator.wikimedia.org/T346919) (owner: MacFan4000)
[19:31:46] !log reedy@deploy2002 Synchronized wmf-config/CommonSettings.php: T346919 (duration: 06m 26s)
[19:31:57] T346919: Release MediaWiki 1.41.0 - https://phabricator.wikimedia.org/T346919
[20:38:23] (CR) CDanis: [C: +1] Provide OpenTelemetry Collector and Port values (1 comment) [puppet] - https://gerrit.wikimedia.org/r/984817 (https://phabricator.wikimedia.org/T351566) (owner: Alexandros Kosiaris)
[21:00:05] brennen and TheresNoTime: That opportune time for a UTC late backport and config training deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231221T2100).
[21:00:40] Nothing in the list :D
[21:00:43] last backport window of the year!
[21:03:34] huh, that's true, last deploy of the 2023 already happened: https://sal.toolforge.org/log/NCzcjYwBhuQtenzvylt_ a nice way to cap off
[21:08:23] 🥂
[21:28:46] happy non-deploying interlude all
[21:52:51] TheresNoTime: pfft. go find stuff ;P
[22:04:02] (PS1) Ottomata: WIP - create eventlogging-processor legacy proxy to eventgate for mediawiki.org [mediawiki-config] - https://gerrit.wikimedia.org/r/985023 (https://phabricator.wikimedia.org/T353817)
[22:34:50] ops-eqiad: PDU sensor over limit - https://phabricator.wikimedia.org/T353913 (phaultfinder)
[22:36:36] SRE, observability: Convert udp2log init script to use systemd - https://phabricator.wikimedia.org/T276623 (colewhite) Open→Resolved a:colewhite This appears resolved for some time now.
[22:45:22] (PS1) Cwhite: udp2log: amend demux.py to support the python3 runtime [puppet] - https://gerrit.wikimedia.org/r/984237 (https://phabricator.wikimedia.org/T353220)
[22:46:00] (CR) CI reject: [V: -1] udp2log: amend demux.py to support the python3 runtime [puppet] - https://gerrit.wikimedia.org/r/984237 (https://phabricator.wikimedia.org/T353220) (owner: Cwhite)
[22:49:55] (PS2) Cwhite: udp2log: amend demux.py to support the python3 runtime [puppet] - https://gerrit.wikimedia.org/r/984237 (https://phabricator.wikimedia.org/T353220)
[23:14:15] (PS1) Cwhite: udp2log: add simple benthos pipeline [puppet] - https://gerrit.wikimedia.org/r/984238 (https://phabricator.wikimedia.org/T353220)
[23:14:49] (CR) CI reject: [V: -1] udp2log: add simple benthos pipeline [puppet] - https://gerrit.wikimedia.org/r/984238 (https://phabricator.wikimedia.org/T353220) (owner: Cwhite)
[23:16:34] (PS2) Cwhite: udp2log: add simple benthos pipeline [puppet] - https://gerrit.wikimedia.org/r/984238 (https://phabricator.wikimedia.org/T353220)
[23:19:28] (PS3) Cwhite: udp2log: add simple benthos pipeline [puppet] - https://gerrit.wikimedia.org/r/984238 (https://phabricator.wikimedia.org/T353220)
[23:24:54] (CR) Cwhite: [C: -1] "PCC OK: https://puppet-compiler.wmflabs.org/output/984238/985/" [puppet] - https://gerrit.wikimedia.org/r/984238 (https://phabricator.wikimedia.org/T353220) (owner: Cwhite)
[23:24:59] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Aki Nakanishi - https://phabricator.wikimedia.org/T353363 (odimitrijevic) Approved
[23:25:13] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Riddy Khan - https://phabricator.wikimedia.org/T353370 (odimitrijevic) Approved