[00:08:29] <wikibugs>	 (03PS1) 10Cwhite: logstash remove wikifunctions response field [puppet] - 10https://gerrit.wikimedia.org/r/942799 (https://phabricator.wikimedia.org/T180051)
[00:09:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P49861 and previous config saved to /var/cache/conftool/dbconfig/20230801-000948-ladsgroup.json
[00:11:58] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[00:12:14] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] logstash remove wikifunctions response field [puppet] - 10https://gerrit.wikimedia.org/r/942799 (https://phabricator.wikimedia.org/T180051) (owner: 10Cwhite)
[00:12:22] <icinga-wm>	 PROBLEM - Check systemd state on cloudweb1004 is CRITICAL: CRITICAL - degraded: The following units failed: wikitech_run_jobs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:14:16] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud nodes DNS and switch config - pt1979@cumin2002"
[00:15:01] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud nodes DNS and switch config - pt1979@cumin2002"
[00:15:01] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[00:15:03] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudvirt200[4-6]-dev - https://phabricator.wikimedia.org/T342459 (10Papaul)
[00:15:22] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudvirt200[4-6]-dev - https://phabricator.wikimedia.org/T342459 (10Papaul)
[00:15:40] <icinga-wm>	 PROBLEM - Check systemd state on cloudweb1003 is CRITICAL: CRITICAL - degraded: The following units failed: wikitech_run_jobs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:16:28] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10User-aborrero, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (10Papaul)
[00:17:54] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host cloudcontrol2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
[00:20:06] <icinga-wm>	 RECOVERY - Check systemd state on cloudweb1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:21:29] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host cloudcontrol2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
[00:24:12] <icinga-wm>	 RECOVERY - Check systemd state on cloudweb1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:24:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P49862 and previous config saved to /var/cache/conftool/dbconfig/20230801-002454-ladsgroup.json
[00:26:12] <icinga-wm>	 PROBLEM - cinder-api http on cloudcontrol1005 is CRITICAL: connect to address 10.64.151.3 and port 18776: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[00:26:54] <icinga-wm>	 PROBLEM - cinder-scheduler process on cloudcontrol1005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python.* /usr/bin/cinder-scheduler https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[00:27:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:32:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:35:14] <icinga-wm>	 RECOVERY - cinder-api http on cloudcontrol1005 is OK: HTTP OK: HTTP/1.1 300 Multiple Choices - 663 bytes in 0.015 second response time https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[00:35:58] <icinga-wm>	 RECOVERY - cinder-scheduler process on cloudcontrol1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python.* /usr/bin/cinder-scheduler https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[00:38:34] <wikibugs>	 10ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (10phaultfinder)
[00:38:37] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/942800
[00:38:43] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/942800 (owner: 10TrainBranchBot)
[00:40:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49863 and previous config saved to /var/cache/conftool/dbconfig/20230801-004000-ladsgroup.json
[00:40:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[00:40:06] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[00:40:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[00:54:07] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
[00:55:06] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/942800 (owner: 10TrainBranchBot)
[00:58:18] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
[01:00:23] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10User-aborrero, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (10Papaul)
[01:03:35] <wikibugs>	 10ops-codfw: Inbound interface errors - https://phabricator.wikimedia.org/T343180 (10phaultfinder)
[01:44:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49864 and previous config saved to /var/cache/conftool/dbconfig/20230801-014452-ladsgroup.json
[01:44:55] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[01:52:24] <icinga-wm>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:52:38] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 82, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:59:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P49865 and previous config saved to /var/cache/conftool/dbconfig/20230801-015958-ladsgroup.json
[02:00:05] <jouncebot>	 Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T0200)
[02:06:32] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:07:12] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/1.41.0-wmf.20 [core] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/942801 (https://phabricator.wikimedia.org/T340248)
[02:07:18] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/1.41.0-wmf.20 [core] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/942801 (https://phabricator.wikimedia.org/T340248) (owner: 10TrainBranchBot)
[02:11:33] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:15:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P49866 and previous config saved to /var/cache/conftool/dbconfig/20230801-021504-ladsgroup.json
[02:18:34] <icinga-wm>	 PROBLEM - Check systemd state on gitlab1003 is CRITICAL: CRITICAL - degraded: The following units failed: sync-gitlab-group-with-ldap.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:19:30] <icinga-wm>	 PROBLEM - Check systemd state on gitlab2002 is CRITICAL: CRITICAL - degraded: The following units failed: sync-gitlab-group-with-ldap.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:22:03] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/1.41.0-wmf.20 [core] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/942801 (https://phabricator.wikimedia.org/T340248) (owner: 10TrainBranchBot)
[02:28:21] <wikibugs>	 10SRE, 10Wikimedia-Site-requests, 10serviceops, 10Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (10Vladis13) >>! In T275319#9057445, @Reedy wrote: > None of this is helping move the discussion forward. >  > Timo's comment in T27...
[02:30:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49867 and previous config saved to /var/cache/conftool/dbconfig/20230801-023010-ladsgroup.json
[02:30:14] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[02:30:32] <icinga-wm>	 RECOVERY - Check systemd state on gitlab1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:31:33] <jinxer-wm>	 (JobUnavailable) resolved: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:38:35] <wikibugs>	 10ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (10phaultfinder)
[02:46:32] <icinga-wm>	 RECOVERY - Check systemd state on gitlab2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:00:04] <icinga-wm>	 RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:00:04] <jouncebot>	 Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T0300)
[03:01:27] <wikibugs>	 (03PS1) 10TrainBranchBot: testwikis wikis to 1.41.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/943654 (https://phabricator.wikimedia.org/T340248)
[03:01:29] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] testwikis wikis to 1.41.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/943654 (https://phabricator.wikimedia.org/T340248) (owner: 10TrainBranchBot)
[03:02:11] <wikibugs>	 (03Merged) 10jenkins-bot: testwikis wikis to 1.41.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/943654 (https://phabricator.wikimedia.org/T340248) (owner: 10TrainBranchBot)
[03:02:44] <logmsgbot>	 !log mwpresync@deploy1002 Started scap: testwikis wikis to 1.41.0-wmf.20  refs T340248
[03:02:48] <stashbot>	 T340248: 1.41.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T340248
[03:33:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[03:35:34] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (POST pods) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[03:38:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[03:40:34] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (POST pods) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[03:54:50] <logmsgbot>	 !log mwpresync@deploy1002 Finished scap: testwikis wikis to 1.41.0-wmf.20  refs T340248 (duration: 52m 06s)
[03:54:54] <stashbot>	 T340248: 1.41.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T340248
[03:57:02] <logmsgbot>	 !log mwpresync@deploy1002 Pruned MediaWiki: 1.41.0-wmf.18 (duration: 02m 09s)
[04:49:25] <wikibugs>	 (03PS2) 10KartikMistry: cxserver: Remove Youdao MT service [deployment-charts] - 10https://gerrit.wikimedia.org/r/942748 (https://phabricator.wikimedia.org/T329137)
[04:59:17] * kart_ updating cxserver..
[05:00:04] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:00:35] <kart_>	 cxserver-staging-tls-proxy: image: docker-registry.discovery.wmnet/envoy:1.23.10-1 -> image: docker-registry.discovery.wmnet/envoy:1.23.10-2 -- is it OK to go ahead with this? Change applied, not deployed.
[05:03:36] <kart_>	 Seems to be a mesh change. I'll go ahead.
[05:04:36] <wikibugs>	 (03CR) 10KartikMistry: [C: 03+2] Update cxserver to 2023-07-13-063245-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/937578 (https://phabricator.wikimedia.org/T340953) (owner: 10Santhosh)
[05:05:21] <wikibugs>	 (03Merged) 10jenkins-bot: Update cxserver to 2023-07-13-063245-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/937578 (https://phabricator.wikimedia.org/T340953) (owner: 10Santhosh)
[05:06:39] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] START helmfile.d/services/cxserver: apply
[05:07:00] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[05:12:20] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[05:12:54] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[05:13:37] <wikibugs>	 (03PS5) 10Slyngshede: Facter: PHP Version [puppet] - 10https://gerrit.wikimedia.org/r/942628 (https://phabricator.wikimedia.org/T271196)
[05:14:13] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Facter: PHP Version [puppet] - 10https://gerrit.wikimedia.org/r/942628 (https://phabricator.wikimedia.org/T271196) (owner: 10Slyngshede)
[05:15:16] <marostegui>	 !log dbmaint s4 testcommonswiki eqiad T343175
[05:15:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:19] <stashbot>	 T343175: Remove old fields 'cuc_user' and 'cuc_user_text' as well as index 'cuc_user_ip_time' from a few production wikis - https://phabricator.wikimedia.org/T343175
[05:16:29] <marostegui>	 !log dbmaint s4 labswiki (wikitech) eqiad T343175
[05:16:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:18:29] <marostegui>	 !log dbmaint s4 testcommonswiki eqiad T343174
[05:18:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:18:32] <stashbot>	 T343174: Add missing column cuc_only_for_read_old to testcommonswiki - https://phabricator.wikimedia.org/T343174
[05:21:34] <wikibugs>	 (03PS6) 10Slyngshede: Facter: PHP Version [puppet] - 10https://gerrit.wikimedia.org/r/942628 (https://phabricator.wikimedia.org/T271196)
[05:22:10] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Facter: PHP Version [puppet] - 10https://gerrit.wikimedia.org/r/942628 (https://phabricator.wikimedia.org/T271196) (owner: 10Slyngshede)
[05:23:10] <icinga-wm>	 PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1003), Fresh: 129 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[05:23:49] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[05:24:36] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[05:26:25] <wikibugs>	 (03PS3) 10KartikMistry: cxserver: Remove Youdao MT service [deployment-charts] - 10https://gerrit.wikimedia.org/r/942748 (https://phabricator.wikimedia.org/T329137)
[05:26:54] <kart_>	 !log Updated cxserver to 2023-07-13-063245-production (T340953)
[05:26:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:26:58] <stashbot>	 T340953: Enable MinT for all the remaining languages supported by NLLB-200 - https://phabricator.wikimedia.org/T340953
[05:30:16] <kart_>	 I'm doing another cxserver deployment.
[05:32:43] <wikibugs>	 (03CR) 10KartikMistry: [C: 03+2] cxserver: Remove Youdao MT service [deployment-charts] - 10https://gerrit.wikimedia.org/r/942748 (https://phabricator.wikimedia.org/T329137) (owner: 10KartikMistry)
[05:33:26] <wikibugs>	 (03Merged) 10jenkins-bot: cxserver: Remove Youdao MT service [deployment-charts] - 10https://gerrit.wikimedia.org/r/942748 (https://phabricator.wikimedia.org/T329137) (owner: 10KartikMistry)
[05:34:32] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:36:11] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] START helmfile.d/services/cxserver: apply
[05:36:32] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[05:41:03] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[05:41:40] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[05:45:02] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:45:42] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[05:46:20] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[05:48:22] <kart_>	 !log cxserver: Remove Youdao MT service (T329137)
[05:48:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:48:25] <stashbot>	 T329137: Deprecate Youdao MT service - https://phabricator.wikimedia.org/T329137
[06:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T0600)
[06:00:05] <jouncebot>	 kormat, marostegui, and Amir1: That opportune time is upon us again. Time for a Primary database switchover deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T0600).
[06:16:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:21:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:21:37] <wikibugs>	 (03CR) 10Elukey: [V: 03+1 C: 03+2] role::kafka::jumbo: apply thread settings [puppet] - 10https://gerrit.wikimedia.org/r/941840 (owner: 10Elukey)
[06:24:28] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
[06:25:22] <wikibugs>	 (03PS7) 10Slyngshede: Facter: PHP Version [puppet] - 10https://gerrit.wikimedia.org/r/942628 (https://phabricator.wikimedia.org/T271196)
[06:29:26] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:30:26] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:31:46] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50420 bytes in 0.063 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:31:51] <wikibugs>	 (03PS8) 10Slyngshede: Facter: PHP Version [puppet] - 10https://gerrit.wikimedia.org/r/942628 (https://phabricator.wikimedia.org/T271196)
[06:32:18] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8646 bytes in 0.272 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:32:35] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] services: upgrade changeprop instances to Buster [deployment-charts] - 10https://gerrit.wikimedia.org/r/943037 (https://phabricator.wikimedia.org/T341140) (owner: 10Elukey)
[06:33:20] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] changeprop: allow to tune monitoring container's resources [deployment-charts] - 10https://gerrit.wikimedia.org/r/943038 (https://phabricator.wikimedia.org/T328683) (owner: 10Elukey)
[06:34:06] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] services: shift changeprop's cpu resources from main app to the prometheus [deployment-charts] - 10https://gerrit.wikimedia.org/r/943039 (https://phabricator.wikimedia.org/T328683) (owner: 10Elukey)
[06:41:43] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] services: upgrade changeprop instances to Buster [deployment-charts] - 10https://gerrit.wikimedia.org/r/943037 (https://phabricator.wikimedia.org/T341140) (owner: 10Elukey)
[06:54:26] <logmsgbot>	 !log elukey@deploy1002 helmfile [codfw] START helmfile.d/services/changeprop: sync
[06:54:50] <logmsgbot>	 !log elukey@deploy1002 helmfile [codfw] DONE helmfile.d/services/changeprop: sync
[07:00:05] <jouncebot>	 Amir1, Urbanecm, and taavi: Your horoscope predicts another unfortunate UTC morning backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T0700).
[07:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[07:07:01] <logmsgbot>	 !log elukey@deploy1002 helmfile [eqiad] START helmfile.d/services/changeprop: sync
[07:07:24] <logmsgbot>	 !log elukey@deploy1002 helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
[07:13:50] <wikibugs>	 (03PS2) 10Elukey: eventgate: set a more performant default for queue.buffering.max.ms [deployment-charts] - 10https://gerrit.wikimedia.org/r/937432 (https://phabricator.wikimedia.org/T338357)
[07:15:38] <icinga-wm>	 RECOVERY - Check systemd state on db2114 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:15:44] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] changeprop: allow to tune monitoring container's resources [deployment-charts] - 10https://gerrit.wikimedia.org/r/943038 (https://phabricator.wikimedia.org/T328683) (owner: 10Elukey)
[07:15:53] <wikibugs>	 (03PS6) 10Elukey: services: shift changeprop's cpu resources from main app to the prometheus [deployment-charts] - 10https://gerrit.wikimedia.org/r/943039 (https://phabricator.wikimedia.org/T328683)
[07:17:15] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] services: shift changeprop's cpu resources from main app to the prometheus [deployment-charts] - 10https://gerrit.wikimedia.org/r/943039 (https://phabricator.wikimedia.org/T328683) (owner: 10Elukey)
[07:18:58] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/943579 (owner: 10Elukey)
[07:22:28] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] aptrepo: add new key for ROCm repositories [puppet] - 10https://gerrit.wikimedia.org/r/943579 (owner: 10Elukey)
[07:23:36] <wikibugs>	 (03CR) 10Filippo Giunchedi: profile::pyrra::api: create profile (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/929729 (https://phabricator.wikimedia.org/T302995) (owner: 10Herron)
[07:30:43] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove access for ntsako [puppet] - 10https://gerrit.wikimedia.org/r/944150
[07:34:27] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/941398 (https://phabricator.wikimedia.org/T338460) (owner: 10Jelto)
[07:34:49] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove access for ntsako [puppet] - 10https://gerrit.wikimedia.org/r/944150 (owner: 10Muehlenhoff)
[07:36:08] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] pyrra: deploy to thanos-fe hosts [puppet] - 10https://gerrit.wikimedia.org/r/929734 (https://phabricator.wikimedia.org/T302995) (owner: 10Herron)
[07:36:14] <logmsgbot>	 !log root@cumin2002 START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 1277 hosts
[07:36:16] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] profile::pyrra::filesystem: add profile [puppet] - 10https://gerrit.wikimedia.org/r/929731 (https://phabricator.wikimedia.org/T302995) (owner: 10Herron)
[07:37:04] <logmsgbot>	 !log root@cumin2002 END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 1277 hosts
[07:37:22] <logmsgbot>	 !log root@cumin2002 START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 24 hosts
[07:37:27] <logmsgbot>	 !log root@cumin2002 END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 24 hosts
[07:37:33] <logmsgbot>	 !log root@cumin2002 START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 732 hosts
[07:37:47] <logmsgbot>	 !log root@cumin2002 END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 732 hosts
[07:41:13] <logmsgbot>	 !log elukey@deploy1002 helmfile [staging] START helmfile.d/services/changeprop: sync
[07:41:25] <logmsgbot>	 !log elukey@deploy1002 helmfile [staging] DONE helmfile.d/services/changeprop: sync
[07:42:44] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to restricted for dbrant - https://phabricator.wikimedia.org/T343122 (10MoritzMuehlenhoff) @thcipriani This needs your signoff
[07:44:08] <logmsgbot>	 !log elukey@deploy1002 helmfile [codfw] START helmfile.d/services/changeprop: sync
[07:44:23] <logmsgbot>	 !log elukey@deploy1002 helmfile [codfw] DONE helmfile.d/services/changeprop: sync
[07:49:30] <logmsgbot>	 !log elukey@deploy1002 helmfile [eqiad] START helmfile.d/services/changeprop: sync
[07:49:43] <logmsgbot>	 !log elukey@deploy1002 helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
[07:51:32] <icinga-wm>	 PROBLEM - Check systemd state on netbox1002 is CRITICAL: CRITICAL - degraded: The following units failed: netbox_report_accounting_run.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:56:20] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] varnish: add requestctl to X-analytics for static actions too [puppet] - 10https://gerrit.wikimedia.org/r/941448 (https://phabricator.wikimedia.org/T342577) (owner: 10Giuseppe Lavagetto)
[08:15:02] <icinga-wm>	 RECOVERY - Check systemd state on netbox1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:16:41] <wikibugs>	 (03PS1) 10Elukey: services: add higher limits for cp-jobqueue's exporter [deployment-charts] - 10https://gerrit.wikimedia.org/r/944155 (https://phabricator.wikimedia.org/T328683)
[08:17:41] <logmsgbot>	 !log elukey@deploy1002 helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
[08:17:53] <logmsgbot>	 !log elukey@deploy1002 helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
[08:20:15] <urbanecm>	 jouncebot: nowandnext
[08:20:15] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 39 minute(s)
[08:20:15] <jouncebot>	 In 1 hour(s) and 39 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1000)
[08:20:26] <wikibugs>	 (03PS3) 10Urbanecm: GrowthExperiments: enable AddLink task frontend in 10th round of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/940347 (https://phabricator.wikimedia.org/T308135) (owner: 10Sergio Gimeno)
[08:20:31] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] GrowthExperiments: enable AddLink task frontend in 10th round of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/940347 (https://phabricator.wikimedia.org/T308135) (owner: 10Sergio Gimeno)
[08:21:37] <wikibugs>	 (03Merged) 10jenkins-bot: GrowthExperiments: enable AddLink task frontend in 10th round of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/940347 (https://phabricator.wikimedia.org/T308135) (owner: 10Sergio Gimeno)
[08:21:53] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by urbanecm@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/940347 (https://phabricator.wikimedia.org/T308135) (owner: 10Sergio Gimeno)
[08:22:27] <moritzm>	 !log installing Linux 4.19.289 on Buster hosts
[08:22:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:38] <logmsgbot>	 !log urbanecm@deploy1002 Started scap: Backport for [[gerrit:940347|GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135)]]
[08:22:41] <stashbot>	 T308135: Deploy "add a link" to 10th round of wikis - https://phabricator.wikimedia.org/T308135
[08:24:20] <logmsgbot>	 !log urbanecm@deploy1002 sgimeno and urbanecm: Backport for [[gerrit:940347|GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135)]] synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
[08:27:31] <logmsgbot>	 !log urbanecm@deploy1002 sgimeno and urbanecm: Continuing with sync
[08:27:39] <urbanecm>	 oh, a new log entry!
[08:29:24] <logmsgbot>	 !log elukey@deploy1002 helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
[08:30:05] <logmsgbot>	 !log elukey@deploy1002 helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
[08:32:17] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] services: add higher limits for cp-jobqueue's exporter [deployment-charts] - https://gerrit.wikimedia.org/r/944155 (https://phabricator.wikimedia.org/T328683) (owner: Elukey)
[08:33:30] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:940347|GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135)]] (duration: 10m 52s)
[08:33:33] <stashbot>	 T308135: Deploy "add a link" to 10th round of wikis - https://phabricator.wikimedia.org/T308135
[08:33:56] * urbanecm done
[08:38:35] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
[08:39:14] <icinga-wm>	 PROBLEM - Check systemd state on db1140 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_prometheus-mysqld-exporter@s6.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:40:10] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
[08:45:55] <wikibugs>	 (CR) Elukey: [C: +2] services: add higher limits for cp-jobqueue's exporter [deployment-charts] - https://gerrit.wikimedia.org/r/944155 (https://phabricator.wikimedia.org/T328683) (owner: Elukey)
[08:48:52] <wikibugs>	 (PS1) Giuseppe Lavagetto: thumbor: do not redeploy for a mcrouter config changes [deployment-charts] - https://gerrit.wikimedia.org/r/944158
[08:48:54] <wikibugs>	 (PS1) Giuseppe Lavagetto: function-orchestrator: add mcrouter support [deployment-charts] - https://gerrit.wikimedia.org/r/944159 (https://phabricator.wikimedia.org/T297815)
[08:48:56] <wikibugs>	 (PS1) Giuseppe Lavagetto: wikifunctions: enable mcrouter for orchestrator [deployment-charts] - https://gerrit.wikimedia.org/r/944160 (https://phabricator.wikimedia.org/T297815)
[08:49:34] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm some minor nits" [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[08:49:48] <wikibugs>	 (CR) CI reject: [V: -1] wikifunctions: enable mcrouter for orchestrator [deployment-charts] - https://gerrit.wikimedia.org/r/944160 (https://phabricator.wikimedia.org/T297815) (owner: Giuseppe Lavagetto)
[08:50:05] <wikibugs>	 (CR) CI reject: [V: -1] function-orchestrator: add mcrouter support [deployment-charts] - https://gerrit.wikimedia.org/r/944159 (https://phabricator.wikimedia.org/T297815) (owner: Giuseppe Lavagetto)
[08:50:35] <wikibugs>	 (CR) Hnowlan: [C: +2] thumbor: do not redeploy for a mcrouter config changes [deployment-charts] - https://gerrit.wikimedia.org/r/944158 (owner: Giuseppe Lavagetto)
[08:51:35] <wikibugs>	 (Merged) jenkins-bot: thumbor: do not redeploy for a mcrouter config changes [deployment-charts] - https://gerrit.wikimedia.org/r/944158 (owner: Giuseppe Lavagetto)
[08:51:49] <wikibugs>	 (CR) Jelto: [C: +2] gitlab: Use gitlab-settings v1.2.0 [puppet] - https://gerrit.wikimedia.org/r/943583 (https://phabricator.wikimedia.org/T320390) (owner: Ahmon Dancy)
[08:59:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
[08:59:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[09:00:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[09:00:11] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
[09:02:27] <wikibugs>	 (PS1) MVernon: thanos: fake credential for gitlab account [labs/private] - https://gerrit.wikimedia.org/r/944163 (https://phabricator.wikimedia.org/T336234)
[09:03:08] <wikibugs>	 (PS1) MVernon: thanos: add gitlab user [puppet] - https://gerrit.wikimedia.org/r/944164 (https://phabricator.wikimedia.org/T336234)
[09:03:20] <logmsgbot>	 !log btullis@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1077.eqiad.wmnet with OS bullseye
[09:03:44] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
[09:06:53] <wikibugs>	 (CR) Marostegui: [C: +1] thanos: fake credential for gitlab account [labs/private] - https://gerrit.wikimedia.org/r/944163 (https://phabricator.wikimedia.org/T336234) (owner: MVernon)
[09:07:09] <wikibugs>	 (CR) Marostegui: [C: +1] thanos: add gitlab user [puppet] - https://gerrit.wikimedia.org/r/944164 (https://phabricator.wikimedia.org/T336234) (owner: MVernon)
[09:08:18] <wikibugs>	 (CR) MVernon: [C: +2] thanos: add gitlab user [puppet] - https://gerrit.wikimedia.org/r/944164 (https://phabricator.wikimedia.org/T336234) (owner: MVernon)
[09:08:42] <wikibugs>	 (CR) MVernon: [V: +2] thanos: fake credential for gitlab account [labs/private] - https://gerrit.wikimedia.org/r/944163 (https://phabricator.wikimedia.org/T336234) (owner: MVernon)
[09:08:45] <wikibugs>	 (CR) MVernon: [V: +2 C: +2] thanos: fake credential for gitlab account [labs/private] - https://gerrit.wikimedia.org/r/944163 (https://phabricator.wikimedia.org/T336234) (owner: MVernon)
[09:11:33] <logmsgbot>	 !log elukey@deploy1002 helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
[09:12:01] <logmsgbot>	 !log elukey@deploy1002 helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
[09:15:13] <logmsgbot>	 !log mvernon@cumin1001 START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
[09:19:13] <Emperor>	 Is it intentional that the task-id doesn't end up in that log line? I specified it on the command line
[09:20:25] <wikibugs>	 (PS1) Urbanecm: Revert "Fixes: Echo notification count disappears on load in mobile skin" [extensions/Echo] (wmf/1.41.0-wmf.20) - https://gerrit.wikimedia.org/r/943605 (https://phabricator.wikimedia.org/T335273)
[09:20:37] <urbanecm>	 jouncebot: nowandnext
[09:20:37] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 39 minute(s)
[09:20:37] <jouncebot>	 In 0 hour(s) and 39 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1000)
[09:20:46] <wikibugs>	 (CR) Urbanecm: [C: +2] Revert "Fixes: Echo notification count disappears on load in mobile skin" [extensions/Echo] (wmf/1.41.0-wmf.20) - https://gerrit.wikimedia.org/r/943605 (https://phabricator.wikimedia.org/T335273) (owner: Urbanecm)
[09:20:49] <urbanecm>	 shipping a fix for a train blocker
[09:21:10] <urbanecm>	 (cc Lucas_WMDE; thanks for noticing, somehow I didn't think of testing it anywhere other than V22 and minerva)
[09:21:32] <logmsgbot>	 !log mvernon@cumin1001 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
[09:22:27] <Lucas_WMDE>	 np
[09:22:41] <wikibugs>	 SRE-swift-storage, collaboration-services: Investigate object storage for Gitlab - https://phabricator.wikimedia.org/T336234 (MatthewVernon) @eoghan the account is created in thanos-swift and ready for use (and the credential can be templated via puppet).  If for whatever reason you decide not to go ahea...
[09:22:42] <Lucas_WMDE>	 looking at the change I’m also confused how it causes the issue, it looks safe enough for non-minerva skins
[09:23:20] <urbanecm>	 Lucas_WMDE: as far as I understand it, it's about merging skinSkins in extension.json with https://www.mediawiki.org/wiki/Manual:$wgResourceModuleSkinStyles
[09:23:23] <Lucas_WMDE>	 I guess it’s indirectly caused by T342907
[09:23:24] <stashbot>	 T342907: Mobile Echo code scattered between Minerva, Echo and MobileFrontend extensions - https://phabricator.wikimedia.org/T342907
[09:23:32] <wikibugs>	 (PS2) Giuseppe Lavagetto: function-orchestrator: add mcrouter support [deployment-charts] - https://gerrit.wikimedia.org/r/944159 (https://phabricator.wikimedia.org/T297815)
[09:23:34] <wikibugs>	 (PS2) Giuseppe Lavagetto: wikifunctions: enable mcrouter for orchestrator [deployment-charts] - https://gerrit.wikimedia.org/r/944160 (https://phabricator.wikimedia.org/T297815)
[09:23:57] <urbanecm>	 sometimes a skin's SkinStyles removes the Echo-provided default, even though it shouldn't.
[09:31:51] <wikibugs>	 (PS1) Dreamy Jazz: Write new on group0 for event table migration [mediawiki-config] - https://gerrit.wikimedia.org/r/944168 (https://phabricator.wikimedia.org/T330158)
[09:32:59] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
[09:33:05] <logmsgbot>	 !log btullis@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1076.eqiad.wmnet with OS bullseye
[09:33:18] <wikibugs>	 (PS1) Fabfur: Release 0.1-3 [debs/prometheus-varnishkafka-exporter] - https://gerrit.wikimedia.org/r/944169 (https://phabricator.wikimedia.org/T342154)
[09:33:26] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
[09:34:57] <wikibugs>	 (Merged) jenkins-bot: Revert "Fixes: Echo notification count disappears on load in mobile skin" [extensions/Echo] (wmf/1.41.0-wmf.20) - https://gerrit.wikimedia.org/r/943605 (https://phabricator.wikimedia.org/T335273) (owner: Urbanecm)
[09:35:49] <logmsgbot>	 !log urbanecm@deploy1002 Started scap: Backport for [[gerrit:943605|Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192)]]
[09:35:53] <stashbot>	 T343192: Repeated notification icons on 1.41.0-wmf.20 when using legacy Vector - https://phabricator.wikimedia.org/T343192
[09:35:54] <stashbot>	 T335273: Echo notification count disappears on load in mobile skin - https://phabricator.wikimedia.org/T335273
[09:36:08] <wikibugs>	 (CR) Fabfur: [C: +2] fifo-log-demux: Add socat as companion package [puppet] - https://gerrit.wikimedia.org/r/942446 (https://phabricator.wikimedia.org/T342154) (owner: Fabfur)
[09:36:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[09:37:12] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[09:37:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49868 and previous config saved to /var/cache/conftool/dbconfig/20230801-093717-ladsgroup.json
[09:37:21] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[09:37:26] <logmsgbot>	 !log urbanecm@deploy1002 urbanecm: Backport for [[gerrit:943605|Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
[09:38:02] <urbanecm>	 issue resolved, deploying
[09:38:04] <logmsgbot>	 !log urbanecm@deploy1002 urbanecm: Continuing with sync
[09:39:02] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted for dbrant - https://phabricator.wikimedia.org/T343122 (fgiunchedi)
[09:39:13] <logmsgbot>	 !log elukey@deploy1002 helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
[09:39:15] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted for dbrant - https://phabricator.wikimedia.org/T343122 (fgiunchedi)
[09:39:54] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted for dbrant - https://phabricator.wikimedia.org/T343122 (fgiunchedi) Open→In progress
[09:40:03] <logmsgbot>	 !log elukey@deploy1002 helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
[09:40:14] <wikibugs>	 (CR) Muehlenhoff: Release 0.1-3 (1 comment) [debs/prometheus-varnishkafka-exporter] - https://gerrit.wikimedia.org/r/944169 (https://phabricator.wikimedia.org/T342154) (owner: Fabfur)
[09:42:07] <wikibugs>	 (PS2) Fabfur: Release 0.1-3 [debs/prometheus-varnishkafka-exporter] - https://gerrit.wikimedia.org/r/944169 (https://phabricator.wikimedia.org/T342154)
[09:43:27] <wikibugs>	 (CR) Fabfur: "Thanks, as suggested I removed the Uploaders section using only Maintainers" [debs/prometheus-varnishkafka-exporter] - https://gerrit.wikimedia.org/r/944169 (https://phabricator.wikimedia.org/T342154) (owner: Fabfur)
[09:43:29] <wikibugs>	 (PS4) Jelto: gitlab: remove cas support [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390)
[09:43:31] <wikibugs>	 (PS1) Jelto: gitlab: remove cas omniauth_provider [puppet] - https://gerrit.wikimedia.org/r/944170 (https://phabricator.wikimedia.org/T320390)
[09:43:34] <Lucas_WMDE>	 test.wikidata.org looks good to me on mwdebug now
[09:43:57] <Lucas_WMDE>	 (also on non-mwdebug, but that just means I got lucky with the backend server, I don’t think the scap is finished yet ^^)
[09:44:13] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Wiki Replicas end-to-end tiers for dr0ptp4kt - https://phabricator.wikimedia.org/T343039 (fgiunchedi)
[09:45:05] <urbanecm>	 it's at the restart php stage
[09:45:19] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
[09:45:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
[09:45:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49869 and previous config saved to /var/cache/conftool/dbconfig/20230801-094538-ladsgroup.json
[09:45:41] <wikibugs>	 (CR) Jelto: [V: +1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42735/console" [puppet] - https://gerrit.wikimedia.org/r/944170 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[09:45:42] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[09:45:48] <wikibugs>	 SRE-swift-storage, collaboration-services: Investigate object storage for Gitlab - https://phabricator.wikimedia.org/T336234 (eoghan) We will of course, thanks for getting that done so fast!
[09:47:17] <wikibugs>	 (PS3) Fabfur: Release 0.1-3 [debs/prometheus-varnishkafka-exporter] - https://gerrit.wikimedia.org/r/944169 (https://phabricator.wikimedia.org/T342154)
[09:47:20] <wikibugs>	 (CR) Jelto: gitlab: remove cas support (2 comments) [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[09:47:24] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:943605|Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192)]] (duration: 11m 35s)
[09:47:28] <stashbot>	 T343192: Repeated notification icons on 1.41.0-wmf.20 when using legacy Vector - https://phabricator.wikimedia.org/T343192
[09:47:29] <stashbot>	 T335273: Echo notification count disappears on load in mobile skin - https://phabricator.wikimedia.org/T335273
[09:47:38] <Lucas_WMDE>	 yay
[09:48:18] <icinga-wm>	 PROBLEM - CirrusSearch codfw 95th percentile latency - more_like on graphite1005 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [2000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=codfw&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1
[09:48:28] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/944170 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[09:49:46] <wikibugs>	 (CR) Jbond: [C: +1] gitlab: remove cas omniauth_provider (1 comment) [puppet] - https://gerrit.wikimedia.org/r/944170 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[09:49:53] <wikibugs>	 (CR) Jbond: [C: +1] gitlab: remove cas support [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[09:50:02] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to releasers-wikibase for adee_wmde - https://phabricator.wikimedia.org/T342969 (fgiunchedi)
[09:51:07] <wikibugs>	 (PS4) Fabfur: Release 0.1-3 [debs/prometheus-varnishkafka-exporter] - https://gerrit.wikimedia.org/r/944169 (https://phabricator.wikimedia.org/T342154)
[09:51:16] <icinga-wm>	 RECOVERY - CirrusSearch codfw 95th percentile latency - more_like on graphite1005 is OK: OK: Less than 20.00% above the threshold [1200.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=codfw&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1
[09:53:13] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [debs/prometheus-varnishkafka-exporter] - https://gerrit.wikimedia.org/r/944169 (https://phabricator.wikimedia.org/T342154) (owner: Fabfur)
[09:53:56] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to releasers-wikibase for darthmon_wmde - https://phabricator.wikimedia.org/T342968 (fgiunchedi) @KFrancis hello, we'd need verification that this user has an NDA on file, would you mind checking? Thank you in advance!
[09:53:58] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to releasers-wikibase for adee_wmde - https://phabricator.wikimedia.org/T342969 (fgiunchedi) @KFrancis hello, we'd need verification that this user has an NDA on file, would you mind checking? Thank you in advance!
[09:54:16] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to  releasers-wikibase for roti_WMDE - https://phabricator.wikimedia.org/T342972 (fgiunchedi) @KFrancis hello, we'd need verification that this user has an NDA on file, would you mind checking? Thank you in advance!
[09:54:43] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to  releasers-wikibase for lojo_wmde - https://phabricator.wikimedia.org/T342973 (fgiunchedi) @KFrancis hello, we'd need verification that this user has an NDA on file, would you mind checking? Thank you in advance!
[09:57:20] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Maryana Pinchuk - https://phabricator.wikimedia.org/T342797 (fgiunchedi) @odimitrijevic @Milimetric hello, we're seeking approval for this request -- thank you!
[09:57:55] <wikibugs>	 (CR) Jelto: [V: +1] gitlab: remove cas omniauth_provider (1 comment) [puppet] - https://gerrit.wikimedia.org/r/944170 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[09:58:13] <wikibugs>	 (PS1) Jbond: idp_test: add datahub as a OIDC service [puppet] - https://gerrit.wikimedia.org/r/944172 (https://phabricator.wikimedia.org/T305874)
[09:59:12] <wikibugs>	 (CR) Jelto: gitlab: remove cas support (1 comment) [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[09:59:16] <wikibugs>	 (PS2) Jbond: idp_test: add datahub as a OIDC service [puppet] - https://gerrit.wikimedia.org/r/944172 (https://phabricator.wikimedia.org/T305874)
[10:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1000)
[10:01:17] <wikibugs>	 (CR) Jbond: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/944172 (https://phabricator.wikimedia.org/T305874) (owner: Jbond)
[10:04:19] <wikibugs>	 (PS1) Filippo Giunchedi: admin: add radimer-ctr to ldap_users [puppet] - https://gerrit.wikimedia.org/r/944174 (https://phabricator.wikimedia.org/T342591)
[10:04:56] <wikibugs>	 (CR) Fabfur: [C: +2] Release 0.1-3 [debs/prometheus-varnishkafka-exporter] - https://gerrit.wikimedia.org/r/944169 (https://phabricator.wikimedia.org/T342154) (owner: Fabfur)
[10:06:05] <wikibugs>	 (PS3) Giuseppe Lavagetto: function-orchestrator: add mcrouter support [deployment-charts] - https://gerrit.wikimedia.org/r/944159 (https://phabricator.wikimedia.org/T297815)
[10:06:07] <wikibugs>	 (PS3) Giuseppe Lavagetto: wikifunctions: enable mcrouter for orchestrator [deployment-charts] - https://gerrit.wikimedia.org/r/944160 (https://phabricator.wikimedia.org/T297815)
[10:06:12] <wikibugs>	 SRE-tools, DBA, Infrastructure-Foundations: Create a cookbook for cloning a mariadb database into another - https://phabricator.wikimedia.org/T340048 (Ladsgroup) Open→Resolved
[10:09:37] <wikibugs>	 (CR) Jbond: [C: +2] idp_test: add datahub as a OIDC service [puppet] - https://gerrit.wikimedia.org/r/944172 (https://phabricator.wikimedia.org/T305874) (owner: Jbond)
[10:16:08] <wikibugs>	 (PS5) Clément Goubert: mediawiki: set requests based on php.workers [deployment-charts] - https://gerrit.wikimedia.org/r/943560 (https://phabricator.wikimedia.org/T342748)
[10:17:10] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to  releasers-wikibase for lojo_wmde - https://phabricator.wikimedia.org/T342973 (WMDE-leszek) @fgiunchedi don't know if that suffices as a confirmation, but the person in question has fairly recently started at WMDE and signed the NDA as a part of T335941.
[10:18:25] <wikibugs>	 (CR) Clément Goubert: mediawiki: set requests based on php.workers (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/943560 (https://phabricator.wikimedia.org/T342748) (owner: Clément Goubert)
[10:18:29] <logmsgbot>	 !log btullis@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1077.eqiad.wmnet with OS bullseye
[10:20:32] <wikibugs>	 (CR) Ladsgroup: [C: +1] noc: don't use on-disk files but etcd directly [mediawiki-config] - https://gerrit.wikimedia.org/r/942672 (https://phabricator.wikimedia.org/T341859) (owner: Giuseppe Lavagetto)
[10:21:25] <fabfur>	 !log imported prometheus-varnishkafka-exporter package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/prometheus-varnishkafka-exporter/+/944169) T342154
[10:21:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:28] <stashbot>	 T342154: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154
[10:22:30] <wikibugs>	 SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur)
[10:23:13] <wikibugs>	 (PS9) Slyngshede: Facter: PHP Version [puppet] - https://gerrit.wikimedia.org/r/942628 (https://phabricator.wikimedia.org/T271196)
[10:23:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49870 and previous config saved to /var/cache/conftool/dbconfig/20230801-102340-ladsgroup.json
[10:23:43] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[10:24:04] <wikibugs>	 (CR) Ladsgroup: [C: +1] noc: centralize file list management [mediawiki-config] - https://gerrit.wikimedia.org/r/942673 (https://phabricator.wikimedia.org/T341859) (owner: Giuseppe Lavagetto)
[10:24:19] <wikibugs>	 SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur) After importing the required dependencies in bookworm-wikimedia I start working on the `purged` package
[10:24:47] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to  releasers-wikibase for roti_WMDE - https://phabricator.wikimedia.org/T342972 (WMDE-leszek) @fgiunchedi don't know if that suffices as a confirmation, but the person in question has fairly recently started at WMDE and signed the NDA as a part of T335941.
[10:25:32] <wikibugs>	 (CR) Ladsgroup: "Is there a way to avoid to re-inventing the wheel? :(((" [mediawiki-config] - https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) (owner: Giuseppe Lavagetto)
[10:27:00] <wikibugs>	 (CR) Slyngshede: Facter: PHP Version (2 comments) [puppet] - https://gerrit.wikimedia.org/r/942628 (https://phabricator.wikimedia.org/T271196) (owner: Slyngshede)
[10:27:45] <wikibugs>	 (CR) Hnowlan: [C: +2] rest-gateway: add citoid and wikifeeds egress [deployment-charts] - https://gerrit.wikimedia.org/r/943609 (https://phabricator.wikimedia.org/T339119) (owner: Hnowlan)
[10:28:25] <logmsgbot>	 !log btullis@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1076.eqiad.wmnet with OS bullseye
[10:28:29] <wikibugs>	 (Merged) jenkins-bot: rest-gateway: add citoid and wikifeeds egress [deployment-charts] - https://gerrit.wikimedia.org/r/943609 (https://phabricator.wikimedia.org/T339119) (owner: Hnowlan)
[10:28:40] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/944174 (https://phabricator.wikimedia.org/T342591) (owner: Filippo Giunchedi)
[10:30:52] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] admin: add radimer-ctr to ldap_users [puppet] - https://gerrit.wikimedia.org/r/944174 (https://phabricator.wikimedia.org/T342591) (owner: Filippo Giunchedi)
[10:31:23] <logmsgbot>	 !log hnowlan@deploy1002 Started deploy [restbase/deploy@8eb62f2]: Add gpewiki and btmwiktionary (T335988, T336116)
[10:31:27] <stashbot>	 T335988: Add gpewiki to RESTBase - https://phabricator.wikimedia.org/T335988
[10:31:27] <stashbot>	 T336116: Add btmwiktionary to RESTBase - https://phabricator.wikimedia.org/T336116
[10:32:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49871 and previous config saved to /var/cache/conftool/dbconfig/20230801-103249-ladsgroup.json
[10:32:53] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[10:33:34] <wikibugs>	 (PS1) Filippo Giunchedi: admin: fix radimer user name [puppet] - https://gerrit.wikimedia.org/r/944176 (https://phabricator.wikimedia.org/T342591)
[10:33:53] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] admin: fix radimer user name [puppet] - https://gerrit.wikimedia.org/r/944176 (https://phabricator.wikimedia.org/T342591) (owner: Filippo Giunchedi)
[10:34:42] <wikibugs>	 SRE, LDAP-Access-Requests, Patch-For-Review: Grant wmf and turnilo/superset access for Rae Adimer - https://phabricator.wikimedia.org/T342591 (fgiunchedi) @RAdimer-WMF you are now part of `wmf` ldap group, please confirm access is working as expected!
[10:36:50] <wikibugs>	 (PS3) Slyngshede: Credit logo artist. [software/bitu] - https://gerrit.wikimedia.org/r/934265 (https://phabricator.wikimedia.org/T338828)
[10:36:56] <wikibugs>	 (CR) Slyngshede: Credit logo artist. (2 comments) [software/bitu] - https://gerrit.wikimedia.org/r/934265 (https://phabricator.wikimedia.org/T338828) (owner: Slyngshede)
[10:38:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P49872 and previous config saved to /var/cache/conftool/dbconfig/20230801-103846-ladsgroup.json
[10:38:48] <wikibugs>	 (PS6) Clément Goubert: mediawiki: set requests based on php.workers [deployment-charts] - https://gerrit.wikimedia.org/r/943560 (https://phabricator.wikimedia.org/T342748)
[10:42:15] <wikibugs>	 (PS4) Slyngshede: Allow users to update their email address. [software/bitu] - https://gerrit.wikimedia.org/r/934519 (https://phabricator.wikimedia.org/T340637)
[10:43:09] <wikibugs>	 (PS1) Fabfur: Release 0.20 [software/purged] - https://gerrit.wikimedia.org/r/944177 (https://phabricator.wikimedia.org/T342154)
[10:44:37] <wikibugs>	 (CR) Jbond: [V: +1 C: +2] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42739/console" [puppet] - https://gerrit.wikimedia.org/r/944172 (https://phabricator.wikimedia.org/T305874) (owner: Jbond)
[10:44:40] <wikibugs>	 (CR) Slyngshede: Allow users to update their email address. (1 comment) [software/bitu] - https://gerrit.wikimedia.org/r/934519 (https://phabricator.wikimedia.org/T340637) (owner: Slyngshede)
[10:44:48] <wikibugs>	 (CR) Jelto: [V: +1 C: +2] gitlab: remove cas omniauth_provider [puppet] - https://gerrit.wikimedia.org/r/944170 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[10:45:38] <moritzm>	 !log update d-i images to bookworm 12.1 T343121
[10:45:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:45:41] <stashbot>	 T343121: Integrate Bookworm 12.1 point update - https://phabricator.wikimedia.org/T343121
[10:47:09] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to releasers-wikibase for darthmon_wmde - https://phabricator.wikimedia.org/T342968 (WMDE-leszek) @fgiunchedi not sure if that is good enough but I was able to locate T222788 about Mónica's NDA.
[10:47:37] <wikibugs>	 (PS1) Jbond: pcc: fix tuple formating [puppet] - https://gerrit.wikimedia.org/r/944178
[10:47:45] <wikibugs>	 (CR) Jbond: [V: +1 C: +2] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42740/console" [puppet] - https://gerrit.wikimedia.org/r/944172 (https://phabricator.wikimedia.org/T305874) (owner: Jbond)
[10:47:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P49873 and previous config saved to /var/cache/conftool/dbconfig/20230801-104755-ladsgroup.json
[10:48:00] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] pcc: fix tuple formating [puppet] - 10https://gerrit.wikimedia.org/r/944178 (owner: 10Jbond)
[10:50:50] <wikibugs>	 (03PS2) 10Jbond: pcc: fix tuple formating [puppet] - 10https://gerrit.wikimedia.org/r/944178
[10:51:51] <logmsgbot>	 !log hnowlan@deploy1002 Finished deploy [restbase/deploy@8eb62f2]: Add gpewiki and btmwiktionary (T335988, T336116) (duration: 20m 29s)
[10:51:55] <stashbot>	 T335988: Add gpewiki to RESTBase - https://phabricator.wikimedia.org/T335988
[10:51:56] <stashbot>	 T336116: Add btmwiktionary to RESTBase - https://phabricator.wikimedia.org/T336116
[10:53:51] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to releasers-wikibase for darthmon_wmde - https://phabricator.wikimedia.org/T342968 (10MoritzMuehlenhoff) >>! In T342968#9058234, @fgiunchedi wrote: > @KFrancis hello, we'd need verification that this user has an NDA on file, would you mind checking? Thank you in...
[10:53:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P49874 and previous config saved to /var/cache/conftool/dbconfig/20230801-105352-ladsgroup.json
[10:57:29] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Integrate Bookworm 12.1 point update - https://phabricator.wikimedia.org/T343121 (10MoritzMuehlenhoff)
[10:57:41] <_joe_>	 jouncebot: nowandnext
[10:57:42] <jouncebot>	 For the next 0 hour(s) and 2 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1000)
[10:57:42] <jouncebot>	 In 1 hour(s) and 2 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1200)
[10:58:22] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (10Fabfur)
[10:59:21] <wikibugs>	 (03CR) 10Vgutierrez: "We can drop strings.Cut backport considering that bookworm ships golang 1.19: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/so" [software/purged] - 10https://gerrit.wikimedia.org/r/944177 (https://phabricator.wikimedia.org/T342154) (owner: 10Fabfur)
[11:01:17] <wikibugs>	 (03CR) 10Vgutierrez: Release 0.20 (031 comment) [software/purged] - 10https://gerrit.wikimedia.org/r/944177 (https://phabricator.wikimedia.org/T342154) (owner: 10Fabfur)
[11:03:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P49875 and previous config saved to /var/cache/conftool/dbconfig/20230801-110302-ladsgroup.json
[11:04:28] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] pcc: fix tuple formating [puppet] - 10https://gerrit.wikimedia.org/r/944178 (owner: 10Jbond)
[11:05:34] <icinga-wm>	 RECOVERY - Check systemd state on db1140 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:06:42] <icinga-wm>	 RECOVERY - Check systemd state on db2141 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:08:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49876 and previous config saved to /var/cache/conftool/dbconfig/20230801-110858-ladsgroup.json
[11:09:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[11:09:02] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[11:09:13] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[11:09:16] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V: 03+1] noc: add static file server (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto)
[11:11:22] <wikibugs>	 (03PS2) 10Fabfur: Release 0.20 [software/purged] - 10https://gerrit.wikimedia.org/r/944177 (https://phabricator.wikimedia.org/T342154)
[11:12:26] <wikibugs>	 (03CR) 10Fabfur: Release 0.20 (031 comment) [software/purged] - 10https://gerrit.wikimedia.org/r/944177 (https://phabricator.wikimedia.org/T342154) (owner: 10Fabfur)
[11:12:59] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [software/bitu] - 10https://gerrit.wikimedia.org/r/934265 (https://phabricator.wikimedia.org/T338828) (owner: 10Slyngshede)
[11:13:51] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [software/bitu] - 10https://gerrit.wikimedia.org/r/934519 (https://phabricator.wikimedia.org/T340637) (owner: 10Slyngshede)
[11:14:05] <wikibugs>	 (03PS1) 10Btullis: Revert "install_server: drop Bashisms" [puppet] - 10https://gerrit.wikimedia.org/r/944189
[11:15:31] <wikibugs>	 (03PS2) 10Btullis: Revert "install_server: drop Bashisms" [puppet] - 10https://gerrit.wikimedia.org/r/944189 (https://phabricator.wikimedia.org/T95064)
[11:16:49] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [software/purged] - 10https://gerrit.wikimedia.org/r/944177 (https://phabricator.wikimedia.org/T342154) (owner: 10Fabfur)
[11:17:38] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST services) on k8s-mlserve@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[11:17:45] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Revert "install_server: drop Bashisms" [puppet] - 10https://gerrit.wikimedia.org/r/944189 (https://phabricator.wikimedia.org/T95064) (owner: 10Btullis)
[11:18:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49877 and previous config saved to /var/cache/conftool/dbconfig/20230801-111808-ladsgroup.json
[11:18:10] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
[11:18:12] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[11:18:23] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
[11:18:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49878 and previous config saved to /var/cache/conftool/dbconfig/20230801-111829-ladsgroup.json
[11:19:02] <wikibugs>	 (03PS3) 10Fabfur: Release 0.20 [software/purged] - 10https://gerrit.wikimedia.org/r/944177 (https://phabricator.wikimedia.org/T342154)
[11:21:46] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
[11:22:25] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
[11:22:38] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST services) on k8s-mlserve@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[11:26:31] <wikibugs>	 10SRE, 10Traffic-Icebox: Create a second text-lb IP address for test purposes - https://phabricator.wikimedia.org/T237492 (10cmooney) This "temporary testing" IP appears to be the one our public DNS for text-lb in Amsterdam is resolving to: ` cathal@officepc:~$ dig +noall +answer test-lb.esams.wikimedia.org. @...
[11:33:06] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1076.eqiad.wmnet with reason: host reimage
[11:33:50] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1077.eqiad.wmnet with reason: host reimage
[11:36:08] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1076.eqiad.wmnet with reason: host reimage
[11:38:22] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1077.eqiad.wmnet with reason: host reimage
[11:38:42] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove jgreen from ops group [puppet] - 10https://gerrit.wikimedia.org/r/936215 (https://phabricator.wikimedia.org/T336231)
[11:40:29] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+2 C: 03+2] Credit logo artist. [software/bitu] - 10https://gerrit.wikimedia.org/r/934265 (https://phabricator.wikimedia.org/T338828) (owner: 10Slyngshede)
[11:40:46] <wikibugs>	 (03CR) 10Slyngshede: [C: 03+2] Allow users to update their email address. [software/bitu] - 10https://gerrit.wikimedia.org/r/934519 (https://phabricator.wikimedia.org/T340637) (owner: 10Slyngshede)
[11:40:48] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+2 C: 03+2] Allow users to update their email address. [software/bitu] - 10https://gerrit.wikimedia.org/r/934519 (https://phabricator.wikimedia.org/T340637) (owner: 10Slyngshede)
[11:41:34] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] sre.hosts.reimage: connect to the micro service port [cookbooks] - 10https://gerrit.wikimedia.org/r/939738 (owner: 10Jbond)
[11:42:41] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+1] noc: add static file server (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto)
[11:42:50] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] httpd: Add a profile for loading httpd [puppet] - 10https://gerrit.wikimedia.org/r/937520 (https://phabricator.wikimedia.org/T341717) (owner: 10Jbond)
[11:45:20] <wikibugs>	 (03CR) 10Jbond: "can i get another review on this" [cookbooks] - 10https://gerrit.wikimedia.org/r/924062 (owner: 10Jbond)
[11:45:57] <wikibugs>	 (03CR) 10Jbond: sre.ganeti.resource_report: Add cookbook to fetch Ganeti resources (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/924062 (owner: 10Jbond)
[11:46:08] <wikibugs>	 (03CR) 10Ladsgroup: noc: remove symlinks and also neutralize createTxtFileSymlinks (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942675 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto)
[11:48:09] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10vm-requests: Site: eqiad VM 1 for config-master - https://phabricator.wikimedia.org/T343212 (10jbond)
[11:48:21] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:48:24] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10vm-requests: Site: eqiad VM 1 for config-master - https://phabricator.wikimedia.org/T343212 (10jbond) 05Open→03In progress p:05Triage→03Medium a:03jbond
[11:48:49] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10vm-requests: Site: codfw VM 1 for config-master - https://phabricator.wikimedia.org/T343213 (10jbond)
[11:48:59] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10vm-requests: Site: codfw VM 1 for config-master - https://phabricator.wikimedia.org/T343213 (10jbond) p:05Triage→03Medium a:03jbond
[11:49:05] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10vm-requests: Site: codfw VM 1 for config-master - https://phabricator.wikimedia.org/T343213 (10jbond) 05Open→03In progress
[11:49:17] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
[11:50:30] <wikibugs>	 10sre-alert-triage, 10serviceops: Alert triage: overdue warning alert - https://phabricator.wikimedia.org/T342761 (10Clement_Goubert) 05Open→03Resolved a:03Clement_Goubert Alert has resolved
[11:50:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[11:50:48] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[11:50:49] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[11:51:04] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove jgreen from ops group [puppet] - 10https://gerrit.wikimedia.org/r/936215 (https://phabricator.wikimedia.org/T336231) (owner: 10Muehlenhoff)
[11:51:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[11:51:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49879 and previous config saved to /var/cache/conftool/dbconfig/20230801-115110-ladsgroup.json
[11:51:13] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[11:52:28] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10vm-requests: Site: codfw VM 1 for config-master - https://phabricator.wikimedia.org/T343213 (10jbond)
[11:52:35] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Puppet-Infrastructure, 10Patch-For-Review, 10Puppet (Puppet 7.0): Add config-master to puppetserver role - https://phabricator.wikimedia.org/T341717 (10jbond)
[11:52:43] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10vm-requests: Site: eqiad VM 1 for config-master - https://phabricator.wikimedia.org/T343212 (10jbond)
[11:52:47] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Puppet-Infrastructure, 10Patch-For-Review, 10Puppet (Puppet 7.0): Add config-master to puppetserver role - https://phabricator.wikimedia.org/T341717 (10jbond)
[11:52:51] <wikibugs>	 10SRE, 10Scap, 10serviceops-radar, 10Release-Engineering-Team (Seen): Enable scap to roll back broken changes to MediaWiki - https://phabricator.wikimedia.org/T225207 (10Clement_Goubert)
[11:53:26] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Puppet-Infrastructure, 10Patch-For-Review, 10Puppet (Puppet 7.0): Move config-master to dedicated VMs - https://phabricator.wikimedia.org/T341717 (10jbond)
[11:55:26] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
[11:57:09] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
[11:59:03] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: New IP and Vlan allocations for esams knams move - https://phabricator.wikimedia.org/T343214 (10cmooney) p:05Triage→03Medium
[11:59:19] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: New IP and Vlan allocations for esams knams move - https://phabricator.wikimedia.org/T343214 (10cmooney)
[11:59:24] <wikibugs>	 10SRE, 10ops-knams, 10DC-Ops: Main Tracking Task for ESAMS Migration to KNAMS - https://phabricator.wikimedia.org/T329219 (10cmooney)
[11:59:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49880 and previous config saved to /var/cache/conftool/dbconfig/20230801-115924-ladsgroup.json
[11:59:27] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[12:00:05] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1200)
[12:01:17] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:03:02] <wikibugs>	 (03CR) 10Fabfur: [C: 03+2] Release 0.20 [software/purged] - 10https://gerrit.wikimedia.org/r/944177 (https://phabricator.wikimedia.org/T342154) (owner: 10Fabfur)
[12:03:16] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
[12:03:34] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1077.eqiad.wmnet with OS bullseye
[12:06:22] <wikibugs>	 (03CR) 10Muehlenhoff: sre.ganeti.resource_report: Add cookbook to fetch Ganeti resources (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/924062 (owner: 10Jbond)
[12:06:44] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1076.eqiad.wmnet with OS bullseye
[12:08:06] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "Few comments inline" [cookbooks] - 10https://gerrit.wikimedia.org/r/924062 (owner: 10Jbond)
[12:08:56] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Announce new public IPv6 prefix from Amsterdam for knams migration - https://phabricator.wikimedia.org/T343216 (10cmooney) p:05Triage→03Medium
[12:09:05] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Announce new public IPv6 prefix from Amsterdam for knams migration - https://phabricator.wikimedia.org/T343216 (10cmooney)
[12:09:11] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: New IP and Vlan allocations for esams knams move - https://phabricator.wikimedia.org/T343214 (10cmooney)
[12:11:00] <fabfur>	 !log imported purged package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/purged/+/944177) T342154
[12:11:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:03] <stashbot>	 T342154: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154
[12:11:35] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (10Fabfur)
[12:11:42] <wikibugs>	 (03PS1) 10Cathal Mooney: Announce new AMS IPv6 range from esams and knams ahead of move [homer/public] - 10https://gerrit.wikimedia.org/r/944184 (https://phabricator.wikimedia.org/T343216)
[12:11:58] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/942695 (https://phabricator.wikimedia.org/T336485) (owner: 10Cathal Mooney)
[12:12:15] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (10Fabfur) Start working on `python-logstash` package
[12:14:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P49881 and previous config saved to /var/cache/conftool/dbconfig/20230801-121430-ladsgroup.json
[12:15:56] <wikibugs>	 (03CR) 10Muehlenhoff: sre.ganeti.resource_report: Add cookbook to fetch Ganeti resources (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/924062 (owner: 10Jbond)
[12:16:35] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review: Announce new public IPv6 prefix from Amsterdam for knams migration - https://phabricator.wikimedia.org/T343216 (10cmooney)
[12:17:05] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "reply inline" [cookbooks] - 10https://gerrit.wikimedia.org/r/924062 (owner: 10Jbond)
[12:17:41] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/931581 (owner: 10Muehlenhoff)
[12:18:41] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review: Announce new public IPv6 prefix from Amsterdam for knams migration - https://phabricator.wikimedia.org/T343216 (10cmooney) IRR route6 object created: ` cathal@officepc:~$ whois -r -T route6 -h whois.ripe.net 2a02:ec80:300::/48 % This is the...
[12:20:08] <wikibugs>	 (03CR) 10Cathal Mooney: [C: 03+2] Announce new AMS IPv6 range from esams and knams ahead of move [homer/public] - 10https://gerrit.wikimedia.org/r/944184 (https://phabricator.wikimedia.org/T343216) (owner: 10Cathal Mooney)
[12:20:41] <wikibugs>	 (03Merged) 10jenkins-bot: Announce new AMS IPv6 range from esams and knams ahead of move [homer/public] - 10https://gerrit.wikimedia.org/r/944184 (https://phabricator.wikimedia.org/T343216) (owner: 10Cathal Mooney)
[12:22:31] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] noc: Pass ports without ferm-specific service constants [puppet] - 10https://gerrit.wikimedia.org/r/931581 (owner: 10Muehlenhoff)
[12:23:47] <wikibugs>	 (03PS1) 10Fabfur: Version 0.4.6-4 [debs/python-logstash] - 10https://gerrit.wikimedia.org/r/944209 (https://phabricator.wikimedia.org/T342154)
[12:25:19] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
[12:29:00] <wikibugs>	 (03PS7) 10Jbond: sre.ganeti.resource_report: Add cookbook to fetch Ganeti resources [cookbooks] - 10https://gerrit.wikimedia.org/r/924062
[12:29:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P49882 and previous config saved to /var/cache/conftool/dbconfig/20230801-122936-ladsgroup.json
[12:30:20] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.ganeti.resource_report
[12:30:20] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.ganeti.resource_report (exit_code=0)
[12:31:23] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
[12:31:36] <wikibugs>	 (03PS8) 10Jbond: sre.ganeti.resource_report: Add cookbook to fetch Ganeti resources [cookbooks] - 10https://gerrit.wikimedia.org/r/924062
[12:31:39] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] sre.ganeti.resource_report: Add cookbook to fetch Ganeti resources [cookbooks] - 10https://gerrit.wikimedia.org/r/924062 (owner: 10Jbond)
[12:32:22] <wikibugs>	 (03PS9) 10Jbond: sre.ganeti.resource_report: Add cookbook to fetch Ganeti resources [cookbooks] - 10https://gerrit.wikimedia.org/r/924062
[12:33:51] <wikibugs>	 (03PS10) 10Jbond: sre.ganeti.resource_report: Add cookbook to fetch Ganeti resources [cookbooks] - 10https://gerrit.wikimedia.org/r/924062
[12:34:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49883 and previous config saved to /var/cache/conftool/dbconfig/20230801-123406-ladsgroup.json
[12:34:12] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[12:35:14] <wikibugs>	 (03PS11) 10Jbond: sre.ganeti.resource_report: Add cookbook to fetch Ganeti resources [cookbooks] - 10https://gerrit.wikimedia.org/r/924062
[12:36:15] <wikibugs>	 (03PS12) 10Jbond: sre.ganeti.resource_report: Add cookbook to fetch Ganeti resources [cookbooks] - 10https://gerrit.wikimedia.org/r/924062
[12:37:19] <wikibugs>	 (03CR) 10Jbond: "thanks for the quick feedback, updated" [cookbooks] - 10https://gerrit.wikimedia.org/r/924062 (owner: 10Jbond)
[12:37:49] <wikibugs>	 (03PS3) 10Muehlenhoff: PCC: Pass ports without ferm-specific service constants [puppet] - 10https://gerrit.wikimedia.org/r/931578
[12:40:32] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10vm-requests: Site: codfw VM 1 for config-master - https://phabricator.wikimedia.org/T343213 (10jbond) going to self approve this for group A
[12:41:08] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10vm-requests: Site: eqiad VM 1 for config-master - https://phabricator.wikimedia.org/T343212 (10jbond) going to self approve this for group A
[12:43:13] <wikibugs>	 (03PS1) 10Muehlenhoff: profile::aptrepo::wikimedia: Pass ports without Ferm-specific service identifiers [puppet] - 10https://gerrit.wikimedia.org/r/944211
[12:44:23] <wikibugs>	 10SRE, 10Traffic-Icebox: Create a second text-lb IP address for test purposes - https://phabricator.wikimedia.org/T237492 (10cmooney) I also made the same allocations for the [[ https://netbox.wikimedia.org/ipam/prefixes/743/ip-addresses/ | IPv6 range ]]
[12:44:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49885 and previous config saved to /var/cache/conftool/dbconfig/20230801-124442-ladsgroup.json
[12:44:45] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
[12:44:46] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[12:44:47] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/944211 (owner: 10Muehlenhoff)
[12:44:59] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
[12:45:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
[12:45:03] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
[12:45:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49886 and previous config saved to /var/cache/conftool/dbconfig/20230801-124508-ladsgroup.json
[12:45:46] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] profile::aptrepo::wikimedia: Pass ports without Ferm-specific service identifiers [puppet] - 10https://gerrit.wikimedia.org/r/944211 (owner: 10Muehlenhoff)
[12:48:15] <wikibugs>	 (03PS1) 10Jbond: install_server: Add partman config for config-master [puppet] - 10https://gerrit.wikimedia.org/r/944212 (https://phabricator.wikimedia.org/T341717)
[12:49:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P49887 and previous config saved to /var/cache/conftool/dbconfig/20230801-124912-ladsgroup.json
[12:49:37] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/931578 (owner: 10Muehlenhoff)
[12:51:27] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM suggestion inline.  CI is complaining because the first line of the commit msg is too long" [puppet] - 10https://gerrit.wikimedia.org/r/944211 (owner: 10Muehlenhoff)
[12:51:47] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] install_server: Add partman config for config-master [puppet] - 10https://gerrit.wikimedia.org/r/944212 (https://phabricator.wikimedia.org/T341717) (owner: 10Jbond)
[12:52:37] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to  releasers-wikibase for lojo_wmde - https://phabricator.wikimedia.org/T342973 (10fgiunchedi) >>! In T342973#9058334, @WMDE-leszek wrote: > @fgiunchedi don't know if that suffices as a confirmation, but the person in question has fairly recently started at WMDE...
[12:52:56] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to  releasers-wikibase for roti_WMDE - https://phabricator.wikimedia.org/T342972 (10fgiunchedi) >>! In T342972#9058370, @WMDE-leszek wrote: > @fgiunchedi don't know if that suffices as a confirmation, but the person in question has fairly recently started at WMDE...
[12:55:20] <wikibugs>	 10SRE, 10CAS-SSO, 10Infrastructure-Foundations, 10collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (10Jelto) p:05High→03Medium The cas omniauth_provider was removed in the last merged patch. OIDC is the only login available in G...
[12:57:28] <wikibugs>	 (03CR) 10Volans: "LGTM, one nit on the filename" [cookbooks] - 10https://gerrit.wikimedia.org/r/924062 (owner: 10Jbond)
[12:57:56] <wikibugs>	 10SRE, 10Bitu, 10Infrastructure-Foundations: Implement "update email" functionality - https://phabricator.wikimedia.org/T340637 (10SLyngshede-WMF) 05In progress→03Resolved
[12:58:00] <wikibugs>	 10SRE, 10Bitu, 10Infrastructure-Foundations: Build-out for self service - https://phabricator.wikimedia.org/T320801 (10SLyngshede-WMF)
[12:58:32] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.ganeti.makevm for new host config-master1001.eqiad.wmnet
[12:58:34] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.dns.netbox
[13:00:04] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, TheresNoTime, and taavi: #bothumor My software never has bugs. It just develops random features. Rise for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1300).
[13:00:04] <jouncebot>	 aanzx and koi: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:10] <koi>	 o/
[13:00:18] <Lucas_WMDE>	 I can be there in a few mins
[13:00:23] <Lucas_WMDE>	 (might also do a backport myself)
[13:00:34] <aanzx>	 o/
[13:00:40] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master1001.eqiad.wmnet - jbond@cumin1001"
[13:01:59] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master1001.eqiad.wmnet - jbond@cumin1001"
[13:01:59] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:01:59] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.dns.wipe-cache config-master1001.eqiad.wmnet on all recursors
[13:02:02] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master1001.eqiad.wmnet on all recursors
[13:02:38] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to releasers-wikibase for adee_wmde - https://phabricator.wikimedia.org/T342969 (10fgiunchedi) >>! In T342969#9058232, @fgiunchedi wrote: > @KFrancis hello, we'd need verification that this user has an NDA on file, would you mind checking? Thank you in advance!...
[13:02:43] <Lucas_WMDE>	 alright, I can deploy!
[13:03:13] <Lucas_WMDE>	 hum, new fingerprint for deployment.eqiad.wmnet?
[13:03:15] * Lucas_WMDE looks on wikitech
[13:03:55] <Lucas_WMDE>	 hm, it’s the ed25519 one from https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/deploy1002.eqiad.wmnet
[13:04:10] <Lucas_WMDE>	 maybe I need to check on my fingerprints update timer later
[13:04:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P49888 and previous config saved to /var/cache/conftool/dbconfig/20230801-130419-ladsgroup.json
[13:04:26] <wikibugs>	 (03PS13) 10Jbond: sre.ganeti.resource_report: Add cookbook to fetch Ganeti resources [cookbooks] - 10https://gerrit.wikimedia.org/r/924062
[13:05:07] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master1001.eqiad.wmnet - jbond@cumin1001"
[13:05:37] <wikibugs>	 (03CR) 10Jbond: "updated thanks" [cookbooks] - 10https://gerrit.wikimedia.org/r/924062 (owner: 10Jbond)
[13:05:53] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master1001.eqiad.wmnet - jbond@cumin1001"
[13:06:01] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.ganeti.makevm for new host config-master2001.codfw.wmnet
[13:06:02] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.dns.netbox
[13:06:10] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.reimage for host config-master1001.eqiad.wmnet with OS bookworm
[13:06:17] <wikibugs>	 SRE, Infrastructure-Foundations, vm-requests: Site: eqiad VM 1 for config-master - https://phabricator.wikimedia.org/T343212 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host config-master1001.eqiad.wmnet with OS bookworm
[13:06:23] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/944190 (https://phabricator.wikimedia.org/T341173) (owner: Anzx)
[13:06:44] <wikibugs>	 (CR) Filippo Giunchedi: wmcs: Disable Graphite query access (1 comment) [puppet] - https://gerrit.wikimedia.org/r/942692 (https://phabricator.wikimedia.org/T326266) (owner: Majavah)
[13:07:17] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/924062 (owner: Jbond)
[13:07:48] <wikibugs>	 (PS1) Cathal Mooney: Policy and definition updates for post-migration esams ranges [homer/public] - https://gerrit.wikimedia.org/r/944216 (https://phabricator.wikimedia.org/T343214)
[13:08:11] <wikibugs>	 (CR) Majavah: wmcs: Disable Graphite query access (1 comment) [puppet] - https://gerrit.wikimedia.org/r/942692 (https://phabricator.wikimedia.org/T326266) (owner: Majavah)
[13:08:32] <wikibugs>	 (PS1) Jbond: config-master: Add role::insetup to new config masters [puppet] - https://gerrit.wikimedia.org/r/944217 (https://phabricator.wikimedia.org/T341717)
[13:08:50] <wikibugs>	 (CR) Jbond: [C: +2] config-master: Add role::insetup to new config masters [puppet] - https://gerrit.wikimedia.org/r/944217 (https://phabricator.wikimedia.org/T341717) (owner: Jbond)
[13:09:26] <wikibugs>	 (PS2) Lucas Werkmeister (WMDE): btmwiktionary: Add project logo [mediawiki-config] - https://gerrit.wikimedia.org/r/942781 (https://phabricator.wikimedia.org/T343004) (owner: Stang)
[13:09:43] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/942781 (https://phabricator.wikimedia.org/T343004) (owner: Stang)
[13:09:50] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master2001.codfw.wmnet - jbond@cumin2002"
[13:10:06] <wikibugs>	 (PS3) Anzx: idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias [mediawiki-config] - https://gerrit.wikimedia.org/r/944190 (https://phabricator.wikimedia.org/T341173)
[13:10:24] <wikibugs>	 (PS11) Herron: profile::pyrra::api: create profile [puppet] - https://gerrit.wikimedia.org/r/929729 (https://phabricator.wikimedia.org/T302995)
[13:10:31] <wikibugs>	 (Merged) jenkins-bot: btmwiktionary: Add project logo [mediawiki-config] - https://gerrit.wikimedia.org/r/942781 (https://phabricator.wikimedia.org/T343004) (owner: Stang)
[13:10:35] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master2001.codfw.wmnet - jbond@cumin2002"
[13:10:35] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:10:35] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.dns.wipe-cache config-master2001.codfw.wmnet on all recursors
[13:10:38] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master2001.codfw.wmnet on all recursors
[13:11:00] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [[gerrit:942781|btmwiktionary: Add project logo (T343004)]]
[13:11:03] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master2001.codfw.wmnet - jbond@cumin2002"
[13:11:12] <stashbot>	 T343004: Change logo btm.wikt - https://phabricator.wikimedia.org/T343004
[13:11:45] <wikibugs>	 (CR) Anzx: idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/944190 (https://phabricator.wikimedia.org/T341173) (owner: Anzx)
[13:11:49] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master2001.codfw.wmnet - jbond@cumin2002"
[13:12:09] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host config-master2001.codfw.wmnet with OS bookworm
[13:12:15] <wikibugs>	 SRE, Infrastructure-Foundations, vm-requests: Site: codfw VM 1 for config-master - https://phabricator.wikimedia.org/T343213 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin2002 for host config-master2001.codfw.wmnet with OS bookworm
[13:13:20] <wikibugs>	 (CR) Herron: profile::pyrra::api: create profile (2 comments) [puppet] - https://gerrit.wikimedia.org/r/929729 (https://phabricator.wikimedia.org/T302995) (owner: Herron)
[13:14:58] <wikibugs>	 (PS14) Jbond: sre.ganeti.resource-report: Add cookbook to fetch Ganeti resources [cookbooks] - https://gerrit.wikimedia.org/r/924062
[13:15:02] <wikibugs>	 (CR) Jbond: [C: +2] sre.ganeti.resource-report: Add cookbook to fetch Ganeti resources [cookbooks] - https://gerrit.wikimedia.org/r/924062 (owner: Jbond)
[13:16:41] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on config-master1001.eqiad.wmnet with reason: host reimage
[13:17:48] <wikibugs>	 (Merged) jenkins-bot: sre.ganeti.resource-report: Add cookbook to fetch Ganeti resources [cookbooks] - https://gerrit.wikimedia.org/r/924062 (owner: Jbond)
[13:18:10] <wikibugs>	 ops-codfw: Inbound interface errors - https://phabricator.wikimedia.org/T343180 (Jhancock.wm) Open→Resolved a:Jhancock.wm known issue with no impact
[13:18:22] <wikibugs>	 (CR) Filippo Giunchedi: wmcs: Disable Graphite query access (1 comment) [puppet] - https://gerrit.wikimedia.org/r/942692 (https://phabricator.wikimedia.org/T326266) (owner: Majavah)
[13:18:44] <Lucas_WMDE>	 `helmfile -e eqiad --selector name=pinkunicorn apply` is taking quite a while
[13:18:58] <Lucas_WMDE>	 (name=main, and the two codfws, already finished)
[13:19:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49889 and previous config saved to /var/cache/conftool/dbconfig/20230801-131925-ladsgroup.json
[13:19:27] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
[13:19:28] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[13:19:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
[13:19:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49890 and previous config saved to /var/cache/conftool/dbconfig/20230801-131946-ladsgroup.json
[13:19:47] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] simplewiktionary: Update project logo [mediawiki-config] - https://gerrit.wikimedia.org/r/942806 (https://phabricator.wikimedia.org/T343084) (owner: Stang)
[13:19:50] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on config-master1001.eqiad.wmnet with reason: host reimage
[13:20:05] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] profile::pyrra::api: create profile [puppet] - https://gerrit.wikimedia.org/r/929729 (https://phabricator.wikimedia.org/T302995) (owner: Herron)
[13:20:26] <claime>	 Lucas_WMDE: Scheduling issues :/
[13:20:36] <claime>	 7m29s       Warning   FailedScheduling    pod/mw-debug.eqiad.pinkunicorn-fd8bf588d-8lfwm    0/22 nodes are available: 16 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) had taint {dedicated: kask}, that the pod didn't tolerate.
[13:20:52] <Lucas_WMDE>	 :/
[13:21:17] <claime>	 I think we're gonna need a bigger boat
[13:21:46] <claime>	 Especially with the work I'm doing for mw-on-k8s which will raise substantially their requests
[13:22:01] <claime>	 (kubernetes resource requests, not rps)
[13:22:19] <Lucas_WMDE>	 now scap noticed too
[13:22:24] <Lucas_WMDE>	 K8s deployment to stage testservers failed: K8s deployment had the following errors:
[13:22:29] <Lucas_WMDE>	 Rolling back to prior state...
[13:22:38] <wikibugs>	 (PS1) Anzx: Change idwikisource logos [mediawiki-config] - https://gerrit.wikimedia.org/r/944221
[13:22:50] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde and stang: Backport for [[gerrit:942781|btmwiktionary: Add project logo (T343004)]] synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
[13:22:52] <stashbot>	 T343004: Change logo btm.wikt - https://phabricator.wikimedia.org/T343004
[13:23:22] <Lucas_WMDE>	 koi: can you test?
[13:23:27] <claime>	 Yeah
[13:23:28] <koi>	 looking
[13:24:04] <claime>	 I mean for mw-debug we can probably tell it that we don't need to have a rolling scaling (that way we free the resources then take them back, instead of overprovisioning then scaling down)
[13:24:28] <wikibugs>	 (PS2) Anzx: Change idwikisource logos [mediawiki-config] - https://gerrit.wikimedia.org/r/944221 (https://phabricator.wikimedia.org/T341173)
[13:24:35] <koi>	 Lucas_WMDE, looks good from my side
[13:24:38] <Lucas_WMDE>	 claime: *nod*
[13:24:40] <Lucas_WMDE>	 ok, will sync
[13:24:41] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde and stang: Continuing with sync
[13:24:47] <Lucas_WMDE>	 oh right, scap logs that now, yay :)
[13:24:57] <claime>	 I'll have to redeploy mw-debug eqiad though
[13:25:08] <claime>	 Because right now the new image hasn't been deployed there
[13:25:36] <claime>	 Or if you have another backport next, it'll get deployed at that point, but we'll probably run into the same issue
[13:25:50] <Lucas_WMDE>	 I have two more yeah
[13:26:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49891 and previous config saved to /var/cache/conftool/dbconfig/20230801-132604-ladsgroup.json
[13:26:10] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[13:26:26] <icinga-wm>	 RECOVERY - Backup freshness on backup1001 is OK: Fresh: 130 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[13:26:49] <claime>	 I need to look at the pod disruption things for mw-debug, but it'll take me more than the time it takes you to move on to the next backport
[13:27:09] <claime>	 I'll lower requests manually, see if it's enough for it to deploy correctly
[13:27:26] <Lucas_WMDE>	 ok thanks
[13:27:28] <claime>	 And bump up the priority of more hardware
[13:27:43] <Lucas_WMDE>	 (it’s currently helmfileing the canaries btw)
[13:27:57] <Lucas_WMDE>	 (codfw finished, eqiad still ongoing)
[13:28:13] <claime>	 We're going to have the same issue
[13:28:19] <claime>	 3m16s       Warning   FailedScheduling    pod/mw-api-ext.eqiad.canary-688764c94-qn4kv    0/22 nodes are available: 16 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) had taint {dedicated: kask}, that the pod didn't tolerate.
[13:28:22] <claime>	 Damn
[13:28:29] <Lucas_WMDE>	 hm
[13:28:32] <Lucas_WMDE>	 but this time with user facing traffic?
[13:28:36] <claime>	 Yes.
[13:28:46] <Lucas_WMDE>	 let me know if I should cancel the scap
[13:28:47] <claime>	 This is a problem
[13:28:53] <claime>	 Won't change anything
[13:29:06] <claime>	 It just means mw-on-k8s won't be running up to date code
[13:29:11] <Lucas_WMDE>	 ok
[13:29:17] <Lucas_WMDE>	 but k8s will keep the old pods alive?
[13:29:27] <Lucas_WMDE>	 so no errors for users, just old code?
[13:29:54] <claime>	 yep
[13:30:23] <wikibugs>	 (PS1) Volans: wmf-update-known-hosts-production: fix CNAMEs [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/944225
[13:30:37] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on config-master2001.codfw.wmnet with reason: host reimage
[13:30:56] <Lucas_WMDE>	 ah, that ^ by volans looks like it might fix the SSH issue I had half an hour ago ^^
[13:31:40] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
[13:31:58] <Lucas_WMDE>	 claime: I don’t think any of the config changes are urgent, I’m happy to wait once the current scap finishes
[13:31:59] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm, we could also export the other key?" [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/944225 (owner: Volans)
[13:32:31] <wikibugs>	 (PS11) Herron: profile::pyrra::filesystem: add profile [puppet] - https://gerrit.wikimedia.org/r/929731 (https://phabricator.wikimedia.org/T302995)
[13:32:41] <wikibugs>	 (PS10) Herron: pyrra: deploy to thanos-fe hosts [puppet] - https://gerrit.wikimedia.org/r/929734 (https://phabricator.wikimedia.org/T302995)
[13:32:52] <wikibugs>	 (PS4) Herron: thanos-rule: add pyrra filesystem operator output dir to search path [puppet] - https://gerrit.wikimedia.org/r/930628 (https://phabricator.wikimedia.org/T302995)
[13:32:53] <logmsgbot>	 !log volans@cumin1001 START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
[13:33:10] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host config-master1001.eqiad.wmnet with OS bookworm
[13:33:10] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host config-master1001.eqiad.wmnet
[13:33:12] <claime>	 I'll reduce requests anyways, that's quick and dirty but we're overcommitted already so it just needs to get us over the bump of deployment
[13:33:16] <wikibugs>	 SRE, Infrastructure-Foundations, vm-requests: Site: eqiad VM 1 for config-master - https://phabricator.wikimedia.org/T343212 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host config-master1001.eqiad.wmnet with OS bookworm completed: - config-master1001 (**P...
[13:33:18] <volans>	 Lucas_WMDE: glad to have proactively fixed it :D
[13:33:34] <claime>	 I don't have time to reimage nodes to add hardware anyways
[13:33:45] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on config-master2001.codfw.wmnet with reason: host reimage
[13:33:48] <claime>	 At least not time enough today
[13:35:10] <Lucas_WMDE>	 ok, it timed out now, scap is rolling back to prior state
[13:35:30] <claime>	 yeah, keep going, I'll batch deploy when you're done
[13:35:35] <Lucas_WMDE>	 ok thanks
[13:35:50] <claime>	 It'll take more time because you'll need to wait for helm to timeout every time, sorry
[13:35:55] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
[13:36:40] <Lucas_WMDE>	 ok, it finished checking canary traffic
[13:36:51] <Lucas_WMDE>	 so now it’s helmfileing again
[13:37:22] <wikibugs>	 (PS1) Clément Goubert: mediawiki: Reduce requests for deployments to go through [deployment-charts] - https://gerrit.wikimedia.org/r/944229
[13:37:28] <Lucas_WMDE>	 aanzx, koi: I don’t think the idwikisource and simplewiktionary changes will be happening this window, sorry
[13:37:39] <Lucas_WMDE>	 (btmwiktionary is in progress and should finish eventually)
[13:38:05] <claime>	 Lucas_WMDE: The "funny" thing is it'll probably work for main deployments because they have more replicas, so the pod disruption budget doesn't affect them as much
[13:38:07] <koi>	 that's ok 0 0
[13:38:15] <aanzx>	 Lucas_WMDE: ok i will schedule it tomorrow
[13:38:31] <claime>	 But with the very limited number of replicas in the canaries/mw-debug, it means it can't free as much resources to scale up
[13:39:22] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
[13:39:50] <claime>	 Huh there's no PDB for mediawiki so I can't play on that
[13:39:53] <Lucas_WMDE>	 seven finished already
[13:40:06] <Lucas_WMDE>	 oh, all eight finished
[13:40:14] <Lucas_WMDE>	 predicted correctly ^^
[13:41:01] <claime>	 yeah but for the wrong reasons apparently lol
[13:41:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P49895 and previous config saved to /var/cache/conftool/dbconfig/20230801-134111-ladsgroup.json
[13:41:21] <wikibugs>	 (PS2) Muehlenhoff: aptrepo: Pass ports without Ferm-specific service identifiers [puppet] - https://gerrit.wikimedia.org/r/944211
[13:41:30] <wikibugs>	 (CR) Muehlenhoff: aptrepo: Pass ports without Ferm-specific service identifiers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/944211 (owner: Muehlenhoff)
[13:43:33] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:942781|btmwiktionary: Add project logo (T343004)]] (duration: 32m 32s)
[13:43:36] <stashbot>	 T343004: Change logo btm.wikt - https://phabricator.wikimedia.org/T343004
[13:44:18] <Lucas_WMDE>	 ok, scap exited nonzero but I don’t see any other errors so I assume that’s for the canaries
[13:44:21] <Lucas_WMDE>	 claime: I’m done for now
[13:44:31] <claime>	 Thanks, I'll try and force deploy the things
[13:44:59] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/944211 (owner: Muehlenhoff)
[13:45:02] <Lucas_WMDE>	 koi: btmwiktionary logo should be deployed, except that a small number of requests (1% iiuc) will hit the outdated version on k8s (claime is on it)
[13:45:11] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
[13:45:14] <claime>	 confirming 1%
[13:45:35] <wikibugs>	 (PS1) Volans: cumin: fix installer configuration [puppet] - https://gerrit.wikimedia.org/r/944230 (https://phabricator.wikimedia.org/T342345)
[13:45:53] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host config-master2001.codfw.wmnet with OS bookworm
[13:45:53] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host config-master2001.codfw.wmnet
[13:45:58] <wikibugs>	 SRE, Infrastructure-Foundations, vm-requests: Site: codfw VM 1 for config-master - https://phabricator.wikimedia.org/T343213 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin2002 for host config-master2001.codfw.wmnet with OS bookworm completed: - config-master2001 (**P...
[13:46:07] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[13:46:30] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[13:46:57] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-test-worker1001.eqiad.wmnet
[13:46:59] <koi>	 it has changed on my side, thx
[13:47:00] <claime>	 How can I check one of the changes on mw-debug Lucas_WMDE ?
[13:47:05] <claime>	 or koi
[13:47:22] <logmsgbot>	 !log volans@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
[13:47:28] <Lucas_WMDE>	 force-reload https://btm.wiktionary.org/wiki/Wikikamus:Alaman_Utamo and see if the logo has writing under it or not, I think
[13:47:32] <claime>	 ack
[13:47:36] <wikibugs>	 (CR) Volans: "Reply inline, also I'll leave the release to the reviewers if that's ok as you are the ones usually managing this package." [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/944225 (owner: Volans)
[13:47:49] <Lucas_WMDE>	 yeah, when I select k8s-experimental via mwdebug I get the logo without writing again
[13:47:52] <claime>	 Expected is no writing ?
[13:48:00] <claime>	 cool, that means it worked lol
[13:48:00] <Lucas_WMDE>	 no, expected new state is with writing
[13:48:03] <claime>	 Ah
[13:48:07] <claime>	 oops
[13:48:21] <claime>	 ok checking
[13:48:28] <Lucas_WMDE>	 (Wikikamus, Pustaha Siseon is what it should be)
[13:48:38] <Lucas_WMDE>	 (but it would be quite surprising if you got any other nonempty writing ^^)
[13:49:34] <logmsgbot>	 !log cgoubert@deploy1002 Started scap: (no justification provided)
[13:49:48] <logmsgbot>	 !log volans@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
[13:49:58] <claime>	 (i'm forcing the helmfile update rn)
[13:50:11] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[13:50:42] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[13:50:56] <claime>	 ok, writing's on
[13:50:56] <wikibugs>	 (CR) Muehlenhoff: "I have updated our docs to use the cookbook instead of the shell hack which was previously listed:" [cookbooks] - https://gerrit.wikimedia.org/r/924062 (owner: Jbond)
[13:51:08] <claime>	 backporting my dirty changes to a proper patchset, and redeploying the rest
[13:52:44] <wikibugs>	 (PS1) Stevemunene: idp_test: add datahub_staging as a OIDC service [puppet] - https://gerrit.wikimedia.org/r/944231 (https://phabricator.wikimedia.org/T305874)
[13:53:45] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good. FWIW, I also have a patch pending for later this week, I can piggyback this change into the updated deb when my patch is merge" [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/944225 (owner: Volans)
[13:53:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49896 and previous config saved to /var/cache/conftool/dbconfig/20230801-135350-ladsgroup.json
[13:53:53] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[13:53:54] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1001.eqiad.wmnet
[13:54:33] <wikibugs>	 (CR) Stevemunene: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/944231 (https://phabricator.wikimedia.org/T305874) (owner: Stevemunene)
[13:54:34] <fabfur>	 !log removing dns3001 from cr2-esams and cr3-esams routing for reboot (T335835)
[13:54:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:30] <wikibugs>	 (PS2) Clément Goubert: mediawiki: Reduce requests for canaries [deployment-charts] - https://gerrit.wikimedia.org/r/944229
[13:56:13] <wikibugs>	 (PS3) Clément Goubert: mediawiki: Reduce requests for canaries [deployment-charts] - https://gerrit.wikimedia.org/r/944229
[13:56:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P49897 and previous config saved to /var/cache/conftool/dbconfig/20230801-135617-ladsgroup.json
[13:57:03] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:58:19] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8646 bytes in 0.263 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:02:03] <wikibugs>	 (PS1) Fabfur: Move ntp.esams.wikimedia.org CNAME to reboot dns3001 [dns] - https://gerrit.wikimedia.org/r/944232 (https://phabricator.wikimedia.org/T335835)
[14:03:25] <wikibugs>	 (CR) Ssingh: [C: +1] Move ntp.esams.wikimedia.org CNAME to reboot dns3001 [dns] - https://gerrit.wikimedia.org/r/944232 (https://phabricator.wikimedia.org/T335835) (owner: Fabfur)
[14:04:10] <wikibugs>	 (CR) Fabfur: [C: +2] Move ntp.esams.wikimedia.org CNAME to reboot dns3001 [dns] - https://gerrit.wikimedia.org/r/944232 (https://phabricator.wikimedia.org/T335835) (owner: Fabfur)
[14:05:45] <fabfur>	 !log running authdns-update on dns1004 to move ntp.esams to dns3002 (https://gerrit.wikimedia.org/r/c/operations/dns/+/944232) (T335835)
[14:05:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:15] <logmsgbot>	 !log volans@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
[14:07:29] <wikibugs>	 (CR) Jbond: "See comments inline (also adding simon as im out for a week after today)" [puppet] - https://gerrit.wikimedia.org/r/944231 (https://phabricator.wikimedia.org/T305874) (owner: Stevemunene)
[14:07:33] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:08:37] <wikibugs>	 (CR) Hnowlan: [C: +1] mediawiki: Reduce requests for canaries [deployment-charts] - https://gerrit.wikimedia.org/r/944229 (owner: Clément Goubert)
[14:08:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P49899 and previous config saved to /var/cache/conftool/dbconfig/20230801-140856-ladsgroup.json
[14:09:16] <wikibugs>	 (PS1) Muehlenhoff: ferm::service: Fix handling of multiple ports [puppet] - https://gerrit.wikimedia.org/r/944233 (https://phabricator.wikimedia.org/T336497)
[14:09:38] <wikibugs>	 (CR) CI reject: [V: -1] ferm::service: Fix handling of multiple ports [puppet] - https://gerrit.wikimedia.org/r/944233 (https://phabricator.wikimedia.org/T336497) (owner: Muehlenhoff)
[14:10:03] <wikibugs>	 (CR) Clément Goubert: [C: +2] mediawiki: Reduce requests for canaries [deployment-charts] - https://gerrit.wikimedia.org/r/944229 (owner: Clément Goubert)
[14:10:50] <wikibugs>	 (Merged) jenkins-bot: mediawiki: Reduce requests for canaries [deployment-charts] - https://gerrit.wikimedia.org/r/944229 (owner: Clément Goubert)
[14:11:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49900 and previous config saved to /var/cache/conftool/dbconfig/20230801-141123-ladsgroup.json
[14:11:25] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
[14:11:27] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[14:11:30] <wikibugs>	 (PS2) Muehlenhoff: ferm::service: Fix handling of multiple ports [puppet] - https://gerrit.wikimedia.org/r/944233 (https://phabricator.wikimedia.org/T336497)
[14:11:38] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[14:11:38] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
[14:11:40] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[14:11:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49901 and previous config saved to /var/cache/conftool/dbconfig/20230801-141144-ladsgroup.json
[14:13:00] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
[14:13:22] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
[14:13:32] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
[14:13:52] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
[14:13:58] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
[14:14:06] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
[14:14:27] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
[14:15:06] <wikibugs>	 (PS3) Muehlenhoff: ferm::service: Fix handling of multiple ports [puppet] - https://gerrit.wikimedia.org/r/944233 (https://phabricator.wikimedia.org/T336497)
[14:15:30] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
[14:15:40] <wikibugs>	 (CR) Ssingh: [C: +1] Version 0.4.6-4 [debs/python-logstash] - https://gerrit.wikimedia.org/r/944209 (https://phabricator.wikimedia.org/T342154) (owner: Fabfur)
[14:15:51] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
[14:16:01] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
[14:16:31] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
[14:17:18] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
[14:17:33] <jinxer-wm>	 (JobUnavailable) resolved: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:18:04] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
[14:18:09] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[14:19:08] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
[14:19:52] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] START helmfile.d/services/mw-web: apply
[14:19:54] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
[14:20:42] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-web: apply
[14:20:48] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
[14:21:13] <wikibugs>	 (CR) Volans: [C: +2] wmf-update-known-hosts-production: fix CNAMEs (1 comment) [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/944225 (owner: Volans)
[14:21:31] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
[14:21:51] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
[14:21:52] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
[14:22:05] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
[14:22:06] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
[14:22:19] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
[14:23:04] <wikibugs>	 SRE-tools, Infrastructure-Foundations, cloud-services-team (FY2023/2024-Q1): tcpircbot: enable logging to #wikimedia-cloud-feed - https://phabricator.wikimedia.org/T342666 (fnegri) I grouped `logmsgbot_cloud` to the existing `logmsgbot` account:  ` 16:16 <logmsgbot_cloud> identify logmsgbot {LOGMSGBO...
[14:24:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P49902 and previous config saved to /var/cache/conftool/dbconfig/20230801-142403-ladsgroup.json
[14:24:04] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
[14:24:04] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
[14:24:17] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
[14:24:17] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
[14:24:35] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] START helmfile.d/services/mw-misc: apply
[14:25:10] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
[14:25:16] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-misc: apply
[14:25:40] <logmsgbot>	 !log cgoubert@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
[14:26:03] <claime>	 Lucas_WMDE: Ok, all redeployed now :)
[14:26:12] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
[14:26:14] <claime>	 That should hold until we add hardware
[14:26:40] <Lucas_WMDE>	 claime: ok, thanks!
[14:26:48] <Lucas_WMDE>	 would it make sense to do a test deployment now?
[14:27:14] <Lucas_WMDE>	 (I could pick up one of the config changes if the person is still around, or do a backport of a change that would be nice-but-not-critical to get in before the next train)
[14:27:34] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
[14:28:02] <claime>	 If you want to move some backports/config changes forward, yes
[14:28:34] <claime>	 What I did manually is what scap does on its own, but it wouldn't hurt to check end-to-end
[14:29:03] <Lucas_WMDE>	 ok
[14:29:18] <Lucas_WMDE>	 jouncebot: now
[14:29:19] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 30 minute(s)
[14:29:24] <Lucas_WMDE>	 jouncebot: next
[14:29:25] <jouncebot>	 In 0 hour(s) and 30 minute(s): Wikifunctions.org enablement (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1500)
[14:29:39] <wikibugs>	 SRE-tools, Infrastructure-Foundations, cloud-services-team (FY2023/2024-Q1): tcpircbot: enable logging to #wikimedia-cloud-feed - https://phabricator.wikimedia.org/T342666 (bd808) >>! In T342666#9059267, @fnegri wrote: > I grouped `logmsgbot_cloud` to the existing `logmsgbot` account: Thank you. :) `...
[14:29:43] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host cloudcontrol2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
[14:29:45] <Lucas_WMDE>	 aanzx: still around? we could try that idwikisource change now
[14:29:49] <Lucas_WMDE>	 (and hope it takes less than 30 minutes)
[14:30:08] <Lucas_WMDE>	 (I think that actually rules out doing a backport, that wouldn’t finish in time taking CI into account)
[14:30:20] <Lucas_WMDE>	 (so a config change seems better)
[14:30:43] <Lucas_WMDE>	 or koi, are you still around?
[14:34:23] <Lucas_WMDE>	 ok, no deployment I think
[14:34:40] <claime>	 It's ok
[14:34:50] <Lucas_WMDE>	 !log UTC afternoon backport+config window done (one change, then some k8s issues, which are resolved for now)
[14:34:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:52] <wikibugs>	 (CR) Jbond: [C: -1] (WIP) puppetdb-microservice: update puppetdb micro service so it streams data (1 comment) [puppet] - https://gerrit.wikimedia.org/r/940403 (https://phabricator.wikimedia.org/T342458) (owner: Jbond)
[14:34:58] * Lucas_WMDE done
[14:35:06] <claime>	 Sorry for the disturbance :(
[14:35:15] <Lucas_WMDE>	 no problem
[14:35:26] <Lucas_WMDE>	 I guess James_F’s change will end up testing it then ^^
[14:35:38] <claime>	 *fear*
[14:35:59] <James_F>	 Oh no.
[14:36:27] <James_F>	 What might break?
[14:36:35] <wikibugs>	 SRE-tools, Infrastructure-Foundations, Patch-For-Review, cloud-services-team (FY2023/2024-Q1): Allow wmcs cookbooks running on cloudcuminXXXX to write to the SAL - https://phabricator.wikimedia.org/T325756 (fnegri)
[14:36:51] <wikibugs>	 SRE-tools, Infrastructure-Foundations, cloud-services-team (FY2023/2024-Q1): tcpircbot: enable logging to #wikimedia-cloud-feed - https://phabricator.wikimedia.org/T342666 (fnegri) In progress→Resolved > [14:28] ChanServ sets mode +v logmsgbot_cloud  Thanks, I was about to ask you! :)
[14:36:57] <Lucas_WMDE>	 “Running helmfile” steps might take a long time and eventually time out (though they shouldn’t, claime fixed it for now)
[14:36:58] <claime>	 James_F: mw-on-k8s deployments, but they shouldn't anymore, I've tricked them
[14:37:05] <Lucas_WMDE>	 but even if they do, scap should keep going
[14:37:11] <James_F>	 Ack.
[14:37:16] <Lucas_WMDE>	 worst case is that k8s hosts (1% of requests) won’t see the change
[14:37:30] <James_F>	 Thankfully Wikifunctions.org isn't solely served by k8s.
[14:37:34] <Lucas_WMDE>	 (well. worst expected case, I suppose)
[14:37:51] <James_F>	 But sadly it's not part of the WikimediaDebug extension's allow list yet, so we can't actually test changes before they go live. Joy.
[14:38:08] <Lucas_WMDE>	 ah. that’s annoying :/
[14:38:14] <logmsgbot>	 !log btullis@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dse-k8s-ctrl1001.eqiad.wmnet
[14:38:37] <James_F>	 Yeah, apparently it's waiting to find someone to agree to own it now that Performance is no more.
[14:39:07] <icinga-wm>	 PROBLEM - Check systemd state on dse-k8s-ctrl1001 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:39:09] <wikibugs>	 (CR) Jbond: ferm::service: Fix handling of multiple ports (2 comments) [puppet] - https://gerrit.wikimedia.org/r/944233 (https://phabricator.wikimedia.org/T336497) (owner: Muehlenhoff)
[14:39:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49903 and previous config saved to /var/cache/conftool/dbconfig/20230801-143909-ladsgroup.json
[14:39:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
[14:39:12] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[14:39:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
[14:39:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49904 and previous config saved to /var/cache/conftool/dbconfig/20230801-143930-ladsgroup.json
[14:41:10] <wikibugs>	 (PS5) Jforrester: Move wikifunctions.org from locked-down to limited deployment [mediawiki-config] - https://gerrit.wikimedia.org/r/941515 (https://phabricator.wikimedia.org/T342820)
[14:42:20] <wikibugs>	 SRE, Infrastructure-Foundations, vm-requests: Site: eqiad VM 1 for config-master - https://phabricator.wikimedia.org/T343212 (jbond) In progress→Resolved System built
[14:42:22] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Patch-For-Review, Puppet (Puppet 7.0): Move config-master to dedicated VMs - https://phabricator.wikimedia.org/T341717 (jbond)
[14:42:35] <wikibugs>	 SRE, Infrastructure-Foundations, vm-requests: Site: codfw VM 1 for config-master - https://phabricator.wikimedia.org/T343213 (jbond) In progress→Resolved system built
[14:42:40] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Patch-For-Review, Puppet (Puppet 7.0): Move config-master to dedicated VMs - https://phabricator.wikimedia.org/T341717 (jbond)
[14:43:15] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Patch-For-Review, Puppet (Puppet 7.0): Move config-master to dedicated VMs - https://phabricator.wikimedia.org/T341717 (jbond)
[14:46:11] <wikibugs>	 (PS1) Cathal Mooney: Do not compare speed of disabled interfaces when validating blocks [software/netbox-extras] - https://gerrit.wikimedia.org/r/944240 (https://phabricator.wikimedia.org/T303529)
[14:46:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49905 and previous config saved to /var/cache/conftool/dbconfig/20230801-144641-ladsgroup.json
[14:46:48] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add config-master[12]001 - jbond@cumin1001 - T341717"
[14:46:50] <stashbot>	 T341717: Move config-master to dedicated VMs - https://phabricator.wikimedia.org/T341717
[14:47:43] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add config-master[12]001 - jbond@cumin1001 - T341717"
[14:55:22] <wikibugs>	 (PS1) Jbond: O:config_master: Add new role for config-master [puppet] - https://gerrit.wikimedia.org/r/944242 (https://phabricator.wikimedia.org/T341717)
[14:55:26] <wikibugs>	 (PS1) Jbond: site.pp: move config-master hosts to config-master role [puppet] - https://gerrit.wikimedia.org/r/944243 (https://phabricator.wikimedia.org/T341717)
[14:55:41] <wikibugs>	 (CR) CI reject: [V: -1] O:config_master: Add new role for config-master [puppet] - https://gerrit.wikimedia.org/r/944242 (https://phabricator.wikimedia.org/T341717) (owner: Jbond)
[14:56:49] <wikibugs>	 (PS2) Jbond: O:config_master: Add new role for config-master [puppet] - https://gerrit.wikimedia.org/r/944242 (https://phabricator.wikimedia.org/T341717)
[14:56:51] <wikibugs>	 (PS2) Jbond: site.pp: move config-master hosts to config-master role [puppet] - https://gerrit.wikimedia.org/r/944243 (https://phabricator.wikimedia.org/T341717)
[14:57:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49906 and previous config saved to /var/cache/conftool/dbconfig/20230801-145702-ladsgroup.json
[14:57:09] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[15:00:05] <jouncebot>	 James_F: Time to snap out of that daydream and deploy Wikifunctions.org enablement. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1500).
[15:00:10] <James_F>	 Ack.
[15:00:53] <wikibugs>	 (PS3) Jbond: O:config_master: Add new role for config-master [puppet] - https://gerrit.wikimedia.org/r/944242 (https://phabricator.wikimedia.org/T341717)
[15:00:55] <wikibugs>	 (PS3) Jbond: site.pp: move config-master hosts to config-master role [puppet] - https://gerrit.wikimedia.org/r/944243 (https://phabricator.wikimedia.org/T341717)
[15:01:18] <wikibugs>	 (CR) CI reject: [V: -1] O:config_master: Add new role for config-master [puppet] - https://gerrit.wikimedia.org/r/944242 (https://phabricator.wikimedia.org/T341717) (owner: Jbond)
[15:01:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49907 and previous config saved to /var/cache/conftool/dbconfig/20230801-150146-ladsgroup.json
[15:01:59] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [software/netbox-extras] - https://gerrit.wikimedia.org/r/944240 (https://phabricator.wikimedia.org/T303529) (owner: Cathal Mooney)
[15:02:03] <icinga-wm>	 RECOVERY - cinder-volume process on cloudcontrol1005 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python.* /usr/bin/cinder-volume https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:02:16] <_joe_>	 James_F: you should be ok with k8s, but if you have issues, let me know
[15:02:23] <James_F>	 Will do!
[15:04:04] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/944211 (owner: Muehlenhoff)
[15:04:33] <wikibugs>	 (CR) Volans: [V: +2 C: +2] wmf-update-known-hosts-production: fix CNAMEs [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/944225 (owner: Volans)
[15:04:51] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by apine@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/941515 (https://phabricator.wikimedia.org/T342820) (owner: Jforrester)
[15:05:11] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/944230 (https://phabricator.wikimedia.org/T342345) (owner: Volans)
[15:05:23] <wikibugs>	 (CR) Muehlenhoff: ferm::service: Fix handling of multiple ports (1 comment) [puppet] - https://gerrit.wikimedia.org/r/944233 (https://phabricator.wikimedia.org/T336497) (owner: Muehlenhoff)
[15:05:31] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host cloudnet2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
[15:05:44] <wikibugs>	 (Merged) jenkins-bot: Move wikifunctions.org from locked-down to limited deployment [mediawiki-config] - https://gerrit.wikimedia.org/r/941515 (https://phabricator.wikimedia.org/T342820) (owner: Jforrester)
[15:05:52] <wikibugs>	 (CR) Jbond: [C: +2] sre.ganeti.resource-report: Add cookbook to fetch Ganeti resources (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/924062 (owner: Jbond)
[15:06:15] <logmsgbot>	 !log apine@deploy1002 Started scap: Backport for [[gerrit:941515|Move wikifunctions.org from locked-down to limited deployment (T342820)]]
[15:06:18] <stashbot>	 T342820: Migrate wikifunctions.org from locked-down to limited mode, letting users edit wikitext pages and some - https://phabricator.wikimedia.org/T342820
[15:06:31] <icinga-wm>	 PROBLEM - cinder-volume process on cloudcontrol1005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python.* /usr/bin/cinder-volume https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:06:39] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on dse-k8s-ctrl1001 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[15:06:41] <icinga-wm>	 PROBLEM - cinder-scheduler process on cloudcontrol1005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python.* /usr/bin/cinder-scheduler https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:06:42] <wikibugs>	 (PS4) Jbond: O:config_master: Add new role for config-master [puppet] - https://gerrit.wikimedia.org/r/944242 (https://phabricator.wikimedia.org/T341717)
[15:06:45] <wikibugs>	 (PS4) Jbond: site.pp: move config-master hosts to config-master role [puppet] - https://gerrit.wikimedia.org/r/944243 (https://phabricator.wikimedia.org/T341717)
[15:07:25] <icinga-wm>	 PROBLEM - cinder-api http on cloudcontrol1005 is CRITICAL: connect to address 10.64.151.3 and port 18776: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:07:32] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted for dbrant - https://phabricator.wikimedia.org/T343122 (thcipriani) >>! In T343122#9057877, @MoritzMuehlenhoff wrote: > @thcipriani This needs your signoff  Approved from my side, thanks for the ping!
[15:07:54] <logmsgbot>	 !log apine@deploy1002 jforrester and apine: Backport for [[gerrit:941515|Move wikifunctions.org from locked-down to limited deployment (T342820)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
[15:08:10] <logmsgbot>	 !log apine@deploy1002 jforrester and apine: Continuing with sync
[15:11:35] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
[15:12:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49908 and previous config saved to /var/cache/conftool/dbconfig/20230801-151208-ladsgroup.json
[15:14:00] <logmsgbot>	 !log apine@deploy1002 Finished scap: Backport for [[gerrit:941515|Move wikifunctions.org from locked-down to limited deployment (T342820)]] (duration: 07m 45s)
[15:14:03] <stashbot>	 T342820: Migrate wikifunctions.org from locked-down to limited mode, letting users edit wikitext pages and some - https://phabricator.wikimedia.org/T342820
[15:14:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49909 and previous config saved to /var/cache/conftool/dbconfig/20230801-151427-ladsgroup.json
[15:14:30] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[15:14:34] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (POST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[15:15:07] <moritzm>	 !log bounce ferm on dse-k8s-ctrl1001
[15:15:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:11] <icinga-wm>	 RECOVERY - Check systemd state on dse-k8s-ctrl1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:16:27] <James_F>	 Well done Lucas_WMDE for getting the first edit
[15:16:32] <Lucas_WMDE>	 :D
[15:16:36] <wikibugs>	 (CR) Volans: [C: +2] cumin: fix installer configuration [puppet] - https://gerrit.wikimedia.org/r/944230 (https://phabricator.wikimedia.org/T342345) (owner: Volans)
[15:16:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49910 and previous config saved to /var/cache/conftool/dbconfig/20230801-151650-ladsgroup.json
[15:16:53] <Lucas_WMDE>	 looks like I managed to get lucky with an mw server before the scap had fully finished ^^
[15:17:06] <James_F>	 Yeah.
[15:17:29] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.1 point update - https://phabricator.wikimedia.org/T343121 (MoritzMuehlenhoff)
[15:17:29] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host cloudnet2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
[15:18:10] <hauskater>	 James_F: is that site a SUL one?
[15:18:23] <James_F>	 hauskater: Yes.
[15:18:38] <hauskater>	 hmm, guess firefox messed with my cookies again then
[15:19:01] <James_F>	 hauskater: New domain, if you haven't logged into prod since Wednesday you'll need to re-do it.
[15:19:25] <James_F>	 hauskater: It's not often a new wiki domain is added; last one before last week was a little thing called wikidata.org in 2012, after all. :-)
[15:19:34] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (POST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[15:20:05] <hauskater>	 James_F: nah, I always log-out after I finish for the day
[15:20:07] <Lucas_WMDE>	 James_F: wasn’t wikivoyage a bit later? (Jan 2013 says enwiki)
[15:20:16] <wikibugs>	 SRE-tools, Infrastructure-Foundations, Patch-For-Review, cloud-services-team (FY2023/2024-Q1): Allow wmcs cookbooks running on cloudcuminXXXX to write to the SAL - https://phabricator.wikimedia.org/T325756 (fnegri) > we can try to reuse the same logger, but configure the destination host (to be w...
[15:20:26] <Lucas_WMDE>	 not exactly a new project but a new TLD nonetheless
[15:20:27] <James_F>	 Lucas_WMDE: Officially beforehand, but yes, the SULification was perhaps later? I forget.
[15:20:28] <hauskater>	 I guess the vpn is blocked so account auto-creation is banned
[15:22:00] <hauskater>	 bingo
[15:22:32] <James_F>	 hauskater: Tell meta. :-(
[15:25:39] <wikibugs>	 (PS1) Giuseppe Lavagetto: mediawiki::wanrouter_cache: add wikifunctions [puppet] - https://gerrit.wikimedia.org/r/944247 (https://phabricator.wikimedia.org/T297815)
[15:25:41] <wikibugs>	 (PS1) Giuseppe Lavagetto: mediawiki::wancache: add the wikifunctions pools and routes [puppet] - https://gerrit.wikimedia.org/r/944248 (https://phabricator.wikimedia.org/T297815)
[15:26:00] <wikibugs>	 (CR) Btullis: idp_test: add datahub_staging as a OIDC service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/944231 (https://phabricator.wikimedia.org/T305874) (owner: Stevemunene)
[15:26:11] <_joe_>	 James_F: ^^ server side configuration for the wikifunctions memcached
[15:26:31] <wikibugs>	 (PS1) Volans: Revert "validators: temporary support for esams->knams" [software/netbox-extras] - https://gerrit.wikimedia.org/r/944192
[15:26:36] <James_F>	 _joe_: Nice.
[15:26:39] <hauskater>	 James_F: It's probable I made the block myself when trying to stop spambots heh :)
[15:26:39] <_joe_>	 although I still have big doubts about accessing it from two different applications to read/write the same data and use it as a means of communication of sorts
[15:26:47] <_joe_>	 we can iron that out later
[15:26:53] <James_F>	 Yeah, not an issue for months.
[15:27:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49911 and previous config saved to /var/cache/conftool/dbconfig/20230801-152714-ladsgroup.json
[15:27:23] <_joe_>	 James_F: well it is being written from one side and read from the other right now, or did I misunderstand the diagram?
[15:27:37] <_joe_>	 it would be
[15:27:40] <James_F>	 No, that diagram is future state. Right now it's only being read and written from MW.
[15:27:45] <_joe_>	 ok
[15:27:51] <_joe_>	 so not an issue at all for now
[15:28:02] <_joe_>	 we can reevaluate what's best once we get around there
[15:28:13] <wikibugs>	 (CR) CI reject: [V: -1] mediawiki::wancache: add the wikifunctions pools and routes [puppet] - https://gerrit.wikimedia.org/r/944248 (https://phabricator.wikimedia.org/T297815) (owner: Giuseppe Lavagetto)
[15:28:17] <James_F>	 +1
[15:28:19] <icinga-wm>	 RECOVERY - cinder-api http on cloudcontrol1005 is OK: HTTP OK: HTTP/1.1 300 Multiple Choices - 663 bytes in 0.013 second response time https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:28:20] <_joe_>	 ok so, with the two patches above, that I will have to review tomorrow
[15:28:30] <_joe_>	 and one mediawiki-config patch, we should be good to go
[15:29:05] <icinga-wm>	 RECOVERY - cinder-scheduler process on cloudcontrol1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python.* /usr/bin/cinder-scheduler https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:29:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P49912 and previous config saved to /var/cache/conftool/dbconfig/20230801-152933-ladsgroup.json
[15:31:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49913 and previous config saved to /var/cache/conftool/dbconfig/20230801-153155-ladsgroup.json
[15:32:36] <wikibugs>	 SRE, API Platform, Anti-Harassment, Content-Transform-Team, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (BTullis)
[15:33:55] <icinga-wm>	 PROBLEM - Host mw2431 is DOWN: PING CRITICAL - Packet loss = 100%
[15:34:34] <sukhe>	 ^ expected?
[15:36:07] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [debs/python-logstash] - https://gerrit.wikimedia.org/r/944209 (https://phabricator.wikimedia.org/T342154) (owner: Fabfur)
[15:36:12] <wikibugs>	 (PS2) Giuseppe Lavagetto: mediawiki::wancache: add the wikifunctions pools and routes [puppet] - https://gerrit.wikimedia.org/r/944248 (https://phabricator.wikimedia.org/T297815)
[15:36:19] <_joe_>	 sukhe: abs not
[15:37:02] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
[15:37:09] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on dse-k8s-ctrl1001 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[15:37:34] <sukhe>	 _joe_: ok
[15:37:41] <sukhe>	 (in a meeting but wanted to flag it)
[15:38:00] <moritzm>	 mw2431 seems completely borked and needs a dc ops task, I can't even connect to the serial console
[15:38:14] <_joe_>	 ouch
[15:38:28] <sukhe>	 papaul, JennH ^
[15:38:29] <papaul>	 moritzm: can take a look i am on site
[15:38:43] <wikibugs>	 (CR) CI reject: [V: -1] mediawiki::wancache: add the wikifunctions pools and routes [puppet] - https://gerrit.wikimedia.org/r/944248 (https://phabricator.wikimedia.org/T297815) (owner: Giuseppe Lavagetto)
[15:38:58] <moritzm>	 papaul: ack, that would be good, I can also file a task if that's easier
[15:39:43] <papaul>	 moritzm: no need, let me check if that server is in the rack that i am working in, maybe i touched the network cable while doing other stuff
[15:39:47] <papaul>	 i am working in b6
[15:39:51] <papaul>	 looking now
[15:39:54] <moritzm>	 ack
[15:40:19] <papaul>	 moritzm: yes that server is in b6 
[15:40:23] <papaul>	 checking console now
[15:41:45] <papaul>	 moritzm: console is working
[15:42:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49914 and previous config saved to /var/cache/conftool/dbconfig/20230801-154220-ladsgroup.json
[15:42:22] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.1 point update - https://phabricator.wikimedia.org/T343121 (MoritzMuehlenhoff)
[15:42:23] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
[15:42:24] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[15:42:36] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
[15:42:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49915 and previous config saved to /var/cache/conftool/dbconfig/20230801-154242-ladsgroup.json
[15:44:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P49916 and previous config saved to /var/cache/conftool/dbconfig/20230801-154439-ladsgroup.json
[15:44:59] <wikibugs>	 (PS3) Giuseppe Lavagetto: mediawiki::wancache: add the wikifunctions pools and routes [puppet] - https://gerrit.wikimedia.org/r/944248 (https://phabricator.wikimedia.org/T297815)
[15:45:25] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
[15:46:18] <moritzm>	 papaul: I can connect to the serial console now, but dmesg tells me the link of the server itself is down, maybe some dislocated cable?
[15:46:30] <wikibugs>	 SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (Papaul)
[15:47:23] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42742/console" [puppet] - https://gerrit.wikimedia.org/r/944248 (https://phabricator.wikimedia.org/T297815) (owner: Giuseppe Lavagetto)
[15:47:33] <papaul>	 moritzm: ok checking cable
[15:47:43] <wikibugs>	 (CR) Hnowlan: [C: +2] images: enable "debug" on memcache, log when servers are dead [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/941901 (https://phabricator.wikimedia.org/T341805) (owner: Hnowlan)
[15:48:27] <icinga-wm>	 RECOVERY - Host mw2431 is UP: PING OK - Packet loss = 0%, RTA = 31.71 ms
[15:48:38] <papaul>	 moritzm: ^
[15:49:05] <moritzm>	 papaul: thanks
[15:49:14] <papaul>	 moritzm: you're welcome
[15:49:29] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host cloudvirt2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
[15:52:20] <wikibugs>	 (CR) Jbond: [C: +2] O:config_master: Add new role for config-master [puppet] - https://gerrit.wikimedia.org/r/944242 (https://phabricator.wikimedia.org/T341717) (owner: Jbond)
[15:52:23] <wikibugs>	 (CR) Jbond: [C: +2] site.pp: move config-master hosts to config-master role [puppet] - https://gerrit.wikimedia.org/r/944243 (https://phabricator.wikimedia.org/T341717) (owner: Jbond)
[15:52:28] <wikibugs>	 SRE-tools, Infrastructure-Foundations, Goal, cloud-services-team (FY2023/2024-Q1): cloudcumin: decide sudoers rules for users without global root - https://phabricator.wikimedia.org/T325067 (fnegri)
[15:52:49] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host cloudvirt2005-dev.mgmt.codfw.wmnet with reboot policy FORCED
[15:56:16] <wikibugs>	 (Merged) jenkins-bot: images: enable "debug" on memcache, log when servers are dead [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/941901 (https://phabricator.wikimedia.org/T341805) (owner: Hnowlan)
[15:59:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49917 and previous config saved to /var/cache/conftool/dbconfig/20230801-155945-ladsgroup.json
[15:59:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
[15:59:50] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[16:00:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
[16:00:05] <jouncebot>	 jbond and rzl: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Puppet request window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1600).
[16:00:05] <jouncebot>	 lucaswerkmeister and Lucas_WMDE: A patch you scheduled for Puppet request window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[16:00:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49918 and previous config saved to /var/cache/conftool/dbconfig/20230801-160006-ladsgroup.json
[16:00:23] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Spicerack, 10cloud-services-team (FY2023/2024-Q1): [spicerack] support including {project} in SAL messages - https://phabricator.wikimedia.org/T341793 (10fnegri) 05In progress→03Resolved
[16:00:26] * jbond looking
[16:00:28] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Patch-For-Review, 10cloud-services-team (FY2023/2024-Q1): Allow wmcs cookbooks running on cloudcuminXXXX to write to the SAL - https://phabricator.wikimedia.org/T325756 (10fnegri)
[16:01:59] <jbond>	 lucaswerkmeister: you will need to get a +1 from someone in toolforge before I can merge
[16:04:55] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:05:33] <wikibugs>	 (03PS1) 10Jbond: config_master: add server_aliases: [puppet] - 10https://gerrit.wikimedia.org/r/944253 (https://phabricator.wikimedia.org/T341717)
[16:05:44] <wikibugs>	 (03PS1) 10Muehlenhoff: graphite: Pass port without Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/944254 (https://phabricator.wikimedia.org/T336497)
[16:06:59] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [staging] START helmfile.d/services/thumbor: apply
[16:07:14] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [staging] DONE helmfile.d/services/thumbor: apply
[16:07:47] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Integrate Bookworm 12.1 point update - https://phabricator.wikimedia.org/T343121 (10MoritzMuehlenhoff)
[16:11:07] <wikibugs>	 (03PS1) 10Elukey: admin_ng: increase resources allocated for knative pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/944256
[16:15:15] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:17:05] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] admin_ng: increase resources allocated for knative pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/944256 (owner: 10Elukey)
[16:20:58] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
[16:21:22] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
[16:22:18] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
[16:22:50] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
[16:23:20] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
[16:23:24] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Data-Persistence: Q1:rack/setup/install db21[88-95] - https://phabricator.wikimedia.org/T342174 (10Jhancock.wm)
[16:23:43] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
[16:25:35] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Data-Persistence: Q1:rack/setup/install lists2001.codfw.wmnet - https://phabricator.wikimedia.org/T342375 (10Jhancock.wm)
[16:25:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49919 and previous config saved to /var/cache/conftool/dbconfig/20230801-162541-ladsgroup.json
[16:25:44] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[16:26:56] <wikibugs>	 (03CR) 10Cathal Mooney: [C: 03+1] "LGTM" [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/944192 (owner: 10Volans)
[16:27:06] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10observability: Q1:rack/setup/install titan200[12] - https://phabricator.wikimedia.org/T342300 (10Jhancock.wm)
[16:28:54] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Data-Persistence: Q1:rack/setup/install pc201[56] - https://phabricator.wikimedia.org/T342163 (10Jhancock.wm)
[16:31:26] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42744/console" [puppet] - 10https://gerrit.wikimedia.org/r/944253 (https://phabricator.wikimedia.org/T341717) (owner: 10Jbond)
[16:31:53] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] config_master: add server_aliases: [puppet] - 10https://gerrit.wikimedia.org/r/944253 (https://phabricator.wikimedia.org/T341717) (owner: 10Jbond)
[16:35:01] <wikibugs>	 (03PS1) 10Jbond: config-master: we are in yaml now not puppet [puppet] - 10https://gerrit.wikimedia.org/r/944260 (https://phabricator.wikimedia.org/T341717)
[16:35:13] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
[16:35:16] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2005-dev.mgmt.codfw.wmnet with reboot policy FORCED
[16:38:57] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host cloudvirt2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
[16:40:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49920 and previous config saved to /var/cache/conftool/dbconfig/20230801-164047-ladsgroup.json
[16:41:13] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] config-master: we are in yaml now not puppet [puppet] - 10https://gerrit.wikimedia.org/r/944260 (https://phabricator.wikimedia.org/T341717) (owner: 10Jbond)
[16:41:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:42:05] <wikibugs>	 10SRE, 10Traffic, 10observability: HAProxy metrics go down on config reload - https://phabricator.wikimedia.org/T343000 (10Vgutierrez) 05Stalled→03In progress getting rid of KA didn't help a lot per https://grafana.wikimedia.org/goto/JcVQsuqVk?orgId=1: {F37158713}  any suggestions @fgiunchedi on how to p...
[16:45:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49921 and previous config saved to /var/cache/conftool/dbconfig/20230801-164550-ladsgroup.json
[16:45:57] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[16:46:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:48:19] <wikibugs>	 (03PS1) 10Jbond: O:config_master: use cfssl for tls [puppet] - 10https://gerrit.wikimedia.org/r/944263
[16:49:19] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:55:34] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] O:config_master: use cfssl for tls [puppet] - 10https://gerrit.wikimedia.org/r/944263 (owner: 10Jbond)
[16:55:40] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to releasers-wikibase for darthmon_wmde - https://phabricator.wikimedia.org/T342968 (10KFrancis) Hi all, I am confirming there in an NDA on file.  Please proceed with the access request.  Thanks!
[16:55:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49922 and previous config saved to /var/cache/conftool/dbconfig/20230801-165553-ladsgroup.json
[16:56:57] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to releasers-wikibase for adee_wmde - https://phabricator.wikimedia.org/T342969 (10KFrancis) No worries!  :-)
[16:57:59] <wikibugs>	 (03PS1) 10Jbond: config-master: add profile::discovery variables [puppet] - 10https://gerrit.wikimedia.org/r/944264 (https://phabricator.wikimedia.org/T341717)
[16:58:34] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to  releasers-wikibase for roti_WMDE - https://phabricator.wikimedia.org/T342972 (10KFrancis) Hi all, I am confirming an NDA is on file for Robert Timm.  Thanks!
[16:59:15] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to  releasers-wikibase for lojo_wmde - https://phabricator.wikimedia.org/T342973 (10KFrancis) NDA is confirmed.  Thanks!
[16:59:51] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42745/console" [puppet] - 10https://gerrit.wikimedia.org/r/944264 (https://phabricator.wikimedia.org/T341717) (owner: 10Jbond)
[17:00:04] <jouncebot>	 Deploy window MediaWiki infrastucture (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1700)
[17:00:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P49923 and previous config saved to /var/cache/conftool/dbconfig/20230801-170057-ladsgroup.json
[17:01:11] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:01:15] <wikibugs>	 (03CR) 10Jdlrobson: "Oh I see what happened here. ext.echo.styles.badge" [extensions/Echo] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/943605 (https://phabricator.wikimedia.org/T335273) (owner: 10Urbanecm)
[17:03:18] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (GET pods) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[17:05:19] <logmsgbot>	 !log mbsantos@deploy1002 Started deploy [kartotherian/deploy@ee544cb]: Update kartotherian to e28ea7ef (T334668 T332985 T332664 T329924)
[17:05:28] <stashbot>	 T329924: Update kartotherian to use mapdata 0.9.0 (external data is expanded in-place) - https://phabricator.wikimedia.org/T329924
[17:05:28] <stashbot>	 T332664: Kartotherian "Cannot read property 'coordinates' of null" - https://phabricator.wikimedia.org/T332664
[17:05:28] <stashbot>	 T334668: Host sprites and glyphs in kartotherian for Android WebGL map - https://phabricator.wikimedia.org/T334668
[17:05:29] <stashbot>	 T332985: Reduce kartotherian empty group logspam caused by Wikivoyage - https://phabricator.wikimedia.org/T332985
[17:07:50] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] config-master: add profile::discovery variables [puppet] - 10https://gerrit.wikimedia.org/r/944264 (https://phabricator.wikimedia.org/T341717) (owner: 10Jbond)
[17:08:18] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (GET pods) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[17:09:44] <logmsgbot>	 !log mbsantos@deploy1002 Finished deploy [kartotherian/deploy@ee544cb]: Update kartotherian to e28ea7ef (T334668 T332985 T332664 T329924) (duration: 04m 25s)
[17:11:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49924 and previous config saved to /var/cache/conftool/dbconfig/20230801-171059-ladsgroup.json
[17:11:01] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
[17:11:03] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[17:11:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
[17:11:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49925 and previous config saved to /var/cache/conftool/dbconfig/20230801-171120-ladsgroup.json
[17:11:23] <wikibugs>	 (03PS1) 10Jbond: config_master: ad conftool parameters [puppet] - 10https://gerrit.wikimedia.org/r/944265 (https://phabricator.wikimedia.org/T341717)
[17:14:15] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Wiki Replicas end-to-end tiers for dr0ptp4kt - https://phabricator.wikimedia.org/T343039 (10dr0ptp4kt) Hi @fgiunchedi, using the config at https://wikitech.wikimedia.org/wiki/SRE/Production_access#Setting_up_your_SSH_config (after pinning the confirmed fingerp...
[17:15:17] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Wiki Replicas end-to-end tiers for dr0ptp4kt - https://phabricator.wikimedia.org/T343039 (10dr0ptp4kt)
[17:16:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P49926 and previous config saved to /var/cache/conftool/dbconfig/20230801-171603-ladsgroup.json
[17:17:31] <wikibugs>	 (03CR) 10Jbond: ferm::service: Fix handling of multiple ports (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/944233 (https://phabricator.wikimedia.org/T336497) (owner: 10Muehlenhoff)
[17:18:11] <wikibugs>	 (03PS1) 10Fabfur: Remove dns3001 for reboot [puppet] - 10https://gerrit.wikimedia.org/r/944286 (https://phabricator.wikimedia.org/T335835)
[17:19:35] <wikibugs>	 (03CR) 10Jbond: idp_test: add datahub_staging as a OIDC service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/944231 (https://phabricator.wikimedia.org/T305874) (owner: 10Stevemunene)
[17:20:56] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] config_master: ad conftool parameters [puppet] - 10https://gerrit.wikimedia.org/r/944265 (https://phabricator.wikimedia.org/T341717) (owner: 10Jbond)
[17:21:41] <wikibugs>	 (03PS2) 10Fabfur: Temporary depool dns3001 [puppet] - 10https://gerrit.wikimedia.org/r/944286 (https://phabricator.wikimedia.org/T335835)
[17:24:27] <wikibugs>	 (03CR) 10Ssingh: [C: 03+1] Temporary depool dns3001 [puppet] - 10https://gerrit.wikimedia.org/r/944286 (https://phabricator.wikimedia.org/T335835) (owner: 10Fabfur)
[17:24:50] <wikibugs>	 (03PS1) 10Herron: wip [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/944287
[17:25:40] <wikibugs>	 (03CR) 10Fabfur: [C: 03+2] Temporary depool dns3001 [puppet] - 10https://gerrit.wikimedia.org/r/944286 (https://phabricator.wikimedia.org/T335835) (owner: 10Fabfur)
[17:26:02] <wikibugs>	 (03PS1) 10Jbond: O:config_master: add httpd profile [puppet] - 10https://gerrit.wikimedia.org/r/944288 (https://phabricator.wikimedia.org/T341717)
[17:26:45] <fabfur>	 !log running puppet on 'A:cumin or A:dns-rec or A:netbox' (https://gerrit.wikimedia.org/r/c/operations/puppet/+/944286) (T335835)
[17:26:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:26:49] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] O:config_master: add httpd profile [puppet] - 10https://gerrit.wikimedia.org/r/944288 (https://phabricator.wikimedia.org/T341717) (owner: 10Jbond)
[17:29:54] <wikibugs>	 (03PS1) 10Andrea Denisse: pontoon: Apply the 'alerting_host' role to the pontoon-alerting-host-01 host [puppet] - 10https://gerrit.wikimedia.org/r/944289 (https://phabricator.wikimedia.org/T333615)
[17:31:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49927 and previous config saved to /var/cache/conftool/dbconfig/20230801-173109-ladsgroup.json
[17:31:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
[17:31:13] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[17:31:22] <wikibugs>	 (03CR) 10Muehlenhoff: ferm::service: Fix handling of multiple ports (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/944233 (https://phabricator.wikimedia.org/T336497) (owner: 10Muehlenhoff)
[17:31:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
[17:31:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49928 and previous config saved to /var/cache/conftool/dbconfig/20230801-173130-ladsgroup.json
[17:31:33] <wikibugs>	 (03PS4) 10Muehlenhoff: ferm::service: Fix handling of multiple ports [puppet] - 10https://gerrit.wikimedia.org/r/944233 (https://phabricator.wikimedia.org/T336497)
[17:33:49] <wikibugs>	 10SRE, 10serviceops, 10wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (10bd808) 05Open→03Declined Closing in favor of {T292707} as it makes little sense at this point to consider putting Wikitech into legacy production hosting.
[17:34:47] <jinxer-wm>	 (ConfdResourceFailed) firing: (223) confd resource _srv_config-master_pybal_codfw_apertium.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[17:35:24] <sukhe>	 BGP alerts in esams expected
[17:35:47] <icinga-wm>	 PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:36:29] <fabfur>	 !log stopped bird and disable puppet on dns3001 (T335835)
[17:36:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:57] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] airflow-wmde: Add a postgresql database and user for airflow wmde [puppet] - 10https://gerrit.wikimedia.org/r/940961 (https://phabricator.wikimedia.org/T340648) (owner: 10Stevemunene)
[17:37:23] <logmsgbot>	 !log fabfur@cumin1001 START - Cookbook sre.hosts.reboot-single for host dns3001.wikimedia.org
[17:39:26] <wikibugs>	 10ops-codfw, 10DBA: codfw: es2025 lost System Board Fan6 - https://phabricator.wikimedia.org/T343254 (10Papaul)
[17:39:37] <icinga-wm>	 PROBLEM - BGP status on cr3-esams is CRITICAL: BGP CRITICAL - AS64605/IPv4: Connect - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:39:44] <wikibugs>	 10ops-codfw, 10DBA: codfw: es2025 lost System Board Fan6 - https://phabricator.wikimedia.org/T343254 (10Papaul) p:05Triage→03Medium
[17:39:47] <jinxer-wm>	 (ConfdResourceFailed) firing: (446) confd resource _srv_config-master_pybal_codfw_apertium.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[17:41:32] <logmsgbot>	 !log fabfur@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3001.wikimedia.org
[17:41:33] <icinga-wm>	 PROBLEM - Host dns3001 is DOWN: PING CRITICAL - Packet loss = 100%
[17:41:35] <icinga-wm>	 RECOVERY - Host dns3001 is UP: PING OK - Packet loss = 0%, RTA = 94.09 ms
[17:41:37] <sukhe>	 sigh
[17:41:50] <sukhe>	 definitely a weird race condition with the reboot-single cookbook here
[17:42:00] <sukhe>	 or the command it calls
[17:42:41] <fabfur>	 !log started bird and enabled puppet on dns3001 (T335835)
[17:42:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:43:12] <sukhe>	 FYI, dns3001 is depooled from authdns_servers so nothing to worry about
[17:43:20] <sukhe>	 fwiw? fyi? 
[17:43:59] <icinga-wm>	 PROBLEM - Check if anycast-healthchecker and all configured threads are running on dns3001 is CRITICAL: CRITICAL: anycast-healthchecker could be down as pid file /var/run/anycast-healthchecker/anycast-healthchecker.pid doesnt exist https://wikitech.wikimedia.org/wiki/Anycast%23Anycast_healthchecker_not_running
[17:44:20] <sukhe>	 yeah all good
[17:44:25] <wikibugs>	 (03CR) 10Herron: pontoon: Apply the 'alerting_host' role to the pontoon-alerting-host-01 host (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/944289 (https://phabricator.wikimedia.org/T333615) (owner: 10Andrea Denisse)
[17:45:18] <wikibugs>	 (03PS1) 10Fabfur: Revert "Temporary depool dns3001" [puppet] - 10https://gerrit.wikimedia.org/r/944295
[17:45:27] <icinga-wm>	 RECOVERY - Check if anycast-healthchecker and all configured threads are running on dns3001 is OK: OK: UP (pid=3795) and all threads (2) are running https://wikitech.wikimedia.org/wiki/Anycast%23Anycast_healthchecker_not_running
[17:45:31] <icinga-wm>	 RECOVERY - BGP status on cr3-esams is OK: BGP OK - up: 20, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:46:43] <wikibugs>	 (03CR) 10Btullis: [C: 04-1] airflow-wmde: Create scap deployment source for wmde (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/940939 (https://phabricator.wikimedia.org/T340648) (owner: 10Stevemunene)
[17:46:50] <wikibugs>	 (03CR) 10Ssingh: [C: 03+1] Revert "Temporary depool dns3001" [puppet] - 10https://gerrit.wikimedia.org/r/944295 (owner: 10Fabfur)
[17:47:30] <wikibugs>	 (03CR) 10Fabfur: [C: 03+2] Revert "Temporary depool dns3001" [puppet] - 10https://gerrit.wikimedia.org/r/944295 (owner: 10Fabfur)
[17:48:07] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: codfw: es2025 lost System Board Fan6 - https://phabricator.wikimedia.org/T343254 (10Marostegui) I will do that tomorrow and leave it ready for you to check
[17:48:52] <fabfur>	 !log running puppet on 'A:cumin or A:dns-rec or A:netbox' (https://gerrit.wikimedia.org/r/c/operations/puppet/+/944286) (T335835)
[17:48:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:13] <wikibugs>	 (03PS1) 10Fabfur: Revert "Move ntp.esams.wikimedia.org CNAME to reboot dns3001" [dns] - 10https://gerrit.wikimedia.org/r/944297
[17:53:35] <wikibugs>	 (03CR) 10Ssingh: [C: 03+1] Revert "Move ntp.esams.wikimedia.org CNAME to reboot dns3001" [dns] - 10https://gerrit.wikimedia.org/r/944297 (owner: 10Fabfur)
[17:54:16] <wikibugs>	 (03CR) 10Fabfur: [C: 03+2] Revert "Move ntp.esams.wikimedia.org CNAME to reboot dns3001" [dns] - 10https://gerrit.wikimedia.org/r/944297 (owner: 10Fabfur)
[17:54:20] <wikibugs>	 (03CR) 10Andrea Denisse: pontoon: Apply the 'alerting_host' role to the pontoon-alerting-host-01 host (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/944289 (https://phabricator.wikimedia.org/T333615) (owner: 10Andrea Denisse)
[17:55:39] <fabfur>	 !log running authdns-update on dns1004 to revert ntp.esams to dns3001  (T335835)
[17:55:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:55:55] <wikibugs>	 (03CR) 10Btullis: "I think that there's one other issue, which is that the `analytics-wmde` user to whom the keytabs belong is only created by the statistics" [puppet] - 10https://gerrit.wikimedia.org/r/940938 (https://phabricator.wikimedia.org/T340648) (owner: 10Stevemunene)
[17:56:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49929 and previous config saved to /var/cache/conftool/dbconfig/20230801-175641-ladsgroup.json
[17:56:44] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[17:59:32] <icinga-wm>	 PROBLEM - Check systemd state on config-master2001 is CRITICAL: CRITICAL - degraded: The following units failed: dump-conftool-pools.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:59:40] <icinga-wm>	 PROBLEM - Check systemd state on config-master1001 is CRITICAL: CRITICAL - degraded: The following units failed: dump-conftool-pools.service,envoyproxy.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:59:49] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
[18:00:05] <jouncebot>	 dancy and jnuche: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for MediaWiki train - Utc-7+Utc-0 Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1800).
[18:00:36] <wikibugs>	 (03PS1) 10Jbond: config_master: add docs [puppet] - 10https://gerrit.wikimedia.org/r/944299 (https://phabricator.wikimedia.org/T341717)
[18:00:38] <wikibugs>	 (03PS1) 10Jbond: configmaster: add support to proxy the puppet sha1 files [puppet] - 10https://gerrit.wikimedia.org/r/944300 (https://phabricator.wikimedia.org/T336497)
[18:04:46] <icinga-wm>	 PROBLEM - config-master.wikimedia.org requires authentication on config-master1001 is CRITICAL: connect to address 10.64.0.110 and port 443: Connection refused https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[18:05:10] <fabfur>	 !log adding dns3001 on cr2-esams and cr3-esams routing for ns2 (T335835)
[18:05:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:19] <dancy>	 o/
[18:06:40] <icinga-wm>	 PROBLEM - config-master.wikimedia.org tls expiry on config-master1001 is CRITICAL: connect to address 10.64.0.110 and port 443: Connection refused https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[18:07:03] <wikibugs>	 (03PS1) 10TrainBranchBot: group0 wikis to 1.41.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944302 (https://phabricator.wikimedia.org/T340248)
[18:07:05] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] group0 wikis to 1.41.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944302 (https://phabricator.wikimedia.org/T340248) (owner: 10TrainBranchBot)
[18:07:47] <wikibugs>	 (03Merged) 10jenkins-bot: group0 wikis to 1.41.0-wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944302 (https://phabricator.wikimedia.org/T340248) (owner: 10TrainBranchBot)
[18:10:24] <icinga-wm>	 PROBLEM - config-master.wikimedia.org requires authentication on config-master2001 is CRITICAL: connect to address 10.192.0.15 and port 443: Connection refused https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[18:10:47] <rzl>	 jbond: are you already looking at the config-master alerts?
[18:10:51] <sukhe>	 ^ is this known/something we should do something about?
[18:11:25] <rzl>	 asking j.bond just because of the recent puppet patches that look relevant, I haven't started properly digging
[18:11:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49930 and previous config saved to /var/cache/conftool/dbconfig/20230801-181147-ladsgroup.json
[18:12:18] <icinga-wm>	 PROBLEM - config-master.wikimedia.org tls expiry on config-master2001 is CRITICAL: connect to address 10.192.0.15 and port 443: Connection refused https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[18:12:43] <sukhe>	 I am guessing the cfssl switch might be it
[18:12:58] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudvirt200[4-6]-dev - https://phabricator.wikimedia.org/T342459 (10Papaul)
[18:15:00] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2006-dev']
[18:15:27] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2007-dev']
[18:15:28] <logmsgbot>	 !log dancy@deploy1002 rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.20  refs T340248
[18:15:31] <stashbot>	 T340248: 1.41.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T340248
[18:16:36] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2006-dev']
[18:16:54] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2007-dev']
[18:17:31] <wikibugs>	 (03PS2) 10Jbond: configmaster: add support to proxy the puppet sha1 files [puppet] - 10https://gerrit.wikimedia.org/r/944300 (https://phabricator.wikimedia.org/T336497)
[18:17:58] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2008-dev']
[18:18:41] <jbond>	 rzl: sorry, missed the ping. Yes, these are not production yet; I'll add a silence
[18:18:56] <jbond>	 sorry for the noise
[18:19:00] <sukhe>	 jbond: <3
[18:19:20] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42747/console" [puppet] - 10https://gerrit.wikimedia.org/r/944300 (https://phabricator.wikimedia.org/T336497) (owner: 10Jbond)
[18:20:02] <rzl>	 rad, thank you!
[18:21:00] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2007-dev']
[18:21:42] * jbond done
[18:26:25] <wikibugs>	 (03PS3) 10Jbond: configmaster: add support to proxy the puppet sha1 files [puppet] - 10https://gerrit.wikimedia.org/r/944300 (https://phabricator.wikimedia.org/T336497)
[18:26:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49931 and previous config saved to /var/cache/conftool/dbconfig/20230801-182653-ladsgroup.json
[18:28:11] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] config_master: add docs [puppet] - 10https://gerrit.wikimedia.org/r/944299 (https://phabricator.wikimedia.org/T341717) (owner: 10Jbond)
[18:28:14] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42748/console" [puppet] - 10https://gerrit.wikimedia.org/r/944300 (https://phabricator.wikimedia.org/T336497) (owner: 10Jbond)
[18:29:04] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2008-dev']
[18:29:11] <wikibugs>	 (CR) Jbond: [V: +1 C: +2] configmaster: add support to proxy the puppet sha1 files [puppet] - https://gerrit.wikimedia.org/r/944300 (https://phabricator.wikimedia.org/T336497) (owner: Jbond)
[18:30:15] <wikibugs>	 (PS1) Jforrester: wikifunctions: Bump to image without stupendous output logging [deployment-charts] - https://gerrit.wikimedia.org/r/944304 (https://phabricator.wikimedia.org/T343176)
[18:30:20] <wikibugs>	 (CR) Herron: [C: +1] pontoon: Apply the 'alerting_host' role to the pontoon-alerting-host-01 host (1 comment) [puppet] - https://gerrit.wikimedia.org/r/944289 (https://phabricator.wikimedia.org/T333615) (owner: Andrea Denisse)
[18:30:49] <James_F>	 jouncebot: nowandnext
[18:30:49] <jouncebot>	 For the next 1 hour(s) and 29 minute(s): MediaWiki train - Utc-7+Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1800)
[18:30:49] <jouncebot>	 In 1 hour(s) and 29 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T2000)
[18:31:13] <James_F>	 dancy: Can I sling out a service update for Wikifunctions?
[18:31:30] <dancy>	 Yep!
[18:31:50] <James_F>	 Okie.
[18:31:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49932 and previous config saved to /var/cache/conftool/dbconfig/20230801-183151-ladsgroup.json
[18:31:55] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[18:32:02] <wikibugs>	 (CR) Jforrester: [C: +2] wikifunctions: Bump to image without stupendous output logging [deployment-charts] - https://gerrit.wikimedia.org/r/944304 (https://phabricator.wikimedia.org/T343176) (owner: Jforrester)
[18:32:50] <dancy>	 stupendous.. haha
[18:32:51] <wikibugs>	 (Merged) jenkins-bot: wikifunctions: Bump to image without stupendous output logging [deployment-charts] - https://gerrit.wikimedia.org/r/944304 (https://phabricator.wikimedia.org/T343176) (owner: Jforrester)
[18:33:04] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[18:33:07] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[18:33:17] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[18:33:20] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[18:35:51] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[18:36:29] <logmsgbot>	 !log jforrester@deploy1002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[18:36:37] <logmsgbot>	 !log jforrester@deploy1002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[18:37:49] <logmsgbot>	 !log jforrester@deploy1002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[18:37:51] <logmsgbot>	 !log jforrester@deploy1002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[18:37:52] <wikibugs>	 SRE, API Platform, Anti-Harassment, Content-Transform-Team, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (apaskulin)
[18:38:21] <wikibugs>	 SRE, API Platform, Anti-Harassment, Content-Transform-Team, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (apaskulin)
[18:39:43] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2007-dev']
[18:39:53] <wikibugs>	 SRE, API Platform, Anti-Harassment, Content-Transform-Team, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (apaskulin)
[18:39:58] <logmsgbot>	 !log jforrester@deploy1002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[18:40:20] <wikibugs>	 SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (Papaul)
[18:40:46] <wikibugs>	 SRE, ops-codfw, DBA: codfw: es2025 lost System Board Fan6 - https://phabricator.wikimedia.org/T343254 (Ladsgroup) I'm around if you want me to do it.
[18:42:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49933 and previous config saved to /var/cache/conftool/dbconfig/20230801-184159-ladsgroup.json
[18:42:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
[18:42:03] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[18:42:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
[18:42:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49934 and previous config saved to /var/cache/conftool/dbconfig/20230801-184220-ladsgroup.json
[18:46:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P49935 and previous config saved to /var/cache/conftool/dbconfig/20230801-184657-ladsgroup.json
[18:50:18] <wikibugs>	 (PS1) Cwhite: Revert "logstash remove wikifunctions response field" [puppet] - https://gerrit.wikimedia.org/r/944194 (https://phabricator.wikimedia.org/T343176)
[18:51:53] <wikibugs>	 (PS1) Papaul: Add new cloud nodes to site.pp and netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/944306 (https://phabricator.wikimedia.org/T342456)
[18:52:16] <wikibugs>	 (CR) CI reject: [V: -1] Add new cloud nodes to site.pp and netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/944306 (https://phabricator.wikimedia.org/T342456) (owner: Papaul)
[18:53:34] <wikibugs>	 ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (phaultfinder)
[18:55:09] <wikibugs>	 (PS2) Papaul: Add new cloud nodes to site.pp and netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/944306 (https://phabricator.wikimedia.org/T342456)
[18:55:17] <wikibugs>	 (PS3) Papaul: Add new cloud nodes to site.pp and netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/944306 (https://phabricator.wikimedia.org/T342456)
[18:56:22] <wikibugs>	 (CR) Papaul: [C: +2] Add new cloud nodes to site.pp and netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/944306 (https://phabricator.wikimedia.org/T342456) (owner: Papaul)
[18:56:50] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
[18:57:02] <wikibugs>	 SRE, ops-codfw, DC-Ops, Patch-For-Review, and 2 others: Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudcontrol2006-dev.codfw.wm...
[19:01:45] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2007-dev']
[19:02:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P49936 and previous config saved to /var/cache/conftool/dbconfig/20230801-190203-ladsgroup.json
[19:05:03] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2006-dev']
[19:07:39] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2007-dev']
[19:10:38] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2006-dev']
[19:11:23] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2007-dev']
[19:11:46] <wikibugs>	 (CR) BCornwall: init: Optimize puppet disabling on reboot (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/943620 (https://phabricator.wikimedia.org/T342182) (owner: BCornwall)
[19:17:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49937 and previous config saved to /var/cache/conftool/dbconfig/20230801-191709-ladsgroup.json
[19:17:12] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
[19:17:13] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[19:17:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
[19:18:57] <wikibugs>	 (PS1) Jforrester: WikiLambdaApiBase: Don't explode in dieWithZError() [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/944195 (https://phabricator.wikimedia.org/T343253)
[19:19:13] <wikibugs>	 (PS1) Jforrester: WikiLambdaApiBase: Don't explode in dieWithZError() [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - https://gerrit.wikimedia.org/r/944196 (https://phabricator.wikimedia.org/T343253)
[19:19:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49938 and previous config saved to /var/cache/conftool/dbconfig/20230801-191925-ladsgroup.json
[19:20:28] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2007-dev']
[19:28:17] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcontrol2007-dev.codfw.wmnet with OS bullseye
[19:28:32] <wikibugs>	 SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudcontr...
[19:28:44] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2008-dev']
[19:30:29] <wikibugs>	 (PS1) Jforrester: ApiFunctionCall: Actually check 'wikilambda-execute' before proceeding [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/944197
[19:30:39] <wikibugs>	 (PS1) Jforrester: ApiFunctionCall: Actually check 'wikilambda-execute' before proceeding [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - https://gerrit.wikimedia.org/r/944198
[19:31:16] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
[19:33:28] <wikibugs>	 (CR) DVrandecic: [C: +1] WikiLambdaApiBase: Don't explode in dieWithZError() [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/944195 (https://phabricator.wikimedia.org/T343253) (owner: Jforrester)
[19:33:46] <wikibugs>	 (CR) DVrandecic: [C: +1] WikiLambdaApiBase: Don't explode in dieWithZError() [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - https://gerrit.wikimedia.org/r/944196 (https://phabricator.wikimedia.org/T343253) (owner: Jforrester)
[19:34:25] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
[19:34:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49939 and previous config saved to /var/cache/conftool/dbconfig/20230801-193432-ladsgroup.json
[19:35:16] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2008-dev']
[19:46:15] <wikibugs>	 (CR) AW GitLab Bot: "PAN-PAN: end-to-end deploy stage failed" [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - https://gerrit.wikimedia.org/r/944198 (owner: Jforrester)
[19:48:52] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2007-dev.codfw.wmnet with reason: host reimage
[19:49:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49940 and previous config saved to /var/cache/conftool/dbconfig/20230801-194938-ladsgroup.json
[19:50:57] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[19:51:32] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2008-dev']
[19:51:59] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2007-dev.codfw.wmnet with reason: host reimage
[19:52:07] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[19:52:08] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
[19:52:15] <wikibugs>	 SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudcontrol20...
[19:53:33] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcontrol2008-dev.codfw.wmnet with OS bullseye
[19:53:41] <wikibugs>	 SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudcontr...
[19:56:10] <Dreamy_Jazz>	 !nowandnext
[19:56:23] <Dreamy_Jazz>	 nowandnext
[19:56:32] <Dreamy_Jazz>	 jouncebot: nowandnext
[19:56:32] <jouncebot>	 For the next 0 hour(s) and 3 minute(s): MediaWiki train - Utc-7+Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T1800)
[19:56:32] <jouncebot>	 In 0 hour(s) and 3 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T2000)
[19:58:23] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2008-dev']
[19:58:41] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[19:58:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[20:00:07] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, kindrobot, and taavi: OwO what's this, a deployment window?? UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T2000). nyaa~
[20:00:07] <jouncebot>	 Dreamy_Jazz and Jdlrobson: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:12] <Dreamy_Jazz>	 \o
[20:00:19] <wikibugs>	 (PS1) Jforrester: onHtmlPageLinkRendererEnd: Fiddle more carefully with links so we don't over-write non-edit ones [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/944199 (https://phabricator.wikimedia.org/T343256)
[20:00:28] <Jdlrobson>	 o/
[20:00:51] <urbanecm>	 let's see
[20:01:12] <urbanecm>	 i can deploy if there's no one else :)
[20:01:19] <wikibugs>	 (PS2) Urbanecm: Design: Provide wordmarks/taglines for Wikiversity projects [mediawiki-config] - https://gerrit.wikimedia.org/r/943614 (https://phabricator.wikimedia.org/T341256) (owner: Jdlrobson)
[20:01:21] <wikibugs>	 (PS2) Urbanecm: Provide wordmarks for Wikivoyage projects [mediawiki-config] - https://gerrit.wikimedia.org/r/943617 (https://phabricator.wikimedia.org/T341259) (owner: Jdlrobson)
[20:01:28] <wikibugs>	 (CR) Urbanecm: [C: +2] Provide wordmarks for Wikivoyage projects [mediawiki-config] - https://gerrit.wikimedia.org/r/943617 (https://phabricator.wikimedia.org/T341259) (owner: Jdlrobson)
[20:01:31] <wikibugs>	 (CR) Urbanecm: [C: +2] Design: Provide wordmarks/taglines for Wikiversity projects [mediawiki-config] - https://gerrit.wikimedia.org/r/943614 (https://phabricator.wikimedia.org/T341256) (owner: Jdlrobson)
[20:01:48] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by urbanecm@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/943614 (https://phabricator.wikimedia.org/T341256) (owner: Jdlrobson)
[20:01:50] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by urbanecm@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/943617 (https://phabricator.wikimedia.org/T341259) (owner: Jdlrobson)
[20:02:10] <wikibugs>	 (PS2) Dreamy Jazz: Write new on group0 for event table migration [mediawiki-config] - https://gerrit.wikimedia.org/r/944168 (https://phabricator.wikimedia.org/T330158)
[20:02:20] <wikibugs>	 (Merged) jenkins-bot: Provide wordmarks for Wikivoyage projects [mediawiki-config] - https://gerrit.wikimedia.org/r/943617 (https://phabricator.wikimedia.org/T341259) (owner: Jdlrobson)
[20:03:07] <wikibugs>	 (PS3) Dreamy Jazz: Write new on group0 for event table migration [mediawiki-config] - https://gerrit.wikimedia.org/r/944168 (https://phabricator.wikimedia.org/T330158)
[20:04:03] <wikibugs>	 (PS3) Urbanecm: Design: Provide wordmarks/taglines for Wikiversity projects [mediawiki-config] - https://gerrit.wikimedia.org/r/943614 (https://phabricator.wikimedia.org/T341256) (owner: Jdlrobson)
[20:04:11] <wikibugs>	 (CR) Urbanecm: [C: +2] Design: Provide wordmarks/taglines for Wikiversity projects [mediawiki-config] - https://gerrit.wikimedia.org/r/943614 (https://phabricator.wikimedia.org/T341256) (owner: Jdlrobson)
[20:04:14] <wikibugs>	 (PS1) Jforrester: onHtmlPageLinkRendererEnd: Fiddle more carefully with links so we don't over-write non-edit ones [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - https://gerrit.wikimedia.org/r/944200 (https://phabricator.wikimedia.org/T343256)
[20:04:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49941 and previous config saved to /var/cache/conftool/dbconfig/20230801-200444-ladsgroup.json
[20:04:48] <stashbot>	 T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[20:04:55] <wikibugs>	 (Merged) jenkins-bot: Design: Provide wordmarks/taglines for Wikiversity projects [mediawiki-config] - https://gerrit.wikimedia.org/r/943614 (https://phabricator.wikimedia.org/T341256) (owner: Jdlrobson)
[20:08:19] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[20:09:57] <urbanecm>	 okay, that failed merge stopped scap and i needed to restart. okay.
[20:10:07] <logmsgbot>	 !log urbanecm@deploy1002 Started scap: Backport for [[gerrit:943614|Design: Provide wordmarks/taglines for Wikiversity projects (T341256)]], [[gerrit:943617|Provide wordmarks for Wikivoyage projects (T341259)]]
[20:10:08] <Dreamy_Jazz>	 :(
[20:10:12] <stashbot>	 T341256: Design: Provide wordmarks/taglines for Wikiversity projects - https://phabricator.wikimedia.org/T341256
[20:10:13] <stashbot>	 T341259: Design: Provide wordmarks for Wikivoyage projects - https://phabricator.wikimedia.org/T341259
[20:10:23] <James_F>	 jouncebot: nowandnext
[20:10:24] <jouncebot>	 For the next 0 hour(s) and 49 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230801T2000)
[20:10:24] <jouncebot>	 In 9 hour(s) and 49 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230802T0600)
[20:10:44] <urbanecm>	 James_F: want me to ping you once done?
[20:10:51] <James_F>	 urbanecm: That'd be great, thanks!
[20:10:53] <urbanecm>	 will do
[20:11:51] <logmsgbot>	 !log urbanecm@deploy1002 urbanecm and jdlrobson: Backport for [[gerrit:943614|Design: Provide wordmarks/taglines for Wikiversity projects (T341256)]], [[gerrit:943617|Provide wordmarks for Wikivoyage projects (T341259)]] synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
[20:12:02] <urbanecm>	 Jdlrobson: please go ahead and test :)
[20:12:36] <Jdlrobson>	 looking
[20:12:42] <wikibugs>	 (CR) Andrea Denisse: [C: +2] pontoon: Apply the 'alerting_host' role to the pontoon-alerting-host-01 host [puppet] - https://gerrit.wikimedia.org/r/944289 (https://phabricator.wikimedia.org/T333615) (owner: Andrea Denisse)
[20:13:43] <Jdlrobson>	 urbanecm: LGTM
[20:13:50] <urbanecm>	 proceeding
[20:13:51] <logmsgbot>	 !log urbanecm@deploy1002 urbanecm and jdlrobson: Continuing with sync
[20:13:56] <urbanecm>	 as scap says :)
[20:14:02] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2008-dev.codfw.wmnet with reason: host reimage
[20:15:23] <icinga-wm>	 PROBLEM - Check systemd state on build2001 is CRITICAL: CRITICAL - degraded: The following units failed: confd_prometheus_metrics.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:17:01] <Jdlrobson>	 yay
[20:17:21] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2008-dev.codfw.wmnet with reason: host reimage
[20:17:57] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[20:17:58] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2007-dev.codfw.wmnet with OS bullseye
[20:18:06] <wikibugs>	 SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudcontrol20...
[20:19:19] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host cloudnet2007-dev.codfw.wmnet with OS bullseye
[20:19:27] <wikibugs>	 SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudnet20...
[20:19:49] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:943614|Design: Provide wordmarks/taglines for Wikiversity projects (T341256)]], [[gerrit:943617|Provide wordmarks for Wikivoyage projects (T341259)]] (duration: 09m 41s)
[20:19:53] <stashbot>	 T341256: Design: Provide wordmarks/taglines for Wikiversity projects - https://phabricator.wikimedia.org/T341256
[20:19:54] <stashbot>	 T341259: Design: Provide wordmarks for Wikivoyage projects - https://phabricator.wikimedia.org/T341259
[20:19:58] <urbanecm>	 and deployed Jdlrobson 
[20:20:11] <urbanecm>	 Dreamy_Jazz: ready for the CU patch?
[20:20:18] <Dreamy_Jazz>	 Yup
[20:20:37] <Dreamy_Jazz>	 On slower internet than usual, but should still be able to test.
[20:20:59] <wikibugs>	 (PS4) Urbanecm: Write new on group0 for event table migration [mediawiki-config] - https://gerrit.wikimedia.org/r/944168 (https://phabricator.wikimedia.org/T330158) (owner: Dreamy Jazz)
[20:21:04] <urbanecm>	 okay. let's go ahead!
[20:21:07] <wikibugs>	 (CR) Urbanecm: [C: +2] Write new on group0 for event table migration [mediawiki-config] - https://gerrit.wikimedia.org/r/944168 (https://phabricator.wikimedia.org/T330158) (owner: Dreamy Jazz)
[20:21:37] <wikibugs>	 (Merged) jenkins-bot: Write new on group0 for event table migration [mediawiki-config] - https://gerrit.wikimedia.org/r/944168 (https://phabricator.wikimedia.org/T330158) (owner: Dreamy Jazz)
[20:22:19] <logmsgbot>	 !log urbanecm@deploy1002 Started scap: Backport for [[gerrit:944168|Write new on group0 for event table migration (T330158)]]
[20:22:22] <stashbot>	 T330158: Enable write new for the event table migration - https://phabricator.wikimedia.org/T330158
[20:23:34] <Jdlrobson>	 thanks urbanecm 
[20:23:38] <Jdlrobson>	 much appreciated as usual!
[20:23:51] <logmsgbot>	 !log urbanecm@deploy1002 urbanecm and dreamyjazz: Backport for [[gerrit:944168|Write new on group0 for event table migration (T330158)]] synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
[20:23:58] <urbanecm>	 no problem
[20:24:05] <Dreamy_Jazz>	 Starting testing now.
[20:24:07] <urbanecm>	 Dreamy_Jazz: please test! especially at testcommons i guess :))
[20:24:16] <Dreamy_Jazz>	 :)
[20:25:43] <icinga-wm>	 RECOVERY - Check systemd state on build2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:27:19] <Dreamy_Jazz>	 Can only perform login attempts to testcommonswiki as it has autocreation of accounts disabled (as the wiki is closed). There should be some events in the table "cu_private_event"
[20:28:35] <wikibugs>	 ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (phaultfinder)
[20:28:59] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host cloudnet2008-dev.codfw.wmnet with OS bullseye
[20:29:07] <wikibugs>	 SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudnet20...
[20:29:37] <Dreamy_Jazz>	 Logstash looks clean for the testing on testcommonswiki. Will test on testwikidatawiki for other actions.
[20:31:19] <urbanecm>	 okay
[20:31:35] <urbanecm>	 cu_private_event is non-empty
[20:31:46] <Dreamy_Jazz>	 urbanecm: Could I be granted confirmed rights on testwikidatawiki again? Can't move my sandbox again.
[20:31:58] <urbanecm>	 sure
[20:32:09] <urbanecm>	 done
[20:32:26] <urbanecm>	 granted indefinitely, you're trusted enough :-D
[20:33:01] <Dreamy_Jazz>	 :)
[20:33:42] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[20:33:56] <Dreamy_Jazz>	 Nearly done with tests
[20:34:44] <urbanecm>	 Dreamy_Jazz: also autocreated you an account on the closed wiki (in case it comes in handy)
[20:34:51] <Dreamy_Jazz>	 Thanks.
[20:34:55] <Dreamy_Jazz>	 My part of testing is done.
[20:35:21] <Dreamy_Jazz>	 Can you please check if testwikidatawiki has rows in the tables "cu_private_event" and "cu_log_event"
[20:35:33] <urbanecm>	 sure
[20:35:43] <Dreamy_Jazz>	 Plus please check if there is a row in "cu_changes" that has the column "cuc_only_for_read_old" set to "1".
[20:35:51] <wikibugs>	 (PS1) Jforrester: ApiFunctionCall,ApiPerformTest: Require higher privs for custom execution/test runs [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - https://gerrit.wikimedia.org/r/944201
[20:36:37] <urbanecm>	 107 cuc_only_for_read_old rows, 105 in private events
[20:36:39] <urbanecm>	 some are recent
[20:37:00] <Dreamy_Jazz>	 That makes sense as read new was enabled for a while
[20:37:09] <urbanecm>	 yup
[20:37:15] <urbanecm>	 nothing dangerous in logstash
[20:37:18] <Dreamy_Jazz>	 Logstash looks clean from what I can see, so test is fine.
[20:37:21] <Dreamy_Jazz>	 Yup.
[20:37:32] <Dreamy_Jazz>	 Thanks!
[20:38:21] <urbanecm>	 so, let's go then!
[20:38:23] <logmsgbot>	 !log urbanecm@deploy1002 urbanecm and dreamyjazz: Continuing with sync
[20:38:25] <urbanecm>	 syncing :)
[20:39:20] <wikibugs>	 (CR) CI reject: [V: -1] ApiFunctionCall,ApiPerformTest: Require higher privs for custom execution/test runs [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - https://gerrit.wikimedia.org/r/944201 (owner: Jforrester)
[20:39:48] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2007-dev.codfw.wmnet with reason: host reimage
[20:40:01] <wikibugs>	 (CR) Jforrester: [C: +2] WikiLambdaApiBase: Don't explode in dieWithZError() [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/944195 (https://phabricator.wikimedia.org/T343253) (owner: Jforrester)
[20:40:07] <wikibugs>	 (CR) Jforrester: [C: +2] ApiFunctionCall: Actually check 'wikilambda-execute' before proceeding [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/944197 (owner: Jforrester)
[20:40:13] <wikibugs>	 (CR) CI reject: [V: -1] ApiFunctionCall,ApiPerformTest: Require higher privs for custom execution/test runs [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/944202 (owner: Jforrester)
[20:40:19] <wikibugs>	 (CR) Jforrester: [C: +2] onHtmlPageLinkRendererEnd: Fiddle more carefully with links so we don't over-write non-edit ones [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/944199 (https://phabricator.wikimedia.org/T343256) (owner: Jforrester)
[20:40:25] <wikibugs>	 (CR) Jforrester: [C: +2] ApiFunctionCall,ApiPerformTest: Require higher privs for custom execution/test runs [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/944202 (owner: Jforrester)
[20:42:43] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[20:42:44] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2008-dev.codfw.wmnet with OS bullseye
[20:42:52] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10User-aborrero, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudcontrol20...
[20:43:17] <wikibugs>	 (03Merged) 10jenkins-bot: WikiLambdaApiBase: Don't explode in dieWithZError() [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/944195 (https://phabricator.wikimedia.org/T343253) (owner: 10Jforrester)
[20:43:19] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2007-dev.codfw.wmnet with reason: host reimage
[20:43:38] <wikibugs>	 (03Merged) 10jenkins-bot: ApiFunctionCall: Actually check 'wikilambda-execute' before proceeding [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/944197 (owner: 10Jforrester)
[20:43:44] <wikibugs>	 (03Merged) 10jenkins-bot: onHtmlPageLinkRendererEnd: Fiddle more carefully with links so we don't over-write non-edit ones [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/944199 (https://phabricator.wikimedia.org/T343256) (owner: 10Jforrester)
[20:43:50] <wikibugs>	 (03Merged) 10jenkins-bot: ApiFunctionCall,ApiPerformTest: Require higher privs for custom execution/test runs [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/944202 (owner: 10Jforrester)
[20:44:05] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:944168|Write new on group0 for event table migration (T330158)]] (duration: 21m 46s)
[20:44:08] <stashbot>	 T330158: Enable write new for the event table migration - https://phabricator.wikimedia.org/T330158
[20:44:12] <urbanecm>	 and we're live
[20:44:14] <urbanecm>	 anything else?
[20:44:19] <Dreamy_Jazz>	 No. Thanks.
[20:44:24] <urbanecm>	 no problem
[20:44:29] <urbanecm>	 James_F: floor is yours
[20:44:34] <James_F>	 Ack, thanks!
[20:45:08] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] WikiLambdaApiBase: Don't explode in dieWithZError() [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/944196 (https://phabricator.wikimedia.org/T343253) (owner: 10Jforrester)
[20:45:13] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] onHtmlPageLinkRendererEnd: Fiddle more carefully with links so we don't over-write non-edit ones [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/944200 (https://phabricator.wikimedia.org/T343256) (owner: 10Jforrester)
[20:45:17] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] ApiFunctionCall: Actually check 'wikilambda-execute' before proceeding [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/944198 (owner: 10Jforrester)
[20:45:20] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] ApiFunctionCall,ApiPerformTest: Require higher privs for custom execution/test runs [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/944201 (owner: 10Jforrester)
[20:48:59] <wikibugs>	 (03Merged) 10jenkins-bot: WikiLambdaApiBase: Don't explode in dieWithZError() [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/944196 (https://phabricator.wikimedia.org/T343253) (owner: 10Jforrester)
[20:49:16] <wikibugs>	 (03Merged) 10jenkins-bot: onHtmlPageLinkRendererEnd: Fiddle more carefully with links so we don't over-write non-edit ones [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/944200 (https://phabricator.wikimedia.org/T343256) (owner: 10Jforrester)
[20:49:18] <wikibugs>	 (03Merged) 10jenkins-bot: ApiFunctionCall: Actually check 'wikilambda-execute' before proceeding [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/944198 (owner: 10Jforrester)
[20:49:24] <wikibugs>	 (03Merged) 10jenkins-bot: ApiFunctionCall,ApiPerformTest: Require higher privs for custom execution/test runs [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/944201 (owner: 10Jforrester)
[20:49:32] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2008-dev.codfw.wmnet with reason: host reimage
[20:51:19] <wikibugs>	 (03CR) 10AW GitLab Bot: "PAN-PAN: end-to-end deploy stage failed" [extensions/WikiLambda] (wmf/1.41.0-wmf.20) - 10https://gerrit.wikimedia.org/r/944201 (owner: 10Jforrester)
[20:51:28] <wikibugs>	 (03CR) 10AW GitLab Bot: "PAN-PAN: end-to-end deploy stage failed" [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/944202 (owner: 10Jforrester)
[20:52:44] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2008-dev.codfw.wmnet with reason: host reimage
[20:55:47] <logmsgbot>	 !log jforrester@deploy1002 Synchronized ./php-1.41.0-wmf.19/extensions/WikiLambda/: T343253 T343256 (duration: 06m 58s)
[20:55:52] <stashbot>	 T343256: Wikifunction special links are wrongly taking over ?action=history, ?diff=prev etc. links - https://phabricator.wikimedia.org/T343256
[20:55:53] <stashbot>	 T343253: Some object changes or creations leads to ZErrorException on wikifunctions.org - https://phabricator.wikimedia.org/T343253
[20:56:34] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (POST pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[20:56:41] <wikibugs>	 (03PS1) 10Jforrester: Wikifunctions: Restrict wikilambda-execute to functioneers for now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944316
[20:59:15] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[21:01:34] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (POST pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[21:02:09] <icinga-wm>	 PROBLEM - Disk space on people1004 is CRITICAL: DISK CRITICAL - free space: / 1872MiB (2% inode=91%): /tmp 1872MiB (2% inode=91%): /var/tmp 1872MiB (2% inode=91%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=people1004&var-datasource=eqiad+prometheus/ops
[21:05:09] <logmsgbot>	 !log jforrester@deploy1002 Synchronized ./php-1.41.0-wmf.20/extensions/WikiLambda/: T343253 T343256 (duration: 07m 23s)
[21:05:14] <stashbot>	 T343256: Wikifunction special links are wrongly taking over ?action=history, ?diff=prev etc. links - https://phabricator.wikimedia.org/T343256
[21:05:14] <stashbot>	 T343253: Some object changes or creations leads to ZErrorException on wikifunctions.org - https://phabricator.wikimedia.org/T343253
[21:05:33] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by jforrester@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944316 (owner: 10Jforrester)
[21:07:00] <wikibugs>	 (03Merged) 10jenkins-bot: Wikifunctions: Restrict wikilambda-execute to functioneers for now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944316 (owner: 10Jforrester)
[21:07:29] <logmsgbot>	 !log jforrester@deploy1002 Started scap: Backport for [[gerrit:944316|Wikifunctions: Restrict wikilambda-execute to functioneers for now]]
[21:08:36] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[21:09:05] <logmsgbot>	 !log jforrester@deploy1002 jforrester: Backport for [[gerrit:944316|Wikifunctions: Restrict wikilambda-execute to functioneers for now]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
[21:10:24] <wikibugs>	 (03PS1) 10Jdlrobson: Fix finnish projects, remove unused SVG/PNGs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944318 (https://phabricator.wikimedia.org/T343278)
[21:10:31] <logmsgbot>	 !log jforrester@deploy1002 jforrester: Continuing with sync
[21:11:54] <wikibugs>	 (03PS1) 10Jforrester: Wikifunctions: Log the 'WikiLambda' warnings and above logs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944319
[21:14:42] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (DELETE pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[21:16:32] <logmsgbot>	 !log jforrester@deploy1002 Finished scap: Backport for [[gerrit:944316|Wikifunctions: Restrict wikilambda-execute to functioneers for now]] (duration: 09m 03s)
[21:17:56] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by jforrester@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944319 (owner: 10Jforrester)
[21:18:34] <wikibugs>	 (03Merged) 10jenkins-bot: Wikifunctions: Log the 'WikiLambda' warnings and above logs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944319 (owner: 10Jforrester)
[21:19:02] <logmsgbot>	 !log jforrester@deploy1002 Started scap: Backport for [[gerrit:944319|Wikifunctions: Log the 'WikiLambda' warnings and above logs]]
[21:19:42] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (DELETE pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[21:20:46] <logmsgbot>	 !log jforrester@deploy1002 jforrester: Backport for [[gerrit:944319|Wikifunctions: Log the 'WikiLambda' warnings and above logs]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
[21:23:19] <logmsgbot>	 !log jforrester@deploy1002 jforrester: Continuing with sync
[21:29:24] <logmsgbot>	 !log jforrester@deploy1002 Finished scap: Backport for [[gerrit:944319|Wikifunctions: Log the 'WikiLambda' warnings and above logs]] (duration: 10m 22s)
[21:40:03] <jinxer-wm>	 (ConfdResourceFailed) firing: (446) confd resource _srv_config-master_pybal_codfw_apertium.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[21:46:00] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[21:46:01] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2007-dev.codfw.wmnet with OS bullseye
[21:46:09] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10User-aborrero, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudnet2007-d...
[21:46:12] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[21:46:13] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2008-dev.codfw.wmnet with OS bullseye
[21:46:20] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10User-aborrero, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudnet2008-d...
[21:57:10] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
[22:00:14] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10User-aborrero, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (10Papaul)
[22:01:07] <wikibugs>	 SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (Papaul) Open→Resolved @Andrew this is complete
[22:01:38] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
[22:05:43] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2006-dev']
[22:09:57] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2004-dev']
[22:10:16] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
[22:11:17] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2005-dev']
[22:11:26] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
[22:11:46] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2005-dev']
[22:14:10] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2004-dev']
[22:16:29] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2006-dev']
[22:17:32] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
[22:18:16] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
[22:19:16] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2006-dev']
[22:23:15] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2004-dev']
[22:23:47] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2005-dev']
[22:25:32] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2006-dev']
[22:29:11] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bullseye
[22:29:20] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10User-aborrero, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudvirt2...
[22:33:34] <wikibugs>	 10ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (10phaultfinder)
[22:38:36] <wikibugs>	 (03PS7) 10Krinkle: noc: don't use on-disk files but etcd directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942672 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto)
[22:40:18] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bullseye
[22:40:26] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudvirt200[4-6]-dev - https://phabricator.wikimedia.org/T342459 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudvirt2005-dev.codfw.wmnet with OS bullseye
[22:41:57] <wikibugs>	 (CR) Krinkle: noc: don't use on-disk files but etcd directly (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/942672 (https://phabricator.wikimedia.org/T341859) (owner: Giuseppe Lavagetto)
[22:41:59] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudvirt200[4-6]-dev - https://phabricator.wikimedia.org/T342459 (10Papaul)
[22:51:39] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 140, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[22:52:35] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 225, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[22:53:07] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bullseye
[22:53:16] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Hardware): Q1:rack/setup/install cloudvirt200[4-6]-dev - https://phabricator.wikimedia.org/T342459 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudvirt2006-dev.codfw.wmnet with OS bullseye
[23:05:15] <wikibugs>	 (03PS1) 10Dreamy Jazz: Write new on group1 except wikidatawiki for event table migration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944350 (https://phabricator.wikimedia.org/T330158)
[23:06:13] <wikibugs>	 (03PS2) 10Dreamy Jazz: Write new on group1 except wikidatawiki for event table migration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944350 (https://phabricator.wikimedia.org/T330158)
[23:09:45] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:10:23] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 141, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:35:37] <wikibugs>	 (CR) Krinkle: noc: centralize file list management (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/942673 (https://phabricator.wikimedia.org/T341859) (owner: Giuseppe Lavagetto)
[23:42:47] <wikibugs>	 (CR) Krinkle: noc: add static file server (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) (owner: Giuseppe Lavagetto)
[23:43:25] <wikibugs>	 (03PS1) 10Krinkle: noc: Fix various PHP errors that prevent db.php from working locally [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944355 (https://phabricator.wikimedia.org/T341859)
[23:44:18] <jinxer-wm>	 (KubernetesAPILatency) firing: (2) High Kubernetes API latency (POST certificaterequests) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[23:44:35] <wikibugs>	 (03PS2) 10Krinkle: noc: Fix various PHP errors that prevent db.php from working locally [mediawiki-config] - 10https://gerrit.wikimedia.org/r/944355 (https://phabricator.wikimedia.org/T341859)
[23:49:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[23:49:18] <jinxer-wm>	 (KubernetesAPILatency) resolved: (2) High Kubernetes API latency (POST certificaterequests) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[23:59:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded