[00:03:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:09:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:13:45] <jinxer-wm>	 (Primary outbound port utilisation over 80%  #page) firing: (2) Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[00:14:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:15:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:19:20] <icinga-wm>	 PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hardsync-published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:20:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:25:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:30:06] <icinga-wm>	 RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:30:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:38:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:38:28] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/993467
[00:38:34] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/993467 (owner: TrainBranchBot)
[00:43:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:48:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:53:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[00:53:45] <jinxer-wm>	 (Primary outbound port utilisation over 80%  #page) firing: (2) Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[00:55:38] <jinxer-wm>	 (ProbeDown) firing: (8) Service debmonitor1002:7443 has failed probes (http_debmonitor_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Debmonitor - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:58:45] <jinxer-wm>	 (Primary outbound port utilisation over 80%  #page) resolved: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[01:02:28] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/993467 (owner: TrainBranchBot)
[01:04:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:09:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:09:46] <icinga-wm>	 PROBLEM - Check systemd state on mwmaint2002 is CRITICAL: CRITICAL - degraded: The following units failed: mediawiki_job_generatecaptcha.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:11:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:16:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:18:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:21:06] <jinxer-wm>	 (KubernetesCalicoDown) firing: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[01:21:30] <wikibugs>	 (PS1) Superpes15: [enwiktionary] Remove the Concordance namespace and its talk space [mediawiki-config] - https://gerrit.wikimedia.org/r/993457 (https://phabricator.wikimedia.org/T354813)
[01:28:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:29:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:33:31] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:38:31] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:38:54] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[01:41:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:46:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:47:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:49:57] <wikibugs>	 (PS1) Superpes15: [enwikiquote] Add a draft namespace and its talk space [mediawiki-config] - https://gerrit.wikimedia.org/r/993458 (https://phabricator.wikimedia.org/T355195)
[01:52:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[01:54:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[02:04:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[02:09:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[02:10:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[02:15:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[02:31:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[02:36:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[02:37:05] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on debmonitor2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:39:23] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:55:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[03:00:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[03:01:20] <icinga-wm>	 RECOVERY - Check systemd state on build2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:09:23] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:33:22] <icinga-wm>	 PROBLEM - Disk space on build2001 is CRITICAL: DISK CRITICAL - free space: / 13055 MB (5% inode=65%): /tmp 13055 MB (5% inode=65%): /var/tmp 13055 MB (5% inode=65%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=build2001&var-datasource=codfw+prometheus/ops
[03:39:40] <icinga-wm>	 PROBLEM - Check systemd state on build2001 is CRITICAL: CRITICAL - degraded: The following units failed: docker-reporter-k8s-images.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:41:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[04:55:38] <jinxer-wm>	 (ProbeDown) firing: (8) Service debmonitor1002:7443 has failed probes (http_debmonitor_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Debmonitor - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:21:06] <jinxer-wm>	 (KubernetesCalicoDown) firing: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[05:31:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[05:38:54] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[05:47:43] <wikibugs>	 (PS1) Marostegui: mariadb: Decommission db1134 [puppet] - https://gerrit.wikimedia.org/r/993506 (https://phabricator.wikimedia.org/T355740)
[05:49:18] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.decommission for hosts db1134.eqiad.wmnet
[05:53:05] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Decommission db1134 [puppet] - https://gerrit.wikimedia.org/r/993506 (https://phabricator.wikimedia.org/T355740) (owner: Marostegui)
[05:54:38] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.dns.netbox
[05:56:37] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1134.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
[05:57:43] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1134.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
[05:57:43] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[05:57:44] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1134.eqiad.wmnet
[05:58:21] <wikibugs>	 ops-eqiad, DBA, decommission-hardware, Patch-For-Review: decommission db1134.eqiad.wmnet - https://phabricator.wikimedia.org/T355740 (Marostegui) a:Marostegui→None
[05:58:30] <wikibugs>	 ops-eqiad, DBA, decommission-hardware, Patch-For-Review: decommission db1134.eqiad.wmnet - https://phabricator.wikimedia.org/T355740 (Marostegui) Ready for #dc-ops
[06:03:23] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[06:03:37] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[06:03:38] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[06:03:54] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[06:04:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1165 (T355609)', diff saved to https://phabricator.wikimedia.org/P55745 and previous config saved to /var/cache/conftool/dbconfig/20240129-060400-marostegui.json
[06:04:06] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[06:09:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T355609)', diff saved to https://phabricator.wikimedia.org/P55746 and previous config saved to /var/cache/conftool/dbconfig/20240129-060907-marostegui.json
[06:09:13] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[06:24:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P55747 and previous config saved to /var/cache/conftool/dbconfig/20240129-062414-marostegui.json
[06:33:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2129', diff saved to https://phabricator.wikimedia.org/P55750 and previous config saved to /var/cache/conftool/dbconfig/20240129-063302-marostegui.json
[06:37:05] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on debmonitor2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[06:38:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P55751 and previous config saved to /var/cache/conftool/dbconfig/20240129-063836-root.json
[06:39:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P55752 and previous config saved to /var/cache/conftool/dbconfig/20240129-063920-marostegui.json
[06:53:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P55754 and previous config saved to /var/cache/conftool/dbconfig/20240129-065341-root.json
[06:54:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T355609)', diff saved to https://phabricator.wikimedia.org/P55755 and previous config saved to /var/cache/conftool/dbconfig/20240129-065427-marostegui.json
[06:54:30] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[06:54:35] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[06:54:44] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[06:54:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1168 (T355609)', diff saved to https://phabricator.wikimedia.org/P55756 and previous config saved to /var/cache/conftool/dbconfig/20240129-065450-marostegui.json
[07:00:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T355609)', diff saved to https://phabricator.wikimedia.org/P55757 and previous config saved to /var/cache/conftool/dbconfig/20240129-065959-marostegui.json
[07:00:12] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[07:08:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P55758 and previous config saved to /var/cache/conftool/dbconfig/20240129-070847-root.json
[07:15:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P55760 and previous config saved to /var/cache/conftool/dbconfig/20240129-071506-marostegui.json
[07:23:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P55761 and previous config saved to /var/cache/conftool/dbconfig/20240129-072352-root.json
[07:25:46] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/993170 (https://phabricator.wikimedia.org/T355606) (owner: AOkoth)
[07:28:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:29:11] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "@Arnold: The current access would be fine for Superset access, but they likely need more. I've pinged the task to ask for more context." [puppet] - https://gerrit.wikimedia.org/r/993170 (https://phabricator.wikimedia.org/T355606) (owner: AOkoth)
[07:29:31] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting analytics-privatedata-users access for amastilovic - https://phabricator.wikimedia.org/T355606 (MoritzMuehlenhoff) @amastilovic @Ahoelzl Can you clarify what access you need specifically: https://wikitech.wikimedia.org/wiki/Analytics/Data_access#...
[07:30:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P55762 and previous config saved to /var/cache/conftool/dbconfig/20240129-073012-marostegui.json
[07:33:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:38:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:38:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P55763 and previous config saved to /var/cache/conftool/dbconfig/20240129-073857-root.json
[07:41:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:45:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T355609)', diff saved to https://phabricator.wikimedia.org/P55764 and previous config saved to /var/cache/conftool/dbconfig/20240129-074519-marostegui.json
[07:45:21] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[07:45:25] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[07:45:35] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[07:45:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1180 (T355609)', diff saved to https://phabricator.wikimedia.org/P55765 and previous config saved to /var/cache/conftool/dbconfig/20240129-074541-marostegui.json
[07:46:04] <wikibugs>	 (PS1) Marostegui: Revert "ProductionServices.php: Promote pc2014" [mediawiki-config] - https://gerrit.wikimedia.org/r/993489
[07:46:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:47:24] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to wmf for arinaigum - https://phabricator.wikimedia.org/T355591 (SLyngshede-WMF) @Arinaigu Your account should be fixed now. Please try to login to https://wikitech.wikimedia.org/ using "Arinaigum" as your username.
[07:48:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:50:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T355609)', diff saved to https://phabricator.wikimedia.org/P55766 and previous config saved to /var/cache/conftool/dbconfig/20240129-075044-marostegui.json
[07:50:50] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[07:58:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:59:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[07:59:49] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/993191 (owner: JHathaway)
[08:00:05] <jouncebot>	 Amir1 and Urbanecm: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240129T0800).
[08:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[08:04:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:05:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:05:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P55767 and previous config saved to /var/cache/conftool/dbconfig/20240129-080550-marostegui.json
[08:07:00] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good, thanks. You can just go ahead and merge, the change will land in the deb package soon with the next release." [debs/wmf-sre-laptop] - https://gerrit.wikimedia.org/r/993183 (owner: Scott French)
[08:10:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:13:41] <wikibugs>	 (CR) Muehlenhoff: Puppet: Routed Ganeti support (1 comment) [puppet] - https://gerrit.wikimedia.org/r/990968 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[08:15:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:17:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:20:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P55768 and previous config saved to /var/cache/conftool/dbconfig/20240129-082057-marostegui.json
[08:22:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:27:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:27:18] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "ProductionServices.php: Promote pc2014" [mediawiki-config] - https://gerrit.wikimedia.org/r/993489 (owner: Marostegui)
[08:28:00] <wikibugs>	 (Merged) jenkins-bot: Revert "ProductionServices.php: Promote pc2014" [mediawiki-config] - https://gerrit.wikimedia.org/r/993489 (owner: Marostegui)
[08:29:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:29:16] <logmsgbot>	 !log marostegui@deploy2002 Started scap: Backport for [[gerrit:993489|Revert "ProductionServices.php: Promote pc2014"]]
[08:34:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:34:18] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/990968 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[08:36:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T355609)', diff saved to https://phabricator.wikimedia.org/P55769 and previous config saved to /var/cache/conftool/dbconfig/20240129-083603-marostegui.json
[08:36:08] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
[08:36:09] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[08:36:21] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
[08:36:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1187 (T355609)', diff saved to https://phabricator.wikimedia.org/P55770 and previous config saved to /var/cache/conftool/dbconfig/20240129-083627-marostegui.json
[08:38:03] <wikibugs>	 (PS1) Marostegui: Revert "pc2014: Move it to pc2" [puppet] - https://gerrit.wikimedia.org/r/993490
[08:39:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:39:33] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Backport for [[gerrit:993489|Revert "ProductionServices.php: Promote pc2014"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:39:37] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Continuing with sync
[08:40:39] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "pc2014: Move it to pc2" [puppet] - https://gerrit.wikimedia.org/r/993490 (owner: Marostegui)
[08:41:21] <wikibugs>	 (PS1) Marostegui: Revert "pc2: Enable notifications on the master" [puppet] - https://gerrit.wikimedia.org/r/993491
[08:41:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T355609)', diff saved to https://phabricator.wikimedia.org/P55771 and previous config saved to /var/cache/conftool/dbconfig/20240129-084143-marostegui.json
[08:41:48] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[08:42:42] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "pc2: Enable notifications on the master" [puppet] - https://gerrit.wikimedia.org/r/993491 (owner: Marostegui)
[08:44:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:46:30] <logmsgbot>	 !log marostegui@deploy2002 Finished scap: Backport for [[gerrit:993489|Revert "ProductionServices.php: Promote pc2014"]] (duration: 17m 13s)
[08:49:18] <wikibugs>	 SRE, ops-codfw, Data-Persistence, Infrastructure-Foundations, netops: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (Marostegui)
[08:55:39] <jinxer-wm>	 (ProbeDown) firing: (8) Service debmonitor1002:7443 has failed probes (http_debmonitor_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Debmonitor - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:56:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P55772 and previous config saved to /var/cache/conftool/dbconfig/20240129-085649-marostegui.json
[08:57:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:59:36] <wikibugs>	 SRE-swift-storage, Commons, Internet-Archive: Error 503, Backend fetch failed while uploading file from Internet Archive - https://phabricator.wikimedia.org/T352215 (MatthewVernon) That was due to an incident - T356022
[09:05:37] <wikibugs>	 (CR) Ayounsi: "<3" [puppet] - https://gerrit.wikimedia.org/r/990968 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[09:07:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:10:53] <icinga-wm>	 RECOVERY - Host ml-serve2004 is UP: PING OK - Packet loss = 0%, RTA = 36.34 ms
[09:11:03] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 241, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:11:15] <icinga-wm>	 PROBLEM - Check systemd state on ml-serve2004 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:11:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P55773 and previous config saved to /var/cache/conftool/dbconfig/20240129-091156-marostegui.json
[09:11:57] <wikibugs>	 (PS1) ArielGlenn: sql/xml dumps: add role for helper worker for wikidata full history dumps [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396)
[09:12:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:12:41] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ml-serve2004 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[09:13:07] <wikibugs>	 (CR) CI reject: [V: -1] sql/xml dumps: add role for helper worker for wikidata full history dumps [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396) (owner: ArielGlenn)
[09:13:14] <XioNoX>	 !log disable Puppet on all the ganeti servers for CR990968 deployment - T300152
[09:13:17] <moritzm>	 !log upgrading python-pymysql in S7 DB hosts to 1.0.2-2~wmf11u1 T355531
[09:13:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:13:23] <stashbot>	 T300152: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152
[09:13:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:13:29] <stashbot>	 T355531: Migrate all db-* scripts to Bookworm - https://phabricator.wikimedia.org/T355531
[09:15:11] <icinga-wm>	 RECOVERY - Check systemd state on ml-serve2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:15:13] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ml-serve2004 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[09:15:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:15:50] <jinxer-wm>	 (KubernetesCalicoDown) resolved: ml-serve2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlserve&var-instance=ml-serve2004.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[09:17:57] <godog>	 !log mark for deletion and cleanup replicated thanos blocks for prometheus=ops, older than 3 months, all resolutions - T351927
[09:18:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:02] <stashbot>	 T351927: Decide and tweak Thanos retention - https://phabricator.wikimedia.org/T351927
[09:20:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:22:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:27:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T355609)', diff saved to https://phabricator.wikimedia.org/P55775 and previous config saved to /var/cache/conftool/dbconfig/20240129-092702-marostegui.json
[09:27:04] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
[09:27:09] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[09:27:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:27:18] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
[09:27:24] <wikibugs>	 SRE, Wikimedia-Incident: 2024-01-28 (UTC) - Error 503: Our servers are currently under maintenance or experiencing a technical problem - https://phabricator.wikimedia.org/T356022 (LSobanski) Open→Resolved a:LSobanski Resolving as services have been stable since the last update. This outage wa...
[09:27:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1201 (T355609)', diff saved to https://phabricator.wikimedia.org/P55776 and previous config saved to /var/cache/conftool/dbconfig/20240129-092724-marostegui.json
[09:29:20] <wikibugs>	 (PS3) Slyngshede: D:service::docker Run Docker prune on pull. [puppet] - https://gerrit.wikimedia.org/r/991353 (https://phabricator.wikimedia.org/T321851)
[09:30:33] <wikibugs>	 (PS4) Slyngshede: D:service::docker Run Docker prune on pull. [puppet] - https://gerrit.wikimedia.org/r/991353 (https://phabricator.wikimedia.org/T321851)
[09:31:55] <wikibugs>	 (CR) Slyngshede: D:service::docker Run Docker prune on pull. (2 comments) [puppet] - https://gerrit.wikimedia.org/r/991353 (https://phabricator.wikimedia.org/T321851) (owner: Slyngshede)
[09:32:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:32:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1201 (T355609)', diff saved to https://phabricator.wikimedia.org/P55777 and previous config saved to /var/cache/conftool/dbconfig/20240129-093216-marostegui.json
[09:32:22] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[09:33:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:37:00] <wikibugs>	 (CR) Ayounsi: [C: +2] Puppet: Routed Ganeti support [puppet] - https://gerrit.wikimedia.org/r/990968 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[09:38:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:38:54] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[09:40:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:41:23] <wikibugs>	 (PS1) Filippo Giunchedi: sre: move MediaWikiEditFailures alert to global [alerts] - https://gerrit.wikimedia.org/r/993661 (https://phabricator.wikimedia.org/T350597)
[09:45:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[09:45:50] <icinga-wm>	 PROBLEM - Check systemd state on ml-serve1003 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:46:20] <icinga-wm>	 PROBLEM - Check systemd state on ml-serve1001 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:47:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P55778 and previous config saved to /var/cache/conftool/dbconfig/20240129-094722-marostegui.json
[09:51:02] <icinga-wm>	 RECOVERY - Disk space on ms-be1068 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=ms-be1068&var-datasource=eqiad+prometheus/ops
[09:52:55] <wikibugs>	 Puppet, Wikidata, wmde-wikidata-tech, Technical-Debt, Wikidata Analytics (Kanban): Remove the WDCM clone (stats1007) - https://phabricator.wikimedia.org/T351072 (Manuel)
[09:53:39] <wikibugs>	 (PS1) Muehlenhoff: ganeti: Create /var/lib/ganeti/rapi in Puppet [puppet] - https://gerrit.wikimedia.org/r/993662 (https://phabricator.wikimedia.org/T300152)
[09:54:07] <wikibugs>	 (PS2) ArielGlenn: sql/xml dumps: add role for helper worker for wikidata full history dumps [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396)
[09:54:49] <wikibugs>	 (CR) CI reject: [V: -1] ganeti: Create /var/lib/ganeti/rapi in Puppet [puppet] - https://gerrit.wikimedia.org/r/993662 (https://phabricator.wikimedia.org/T300152) (owner: Muehlenhoff)
[09:54:56] <wikibugs>	 SRE, SRE-swift-storage, ops-eqiad, DC-Ops: Disk (sdl) failed in ms-be1068 - https://phabricator.wikimedia.org/T356033 (MatthewVernon)
[09:55:16] <wikibugs>	 (CR) CI reject: [V: -1] sql/xml dumps: add role for helper worker for wikidata full history dumps [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396) (owner: ArielGlenn)
[09:55:41] <wikibugs>	 SRE, SRE-swift-storage, ops-eqiad, DC-Ops: Disk (sdl) failed in ms-be1068 - https://phabricator.wikimedia.org/T356033 (MatthewVernon) p:Triage→High
[09:56:17] <wikibugs>	 (PS2) Muehlenhoff: ganeti: Create /var/lib/ganeti/rapi in Puppet [puppet] - https://gerrit.wikimedia.org/r/993662 (https://phabricator.wikimedia.org/T300152)
[09:56:28] <XioNoX>	 !log enable Puppet on all the ganeti servers for CR990968 deployment - T300152
[09:56:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:34] <stashbot>	 T300152: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152
[09:57:42] <wikibugs>	 (CR) Ayounsi: [C: +1] ganeti: Create /var/lib/ganeti/rapi in Puppet [puppet] - https://gerrit.wikimedia.org/r/993662 (https://phabricator.wikimedia.org/T300152) (owner: Muehlenhoff)
[10:00:58] <moritzm>	 !log upload prometheus-ganeti-exporter 0.3+deb12u1 to apt.wikimedia.org T300152
[10:01:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P55779 and previous config saved to /var/cache/conftool/dbconfig/20240129-100229-marostegui.json
[10:04:54] <icinga-wm>	 PROBLEM - Check systemd state on ganeti2033 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-ganeti-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:05:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[10:05:24] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ml-serve1003 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[10:07:14] <wikibugs>	 (CR) Muehlenhoff: [C: +2] ganeti: Create /var/lib/ganeti/rapi in Puppet [puppet] - https://gerrit.wikimedia.org/r/993662 (https://phabricator.wikimedia.org/T300152) (owner: Muehlenhoff)
[10:08:41] <wikibugs>	 (PS1) MVernon: swift: remove drained ms-be20[44-50] from the rings [puppet] - https://gerrit.wikimedia.org/r/993664 (https://phabricator.wikimedia.org/T353149)
[10:09:50] <wikibugs>	 (CR) CI reject: [V: -1] swift: remove drained ms-be20[44-50] from the rings [puppet] - https://gerrit.wikimedia.org/r/993664 (https://phabricator.wikimedia.org/T353149) (owner: MVernon)
[10:10:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[10:12:20] <wikibugs>	 (PS2) MVernon: swift: remove drained ms-be20[44-50] from the rings [puppet] - https://gerrit.wikimedia.org/r/993664 (https://phabricator.wikimedia.org/T353149)
[10:12:24] <wikibugs>	 (PS1) Muehlenhoff: ganeti/rapi: Relax permissions for rapi directory [puppet] - https://gerrit.wikimedia.org/r/993665
[10:13:53] <wikibugs>	 (CR) Ayounsi: [C: +1] ganeti/rapi: Relax permissions for rapi directory [puppet] - https://gerrit.wikimedia.org/r/993665 (owner: Muehlenhoff)
[10:15:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[10:17:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1201 (T355609)', diff saved to https://phabricator.wikimedia.org/P55780 and previous config saved to /var/cache/conftool/dbconfig/20240129-101735-marostegui.json
[10:17:38] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1213.eqiad.wmnet with reason: Maintenance
[10:17:41] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[10:17:51] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1213.eqiad.wmnet with reason: Maintenance
[10:17:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1213:3316 (T355609)', diff saved to https://phabricator.wikimedia.org/P55781 and previous config saved to /var/cache/conftool/dbconfig/20240129-101757-marostegui.json
[10:18:14] <wikibugs>	 (CR) Muehlenhoff: [C: +2] ganeti/rapi: Relax permissions for rapi directory [puppet] - https://gerrit.wikimedia.org/r/993665 (owner: Muehlenhoff)
[10:19:26] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1068 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:19:57] <wikibugs>	 (PS3) ArielGlenn: sql/xml dumps: add role for helper worker for wikidata full history dumps [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396)
[10:20:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[10:21:07] <wikibugs>	 (CR) CI reject: [V: -1] sql/xml dumps: add role for helper worker for wikidata full history dumps [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396) (owner: ArielGlenn)
[10:23:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[10:24:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T355609)', diff saved to https://phabricator.wikimedia.org/P55782 and previous config saved to /var/cache/conftool/dbconfig/20240129-102414-marostegui.json
[10:24:20] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[10:25:27] <wikibugs>	 (CR) Marostegui: [C: +1] swift: remove drained ms-be20[44-50] from the rings [puppet] - https://gerrit.wikimedia.org/r/993664 (https://phabricator.wikimedia.org/T353149) (owner: MVernon)
[10:25:50] <wikibugs>	 (CR) MVernon: [C: +2] swift: remove drained ms-be20[44-50] from the rings [puppet] - https://gerrit.wikimedia.org/r/993664 (https://phabricator.wikimedia.org/T353149) (owner: MVernon)
[10:28:00] <wikibugs>	 (PS1) Btullis: Add the wmde instance to cumin A:analytics-airflow alias [puppet] - https://gerrit.wikimedia.org/r/993667 (https://phabricator.wikimedia.org/T340648)
[10:28:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[10:29:46] <wikibugs>	 (CR) Ayounsi: [C: +2] Spicerack: Add support for routed Ganeti [software/spicerack] - https://gerrit.wikimedia.org/r/991325 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[10:31:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[10:32:08] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/993667 (https://phabricator.wikimedia.org/T340648) (owner: Btullis)
[10:32:29] <icinga-wm>	 PROBLEM - Check systemd state on ganeti2034 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-ganeti-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:34:11] <wikibugs>	 (PS2) Muehlenhoff: Remove obsolete Hiera entries for Ganeti PKI support [puppet] - https://gerrit.wikimedia.org/r/993099 (https://phabricator.wikimedia.org/T350686)
[10:34:55] <wikibugs>	 (PS2) Effie Mouzeli: mw-mcrouter: add helmfile [deployment-charts] - https://gerrit.wikimedia.org/r/979363 (https://phabricator.wikimedia.org/T346690)
[10:35:41] <wikibugs>	 (PS3) Slyngshede: P:debmonitor::server_package install Debmonitor from package. [puppet] - https://gerrit.wikimedia.org/r/993086
[10:35:46] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/993099 (https://phabricator.wikimedia.org/T350686) (owner: Muehlenhoff)
[10:36:05] <wikibugs>	 (CR) CI reject: [V: -1] mw-mcrouter: add helmfile [deployment-charts] - https://gerrit.wikimedia.org/r/979363 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[10:36:44] <wikibugs>	 (Merged) jenkins-bot: Spicerack: Add support for routed Ganeti [software/spicerack] - https://gerrit.wikimedia.org/r/991325 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[10:36:52] <wikibugs>	 (CR) Muehlenhoff: "I don't think we should rename the role, this is already covered by a Hiera option?" [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[10:37:03] <wikibugs>	 (CR) Btullis: [C: +2] Add the wmde instance to cumin A:analytics-airflow alias [puppet] - https://gerrit.wikimedia.org/r/993667 (https://phabricator.wikimedia.org/T340648) (owner: Btullis)
[10:37:05] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on debmonitor2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[10:37:37] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
[10:38:24] <logmsgbot>	 !log arnaudb@cumin1002 END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
[10:39:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P55783 and previous config saved to /var/cache/conftool/dbconfig/20240129-103920-marostegui.json
[10:39:43] <wikibugs>	 (PS3) Effie Mouzeli: mw-mcrouter: add helmfile [deployment-charts] - https://gerrit.wikimedia.org/r/979363 (https://phabricator.wikimedia.org/T346690)
[10:43:48] <wikibugs>	 (CR) Slyngshede: "Okay, seemed a lot cleaner to just do a new role and remove the old one later, but I'm good either way." [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[10:44:14] <wikibugs>	 (PS4) ArielGlenn: sql/xml dumps: add role for helper worker for wikidata full history dumps [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396)
[10:44:28] <wikibugs>	 (CR) Clément Goubert: [C: +1] sre: move MediaWikiEditFailures alert to global [alerts] - https://gerrit.wikimedia.org/r/993661 (https://phabricator.wikimedia.org/T350597) (owner: Filippo Giunchedi)
[10:44:51] <wikibugs>	 (CR) Ayounsi: [C: +1] "lgtm, nothing seems to use that hiera key anymore." [puppet] - https://gerrit.wikimedia.org/r/993099 (https://phabricator.wikimedia.org/T350686) (owner: Muehlenhoff)
[10:45:23] <wikibugs>	 (CR) CI reject: [V: -1] sql/xml dumps: add role for helper worker for wikidata full history dumps [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396) (owner: ArielGlenn)
[10:45:44] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove obsolete Hiera entries for Ganeti PKI support [puppet] - https://gerrit.wikimedia.org/r/993099 (https://phabricator.wikimedia.org/T350686) (owner: Muehlenhoff)
[10:46:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[10:47:06] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.reimage for host an-airflow1007.eqiad.wmnet with OS bullseye
[10:47:07] <logmsgbot>	 !log jiji@cumin1002 START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
[10:47:12] <logmsgbot>	 !log jiji@cumin1002 START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
[10:49:02] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] sre: move MediaWikiEditFailures alert to global [alerts] - https://gerrit.wikimedia.org/r/993661 (https://phabricator.wikimedia.org/T350597) (owner: Filippo Giunchedi)
[10:49:49] <wikibugs>	 (CR) Muehlenhoff: "Yeah, let's stick with the role as-is and re-use the existing OS bookworm conditional, if we rename the role this is quite disruptive in g" [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[10:50:14] <wikibugs>	 (PS1) Ayounsi: Add routed ganeti VIP A record [dns] - https://gerrit.wikimedia.org/r/993669 (https://phabricator.wikimedia.org/T300152)
[10:51:26] <wikibugs>	 (PS2) Ayounsi: Add routed ganeti VIP A record [dns] - https://gerrit.wikimedia.org/r/993669 (https://phabricator.wikimedia.org/T300152)
[10:52:14] <wikibugs>	 (CR) Muehlenhoff: [C: +2] puppet::agent: Remove path condition for /run/puppet/disabled [puppet] - https://gerrit.wikimedia.org/r/993063 (owner: Muehlenhoff)
[10:53:08] <logmsgbot>	 !log jiji@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
[10:53:19] <logmsgbot>	 !log jiji@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
[10:54:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P55784 and previous config saved to /var/cache/conftool/dbconfig/20240129-105427-marostegui.json
[10:56:00] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [dns] - https://gerrit.wikimedia.org/r/993669 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[10:56:39] <wikibugs>	 (CR) Ayounsi: [C: +2] Add routed ganeti VIP A record [dns] - https://gerrit.wikimedia.org/r/993669 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[10:58:39] <icinga-wm>	 PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_mw-api-ext_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240129T1100)
[11:00:51] <wikibugs>	 (PS4) Slyngshede: P:debmonitor::server install Debmonitor from package. [puppet] - https://gerrit.wikimedia.org/r/993086
[11:01:05] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin1001 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:01:20] <wikibugs>	 (CR) Effie Mouzeli: "(CC @JMeybohm) I see your points in terms of naming, security, and general readability, albeit there is very little chance we will need an" [deployment-charts] - https://gerrit.wikimedia.org/r/979363 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[11:01:38] <wikibugs>	 (PS5) Slyngshede: P:debmonitor::server install Debmonitor from package. [puppet] - https://gerrit.wikimedia.org/r/993086
[11:01:52] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1007.eqiad.wmnet with reason: host reimage
[11:02:33] <icinga-wm>	 RECOVERY - Check systemd state on ganeti2034 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:03:51] <icinga-wm>	 RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:04:24] <wikibugs>	 (Abandoned) Effie Mouzeli: (DNM) Switch Mediawiki main memcache clusters to puppet 7: all hosts [puppet] - https://gerrit.wikimedia.org/r/990661 (https://phabricator.wikimedia.org/T349619) (owner: Effie Mouzeli)
[11:04:50] <wikibugs>	 (CR) Effie Mouzeli: [C: +1] "I will merge after we are done rebooting all mc hosts" [puppet] - https://gerrit.wikimedia.org/r/992738 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[11:05:03] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1007.eqiad.wmnet with reason: host reimage
[11:06:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:08:13] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[11:09:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T355609)', diff saved to https://phabricator.wikimedia.org/P55785 and previous config saved to /var/cache/conftool/dbconfig/20240129-110933-marostegui.json
[11:09:36] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1224.eqiad.wmnet with reason: Maintenance
[11:09:39] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[11:09:49] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1224.eqiad.wmnet with reason: Maintenance
[11:09:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1224 (T355609)', diff saved to https://phabricator.wikimedia.org/P55786 and previous config saved to /var/cache/conftool/dbconfig/20240129-110955-marostegui.json
[11:10:43] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1221/console" [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[11:10:52] <wikibugs>	 (PS3) Effie Mouzeli: deployment_server: add mw-mcrouter service 1 [puppet] - https://gerrit.wikimedia.org/r/979339 (https://phabricator.wikimedia.org/T346690)
[11:11:06] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin1001 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:11:51] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1222/console" [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[11:12:15] <wikibugs>	 (CR) MVernon: [C: +1] sessionstore: provision sessionstore1004 (new) [puppet] - https://gerrit.wikimedia.org/r/989628 (https://phabricator.wikimedia.org/T353402) (owner: Eevans)
[11:12:51] <wikibugs>	 (CR) MVernon: [C: +1] sessionstore: provision sessionstore1005 (new) [puppet] - https://gerrit.wikimedia.org/r/989629 (https://phabricator.wikimedia.org/T353402) (owner: Eevans)
[11:13:16] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1223/co" [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[11:13:27] <wikibugs>	 (PS3) Effie Mouzeli: Add namespace for mw-mcrouter service 2 [deployment-charts] - https://gerrit.wikimedia.org/r/979340 (https://phabricator.wikimedia.org/T346690)
[11:13:33] <wikibugs>	 (CR) MVernon: [C: +1] sessionstore: provision sessionstore1006 (new) [puppet] - https://gerrit.wikimedia.org/r/989630 (https://phabricator.wikimedia.org/T353402) (owner: Eevans)
[11:14:16] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:14:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1224 (T355609)', diff saved to https://phabricator.wikimedia.org/P55787 and previous config saved to /var/cache/conftool/dbconfig/20240129-111434-marostegui.json
[11:14:36] <wikibugs>	 (PS9) Hnowlan: mobileapps: add Cassandra config support [deployment-charts] - https://gerrit.wikimedia.org/r/991032 (https://phabricator.wikimedia.org/T350507)
[11:14:42] <wikibugs>	 (PS6) Slyngshede: P:debmonitor::server install Debmonitor from package. [puppet] - https://gerrit.wikimedia.org/r/993086
[11:14:46] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[11:16:03] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1224/co" [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[11:19:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:19:54] <icinga-wm>	 PROBLEM - Check systemd state on netbox1002 is CRITICAL: CRITICAL - degraded: The following units failed: netbox_report_accounting_run.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:20:02] <icinga-wm>	 PROBLEM - Check systemd state on ganeti2034 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-ganeti-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:24:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:25:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:26:32] <wikibugs>	 (CR) Muehlenhoff: P:debmonitor::server install Debmonitor from package. (1 comment) [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[11:27:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[11:28:07] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-airflow1007.eqiad.wmnet with OS bullseye
[11:29:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P55788 and previous config saved to /var/cache/conftool/dbconfig/20240129-112940-marostegui.json
[11:30:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:32:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:32:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[11:32:40] <icinga-wm>	 RECOVERY - Check systemd state on ganeti2034 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:33:44] <wikibugs>	 (Abandoned) Fabfur: Add missing netmapper for abuse_networks [puppet] - https://gerrit.wikimedia.org/r/991409 (https://phabricator.wikimedia.org/T355158) (owner: Fabfur)
[11:37:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:38:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:38:27] <moritzm>	 !log upload ganeti 3.0.2-3+wmf1 (bookworm package of Ganeti plus backport for SSL chain handling in RAPI) to apt.wikimedia.org T300152
[11:38:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:38:32] <stashbot>	 T300152: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152
[11:39:49] <Dreamy_Jazz>	 !log T354700 - Ran mwscript maintenance/sql.php --wiki=testwiki ~/T354700-create-table.sql
[11:39:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:55] <stashbot>	 T354700: Draft: Add columns user_autocreate_serial.uas_year and global_user_autocreate_serial.uas_year - https://phabricator.wikimedia.org/T354700
[11:41:06] <Dreamy_Jazz>	 !log T354700 - Running `foreachwiki maintenance/sql.php ~/T354700-create-table.sql`
[11:41:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:42:10] <wikibugs>	 SRE, MW-on-K8s, Traffic, serviceops, Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Lucas_Werkmeister_WMDE)
[11:42:29] <wikibugs>	 (PS1) Majavah: hieradata: cloudweb: enable envoy services_proxy on ipv6 [puppet] - https://gerrit.wikimedia.org/r/993673 (https://phabricator.wikimedia.org/T255568)
[11:43:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:44:40] <icinga-wm>	 RECOVERY - Check systemd state on netbox1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:44:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P55789 and previous config saved to /var/cache/conftool/dbconfig/20240129-114446-marostegui.json
[11:44:52] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[11:45:08] <Dreamy_Jazz>	 !log sql.php finished for T354700
[11:45:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:45:13] <stashbot>	 T354700: Draft: Add columns user_autocreate_serial.uas_year and global_user_autocreate_serial.uas_year - https://phabricator.wikimedia.org/T354700
[11:45:30] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1226/co" [puppet] - https://gerrit.wikimedia.org/r/993673 (https://phabricator.wikimedia.org/T255568) (owner: Majavah)
[11:48:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:49:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:49:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[11:53:04] <icinga-wm>	 RECOVERY - Check systemd state on ml-serve1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:53:18] <Dreamy_Jazz>	 !log Running mwscript maintenance/sql.php --wiki=testwiki --wikidb=centralauth ~/T354700-create-table-global.sql for T354700
[11:53:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:23] <stashbot>	 T354700: Draft: Add columns user_autocreate_serial.uas_year and global_user_autocreate_serial.uas_year - https://phabricator.wikimedia.org/T354700
[11:53:24] <icinga-wm>	 RECOVERY - Check systemd state on ml-serve1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:54:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:54:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[11:55:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[11:58:03] <wikibugs>	 (PS1) Stevemunene: Add dummy keytabs for new an-worker1157-1175 [labs/private] - https://gerrit.wikimedia.org/r/993675 (https://phabricator.wikimedia.org/T353776)
[11:59:04] <icinga-wm>	 PROBLEM - Check systemd state on phab2002 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-phabricator-repos.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:59:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1224 (T355609)', diff saved to https://phabricator.wikimedia.org/P55790 and previous config saved to /var/cache/conftool/dbconfig/20240129-115953-marostegui.json
[11:59:55] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
[11:59:59] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[12:00:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[12:00:17] <icinga-wm>	 RECOVERY - Check systemd state on phab2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:00:20] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
[12:00:21] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::airflow::wmde
[12:01:29] <wikibugs>	 (PS1) Muehlenhoff: Switch airflow/wmde to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/993676 (https://phabricator.wikimedia.org/T349619)
[12:05:22] <wikibugs>	 (CR) Slyngshede: P:debmonitor::server install Debmonitor from package. (3 comments) [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[12:06:07] <wikibugs>	 (PS7) Slyngshede: P:debmonitor::server install Debmonitor from package. [puppet] - https://gerrit.wikimedia.org/r/993086
[12:06:08] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1231.eqiad.wmnet with reason: Maintenance
[12:06:09] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ml-serve1003 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[12:06:15] <wikibugs>	 (CR) Slyngshede: P:debmonitor::server install Debmonitor from package. (1 comment) [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[12:06:22] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1231.eqiad.wmnet with reason: Maintenance
[12:06:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1231 (T355609)', diff saved to https://phabricator.wikimedia.org/P55791 and previous config saved to /var/cache/conftool/dbconfig/20240129-120628-marostegui.json
[12:06:33] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[12:09:12] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Switch airflow/wmde to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/993676 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[12:09:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:10:54] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Ship it :-)" [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[12:12:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T355609)', diff saved to https://phabricator.wikimedia.org/P55792 and previous config saved to /var/cache/conftool/dbconfig/20240129-121205-marostegui.json
[12:12:15] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[12:13:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[12:14:47] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::airflow::wmde
[12:17:53] <wikibugs>	 (CR) Slyngshede: [C: +2] P:debmonitor::server install Debmonitor from package. [puppet] - https://gerrit.wikimedia.org/r/993086 (owner: Slyngshede)
[12:18:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[12:19:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:20:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[12:20:39] <icinga-wm>	 RECOVERY - Check systemd state on ganeti2033 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:20:49] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[12:21:17] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host an-airflow1007.eqiad.wmnet
[12:25:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[12:25:18] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1007.eqiad.wmnet
[12:26:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[12:27:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P55793 and previous config saved to /var/cache/conftool/dbconfig/20240129-122713-marostegui.json
[12:27:21] <wikibugs>	 SRE, LDAP: Missing Release Engineering members in LDAP group - https://phabricator.wikimedia.org/T356043 (jnuche)
[12:31:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[12:33:47] <moritzm>	 !log installing openssh security updates
[12:33:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:36:09] <wikibugs>	 (CR) Brouberol: [C: +1] Update the spark-operator image name and version [deployment-charts] - https://gerrit.wikimedia.org/r/993012 (https://phabricator.wikimedia.org/T354273) (owner: Btullis)
[12:37:21] <wikibugs>	 (CR) FNegri: "This looks good, but I'm confused by the difference with the "-standalone" images. I left a comment in the task." [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/991595 (https://phabricator.wikimedia.org/T355231) (owner: Majavah)
[12:41:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[12:42:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P55794 and previous config saved to /var/cache/conftool/dbconfig/20240129-124220-marostegui.json
[12:42:24] <wikibugs>	 (CR) Hnowlan: [C: +2] kubernetes: make 5 jobrunners kubernetes workers [puppet] - https://gerrit.wikimedia.org/r/992973 (https://phabricator.wikimedia.org/T354791) (owner: Hnowlan)
[12:50:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[12:51:50] <jinxer-wm>	 (PuppetFailure) resolved: Puppet has failed on debmonitor2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[12:55:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[12:56:11] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.reimage for host mw2260.codfw.wmnet with OS bullseye
[12:56:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[12:56:25] <wikibugs>	 SRE, MW-on-K8s, serviceops, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin2002 for host mw2260.codfw.wmnet with OS bullseye
[12:57:23] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.reimage for host mw2355.codfw.wmnet with OS bullseye
[12:57:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T355609)', diff saved to https://phabricator.wikimedia.org/P55795 and previous config saved to /var/cache/conftool/dbconfig/20240129-125726-marostegui.json
[12:57:29] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
[12:57:32] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[12:57:38] <wikibugs>	 SRE, MW-on-K8s, serviceops, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin2002 for host mw2355.codfw.wmnet with OS bullseye
[12:57:42] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
[12:57:56] <wikibugs>	 (CR) Cathal Mooney: [C: +1] "LGTM!" [homer/public] - https://gerrit.wikimedia.org/r/993090 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[12:58:44] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.reimage for host mw2381.codfw.wmnet with OS bullseye
[12:58:49] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
[12:58:57] <wikibugs>	 SRE, MW-on-K8s, serviceops, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin2002 for host mw2381.codfw.wmnet with OS bullseye
[12:59:03] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
[12:59:21] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.reimage for host mw2429.codfw.wmnet with OS bullseye
[12:59:21] <jinxer-wm>	 (ProbeDown) firing: (8) Service debmonitor1002:7443 has failed probes (http_debmonitor_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Debmonitor - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:59:34] <wikibugs>	 SRE, MW-on-K8s, serviceops, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin2002 for host mw2429.codfw.wmnet with OS bullseye
[13:00:34] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.reimage for host mw2445.codfw.wmnet with OS bullseye
[13:00:46] <wikibugs>	 SRE, MW-on-K8s, serviceops, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin2002 for host mw2445.codfw.wmnet with OS bullseye
[13:01:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[13:02:33] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2055 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:05:15] <jinxer-wm>	 (AppserversUnreachable) firing: Appserver unavailable for cluster jobrunner at codfw - https://wikitech.wikimedia.org/wiki/Application_servers - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&var-site=codfw&var-cluster=jobrunner - https://alerts.wikimedia.org/?q=alertname%3DAppserversUnreachable
[13:06:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[13:07:05] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
[13:07:18] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
[13:07:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2117 (T355609)', diff saved to https://phabricator.wikimedia.org/P55796 and previous config saved to /var/cache/conftool/dbconfig/20240129-130724-marostegui.json
[13:07:26] <logmsgbot>	 !log brouberol@cumin1002 START - Cookbook sre.hosts.reimage for host an-tool1009.eqiad.wmnet with OS bullseye
[13:07:34] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[13:09:05] <wikibugs>	 (PS1) Volans: setup.py: add missing classifier for Python 3.11 [software/spicerack] - https://gerrit.wikimedia.org/r/993687
[13:09:07] <wikibugs>	 (PS1) Volans: CHANGELOG: add changelogs for release v8.3.0 [software/spicerack] - https://gerrit.wikimedia.org/r/993688
[13:10:23] <wikibugs>	 (PS2) Volans: CHANGELOG: add changelogs for release v8.3.0 [software/spicerack] - https://gerrit.wikimedia.org/r/993688
[13:10:32] <wikibugs>	 (PS2) Slyngshede: Add dependencies for Jquery and debmonitor-client [software/debmonitor] (debian) - https://gerrit.wikimedia.org/r/993083
[13:10:38] <wikibugs>	 SRE-swift-storage, UploadWizard: Problem with uploading large files (2 GB) - https://phabricator.wikimedia.org/T355433 (Jeff_G) >>! In T355433#9485168, @MikhasikRV wrote: >>>! In T355433#9484879, @Jeff_G wrote: >>  >> I was able to download the file as F 1-74-0217.PDF. In case one of us gets it to upload...
[13:11:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[13:12:21] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw2260.codfw.wmnet with reason: host reimage
[13:13:25] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw2355.codfw.wmnet with reason: host reimage
[13:13:28] <wikibugs>	 (PS3) Slyngshede: Add dependencies for Jquery and debmonitor-client [software/debmonitor] (debian) - https://gerrit.wikimedia.org/r/993083
[13:13:55] <wikibugs>	 (CR) Slyngshede: Add dependencies for Jquery and debmonitor-client (1 comment) [software/debmonitor] (debian) - https://gerrit.wikimedia.org/r/993083 (owner: Slyngshede)
[13:14:48] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw2381.codfw.wmnet with reason: host reimage
[13:15:31] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2260.codfw.wmnet with reason: host reimage
[13:16:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2117 (T355609)', diff saved to https://phabricator.wikimedia.org/P55797 and previous config saved to /var/cache/conftool/dbconfig/20240129-131623-marostegui.json
[13:16:32] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[13:16:35] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw2429.codfw.wmnet with reason: host reimage
[13:16:43] <logmsgbot>	 !log brouberol@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on an-tool1009.eqiad.wmnet with reason: host reimage
[13:17:25] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw2445.codfw.wmnet with reason: host reimage
[13:18:11] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2381.codfw.wmnet with reason: host reimage
[13:20:15] <jinxer-wm>	 (AppserversUnreachable) resolved: Appserver unavailable for cluster jobrunner at codfw - https://wikitech.wikimedia.org/wiki/Application_servers - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&var-site=codfw&var-cluster=jobrunner - https://alerts.wikimedia.org/?q=alertname%3DAppserversUnreachable
[13:20:56] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2429.codfw.wmnet with reason: host reimage
[13:21:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[13:22:05] <wikibugs>	 (CR) Volans: [C: +2] setup.py: add missing classifier for Python 3.11 [software/spicerack] - https://gerrit.wikimedia.org/r/993687 (owner: Volans)
[13:23:06] <wikibugs>	 (CR) Volans: [C: +2] CHANGELOG: add changelogs for release v8.3.0 [software/spicerack] - https://gerrit.wikimedia.org/r/993688 (owner: Volans)
[13:23:10] <logmsgbot>	 !log brouberol@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-tool1009.eqiad.wmnet with reason: host reimage
[13:23:16] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes2055 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[13:23:21] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.reimage for host an-airflow1006.eqiad.wmnet with OS bullseye
[13:25:56] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2445.codfw.wmnet with reason: host reimage
[13:26:33] <claime>	 !log Restarting ferm.service on k8s node kubernetes2055 - T354855
[13:26:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:37] <stashbot>	 T354855: ferm sometimes fails to restart on Kubernetes workers via xtables lock held by kube-proxy - https://phabricator.wikimedia.org/T354855
[13:27:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[13:27:34] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2055 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:29:12] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2355.codfw.wmnet with reason: host reimage
[13:29:19] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [software/debmonitor] (debian) - https://gerrit.wikimedia.org/r/993083 (owner: Slyngshede)
[13:30:26] <wikibugs>	 (Merged) jenkins-bot: setup.py: add missing classifier for Python 3.11 [software/spicerack] - https://gerrit.wikimedia.org/r/993687 (owner: Volans)
[13:30:28] <wikibugs>	 (Merged) jenkins-bot: CHANGELOG: add changelogs for release v8.3.0 [software/spicerack] - https://gerrit.wikimedia.org/r/993688 (owner: Volans)
[13:30:36] <wikibugs>	 (CR) Slyngshede: [C: +2] Add dependencies for Jquery and debmonitor-client [software/debmonitor] (debian) - https://gerrit.wikimedia.org/r/993083 (owner: Slyngshede)
[13:31:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P55798 and previous config saved to /var/cache/conftool/dbconfig/20240129-133129-marostegui.json
[13:32:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[13:33:22] <wikibugs>	 (Merged) jenkins-bot: Add dependencies for Jquery and debmonitor-client [software/debmonitor] (debian) - https://gerrit.wikimedia.org/r/993083 (owner: Slyngshede)
[13:33:58] <wikibugs>	 (PS1) Hashar: wm-checks-api: direct link to build when only one failed [software/gerrit] (deploy/wmf/stable-3.7) - https://gerrit.wikimedia.org/r/993689 (https://phabricator.wikimedia.org/T355774)
[13:35:00] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2260.codfw.wmnet with OS bullseye
[13:35:08] <wikibugs>	 SRE, MW-on-K8s, serviceops: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin2002 for host mw2260.codfw.wmnet with OS bullseye completed: - mw2260 (**PASS**)   - Downtimed on Icinga/Alertma...
[13:36:05] <wikibugs>	 (PS1) Volans: Upstream release v8.3.0 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/993690
[13:36:46] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1006.eqiad.wmnet with reason: host reimage
[13:37:14] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2381.codfw.wmnet with OS bullseye
[13:37:23] <wikibugs>	 SRE, MW-on-K8s, serviceops: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin2002 for host mw2381.codfw.wmnet with OS bullseye completed: - mw2381 (**PASS**)   - Downtimed on Icinga/Alertma...
[13:38:55] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[13:39:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[13:40:02] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1006.eqiad.wmnet with reason: host reimage
[13:40:09] <wikibugs>	 (PS1) Jcrespo: dbbackups: Productionize the grants needed to backup ipoid database [puppet] - https://gerrit.wikimedia.org/r/993691 (https://phabricator.wikimedia.org/T355884)
[13:40:10] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2429.codfw.wmnet with OS bullseye
[13:40:19] <wikibugs>	 SRE, MW-on-K8s, serviceops: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin2002 for host mw2429.codfw.wmnet with OS bullseye completed: - mw2429 (**WARN**)   - Downtimed on Icinga/Alertma...
[13:42:17] <wikibugs>	 (CR) Arnaudb: [C: +1] dbbackups: Productionize the grants needed to backup ipoid database [puppet] - https://gerrit.wikimedia.org/r/993691 (https://phabricator.wikimedia.org/T355884) (owner: Jcrespo)
[13:43:41] <wikibugs>	 (CR) Volans: [C: +2] Upstream release v8.3.0 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/993690 (owner: Volans)
[13:44:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[13:44:21] <jinxer-wm>	 (ProbeDown) firing: (8) Service debmonitor1002:7443 has failed probes (http_debmonitor_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Debmonitor - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:45:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[13:46:07] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2445.codfw.wmnet with OS bullseye
[13:46:18] <wikibugs>	 SRE, MW-on-K8s, serviceops: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin2002 for host mw2445.codfw.wmnet with OS bullseye completed: - mw2445 (**PASS**)   - Downtimed on Icinga/Alertma...
[13:46:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P55799 and previous config saved to /var/cache/conftool/dbconfig/20240129-134636-marostegui.json
[13:48:05] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2355.codfw.wmnet with OS bullseye
[13:48:13] <wikibugs>	 SRE, MW-on-K8s, serviceops: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin2002 for host mw2355.codfw.wmnet with OS bullseye completed: - mw2355 (**PASS**)   - Downtimed on Icinga/Alertma...
[13:49:09] <wikibugs>	 (CR) Jcrespo: [C: +2] dbbackups: Productionize the grants needed to backup ipoid database [puppet] - https://gerrit.wikimedia.org/r/993691 (https://phabricator.wikimedia.org/T355884) (owner: Jcrespo)
[13:50:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[13:50:23] <wikibugs>	 (PS2) Anzx: hewikinews: remove wgExtraGenderNamespaces and add wgNamespaceAliases [mediawiki-config] - https://gerrit.wikimedia.org/r/993494 (https://phabricator.wikimedia.org/T349581)
[13:50:32] <wikibugs>	 (Merged) jenkins-bot: Upstream release v8.3.0 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/993690 (owner: Volans)
[13:51:46] <wikibugs>	 (PS4) Anzx: knwiki: add portal namespace and fix talkpagenames of draft and module namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/992783 (https://phabricator.wikimedia.org/T355662)
[13:52:20] <wikibugs>	 (PS14) Anzx: uzwiki: revert temporary logo for the 20th anniversary [mediawiki-config] - https://gerrit.wikimedia.org/r/992371 (https://phabricator.wikimedia.org/T353723)
[13:53:34] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes2055 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[13:53:48] <icinga-wm>	 RECOVERY - debmonitor.discovery.wmnet:443 internal on debmonitor2003 is OK: HTTP OK: Status line output matched HTTP/1.1 200 - 680 bytes in 0.165 second response time https://wikitech.wikimedia.org/wiki/Debmonitor
[13:54:14] <volans>	 !log uploaded spicerack_8.3.0 to apt.wikimedia.org bullseye-wikimedia
[13:54:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:36] <icinga-wm>	 PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_mw-web_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:58:18] <wikibugs>	 (CR) Gehel: [C: +1] Update the spark-operator image name and version [deployment-charts] - https://gerrit.wikimedia.org/r/993012 (https://phabricator.wikimedia.org/T354273) (owner: Btullis)
[14:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: It is that lovely time of the day again! You are hereby commanded to deploy UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240129T1400).
[14:00:05] <jouncebot>	 anzx: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:11] <anzx>	 o/
[14:01:12] <wikibugs>	 (CR) Marostegui: [C: +1] "Let's start some manual runs to see how this goes before scheduling it for a daily run" [puppet] - https://gerrit.wikimedia.org/r/979390 (https://phabricator.wikimedia.org/T207253) (owner: Ladsgroup)
[14:01:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2117 (T355609)', diff saved to https://phabricator.wikimedia.org/P55801 and previous config saved to /var/cache/conftool/dbconfig/20240129-140142-marostegui.json
[14:01:45] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
[14:01:52] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[14:01:59] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
[14:02:06] <Lucas_WMDE>	 o/
[14:02:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2124 (T355609)', diff saved to https://phabricator.wikimedia.org/P55802 and previous config saved to /var/cache/conftool/dbconfig/20240129-140205-marostegui.json
[14:02:23] <wikibugs>	 (CR) Hashar: [C: +2] wm-checks-api: direct link to build when only one failed [software/gerrit] (deploy/wmf/stable-3.7) - https://gerrit.wikimedia.org/r/993689 (https://phabricator.wikimedia.org/T355774) (owner: Hashar)
[14:02:55] <wikibugs>	 (Merged) jenkins-bot: wm-checks-api: direct link to build when only one failed [software/gerrit] (deploy/wmf/stable-3.7) - https://gerrit.wikimedia.org/r/993689 (https://phabricator.wikimedia.org/T355774) (owner: Hashar)
[14:03:01] <Dreamy_Jazz>	 I would like to do some backports in this window which I can self serve
[14:03:13] <hashar>	 I will do my Gerrit plugin update once the backport window has completed
[14:03:21] <Lucas_WMDE>	 I guess I’ll start with anzx’ changes then
[14:03:27] <Dreamy_Jazz>	 👍
[14:03:38] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin1001 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[14:03:55] <wikibugs>	 (PS15) Lucas Werkmeister (WMDE): uzwiki: revert temporary logo for the 20th anniversary [mediawiki-config] - https://gerrit.wikimedia.org/r/992371 (https://phabricator.wikimedia.org/T353723) (owner: Anzx)
[14:04:06] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/992371 (https://phabricator.wikimedia.org/T353723) (owner: Anzx)
[14:04:26] <Lucas_WMDE>	 wow that sure is a lot of T356024 in logspam-watch
[14:04:26] <stashbot>	 T356024: TypeError: Argument 4 passed to Wikimedia\Parsoid\Utils\Title::__construct() must be of the type string, null given, called in /srv/mediawiki/php-1.42.0-wmf.15/vendor/wikimedia/parsoid/src/Utils/Title.php on line 392 - https://phabricator.wikimedia.org/T356024
[14:04:47] <wikibugs>	 (Merged) jenkins-bot: uzwiki: revert temporary logo for the 20th anniversary [mediawiki-config] - https://gerrit.wikimedia.org/r/992371 (https://phabricator.wikimedia.org/T353723) (owner: Anzx)
[14:04:59] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [[gerrit:992371|uzwiki: revert temporary logo for the 20th anniversary (T353723)]]
[14:05:20] <stashbot>	 T353723: Requesting temporary logo change for uz.wikipedia.org - https://phabricator.wikimedia.org/T353723
[14:07:23] <wikibugs>	 SRE, Infrastructure-Foundations: Updated java security policy in OpenJDK 8 u265 - https://phabricator.wikimedia.org/T261196 (MoritzMuehlenhoff) p:Triage→Low
[14:07:24] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and anzx: Backport for [[gerrit:992371|uzwiki: revert temporary logo for the 20th anniversary (T353723)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:07:26] <anzx>	 Lucas_WMDE: checking 
[14:07:29] <Lucas_WMDE>	 ok
[14:07:33] <Lucas_WMDE>	 looking at the knwiki change at the moment
[14:08:30] <anzx>	 Lucas_WMDE: looks good 
[14:09:11] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and anzx: Continuing with sync
[14:09:37] <wikibugs>	 (PS1) Brouberol: hue: rename python-snappy apt dependency [puppet] - https://gerrit.wikimedia.org/r/993692 (https://phabricator.wikimedia.org/T349400)
[14:10:22] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): knwiki: add portal namespace and fix talkpagenames of draft and module namespace (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/992783 (https://phabricator.wikimedia.org/T355662) (owner: Anzx)
[14:10:37] <wikibugs>	 (CR) Gehel: [C: +1] hue: rename python-snappy apt dependency [puppet] - https://gerrit.wikimedia.org/r/993692 (https://phabricator.wikimedia.org/T349400) (owner: Brouberol)
[14:10:54] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-airflow1006.eqiad.wmnet with OS bullseye
[14:11:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2124 (T355609)', diff saved to https://phabricator.wikimedia.org/P55803 and previous config saved to /var/cache/conftool/dbconfig/20240129-141111-marostegui.json
[14:11:12] <wikibugs>	 (PS1) Majavah: P:toolforge: mailrelay: workaround Exim 4.94 taints [puppet] - https://gerrit.wikimedia.org/r/993693 (https://phabricator.wikimedia.org/T311910)
[14:11:16] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[14:12:42] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1227/co" [puppet] - https://gerrit.wikimedia.org/r/993693 (https://phabricator.wikimedia.org/T311910) (owner: Majavah)
[14:13:52] <wikibugs>	 (CR) Brouberol: [C: +2] hue: rename python-snappy apt dependency [puppet] - https://gerrit.wikimedia.org/r/993692 (https://phabricator.wikimedia.org/T349400) (owner: Brouberol)
[14:14:51] <wikibugs>	 (PS5) Anzx: knwiki: add portal namespace and fix talkpagenames of draft and module namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/992783 (https://phabricator.wikimedia.org/T355662)
[14:14:59] <wikibugs>	 (CR) Anzx: knwiki: add portal namespace and fix talkpagenames of draft and module namespace (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/992783 (https://phabricator.wikimedia.org/T355662) (owner: Anzx)
[14:15:20] <wikibugs>	 (CR) CI reject: [V: -1] P:toolforge: mailrelay: workaround Exim 4.94 taints [puppet] - https://gerrit.wikimedia.org/r/993693 (https://phabricator.wikimedia.org/T311910) (owner: Majavah)
[14:15:22] <icinga-wm>	 RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:16:01] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [[gerrit:992371|uzwiki: revert temporary logo for the 20th anniversary (T353723)]] (duration: 11m 01s)
[14:16:06] <stashbot>	 T353723: Requesting temporary logo change for uz.wikipedia.org - https://phabricator.wikimedia.org/T353723
[14:16:27] <wikibugs>	 (PS1) Hashar: gerrit: move soy templates files to unique namespaces [puppet] - https://gerrit.wikimedia.org/r/993694
[14:17:01] <wikibugs>	 (PS1) Dreamy Jazz: Send email if file is uploaded that is already a match [extensions/MediaModeration] (wmf/1.42.0-wmf.15) - https://gerrit.wikimedia.org/r/993499 (https://phabricator.wikimedia.org/T355357)
[14:17:08] <wikibugs>	 (PS2) Majavah: P:toolforge: mailrelay: workaround Exim 4.94 taints [puppet] - https://gerrit.wikimedia.org/r/993693 (https://phabricator.wikimedia.org/T311910)
[14:17:12] <wikibugs>	 (PS1) Dreamy Jazz: Make the email subject unique for positive match emails [extensions/MediaModeration] (wmf/1.42.0-wmf.15) - https://gerrit.wikimedia.org/r/993500 (https://phabricator.wikimedia.org/T355752)
[14:17:33] <volans>	 !log upgraded spicerack to 8.3.0 on cumin2002
[14:17:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:06] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1228/co" [puppet] - https://gerrit.wikimedia.org/r/993693 (https://phabricator.wikimedia.org/T311910) (owner: Majavah)
[14:18:58] <wikibugs>	 (PS6) Lucas Werkmeister (WMDE): knwiki: add portal namespace and fix talkpagenames of draft and module namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/992783 (https://phabricator.wikimedia.org/T355662) (owner: Anzx)
[14:19:43] <wikibugs>	 SRE, Infrastructure-Foundations, serviceops: httpbb needs to be setup on cumin1002 and removed from cumin1001 - https://phabricator.wikimedia.org/T356054 (Volans)
[14:20:14] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/992783 (https://phabricator.wikimedia.org/T355662) (owner: Anzx)
[14:20:57] <wikibugs>	 (Merged) jenkins-bot: knwiki: add portal namespace and fix talkpagenames of draft and module namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/992783 (https://phabricator.wikimedia.org/T355662) (owner: Anzx)
[14:21:12] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [[gerrit:992783|knwiki: add portal namespace and fix talkpagenames of draft and module namespace (T355662 T346583)]]
[14:21:18] <stashbot>	 T355662: Create portal namespace on kannada wikipedia  - https://phabricator.wikimedia.org/T355662
[14:21:19] <stashbot>	 T346583: Change namespace names for Kannada Language - https://phabricator.wikimedia.org/T346583
[14:21:28] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
[14:22:31] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 anzx and lucaswerkmeister-wmde: Backport for [[gerrit:992783|knwiki: add portal namespace and fix talkpagenames of draft and module namespace (T355662 T346583)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:22:36] <wikibugs>	 (PS1) Hashar: gerrit: sync soy email template with version 3.7 [puppet] - https://gerrit.wikimedia.org/r/993695 (https://phabricator.wikimedia.org/T355259)
[14:22:42] <anzx>	 Checking 
[14:23:36] <anzx>	 Lucas_WMDE: looks good 
[14:23:40] <Lucas_WMDE>	 ok!
[14:23:44] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 anzx and lucaswerkmeister-wmde: Continuing with sync
[14:23:58] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ceph2001.codfw.wmnet with OS bullseye
[14:24:40] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin1001 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[14:24:42] <Lucas_WMDE>	 anzx: for hewikinews – if I understand correctly, they just wanted those extra words (מש etc.) to be additional aliases for the namespace, but instead they accidentally overrode the default gendered namespace from MediaWiki?
[14:25:34] <anzx>	 Lucas_WMDE: yes, they asked for aliases only
[14:26:14] <wikibugs>	 (PS2) Dreamy Jazz: Send email if file is uploaded that is already a match [extensions/MediaModeration] (wmf/1.42.0-wmf.15) - https://gerrit.wikimedia.org/r/993499 (https://phabricator.wikimedia.org/T355357)
[14:26:18] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P55804 and previous config saved to /var/cache/conftool/dbconfig/20240129-142617-marostegui.json
[14:26:29] <Lucas_WMDE>	 and $wgExtraGenderNamespaces overrides https://gerrit.wikimedia.org/g/mediawiki/core/+/4637824f68/languages/messages/MessagesHe.php#31 ?
[14:26:41] <Lucas_WMDE>	 oops, https://gerrit.wikimedia.org/g/mediawiki/core/+/4637824f68/languages/messages/MessagesHe.php#35 (wrong line number)
[14:27:18] <Lucas_WMDE>	 hm, but https://he.wikinews.org/wiki/משתמש:Lucas_Werkmeister_(WMDE) still shows משתמש, not מש
[14:28:18] <wikibugs>	 (PS1) Majavah: P:toolforge::mailrelay: add Authentication-Results header [puppet] - https://gerrit.wikimedia.org/r/993697 (https://phabricator.wikimedia.org/T354112)
[14:30:11] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [[gerrit:992783|knwiki: add portal namespace and fix talkpagenames of draft and module namespace (T355662 T346583)]] (duration: 08m 58s)
[14:30:17] <stashbot>	 T355662: Create portal namespace on kannada wikipedia  - https://phabricator.wikimedia.org/T355662
[14:30:18] <stashbot>	 T346583: Change namespace names for Kannada Language - https://phabricator.wikimedia.org/T346583
[14:30:20] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::airflow::analytics_product
[14:31:32] <anzx>	 Lucas_WMDE: the user reported seeing the short-word alias on recent changes https://phabricator.wikimedia.org/T349581#9490150
[14:31:42] <wikibugs>	 (PS1) Muehlenhoff: Switch analytics_cluster::airflow::analytics_product to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/993699 (https://phabricator.wikimedia.org/T349619)
[14:32:58] <Lucas_WMDE>	 I don’t really understand it but let’s try it anyways I guess
[14:33:05] <wikibugs>	 (PS3) Lucas Werkmeister (WMDE): hewikinews: remove wgExtraGenderNamespaces and add wgNamespaceAliases [mediawiki-config] - https://gerrit.wikimedia.org/r/993494 (https://phabricator.wikimedia.org/T349581) (owner: Anzx)
[14:33:11] <wikibugs>	 SRE, MW-on-K8s, serviceops: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791 (hnowlan)
[14:33:15] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/993494 (https://phabricator.wikimedia.org/T349581) (owner: Anzx)
[14:33:57] <wikibugs>	 (Merged) jenkins-bot: hewikinews: remove wgExtraGenderNamespaces and add wgNamespaceAliases [mediawiki-config] - https://gerrit.wikimedia.org/r/993494 (https://phabricator.wikimedia.org/T349581) (owner: Anzx)
[14:34:12] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [[gerrit:993494|hewikinews: remove wgExtraGenderNamespaces and add wgNamespaceAliases (T349581)]]
[14:34:27] <stashbot>	 T349581: Create draft namespace and add namespaces aliases for hewikinews - https://phabricator.wikimedia.org/T349581
[14:35:08] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Switch analytics_cluster::airflow::analytics_product to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/993699 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[14:36:03] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 anzx and lucaswerkmeister-wmde: Backport for [[gerrit:993494|hewikinews: remove wgExtraGenderNamespaces and add wgNamespaceAliases (T349581)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:36:17] <anzx>	 Lucas_WMDE: checking 
[14:37:11] <wikibugs>	 SRE, ops-codfw, Infrastructure-Foundations, netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (cmooney)
[14:37:25] <logmsgbot>	 !log brouberol@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-tool1009.eqiad.wmnet with OS bullseye
[14:38:40] <anzx>	 Lucas_WMDE: all aliases are working 
[14:38:41] <wikibugs>	 (PS1) Brouberol: Revert "hue: rename python-snappy apt dependency" [puppet] - https://gerrit.wikimedia.org/r/993501
[14:39:09] <wikibugs>	 SRE, ops-codfw, Data-Persistence, Data-Persistence-Backup, and 2 others: Migrate servers in codfw rack B4 from asw-b4-codfw to lsw1-b4-codfw - https://phabricator.wikimedia.org/T355860 (Marostegui)
[14:39:23] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:39:23] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A3 from asw-a3-codfw to lsw1-a3-codfw - https://phabricator.wikimedia.org/T355862 (Marostegui)
[14:39:38] <wikibugs>	 (CR) Btullis: [C: +1] Revert "hue: rename python-snappy apt dependency" [puppet] - https://gerrit.wikimedia.org/r/993501 (owner: Brouberol)
[14:39:44] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A4 from asw-a4-codfw to lsw1-a4-codfw - https://phabricator.wikimedia.org/T355863 (Marostegui)
[14:40:10] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 anzx and lucaswerkmeister-wmde: Continuing with sync
[14:40:19] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::airflow::analytics_product
[14:41:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P55806 and previous config saved to /var/cache/conftool/dbconfig/20240129-144124-marostegui.json
[14:41:30] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw - https://phabricator.wikimedia.org/T355864 (Marostegui)
[14:41:41] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[14:41:48] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw - https://phabricator.wikimedia.org/T355866 (Marostegui)
[14:42:05] <logmsgbot>	 !log ayounsi@cumin2002 START - Cookbook sre.ganeti.makevm for new host sretest1005.eqiad.wmnet
[14:42:07] <logmsgbot>	 !log ayounsi@cumin2002 START - Cookbook sre.dns.netbox
[14:42:47] <wikibugs>	 (CR) CI reject: [V: -1] Revert "hue: rename python-snappy apt dependency" [puppet] - https://gerrit.wikimedia.org/r/993501 (owner: Brouberol)
[14:42:56] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host an-airflow1006.eqiad.wmnet
[14:43:10] <wikibugs>	 (CR) Joal: cassandra: create template for aqsloader role & grants (2 comments) [puppet] - https://gerrit.wikimedia.org/r/993102 (https://phabricator.wikimedia.org/T355917) (owner: Eevans)
[14:43:55] <Dreamy_Jazz>	 Going to +2 both my backports to get them through CI
[14:44:24] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A8 from asw-a8-codfw to lsw1-a8-codfw - https://phabricator.wikimedia.org/T355874 (Marostegui)
[14:44:34] <Dreamy_Jazz>	 I probably have a third too, but gerrit doesn't let me cherry-pick it until the others are done first due to merge conflicts.
[14:44:36] <wikibugs>	 SRE, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack B3 from asw-b3-codfw to lsw1-b3-codfw - https://phabricator.wikimedia.org/T355870 (Marostegui)
[14:44:44] <wikibugs>	 (CR) Dreamy Jazz: [C: +2] Make the email subject unique for positive match emails [extensions/MediaModeration] (wmf/1.42.0-wmf.15) - https://gerrit.wikimedia.org/r/993500 (https://phabricator.wikimedia.org/T355752) (owner: Dreamy Jazz)
[14:44:46] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack B6 from asw-b6-codfw to lsw1-b6-codfw - https://phabricator.wikimedia.org/T355871 (Marostegui)
[14:44:47] <Lucas_WMDE>	 ack
[14:44:56] <wikibugs>	 (CR) Dreamy Jazz: [C: +2] Send email if file is uploaded that is already a match [extensions/MediaModeration] (wmf/1.42.0-wmf.15) - https://gerrit.wikimedia.org/r/993499 (https://phabricator.wikimedia.org/T355357) (owner: Dreamy Jazz)
[14:44:57] <wikibugs>	 (PS1) Hnowlan: tegola-vector-tiles: add maps primaries to config [deployment-charts] - https://gerrit.wikimedia.org/r/993700 (https://phabricator.wikimedia.org/T355892)
[14:44:59] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw - https://phabricator.wikimedia.org/T355873 (Marostegui)
[14:45:15] <wikibugs>	 (CR) Muehlenhoff: Revert "hue: rename python-snappy apt dependency" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/993501 (owner: Brouberol)
[14:45:18] <Lucas_WMDE>	 re “This means that the inbox of the email addresses displays each report as a reply to previous reports”, I’m tempted to say that the solution is to stop using email clients that hallucinate In-Reply-To headers :P
[14:45:21] <Lucas_WMDE>	 but who am I kidding
[14:45:33] <Lucas_WMDE>	 google does whatever google wants
[14:45:43] <Lucas_WMDE>	 and everyone else just has to live with it
[14:45:50] <Dreamy_Jazz>	 Yup :D
[14:46:14] * Lucas_WMDE is not at all mad about T355712 either
[14:46:42] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [[gerrit:993494|hewikinews: remove wgExtraGenderNamespaces and add wgNamespaceAliases (T349581)]] (duration: 12m 29s)
[14:46:46] <Lucas_WMDE>	 Dreamy_Jazz: over to you
[14:46:48] <stashbot>	 T349581: Create draft namespace and add namespaces aliases for hewikinews - https://phabricator.wikimedia.org/T349581
[14:46:49] <anzx>	 Lucas_WMDE: thank you 
[14:46:53] <Lucas_WMDE>	 (fyi hashar)
[14:46:58] <Dreamy_Jazz>	 Lucas_WMDE: Thanks.
[14:46:59] <Lucas_WMDE>	 anzx: np :)
[14:47:06] <hashar>	 thank you!
[14:47:18] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by dreamyjazz@deploy2002 using scap backport" [extensions/MediaModeration] (wmf/1.42.0-wmf.15) - https://gerrit.wikimedia.org/r/993500 (https://phabricator.wikimedia.org/T355752) (owner: Dreamy Jazz)
[14:47:23] <wikibugs>	 (Merged) jenkins-bot: Make the email subject unique for positive match emails [extensions/MediaModeration] (wmf/1.42.0-wmf.15) - https://gerrit.wikimedia.org/r/993500 (https://phabricator.wikimedia.org/T355752) (owner: Dreamy Jazz)
[14:47:26] <wikibugs>	 (Merged) jenkins-bot: Send email if file is uploaded that is already a match [extensions/MediaModeration] (wmf/1.42.0-wmf.15) - https://gerrit.wikimedia.org/r/993499 (https://phabricator.wikimedia.org/T355357) (owner: Dreamy Jazz)
[14:47:32] <wikibugs>	 (PS2) Brouberol: Revert "hue: rename python-snappy apt dependency" [puppet] - https://gerrit.wikimedia.org/r/993501
[14:47:36] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap: Backport for [[gerrit:993500|Make the email subject unique for positive match emails (T355752)]]
[14:47:41] <stashbot>	 T355752: Make the email subject unique for MediaModeration emails - https://phabricator.wikimedia.org/T355752
[14:47:42] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A3 from asw-a3-codfw to lsw1-a3-codfw - https://phabricator.wikimedia.org/T355862 (Marostegui) db2142 - x2 master db2103 - s1 master es2020 - es4 master
[14:48:19] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A8 from asw-a8-codfw to lsw1-a8-codfw - https://phabricator.wikimedia.org/T355874 (Marostegui) db2146 - slave db2106 - slave
[14:48:38] <wikibugs>	 (PS1) Hnowlan: conftool: restore maps primary servers to kartotherian pool [puppet] - https://gerrit.wikimedia.org/r/993702 (https://phabricator.wikimedia.org/T355892)
[14:48:41] <wikibugs>	 SRE, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack B3 from asw-b3-codfw to lsw1-b3-codfw - https://phabricator.wikimedia.org/T355870 (Marostegui) db2108 - slave db2123 - slave es2021 - es4 master
[14:48:56] <logmsgbot>	 !log ayounsi@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM sretest1005.eqiad.wmnet - ayounsi@cumin2002"
[14:49:30] <wikibugs>	 (PS1) Dreamy Jazz: Follow-up changes for MediaModerationEmailer service [extensions/MediaModeration] (wmf/1.42.0-wmf.15) - https://gerrit.wikimedia.org/r/993502 (https://phabricator.wikimedia.org/T351407)
[14:49:49] <logmsgbot>	 !log ayounsi@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM sretest1005.eqiad.wmnet - ayounsi@cumin2002"
[14:49:49] <logmsgbot>	 !log ayounsi@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:49:50] <logmsgbot>	 !log ayounsi@cumin2002 START - Cookbook sre.dns.wipe-cache sretest1005.eqiad.wmnet on all recursors
[14:49:53] <logmsgbot>	 !log ayounsi@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1005.eqiad.wmnet on all recursors
[14:50:17] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A4 from asw-a4-codfw to lsw1-a4-codfw - https://phabricator.wikimedia.org/T355863 (Marostegui) db2183 - codfw backup master @jcrespo
[14:50:19] <logmsgbot>	 !log ayounsi@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM sretest1005.eqiad.wmnet - ayounsi@cumin2002"
[14:50:27] <logmsgbot>	 !log jiji@cumin1002 START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
[14:51:05] <Dreamy_Jazz>	 scap backport is waiting a while on `K8s images build/push output redirected to /home/dreamyjazz/scap-image-build-and-push-log`
[14:51:10] <logmsgbot>	 !log ayounsi@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM sretest1005.eqiad.wmnet - ayounsi@cumin2002"
[14:51:50] <logmsgbot>	 !log dreamyjazz@deploy2002 sync-world aborted: Backport for [[gerrit:993500|Make the email subject unique for positive match emails (T355752)]] (duration: 04m 13s)
[14:51:51] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw - https://phabricator.wikimedia.org/T355864 (Marostegui) db2121 - slave  db2132 m1 master (not used) db2145 - slave db2104 - m2 master db2153 - slave db2154 - slave db2...
[14:52:00] <Lucas_WMDE>	 ohhhh, it touched i18n/
[14:52:05] <Lucas_WMDE>	 that might make for a larger sync than usual
[14:52:14] <Lucas_WMDE>	 though I think the worst offender of this was fixed recently-ish
[14:52:16] <Lucas_WMDE>	 not sure
[14:52:17] <Dreamy_Jazz>	 Oh I see.
[14:52:21] <wikibugs>	 (PS3) Brouberol: Revert "hue: rename python-snappy apt dependency" [puppet] - https://gerrit.wikimedia.org/r/993501
[14:52:38] <logmsgbot>	 !log ayounsi@cumin2002 START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
[14:52:38] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap: Backport for [[gerrit:993500|Make the email subject unique for positive match emails (T355752)]]
[14:52:46] <stashbot>	 T355752: Make the email subject unique for MediaModeration emails - https://phabricator.wikimedia.org/T355752
[14:53:00] <Dreamy_Jazz>	 I'll be patient then :)
[14:53:25] <wikibugs>	 (CR) Paladox: [C: +1] gerrit: move soy templates files to unique namespaces [puppet] - https://gerrit.wikimedia.org/r/993694 (owner: Hashar)
[14:53:29] <wikibugs>	 (CR) CI reject: [V: -1] Revert "hue: rename python-snappy apt dependency" [puppet] - https://gerrit.wikimedia.org/r/993501 (owner: Brouberol)
[14:53:33] <wikibugs>	 (CR) Paladox: [C: +1] gerrit: sync soy email template with version 3.7 [puppet] - https://gerrit.wikimedia.org/r/993695 (https://phabricator.wikimedia.org/T355259) (owner: Hashar)
[14:53:37] <wikibugs>	 (CR) Brouberol: Revert "hue: rename python-snappy apt dependency" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/993501 (owner: Brouberol)
[14:53:46] <logmsgbot>	 !log ayounsi@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS bookworm
[14:53:46] <logmsgbot>	 !log ayounsi@cumin2002 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host sretest1005.eqiad.wmnet
[14:54:07] <Dreamy_Jazz>	 !log scap backport is also backporting 993499 for T355357
[14:54:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:54:13] <stashbot>	 T355357: Send an email to indicate a match if a file is uploaded that is already marked as a match - https://phabricator.wikimedia.org/T355357
[14:54:31] <wikibugs>	 (PS4) Brouberol: Revert "hue: rename python-snappy apt dependency" [puppet] - https://gerrit.wikimedia.org/r/993501
[14:55:40] <wikibugs>	 (CR) CI reject: [V: -1] Revert "hue: rename python-snappy apt dependency" [puppet] - https://gerrit.wikimedia.org/r/993501 (owner: Brouberol)
[14:56:11] <wikibugs>	 (CR) Paladox: [C: +1] gerrit: sync soy email template with version 3.7 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/993695 (https://phabricator.wikimedia.org/T355259) (owner: Hashar)
[14:56:23] <logmsgbot>	 !log jiji@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
[14:56:31] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2124 (T355609)', diff saved to https://phabricator.wikimedia.org/P55807 and previous config saved to /var/cache/conftool/dbconfig/20240129-145630-marostegui.json
[14:56:32] <logmsgbot>	 !log ayounsi@cumin2002 START - Cookbook sre.hosts.decommission for hosts sretest1005.eqiad.wmnet
[14:56:33] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
[14:56:36] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[14:56:47] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
[14:56:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2129 (T355609)', diff saved to https://phabricator.wikimedia.org/P55808 and previous config saved to /var/cache/conftool/dbconfig/20240129-145652-marostegui.json
[14:57:22] <Lucas_WMDE>	 dammit, I can’t find the task I remember that made scap faster in certain situations where i18n was touched
[14:57:28] <Lucas_WMDE>	 so far I’ve only found T307277 which is still open
[14:57:28] <stashbot>	 T307277: Make it easier to deploy backports with i18n changes - https://phabricator.wikimedia.org/T307277
[14:57:53] <logmsgbot>	 !log jiji@cumin1002 START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
[14:58:31] <Dreamy_Jazz>	 It has now proceeded to docker_pull_k8s
[14:58:37] <logmsgbot>	 !log hashar@deploy2002 Started deploy [gerrit/gerrit@5594608]: wm-checks-api: direct link to build when only one failed - T355774
[14:58:42] <stashbot>	 T355774: One-click access to build logs gone after upgrade - https://phabricator.wikimedia.org/T355774
[14:58:45] <logmsgbot>	 !log hashar@deploy2002 Finished deploy [gerrit/gerrit@5594608]: wm-checks-api: direct link to build when only one failed - T355774 (duration: 00m 07s)
[14:58:49] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw - https://phabricator.wikimedia.org/T355866 (Marostegui) db2155 - slave db2156 - slave db2097 - backups slave @jcrespo  db2105 - s3 master db2122 - slave db2133 - m2 ma...
[14:59:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2129 (T355609)', diff saved to https://phabricator.wikimedia.org/P55809 and previous config saved to /var/cache/conftool/dbconfig/20240129-145902-marostegui.json
[14:59:07] <wikibugs>	 (CR) Effie Mouzeli: [C: +2] mc: Switch to Puppet 7 on the role level [puppet] - https://gerrit.wikimedia.org/r/992738 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[14:59:23] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:00:36] <logmsgbot>	 !log ayounsi@cumin2002 START - Cookbook sre.dns.netbox
[15:03:12] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack B6 from asw-b6-codfw to lsw1-b6-codfw - https://phabricator.wikimedia.org/T355871 (Marostegui) db2098 - backup slave @jcrespo  db2110 - slave db2111 - slave db2124 - slave db2134 - m3 master (not used) db20...
[15:04:17] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:993500|Make the email subject unique for positive match emails (T355752)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[15:04:20] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync
[15:04:22] <stashbot>	 T355752: Make the email subject unique for MediaModeration emails - https://phabricator.wikimedia.org/T355752
[15:04:25] <logmsgbot>	 !log jiji@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
[15:04:55] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw - https://phabricator.wikimedia.org/T355873 (Marostegui) db2148 - slave db2163 - slave db2185 zarcillo dc master (nothing required) db2164 - slave db2189 - slave es2029...
[15:05:03] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A4 from asw-a4-codfw to lsw1-a4-codfw - https://phabricator.wikimedia.org/T355863 (jcrespo) Thank you, I will shutdown media backups anyway every time one host is affected, not just this one, to minimize fa...
[15:07:27] <wikibugs>	 (PS1) Gerrit maintenance bot: mariadb: Promote db2112 to s1 master [puppet] - https://gerrit.wikimedia.org/r/993469 (https://phabricator.wikimedia.org/T356059)
[15:07:31] <wikibugs>	 (PS1) Gerrit maintenance bot: wmnet: Update s1-master alias [dns] - https://gerrit.wikimedia.org/r/993470 (https://phabricator.wikimedia.org/T356059)
[15:08:04] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A3 from asw-a3-codfw to lsw1-a3-codfw - https://phabricator.wikimedia.org/T355862 (Marostegui)
[15:09:27] <wikibugs>	 (CR) Hashar: "I have made a diff between upstream and our Puppet files but had the files inverted in my diff editor  :)" [puppet] - https://gerrit.wikimedia.org/r/993695 (https://phabricator.wikimedia.org/T355259) (owner: Hashar)
[15:09:43] <wikibugs>	 (PS2) Hashar: gerrit: sync soy email template with version 3.7 [puppet] - https://gerrit.wikimedia.org/r/993695 (https://phabricator.wikimedia.org/T355259)
[15:10:16] <wikibugs>	 (Abandoned) Dreamy Jazz: Follow-up changes for MediaModerationEmailer service [extensions/MediaModeration] (wmf/1.42.0-wmf.15) - https://gerrit.wikimedia.org/r/993502 (https://phabricator.wikimedia.org/T351407) (owner: Dreamy Jazz)
[15:11:39] <wikibugs>	 (PS5) Brouberol: Revert "hue: rename python-snappy apt dependency" [puppet] - https://gerrit.wikimedia.org/r/993501
[15:11:40] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1006.eqiad.wmnet
[15:12:01] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Request for BHL-WIKI Group List - https://phabricator.wikimedia.org/T355941 (JJFord_BHL) Thank you!!
[15:12:08] <logmsgbot>	 !log ayounsi@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin2002"
[15:13:02] <logmsgbot>	 !log ayounsi@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin2002"
[15:13:02] <logmsgbot>	 !log ayounsi@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:13:03] <logmsgbot>	 !log ayounsi@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sretest1005.eqiad.wmnet
[15:13:13] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, netops, Patch-For-Review: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by ayounsi@cumin2002 for hosts: `sretest1005.eqiad.wmnet` - sretest1005.eqiad.wmnet (...
[15:13:21] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/993501 (owner: Brouberol)
[15:13:59] <logmsgbot>	 !log dreamyjazz@deploy2002 Finished scap: Backport for [[gerrit:993500|Make the email subject unique for positive match emails (T355752)]] (duration: 21m 21s)
[15:14:06] <stashbot>	 T355752: Make the email subject unique for MediaModeration emails - https://phabricator.wikimedia.org/T355752
[15:14:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P55810 and previous config saved to /var/cache/conftool/dbconfig/20240129-151409-marostegui.json
[15:14:41] <Dreamy_Jazz>	 hashar: That's my backports deployed
[15:15:01] <wikibugs>	 (PS1) Ilias Sarantopoulos: ml-services: test GPU with article-descriptions model [deployment-charts] - https://gerrit.wikimedia.org/r/993707
[15:15:09] <Dreamy_Jazz>	 !log afternoon UTC backport window done
[15:15:12] <hashar>	 21 minutes doh
[15:15:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:15:26] <hashar>	 anyway happy to see that has completed
[15:16:18] <wikibugs>	 (CR) Brouberol: [C: +2] Revert "hue: rename python-snappy apt dependency" [puppet] - https://gerrit.wikimedia.org/r/993501 (owner: Brouberol)
[15:17:01] <logmsgbot>	 !log brouberol@cumin1002 START - Cookbook sre.hosts.reimage for host an-tool1009.eqiad.wmnet with OS buster
[15:17:19] <Dreamy_Jazz>	 !log Stopping mediamoderation scanning script
[15:17:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:19:30] <Dreamy_Jazz>	 !log Running `foreachwikiindblist group2.dblist extensions/MediaModeration/maintenance/resendMatchEmails.php 20200405`
[15:19:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:12] <wikibugs>	 (CR) Klausman: [C: +1] ml-services: test GPU with article-descriptions model [deployment-charts] - https://gerrit.wikimedia.org/r/993707 (owner: Ilias Sarantopoulos)
[15:21:25] <wikibugs>	 (CR) Ilias Sarantopoulos: [C: +2] ml-services: test GPU with article-descriptions model [deployment-charts] - https://gerrit.wikimedia.org/r/993707 (owner: Ilias Sarantopoulos)
[15:21:33] <Dreamy_Jazz>	 !log Running `foreachwikiindblist group1.dblist extensions/MediaModeration/maintenance/resendMatchEmails.php 20200405 --verbose`
[15:21:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:22:17] <wikibugs>	 (Merged) jenkins-bot: ml-services: test GPU with article-descriptions model [deployment-charts] - https://gerrit.wikimedia.org/r/993707 (owner: Ilias Sarantopoulos)
[15:23:18] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
[15:24:11] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
[15:26:09] <Dreamy_Jazz>	 !log Running MediaModeration scanning script using `mwscript extensions/MediaModeration/maintenance/scanFilesInScanTable.php --wiki=commonswiki --use-jobqueue --sleep 30 --verbose 2>&1 | tee ~/scan-files-in-scan-table-commonswiki-sleep-30-no-render-now.txt` on a tmux session.
[15:26:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:17] <wikibugs>	 (PS1) Brouberol: Build hue for Debian Bullseye by default [debs/hue] - https://gerrit.wikimedia.org/r/993708 (https://phabricator.wikimedia.org/T349400)
[15:28:21] <wikibugs>	 (PS1) Esanders: DiscussionTools: Enable permalinks frontend everywhere except en.wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/993709 (https://phabricator.wikimedia.org/T356063)
[15:29:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P55811 and previous config saved to /var/cache/conftool/dbconfig/20240129-152915-marostegui.json
[15:30:52] <wikibugs>	 (CR) Muehlenhoff: Build hue for Debian Bullseye by default (2 comments) [debs/hue] - https://gerrit.wikimedia.org/r/993708 (https://phabricator.wikimedia.org/T349400) (owner: Brouberol)
[15:30:55] <wikibugs>	 (CR) Clément Goubert: [C: +1] tegola-vector-tiles: add maps primaries to config [deployment-charts] - https://gerrit.wikimedia.org/r/993700 (https://phabricator.wikimedia.org/T355892) (owner: Hnowlan)
[15:31:02] <wikibugs>	 (CR) Clément Goubert: [C: +1] conftool: restore maps primary servers to kartotherian pool [puppet] - https://gerrit.wikimedia.org/r/993702 (https://phabricator.wikimedia.org/T355892) (owner: Hnowlan)
[15:31:39] <logmsgbot>	 !log brouberol@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on an-tool1009.eqiad.wmnet with reason: host reimage
[15:34:50] <logmsgbot>	 !log brouberol@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-tool1009.eqiad.wmnet with reason: host reimage
[15:35:32] <wikibugs>	 SRE, Infrastructure-Foundations, Mail: Puppetry - https://phabricator.wikimedia.org/T325395 (jhathaway) p:Triage→Medium
[15:35:35] <wikibugs>	 SRE, Data Products: Forward ops-dumps@wikimedia.org to data-engineering-alerts@lists.wikimedia.org - https://phabricator.wikimedia.org/T355891 (xcollazo) Hey @Dzahn. I did receive your test email. However, I do not see it on https://groups.google.com/a/wikimedia.org/g/ops-dumps, so it doesn’t seem like i...
[15:36:26] <wikibugs>	 SRE, Infrastructure-Foundations, Mail: Provision mta-inbound-lists - https://phabricator.wikimedia.org/T325404 (jhathaway) p:Triage→Low
[15:36:36] <wikibugs>	 SRE, Infrastructure-Foundations, Mail: Provision mta-outbound-lists - https://phabricator.wikimedia.org/T325405 (jhathaway) p:Triage→Medium
[15:36:55] <wikibugs>	 SRE, Infrastructure-Foundations, Mail: MTA Provisioning - https://phabricator.wikimedia.org/T325403 (jhathaway) p:Triage→Medium
[15:37:11] <wikibugs>	 SRE, Infrastructure-Foundations, Mail: Replace Exim with Postfix on mail servers - https://phabricator.wikimedia.org/T325394 (jhathaway) p:Triage→Medium
[15:38:43] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: admin: Add validation checks for missing realname and email in data.yaml - https://phabricator.wikimedia.org/T320937 (jhathaway) p:Triage→Low
[15:39:17] <wikibugs>	 SRE, Bitu, Infrastructure-Foundations, Patch-For-Review: SSH Key type expiry - https://phabricator.wikimedia.org/T347572 (joanna_borun) p:Triage→Medium
[15:40:20] <wikibugs>	 SRE, Traffic: create a puppetized abstraction for haproxy blocklist hysteresis - https://phabricator.wikimedia.org/T329331 (CDanis) @Fabfur just wanted to make sure you've seen this task, it is decent documentation of the existing mechanism and probably helpful for doing T353910
[15:40:33] <wikibugs>	 SRE, Bitu, Infrastructure-Foundations: IDM milestone 2 "Initial limited deployment" - https://phabricator.wikimedia.org/T320603 (SLyngshede-WMF)
[15:40:41] <wikibugs>	 SRE, Infrastructure-Foundations: Enhance account handling (meta bug) - https://phabricator.wikimedia.org/T142815 (SLyngshede-WMF)
[15:40:46] <wikibugs>	 SRE, Bitu, Infrastructure-Foundations: Create an IDM for Wikimedia developer accounts - https://phabricator.wikimedia.org/T319405 (SLyngshede-WMF)
[15:40:54] <wikibugs>	 (CR) Dzahn: [C: +2] miscweb(wikiworkshop): bump version [deployment-charts] - https://gerrit.wikimedia.org/r/993454 (https://phabricator.wikimedia.org/T349774) (owner: DDesouza)
[15:41:07] <wikibugs>	 SRE, Bitu, Infrastructure-Foundations: IDM milestone 2 "Initial limited deployment" - https://phabricator.wikimedia.org/T320603 (SLyngshede-WMF) Open→Resolved a:SLyngshede-WMF
[15:41:15] <wikibugs>	 SRE, Infrastructure-Foundations: Further enhancements for nftables support in profile::firewall - https://phabricator.wikimedia.org/T348498 (MoritzMuehlenhoff)
[15:41:19] <wikibugs>	 SRE, Infrastructure-Foundations, Traffic: NEL: don't alert on domains we don't control - https://phabricator.wikimedia.org/T349807 (CDanis) p:Triage→Medium
[15:41:25] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Monitoring check for nftables - https://phabricator.wikimedia.org/T348499 (10MoritzMuehlenhoff) 05Open→03In progress p:05Triage→03Medium a:03MoritzMuehlenhoff
[15:42:04] <wikibugs>	 (03PS3) 10Clément Goubert: httpbb: Migrate to cumin1002 [puppet] - 10https://gerrit.wikimedia.org/r/993710 (https://phabricator.wikimedia.org/T356054)
[15:42:06] <wikibugs>	 (03Merged) 10jenkins-bot: miscweb(wikiworkshop): bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/993454 (https://phabricator.wikimedia.org/T349774) (owner: 10DDesouza)
[15:42:24] <wikibugs>	 10SRE, 10Cloud-VPS, 10cloud-services-team, 10Patch-For-Review: Restrict traffic from instances to private IPs on cloudgw level - https://phabricator.wikimedia.org/T350132 (10joanna_borun)
[15:42:51] <wikibugs>	 (03PS1) 10Marostegui: db-production.php: Disable writes on es4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/993711 (https://phabricator.wikimedia.org/T356064)
[15:43:39] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Put Dell SONiC switches in production - https://phabricator.wikimedia.org/T335028 (10ayounsi) p:05Triage→03Medium
[15:44:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2129 (T355609)', diff saved to https://phabricator.wikimedia.org/P55814 and previous config saved to /var/cache/conftool/dbconfig/20240129-154422-marostegui.json
[15:44:25] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2151.codfw.wmnet with reason: Maintenance
[15:44:31] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[15:44:38] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2151.codfw.wmnet with reason: Maintenance
[15:44:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2151 (T355609)', diff saved to https://phabricator.wikimedia.org/P55815 and previous config saved to /var/cache/conftool/dbconfig/20240129-154444-marostegui.json
[15:46:24] <wikibugs>	 10SRE, 10Security-Team, 10WMF-General-or-Unknown, 10Wikimedia-Apache-configuration, and 3 others: Add security.txt to Wikimedia sites? (2023 edition) - https://phabricator.wikimedia.org/T337949 (10joanna_borun)
[15:47:04] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] Remove outdated stretch exclusion for kartotherian [puppet] - 10https://gerrit.wikimedia.org/r/979319 (owner: 10Awight)
[15:48:07] <wikibugs>	 10SRE, 10ops-codfw, 10DBA, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A4 from asw-a4-codfw to lsw1-a4-codfw - https://phabricator.wikimedia.org/T355863 (10cmooney)
[15:48:17] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: mariadb: Promote db2127 to s3 master [puppet] - 10https://gerrit.wikimedia.org/r/993472 (https://phabricator.wikimedia.org/T356069)
[15:48:17] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: wmnet: Update s3-master alias [dns] - 10https://gerrit.wikimedia.org/r/993473 (https://phabricator.wikimedia.org/T356069)
[15:48:49] <wikibugs>	 10SRE, 10ops-codfw, 10DBA, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw - https://phabricator.wikimedia.org/T355866 (10Marostegui)
[15:49:13] <wikibugs>	 (03PS1) 10Hnowlan: kubernetes: make 5 jobrunners kubernetes workers [puppet] - 10https://gerrit.wikimedia.org/r/993714 (https://phabricator.wikimedia.org/T354791)
[15:49:37] <wikibugs>	 (03PS2) 10Hnowlan: kubernetes: make 5 jobrunners kubernetes workers [puppet] - 10https://gerrit.wikimedia.org/r/993714 (https://phabricator.wikimedia.org/T354791)
[15:51:37] <wikibugs>	 10SRE, 10Data Products: Forward ops-dumps@wikimedia.org to data-engineering-alerts@lists.wikimedia.org - https://phabricator.wikimedia.org/T355891 (10Dzahn) Hey @xcollazo I am actually not sure if we expect it to show up in that group inbox. As far as I know there are different options in Google, shared inbox...
[15:52:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: (2) Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 22.78% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[15:52:45] <wikibugs>	 (03PS2) 10Effie Mouzeli: Remove outdated stretch exclusion for kartotherian [puppet] - 10https://gerrit.wikimedia.org/r/979319 (owner: 10Awight)
[15:53:16] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] Remove outdated stretch exclusion for kartotherian [puppet] - 10https://gerrit.wikimedia.org/r/979319 (owner: 10Awight)
[15:53:28] <wikibugs>	 (03CR) 10Scott French: [C: 03+2] "Thanks, Moritz!" [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/993183 (owner: 10Scott French)
[15:53:35] <wikibugs>	 10SRE-swift-storage, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10joanna_borun)
[15:53:44] <wikibugs>	 (03CR) 10Scott French: [V: 03+2 C: 03+2] Ensure ssh-agent services are also enabled [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/993183 (owner: 10Scott French)
[15:54:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2151 (T355609)', diff saved to https://phabricator.wikimedia.org/P55816 and previous config saved to /var/cache/conftool/dbconfig/20240129-155406-marostegui.json
[15:54:15] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[15:54:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[15:55:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad appserver GET/200: 0.42748183472170714s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[15:58:09] <logmsgbot>	 !log brouberol@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-tool1009.eqiad.wmnet with OS buster
[15:59:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: (2) Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:00:02] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Spicerack: Spicerack: migrate distributed locking to etcd v3 - https://phabricator.wikimedia.org/T352155 (10Volans) p:05Triage→03Medium
[16:00:08] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Spicerack: Spicerack: adapt conftool module for etcd v3 - https://phabricator.wikimedia.org/T352153 (10Volans) p:05Triage→03Medium
[16:00:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad appserver GET/200: 0.42748183472170714s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=appserver&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceed
[16:00:29] <wikibugs>	 (03CR) 10DLynch: [C: 03+1] DiscussionTools: Enable permalinks frontend everywhere except en.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/993709 (https://phabricator.wikimedia.org/T356063) (owner: 10Esanders)
[16:01:39] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Spicerack: More structured cookbooks to reboot hosts - https://phabricator.wikimedia.org/T252807 (10MoritzMuehlenhoff)
[16:02:04] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Spicerack: Migrate existing cookbooks related to rolling restarts/reboots to SREBatchBase - https://phabricator.wikimedia.org/T317855 (10MoritzMuehlenhoff) 05Open→03In progress p:05Triage→03Low
[16:02:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: (2) Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 47.78% idle - https://bit.ly/wmf-fpmsat  - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:03:11] <wikibugs>	 10SRE, 10Bitu, 10Infrastructure-Foundations, 10cloud-services-team, 10LDAP: Create a single application to provision and manage developer (LDAP) accounts - https://phabricator.wikimedia.org/T179463 (10SLyngshede-WMF) 05Open→03Declined We're already working on Bitu, which has at least some overlap wit...
[16:03:14] <wikibugs>	 10sre-alert-triage, 10Infrastructure-Foundations: Alert triage: overdue alert [warning] Systemd units failing on debmonitor2003 - https://phabricator.wikimedia.org/T343897 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff The migration of debmonitor/bookworm/packaged debmonitor has now progre...
[16:03:53] <wikibugs>	 (03PS1) 10Dbrant: [WIP] Add labs config to test Contact page for account vanishing. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/993718 (https://phabricator.wikimedia.org/T343536)
[16:06:05] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to wmf for arinaigum - https://phabricator.wikimedia.org/T355591 (10Arinaigu) @SLyngshede-WMF it worked! I can login to wikitech now.
[16:06:21] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+1] db-production.php: Disable writes on es4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/993711 (https://phabricator.wikimedia.org/T356064) (owner: 10Marostegui)
[16:06:37] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Set nofail for raid0 recipes - https://phabricator.wikimedia.org/T350461 (10joanna_borun) p:05Triage→03Low
[16:07:44] <wikibugs>	 10SRE, 10ops-eqiad, 10DBA, 10DC-Ops: db1224 crashed - hardware error - https://phabricator.wikimedia.org/T354591 (10Jclark-ctr) @Marostegui  Dell has requested firmware updates and reseating device NetXtreme BCM5720 Gigabit Ethernet PCIe on bus 4.  When is a good time to take server down for reseating and...
[16:07:47] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2019.codfw.wmnet with reason: Decommissioning — T352469
[16:07:52] <stashbot>	 T352469: Decommission restbase20[13-20]) - https://phabricator.wikimedia.org/T352469
[16:08:01] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2019.codfw.wmnet with reason: Decommissioning — T352469
[16:08:33] <wikibugs>	 10SRE, 10ops-eqiad, 10DBA, 10DC-Ops: db1224 crashed - hardware error - https://phabricator.wikimedia.org/T354591 (10Marostegui) @Jclark-ctr I can switch it off any day starting tomorrow, when would it work for you?
[16:09:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P55817 and previous config saved to /var/cache/conftool/dbconfig/20240129-160913-marostegui.json
[16:09:16] <wikibugs>	 10SRE, 10ops-eqiad, 10DBA, 10DC-Ops: db1224 crashed - hardware error - https://phabricator.wikimedia.org/T354591 (10Jclark-ctr) Yes that works for me Thanks
[16:10:00] <urandom>	 !log decommissioning restbase2019/cassandra-{a,b,c} — T352469
[16:10:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:24] <Amir1>	 jouncebot: nowandnext
[16:10:24] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 19 minute(s)
[16:10:24] <jouncebot>	 In 0 hour(s) and 19 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240129T1630)
[16:10:33] <wikibugs>	 (03PS2) 10Ladsgroup: Drop old virtual domain for url shortener [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992129
[16:10:35] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Drop old virtual domain for url shortener [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992129 (owner: 10Ladsgroup)
[16:11:08] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992129 (owner: 10Ladsgroup)
[16:11:16] <wikibugs>	 (03Abandoned) 10Ebernhardson: cirrus: Disable cloudelastic writes on selected wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/979146 (https://phabricator.wikimedia.org/T352335) (owner: 10Ebernhardson)
[16:11:24] <wikibugs>	 (03Merged) 10jenkins-bot: Drop old virtual domain for url shortener [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992129 (owner: 10Ladsgroup)
[16:11:30] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap: Backport for [[gerrit:992129|Drop old virtual domain for url shortener]]
[16:11:45] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations: Abstract a bit more the server provisioning process - https://phabricator.wikimedia.org/T351891 (10joanna_borun) p:05Triage→03Medium
[16:12:58] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Backport for [[gerrit:992129|Drop old virtual domain for url shortener]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[16:13:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - parsoid - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[16:13:32] <wikibugs>	 (03PS2) 10Ebernhardson: cirrus: Disable cloudelastic writes to testwiki and mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992974 (https://phabricator.wikimedia.org/T352335)
[16:13:54] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Remove cumin1001 from router ACLs - https://phabricator.wikimedia.org/T353525 (10MoritzMuehlenhoff) p:05Triage→03Medium
[16:14:29] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Continuing with sync
[16:14:42] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Migrate Spicerack logs from cumin1001 to cumin1002? - https://phabricator.wikimedia.org/T353523 (10MoritzMuehlenhoff) p:05Triage→03Medium
[16:17:27] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Traffic, 10netops, 10Patch-For-Review: Move lvs2012 from private1-b-codfw (row) to private1-b2-codfw (rack) vlan - https://phabricator.wikimedia.org/T352918 (10cmooney)
[16:18:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - parsoid - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[16:18:34] <wikibugs>	 10SRE-swift-storage: unstable device mapping of SSDs causing swift/puppet problems - example reimage - https://phabricator.wikimedia.org/T308644 (10joanna_borun)
[16:18:52] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Traffic, 10netops, 10Patch-For-Review: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan - https://phabricator.wikimedia.org/T352920 (10cmooney)
[16:19:14] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Traffic, 10Patch-For-Review: NetworkProbeLimit cookie should set samesite attribute - https://phabricator.wikimedia.org/T342624 (10CDanis) p:05Triage→03Low a:03CDanis
[16:19:37] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Puppet-Core: Consider deprecation of WMF styleguide checks - https://phabricator.wikimedia.org/T353648 (10MoritzMuehlenhoff) p:05Triage→03Medium
[16:20:21] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Traffic, 10Patch-For-Review: GeoIP mapping experiments - https://phabricator.wikimedia.org/T332024 (10CDanis) p:05Triage→03Low a:05JameelKaisar→03CDanis
[16:20:55] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap: Backport for [[gerrit:992129|Drop old virtual domain for url shortener]] (duration: 09m 24s)
[16:23:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - parsoid - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[16:24:08] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations: Package pyGNMI and dictdiffer to be used by cookbooks - https://phabricator.wikimedia.org/T340045 (10MoritzMuehlenhoff) p:05Triage→03Medium
[16:24:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P55819 and previous config saved to /var/cache/conftool/dbconfig/20240129-162420-marostegui.json
[16:24:59] <wikibugs>	 10SRE, 10ops-eqiad, 10DBA, 10DC-Ops: db1224 crashed - hardware error - https://phabricator.wikimedia.org/T354591 (10Marostegui) Great, I will comment on this task once it is off. Thank you!
[16:28:00] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Patch-For-Review: service::docker with 'latest' version behaves poorly if the host runs out of disk space - https://phabricator.wikimedia.org/T321851 (10SLyngshede-WMF) p:05Triage→03Low
[16:30:04] <jouncebot>	 jan_drewniak: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240129T1630).
[16:32:40] <wikibugs>	 (03PS1) 10Muehlenhoff: airflow/analytics_product: Keep Python 2 [puppet] - 10https://gerrit.wikimedia.org/r/993727 (https://phabricator.wikimedia.org/T335261)
[16:33:40] <wikibugs>	 (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/993728 (https://phabricator.wikimedia.org/T128546)
[16:34:56] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10DC-Ops: Disk (sdl) failed in ms-be1068 - https://phabricator.wikimedia.org/T356033 (10Jclark-ctr) Started case with dell ordered replacement drive.    You have successfully submitted request SR184210022.  In mean time i have swapped 8tb failed drive with one we ha...
[16:35:09] <wikibugs>	 (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/993728 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[16:35:18] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10DC-Ops: Disk (sdl) failed in ms-be1068 - https://phabricator.wikimedia.org/T356033 (10Jclark-ctr) p:05High→03Low a:03Jclark-ctr
[16:35:54] <wikibugs>	 (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/993728 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak)
[16:36:52] <volans>	 !log installed spicerack 8.3.0 on cumin1002, cumin1001
[16:36:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:39:05] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] sre.ganeti: add support for routed Ganeti [cookbooks] - 10https://gerrit.wikimedia.org/r/991348 (https://phabricator.wikimedia.org/T300152) (owner: 10Ayounsi)
[16:39:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2151 (T355609)', diff saved to https://phabricator.wikimedia.org/P55820 and previous config saved to /var/cache/conftool/dbconfig/20240129-163926-marostegui.json
[16:39:30] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
[16:39:32] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[16:39:44] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
[16:39:45] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
[16:39:59] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
[16:40:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2158 (T355609)', diff saved to https://phabricator.wikimedia.org/P55821 and previous config saved to /var/cache/conftool/dbconfig/20240129-164005-marostegui.json
[16:40:59] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
[16:41:12] <arnaudb>	 🎉
[16:41:45] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10DC-Ops: Disk (sdl) failed in ms-be1068 - https://phabricator.wikimedia.org/T356033 (10MatthewVernon) @Jclark-ctr thank you for the quick swap, much appreciated :-)
[16:43:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 43.98% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:44:44] <logmsgbot>	 !log jdrewniak@deploy2002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:993728| Bumping portals to master (T128546)]] (duration: 07m 04s)
[16:44:56] <stashbot>	 T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[16:45:00] <wikibugs>	 (03Merged) 10jenkins-bot: sre.ganeti: add support for routed Ganeti [cookbooks] - 10https://gerrit.wikimedia.org/r/991348 (https://phabricator.wikimedia.org/T300152) (owner: 10Ayounsi)
[16:46:13] <wikibugs>	 10SRE, 10DNS, 10Foundational Technology Requests, 10Traffic: Ensure that wikimediafoundation.myshopify.com complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355833 (10bcampbell) @jhathaway I do not know what CNAME record 4 is for. I can ask Sandra to connect me with Sh...
[16:47:11] <wikibugs>	 10SRE, 10DNS, 10Foundational Technology Requests, 10Traffic: Ensure that store.wikimedia.org complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355835 (10bcampbell) Thanks @ssingh. All looks good on the Shopify end for this instance. It says our domain is authenticating...
[16:48:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 45.11% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:48:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2158 (T355609)', diff saved to https://phabricator.wikimedia.org/P55822 and previous config saved to /var/cache/conftool/dbconfig/20240129-164846-marostegui.json
[16:48:53] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[16:50:15] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Log more information on LexemePatcher errors [extensions/WikibaseLexeme] (wmf/1.42.0-wmf.15) - 10https://gerrit.wikimedia.org/r/993503 (https://phabricator.wikimedia.org/T284061)
[16:50:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 45.56% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:50:27] <Lucas_WMDE>	 ^ I’d like to deploy this backport (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseLexeme/+/993503) in ten minutes or so if nobody objects
[16:50:31] <Lucas_WMDE>	 should be a harmless logging improvement
[16:51:13] <claime>	 Lucas_WMDE: Please wait, the infrastructure is under stress right now as you can see from the PHPFPMTooBusy alert above
[16:51:17] <wikibugs>	 10SRE, 10DNS, 10Foundational Technology Requests, 10Traffic: Ensure that store.wikimedia.org complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355835 (10ssingh) 05Open→03Resolved a:03ssingh Thanks for letting us know @bcampbell. I am marking this as resolved; in...
[16:51:22] <logmsgbot>	 !log jdrewniak@deploy2002 Synchronized portals: Wikimedia Portals Update: [[gerrit:993728| Bumping portals to master (T128546)]] (duration: 06m 37s)
[16:51:27] <stashbot>	 T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[16:51:41] <Lucas_WMDE>	 claime: ok
[16:52:32] <wikibugs>	 10SRE, 10DNS, 10Foundational Technology Requests, 10Traffic: Ensure that store.wikimedia.org complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355835 (10bcampbell) Thanks @ssingh . The other Shopify instance still needs the CNAME records added it looks like, but we are...
[16:54:51] <wikibugs>	 (03PS1) 10Ilias Sarantopoulos: ml-services: update article-desc image [deployment-charts] - 10https://gerrit.wikimedia.org/r/993729
[16:55:16] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 45.56% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[16:56:16] <wikibugs>	 10SRE, 10ops-eqiad, 10decommission-hardware, 10Data-Platform-SRE (2024.01.22 - 2024.02.11): decommission druid1006.eqiad.wmnet - https://phabricator.wikimedia.org/T354743 (10VRiley-WMF) a:03VRiley-WMF
[16:56:57] <wikibugs>	 10SRE, 10ops-eqiad, 10decommission-hardware, 10Data-Platform-SRE (2024.01.22 - 2024.02.11): decommission druid1006.eqiad.wmnet - https://phabricator.wikimedia.org/T354743 (10VRiley-WMF) 05Open→03Resolved
[16:57:08] <wikibugs>	 (03CR) 10Kevin Bazira: [C: 03+1] ml-services: update article-desc image [deployment-charts] - 10https://gerrit.wikimedia.org/r/993729 (owner: 10Ilias Sarantopoulos)
[16:57:22] <wikibugs>	 10SRE, 10ops-eqiad, 10decommission-hardware, 10Data-Platform-SRE (2024.01.22 - 2024.02.11): decommission druid1006.eqiad.wmnet - https://phabricator.wikimedia.org/T354743 (10VRiley-WMF) This has been removed and decommissioned
[16:57:55] <wikibugs>	 10SRE, 10ops-eqiad, 10DBA, 10decommission-hardware: decommission db1134.eqiad.wmnet - https://phabricator.wikimedia.org/T355740 (10VRiley-WMF) a:03VRiley-WMF
[16:58:24] <wikibugs>	 10SRE, 10DNS, 10Foundational Technology Requests, 10Traffic: Ensure that wikimediafoundation.myshopify.com complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355833 (10jhathaway) It appears to point to an SPF record:  ` u13504486.wl237.sendgrid.net. 1740 IN   TXT "v=spf...
[16:59:36] <wikibugs>	 10SRE, 10DNS, 10Foundational Technology Requests, 10Traffic: Ensure that wikimediafoundation.myshopify.com complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355833 (10bcampbell) @jhathaway I reached out to Sandra requesting that I be connected with our Shopify rep for...
[17:01:52] <wikibugs>	 10SRE, 10ops-eqiad, 10DBA, 10decommission-hardware: decommission db1134.eqiad.wmnet - https://phabricator.wikimedia.org/T355740 (10VRiley-WMF) 05Open→03Resolved
[17:03:04] <wikibugs>	 10SRE, 10ops-eqiad, 10DBA, 10decommission-hardware: decommission db1134.eqiad.wmnet - https://phabricator.wikimedia.org/T355740 (10VRiley-WMF) This server has been removed and decommissioned.
[17:03:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P55823 and previous config saved to /var/cache/conftool/dbconfig/20240129-170353-marostegui.json
[17:04:03] <wikibugs>	 SRE, ops-eqiad, decommission-hardware, Data-Platform-SRE (2024.01.22 - 2024.02.11): decommission druid1005.eqiad.wmnet - https://phabricator.wikimedia.org/T354742 (VRiley-WMF) a:VRiley-WMF
[17:06:13] <wikibugs>	 (PS179) Arnaudb: mariadb: cookbook draft to clone multiinstance [cookbooks] - https://gerrit.wikimedia.org/r/976709 (https://phabricator.wikimedia.org/T343674)
[17:06:15] <wikibugs>	 (CR) Arnaudb: "I've tried to take note of all previous remarks" [cookbooks] - https://gerrit.wikimedia.org/r/976709 (https://phabricator.wikimedia.org/T343674) (owner: Arnaudb)
[17:06:50] <wikibugs>	 (PS1) Reedy: Fix casing of Mediawiki to MediaWiki [puppet] - https://gerrit.wikimedia.org/r/993732
[17:07:29] <wikibugs>	 SRE, ops-eqiad, decommission-hardware, Data-Platform-SRE (2024.01.22 - 2024.02.11): decommission druid1005.eqiad.wmnet - https://phabricator.wikimedia.org/T354742 (VRiley-WMF) This server has been removed and decommissioned.
[17:07:35] <wikibugs>	 SRE, ops-eqiad, decommission-hardware, Data-Platform-SRE (2024.01.22 - 2024.02.11): decommission druid1005.eqiad.wmnet - https://phabricator.wikimedia.org/T354742 (VRiley-WMF) Open→Resolved
[17:13:43] <wikibugs>	 SRE, Wikimedia-Site-requests: Changing default image thumbnail size on English Wikipedia - https://phabricator.wikimedia.org/T355914 (taavi)
[17:14:22] * Lucas_WMDE off, will not deploy that backport today (maybe tomorrow, we’ll see)
[17:19:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P55824 and previous config saved to /var/cache/conftool/dbconfig/20240129-171859-marostegui.json
[17:20:13] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] Fix casing of Mediawiki to MediaWiki [puppet] - https://gerrit.wikimedia.org/r/993732 (owner: Reedy)
[17:25:50] <wikibugs>	 (PS1) Reedy: Fix casing of Mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/993738
[17:34:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2158 (T355609)', diff saved to https://phabricator.wikimedia.org/P55828 and previous config saved to /var/cache/conftool/dbconfig/20240129-173406-marostegui.json
[17:34:08] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
[17:34:11] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
[17:34:12] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[17:34:15] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
[17:34:29] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
[17:34:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2171:3316 (T355609)', diff saved to https://phabricator.wikimedia.org/P55829 and previous config saved to /var/cache/conftool/dbconfig/20240129-173435-marostegui.json
[17:35:53] <wikibugs>	 (PS1) Stevemunene: hdfs: Add new worker hosts to net_topology [puppet] - https://gerrit.wikimedia.org/r/993742 (https://phabricator.wikimedia.org/T353776)
[17:35:55] <wikibugs>	 (PS1) Stevemunene: hdfs: Assign the right role to new hadoop workers [puppet] - https://gerrit.wikimedia.org/r/993743 (https://phabricator.wikimedia.org/T353776)
[17:38:15] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 50% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[17:38:55] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[17:39:03] <MatmaRex>	 did something change recently about the HTTPS certificates used by wmcloud.org?
[17:39:23] <MatmaRex>	 i'm getting verification errors in a script that worked before
[17:42:19] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] START helmfile.d/services/miscweb: apply
[17:42:51] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[17:42:52] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[17:43:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 50% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[17:43:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T355609)', diff saved to https://phabricator.wikimedia.org/P55830 and previous config saved to /var/cache/conftool/dbconfig/20240129-174327-marostegui.json
[17:43:30] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[17:43:31] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[17:43:37] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[17:43:58] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[17:45:13] <wikibugs>	 SRE, ops-eqiad, decommission-hardware, Data-Platform-SRE (2024.01.22 - 2024.02.11): decommission druid1004.eqiad.wmnet - https://phabricator.wikimedia.org/T354741 (VRiley-WMF) Open→Resolved a:VRiley-WMF
[17:45:24] <wikibugs>	 SRE, ops-eqiad, decommission-hardware, Data-Platform-SRE (2024.01.22 - 2024.02.11): decommission druid1004.eqiad.wmnet - https://phabricator.wikimedia.org/T354741 (VRiley-WMF) This server has been removed and decommissioned
[17:45:39] <jinxer-wm>	 (ProbeDown) firing: (6) Service debmonitor1002:7443 has failed probes (http_debmonitor_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Debmonitor - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:45:47] <wikibugs>	 SRE, ops-eqiad, decommission-hardware, Data-Platform-SRE (2024.01.22 - 2024.02.11): decommission druid1004.eqiad.wmnet - https://phabricator.wikimedia.org/T354741 (VRiley-WMF)
[17:58:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P55831 and previous config saved to /var/cache/conftool/dbconfig/20240129-175833-marostegui.json
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240129T1800)
[18:00:05] <jouncebot>	 ryankemper: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240129T1800).
[18:11:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[18:11:58] <wikibugs>	 (CR) Volans: "change LGTM but this will not remove the existing timers from cumin1001. Is there an easy way to absent the resources in the current puppe" [puppet] - https://gerrit.wikimedia.org/r/993710 (https://phabricator.wikimedia.org/T356054) (owner: Clément Goubert)
[18:13:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P55832 and previous config saved to /var/cache/conftool/dbconfig/20240129-181340-marostegui.json
[18:14:31] <wikibugs>	 SRE, ops-codfw, Data-Persistence: Relocating servers out of A1 in codfw - https://phabricator.wikimedia.org/T355437 (Dzahn) > The only node left to relocation is gitlab2002.   downtime of gitlab announced for tomorrow, Jan 30, 8:30 to 8:40 PST  and banner added, for moving gitlab2002
[18:16:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[18:21:36] <wikibugs>	 (CR) Ilias Sarantopoulos: [C: +2] ml-services: update article-desc image [deployment-charts] - https://gerrit.wikimedia.org/r/993729 (owner: Ilias Sarantopoulos)
[18:23:07] <wikibugs>	 (Merged) jenkins-bot: ml-services: update article-desc image [deployment-charts] - https://gerrit.wikimedia.org/r/993729 (owner: Ilias Sarantopoulos)
[18:23:50] <wikibugs>	 SRE, ops-codfw, Infrastructure-Foundations, netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (cmooney)
[18:24:50] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
[18:28:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T355609)', diff saved to https://phabricator.wikimedia.org/P55833 and previous config saved to /var/cache/conftool/dbconfig/20240129-182846-marostegui.json
[18:28:49] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
[18:28:52] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[18:29:03] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
[18:29:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2180 (T355609)', diff saved to https://phabricator.wikimedia.org/P55834 and previous config saved to /var/cache/conftool/dbconfig/20240129-182909-marostegui.json
[18:32:39] <wikibugs>	 (PS1) Ebernhardson: cirrus updater: Remove consumer-devnull service [deployment-charts] - https://gerrit.wikimedia.org/r/993754 (https://phabricator.wikimedia.org/T352335)
[18:32:41] <wikibugs>	 (PS1) Ebernhardson: cirrus: Expand production deployment wikis [deployment-charts] - https://gerrit.wikimedia.org/r/993755 (https://phabricator.wikimedia.org/T352335)
[18:34:09] <wikibugs>	 (PS2) Dbrant: [WIP] Add labs config to test Contact page for account vanishing. [mediawiki-config] - https://gerrit.wikimedia.org/r/993718 (https://phabricator.wikimedia.org/T343536)
[18:37:14] <wikibugs>	 (PS10) Andrea Denisse: grafana: Create the grafana sysuser with a reserved UID/GID [puppet] - https://gerrit.wikimedia.org/r/990795 (https://phabricator.wikimedia.org/T352665)
[18:41:58] <wikibugs>	 (PS11) Andrea Denisse: grafana: Create the grafana sysuser with a reserved UID/GID [puppet] - https://gerrit.wikimedia.org/r/990795 (https://phabricator.wikimedia.org/T352665)
[18:45:46] <wikibugs>	 (PS3) Ebernhardson: cirrus: Disable cloudelastic writes to testwiki and mw.org [mediawiki-config] - https://gerrit.wikimedia.org/r/992974 (https://phabricator.wikimedia.org/T352335)
[18:46:58] <wikibugs>	 (CR) Ebernhardson: [C: +2] cirrus updater: Remove consumer-devnull service [deployment-charts] - https://gerrit.wikimedia.org/r/993754 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[18:47:07] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [software/homer/deploy] - https://gerrit.wikimedia.org/r/993089 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[18:47:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2180 (T355609)', diff saved to https://phabricator.wikimedia.org/P55835 and previous config saved to /var/cache/conftool/dbconfig/20240129-184735-marostegui.json
[18:47:41] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[18:47:41] <wikibugs>	 (CR) Ayounsi: [C: +2] Homer-public: add Ganeti BGP group [homer/public] - https://gerrit.wikimedia.org/r/993090 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[18:47:45] <wikibugs>	 (Merged) jenkins-bot: cirrus updater: Remove consumer-devnull service [deployment-charts] - https://gerrit.wikimedia.org/r/993754 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[18:49:42] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[18:49:53] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[18:49:54] <logmsgbot>	 !log brouberol@cumin1001 END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop test cluster: Restart of jvm daemons.
[18:52:47] <wikibugs>	 (PS12) Andrea Denisse: grafana: Create the grafana sysuser with a reserved UID/GID [puppet] - https://gerrit.wikimedia.org/r/990795 (https://phabricator.wikimedia.org/T352665)
[18:54:20] <wikibugs>	 (CR) Andrea Denisse: grafana: Create the grafana sysuser with a reserved UID/GID (1 comment) [puppet] - https://gerrit.wikimedia.org/r/990795 (https://phabricator.wikimedia.org/T352665) (owner: Andrea Denisse)
[18:54:34] <wikibugs>	 (Merged) jenkins-bot: Homer-public: add Ganeti BGP group [homer/public] - https://gerrit.wikimedia.org/r/993090 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[18:56:05] <wikibugs>	 (CR) Ayounsi: [C: +2] wmf-netbox: add Ganeti BGP group support [software/homer/deploy] - https://gerrit.wikimedia.org/r/993089 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[18:56:58] <wikibugs>	 (PS2) Brouberol: Build hue for Debian Bullseye by default [debs/hue] - https://gerrit.wikimedia.org/r/993708 (https://phabricator.wikimedia.org/T349400)
[18:57:08] <wikibugs>	 (CR) Ebernhardson: [C: +2] cirrus: Expand production deployment wikis [deployment-charts] - https://gerrit.wikimedia.org/r/993755 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[18:58:04] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1001-1002].eqiad.wmnet with reason: CR993089 - ayounsi@cumin1002
[18:58:13] <wikibugs>	 (Merged) jenkins-bot: cirrus: Expand production deployment wikis [deployment-charts] - https://gerrit.wikimedia.org/r/993755 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[18:59:51] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[18:59:55] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[19:00:30] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1001-1002].eqiad.wmnet with reason: CR993089 - ayounsi@cumin1002
[19:01:22] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[19:01:24] <wikibugs>	 (PS1) EoghanGaffney: [phabricator] Fix commenting on tasks by email [puppet] - https://gerrit.wikimedia.org/r/993759 (https://phabricator.wikimedia.org/T356077)
[19:01:31] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[19:02:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P55836 and previous config saved to /var/cache/conftool/dbconfig/20240129-190241-marostegui.json
[19:03:56] <wikibugs>	 (PS2) Jdlrobson: Use desktop history page HTML everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/991424 (https://phabricator.wikimedia.org/T353388)
[19:04:53] <wikibugs>	 (PS1) Ayounsi: vms_import policy: fix typo [homer/public] - https://gerrit.wikimedia.org/r/993760 (https://phabricator.wikimedia.org/T300152)
[19:05:41] <wikibugs>	 (PS4) Paladox: Phabricator: switch python to python3 in phab_epipe [puppet] - https://gerrit.wikimedia.org/r/993766 (https://phabricator.wikimedia.org/T356077)
[19:06:42] <wikibugs>	 (CR) Ayounsi: [C: +2] vms_import policy: fix typo [homer/public] - https://gerrit.wikimedia.org/r/993760 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[19:07:22] <wikibugs>	 (Merged) jenkins-bot: vms_import policy: fix typo [homer/public] - https://gerrit.wikimedia.org/r/993760 (https://phabricator.wikimedia.org/T300152) (owner: Ayounsi)
[19:07:58] <wikibugs>	 (Abandoned) Paladox: Phabricator: switch python to python3 in phab_epipe [puppet] - https://gerrit.wikimedia.org/r/993766 (https://phabricator.wikimedia.org/T356077) (owner: Paladox)
[19:11:17] <wikibugs>	 (CR) Dzahn: [C: +1] [phabricator] Fix commenting on tasks by email [puppet] - https://gerrit.wikimedia.org/r/993759 (https://phabricator.wikimedia.org/T356077) (owner: EoghanGaffney)
[19:17:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P55837 and previous config saved to /var/cache/conftool/dbconfig/20240129-191748-marostegui.json
[19:18:27] <wikibugs>	 (PS1) Bking: cloudelastic: Add migration canary to cloudelastic cluster [puppet] - https://gerrit.wikimedia.org/r/993764 (https://phabricator.wikimedia.org/T355617)
[19:19:48] <wikibugs>	 (CR) Dzahn: admin: add amastilovic to analytics-privatedata [puppet] - https://gerrit.wikimedia.org/r/993170 (https://phabricator.wikimedia.org/T355606) (owner: AOkoth)
[19:20:13] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/993764 (https://phabricator.wikimedia.org/T355617) (owner: Bking)
[19:21:36] <wikibugs>	 (Abandoned) Bking: cloudelastic: apply cloudelastic role to canary [puppet] - https://gerrit.wikimedia.org/r/993148 (https://phabricator.wikimedia.org/T355617) (owner: Bking)
[19:21:51] <wikibugs>	 (Abandoned) Bking: cloudelastic: use CFSSL for TLS on canary [puppet] - https://gerrit.wikimedia.org/r/993103 (https://phabricator.wikimedia.org/T355617) (owner: Bking)
[19:22:06] <wikibugs>	 (CR) EoghanGaffney: [C: +2] [phabricator] Fix commenting on tasks by email [puppet] - https://gerrit.wikimedia.org/r/993759 (https://phabricator.wikimedia.org/T356077) (owner: EoghanGaffney)
[19:24:13] <wikibugs>	 (PS1) Zabe: Start reading from af_actor/afh_actor everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/993765 (https://phabricator.wikimedia.org/T355616)
[19:25:16] <zabe>	 jouncebot: nowandnext
[19:25:16] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 34 minute(s)
[19:25:16] <jouncebot>	 In 1 hour(s) and 34 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240129T2100)
[19:26:03] <wikibugs>	 (CR) Zabe: [C: +2] Start reading from af_actor/afh_actor everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/993765 (https://phabricator.wikimedia.org/T355616) (owner: Zabe)
[19:26:50] <wikibugs>	 (Merged) jenkins-bot: Start reading from af_actor/afh_actor everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/993765 (https://phabricator.wikimedia.org/T355616) (owner: Zabe)
[19:27:08] <logmsgbot>	 !log zabe@deploy2002 Started scap: Backport for [[gerrit:993765|Start reading from af_actor/afh_actor everywhere (T355616)]]
[19:27:14] <stashbot>	 T355616: Start reading from af_actor/afh_actor - https://phabricator.wikimedia.org/T355616
[19:28:31] <logmsgbot>	 !log zabe@deploy2002 zabe: Backport for [[gerrit:993765|Start reading from af_actor/afh_actor everywhere (T355616)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[19:29:58] <logmsgbot>	 !log zabe@deploy2002 zabe: Continuing with sync
[19:31:10] <wikibugs>	 (CR) Dzahn: [C: +2] contint: Remove obsolete firewall rules [puppet] - https://gerrit.wikimedia.org/r/993072 (owner: Muehlenhoff)
[19:32:55] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2180 (T355609)', diff saved to https://phabricator.wikimedia.org/P55838 and previous config saved to /var/cache/conftool/dbconfig/20240129-193254-marostegui.json
[19:32:57] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2193.codfw.wmnet with reason: Maintenance
[19:33:00] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[19:33:11] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2193.codfw.wmnet with reason: Maintenance
[19:33:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2193 (T355609)', diff saved to https://phabricator.wikimedia.org/P55839 and previous config saved to /var/cache/conftool/dbconfig/20240129-193317-marostegui.json
[19:36:18] <logmsgbot>	 !log zabe@deploy2002 Finished scap: Backport for [[gerrit:993765|Start reading from af_actor/afh_actor everywhere (T355616)]] (duration: 09m 09s)
[19:36:25] <stashbot>	 T355616: Start reading from af_actor/afh_actor - https://phabricator.wikimedia.org/T355616
[19:41:25] <wikibugs>	 (PS1) Ebernhardson: cirrus updater: Apply consumer throughput configuration [deployment-charts] - https://gerrit.wikimedia.org/r/993788 (https://phabricator.wikimedia.org/T352335)
[19:41:45] <wikibugs>	 (CR) CI reject: [V: -1] cirrus updater: Apply consumer throughput configuration [deployment-charts] - https://gerrit.wikimedia.org/r/993788 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[19:42:08] <wikibugs>	 (CR) Ebernhardson: [C: +1] cloudelastic: Add migration canary to cloudelastic cluster [puppet] - https://gerrit.wikimedia.org/r/993764 (https://phabricator.wikimedia.org/T355617) (owner: Bking)
[19:42:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2193 (T355609)', diff saved to https://phabricator.wikimedia.org/P55840 and previous config saved to /var/cache/conftool/dbconfig/20240129-194218-marostegui.json
[19:42:24] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[19:43:25] <wikibugs>	 (PS2) Ebernhardson: cirrus updater: Apply consumer throughput configuration [deployment-charts] - https://gerrit.wikimedia.org/r/993788 (https://phabricator.wikimedia.org/T352335)
[19:53:45] <wikibugs>	 (PS3) Ebernhardson: cirrus updater: Apply consumer throughput configuration [deployment-charts] - https://gerrit.wikimedia.org/r/993788 (https://phabricator.wikimedia.org/T352335)
[19:57:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P55841 and previous config saved to /var/cache/conftool/dbconfig/20240129-195725-marostegui.json
[20:01:16] <wikibugs>	 (CR) Gehel: [C: +1] "LGTM, worst case is probably either the server does not join the cluster at all (our pybal check should remove that server from rotation i" [puppet] - https://gerrit.wikimedia.org/r/993764 (https://phabricator.wikimedia.org/T355617) (owner: Bking)
[20:12:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P55842 and previous config saved to /var/cache/conftool/dbconfig/20240129-201233-marostegui.json
[20:27:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2193 (T355609)', diff saved to https://phabricator.wikimedia.org/P55843 and previous config saved to /var/cache/conftool/dbconfig/20240129-202740-marostegui.json
[20:27:46] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[20:29:17] <wikibugs>	 (PS4) Ebernhardson: cirrus updater: Apply consumer throughput configuration [deployment-charts] - https://gerrit.wikimedia.org/r/993788 (https://phabricator.wikimedia.org/T352335)
[20:31:19] <wikibugs>	 (CR) Ebernhardson: [C: +2] cirrus updater: Apply consumer throughput configuration [deployment-charts] - https://gerrit.wikimedia.org/r/993788 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[20:32:17] <wikibugs>	 (Merged) jenkins-bot: cirrus updater: Apply consumer throughput configuration [deployment-charts] - https://gerrit.wikimedia.org/r/993788 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[20:33:41] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[20:33:49] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:37:11] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[20:37:19] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:45:06] <wikibugs>	 (PS2) Eevans: cassandra: create template for aqsloader role & grants [puppet] - https://gerrit.wikimedia.org/r/993102 (https://phabricator.wikimedia.org/T355917)
[20:50:21] <wikibugs>	 (PS2) Jdlrobson: Begin capturing errors for Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/992931
[20:50:26] <wikibugs>	 (PS3) Jdlrobson: Use desktop history page HTML everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/991424 (https://phabricator.wikimedia.org/T353388)
[20:50:35] <wikibugs>	 (PS3) Jdlrobson: Begin capturing errors for Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/992931
[20:51:40] <wikibugs>	 (CR) Eevans: cassandra: create template for aqsloader role & grants (2 comments) [puppet] - https://gerrit.wikimedia.org/r/993102 (https://phabricator.wikimedia.org/T355917) (owner: Eevans)
[20:53:54] <wikibugs>	 (PS1) JHathaway: reposync: don't enforce ownership after init [puppet] - https://gerrit.wikimedia.org/r/993797
[20:56:25] <wikibugs>	 (CR) JHathaway: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/993797 (owner: JHathaway)
[21:00:04] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: gettimeofday() says it's time for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240129T2100)
[21:00:04] <jouncebot>	 ebernhardson and kemayo: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:16] <ebernhardson>	 \o
[21:00:17] <Kemayo>	 o/
[21:00:44] <RoanKattouw>	 I can deploy in a minute, just heating up lunch
[21:01:28] <Jdlrobson>	 present
[21:01:39] <Kemayo>	 Looks like we just have config deployments in this window anyway, so it seems we wouldn't be on a time crunch.
[21:07:43] <wikibugs>	 (PS4) Catrope: cirrus: Disable cloudelastic writes to testwiki and mw.org [mediawiki-config] - https://gerrit.wikimedia.org/r/992974 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[21:08:02] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by catrope@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/992974 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[21:08:53] <wikibugs>	 (Merged) jenkins-bot: cirrus: Disable cloudelastic writes to testwiki and mw.org [mediawiki-config] - https://gerrit.wikimedia.org/r/992974 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[21:09:05] <logmsgbot>	 !log catrope@deploy2002 Started scap: Backport for [[gerrit:992974|cirrus: Disable cloudelastic writes to testwiki and mw.org (T352335)]]
[21:09:11] <stashbot>	 T352335: Deploy the new Cirrus Updater to update select wikis in cloudelastic - https://phabricator.wikimedia.org/T352335
[21:10:27] <logmsgbot>	 !log catrope@deploy2002 ebernhardson and catrope: Backport for [[gerrit:992974|cirrus: Disable cloudelastic writes to testwiki and mw.org (T352335)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:10:57] <RoanKattouw>	 ebernhardson: Please test on the mwdebug servers (if possible/applicable) and let me know whether to proceed
[21:11:11] <ebernhardson>	 RoanKattouw: it only changes job runner stuff, go ahead and proceed
[21:11:21] <logmsgbot>	 !log catrope@deploy2002 ebernhardson and catrope: Continuing with sync
[21:12:14] <wikibugs>	 (PS1) Brennen Bearnes: phabricator: tools: install python3-pymsql for public_task_dump.py [puppet] - https://gerrit.wikimedia.org/r/993799 (https://phabricator.wikimedia.org/T355574)
[21:17:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[21:17:46] <logmsgbot>	 !log catrope@deploy2002 Finished scap: Backport for [[gerrit:992974|cirrus: Disable cloudelastic writes to testwiki and mw.org (T352335)]] (duration: 08m 40s)
[21:17:51] <stashbot>	 T352335: Deploy the new Cirrus Updater to update select wikis in cloudelastic - https://phabricator.wikimedia.org/T352335
[21:22:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[21:23:12] <wikibugs>	 (PS2) Catrope: DiscussionTools: Enable permalinks frontend everywhere except en.wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/993709 (https://phabricator.wikimedia.org/T356063) (owner: Esanders)
[21:23:30] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by catrope@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/993709 (https://phabricator.wikimedia.org/T356063) (owner: Esanders)
[21:24:03] <wikibugs>	 (CR) Catrope: [C: +1] foreachwikiindblist: Return early when no arg is passed [puppet] - https://gerrit.wikimedia.org/r/992263 (owner: Zabe)
[21:24:16] <wikibugs>	 (Merged) jenkins-bot: DiscussionTools: Enable permalinks frontend everywhere except en.wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/993709 (https://phabricator.wikimedia.org/T356063) (owner: Esanders)
[21:24:28] <logmsgbot>	 !log catrope@deploy2002 Started scap: Backport for [[gerrit:993709|DiscussionTools: Enable permalinks frontend everywhere except en.wiki (T356063)]]
[21:24:33] <stashbot>	 T356063: Deploy talk page permalinks to all wikis except en.wiki - https://phabricator.wikimedia.org/T356063
[21:25:48] <logmsgbot>	 !log catrope@deploy2002 catrope and esanders: Backport for [[gerrit:993709|DiscussionTools: Enable permalinks frontend everywhere except en.wiki (T356063)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:28:28] <RoanKattouw>	 Kemayo: Sorry forgot to ping: your patch is now ready for testing
[21:28:45] <RoanKattouw>	 (the bot pinged Ed instead because he authored the patch)
[21:28:49] <Kemayo>	 RoanKattouw: I will check into it.
[21:29:55] <Kemayo>	 RoanKattouw: It's working fine, sync away
[21:30:25] <logmsgbot>	 !log catrope@deploy2002 catrope and esanders: Continuing with sync
[21:30:48] <wikibugs>	 (CR) Dzahn: [C: +2] phabricator: tools: install python3-pymsql for public_task_dump.py [puppet] - https://gerrit.wikimedia.org/r/993799 (https://phabricator.wikimedia.org/T355574) (owner: Brennen Bearnes)
[21:32:00] <Jdlrobson>	 RoanKattouw: have I got 7 minutes to make a coffee?
[21:32:46] <RoanKattouw>	 Yes go for it
[21:33:30] <wikibugs>	 (CR) Dzahn: [C: +2] "there is a typo in the package name. following up to fix it. python3-pymysql" [puppet] - https://gerrit.wikimedia.org/r/993799 (https://phabricator.wikimedia.org/T355574) (owner: Brennen Bearnes)
[21:34:37] <wikibugs>	 (CR) Brennen Bearnes: "Gah, sorry about that." [puppet] - https://gerrit.wikimedia.org/r/993799 (https://phabricator.wikimedia.org/T355574) (owner: Brennen Bearnes)
[21:35:25] <wikibugs>	 (PS1) Dzahn: phabricator: fix typo in python3-pymysql package name [puppet] - https://gerrit.wikimedia.org/r/993801 (https://phabricator.wikimedia.org/T355574)
[21:35:46] <wikibugs>	 (CR) Dzahn: [C: +2] phabricator: fix typo in python3-pymysql package name [puppet] - https://gerrit.wikimedia.org/r/993801 (https://phabricator.wikimedia.org/T355574) (owner: Dzahn)
[21:35:58] <wikibugs>	 (CR) Dzahn: [V: +2 C: +2] phabricator: fix typo in python3-pymysql package name [puppet] - https://gerrit.wikimedia.org/r/993801 (https://phabricator.wikimedia.org/T355574) (owner: Dzahn)
[21:36:47] <logmsgbot>	 !log catrope@deploy2002 Finished scap: Backport for [[gerrit:993709|DiscussionTools: Enable permalinks frontend everywhere except en.wiki (T356063)]] (duration: 12m 19s)
[21:36:52] <stashbot>	 T356063: Deploy talk page permalinks to all wikis except en.wiki - https://phabricator.wikimedia.org/T356063
[21:37:35] <wikibugs>	 (PS4) Catrope: Use desktop history page HTML everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/991424 (https://phabricator.wikimedia.org/T353388) (owner: Jdlrobson)
[21:38:05] <RoanKattouw>	 Jdlrobson: Ready to start your patches whenever, ping me when you're back/ready
[21:38:56] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[21:39:09] <wikibugs>	 (CR) Dzahn: [C: +2] "installed now on phab servers. feel free to test.  ii  python3-pymysql" [puppet] - https://gerrit.wikimedia.org/r/993799 (https://phabricator.wikimedia.org/T355574) (owner: Brennen Bearnes)
[21:40:38] <Jdlrobson>	 RoanKattouw: yep here
[21:40:42] <Jdlrobson>	 and you can push them out together
[21:40:50] <wikibugs>	 (CR) Catrope: [C: +2] Use desktop history page HTML everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/991424 (https://phabricator.wikimedia.org/T353388) (owner: Jdlrobson)
[21:40:56] <wikibugs>	 (PS4) Catrope: Begin capturing errors for Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/992931 (owner: Jdlrobson)
[21:41:15] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by catrope@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/991424 (https://phabricator.wikimedia.org/T353388) (owner: Jdlrobson)
[21:41:17] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by catrope@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/992931 (owner: Jdlrobson)
[21:41:39] <wikibugs>	 (Merged) jenkins-bot: Use desktop history page HTML everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/991424 (https://phabricator.wikimedia.org/T353388) (owner: Jdlrobson)
[21:41:58] <wikibugs>	 (Merged) jenkins-bot: Begin capturing errors for Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/992931 (owner: Jdlrobson)
[21:42:14] <logmsgbot>	 !log catrope@deploy2002 Started scap: Backport for [[gerrit:991424|Use desktop history page HTML everywhere (T353388)]], [[gerrit:992931|Begin capturing errors for Wikivoyage]]
[21:42:19] <stashbot>	 T353388: Enable desktop history HTML on mobile - https://phabricator.wikimedia.org/T353388
[21:43:34] <logmsgbot>	 !log catrope@deploy2002 catrope and jdlrobson: Backport for [[gerrit:991424|Use desktop history page HTML everywhere (T353388)]], [[gerrit:992931|Begin capturing errors for Wikivoyage]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:43:44] <RoanKattouw>	 Jdlrobson: Please test on the mwdebug servers
[21:45:39] <jinxer-wm>	 (ProbeDown) firing: (6) Service debmonitor1002:7443 has failed probes (http_debmonitor_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Debmonitor - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:45:52] <Jdlrobson>	 RoanKattouw: looking now
[21:47:59] <Jdlrobson>	 RoanKattouw: yep you can merge that.
[21:48:04] <logmsgbot>	 !log catrope@deploy2002 catrope and jdlrobson: Continuing with sync
[21:52:37] <wikibugs>	 (PS1) BCornwall: ncredir: Set fifo_log_demux/nginx as wanted_by [puppet] - https://gerrit.wikimedia.org/r/993804 (https://phabricator.wikimedia.org/T355905)
[21:54:20] <logmsgbot>	 !log catrope@deploy2002 Finished scap: Backport for [[gerrit:991424|Use desktop history page HTML everywhere (T353388)]], [[gerrit:992931|Begin capturing errors for Wikivoyage]] (duration: 12m 05s)
[21:54:25] <stashbot>	 T353388: Enable desktop history HTML on mobile - https://phabricator.wikimedia.org/T353388
[21:54:32] <RoanKattouw>	 Alright that's it, all done
[21:56:24] <Jdlrobson>	 Thanks RoanKattouw 
[21:58:26] <Jdlrobson>	 RoanKattouw: ah one follow up if you have the time?
[21:58:27] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting analytics-privatedata-users access for amastilovic - https://phabricator.wikimedia.org/T355606 (amastilovic) Hi @MoritzMuehlenhoff ,  I need access to the following (from the wiki page you provided):  LDAP membership in the wmf or nda LDAP group....
[21:58:29] <Jdlrobson>	 Otherwise I can do it tomorrow
[21:58:36] <Jdlrobson>	 I forgot to remove the enwiki config :)
[21:58:51] <RoanKattouw>	 Sure, no problem
[21:59:20] <wikibugs>	 (PS1) Jdlrobson: Drop English Wikipedia configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/993805 (https://phabricator.wikimedia.org/T353388)
[21:59:32] <Jdlrobson>	 ^ RoanKattouw I can put it on the calendar now
[22:00:05] <jouncebot>	 Reedy, sbassett, Maryum, and manfredi: #bothumor I � Unicode. All rise for Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240129T2200).
[22:00:42] <RoanKattouw>	 Please hold off on the security deployment window for 5ish more minutes, I have one last patch from the backport window to do
[22:00:57] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by catrope@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/993805 (https://phabricator.wikimedia.org/T353388) (owner: Jdlrobson)
[22:01:21] <wikibugs>	 (PS2) Catrope: Drop English Wikipedia configuration for wgMFUseDesktopSpecialHistoryPage [mediawiki-config] - https://gerrit.wikimedia.org/r/993805 (https://phabricator.wikimedia.org/T353388) (owner: Jdlrobson)
[22:01:32] <wikibugs>	 (CR) Catrope: [C: +2] Drop English Wikipedia configuration for wgMFUseDesktopSpecialHistoryPage [mediawiki-config] - https://gerrit.wikimedia.org/r/993805 (https://phabricator.wikimedia.org/T353388) (owner: Jdlrobson)
[22:01:41] <wikibugs>	 (CR) TrainBranchBot: "Approved by catrope@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/993805 (https://phabricator.wikimedia.org/T353388) (owner: Jdlrobson)
[22:02:17] <wikibugs>	 (Merged) jenkins-bot: Drop English Wikipedia configuration for wgMFUseDesktopSpecialHistoryPage [mediawiki-config] - https://gerrit.wikimedia.org/r/993805 (https://phabricator.wikimedia.org/T353388) (owner: Jdlrobson)
[22:02:29] <logmsgbot>	 !log catrope@deploy2002 Started scap: Backport for [[gerrit:993805|Drop English Wikipedia configuration for wgMFUseDesktopSpecialHistoryPage (T353388)]]
[22:02:34] <stashbot>	 T353388: Enable desktop history HTML on mobile - https://phabricator.wikimedia.org/T353388
[22:02:57] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting analytics-privatedata-users access for amastilovic - https://phabricator.wikimedia.org/T355606 (MoritzMuehlenhoff) a:Arnoldokoth→Eevans @amastilovic Thanks.  Reassigning to @Eevans as the current SRE on our weekly clinic duty.
[22:03:48] <logmsgbot>	 !log catrope@deploy2002 catrope and jdlrobson: Backport for [[gerrit:993805|Drop English Wikipedia configuration for wgMFUseDesktopSpecialHistoryPage (T353388)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:24:25] <Jdlrobson>	 Thanks RoanKattouw !
[22:24:37] <RoanKattouw>	 Oh whoops I never finished it
[22:24:39] <logmsgbot>	 !log catrope@deploy2002 catrope and jdlrobson: Continuing with sync
[22:24:48] <RoanKattouw>	 It was still stuck on the test server stage
[22:25:20] <Jdlrobson>	 (debug server is working fine!)
[22:29:13] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting analytics-privatedata-users access for amastilovic - https://phabricator.wikimedia.org/T355606 (Eevans) a:Eevans→ABran-WMF
[22:29:42] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting analytics-privatedata-users access for amastilovic - https://phabricator.wikimedia.org/T355606 (Eevans) >>! In T355606#9496444, @MoritzMuehlenhoff wrote: > @amastilovic Thanks. >  > Reassigning to @Eevans as the current SRE on our weekly clinic d...
[22:31:02] <logmsgbot>	 !log catrope@deploy2002 Finished scap: Backport for [[gerrit:993805|Drop English Wikipedia configuration for wgMFUseDesktopSpecialHistoryPage (T353388)]] (duration: 28m 33s)
[22:31:07] <stashbot>	 T353388: Enable desktop history HTML on mobile - https://phabricator.wikimedia.org/T353388
[22:31:26] <RoanKattouw>	 OK, all done for real this time
[22:32:22] <Jdlrobson>	 yay
[23:00:48] <wikibugs>	 (CR) BCornwall: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1234/console" [puppet] - https://gerrit.wikimedia.org/r/993804 (https://phabricator.wikimedia.org/T355905) (owner: BCornwall)
[23:02:32] <wikibugs>	 (CR) BCornwall: [V: +1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1235/co" [puppet] - https://gerrit.wikimedia.org/r/993804 (https://phabricator.wikimedia.org/T355905) (owner: BCornwall)
[23:19:22] <wikibugs>	 (PS2) BCornwall: ncredir: Set fifo_log_demux/nginx as wanted_by [puppet] - https://gerrit.wikimedia.org/r/993804 (https://phabricator.wikimedia.org/T355905)
[23:20:38] <wikibugs>	 (CR) BCornwall: [V: +1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1236/co" [puppet] - https://gerrit.wikimedia.org/r/993804 (https://phabricator.wikimedia.org/T355905) (owner: BCornwall)