[00:21:25] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:26:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:31:21] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[00:31:27] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[00:35:38] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[00:35:45] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[00:38:46] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1010967
[00:38:49] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1010967 (owner: TrainBranchBot)
[00:40:42] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[00:40:48] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[01:00:56] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1010967 (owner: TrainBranchBot)
[01:07:33] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[01:07:40] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[01:23:26] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[01:23:32] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[01:26:52] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[01:26:59] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[01:47:56] (ProbeDown) firing: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:51:50] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[01:51:57] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[01:57:56] (ProbeDown) resolved: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:04:29] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[02:04:36] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[02:17:12] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:26:57] (SystemdUnitFailed) firing: (2) update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:37:15] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:02:15] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:21:45] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[03:55:05] (CR) Tim Starling: [C:+2] Switch block schema to read-new/write-both mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1006180 (https://phabricator.wikimedia.org/T355034) (owner: Tim Starling)
[03:55:15] (CR) CI reject: [V:-1] Switch block schema to read-new/write-both mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1006180 (https://phabricator.wikimedia.org/T355034) (owner: Tim Starling)
[03:55:41] (PS2) Tim Starling: Switch block schema to read-new/write-both mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1006180 (https://phabricator.wikimedia.org/T355034)
[03:55:56] (CR) Tim Starling: "go" [mediawiki-config] - https://gerrit.wikimedia.org/r/1006180 (https://phabricator.wikimedia.org/T355034) (owner: Tim Starling)
[03:56:40] (Merged) jenkins-bot: Switch block schema to read-new/write-both mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1006180 (https://phabricator.wikimedia.org/T355034) (owner: Tim Starling)
[04:01:45] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[04:09:03] (PS1) Tim Starling: Revert "Switch block schema to read-new/write-both mode" [mediawiki-config] - https://gerrit.wikimedia.org/r/1010929
[04:09:34] (CR) Tim Starling: [C:+2] Revert "Switch block schema to read-new/write-both mode" [mediawiki-config] - https://gerrit.wikimedia.org/r/1010929 (owner: Tim Starling)
[04:10:20] (Merged) jenkins-bot: Revert "Switch block schema to read-new/write-both mode" [mediawiki-config] - https://gerrit.wikimedia.org/r/1010929 (owner: Tim Starling)
[04:20:40] PROBLEM - Check whether ferm is active by checking the default input chain on mw2321 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[04:23:48] !log tstarling@deploy2002 Synchronized wmf-config/CommonSettings.php: reverting for now due to slow query T355034 (duration: 12m 28s)
[04:24:04] T355034: Deploy new block_target schema - https://phabricator.wikimedia.org/T355034
[04:26:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:50:40] RECOVERY - Check whether ferm is active by checking the default input chain on mw2321 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[04:53:11] (PS1) AOkoth: miscweb: add security-landing-page [deployment-charts] - https://gerrit.wikimedia.org/r/1011028 (https://phabricator.wikimedia.org/T350796)
[04:58:26] (PS2) AOkoth: miscweb: add security-landing-page values [deployment-charts] - https://gerrit.wikimedia.org/r/1011028 (https://phabricator.wikimedia.org/T350796)
[05:12:26] (CR) AOkoth: "Image: https://docker-registry.wikimedia.org/repos/sre/miscweb/security-landing-page/tags/" [deployment-charts] - https://gerrit.wikimedia.org/r/1011028 (https://phabricator.wikimedia.org/T350796) (owner: AOkoth)
[05:46:31] (PS1) KartikMistry: Update cxserver to 2024-03-14-053505-production [deployment-charts] - https://gerrit.wikimedia.org/r/1011033 (https://phabricator.wikimedia.org/T350773)
[05:49:11] (CR) KartikMistry: [C:+2] Update cxserver to 2024-03-14-053505-production [deployment-charts] - https://gerrit.wikimedia.org/r/1011033 (https://phabricator.wikimedia.org/T350773) (owner: KartikMistry)
[05:50:03] (Merged) jenkins-bot: Update cxserver to 2024-03-14-053505-production [deployment-charts] - https://gerrit.wikimedia.org/r/1011033 (https://phabricator.wikimedia.org/T350773) (owner: KartikMistry)
[05:52:17] !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[05:52:38] !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[06:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T0600)
[06:00:05] kormat, marostegui, Amir1, and arnaudb: OwO what's this, a deployment window?? Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T0600). nyaa~
[06:06:02] !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on 16 hosts with reason: Primary switchover x1 T359919
[06:06:06] T359919: Switchover x1 master (db2115 -> db2196) - https://phabricator.wikimedia.org/T359919
[06:06:16] !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 16 hosts with reason: Primary switchover x1 T359919
[06:06:45] !log arnaudb@cumin1002 dbctl commit (dc=all): 'Set db2196 with weight 0 T359919', diff saved to https://phabricator.wikimedia.org/P58788 and previous config saved to /var/cache/conftool/dbconfig/20240314-060644-arnaudb.json
[06:22:30] (CR) Arnaudb: [C:+2] mariadb: Promote db2196 to x1 master [puppet] - https://gerrit.wikimedia.org/r/1010253 (https://phabricator.wikimedia.org/T359919) (owner: Gerrit maintenance bot)
[06:23:51] !log Starting x1 codfw failover from db2115 to db2196 - T359919
[06:23:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:56] T359919: Switchover x1 master (db2115 -> db2196) - https://phabricator.wikimedia.org/T359919
[06:26:25] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:26:25] SRE, SRE-Access-Requests, Gerrit: Not able to access Gerrit - https://phabricator.wikimedia.org/T360006#9629155 (cchen) @hashar Thank you so much! It works now!
[06:26:57] (SystemdUnitFailed) firing: (2) update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:27:16] (CR) Arnaudb: [C:+2] wmnet: Update x1-master alias [dns] - https://gerrit.wikimedia.org/r/1010254 (https://phabricator.wikimedia.org/T359919) (owner: Gerrit maintenance bot)
[06:28:15] (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - api_appserver - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:30:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:31:50] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[06:31:56] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[06:33:15] (MediaWikiHighErrorRate) firing: (6) Elevated rate of MediaWiki errors - api_appserver - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:40:06] PROBLEM - Uncommitted dbctl configuration changes- check dbctl config diff on cumin1002 is CRITICAL: CRITICAL - Unknown error executing dbctl config diff https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[06:40:14] PROBLEM - Uncommitted dbctl configuration changes- check dbctl config diff on cumin2002 is CRITICAL: CRITICAL - Unknown error executing dbctl config diff https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[06:40:15] ah
[06:43:15] (MediaWikiHighErrorRate) firing: (6) Elevated rate of MediaWiki errors - api_appserver - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:44:20] (PS1) KartikMistry: Update cxserver to 2024-03-14-063859-production [deployment-charts] - https://gerrit.wikimedia.org/r/1011036
[06:45:15] !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote db2196 to x1 primary and set section read-write T359919', diff saved to https://phabricator.wikimedia.org/P58789 and previous config saved to /var/cache/conftool/dbconfig/20240314-064513-root.json
[06:45:20] T359919: Switchover x1 master (db2115 -> db2196) - https://phabricator.wikimedia.org/T359919
[06:45:53] I'm doing staging deployment for cxserver, not in production until we fix some issues with page loading.
[06:46:16] (CR) KartikMistry: [C:+2] Update cxserver to 2024-03-14-063859-production [deployment-charts] - https://gerrit.wikimedia.org/r/1011036 (owner: KartikMistry)
[06:47:33] (Merged) jenkins-bot: Update cxserver to 2024-03-14-063859-production [deployment-charts] - https://gerrit.wikimedia.org/r/1011036 (owner: KartikMistry)
[06:48:15] (MediaWikiHighErrorRate) resolved: (5) Elevated rate of MediaWiki errors - api_appserver - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:48:24] !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[06:48:46] !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[06:50:06] RECOVERY - Uncommitted dbctl configuration changes- check dbctl config diff on cumin1002 is OK: OK - no diffs https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[06:50:14] RECOVERY - Uncommitted dbctl configuration changes- check dbctl config diff on cumin2002 is OK: OK - no diffs https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[07:00:04] Amir1 and Urbanecm: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T0700).
[07:00:05] No Gerrit patches in the queue for this window AFAICS.
[07:01:58] (PS1) KartikMistry: Update cxserver to 2024-03-14-065833-production [deployment-charts] - https://gerrit.wikimedia.org/r/1011038
[07:02:29] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:03:14] (CR) KartikMistry: [C:+2] Update cxserver to 2024-03-14-065833-production [deployment-charts] - https://gerrit.wikimedia.org/r/1011038 (owner: KartikMistry)
[07:04:08] (Merged) jenkins-bot: Update cxserver to 2024-03-14-065833-production [deployment-charts] - https://gerrit.wikimedia.org/r/1011038 (owner: KartikMistry)
[07:04:21] (PoolcounterFullQueues) firing: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:05:39] !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[07:06:02] !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[07:09:21] (PoolcounterFullQueues) resolved: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:11:19] Deploying cxserver now..
[07:12:30] !log kartik@deploy2002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[07:13:06] !log kartik@deploy2002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[07:13:54] !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[07:14:29] !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[07:15:43] !log Updated cxserver to 2024-03-14-065833-production (T350773)
[07:15:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:47] T350773: Remove preq and use node fetch - https://phabricator.wikimedia.org/T350773
[07:17:32] !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db2115.codfw.wmnet with reason: Silence for reimaging
[07:17:45] !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2115.codfw.wmnet with reason: Silence for reimaging
[07:20:19] !log arnaudb@cumin1002 START - Cookbook sre.hosts.reimage for host db2115.codfw.wmnet with OS bookworm
[07:31:45] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[07:35:37] !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db2115.codfw.wmnet with reason: host reimage
[07:38:13] !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2115.codfw.wmnet with reason: host reimage
[07:59:18] !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2115.codfw.wmnet with OS bookworm
[08:05:13] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: Enabling circular replication
[08:05:30] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: Enabling circular replication
[08:10:13] !log arnaudb@cumin1002 dbctl commit (dc=all): 'db2115 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P58791 and previous config saved to /var/cache/conftool/dbconfig/20240314-081012-arnaudb.json
[08:10:59] !log enable eqiad -> codfw replication on es5 T358199
[08:21:45] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[08:21:57] (SystemdUnitFailed) firing: (2) update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:22:06] !log enable eqiad -> codfw replication on es4 T358199
[08:22:21] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: Enabling circular replication
[08:22:27] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: Enabling circular replication
[08:25:19] !log arnaudb@cumin1002 dbctl commit (dc=all): 'db2115 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P58792 and previous config saved to /var/cache/conftool/dbconfig/20240314-082518-arnaudb.json
[08:32:36] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[08:32:43] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[08:37:41] !log enable eqiad -> codfw replication on x1 T358199
[08:37:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:45] T358199: Database pre-switchover tasks March 2024 - https://phabricator.wikimedia.org/T358199
[08:37:52] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 16 hosts with reason: Enabling circular replication
[08:38:05] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[08:38:12] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[08:38:16] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 16 hosts with reason: Enabling circular replication
[08:40:25] !log arnaudb@cumin1002 dbctl commit (dc=all): 'db2115 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P58793 and previous config saved to /var/cache/conftool/dbconfig/20240314-084024-arnaudb.json
[08:47:25] PROBLEM - BFD status on cr1-eqiad is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[08:48:25] RECOVERY - BFD status on cr1-eqiad is OK: UP: 24 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[08:50:50] !log enable eqiad -> codfw replication on s8 T358199
[08:50:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:55] T358199: Database pre-switchover tasks March 2024 - https://phabricator.wikimedia.org/T358199
[08:51:01] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 34 hosts with reason: Enabling circular replication
[08:51:29] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 34 hosts with reason: Enabling circular replication
[08:54:45] !log enable eqiad -> codfw replication on s7 T358199
[08:54:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:55] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 31 hosts with reason: Enabling circular replication
[08:55:22] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 31 hosts with reason: Enabling circular replication
[08:55:32] !log arnaudb@cumin1002 dbctl commit (dc=all): 'db2115 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P58794 and previous config saved to /var/cache/conftool/dbconfig/20240314-085530-arnaudb.json
[09:00:05] hashar and jnuche: Deploy window MediaWiki train - Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T0900)
[09:04:44] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[09:04:50] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:06:09] !log enable eqiad -> codfw replication on s6 T358199
[09:06:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:16] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 27 hosts with reason: Enabling circular replication
[09:06:18] T358199: Database pre-switchover tasks March 2024 - https://phabricator.wikimedia.org/T358199
[09:06:40] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 27 hosts with reason: Enabling circular replication
[09:07:56] I am checking logs
[09:09:13] (PS1) TrainBranchBot: group2 wikis to 1.42.0-wmf.22 [mediawiki-config] - https://gerrit.wikimedia.org/r/1011081 (https://phabricator.wikimedia.org/T354440)
[09:09:15] (CR) TrainBranchBot: [C:+2] group2 wikis to 1.42.0-wmf.22 [mediawiki-config] - https://gerrit.wikimedia.org/r/1011081 (https://phabricator.wikimedia.org/T354440) (owner: TrainBranchBot)
[09:09:30] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 27 hosts with reason: Enabling circular replication
[09:09:33] !log enable eqiad -> codfw replication on s5 T358199
[09:09:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:09:55] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 27 hosts with reason: Enabling circular replication
[09:10:00] (Merged) jenkins-bot: group2 wikis to 1.42.0-wmf.22 [mediawiki-config] - https://gerrit.wikimedia.org/r/1011081 (https://phabricator.wikimedia.org/T354440) (owner: TrainBranchBot)
[09:16:20] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 36 hosts with reason: Enabling circular replication
[09:16:21] !log enable eqiad -> codfw replication on s4 T358199
[09:16:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:27] T358199: Database pre-switchover tasks March 2024 - https://phabricator.wikimedia.org/T358199
[09:16:51] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 36 hosts with reason: Enabling circular replication
[09:19:29] PROBLEM - Check whether ferm is active by checking the default input chain on mw1394 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[09:21:14] !log enable eqiad -> codfw replication on s3 T358199
[09:21:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:23] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 26 hosts with reason: Enabling circular replication
[09:21:46] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 26 hosts with reason: Enabling circular replication
[09:23:06] !log hashar@deploy2002 rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.22 refs T354440
[09:23:10] T354440: 1.42.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T354440
[09:24:04] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[09:24:10] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:26:28] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 29 hosts with reason: Enabling circular replication
[09:26:30] !log enable eqiad -> codfw replication on s2 T358199
[09:26:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:35] T358199: Database pre-switchover tasks March 2024 - https://phabricator.wikimedia.org/T358199
[09:26:54] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 29 hosts with reason: Enabling circular replication
[09:34:01] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[09:34:08] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:35:55] !log enable eqiad -> codfw replication on s1 T358199
[09:35:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:07] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 37 hosts with reason: Enabling circular replication
[09:36:12] T358199: Database pre-switchover tasks March 2024 - https://phabricator.wikimedia.org/T358199
[09:36:39] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 37 hosts with reason: Enabling circular replication
[09:37:21] (PS1) Arturo Borrero Gonzalez: admin: aborrero: refresh my bashrc file [puppet] - https://gerrit.wikimedia.org/r/1011089
[09:38:18] (PS1) Vgutierrez: admin: Remove cdobbins SSH key [puppet] - https://gerrit.wikimedia.org/r/1011090
[09:38:57] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[09:39:04] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:49:29] RECOVERY - Check whether ferm is active by checking the default input chain on mw1394 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[09:53:46] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[09:53:52] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[10:01:08] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[10:01:15] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[10:09:26] (RoutinatorRsyncErrors) firing: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[10:19:01] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 211, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:19:35] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 45, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:22:37] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:23:03] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 212, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:30:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:36:45] (PS1) Arturo Borrero Gonzalez: base: standard_packages: install fzf [puppet] - https://gerrit.wikimedia.org/r/1011091
[10:37:29] (CR) Arturo Borrero Gonzalez: [C:+2] admin: aborrero: refresh my bashrc file [puppet] - https://gerrit.wikimedia.org/r/1011089 (owner: Arturo Borrero Gonzalez)
[10:46:54] (CR) Arturo Borrero Gonzalez: [C:+2] admin: aborrero: bashrc: drop history update special case [puppet] - https://gerrit.wikimedia.org/r/1011092 (owner: Arturo Borrero Gonzalez)
[10:49:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[11:00:05] mvolz: OwO what's this, a deployment window?? Services – Citoid / Zotero. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T1100). nyaa~
[11:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T1100)
[11:02:29] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:03:14] (CR) Majavah: [C:+2] P:toolforge::proxy: drop grid engine dynamicproxy support [puppet] - https://gerrit.wikimedia.org/r/1010503 (https://phabricator.wikimedia.org/T314664) (owner: Majavah)
[11:04:51] (CR) Mvolz: [C:+2] citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1010677 (owner: PipelineBot)
[11:05:58] (Merged) jenkins-bot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1010677 (owner: PipelineBot)
[11:07:02] !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply
[11:07:37] !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply
[11:07:56] (CR) Majavah: "This might still be needed :-( or at least on tools-puppetserver-01.tools.eqiad1.wikimedia.cloud the hooks don't seem to have any effect o" [puppet] - https://gerrit.wikimedia.org/r/1009798 (https://phabricator.wikimedia.org/T351450) (owner: Andrew Bogott)
[11:08:10] !log mvolz@deploy2002 helmfile [codfw] START helmfile.d/services/citoid: apply
[11:09:17] !log mvolz@deploy2002 helmfile [codfw] DONE helmfile.d/services/citoid: apply
[11:09:42] !log mvolz@deploy2002 helmfile [eqiad] START helmfile.d/services/citoid: apply
[11:10:17] !log mvolz@deploy2002 helmfile [eqiad] DONE helmfile.d/services/citoid: apply
[11:10:53] (PS1) David Caro: tools: add a note in motd about grid turning off [puppet] - https://gerrit.wikimedia.org/r/1011096
[11:12:19] (CR) Majavah: [V:+1 C:+2] Remove Toolforge grid engine [puppet] - https://gerrit.wikimedia.org/r/1010892 (https://phabricator.wikimedia.org/T314664) (owner: Majavah)
[11:12:23] (CR) Arturo Borrero Gonzalez: [C:+1] "LGTM." [puppet] - https://gerrit.wikimedia.org/r/1011096 (owner: David Caro)
[11:13:02] (Abandoned) Mvolz: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1010676 (owner: PipelineBot)
[11:13:10] (Abandoned) Mvolz: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1009393 (owner: PipelineBot)
[11:29:11] (PS2) David Caro: tools: add grid shutoff messages [puppet] - https://gerrit.wikimedia.org/r/1011096
[11:29:35] (CR) CI reject: [V:-1] tools: add grid shutoff messages [puppet] - https://gerrit.wikimedia.org/r/1011096 (owner: David Caro)
[11:29:59] (CR) David Caro: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1011096 (owner: David Caro)
[11:30:07] (CR) David Caro: tools: add grid shutoff messages (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1011096 (owner: David Caro)
[11:30:24] (PS3) David Caro: tools: add grid shutoff messages [puppet] - https://gerrit.wikimedia.org/r/1011096
[11:30:32] (CR) CI reject: [V:-1] tools: add grid shutoff messages [puppet] - https://gerrit.wikimedia.org/r/1011096 (owner: David Caro)
[11:30:48] (CR) Majavah: tools: add grid shutoff messages (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1011096 (owner: David Caro)
[11:31:04] (PS8) Majavah: dynamicproxy: cleanup after removing toolforge support [puppet] - https://gerrit.wikimedia.org/r/1010504 (https://phabricator.wikimedia.org/T314664)
[11:31:12] (PS9) Majavah: dynamicproxy: add support for per-project zones [puppet] - https://gerrit.wikimedia.org/r/1010505 (https://phabricator.wikimedia.org/T342398)
[11:31:20] (PS14) Majavah: dynamicproxy: allow specifying different certs for each zone [puppet] - https://gerrit.wikimedia.org/r/1010506 (https://phabricator.wikimedia.org/T342398)
[11:31:22] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[11:31:28] (PS10) Majavah:
dynamicproxy: add spec test for API [puppet] - 10https://gerrit.wikimedia.org/r/1010509 [11:31:29] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:33:49] (03CR) 10David Caro: tools: add grid shutoff messages (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1011096 (owner: 10David Caro) [11:34:51] (03CR) 10Majavah: tools: add grid shutoff messages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1011096 (owner: 10David Caro) [11:37:56] (03PS4) 10David Caro: tools: add grid shutoff messages [puppet] - 10https://gerrit.wikimedia.org/r/1011096 [11:38:26] (03CR) 10David Caro: tools: add grid shutoff messages (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1011096 (owner: 10David Caro) [11:39:07] (03CR) 10CI reject: [V:04-1] tools: add grid shutoff messages [puppet] - 10https://gerrit.wikimedia.org/r/1011096 (owner: 10David Caro) [11:42:16] (03PS5) 10David Caro: tools: add grid shutoff messages [puppet] - 10https://gerrit.wikimedia.org/r/1011096 [11:42:42] (03PS15) 10Majavah: dynamicproxy: allow specifying different certs for each zone [puppet] - 10https://gerrit.wikimedia.org/r/1010506 (https://phabricator.wikimedia.org/T342398) [11:42:42] (03PS11) 10Majavah: dynamicproxy: add spec test for API [puppet] - 10https://gerrit.wikimedia.org/r/1010509 [11:45:17] (03PS16) 10Majavah: dynamicproxy: allow specifying different certs for each zone [puppet] - 10https://gerrit.wikimedia.org/r/1010506 (https://phabricator.wikimedia.org/T342398) [11:45:18] (03PS12) 10Majavah: dynamicproxy: add spec test for API [puppet] - 10https://gerrit.wikimedia.org/r/1010509 [11:49:06] (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1638/co" [puppet] - 10https://gerrit.wikimedia.org/r/1011096 (owner: 10David Caro) [11:49:15] (03CR) 10Majavah: [V:03+1 C:03+1] tools: add grid shutoff messages [puppet] - 
10https://gerrit.wikimedia.org/r/1011096 (owner: 10David Caro) [11:49:47] (03CR) 10Arturo Borrero Gonzalez: [C:03+1] "LGTM. Minor comment inline." [puppet] - 10https://gerrit.wikimedia.org/r/1011096 (owner: 10David Caro) [11:51:15] (03Abandoned) 10Majavah: P:toolforge::shell_environ: remove packages not on bullseye [puppet] - 10https://gerrit.wikimedia.org/r/990704 (owner: 10Majavah) [11:51:34] (03CR) 10David Caro: tools: add grid shutoff messages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1011096 (owner: 10David Caro) [11:51:58] (03CR) 10Majavah: [C:03+2] O:toolforge: add role for grid-less bastions [puppet] - 10https://gerrit.wikimedia.org/r/990703 (https://phabricator.wikimedia.org/T314665) (owner: 10Majavah) [11:52:06] (03PS6) 10David Caro: tools: add grid shutoff messages [puppet] - 10https://gerrit.wikimedia.org/r/1011096 [11:54:07] (03CR) 10Majavah: [C:03+2] Fix broken vim modelines [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1010664 (owner: 10Tim Starling) [11:54:50] (03Merged) 10jenkins-bot: Fix broken vim modelines [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1010664 (owner: 10Tim Starling) [11:57:14] (03CR) 10David Caro: [C:03+2] tools: add grid shutoff messages [puppet] - 10https://gerrit.wikimedia.org/r/1011096 (owner: 10David Caro) [11:57:45] (03CR) 10Majavah: [C:03+2] Add procps to base images [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1010690 (owner: 10Tim Starling) [11:57:50] (03PS3) 10Majavah: Add procps to base images [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1010690 (owner: 10Tim Starling) [11:58:06] (03CR) 10Majavah: [C:03+2] Add procps to base images [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1010690 (owner: 10Tim Starling) [11:58:47] (03Merged) 10jenkins-bot: Add procps to base images [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1010690 (owner: 10Tim Starling) [12:00:05] Deploy window 
Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T1200) [12:06:32] !log Stopped MediaModeration scanning script on group2 wikis [12:06:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:18:08] (03CR) 10David Caro: O:toolforge: add role for grid-less bastions (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/990703 (https://phabricator.wikimedia.org/T314665) (owner: 10Majavah) [12:21:42] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:21:49] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:21:57] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:29:30] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:29:36] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:30:25] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:33:58] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:34:05] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:34:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [12:35:25] (SystemdUnitFailed) firing: 
rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:38:36] It looks like parsoid /page/html is broken from a MW change: https://phabricator.wikimedia.org/T360105 [12:38:54] I added some details here: https://phabricator.wikimedia.org/T360105#9630359 [12:47:32] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:47:39] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:00:04] RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: gettimeofday() says it's time for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T1300) [13:00:05] nemo-yiannis: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:01:14] (03Restored) 10Andrew Bogott: git-sync-upstream: on puppet7, deploy code after update [puppet] - 10https://gerrit.wikimedia.org/r/1009798 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [13:11:46] o/ [13:11:56] I can deploy [13:12:17] nemo-yiannis: are you around? [13:12:23] hey Lucas_WMDE [13:12:35] yeah [13:12:41] so the master change was just never +2ed? [13:12:45] by accident? 
[13:12:50] i think so yeah [13:13:02] ok, weird [13:13:12] we merged it on last week's train as a backport fix but the same issue got triggered again [13:16:19] alright [13:16:22] guess we’re just waiting for CI then [13:16:28] (ETA 17 min according to zuul) [13:17:17] yeah, i figured out what's wrong right before the deployment window [13:17:55] (03PS1) 10Jgiannelos: REST: ignore request body on GET requests [core] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010932 (https://phabricator.wikimedia.org/T359509) [13:18:46] looks like wikibugs is lagging behind a bit o_O [13:19:26] yeah it did not seem to like me closing 350 tasks with a bulk action [13:19:31] wikibugs died or so yea [13:20:17] it's a bit behind, but sending everything eventually [13:20:34] eeepy wikibugs [13:21:10] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [core] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010932 (https://phabricator.wikimedia.org/T359509) (owner: 10Jgiannelos) [13:21:52] oh hey hashar, got 2 minutes to look at https://gerrit.wikimedia.org/r/c/integration/config/+/1011116 ? 
:D [13:22:40] TheresNoTime: going for it [13:22:47] thaaank you [13:30:09] (03PS1) 10Samtar: cswiki, commonswiki: lift IP cap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011125 (https://phabricator.wikimedia.org/T360103) [13:31:39] (03PS2) 10Samtar: cswiki, commonswiki: lift IP cap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011125 (https://phabricator.wikimedia.org/T360103) [13:32:15] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 102 probes of 739 (alerts on 90) - https://atlas.ripe.net/measurements/59935539/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:32:54] (03CR) 10Lucas Werkmeister (WMDE): "Please remember to also +2 the master change – or, if you’re intentionally not merging it on master yet (e.g. because more work is needed)" [core] (wmf/1.42.0-wmf.21) - 10https://gerrit.wikimedia.org/r/1009544 (https://phabricator.wikimedia.org/T359509) (owner: 10Jaime Nuche) [13:35:37] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:35:43] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:35:49] Lucas_WMDE: how do you feel about deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1011125 for T360103 ? 
[13:35:49] T360103: Lift IP cap for editathon Women in Media - https://phabricator.wikimedia.org/T360103 [13:36:07] (03Merged) 10jenkins-bot: REST: ignore request body on GET requests [core] (wmf/1.42.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1010932 (https://phabricator.wikimedia.org/T359509) (owner: 10Jgiannelos) [13:36:30] !log lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [[gerrit:1010932|REST: ignore request body on GET requests (T359509)]] [13:36:35] T359509: REST API calls suddenly all returning 400 - https://phabricator.wikimedia.org/T359509 [13:37:15] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 41 probes of 739 (alerts on 90) - https://atlas.ripe.net/measurements/59935539/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:37:19] TheresNoTime: sure [13:37:25] (03CR) 10Lucas Werkmeister (WMDE): cswiki, commonswiki: lift IP cap (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011125 (https://phabricator.wikimedia.org/T360103) (owner: 10Samtar) [13:38:06] ta, and will tidy that up now, good point [13:38:43] !log lucaswerkmeister-wmde@deploy2002 jgiannelos and lucaswerkmeister-wmde: Backport for [[gerrit:1010932|REST: ignore request body on GET requests (T359509)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [13:38:54] nemo-yiannis: want to test the change on mwdebug? [13:39:05] yeah looking at ot [13:39:08] *it [13:39:20] (03PS3) 10Samtar: cswiki, commonswiki: lift IP cap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011125 (https://phabricator.wikimedia.org/T360103) [13:39:26] (RoutinatorRsyncErrors) resolved: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [13:39:27] ok! 
[13:41:37] (03CR) 10Lucas Werkmeister (WMDE): cswiki, commonswiki: lift IP cap (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011125 (https://phabricator.wikimedia.org/T360103) (owner: 10Samtar) [13:41:41] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] cswiki, commonswiki: lift IP cap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011125 (https://phabricator.wikimedia.org/T360103) (owner: 10Samtar) [13:42:07] 06SRE, 06Data-Persistence, 06Infrastructure-Foundations: Integrate dbctl IP changes as part of VLAN changes. - https://phabricator.wikimedia.org/T360029#9631107 (10Marostegui) We've been discussing with @Ladsgroup that we need to double check if he IPs are really used for dbct /MW or it is just some tech deb... [13:42:15] TheresNoTime: deployed [13:42:24] thanks hashar :) [13:46:05] I tested on mwdebug1001 is that the right host ? [13:46:30] it should be on all the mwdebug hosts [13:46:45] but if the test involves edits it should be an mwdebug2* host because of the datacenter [13:47:28] is it not working? [13:47:34] no edits. Ok I tested the failing request from prod on mwdebug and restbase doesn't complain [13:47:37] i think its ok [13:47:44] working [13:48:06] !log lucaswerkmeister-wmde@deploy2002 jgiannelos and lucaswerkmeister-wmde: Continuing with sync [13:48:08] (03CR) 10Majavah: [C:04-1] git-sync-upstream: on puppet7, deploy code after update (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1009798 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [13:48:09] alright, thanks! 
[13:51:29] ok grafana charts look promising i don't see 404s anymore and my manual req worked [13:52:27] it’s not fully deployed yet, but I guess it’s starting to take effect then [13:52:29] PROBLEM - Check whether ferm is active by checking the default input chain on mw1393 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [13:52:33] PROBLEM - Check whether ferm is active by checking the default input chain on mw1462 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [13:52:33] PROBLEM - Check whether ferm is active by checking the default input chain on mw1465 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [13:52:33] PROBLEM - Check whether ferm is active by checking the default input chain on mw1472 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [13:53:03] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes1043 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [13:53:50] PROBLEM - Check whether ferm is active by checking the default input chain on mw2435 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [13:54:26] hm [13:55:35] (I’m not even allowed to SSH into those hosts, someone else will have to take a look) [13:55:54] jouncebot: next [13:55:55] In 2 hour(s) and 4 minute(s): Puppet request window 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T1600) [13:56:03] ok, we’ll just overrun a bit for the IP cap [13:57:12] Lucas_WMDE: ta [13:58:01] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "let’s start gate-and-submit already" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011125 (https://phabricator.wikimedia.org/T360103) (owner: 10Samtar) [13:58:37] !log lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [[gerrit:1010932|REST: ignore request body on GET requests (T359509)]] (duration: 22m 06s) [13:58:41] T359509: REST API calls suddenly all returning 400 - https://phabricator.wikimedia.org/T359509 [13:58:47] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011125 (https://phabricator.wikimedia.org/T360103) (owner: 10Samtar) [13:58:53] * Lucas_WMDE tries to figure out if ta is an acronym or just british :P [13:58:58] (03Merged) 10jenkins-bot: cswiki, commonswiki: lift IP cap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011125 (https://phabricator.wikimedia.org/T360103) (owner: 10Samtar) [13:59:09] !log lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [[gerrit:1011125|cswiki, commonswiki: lift IP cap (T360103)]] [13:59:09] Lucas_WMDE: oh, yes, just british for "thank you" :P [13:59:13] T360103: Lift IP cap for editathon Women in Media - https://phabricator.wikimedia.org/T360103 [13:59:23] thanks :D [13:59:27] (also I checked enwiktionary already ^^) [14:00:02] :D the cap doesn't need testing afaik so please sync as soon as ready [14:00:09] yeah, makes senes [14:00:11] *sense [14:00:33] “The expression ta ta differs, meaning goodbye.” aaaaand now I’m thinking of “tata and farewell”, “succulent chinese meal” etc. 
[14:01:21] !log lucaswerkmeister-wmde@deploy2002 samtar and lucaswerkmeister-wmde: Backport for [[gerrit:1011125|cswiki, commonswiki: lift IP cap (T360103)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [14:01:27] !log lucaswerkmeister-wmde@deploy2002 samtar and lucaswerkmeister-wmde: Continuing with sync [14:03:37] do we know where all these jsonTruncated messages in logstash are coming from, btw? [14:04:03] * TheresNoTime does not [14:04:10] ok, apparently they’re “Unable to store text to external storage” [14:04:23] (Got a packet bigger than 'max_allowed_packet' bytes) [14:04:50] someone unplugged the external drive which holds all the articles oh no [14:05:20] started around 8:30 UTC apparently [14:05:25] * Lucas_WMDE guesses that’s the train rolling to wmf.22 [14:05:28] *group2 [14:05:47] eh, off by one hour, maybe I got timezones wrong https://sal.toolforge.org/log/VI1FPI4BhuQtenzvuZLg [14:11:46] !log lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [[gerrit:1011125|cswiki, commonswiki: lift IP cap (T360103)]] (duration: 12m 36s) [14:11:51] T360103: Lift IP cap for editathon Women in Media - https://phabricator.wikimedia.org/T360103 [14:12:09] * Lucas_WMDE done [14:12:14] !log UTC afternoon backport+config window done [14:12:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:15:30] thanks Lucas_WMDE :) [14:15:30] and reported the log messages at T360118 [14:15:31] T360118: Many jsonTruncated production errors: "Unable to store text to external storage" - https://phabricator.wikimedia.org/T360118 [14:15:34] np :) [14:22:29] RECOVERY - Check whether ferm is active by checking the default input chain on mw1393 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:22:33] RECOVERY - Check whether ferm is active by checking the default input chain on mw1462 is OK: OK ferm input default policy is set 
https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:22:33] RECOVERY - Check whether ferm is active by checking the default input chain on mw1472 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:22:33] RECOVERY - Check whether ferm is active by checking the default input chain on mw1465 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:23:03] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes1043 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:23:47] RECOVERY - Check whether ferm is active by checking the default input chain on mw2435 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [14:37:15] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:49:20] (03CR) 10Ahmon Dancy: [C:03+1] modules/scap/files/foreachwiki: Fix check for beta cluster [puppet] - 10https://gerrit.wikimedia.org/r/1010590 (https://phabricator.wikimedia.org/T357877) (owner: 10Ahmon Dancy) [14:57:15] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:07:23] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:07:30] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:21:10] (03PS1) 10Majavah: P:toolforge::docker::image_builder: drop buster support [puppet] - 10https://gerrit.wikimedia.org/r/1011137 
(https://phabricator.wikimedia.org/T358483) [15:21:12] (03PS1) 10Majavah: kubeadm: Drop buster support [puppet] - 10https://gerrit.wikimedia.org/r/1011138 (https://phabricator.wikimedia.org/T284656) [15:28:21] !log STOP persistRevisionThreadItems on viwiki for T315510, restarting to pick up wmf.22 [15:28:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:29] T315510: Start maintenance script to backfill talk page comment database - https://phabricator.wikimedia.org/T315510 [15:29:56] !log START lucaswerkmeister-wmde@mwmaint2002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki viwiki --current --all --touched-after=20230613000000 --start '["14615874"]' 2>&1 | tee ~/T315510-viwiki-3; date [15:30:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:08] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:41:15] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:44:12] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:44:19] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:59:16] (03PS9) 10Anzx: frwiki: update legacy vector logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011097 (https://phabricator.wikimedia.org/T359741) [16:00:05] jhathaway and rzl: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T1600). [16:00:05] No Gerrit patches in the queue for this window AFAICS. 
[16:03:08] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:03:14] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:03:26] (RoutinatorRsyncErrors) firing: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [16:08:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [16:17:04] (03PS1) 10Mmartorana: Implementing security.txt standard [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010970 (https://phabricator.wikimedia.org/T337949) [16:22:12] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:27:19] (03CR) 10Hashar: [C:03+2] wikitech: allow unblocking inactive accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010853 (https://phabricator.wikimedia.org/T307558) (owner: 10Hashar) [16:28:42] (03Merged) 10jenkins-bot: wikitech: allow unblocking inactive accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010853 (https://phabricator.wikimedia.org/T307558) (owner: 10Hashar) [16:29:15] !log hashar@deploy2002 Started scap: Backport for [[gerrit:1010853|wikitech: allow unblocking inactive accounts (T307558)]] [16:31:26] !log hashar@deploy2002 hashar: Backport for [[gerrit:1010853|wikitech: allow unblocking inactive accounts (T307558)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [16:31:32] !log hashar@deploy2002 hashar: Continuing with sync [16:34:05] 
!log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:34:18] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:35:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:35:36] (03PS1) 10Mmartorana: Implementing security.txt standard [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) [16:37:37] PROBLEM - Check whether ferm is active by checking the default input chain on mw2260 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [16:38:44] !log bearloga@deploy2002 Started deploy [airflow-dags/analytics_product@bae55a9]: (no justification provided) [16:38:53] !log bearloga@deploy2002 Finished deploy [airflow-dags/analytics_product@bae55a9]: (no justification provided) (duration: 00m 08s) [16:42:20] !log hashar@deploy2002 Finished scap: Backport for [[gerrit:1010853|wikitech: allow unblocking inactive accounts (T307558)]] (duration: 13m 04s) [16:43:31] (03CR) 10Dzahn: "which URL do you want this to be reachable at? This probably has to go somewhere below the "docroot" directory in this repo." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) (owner: 10Mmartorana) [16:50:29] (03CR) 10Mmartorana: "I was thinking to treat this file as a standard robot.txt file." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) (owner: 10Mmartorana) [16:50:59] (03CR) 10Reedy: [C:04-1] "Putting the file here won't actually do anything, nor will it make it visible..." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) (owner: 10Mmartorana) [16:51:15] (03CR) 10Reedy: [C:04-1] "Heh, snap 😄" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) (owner: 10Mmartorana) [16:51:15] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:51:22] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:55:01] (03CR) 10Dzahn: "@Reedy but it could be put into ./docroot/wikipedia.org/ then it would show as https://en.wikipedia.org/security.txt just like for examp" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) (owner: 10Mmartorana) [16:55:23] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:55:29] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:57:56] (03CR) 10Reedy: [C:04-1] "It would need to go into `docroot/standard-docroot` for most of the sites, and then `docroot/mediawiki.org` and `docroot/wikimediafoundati" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) (owner: 10Mmartorana) [16:59:05] (03CR) 10Dzahn: "ACK. and @manfredi avoiding a rewrite will make this a LOT easier to get deployed" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) (owner: 10Mmartorana) [17:00:04] bd808: That opportune time for a Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T1700). 
[17:00:04] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T1700)
[17:03:41] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[17:03:47] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[17:04:47] nothing for me to push out today
[17:07:37] RECOVERY - Check whether ferm is active by checking the default input chain on mw2260 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[17:09:25] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[17:09:32] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[17:11:17] (PS1) Hashar: wikitech: fix handling of Gerrit status code [mediawiki-config] - https://gerrit.wikimedia.org/r/1011151 (https://phabricator.wikimedia.org/T307558)
[17:11:45] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[17:14:53] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[17:15:00] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[17:15:17] (PS4) Klausman: profile::thanos: Add latency histogram buckets back for Istio [puppet] - https://gerrit.wikimedia.org/r/1011146 (https://phabricator.wikimedia.org/T359879)
[17:15:55] (PS5) Klausman: profile::thanos: Add latency histogram buckets back for Istio [puppet] - https://gerrit.wikimedia.org/r/1011146 (https://phabricator.wikimedia.org/T359879)
[17:16:16] (CR) Dzahn: [C:+2] modules/scap/files/foreachwiki: Fix check for beta cluster [puppet] - https://gerrit.wikimedia.org/r/1010590 (https://phabricator.wikimedia.org/T357877) (owner: Ahmon Dancy)
[17:17:52] (CR) Hashar: "From my test earlier, the hooks log a failure when the status code is not 204 T307558#9631820 . Gerrit returns various status codes (all i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1011151 (https://phabricator.wikimedia.org/T307558) (owner: Hashar)
[17:24:56] (CR) Elukey: [C:+1] "nit: let's mention that we remove the "le" bucket in istio_sli_latency_request_duration_milliseconds_count because not needed (otherwise i" [puppet] - https://gerrit.wikimedia.org/r/1011146 (https://phabricator.wikimedia.org/T359879) (owner: Klausman)
[17:26:32] (CR) Ahmon Dancy: [C:+1] "Thanks Dzahn!" [puppet] - https://gerrit.wikimedia.org/r/1010590 (https://phabricator.wikimedia.org/T357877) (owner: Ahmon Dancy)
[17:26:34] (PS6) Klausman: profile::thanos: Add latency histogram buckets back for Istio [puppet] - https://gerrit.wikimedia.org/r/1011146 (https://phabricator.wikimedia.org/T359879)
[17:26:51] (CR) Klausman: "Done." [puppet] - https://gerrit.wikimedia.org/r/1011146 (https://phabricator.wikimedia.org/T359879) (owner: Klausman)
[17:29:43] (CR) Herron: [C:+1] profile::thanos: Add latency histogram buckets back for Istio [puppet] - https://gerrit.wikimedia.org/r/1011146 (https://phabricator.wikimedia.org/T359879) (owner: Klausman)
[17:31:24] (CR) BryanDavis: wikitech: fix handling of Gerrit status code (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1011151 (https://phabricator.wikimedia.org/T307558) (owner: Hashar)
[17:40:30] (CR) Reedy: [C:-1] "I dunno what we want to do in terms of exposing it on various other web sites (phab etc etc) though too." [mediawiki-config] - https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) (owner: Mmartorana)
[17:45:21] (CR) Majavah: [C:+2] dynamicproxy: cleanup after removing toolforge support [puppet] - https://gerrit.wikimedia.org/r/1010504 (https://phabricator.wikimedia.org/T314664) (owner: Majavah)
[17:46:22] (CR) Majavah: [C:+2] dynamicproxy: add support for per-project zones [puppet] - https://gerrit.wikimedia.org/r/1010505 (https://phabricator.wikimedia.org/T342398) (owner: Majavah)
[17:46:27] (CR) Majavah: [C:+2] dynamicproxy: allow specifying different certs for each zone [puppet] - https://gerrit.wikimedia.org/r/1010506 (https://phabricator.wikimedia.org/T342398) (owner: Majavah)
[17:47:23] (PS17) Majavah: dynamicproxy: allow specifying different certs for each zone [puppet] - https://gerrit.wikimedia.org/r/1010506 (https://phabricator.wikimedia.org/T342398)
[17:47:24] (PS13) Majavah: dynamicproxy: add spec test for API [puppet] - https://gerrit.wikimedia.org/r/1010509
[17:51:45] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[17:53:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[17:53:41] (CR) Majavah: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1010506 (https://phabricator.wikimedia.org/T342398) (owner: Majavah)
[17:57:14] (CR) Jforrester: "+1 to the concept, -1 to the location; putting it in docroot and sym-linking it as needed seems sensible." [mediawiki-config] - https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) (owner: Mmartorana)
[17:57:15] (CR) Majavah: [C:+2] dynamicproxy: allow specifying different certs for each zone [puppet] - https://gerrit.wikimedia.org/r/1010506 (https://phabricator.wikimedia.org/T342398) (owner: Majavah)
[17:57:46] (CR) Majavah: [C:+2] dynamicproxy: add spec test for API [puppet] - https://gerrit.wikimedia.org/r/1010509 (owner: Majavah)
[18:03:14] (PS1) Majavah: dynamicproxy: fix multiple vhosts on same server [puppet] - https://gerrit.wikimedia.org/r/1011160
[18:04:30] (CR) CI reject: [V:-1] dynamicproxy: fix multiple vhosts on same server [puppet] - https://gerrit.wikimedia.org/r/1011160 (owner: Majavah)
[18:05:33] (PS2) Majavah: dynamicproxy: fix multiple vhosts on same server [puppet] - https://gerrit.wikimedia.org/r/1011160
[18:06:52] (PS3) Majavah: dynamicproxy: fix multiple vhosts on same server [puppet] - https://gerrit.wikimedia.org/r/1011160
[18:08:17] (CR) Majavah: [C:+2] dynamicproxy: fix multiple vhosts on same server [puppet] - https://gerrit.wikimedia.org/r/1011160 (owner: Majavah)
[18:35:25] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:39:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:55:41] jouncebot: nowandnext
[18:55:41] No deployments scheduled for the next 1 hour(s) and 4 minute(s)
[18:55:41] In 1 hour(s) and 4 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T2000)
[18:56:25] (PS1) Majavah: P:acme_chief: allow enabling http-01 spport [puppet] - https://gerrit.wikimedia.org/r/1011167 (https://phabricator.wikimedia.org/T342398)
[18:56:27] (PS1) Majavah: P:wmcs::novaproxy: proxy http-01 challenges to acme-chief [puppet] - https://gerrit.wikimedia.org/r/1011168 (https://phabricator.wikimedia.org/T342398)
[18:56:44] (PS2) Hashar: wikitech: fix handling of Gerrit status code [mediawiki-config] - https://gerrit.wikimedia.org/r/1011151 (https://phabricator.wikimedia.org/T307558)
[18:57:04] (PS3) Reedy: CommonSettings: Add $wgSecurePollExcludedWikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1008997 (https://phabricator.wikimedia.org/T303135)
[18:57:08] (CR) Reedy: [C:+2] CommonSettings: Add $wgSecurePollExcludedWikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1008997 (https://phabricator.wikimedia.org/T303135) (owner: Reedy)
[18:57:30] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:57:32] (CR) CI reject: [V:-1] wikitech: fix handling of Gerrit status code [mediawiki-config] - https://gerrit.wikimedia.org/r/1011151 (https://phabricator.wikimedia.org/T307558) (owner: Hashar)
[18:57:50] liees
[18:58:27] (Merged) jenkins-bot: CommonSettings: Add $wgSecurePollExcludedWikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1008997 (https://phabricator.wikimedia.org/T303135) (owner: Reedy)
[18:59:25] (PS3) Hashar: wikitech: fix handling of Gerrit status code [mediawiki-config] - https://gerrit.wikimedia.org/r/1011151 (https://phabricator.wikimedia.org/T307558)
[19:00:46] (CR) CI reject: [V:-1] P:acme_chief: allow enabling http-01 spport [puppet] - https://gerrit.wikimedia.org/r/1011167 (https://phabricator.wikimedia.org/T342398) (owner: Majavah)
[19:00:56] (CR) Hashar: "I have used a switch to log the different cases of success. While at it I have added the account id to follow the user name." [mediawiki-config] - https://gerrit.wikimedia.org/r/1011151 (https://phabricator.wikimedia.org/T307558) (owner: Hashar)
[19:01:01] (CR) CI reject: [V:-1] P:wmcs::novaproxy: proxy http-01 challenges to acme-chief [puppet] - https://gerrit.wikimedia.org/r/1011168 (https://phabricator.wikimedia.org/T342398) (owner: Majavah)
[19:01:50] (PS2) Majavah: P:acme_chief: allow enabling http-01 spport [puppet] - https://gerrit.wikimedia.org/r/1011167 (https://phabricator.wikimedia.org/T342398)
[19:01:51] (PS2) Majavah: P:wmcs::novaproxy: proxy http-01 challenges to acme-chief [puppet] - https://gerrit.wikimedia.org/r/1011168 (https://phabricator.wikimedia.org/T342398)
[19:07:50] (PS1) Hashar: wikitech: fix curl_exec a falsey value [mediawiki-config] - https://gerrit.wikimedia.org/r/1011171 (https://phabricator.wikimedia.org/T307558)
[19:10:07] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes2045 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[19:14:10] !log reedy@deploy2002 Synchronized wmf-config/CommonSettings.php: T303135 (duration: 12m 24s)
[19:14:15] T303135: Replace 2021 board election hack with proper fix - https://phabricator.wikimedia.org/T303135
[19:22:29] (PS1) Majavah: P:wmcs::striker: add support for multiple instances [puppet] - https://gerrit.wikimedia.org/r/1011174 (https://phabricator.wikimedia.org/T360025)
[19:25:02] (PS2) Majavah: P:wmcs::striker: add support for multiple instances [puppet] - https://gerrit.wikimedia.org/r/1011174 (https://phabricator.wikimedia.org/T360025)
[19:27:00] (PS3) Majavah: P:wmcs::striker: add support for multiple instances [puppet] - https://gerrit.wikimedia.org/r/1011174 (https://phabricator.wikimedia.org/T360025)
[19:28:33] (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1641/co" [puppet] - https://gerrit.wikimedia.org/r/1011174 (https://phabricator.wikimedia.org/T360025) (owner: Majavah)
[19:30:49] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[19:31:11] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[19:31:49] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[19:32:41] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Mon 15 Apr 2024 02:06:19 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[19:32:43] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51596 bytes in 2.619 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[19:33:01] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.258 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[19:40:07] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes2045 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[19:41:26] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[19:41:33] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[19:47:26] (RoutinatorRsyncErrors) firing: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[20:00:05] RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: How many deployers does it take to do UTC late backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240314T2000).
[20:00:05] No Gerrit patches in the queue for this window AFAICS.
[20:22:17] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:47:26] (RoutinatorRsyncErrors) resolved: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[20:51:31] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[20:51:37] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:53:56] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[20:58:30] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[20:58:37] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:58:56] (RoutinatorRsyncErrors) resolved: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[21:03:15] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[21:03:22] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[21:14:04] (CR) Krinkle: [C:+2] Support cookies in XWikimediaDebug [mediawiki-config] - https://gerrit.wikimedia.org/r/1000307 (https://phabricator.wikimedia.org/T350094) (owner: Gergő Tisza)
[21:14:10] (CR) Krinkle: [C:+2] tests: Fix PHP 8.2 warnings [mediawiki-config] - https://gerrit.wikimedia.org/r/1010562 (owner: Krinkle)
[21:14:22] (CR) Krinkle: [C:+2] tests: Convert XWikimediaDebug cookie test to data provider [mediawiki-config] - https://gerrit.wikimedia.org/r/1010563 (owner: Krinkle)
[21:14:52] (Merged) jenkins-bot: Support cookies in XWikimediaDebug [mediawiki-config] - https://gerrit.wikimedia.org/r/1000307 (https://phabricator.wikimedia.org/T350094) (owner: Gergő Tisza)
[21:14:56] (Merged) jenkins-bot: tests: Fix PHP 8.2 warnings [mediawiki-config] - https://gerrit.wikimedia.org/r/1010562 (owner: Krinkle)
[21:15:12] (Merged) jenkins-bot: tests: Convert XWikimediaDebug cookie test to data provider [mediawiki-config] - https://gerrit.wikimedia.org/r/1010563 (owner: Krinkle)
[21:16:25] * Krinkle testing on mwdebug2001
[21:19:38] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[21:19:45] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[21:21:22] !log LDAP - removed kcv-wikimf from group wmf (T358658)
[21:21:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:25:01] (PS1) Dzahn: admin: absent user kcv-wikimf, renamed to kcvelaga [puppet] - https://gerrit.wikimedia.org/r/1011187 (https://phabricator.wikimedia.org/T358658)
[21:25:21] (CR) Krinkle: [C:+2] "Tested on mwdebug2001 via `scap pull` and https://wikitech.wikimedia.org/wiki/Debugging_in_production before deploying to app servers:" [mediawiki-config] - https://gerrit.wikimedia.org/r/1000307 (https://phabricator.wikimedia.org/T350094) (owner: Gergő Tisza)
[21:26:40] (CR) CI reject: [V:-1] admin: absent user kcv-wikimf, renamed to kcvelaga [puppet] - https://gerrit.wikimedia.org/r/1011187 (https://phabricator.wikimedia.org/T358658) (owner: Dzahn)
[21:28:27] PROBLEM - Check whether ferm is active by checking the default input chain on mw1361 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[21:28:35] PROBLEM - Check whether ferm is active by checking the default input chain on mw1482 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[21:29:54] SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for GeorgeMikesell - https://phabricator.wikimedia.org/T358922#9632618 (Dzahn)
[21:30:10] SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for GeorgeMikesell - https://phabricator.wikimedia.org/T358922#9632619 (Dzahn) Confirmed per https://www.mediawiki.org/wiki/Wikimedia_Quality_and_Test_Engineering_Team
[21:34:16] SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for GeorgeMikesell - https://phabricator.wikimedia.org/T358922#9632643 (Dzahn) @Jrbranaa Thanks for the approval, we just need one more small thing. Could you let us know an expiry/renewal date for this access, please? We add th...
[21:34:21] !log krinkle@deploy2002 Synchronized src/XWikimediaDebug.php: Support cookies in XWikimediaDebug, I5e33e90fd, T350094 (duration: 12m 08s)
[21:34:25] T350094: Enable verbose logging without installing the WikimediaDebug extension - https://phabricator.wikimedia.org/T350094
[21:36:18] (PS2) Dzahn: admin: absent user kcv-wikimf, renamed to kcvelaga [puppet] - https://gerrit.wikimedia.org/r/1011187 (https://phabricator.wikimedia.org/T358658)
[21:36:54] (CR) Dzahn: "@Muehlenhoff I assume a user who has had shell AND LDAP access only needs to be added to one of the 2 absentee groups and we still get the" [puppet] - https://gerrit.wikimedia.org/r/1011187 (https://phabricator.wikimedia.org/T358658) (owner: Dzahn)
[21:43:50] (PS1) Krinkle: InitialiseSettings: Factor out ext-MobileFrontend.php to its own file [mediawiki-config] - https://gerrit.wikimedia.org/r/1011189 (https://phabricator.wikimedia.org/T308932)
[21:44:26] (CR) Krinkle: [C:+1] Be able to disable MobileFrontend and drop the secondary domain (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1010268 (https://phabricator.wikimedia.org/T349408) (owner: Jforrester)
[21:44:34] (CR) CI reject: [V:-1] InitialiseSettings: Factor out ext-MobileFrontend.php to its own file [mediawiki-config] - https://gerrit.wikimedia.org/r/1011189 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[21:45:20] (PS3) Krinkle: Be able to disable MobileFrontend and drop the secondary domain [mediawiki-config] - https://gerrit.wikimedia.org/r/1010268 (https://phabricator.wikimedia.org/T349408) (owner: Jforrester)
[21:45:21] (PS2) Krinkle: InitialiseSettings: Factor out ext-MobileFrontend.php to its own file [mediawiki-config] - https://gerrit.wikimedia.org/r/1011189 (https://phabricator.wikimedia.org/T308932)
[21:46:27] (CR) CI reject: [V:-1] InitialiseSettings: Factor out ext-MobileFrontend.php to its own file [mediawiki-config] - https://gerrit.wikimedia.org/r/1011189 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[21:48:48] (PS3) Krinkle: InitialiseSettings: Factor out ext-MobileFrontend.php to its own file [mediawiki-config] - https://gerrit.wikimedia.org/r/1011189 (https://phabricator.wikimedia.org/T308932)
[21:50:12] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[21:50:19] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[21:52:33] SRE, serviceops, Wikimedia-production-error: VRT wiki fails to create account - https://phabricator.wikimedia.org/T359901#9632660 (Dzahn) @Krd I hear that account creation is limited to 6 per 24 hours and that it's intentional per https://meta.wikimedia.org/wiki/Mass_account_creation If you need an...
[21:55:26] (RoutinatorRsyncErrors) firing: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[21:58:27] RECOVERY - Check whether ferm is active by checking the default input chain on mw1361 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[21:58:35] RECOVERY - Check whether ferm is active by checking the default input chain on mw1482 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[22:02:23] (CR) Majavah: "Updated roles applied to deployment-webperf21 and deployment-webperf22 to match." [puppet] - https://gerrit.wikimedia.org/r/935523 (owner: Krinkle)
[22:03:11] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:03:18] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:06:16] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:06:23] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:08:33] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:08:40] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:22:49] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:22:55] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:30:02] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:30:09] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:32:26] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:32:33] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:34:30] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:34:37] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:34:49] (PS3) Andrew Bogott: git-sync-upstream: on puppet7, deploy code after update [puppet] - https://gerrit.wikimedia.org/r/1009798 (https://phabricator.wikimedia.org/T351450)
[22:34:50] (PS14) Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455)
[22:34:52] (PS15) Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455)
[22:36:47] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:36:53] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:38:50] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:38:57] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:39:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:40:46] (PS4) Andrew Bogott: git-sync-upstream: on puppet7, deploy code after update [puppet] - https://gerrit.wikimedia.org/r/1009798 (https://phabricator.wikimedia.org/T351450)
[22:40:47] (PS15) Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455)
[22:40:54] (PS16) Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455)
[22:46:29] (CR) Andrew Bogott: git-sync-upstream: on puppet7, deploy code after update (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1009798 (https://phabricator.wikimedia.org/T351450) (owner: Andrew Bogott)
[22:51:12] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:51:19] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:57:30] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[23:00:26] (RoutinatorRsyncErrors) resolved: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[23:10:33] (CR) Jforrester: [C:+1] InitialiseSettings: Factor out ext-MobileFrontend.php to its own file (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1011189 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)