[00:01:43] <jinxer-wm>	 RESOLVED: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[00:02:15] <wikibugs>	 (CR) RLazarus: otelcol: Stop hardcoding k8s master IP addresses (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855) (owner: CDanis)
[00:02:32] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1054400 (owner: TrainBranchBot)
[00:02:56] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P66563 and previous config saved to /var/cache/conftool/dbconfig/20240716-000255-arnaudb.json
[00:10:11] <icinga-wm>	 RECOVERY - Hadoop HistoryServer on an-master1003 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Mapreduce_Historyserver_process
[00:18:03] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P66564 and previous config saved to /var/cache/conftool/dbconfig/20240716-001802-arnaudb.json
[00:22:32] <zabe>	 !log zabe@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiktionary --logwiki=metawiki 'Dodo cham' 'Le GlitcheurHD' # T369777
[00:22:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:22:36] <stashbot>	 T369777: Unblock stuck global rename of Le GlitcheurHD - https://phabricator.wikimedia.org/T369777
[00:26:18] <zabe>	 !log zabe@mwmaint1002:/tmp/upload$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Trade . # T369998
[00:26:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:26:22] <stashbot>	 T369998: Server side upload for Trade - https://phabricator.wikimedia.org/T369998
[00:33:10] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66565 and previous config saved to /var/cache/conftool/dbconfig/20240716-003310-arnaudb.json
[00:33:12] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance
[00:33:15] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[00:33:25] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance
[00:33:32] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66566 and previous config saved to /var/cache/conftool/dbconfig/20240716-003331-arnaudb.json
[00:36:04] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66567 and previous config saved to /var/cache/conftool/dbconfig/20240716-003604-arnaudb.json
[00:40:13] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on clouddb1017 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.92 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[00:51:12] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P66568 and previous config saved to /var/cache/conftool/dbconfig/20240716-005111-arnaudb.json
[00:56:06] <wikibugs>	 (PS1) BCornwall: ncredir: Reformat/sort the redirects file [puppet] - https://gerrit.wikimedia.org/r/1054405
[00:56:20] <wikibugs>	 (CR) CI reject: [V:-1] ncredir: Reformat/sort the redirects file [puppet] - https://gerrit.wikimedia.org/r/1054405 (owner: BCornwall)
[00:56:38] <wikibugs>	 (Abandoned) BCornwall: ncredir: Reformat/sort the redirects file [puppet] - https://gerrit.wikimedia.org/r/1054405 (owner: BCornwall)
[00:57:06] <wikibugs>	 (PS5) BCornwall: ncredir: Reformat/sort the redirects file [puppet] - https://gerrit.wikimedia.org/r/1025875 (https://phabricator.wikimedia.org/T355189)
[00:57:19] <wikibugs>	 (CR) CI reject: [V:-1] ncredir: Reformat/sort the redirects file [puppet] - https://gerrit.wikimedia.org/r/1025875 (https://phabricator.wikimedia.org/T355189) (owner: BCornwall)
[00:59:59] <wikibugs>	 (PS6) BCornwall: ncredir: Reformat/sort the redirects file [puppet] - https://gerrit.wikimedia.org/r/1025875 (https://phabricator.wikimedia.org/T355189)
[01:06:19] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P66569 and previous config saved to /var/cache/conftool/dbconfig/20240716-010618-arnaudb.json
[01:08:01] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/1.43.0-wmf.14 [core] (wmf/1.43.0-wmf.14) - https://gerrit.wikimedia.org/r/1054406 (https://phabricator.wikimedia.org/T366959)
[01:08:03] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/1.43.0-wmf.14 [core] (wmf/1.43.0-wmf.14) - https://gerrit.wikimedia.org/r/1054406 (https://phabricator.wikimedia.org/T366959) (owner: TrainBranchBot)
[01:21:28] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66570 and previous config saved to /var/cache/conftool/dbconfig/20240716-012125-arnaudb.json
[01:21:29] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
[01:21:33] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[01:21:42] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
[01:30:43] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[01:32:33] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29688 bytes in 0.662 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[01:32:43] <jinxer-wm>	 FIRING: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[01:33:05] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/1.43.0-wmf.14 [core] (wmf/1.43.0-wmf.14) - https://gerrit.wikimedia.org/r/1054406 (https://phabricator.wikimedia.org/T366959) (owner: TrainBranchBot)
[01:37:43] <jinxer-wm>	 RESOLVED: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[01:45:10] <wikibugs>	 (CR) Krinkle: mediawiki: Refactor and improve captchaloop (1 comment) [puppet] - https://gerrit.wikimedia.org/r/993010 (owner: Reedy)
[01:48:40] <wikibugs>	 (CR) Krinkle: mediawiki: Refactor and improve captchaloop (1 comment) [puppet] - https://gerrit.wikimedia.org/r/993010 (owner: Reedy)
[02:00:04] <jouncebot>	 Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T0200)
[02:07:31] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance
[02:07:44] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance
[02:07:51] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66572 and previous config saved to /var/cache/conftool/dbconfig/20240716-020751-arnaudb.json
[02:07:55] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[02:10:24] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66573 and previous config saved to /var/cache/conftool/dbconfig/20240716-021023-arnaudb.json
[02:25:31] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P66574 and previous config saved to /var/cache/conftool/dbconfig/20240716-022531-arnaudb.json
[02:30:41] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s4 on clouddb1019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.22 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:32:41] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s4 on clouddb1019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 303.04 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:37:41] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s4 on clouddb1019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.01 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:38:43] <jinxer-wm>	 FIRING: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[02:39:19] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job netbox_django in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:40:39] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P66575 and previous config saved to /var/cache/conftool/dbconfig/20240716-024038-arnaudb.json
[02:43:43] <jinxer-wm>	 RESOLVED: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[02:55:46] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66576 and previous config saved to /var/cache/conftool/dbconfig/20240716-025545-arnaudb.json
[02:55:50] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[02:59:19] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job netbox_django in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:00:05] <jouncebot>	 Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T0300)
[03:01:51] <wikibugs>	 (PS1) TrainBranchBot: testwikis wikis to 1.43.0-wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/1054410 (https://phabricator.wikimedia.org/T366959)
[03:01:52] <wikibugs>	 (CR) TrainBranchBot: [C:+2] testwikis wikis to 1.43.0-wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/1054410 (https://phabricator.wikimedia.org/T366959) (owner: TrainBranchBot)
[03:02:31] <wikibugs>	 (Merged) jenkins-bot: testwikis wikis to 1.43.0-wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/1054410 (https://phabricator.wikimedia.org/T366959) (owner: TrainBranchBot)
[03:03:00] <logmsgbot>	 !log mwpresync@deploy1002 Started scap sync-world: testwikis wikis to 1.43.0-wmf.14  refs T366959
[03:03:03] <stashbot>	 T366959: 1.43.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T366959
[03:17:41] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s4 on clouddb1019 is OK: OK slave_sql_lag Replication lag: 45.12 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[03:37:13] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[03:45:43] <jinxer-wm>	 FIRING: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[03:53:56] <logmsgbot>	 !log mwpresync@deploy1002 Finished scap: testwikis wikis to 1.43.0-wmf.14  refs T366959 (duration: 50m 56s)
[03:54:00] <stashbot>	 T366959: 1.43.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T366959
[04:01:01] <logmsgbot>	 !log mwpresync@deploy1002 Pruned MediaWiki: 1.43.0-wmf.11 (duration: 00m 58s)
[04:03:04] <jouncebot>	 Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T0400)
[04:41:41] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s4 on clouddb1019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 317.96 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[04:57:11] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[04:57:38] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s3 T370019
[04:57:42] <stashbot>	 T370019: Switchover s3 master (db1157 -> db1223) - https://phabricator.wikimedia.org/T370019
[04:57:59] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T370019
[04:58:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P66577 and previous config saved to /var/cache/conftool/dbconfig/20240716-045807-marostegui.json
[04:58:17] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Long schema change
[04:58:30] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Long schema change
[04:58:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set db1223 with weight 0 T370019', diff saved to https://phabricator.wikimedia.org/P66578 and previous config saved to /var/cache/conftool/dbconfig/20240716-045839-root.json
[04:59:41] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Promote db1223 to s3 master [puppet] - https://gerrit.wikimedia.org/r/1054076 (https://phabricator.wikimedia.org/T370019) (owner: Gerrit maintenance bot)
[05:07:15] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on clouddb1017 is OK: OK slave_sql_lag Replication lag: 40.04 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[05:14:32] <wikibugs>	 (PS1) Marostegui: db1174: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1054411
[05:15:00] <marostegui>	 !log Starting s3 eqiad failover from db1157 to db1223 - T370019
[05:15:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:03] <stashbot>	 T370019: Switchover s3 master (db1157 -> db1223) - https://phabricator.wikimedia.org/T370019
[05:15:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T370019', diff saved to https://phabricator.wikimedia.org/P66579 and previous config saved to /var/cache/conftool/dbconfig/20240716-051516-root.json
[05:15:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write T370019', diff saved to https://phabricator.wikimedia.org/P66580 and previous config saved to /var/cache/conftool/dbconfig/20240716-051538-root.json
[05:16:00] <wikibugs>	 (PS2) Gerrit maintenance bot: wmnet: Update s3-master alias [dns] - https://gerrit.wikimedia.org/r/1054077 (https://phabricator.wikimedia.org/T370019)
[05:16:17] <wikibugs>	 (CR) Marostegui: [C:+2] wmnet: Update s3-master alias [dns] - https://gerrit.wikimedia.org/r/1054077 (https://phabricator.wikimedia.org/T370019) (owner: Gerrit maintenance bot)
[05:16:18] <wikibugs>	 (CR) Marostegui: [V:+2 C:+2] wmnet: Update s3-master alias [dns] - https://gerrit.wikimedia.org/r/1054077 (https://phabricator.wikimedia.org/T370019) (owner: Gerrit maintenance bot)
[05:16:31] <wikibugs>	 (CR) Marostegui: [C:+2] db1174: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1054411 (owner: Marostegui)
[05:17:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1157 T370019', diff saved to https://phabricator.wikimedia.org/P66581 and previous config saved to /var/cache/conftool/dbconfig/20240716-051718-root.json
[05:17:39] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Long schema change
[05:17:41] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Long schema change
[05:17:51] <wikibugs>	 ops-codfw, DC-Ops, observability: Q1:rack/setup/install alert2002 - https://phabricator.wikimedia.org/T370112#9983942 (andrea.denisse) a:andrea.denisse
[05:19:23] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 213, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:19:37] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 45, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:19:57] <wikibugs>	 (PS1) Marostegui: db1157: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1054412
[05:22:18] <wikibugs>	 (CR) Marostegui: [C:+2] db1157: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1054412 (owner: Marostegui)
[05:25:51] <Seawolf35>	 I have a patch scheduled in the next back port window but I won't have power then so I won't be around. fyi urbanecm
[05:25:56] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Long schema change
[05:25:58] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Long schema change
[05:27:23] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:27:41] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:43:12] <marostegui>	 !log Deploy schema change on s3 eqiad db1157 dbmaint T367856
[05:43:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:43:15] <stashbot>	 T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[05:43:20] <marostegui>	 !log Deploy schema change on s7 eqiad db1174 dbmaint T367856
[05:43:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:44:00] <wikibugs>	 (PS1) Gerrit maintenance bot: mariadb: Promote db1236 to s7 master [puppet] - https://gerrit.wikimedia.org/r/1054413 (https://phabricator.wikimedia.org/T370121)
[05:44:05] <wikibugs>	 (PS1) Gerrit maintenance bot: wmnet: Update s7-master alias [dns] - https://gerrit.wikimedia.org/r/1054414 (https://phabricator.wikimedia.org/T370121)
[05:50:26] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 213, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:50:44] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 45, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:56:37] <kart_>	 marostegui: let me know if it is Ok to deploy cxserver.
[05:56:44] <marostegui>	 kart_: go for it!
[05:57:27] <kart_>	 thanks!
[05:58:28] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:58:46] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T0600)
[06:00:05] <jouncebot>	 marostegui, Amir1, and arnaudb: #bothumor My software never has bugs. It just develops random features. Rise for Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T0600).
[06:02:29] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 213, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:02:31] <wikibugs>	 (Merged) jenkins-bot: Update cxserver to 2024-07-15-100650-production [deployment-charts] - https://gerrit.wikimedia.org/r/1054340 (https://phabricator.wikimedia.org/T354666) (owner: KartikMistry)
[06:02:49] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 45, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:04:21] <jinxer-wm>	 FIRING: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:05:53] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] START helmfile.d/services/cxserver: apply
[06:06:16] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[06:07:54] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Replica Lag: s7 on dbstore1008 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 232932.07 seconds Marostegui https://phabricator.wikimedia.org/T370122 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[06:09:21] <jinxer-wm>	 RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:11:11] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[06:11:41] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[06:11:45] <jinxer-wm>	 FIRING: [2x] Processor usage over 85%: Alert for device cr1-eqiad.wikimedia.org - Processor usage over 85%   - https://alerts.wikimedia.org/?q=alertname%3DProcessor+usage+over+85%25
[06:12:23] <wikibugs>	 (PS1) Marostegui: dbstore1008: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1054421
[06:12:32] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
[06:12:52] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
[06:12:59] <wikibugs>	 (CR) Marostegui: [C:+2] dbstore1008: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1054421 (owner: Marostegui)
[06:16:03] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[06:16:33] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[06:16:45] <jinxer-wm>	 RESOLVED: [2x] Processor usage over 85%: Device cr1-eqiad.wikimedia.org recovered from Processor usage over 85%   - https://alerts.wikimedia.org/?q=alertname%3DProcessor+usage+over+85%25
[06:18:35] <kart_>	 !log Updated cxserver to 2024-07-15-100650-production (T354666)
[06:18:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:18:39] <stashbot>	 T354666: Enable MADLAD-400 in MinT test instance and Production for Wikipedia languages not supported by other services - https://phabricator.wikimedia.org/T354666
[06:34:02] <Dreamy_Jazz>	 jouncebot: nowandnext
[06:34:02] <jouncebot>	 For the next 0 hour(s) and 25 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T0600)
[06:34:02] <jouncebot>	 In 0 hour(s) and 25 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T0700)
[06:40:03] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 77868944 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[06:41:03] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 0 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[06:54:50] <Dreamy_Jazz>	 jouncebot: nowandnext
[06:54:50] <jouncebot>	 For the next 0 hour(s) and 5 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T0600)
[06:54:50] <jouncebot>	 In 0 hour(s) and 5 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T0700)
[06:57:55] <wikibugs>	 SRE, Infrastructure-Foundations, netops: cr3-ulsfo flapping on July 14 - https://phabricator.wikimedia.org/T370048#9984008 (ayounsi) Open→Resolved a:ayounsi Closing this task in favor of {T364092}.
[06:58:03] <wikibugs>	 (PS1) Slyngshede: data.yaml: Offboarding fjoseph [puppet] - https://gerrit.wikimedia.org/r/1054425
[06:58:11] <wikibugs>	 (CR) CI reject: [V:-1] data.yaml: Offboarding fjoseph [puppet] - https://gerrit.wikimedia.org/r/1054425 (owner: Slyngshede)
[06:58:15] <wikibugs>	 (PS2) Slyngshede: data.yaml: Offboarding fjoseph [puppet] - https://gerrit.wikimedia.org/r/1054425
[06:59:00] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'configure' for AS: 52999
[06:59:19] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job netbox_django in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:59:25] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52999
[07:00:04] <jouncebot>	 Amir1 and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T0700).
[07:00:04] <jouncebot>	 Seawolf35 and Dreamy_Jazz: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:11] <Dreamy_Jazz>	 \o
[07:01:14] <wikibugs>	 (PS2) Dreamy Jazz: [CheckUser] Remove wgCheckUserEventTablesMigrationStage config [mediawiki-config] - https://gerrit.wikimedia.org/r/1053297 (https://phabricator.wikimedia.org/T366546)
[07:01:18] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Upgrade core routers to Junos 22.4R3 - https://phabricator.wikimedia.org/T364092#9984011 (ayounsi) There has been a spike of CPU usage on cr1-eqiad (with no impact), not sure if just a coincidence.
[07:01:21] <wikibugs>	 (Abandoned) Ayounsi: python_deploy_venv.sh enable proxy support [puppet] - https://gerrit.wikimedia.org/r/1053000 (https://phabricator.wikimedia.org/T336275) (owner: Ayounsi)
[07:03:39] <Dreamy_Jazz>	 I see that Seawolf35 isn't around for this window based on an above message
[07:03:59] <Dreamy_Jazz>	 I will therefore deploy my patch
[07:05:44] <wikibugs>	 (PS3) Dreamy Jazz: [CheckUser] Remove wgCheckUserEventTablesMigrationStage config [mediawiki-config] - https://gerrit.wikimedia.org/r/1053297 (https://phabricator.wikimedia.org/T366546)
[07:06:21] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dreamyjazz@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1053297 (https://phabricator.wikimedia.org/T366546) (owner: Dreamy Jazz)
[07:06:59] <wikibugs>	 (Merged) jenkins-bot: [CheckUser] Remove wgCheckUserEventTablesMigrationStage config [mediawiki-config] - https://gerrit.wikimedia.org/r/1053297 (https://phabricator.wikimedia.org/T366546) (owner: Dreamy Jazz)
[07:07:08] <logmsgbot>	 !log volans@cumin1002 START - Cookbook sre.dns.netbox
[07:07:43] <logmsgbot>	 !log dreamyjazz@deploy1002 Started scap sync-world: Backport for [[gerrit:1053297|[CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546)]]
[07:07:46] <stashbot>	 T366546: Remove wgCheckUserEventTablesMigrationStage and related migration code - https://phabricator.wikimedia.org/T366546
[07:10:51] <logmsgbot>	 !log volans@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Merging pending changes for frack hosts as per IRC discussion - volans@cumin1002"
[07:12:12] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1002 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[07:13:13] <logmsgbot>	 !log volans@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Merging pending changes for frack hosts as per IRC discussion - volans@cumin1002"
[07:13:13] <logmsgbot>	 !log volans@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:14:48] <logmsgbot>	 !log dreamyjazz@deploy1002 dreamyjazz: Backport for [[gerrit:1053297|[CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[07:14:53] <stashbot>	 T366546: Remove wgCheckUserEventTablesMigrationStage and related migration code - https://phabricator.wikimedia.org/T366546
[07:14:56] <wikibugs>	 (03CR) 10Volans: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1054425 (owner: 10Slyngshede)
[07:14:57] <logmsgbot>	 !log dreamyjazz@deploy1002 dreamyjazz: Continuing with sync
[07:16:18] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] data.yaml: Offboarding fjoseph [puppet] - 10https://gerrit.wikimedia.org/r/1054425 (owner: 10Slyngshede)
[07:19:07] <wikibugs>	 (03PS1) 10Slyngshede: data.yaml: Extend MOU for dalezhou [puppet] - 10https://gerrit.wikimedia.org/r/1054427
[07:19:52] <logmsgbot>	 !log dreamyjazz@deploy1002 Finished scap: Backport for [[gerrit:1053297|[CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546)]] (duration: 12m 09s)
[07:19:56] <stashbot>	 T366546: Remove wgCheckUserEventTablesMigrationStage and related migration code - https://phabricator.wikimedia.org/T366546
[07:22:47] <wikibugs>	 (03CR) 10Slyngshede: "Extension requested by mgerlach@ who also provided the new email address for the user." [puppet] - 10https://gerrit.wikimedia.org/r/1054427 (owner: 10Slyngshede)
[07:25:08] <Dreamy_Jazz>	 I'm not sure I will be able to deploy the other change, considering that Seawolf35 isn't here for the window.
[07:25:12] <wikibugs>	 (03PS1) 10Marostegui: filtered_tables: Remove columns [puppet] - 10https://gerrit.wikimedia.org/r/1054428 (https://phabricator.wikimedia.org/T367781)
[07:28:46] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
[07:29:32] <Dreamy_Jazz>	 !log Restarted MediaModeration scanning script
[07:29:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:45] <wikibugs>	 (03CR) 10Arnaudb: [C:03+1] filtered_tables: Remove columns [puppet] - 10https://gerrit.wikimedia.org/r/1054428 (https://phabricator.wikimedia.org/T367781) (owner: 10Marostegui)
[07:30:50] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] filtered_tables: Remove columns [puppet] - 10https://gerrit.wikimedia.org/r/1054428 (https://phabricator.wikimedia.org/T367781) (owner: 10Marostegui)
[07:37:13] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[07:38:10] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
[07:38:37] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet
[07:40:33] <Dreamy_Jazz>	 !log Morning UTC backport window done
[07:40:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:16] <icinga-wm>	 PROBLEM - BGP status on lsw1-e3-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64606/IPv6: Connect - kubernetes-ml-eqiad, AS64606/IPv4: Connect - kubernetes-ml-eqiad https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:45:16] <icinga-wm>	 RECOVERY - BGP status on lsw1-e3-eqiad.mgmt is OK: BGP OK - up: 22, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:45:58] <jinxer-wm>	 FIRING: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[07:46:18] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet
[07:52:25] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: 10gbit nic option for centrallog2002 - https://phabricator.wikimedia.org/T369826#9984050 (10fgiunchedi) >>! In T369826#9982422, @Jhancock.wm wrote: > We won't need to move racks. But because of the way the switches are, we can't reuse the same port on the switch. we'll be moving...
[07:52:33] <wikibugs>	 (03CR) 10Jelto: [V:03+1 C:03+2] gitlab: switch gitlab from iptables to nftables [puppet] - 10https://gerrit.wikimedia.org/r/1053879 (https://phabricator.wikimedia.org/T366882) (owner: 10Jelto)
[08:07:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66584 and previous config saved to /var/cache/conftool/dbconfig/20240716-080707-root.json
[08:07:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1157 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66585 and previous config saved to /var/cache/conftool/dbconfig/20240716-080720-root.json
[08:07:45] <wikibugs>	 (03PS1) 10Marostegui: Revert "db1157: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1054483
[08:07:47] <wikibugs>	 (03PS1) 10Volans: mysql_legacy: update core sections [software/spicerack] - 10https://gerrit.wikimedia.org/r/1054484 (https://phabricator.wikimedia.org/T367496)
[08:07:49] <wikibugs>	 (03PS1) 10Marostegui: Revert "db1174: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1054485
[08:08:12] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] Revert "db1157: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1054483 (owner: 10Marostegui)
[08:08:21] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] Revert "db1174: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1054485 (owner: 10Marostegui)
[08:08:42] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s4 on clouddb1019 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[08:09:34] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[08:09:47] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[08:10:31] <wikibugs>	 (03CR) 10Marostegui: [C:03+1] mysql_legacy: update core sections [software/spicerack] - 10https://gerrit.wikimedia.org/r/1054484 (https://phabricator.wikimedia.org/T367496) (owner: 10Volans)
[08:11:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66586 and previous config saved to /var/cache/conftool/dbconfig/20240716-081129-root.json
[08:11:43] <wikibugs>	 (03CR) 10Arnaudb: [C:03+1] mysql_legacy: update core sections [software/spicerack] - 10https://gerrit.wikimedia.org/r/1054484 (https://phabricator.wikimedia.org/T367496) (owner: 10Volans)
[08:13:53] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[08:13:55] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[08:14:02] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66587 and previous config saved to /var/cache/conftool/dbconfig/20240716-081401-arnaudb.json
[08:14:06] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[08:14:25] <wikibugs>	 (03CR) 10Volans: [C:03+2] mysql_legacy: update core sections [software/spicerack] - 10https://gerrit.wikimedia.org/r/1054484 (https://phabricator.wikimedia.org/T367496) (owner: 10Volans)
[08:15:02] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:15:06] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 46, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:20:54] <wikibugs>	 (03Merged) 10jenkins-bot: mysql_legacy: update core sections [software/spicerack] - 10https://gerrit.wikimedia.org/r/1054484 (https://phabricator.wikimedia.org/T367496) (owner: 10Volans)
[08:22:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66588 and previous config saved to /var/cache/conftool/dbconfig/20240716-082213-root.json
[08:25:19] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] o11y: disable pint promql/series for BenthosKafkaConsumerLag + webrequest [alerts] - 10https://gerrit.wikimedia.org/r/1054363 (https://phabricator.wikimedia.org/T369737) (owner: 10Filippo Giunchedi)
[08:27:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P66589 and previous config saved to /var/cache/conftool/dbconfig/20240716-082727-root.json
[08:28:03] <wikibugs>	 (03PS1) 10Marostegui: Revert^2 "db1174: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1054489
[08:28:43] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Long schema change
[08:28:51] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] Revert^2 "db1174: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/1054489 (owner: 10Marostegui)
[08:28:56] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Long schema change
[08:31:16] <marostegui>	 !log Clone dbstore1008:3317 from db1174 T370122
[08:31:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:19] <stashbot>	 T370122: dbstore1008:3317 (s7) crashed - https://phabricator.wikimedia.org/T370122
[08:32:53] <godog>	 !log root@kafka-logging1001:~# kafka topics --alter --topic mediawiki.httpd.accesslog --partitions 12 - T369256
[08:32:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:57] <stashbot>	 T369256: Kafka lag for benthos-mw-accesslog-sampler and mediawiki.httpd.accesslog topic - https://phabricator.wikimedia.org/T369256
[08:49:44] <icinga-wm>	 PROBLEM - BGP status on cr2-drmrs is CRITICAL: BGP CRITICAL - No response from remote host 185.15.58.129 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:49:44] <icinga-wm>	 PROBLEM - BGP status on cr1-drmrs is CRITICAL: BGP CRITICAL - No response from remote host 185.15.58.128 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:50:28] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
[08:51:06] <wikibugs>	 (03PS2) 10Effie Mouzeli: mcrouter: test bookworm image on mw-debug [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054367 (https://phabricator.wikimedia.org/T368366)
[08:51:15] <wikibugs>	 (03PS1) 10Marostegui: filtered_tables.txt: Remove old columns [puppet] - 10https://gerrit.wikimedia.org/r/1054495 (https://phabricator.wikimedia.org/T343718)
[08:51:20] <wikibugs>	 06SRE, 06collaboration-services, 06Release-Engineering-Team, 06Traffic, 13Patch-For-Review: implement anti-abuse features for GitLAb (Move GitLab behind the CDN) - https://phabricator.wikimedia.org/T366882#9984285 (10Jelto)
[08:51:47] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] filtered_tables.txt: Remove old columns [puppet] - 10https://gerrit.wikimedia.org/r/1054495 (https://phabricator.wikimedia.org/T343718) (owner: 10Marostegui)
[08:53:22] <wikibugs>	 06SRE, 06collaboration-services, 06Release-Engineering-Team, 06Traffic, 13Patch-For-Review: implement anti-abuse features for GitLAb (Move GitLab behind the CDN) - https://phabricator.wikimedia.org/T366882#9984291 (10Jelto) I migrated the GitLab hosts to nftables which unblocks us using nftables built-in...
[08:53:35] <wikibugs>	 06SRE, 06collaboration-services, 06Release-Engineering-Team, 06Traffic, 13Patch-For-Review: implement anti-abuse features for GitLab (Move GitLab behind the CDN) - https://phabricator.wikimedia.org/T366882#9984292 (10Jelto)
[08:54:21] <effie>	 jouncebot: now
[08:54:21] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 5 minute(s)
[08:54:29] <effie>	 jouncebot: next
[08:54:29] <jouncebot>	 In 1 hour(s) and 5 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1000)
[08:55:55] <wikibugs>	 (03PS1) 10Marostegui: filtered_tables.txt: Remove unused columns [puppet] - 10https://gerrit.wikimedia.org/r/1054496 (https://phabricator.wikimedia.org/T318955)
[08:55:55] <wikibugs>	 06SRE, 06collaboration-services, 06Release-Engineering-Team, 06Traffic, 13Patch-For-Review: implement anti-abuse features for GitLab (Move GitLab behind the CDN) - https://phabricator.wikimedia.org/T366882#9984312 (10Jelto)
[08:56:32] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] filtered_tables.txt: Remove unused columns [puppet] - 10https://gerrit.wikimedia.org/r/1054496 (https://phabricator.wikimedia.org/T318955) (owner: 10Marostegui)
[08:58:05] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] mcrouter: test bookworm image on mw-debug [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054367 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[09:00:58] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] mcrouter: test bookworm image on mw-debug [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054367 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[09:02:15] <wikibugs>	 (03Merged) 10jenkins-bot: mcrouter: test bookworm image on mw-debug [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054367 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[09:02:57] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[09:03:00] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[09:03:05] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[09:03:08] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[09:03:32] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[09:04:03] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[09:04:37] <wikibugs>	 (03CR) 10Elukey: mcrouter: test bookworm image on mw-debug (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054367 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[09:06:21] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.wikireplicas.add-wiki for database aewikimedia (T362529)
[09:06:25] <stashbot>	 T362529: Create a Wikimedians of United Arab Emirates User Group Wiki - https://phabricator.wikimedia.org/T362529
[09:06:34] <wikibugs>	 (03PS1) 10Marostegui: filtered_tables.txt: Remove ununsed columns [puppet] - 10https://gerrit.wikimedia.org/r/1054498 (https://phabricator.wikimedia.org/T314041)
[09:07:24] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] filtered_tables.txt: Remove ununsed columns [puppet] - 10https://gerrit.wikimedia.org/r/1054498 (https://phabricator.wikimedia.org/T314041) (owner: 10Marostegui)
[09:08:55] <wikibugs>	 (03PS1) 10Effie Mouzeli: mw-debug: fix mcrouter image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054499
[09:09:11] <wikibugs>	 (03CR) 10Vgutierrez: "looking good, as mentioned on the inline comment it would be great if we don't need root privileges to fetch the suffix list file from the" [puppet] - 10https://gerrit.wikimedia.org/r/1054069 (owner: 10BCornwall)
[09:10:25] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] mw-debug: fix mcrouter image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054499 (owner: 10Effie Mouzeli)
[09:11:08] <elukey>	 !log update docker-report to 0.0.14-1 on bullseye-wikimedia
[09:11:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:24] <wikibugs>	 (03Merged) 10jenkins-bot: mw-debug: fix mcrouter image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054499 (owner: 10Effie Mouzeli)
[09:11:59] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[09:12:01] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[09:12:11] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[09:12:45] <wikibugs>	 (03PS1) 10Marostegui: filtered_tables.txt: Drop unused columns [puppet] - 10https://gerrit.wikimedia.org/r/1054501 (https://phabricator.wikimedia.org/T300774)
[09:12:48] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[09:12:56] <elukey>	 !log update docker-registry to 0.0.14-1 on build2001
[09:12:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:13:40] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] filtered_tables.txt: Drop unused columns [puppet] - 10https://gerrit.wikimedia.org/r/1054501 (https://phabricator.wikimedia.org/T300774) (owner: 10Marostegui)
[09:14:19] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66591 and previous config saved to /var/cache/conftool/dbconfig/20240716-091418-arnaudb.json
[09:14:24] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[09:16:25] <wikibugs>	 (03PS3) 10Effie Mouzeli: mcrouter: test bookworm image on mw-api-int [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054368 (https://phabricator.wikimedia.org/T368366)
[09:20:51] <godog>	 !log bounce benthos@mw_accesslog_sampler - T369256
[09:20:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:56] <stashbot>	 T369256: Kafka lag for benthos-mw-accesslog-sampler and mediawiki.httpd.accesslog topic - https://phabricator.wikimedia.org/T369256
[09:22:58] <wikibugs>	 (03PS4) 10Effie Mouzeli: mcrouter: test bookworm image on mw-api-int [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054368 (https://phabricator.wikimedia.org/T368366)
[09:23:51] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
[09:29:25] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P66592 and previous config saved to /var/cache/conftool/dbconfig/20240716-092924-arnaudb.json
[09:29:30] <wikibugs>	 (03PS4) 10Ayounsi: Extend STORAGE_BACKEND config to support Swift (#16319) [software/netbox] - 10https://gerrit.wikimedia.org/r/980908 (https://phabricator.wikimedia.org/T310717)
[09:30:50] <wikibugs>	 (03CR) 10DCausse: "thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1054392 (owner: 10Ryan Kemper)
[09:30:59] <wikibugs>	 (03PS1) 10Slyngshede: C:idm configure 2FA proxy endpoint. [puppet] - 10https://gerrit.wikimedia.org/r/1054502
[09:31:49] <wikibugs>	 (03PS5) 10Effie Mouzeli: mcrouter: test bookworm image on mw-api-int [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054368 (https://phabricator.wikimedia.org/T368366)
[09:32:01] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database aewikimedia (T362529)
[09:32:04] <stashbot>	 T362529: Create a Wikimedians of United Arab Emirates User Group Wiki - https://phabricator.wikimedia.org/T362529
[09:33:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:34:10] <wikibugs>	 (03CR) 10Elukey: [C:03+1] mcrouter: test bookworm image on mw-api-int [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054368 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[09:37:01] <logmsgbot>	 !log jiji@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
[09:37:05] <logmsgbot>	 !log jiji@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
[09:37:21] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] mcrouter: test bookworm image on mw-api-int [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054368 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[09:38:12] <wikibugs>	 (03Merged) 10jenkins-bot: mcrouter: test bookworm image on mw-api-int [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054368 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[09:39:13] <logmsgbot>	 !log jiji@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
[09:42:25] <logmsgbot>	 !log jiji@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
[09:44:31] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[09:44:32] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P66593 and previous config saved to /var/cache/conftool/dbconfig/20240716-094432-arnaudb.json
[09:46:49] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] wikifeeds: update references to deprecated services [deployment-charts] - 10https://gerrit.wikimedia.org/r/1053808 (https://phabricator.wikimedia.org/T367949) (owner: 10Scott French)
[09:47:52] <wikibugs>	 (03PS2) 10Ayounsi: Upgrade Netbox to 4.0.7 [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/1053243 (https://phabricator.wikimedia.org/T336275)
[09:48:05] <wikibugs>	 (03PS2) 10Gmodena: eventbus: enable instrumentation on group 0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054357 (https://phabricator.wikimedia.org/T363587)
[09:49:22] <wikibugs>	 (03PS2) 10Slyngshede: C:idm configure 2FA proxy endpoint. [puppet] - 10https://gerrit.wikimedia.org/r/1054502
[09:50:04] <wikibugs>	 (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3237/co" [puppet] - 10https://gerrit.wikimedia.org/r/1054502 (owner: 10Slyngshede)
[09:50:12] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
[09:50:16] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Upgrade Netbox to 4.0.7 [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/1053243 (https://phabricator.wikimedia.org/T336275) (owner: 10Ayounsi)
[09:50:43] <jinxer-wm>	 RESOLVED: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[09:52:01] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
[09:52:31] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
[09:53:52] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
[09:54:44] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
[09:59:40] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66594 and previous config saved to /var/cache/conftool/dbconfig/20240716-095939-arnaudb.json
[09:59:41] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[09:59:44] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[09:59:55] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[09:59:57] <wikibugs>	 (03PS1) 10Effie Mouzeli: kubernetes: update mcrouter images to bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1054507 (https://phabricator.wikimedia.org/T368366)
[10:00:03] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66595 and previous config saved to /var/cache/conftool/dbconfig/20240716-100002-arnaudb.json
[10:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1000)
[10:00:20] <wikibugs>	 (03CR) 10CI reject: [V:04-1] kubernetes: update mcrouter images to bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1054507 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[10:00:22] <wikibugs>	 (03PS2) 10Effie Mouzeli: kubernetes: update mcrouter images to bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1054507 (https://phabricator.wikimedia.org/T368366)
[10:01:31] <wikibugs>	 (03PS1) 10Effie Mouzeli: mw-mcrouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054511 (https://phabricator.wikimedia.org/T368366)
[10:05:56] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66597 and previous config saved to /var/cache/conftool/dbconfig/20240716-100556-arnaudb.json
[10:06:00] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[10:10:20] <dcausse>	 !log T362529: creating aewikimedia CirrusSearch indices with 'mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=aewikimedia --cluster=all'
[10:10:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:23] <stashbot>	 T362529: Create a Wikimedians of United Arab Emirates User Group Wiki - https://phabricator.wikimedia.org/T362529
[10:16:57] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 10LDAP-Access-Requests: LDAP access to the analytics-privatedata-users group for Quiddity - https://phabricator.wikimedia.org/T370091#9984595 (10Clement_Goubert) a:03KStineRowe_WMF Hi,  Can you please read and sign the L3 document, as well as read the Dat...
[10:21:04] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66598 and previous config saved to /var/cache/conftool/dbconfig/20240716-102103-arnaudb.json
[10:22:30] <wikibugs>	 (03PS2) 10Effie Mouzeli: mw-mcrouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054511 (https://phabricator.wikimedia.org/T368366)
[10:23:16] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to stewards-users for JJMC89 - https://phabricator.wikimedia.org/T369314#9984608 (10Clement_Goubert) 05Open→03In progress p:05Triage→03Medium
[10:25:49] <wikibugs>	 (03PS1) 10Jgiannelos: changeprop: Disable pregeneration for mobile-sections [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054512 (https://phabricator.wikimedia.org/T328036)
[10:29:10] <wikibugs>	 (03PS2) 10Jgiannelos: changeprop: Disable pregeneration for mobile-sections [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054512 (https://phabricator.wikimedia.org/T328036)
[10:33:17] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] mw-mcrouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054511 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[10:33:29] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] kubernetes: update mcrouter images to bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1054507 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[10:33:49] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] mw-mcrouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054511 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[10:34:40] <wikibugs>	 (03Merged) 10jenkins-bot: mw-mcrouter: use bookworm images [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054511 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[10:35:32] <logmsgbot>	 !log jiji@deploy1002 helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
[10:36:11] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66599 and previous config saved to /var/cache/conftool/dbconfig/20240716-103610-arnaudb.json
[10:41:15] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=eqiad%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[10:46:15] <jinxer-wm>	 RESOLVED: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=eqiad%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[10:47:04] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] "so, after taking a deeper look to traffic-puppetserver-bookworm, when installed by Andrew Bogott it looks like he took care of migrating t" [puppet] - 10https://gerrit.wikimedia.org/r/1053937 (https://phabricator.wikimedia.org/T355750) (owner: 10Elukey)
[10:47:47] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-mcrouter/main on k8s@codfw in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-mcrouter - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[10:47:57] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[10:48:47] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29685 bytes in 0.206 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[10:50:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66600 and previous config saved to /var/cache/conftool/dbconfig/20240716-105006-root.json
[10:50:28] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3238/co" [puppet] - https://gerrit.wikimedia.org/r/1054516 (https://phabricator.wikimedia.org/T368518) (owner: Btullis)
[10:51:15] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=eqiad%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[10:51:18] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66601 and previous config saved to /var/cache/conftool/dbconfig/20240716-105117-arnaudb.json
[10:51:19] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[10:51:21] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[10:51:33] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[10:51:40] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66602 and previous config saved to /var/cache/conftool/dbconfig/20240716-105139-arnaudb.json
[10:53:20] <logmsgbot>	 !log jiji@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
[10:54:46] <wikibugs>	 (PS1) Marostegui: Revert "dbstore1008: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1054517
[10:55:09] <wikibugs>	 (PS1) Marostegui: Revert^3 "db1174: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1054518
[10:56:15] <jinxer-wm>	 RESOLVED: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=eqiad%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[10:56:18] <wikibugs>	 (CR) Marostegui: [C:+2] Revert^3 "db1174: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1054518 (owner: Marostegui)
[10:56:27] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "dbstore1008: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1054517 (owner: Marostegui)
[10:57:33] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66603 and previous config saved to /var/cache/conftool/dbconfig/20240716-105732-arnaudb.json
[10:57:36] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[10:57:47] <jinxer-wm>	 RESOLVED: HelmReleaseBadStatus: Helm release mw-mcrouter/main on k8s@codfw in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-mcrouter - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[10:59:19] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job netbox_django in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:05:04] <effie>	 jouncebot: now
[11:05:04] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 54 minute(s)
[11:05:07] <effie>	 jouncebot: next
[11:05:07] <jouncebot>	 In 0 hour(s) and 54 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1200)
[11:05:09] <wikibugs>	 (PS1) Slyngshede: P:idm_test add dummy secrets for mediawiki integration. [labs/private] - https://gerrit.wikimedia.org/r/1054519
[11:05:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66604 and previous config saved to /var/cache/conftool/dbconfig/20240716-110512-root.json
[11:07:44] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
[11:08:02] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
[11:11:17] <wikibugs>	 (PS1) Stevemunene: [WIP] wdqs: create wdqs split pybal pools [puppet] - https://gerrit.wikimedia.org/r/1054520 (https://phabricator.wikimedia.org/T364368)
[11:12:15] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=codfw%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[11:12:40] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66605 and previous config saved to /var/cache/conftool/dbconfig/20240716-111239-arnaudb.json
[11:17:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[11:20:18] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66606 and previous config saved to /var/cache/conftool/dbconfig/20240716-112017-root.json
[11:20:39] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
[11:20:41] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
[11:23:37] <effie>	 memcached errors are due to deployment
[11:27:47] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66607 and previous config saved to /var/cache/conftool/dbconfig/20240716-112746-arnaudb.json
[11:35:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66608 and previous config saved to /var/cache/conftool/dbconfig/20240716-113523-root.json
[11:37:13] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[11:42:54] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66610 and previous config saved to /var/cache/conftool/dbconfig/20240716-114254-arnaudb.json
[11:42:56] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[11:42:58] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[11:43:09] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[11:43:16] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66611 and previous config saved to /var/cache/conftool/dbconfig/20240716-114315-arnaudb.json
[11:49:18] <effie>	 !log drain mw1496.eqiad.wmnet
[11:49:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66613 and previous config saved to /var/cache/conftool/dbconfig/20240716-115028-root.json
[11:59:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66614 and previous config saved to /var/cache/conftool/dbconfig/20240716-115920-marostegui.json
[11:59:25] <stashbot>	 T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[12:00:03] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
[12:00:04] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1200)
[12:00:05] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
[12:00:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66615 and previous config saved to /var/cache/conftool/dbconfig/20240716-120012-marostegui.json
[12:00:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66616 and previous config saved to /var/cache/conftool/dbconfig/20240716-120021-marostegui.json
[12:05:15] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=codfw%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[12:05:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66617 and previous config saved to /var/cache/conftool/dbconfig/20240716-120534-root.json
[12:06:26] <effie>	 ^ me for the memcached errors
[12:09:11] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.7 to netbox-next - ayounsi@cumin1002 - T336275
[12:09:15] <stashbot>	 T336275: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275
[12:10:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[12:10:28] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.7 to netbox-next - ayounsi@cumin1002 - T336275
[12:14:45] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s4 on clouddb1019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 444.19 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:15:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P66618 and previous config saved to /var/cache/conftool/dbconfig/20240716-121528-marostegui.json
[12:17:15] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=codfw%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[12:20:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66619 and previous config saved to /var/cache/conftool/dbconfig/20240716-122039-root.json
[12:30:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P66620 and previous config saved to /var/cache/conftool/dbconfig/20240716-123035-marostegui.json
[12:34:45] <jinxer-wm>	 RESOLVED: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=codfw%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[12:38:21] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on clouddb1017 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 396.08 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:39:21] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on clouddb1017 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[12:43:33] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66621 and previous config saved to /var/cache/conftool/dbconfig/20240716-124332-arnaudb.json
[12:43:37] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[12:45:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66622 and previous config saved to /var/cache/conftool/dbconfig/20240716-124543-marostegui.json
[12:45:45] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
[12:45:49] <stashbot>	 T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[12:45:58] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
[12:46:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2130 (T367856)', diff saved to https://phabricator.wikimedia.org/P66623 and previous config saved to /var/cache/conftool/dbconfig/20240716-124604-marostegui.json
[12:58:40] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66624 and previous config saved to /var/cache/conftool/dbconfig/20240716-125839-arnaudb.json
[13:00:04] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for UTC afternoon backport window . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1300).
[13:00:04] <jouncebot>	 tchin and tgr: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:10] <Lucas_WMDE>	 o/
[13:00:29] <tchin>	 hola
[13:00:55] <tgr|away>	 o/
[13:01:09] <Lucas_WMDE>	 tchin: do you want to self-serve or should I deploy?
[13:01:20] <Lucas_WMDE>	 (and same question to tgr|away ^^)
[13:01:42] * urbanecm waves
[13:01:47] <urbanecm>	 Lucas_WMDE: lemme know if you want my help
[13:01:57] * Lucas_WMDE waves back
[13:01:58] <tgr|away>	 I can self-serve
[13:02:38] <tchin>	 I'm not at a pc with ssh right now, can you deploy?
[13:02:44] <Lucas_WMDE>	 sure!
[13:03:01] <Lucas_WMDE>	 will you still be able to test the change on mwdebug?
[13:03:20] <tchin>	 yes should be fine
[13:03:24] <Lucas_WMDE>	 ok
[13:04:51] * Lucas_WMDE wonders where wikibugs is
[13:05:04] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [[gerrit:1052762|EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134)]]
[13:05:08] <stashbot>	 T367134: [Refine Refactoring] Integrate Refine workflow configuration into ESC - https://phabricator.wikimedia.org/T367134
[13:09:11] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 tchin, lucaswerkmeister-wmde: Backport for [[gerrit:1052762|EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:10:12] * tchin Looks good on mwdebug
[13:10:16] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 tchin, lucaswerkmeister-wmde: Continuing with sync
[13:10:20] <Lucas_WMDE>	 ok, thanks for testing!
[13:10:35] <Lucas_WMDE>	 (I’ve asked around in #wikimedia-cloud about wikibugs btw)
[13:13:47] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66625 and previous config saved to /var/cache/conftool/dbconfig/20240716-131346-arnaudb.json
[13:15:20] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1052762|EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134)]] (duration: 10m 15s)
[13:15:24] <stashbot>	 T367134: [Refine Refactoring] Integrate Refine workflow configuration into ESC - https://phabricator.wikimedia.org/T367134
[13:15:33] <Lucas_WMDE>	 tgr|away: all yours :)
[13:16:11] <tgr|away>	 thx
[13:19:07] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by tgr@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) (owner: Gergő Tisza)
[13:19:16] <Lucas_WMDE>	 yay, wikibugs is back
[13:19:43] <jinxer-wm>	 FIRING: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[13:19:45] <wikibugs>	 (Merged) jenkins-bot: Handle sso.wikimedia.org domain [mediawiki-config] - https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) (owner: Gergő Tisza)
[13:20:15] <logmsgbot>	 !log tgr@deploy1002 Started scap sync-world: Backport for [[gerrit:1036245|Handle sso.wikimedia.org domain (T365162)]]
[13:20:19] <stashbot>	 T365162: Set up sso.wikimedia.beta.wmflabs.org with config-layer routing to other wikis - https://phabricator.wikimedia.org/T365162
[13:21:39] * Lucas_WMDE 👀 at the “MariaDb running with --read-only” errors in logspam-watch
[13:21:58] <wikibugs>	 (CR) CI reject: [V:-1] Tox: add Python3.12 support [software/spicerack] - https://gerrit.wikimedia.org/r/1050452 (owner: Ayounsi)
[13:22:03] <wikibugs>	 (CR) CI reject: [V:-1] Spicerack: fix Netbox 4 breaking changes [software/spicerack] - https://gerrit.wikimedia.org/r/1050453 (https://phabricator.wikimedia.org/T336275) (owner: Ayounsi)
[13:22:09] <marostegui>	 Lucas_WMDE: there was a switchover today, and those scripts apparently haven't reloaded the config
[13:22:23] <Lucas_WMDE>	 ah, long-running maintenance scripts
[13:22:26] <Lucas_WMDE>	 how we love them
[13:22:31] <marostegui>	 Yeah..
[13:22:43] <wikibugs>	 ops-codfw, SRE, DC-Ops: 10gbit nic option for centrallog2002 - https://phabricator.wikimedia.org/T369826#9985187 (Papaul) @fgiunchedi yes the server will keep the same IP since we will just relocate it within the same rack.  please see step below - power of the server  - plug the 10G card  - move the...
[13:22:44] <logmsgbot>	 !log tgr@deploy1002 tgr: Backport for [[gerrit:1036245|Handle sso.wikimedia.org domain (T365162)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:22:47] <marostegui>	 I should actually kill them, it is not nice to keep trying to write to a host that is RO
[13:23:28] <Lucas_WMDE>	 some euwiki eval.php
[13:23:31] <marostegui>	 yeah
[13:23:34] <marostegui>	 it is always euwiki
[13:23:50] <tgr|away>	 what debug host am I supposed to use these days? just k8s-mwdebug?
[13:24:04] <Lucas_WMDE>	 yeah
[13:24:20] <marostegui>	 Lucas_WMDE: I just killed them
[13:24:41] <Lucas_WMDE>	 marostegui: IIRC that eval.php by catrope caused some other errors the other day
[13:24:43] <jinxer-wm>	 RESOLVED: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[13:25:03] <marostegui>	 Lucas_WMDE: You think there's a task somewhere about that?
[13:25:05] <Lucas_WMDE>	 hopefully he’ll know to restart it if he needs it
[13:25:09] <wikibugs>	 (CR) Bking: [C:+1] team-search-platform: migrate cirrus_cluster_checks [alerts] - https://gerrit.wikimedia.org/r/1054317 (https://phabricator.wikimedia.org/T359033) (owner: DCausse)
[13:25:13] <Lucas_WMDE>	 marostegui: https://phabricator.wikimedia.org/T369600#9965707 is what I remembered
[13:25:30] <marostegui>	 Lucas_WMDE: thank you, I will check
[13:25:37] <Lucas_WMDE>	 if that was still the same process then it was running for over a week now ._.
[13:26:06] <marostegui>	 Lucas_WMDE: the process was from 12th july
[13:26:23] <wikibugs>	 (PS8) Ayounsi: Spicerack: fix Netbox 4 breaking changes [software/spicerack] - https://gerrit.wikimedia.org/r/1050453 (https://phabricator.wikimedia.org/T336275)
[13:26:23] <wikibugs>	 (PS4) Ayounsi: Tox: add Python3.12 support [software/spicerack] - https://gerrit.wikimedia.org/r/1050452
[13:27:42] <wikibugs>	 (PS1) Elukey: CHANGELOG: add changelogs for release v8.7.0 [software/spicerack] - https://gerrit.wikimedia.org/r/1054561
[13:28:54] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66626 and previous config saved to /var/cache/conftool/dbconfig/20240716-132853-arnaudb.json
[13:28:55] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
[13:28:59] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[13:28:59] <wikibugs>	 (PS5) Ayounsi: Tox: add Python3.12 support [software/spicerack] - https://gerrit.wikimedia.org/r/1050452
[13:28:59] <wikibugs>	 (PS1) Ayounsi: Adapt tests for Netbox 4 [software/spicerack] - https://gerrit.wikimedia.org/r/1054562
[13:29:08] <wikibugs>	 (PS2) Elukey: CHANGELOG: add changelogs for release v8.7.0 [software/spicerack] - https://gerrit.wikimedia.org/r/1054561
[13:29:09] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
[13:29:16] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66627 and previous config saved to /var/cache/conftool/dbconfig/20240716-132915-arnaudb.json
[13:29:23] <zabe>	 Normally maint scripts should call Maintenance::waitForReplication which calls $lbFactory->autoReconfigure(); which should prevent issues like this
[13:29:25] <logmsgbot>	 !log mforns@deploy1002 Started deploy [airflow-dags/analytics@1ee55b8]: (no justification provided)
[13:29:55] <logmsgbot>	 !log mforns@deploy1002 Finished deploy [airflow-dags/analytics@1ee55b8]: (no justification provided) (duration: 00m 30s)
[13:31:02] <wikibugs>	 (PS3) Elukey: CHANGELOG: add changelogs for release v8.7.0 [software/spicerack] - https://gerrit.wikimedia.org/r/1054561
[13:32:16] <wikibugs>	 (CR) Volans: [C:+1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/1054561 (owner: Elukey)
[13:32:31] <wikibugs>	 (CR) CI reject: [V:-1] Spicerack: fix Netbox 4 breaking changes [software/spicerack] - https://gerrit.wikimedia.org/r/1050453 (https://phabricator.wikimedia.org/T336275) (owner: Ayounsi)
[13:32:57] <wikibugs>	 (CR) CI reject: [V:-1] Tox: add Python3.12 support [software/spicerack] - https://gerrit.wikimedia.org/r/1050452 (owner: Ayounsi)
[13:33:38] <Lucas_WMDE>	 zabe: not sure that’s possible in eval.php
[13:33:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:33:57] <Lucas_WMDE>	 but maybe the correct answer there is “please don’t run week-long maintenance scripts in eval.php”…
[13:34:11] <wikibugs>	 (CR) FNegri: [C:+1] Switch the rols of clouddb1021 to insetup::data_engineering [puppet] - https://gerrit.wikimedia.org/r/1054516 (https://phabricator.wikimedia.org/T368518) (owner: Btullis)
[13:34:32] <logmsgbot>	 !log tgr@deploy1002 tgr: Continuing with sync
[13:35:08] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66628 and previous config saved to /var/cache/conftool/dbconfig/20240716-133508-arnaudb.json
[13:35:12] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[13:35:31] <wikibugs>	 (CR) Btullis: [V:+1 C:+2] Switch the rols of clouddb1021 to insetup::data_engineering [puppet] - https://gerrit.wikimedia.org/r/1054516 (https://phabricator.wikimedia.org/T368518) (owner: Btullis)
[13:35:32] <wikibugs>	 (CR) CI reject: [V:-1] Tox: add Python3.12 support [software/spicerack] - https://gerrit.wikimedia.org/r/1050452 (owner: Ayounsi)
[13:35:33] <wikibugs>	 (CR) CI reject: [V:-1] Adapt tests for Netbox 4 [software/spicerack] - https://gerrit.wikimedia.org/r/1054562 (owner: Ayounsi)
[13:35:55] <tgr|away>	 it's possible, but not straightforward
[13:36:11] <tgr|away>	 you'd need to create an anonymous Maintenance subclass or something
[13:36:21] <effie>	 jouncebot: now
[13:36:21] <jouncebot>	 For the next 0 hour(s) and 23 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1300)
[13:36:25] <tgr|away>	 or just call autoReconfigure directly
[13:37:01] <tgr|away>	 but yeah seems like a pretty bad idea to do anything important and long-running from a throwaway eval loop
[13:37:10] <Lucas_WMDE>	 I was wondering if eval.php should do this in its while loop, but presumably the script is running one long statement, not a series of statements being read from stdin
[13:37:50] <tgr|away>	 yeah the loop would have to be implemented in the code that gets eval'd
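(Editorial sketch, not part of the channel traffic.) The pattern discussed above, where a long-running job refreshes its database configuration between batches instead of holding on to the config it started with, might look roughly like the following. This is a Python illustration only, not MediaWiki's PHP API: `run_in_batches`, `process`, and `reload_config` are hypothetical names, with `reload_config` standing in for whatever refresh hook the real code would call (the log mentions `$lbFactory->autoReconfigure()`).

```python
from typing import Callable, Sequence


def run_in_batches(
    items: Sequence[str],
    batch_size: int,
    process: Callable[[str], None],
    reload_config: Callable[[], None],
) -> int:
    """Process items in batches, refreshing configuration before each batch.

    `reload_config` is a hypothetical hook; in a real system it would
    re-read the current database topology so that a primary switchover
    is noticed instead of writing to a host that is now read-only.
    Returns the number of times the hook was called.
    """
    reloads = 0
    for start in range(0, len(items), batch_size):
        # Refresh before touching the database again.
        reload_config()
        reloads += 1
        for item in items[start:start + batch_size]:
            process(item)
    return reloads
```

As noted in the conversation, the loop has to live in the code being run: a single long statement handed to an interactive shell gives the shell no chance to refresh anything in between.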
[13:38:50] <wikibugs>	 (CR) JMeybohm: otelcol: Stop hardcoding k8s master IP addresses (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855) (owner: CDanis)
[13:39:02] <wikibugs>	 (CR) Elukey: [C:+2] CHANGELOG: add changelogs for release v8.7.0 [software/spicerack] - https://gerrit.wikimedia.org/r/1054561 (owner: Elukey)
[13:39:22] <logmsgbot>	 !log tgr@deploy1002 Finished scap: Backport for [[gerrit:1036245|Handle sso.wikimedia.org domain (T365162)]] (duration: 19m 07s)
[13:39:26] <stashbot>	 T365162: Set up sso.wikimedia.beta.wmflabs.org with config-layer routing to other wikis - https://phabricator.wikimedia.org/T365162
[13:40:40] <tgr|away>	 !log UTC afternoon deploys done
[13:40:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:41:02] <Lucas_WMDE>	 cc effie 
[13:41:21] <Lucas_WMDE>	 (19 minutes left before urbanecm et al have another window booked ^^)
[13:41:43] <Lucas_WMDE>	 tgr|away: good luck with sso.w.o btw!
[13:41:48] <urbanecm>	 ^^
[13:41:57] <tgr|away>	 thanks!
[13:43:33] <wikibugs>	 (CR) Gergő Tisza: "FWIW I tested during deployment and it seems you can't fake a request to an unknown domain in production because it does not get baked int" [mediawiki-config] - https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) (owner: Gergő Tisza)
[13:43:43] <jinxer-wm>	 FIRING: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[13:44:59] <wikibugs>	 (Merged) jenkins-bot: CHANGELOG: add changelogs for release v8.7.0 [software/spicerack] - https://gerrit.wikimedia.org/r/1054561 (owner: Elukey)
[13:45:45] <wikibugs>	 SRE-swift-storage, Patch-For-Review: Set up new S3-level replicated storage cluster "apus" - https://phabricator.wikimedia.org/T279621#9985424 (MatthewVernon) Task updated to reflect name change, updates to technology and scope, and to update to state of progress.
[13:46:24] <wikibugs>	 SRE-swift-storage, Patch-For-Review: Set up new S3-level replicated storage cluster "apus" - https://phabricator.wikimedia.org/T279621#9985417 (MatthewVernon) Stalled→Open
[13:48:43] <jinxer-wm>	 RESOLVED: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[13:50:15] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66629 and previous config saved to /var/cache/conftool/dbconfig/20240716-135015-arnaudb.json
[13:52:45] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s4 on clouddb1019 is OK: OK slave_sql_lag Replication lag: 0.19 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:52:54] <wikibugs>	 (PS9) Ayounsi: Spicerack: fix Netbox 4 breaking changes [software/spicerack] - https://gerrit.wikimedia.org/r/1050453 (https://phabricator.wikimedia.org/T336275)
[13:52:54] <wikibugs>	 (PS2) Ayounsi: Adapt tests for Netbox 4 [software/spicerack] - https://gerrit.wikimedia.org/r/1054562
[13:52:54] <wikibugs>	 (PS6) Ayounsi: Tox: add Python3.12 support [software/spicerack] - https://gerrit.wikimedia.org/r/1050452
[13:53:55] <logmsgbot>	 !log cgoubert@cumin1002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2432.codfw.wmnet
[13:54:25] <wikibugs>	 ops-codfw, SRE, DC-Ops: 10gbit nic option for centrallog2002 - https://phabricator.wikimedia.org/T369826#9985484 (fgiunchedi) Thank you @Papaul that is quite helpful!  The steps make sense to me, I'm happy to take care of the server configuration (adjusting configuration). I'd even simplify those as...
[13:57:13] <wikibugs>	 (PS1) Elukey: Upstream release v8.7.0 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/1054569
[13:57:56] <wikibugs>	 (CR) Volans: [C:+1] "LGTM" [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/1054569 (owner: Elukey)
[13:59:43] <wikibugs>	 (CR) Ssingh: Release 0.9.8-1+wmf12u1 (2 comments) [debs/python-anycast-healthchecker] - https://gerrit.wikimedia.org/r/1054370 (https://phabricator.wikimedia.org/T370068) (owner: Ssingh)
[14:00:05] <jouncebot>	 seddon, urbanecm, and dbrant: Account Vanishing deployment (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1400). Please do the needful.
[14:00:12] <urbanecm>	 o/
[14:00:17] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Spicerack: fix Netbox 4 breaking changes [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050453 (https://phabricator.wikimedia.org/T336275) (owner: 10Ayounsi)
[14:00:18] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Tox: add Python3.12 support [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050452 (owner: 10Ayounsi)
[14:00:19] <urbanecm>	 Seddon: dbrant: Hey!
[14:00:20] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Adapt tests for Netbox 4 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1054562 (owner: 10Ayounsi)
[14:00:22] <Seddon>	 o/
[14:00:33] <Seddon>	 Dmitry might not be in here
[14:00:47] <Seddon>	 One second
[14:00:51] <urbanecm>	 yep
[14:02:24] <urbanecm>	 hello dbrant!
[14:02:28] <dbrant>	 o/
[14:03:01] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] mw-debug and mw-api-int [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054556 (owner: 10Effie Mouzeli)
[14:03:04] <wikibugs>	 (03PS10) 10Ayounsi: Spicerack: fix Netbox 4 breaking changes [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050453 (https://phabricator.wikimedia.org/T336275)
[14:03:04] <wikibugs>	 (03PS3) 10Ayounsi: Adapt tests for Netbox 4 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1054562
[14:03:04] <wikibugs>	 (03PS7) 10Ayounsi: Tox: add Python3.12 support [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050452
[14:03:29] <logmsgbot>	 !log cgoubert@cumin1002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2432.codfw.wmnet
[14:03:42] <urbanecm>	 effie: should i wait for your mw changes to finish before i start with my window?
[14:03:58] <wikibugs>	 (03Merged) 10jenkins-bot: mw-debug and mw-api-int [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054556 (owner: 10Effie Mouzeli)
[14:04:04] <urbanecm>	 (happy to, just let me know when i can start)
[14:04:51] <wikibugs>	 (03CR) 10Ayounsi: [C:03+1] Release 0.9.8-1+wmf12u1 (032 comments) [debs/python-anycast-healthchecker] - 10https://gerrit.wikimedia.org/r/1054370 (https://phabricator.wikimedia.org/T370068) (owner: 10Ssingh)
[14:05:23] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66630 and previous config saved to /var/cache/conftool/dbconfig/20240716-140522-arnaudb.json
[14:05:31] <wikibugs>	 (03PS1) 10Urbanecm: Introduce Vanish Request Flow [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054571 (https://phabricator.wikimedia.org/T367329)
[14:05:43] <wikibugs>	 (03PS10) 10Arnaudb: mysqld-exporter: hotfix config for es1 to es5 [puppet] - 10https://gerrit.wikimedia.org/r/1053698 (https://phabricator.wikimedia.org/T369720)
[14:06:08] <wikibugs>	 (03Abandoned) 10Urbanecm: Introduce Vanish Request Flow [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054571 (https://phabricator.wikimedia.org/T367329) (owner: 10Urbanecm)
[14:06:15] <effie>	 urbanecm: yes please if possible, I checked for the backport window only sigh 
[14:06:25] <effie>	 it will be quick I reckon 
[14:06:37] <urbanecm>	 effie: no worries. we're prepping for the release now, i'll wait for your go ahead before touching prod :)
[14:06:37] <wikibugs>	 (03PS4) 10Dbrant: Enable account vanishing in CentralAuth. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1053373 (https://phabricator.wikimedia.org/T369141)
[14:06:41] <wikibugs>	 (03CR) 10Elukey: [C:03+2] Upstream release v8.7.0 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/1054569 (owner: 10Elukey)
[14:06:45] <effie>	 urbanecm: cool tx 
[14:07:21] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[14:07:44] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[14:08:32] <wikibugs>	 (03PS1) 10Urbanecm: Introduce Vanish Request Flow [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054572 (https://phabricator.wikimedia.org/T367329)
[14:08:54] <wikibugs>	 (03PS1) 10Urbanecm: Pass wiki id to actor store for cross-db hasPublicLogs query [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054573 (https://phabricator.wikimedia.org/T370059)
[14:08:55] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[14:08:57] <wikibugs>	 (03PS1) 10Urbanecm: Properly set automatic vanish performer on GlobalRenameUser [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054574 (https://phabricator.wikimedia.org/T368177)
[14:09:17] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Spicerack: fix Netbox 4 breaking changes [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050453 (https://phabricator.wikimedia.org/T336275) (owner: 10Ayounsi)
[14:10:03] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Tox: add Python3.12 support [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050452 (owner: 10Ayounsi)
[14:10:05] <logmsgbot>	 !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
[14:10:31] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Adapt tests for Netbox 4 [software/spicerack] - 10https://gerrit.wikimedia.org/r/1054562 (owner: 10Ayounsi)
[14:11:20] <wikibugs>	 (03CR) 10Cwhite: [C:03+1] "Found the `pint file/disable promql/series` on line 9." [alerts] - 10https://gerrit.wikimedia.org/r/1054555 (https://phabricator.wikimedia.org/T354255) (owner: 10Filippo Giunchedi)
[14:11:23] <logmsgbot>	 !log jiji@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
[14:12:38] <logmsgbot>	 !log jiji@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
[14:13:23] <effie>	 urbanecm: done, tx 
[14:13:27] <urbanecm>	 thanks!
[14:13:32] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Introduce Vanish Request Flow [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054572 (https://phabricator.wikimedia.org/T367329) (owner: 10Urbanecm)
[14:13:37] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Pass wiki id to actor store for cross-db hasPublicLogs query [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054573 (https://phabricator.wikimedia.org/T370059) (owner: 10Urbanecm)
[14:13:41] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Properly set automatic vanish performer on GlobalRenameUser [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054574 (https://phabricator.wikimedia.org/T368177) (owner: 10Urbanecm)
[14:14:07] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Enable account vanishing in CentralAuth. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1053373 (https://phabricator.wikimedia.org/T369141) (owner: 10Dbrant)
[14:14:53] <wikibugs>	 (03Merged) 10jenkins-bot: Enable account vanishing in CentralAuth. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1053373 (https://phabricator.wikimedia.org/T369141) (owner: 10Dbrant)
[14:15:34] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy1002 using scap backport" [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054572 (https://phabricator.wikimedia.org/T367329) (owner: 10Urbanecm)
[14:15:34] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy1002 using scap backport" [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054573 (https://phabricator.wikimedia.org/T370059) (owner: 10Urbanecm)
[14:15:34] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy1002 using scap backport" [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054574 (https://phabricator.wikimedia.org/T368177) (owner: 10Urbanecm)
[14:18:07] <wikibugs>	 (03PS2) 10Filippo Giunchedi: o11y: disable promql/series for BenthosKafkaConsumerLag [alerts] - 10https://gerrit.wikimedia.org/r/1054555 (https://phabricator.wikimedia.org/T354255)
[14:18:11] <wikibugs>	 (03PS1) 10Bking: relforge: remove non-functional TLS termination changes [puppet] - 10https://gerrit.wikimedia.org/r/1054578 (https://phabricator.wikimedia.org/T368950)
[14:20:30] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66631 and previous config saved to /var/cache/conftool/dbconfig/20240716-142029-arnaudb.json
[14:20:34] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[14:21:25] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Request additional mgmt IP range for frack servers - https://phabricator.wikimedia.org/T370164 (10Jhancock.wm) 03NEW
[14:22:02] <wikibugs>	 (03PS3) 10Effie Mouzeli: kubernetes: update mcrouter images to bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1054507 (https://phabricator.wikimedia.org/T368366)
[14:22:34] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] kubernetes: update mcrouter images to bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1054507 (https://phabricator.wikimedia.org/T368366) (owner: 10Effie Mouzeli)
[14:22:43] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1212.eqiad.wmnet with reason: Maintenance
[14:22:44] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] o11y: disable promql/series for BenthosKafkaConsumerLag [alerts] - 10https://gerrit.wikimedia.org/r/1054555 (https://phabricator.wikimedia.org/T354255) (owner: 10Filippo Giunchedi)
[14:22:56] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1212.eqiad.wmnet with reason: Maintenance
[14:22:57] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[14:23:14] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[14:23:21] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66632 and previous config saved to /var/cache/conftool/dbconfig/20240716-142321-arnaudb.json
[14:24:09] <wikibugs>	 (03CR) 10Ssingh: "Thanks for the review!" [debs/python-anycast-healthchecker] - 10https://gerrit.wikimedia.org/r/1054370 (https://phabricator.wikimedia.org/T370068) (owner: 10Ssingh)
[14:24:13] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] Release 0.9.8-1+wmf12u1 [debs/python-anycast-healthchecker] - 10https://gerrit.wikimedia.org/r/1054370 (https://phabricator.wikimedia.org/T370068) (owner: 10Ssingh)
[14:24:17] <wikibugs>	 (03PS8) 10CDobbins: purged: set use_pki to true for all sites [puppet] - 10https://gerrit.wikimedia.org/r/1050417 (https://phabricator.wikimedia.org/T360506)
[14:24:58] <wikibugs>	 (03CR) 10Ottomata: eventbus: enable instrumentation on group 0 (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054357 (https://phabricator.wikimedia.org/T363587) (owner: 10Gmodena)
[14:25:00] <wikibugs>	 (03Merged) 10jenkins-bot: Introduce Vanish Request Flow [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054572 (https://phabricator.wikimedia.org/T367329) (owner: 10Urbanecm)
[14:25:10] <wikibugs>	 (03Merged) 10jenkins-bot: Pass wiki id to actor store for cross-db hasPublicLogs query [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054573 (https://phabricator.wikimedia.org/T370059) (owner: 10Urbanecm)
[14:25:11] <wikibugs>	 (03Merged) 10jenkins-bot: Properly set automatic vanish performer on GlobalRenameUser [extensions/CentralAuth] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054574 (https://phabricator.wikimedia.org/T368177) (owner: 10Urbanecm)
[14:25:47] <logmsgbot>	 !log urbanecm@deploy1002 Started scap sync-world: Backport for [[gerrit:1054572|Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489)]], [[gerrit:1054573|Pass wiki id to actor store for cross-db hasPublicLogs query (T370059)]], [[gerrit:1054574|Properly set automatic vanish performer on GlobalRenameUser (T368177)]], [[gerrit:1053373|Enable account vanishing in CentralAuth. (T369141)]]
[14:26:29] <wikibugs>	 (03CR) 10Ottomata: [C:03+1] eventbus: enable instrumentation on group 0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054357 (https://phabricator.wikimedia.org/T363587) (owner: 10Gmodena)
[14:26:52] <stashbot>	 T367329: Create Special:AccountVanishRequest page - https://phabricator.wikimedia.org/T367329
[14:26:52] <stashbot>	 T367726: Initiate Global Rename queue from `Special:AccountVanishRequestPage` - https://phabricator.wikimedia.org/T367726
[14:26:53] <stashbot>	 T367728: Customise "status" page for Vanishing Account - https://phabricator.wikimedia.org/T367728
[14:26:53] <stashbot>	 T367729: Customise Vanishing account Approval/Decline email - https://phabricator.wikimedia.org/T367729
[14:26:54] <stashbot>	 T367744: [EPIC] Phase 3 - Enable Global Rename Queue with Account Vanishing - https://phabricator.wikimedia.org/T367744
[14:26:54] <stashbot>	 T368177: Automatically accept vanishing requests if the user has no activity - https://phabricator.wikimedia.org/T368177
[14:26:54] <stashbot>	 T368285: Update Special:GlobalRenameQueue request view to work for vanish requests - https://phabricator.wikimedia.org/T368285
[14:26:55] <stashbot>	 T368368: Create Zendesk ticket when vanishing is declined - https://phabricator.wikimedia.org/T368368
[14:26:55] <stashbot>	 T368372: Define list for "appeal for a block" - https://phabricator.wikimedia.org/T368372
[14:26:55] <stashbot>	 T368611: Update Copy in the "alert" popup - https://phabricator.wikimedia.org/T368611
[14:26:56] <stashbot>	 T369489: Enhance the auto-vanish maintenance script - https://phabricator.wikimedia.org/T369489
[14:26:56] <stashbot>	 T370059: Auto-vanishing failing with error InvalidArgumentException: DB connection domain 'loginwiki' does not match 'metawiki' - https://phabricator.wikimedia.org/T370059
[14:26:57] <stashbot>	 T369141: Setup live configuration for account vanishing - https://phabricator.wikimedia.org/T369141
[14:27:12] <wikibugs>	 (03PS1) 10Effie Mouzeli: mw-mcrouter: use puppet defined image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054580
[14:28:23] <wikibugs>	 (03CR) 10Ssingh: "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/1050417 (https://phabricator.wikimedia.org/T360506) (owner: 10CDobbins)
[14:29:48] <wikibugs>	 (03CR) 10CDobbins: [C:03+2] purged: set use_pki to true for all sites (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1050417 (https://phabricator.wikimedia.org/T360506) (owner: 10CDobbins)
[14:29:53] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66633 and previous config saved to /var/cache/conftool/dbconfig/20240716-142953-arnaudb.json
[14:29:59] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[14:31:59] <wikibugs>	 (03PS3) 10Gmodena: eventbus: enable instrumentation on group 0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054357 (https://phabricator.wikimedia.org/T363587)
[14:32:43] <jinxer-wm>	 FIRING: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[14:33:07] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'T365997 - depool db1194-s7,db1200-s5,db1201-s6', diff saved to https://phabricator.wikimedia.org/P66634 and previous config saved to /var/cache/conftool/dbconfig/20240716-143306-arnaudb.json
[14:33:08] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db[1194,1200-1201].eqiad.wmnet,dbstore1009.eqiad.wmnet with reason: T365997
[14:33:22] <stashbot>	 T365997: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad	 - https://phabricator.wikimedia.org/T365997
[14:33:24] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1194,1200-1201].eqiad.wmnet,dbstore1009.eqiad.wmnet with reason: T365997
[14:34:06] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s4 on clouddb1019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.22 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[14:34:07] <claime>	 !log Cordoning kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet - T365997
[14:34:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:54] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1054578 (https://phabricator.wikimedia.org/T368950) (owner: 10Bking)
[14:36:51] <logmsgbot>	 !log cgoubert@cumin1002 conftool action : set/pooled=inactive; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
[14:37:23] <arnaudb>	 I've silenced the clouddb1019 alert dhinus marostegui ( 53432765-2729-4b06-9198-a04d03c9966c ) → this "false positive" should be fixed when we finish T369715
[14:37:23] <stashbot>	 T369715: Gather all mariadb host under the same prometheus label - https://phabricator.wikimedia.org/T369715
[14:37:43] <jinxer-wm>	 RESOLVED: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[14:37:58] <marostegui>	 arnaudb: what do you mean false positive?
[14:38:36] <arnaudb>	 you forgot the quotes! :D it's a threshold that should be adjusted, as it's too generic for that specific host (given the criticality of the alert etc.)
[14:39:19] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job netbox_django in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:39:27] <marostegui>	 arnaudb: But that alert comes from icinga I believe
[14:39:47] <marostegui>	 You mean you'll adjust the prometheus future one?
[14:40:09] <arnaudb>	 yep this will be fixed during the migration indeed, that was my original meaning :)
[14:40:11] <wikibugs>	 (03PS2) 10Filippo Giunchedi: data-engineering: disable promql/rate lint for MediawikiPageContentChangeEnrichAvailability [alerts] - 10https://gerrit.wikimedia.org/r/1054540 (https://phabricator.wikimedia.org/T354255)
[14:40:11] <wikibugs>	 (03PS2) 10Filippo Giunchedi: data-platform: fix datahub availability [alerts] - 10https://gerrit.wikimedia.org/r/1054551 (https://phabricator.wikimedia.org/T354255)
[14:40:55] <wikibugs>	 (03CR) 10Gmodena: eventbus: enable instrumentation on group 0 (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054357 (https://phabricator.wikimedia.org/T363587) (owner: 10Gmodena)
[14:41:01] <wikibugs>	 (03PS4) 10Gmodena: eventbus: enable instrumentation on group 0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054357 (https://phabricator.wikimedia.org/T363587)
[14:41:38] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, July 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054025 (https://phabricator.wikimedia.org/T369979) (owner: 10Seawolf35gerrit)
[14:42:08] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s4 on clouddb1019 is OK: OK slave_sql_lag Replication lag: 24.37 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[14:43:48] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:04-1] "doesn't pass PCC https://puppet-compiler.wmflabs.org/output/1053698/3246/" [puppet] - 10https://gerrit.wikimedia.org/r/1053698 (https://phabricator.wikimedia.org/T369720) (owner: 10Arnaudb)
[14:44:55] <sukhe>	 !log reprepro -C main include bookworm-wikimedia anycast-healthchecker_0.9.8-1+wmf12u1_amd64.changes: T370068
[14:44:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:00] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66635 and previous config saved to /var/cache/conftool/dbconfig/20240716-144500-arnaudb.json
[14:45:02] <stashbot>	 T370068: Upgrade anycast-healthchecker to 0.9.8 (from 0.9.1-1+wmf12u1) - https://phabricator.wikimedia.org/T370068
[14:46:16] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f2-eqiad
[14:46:31] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f2-eqiad
[14:46:44] <wikibugs>	 (03PS4) 10CDanis: otelcol: Stop hardcoding k8s master IP addresses [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855)
[14:46:45] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9985956 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=36afd2cf-508d-4c02-a8cc-afb66ea29242) set by cmooney@...
[14:46:49] <wikibugs>	 (03CR) 10CDanis: otelcol: Stop hardcoding k8s master IP addresses (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[14:46:53] <wikibugs>	 (03PS1) 10DCausse: cirrus-streaming-updater: bump image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054582 (https://phabricator.wikimedia.org/T368010)
[14:47:48] <wikibugs>	 (03CR) 10Bking: [C:03+2] "self-merging, as this only affects a test environment." [puppet] - 10https://gerrit.wikimedia.org/r/1054578 (https://phabricator.wikimedia.org/T368950) (owner: 10Bking)
[14:47:55] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10fundraising-tech-ops: Q1:rack/setup/install franio200[1-3] - https://phabricator.wikimedia.org/T367819#9985959 (10Jhancock.wm) a:05Jhancock.wm→03Papaul got the servers set up with temp idrac IPs. all yours.
[14:49:36] <sukhe>	 !log [durum1001] upgrade anycast-healthchecker to 0.9.8-1+wmf12u1: T370068
[14:49:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:02] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] changeprop: Disable pregeneration for mobile-sections [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054512 (https://phabricator.wikimedia.org/T328036) (owner: 10Jgiannelos)
[14:50:30] <wikibugs>	 (03PS2) 10DCausse: cirrus-streaming-updater: bump image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054582 (https://phabricator.wikimedia.org/T368010)
[14:50:52] <wikibugs>	 (03CR) 10Jgiannelos: "I double checked turnilo for traffic. Last reference from MWOffliner related traffic to mobile-sections is on 1st of July and before that " [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054512 (https://phabricator.wikimedia.org/T328036) (owner: 10Jgiannelos)
[14:51:59] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66636 and previous config saved to /var/cache/conftool/dbconfig/20240716-145159-root.json
[14:53:17] <logmsgbot>	 !log filippo@cumin1002 START - Cookbook sre.hosts.downtime for 3:00:00 on centrallog2002.codfw.wmnet with reason: network upgrade
[14:53:31] <logmsgbot>	 !log filippo@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on centrallog2002.codfw.wmnet with reason: network upgrade
[14:53:35] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: 10gbit nic option for centrallog2002 - https://phabricator.wikimedia.org/T369826#9985983 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=0bfb0df8-b693-4fb7-8581-00886bab46c6) set by filippo@cumin1002 for 3:00:00 on 1 host(s) and their services with reason: ne...
[14:53:37] <logmsgbot>	 !log urbanecm@deploy1002 dbrant, urbanecm: Backport for [[gerrit:1054572|Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489)]], [[gerrit:1054573|Pass wiki id to actor store for cross-db hasPublicLogs query (T370059)]], [[gerrit:1054574|Properly set automatic vanish performer on GlobalRenameUser (T368177)]], [[gerrit:1053373|Enable account vanishing in CentralAuth. (T369141)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:53:41] <logmsgbot>	 !log urbanecm@deploy1002 dbrant, urbanecm: Continuing with sync
[14:53:55] <stashbot>	 T367329: Create Special:AccountVanishRequest page - https://phabricator.wikimedia.org/T367329
[14:53:55] <stashbot>	 T367726: Initiate Global Rename queue from `Special:AccountVanishRequestPage` - https://phabricator.wikimedia.org/T367726
[14:53:55] <stashbot>	 T367728: Customise "status" page for Vanishing Account - https://phabricator.wikimedia.org/T367728
[14:53:56] <stashbot>	 T367729: Customise Vanishing account Approval/Decline email - https://phabricator.wikimedia.org/T367729
[14:53:56] <stashbot>	 T367744: [EPIC] Phase 3 - Enable Global Rename Queue with Account Vanishing - https://phabricator.wikimedia.org/T367744
[14:53:57] <stashbot>	 T368177: Automatically accept vanishing requests if the user has no activity - https://phabricator.wikimedia.org/T368177
[14:53:57] <stashbot>	 T368285: Update Special:GlobalRenameQueue request view to work for vanish requests - https://phabricator.wikimedia.org/T368285
[14:53:58] <stashbot>	 T368368: Create Zendesk ticket when vanishing is declined - https://phabricator.wikimedia.org/T368368
[14:53:58] <stashbot>	 T368372: Define list for "appeal for a block" - https://phabricator.wikimedia.org/T368372
[14:53:58] <stashbot>	 T368611: Update Copy in the "alert" popup - https://phabricator.wikimedia.org/T368611
[14:53:59] <stashbot>	 T369489: Enhance the auto-vanish maintenance script - https://phabricator.wikimedia.org/T369489
[14:53:59] <stashbot>	 T370059: Auto-vanishing failing with error InvalidArgumentException: DB connection domain 'loginwiki' does not match 'metawiki' - https://phabricator.wikimedia.org/T370059
[14:53:59] <stashbot>	 T369141: Setup live configuration for account vanishing - https://phabricator.wikimedia.org/T369141
[14:55:48] <icinga-wm>	 PROBLEM - BFD status on cr2-codfw is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[14:55:48] <icinga-wm>	 PROBLEM - BFD status on cr1-codfw is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[14:56:28] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64605/IPv4: Connect - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:56:36] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64605/IPv4: Connect - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:56:38] <sukhe>	 hmm?
[14:57:07] <sukhe>	 looking
[14:57:30] <godog>	 oh that might be me with centrallog2002 sukhe 
[14:57:44] <godog>	 in which case, expected as part of https://phabricator.wikimedia.org/T369826
[14:57:48] <icinga-wm>	 PROBLEM - Bird Internet Routing Daemon on durum1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[14:57:52] <icinga-wm>	 PROBLEM - Check if anycast-healthchecker and all configured threads are running on durum1001 is CRITICAL: CRITICAL: anycast-healthchecker could be down as pid file /var/run/anycast-healthchecker/anycast-healthchecker.pid doesnt exist https://wikitech.wikimedia.org/wiki/Anycast%23Anycast_healthchecker_not_running
[14:58:01] <godog>	 or maybe not!
[14:58:06] <wikibugs>	 (03PS1) 10DCausse: rdf-streaming-updater: configure the split graph updater [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054584 (https://phabricator.wikimedia.org/T361935)
[14:58:15] <sukhe>	 yeah, maybe not!
[14:58:26] <icinga-wm>	 PROBLEM - BFD status on cr1-eqiad is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[14:58:26] <icinga-wm>	 PROBLEM - BFD status on cr2-eqiad is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[14:58:43] <sukhe>	 eqiad is probably expected because of durum1001
[14:58:44] <sukhe>	 codfw, no
[14:58:46] <sukhe>	 so looking
[14:59:19] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job netbox_django in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:00:05] <jouncebot>	 eoghan, jelto, arnoldokoth, and mutante: #bothumor My software never has bugs. It just develops random features. Rise for SRE Collaboration Services office hours. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1500).
[15:00:09] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66637 and previous config saved to /var/cache/conftool/dbconfig/20240716-150007-arnaudb.json
[15:00:36] <jinxer-wm>	 FIRING: [4x] JobUnavailable: Reduced availability for job netbox_django in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:00:49] * urbanecm still deploying MW
[15:01:40] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:1054572|Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489)]], [[gerrit:1054573|Pass wiki id to actor store for cross-db hasPublicLogs query (T370059)]], [[gerrit:1054574|Properly set automatic vanish performer on GlobalRenameUser (T368177)]], [[gerrit:1053373|Enable account vanishing in CentralAuth. (T369141)]] (duration: 35m 52s)
[15:01:46] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
[15:01:58] <stashbot>	 T367329: Create Special:AccountVanishRequest page - https://phabricator.wikimedia.org/T367329
[15:01:58] <stashbot>	 T367726: Initiate Global Rename queue from `Special:AccountVanishRequestPage` - https://phabricator.wikimedia.org/T367726
[15:01:59] <stashbot>	 T367728: Customise "status" page for Vanishing Account - https://phabricator.wikimedia.org/T367728
[15:01:59] <stashbot>	 T367729: Customise Vanishing account Approval/Decline email - https://phabricator.wikimedia.org/T367729
[15:01:59] <stashbot>	 T367744: [EPIC] Phase 3 - Enable Global Rename Queue with Account Vanishing - https://phabricator.wikimedia.org/T367744
[15:02:00] <stashbot>	 T368177: Automatically accept vanishing requests if the user has no activity - https://phabricator.wikimedia.org/T368177
[15:02:00] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
[15:02:00] <stashbot>	 T368285: Update Special:GlobalRenameQueue request view to work for vanish requests - https://phabricator.wikimedia.org/T368285
[15:02:01] <stashbot>	 T368368: Create Zendesk ticket when vanishing is declined - https://phabricator.wikimedia.org/T368368
[15:02:01] <stashbot>	 T368372: Define list for "appeal for a block" - https://phabricator.wikimedia.org/T368372
[15:02:01] <stashbot>	 T368611: Update Copy in the "alert" popup - https://phabricator.wikimedia.org/T368611
[15:02:02] <stashbot>	 T369489: Enhance the auto-vanish maintenance script - https://phabricator.wikimedia.org/T369489
[15:02:02] <stashbot>	 T370059: Auto-vanishing failing with error InvalidArgumentException: DB connection domain 'loginwiki' does not match 'metawiki' - https://phabricator.wikimedia.org/T370059
[15:02:03] <stashbot>	 T369141: Setup live configuration for account vanishing - https://phabricator.wikimedia.org/T369141
[15:02:13] <hashar>	 poor bot :D
[15:02:19] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
[15:02:27] <wikibugs>	 (03CR) 10JMeybohm: otelcol: Stop hardcoding k8s master IP addresses (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[15:02:33] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
[15:03:35] <Lucas_WMDE>	 hashar: lol
[15:03:43] <jinxer-wm>	 FIRING: OtelCollectorRefusedSpans: Some spans have been refused by receiver otlp on k8s - TODO - https://grafana.wikimedia.org/d/SPebYW7Iz/opentelemetry-collector - https://alerts.wikimedia.org/?q=alertname%3DOtelCollectorRefusedSpans
[15:04:12] <logmsgbot>	 !log brennen@deploy1002 Started deploy [phabricator/deployment@7335128]: test deploy phab2002 for T370109
[15:04:15] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
[15:04:16] <stashbot>	 T370109: Deploy Phabricator/Phorge 2024-07-16 - https://phabricator.wikimedia.org/T370109
[15:04:29] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
[15:04:46] <logmsgbot>	 !log brennen@deploy1002 Finished deploy [phabricator/deployment@7335128]: test deploy phab2002 for T370109 (duration: 00m 34s)
[15:05:17] <logmsgbot>	 !log brennen@deploy1002 Started deploy [phabricator/deployment@7335128]: deploy phab1004 for T370109
[15:05:50] <godog>	 !log silence OtelCollectorRefusedSpans in codfw for 7d
[15:05:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:58] <godog>	 !log silence OtelCollectorRefusedSpans in codfw for 7d - T370043
[15:06:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:09] <logmsgbot>	 !log brennen@deploy1002 Finished deploy [phabricator/deployment@7335128]: deploy phab1004 for T370109 (duration: 00m 52s)
[15:06:24] <sukhe>	 godog: 
[15:06:26] <sukhe>	 sukhe@re0.cr2-codfw> show bgp summary | match 10.192.16.35 
[15:06:26] <sukhe>	 10.192.16.35          64605          0          0       0      25       12:29 Connect
[15:06:48] <sukhe>	 so eqiad was durum1001 (me) and you were right about centrallog2002, just as an FYI for awareness
[15:06:53] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f2-eqiad,lsw1-f2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f2-eqiad
[15:07:03] <godog>	 sukhe: hah! thank you, makes sense
[15:07:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66638 and previous config saved to /var/cache/conftool/dbconfig/20240716-150704-root.json
[15:07:09] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f2-eqiad,lsw1-f2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f2-eqiad
[15:07:16] <wikibugs>	 (03CR) 10Effie Mouzeli: "diff looks ok" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[15:07:22] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9986058 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=81c0aaa1-44d2-4d05-942a-66bcdfb90d2d) set by cmooney@...
[15:07:31] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+1] otelcol: Stop hardcoding k8s master IP addresses [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[15:07:51] <wikibugs>	 (03PS5) 10CDanis: otelcol: Stop hardcoding k8s master IP addresses [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855)
[15:07:58] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 21 hosts with reason: JunOS upgrade lsw1-f2-eqiad
[15:08:17] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 21 hosts with reason: JunOS upgrade lsw1-f2-eqiad
[15:08:26] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9986071 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=58bc700a-b84d-4058-9776-9f6510239089) set by cmooney@...
[15:08:32] <topranks>	 !log Rebooting lsw1-f2-eqiad to complete JunOS upgrade T365997
[15:08:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:35] <stashbot>	 T365997: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad	 - https://phabricator.wikimedia.org/T365997
[15:09:52] <icinga-wm>	 RECOVERY - Check if anycast-healthchecker and all configured threads are running on durum1001 is OK: OK: UP (pid=208272) and all threads (8) are running https://wikitech.wikimedia.org/wiki/Anycast%23Anycast_healthchecker_not_running
[15:09:58] <wikibugs>	 (03CR) 10DCausse: [C:04-1] "image & kafka topics not yet ready" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054584 (https://phabricator.wikimedia.org/T361935) (owner: 10DCausse)
[15:10:26] <icinga-wm>	 RECOVERY - BFD status on cr1-eqiad is OK: UP: 21 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[15:10:28] <icinga-wm>	 RECOVERY - BFD status on cr2-eqiad is OK: UP: 25 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[15:10:48] <icinga-wm>	 RECOVERY - Bird Internet Routing Daemon on durum1001 is OK: PROCS OK: 1 process with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[15:15:16] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66640 and previous config saved to /var/cache/conftool/dbconfig/20240716-151516-arnaudb.json
[15:15:18] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
[15:15:20] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[15:15:32] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
[15:15:36] <jinxer-wm>	 FIRING: [4x] ProbeDown: Service aqs1021-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:19:34] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[15:19:47] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[15:20:26] <wikibugs>	 (03CR) 10Jelto: [C:03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/1054398 (owner: 10Dzahn)
[15:21:21] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] gerrit: switch firewall provider to nftables at role level [puppet] - 10https://gerrit.wikimedia.org/r/1054398 (owner: 10Dzahn)
[15:22:10] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66641 and previous config saved to /var/cache/conftool/dbconfig/20240716-152209-root.json
[15:23:03] <jinxer-wm>	 FIRING: [2x] KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster jumbo-eqiad in eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration  - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[15:23:29] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
[15:23:42] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
[15:23:50] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66642 and previous config saved to /var/cache/conftool/dbconfig/20240716-152349-arnaudb.json
[15:23:53] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[15:25:36] <jinxer-wm>	 FIRING: [3x] JobUnavailable: Reduced availability for job netbox_django in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:25:40] <icinga-wm>	 PROBLEM - Check unit status of statograph_post on alert1001 is CRITICAL: CRITICAL: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[15:25:50] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 495, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:25:50] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 577, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:25:50] <icinga-wm>	 RECOVERY - BFD status on cr2-codfw is OK: UP: 20 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[15:25:50] <icinga-wm>	 RECOVERY - BFD status on cr1-codfw is OK: UP: 22 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[15:26:31] <wikibugs>	 (03CR) 10Jelto: [V:03+1 C:03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3250/console" [puppet] - 10https://gerrit.wikimedia.org/r/1054398 (owner: 10Dzahn)
[15:26:32] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9986154 (10cmooney) Upgrade completed, all hosts back online and pinging ok.  Thanks all for the assistance!
[15:26:46] <dcausse>	 jouncebot: nowandnext
[15:26:47] <jouncebot>	 For the next 0 hour(s) and 33 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1500)
[15:26:47] <jouncebot>	 In 0 hour(s) and 33 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1600)
[15:26:51] <godog>	 sukhe: centrallog2002 is back btw
[15:26:55] <wikibugs>	 (03CR) 10Jelto: [V:03+1 C:03+1] "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/1054398 (owner: 10Dzahn)
[15:26:56] <godog>	 hence the recovery
[15:27:03] <sukhe>	 godog: ok!
[15:27:04] <sukhe>	 thanks!
[15:27:12] <claime>	 !log Uncordoning kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet - T365997
[15:27:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:15] <stashbot>	 T365997: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad	 - https://phabricator.wikimedia.org/T365997
[15:27:21] <logmsgbot>	 !log cgoubert@cumin1002 conftool action : set/pooled=yes; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
[15:28:03] <jinxer-wm>	 RESOLVED: [2x] KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster jumbo-eqiad in eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration  - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[15:28:56] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1194 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66643 and previous config saved to /var/cache/conftool/dbconfig/20240716-152855-arnaudb.json
[15:29:10] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66644 and previous config saved to /var/cache/conftool/dbconfig/20240716-152910-arnaudb.json
[15:29:19] <jinxer-wm>	 RESOLVED: [4x] ProbeDown: Service aqs1021-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:29:19] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66645 and previous config saved to /var/cache/conftool/dbconfig/20240716-152918-arnaudb.json
[15:29:21] <wikibugs>	 (03PS1) 10Brennen Bearnes: logspam.pl: s/interests/interest/ [puppet] - 10https://gerrit.wikimedia.org/r/1054589
[15:29:38] <wikibugs>	 (03CR) 10DCausse: [C:03+2] cirrus-streaming-updater: bump image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054582 (https://phabricator.wikimedia.org/T368010) (owner: 10DCausse)
[15:30:05] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9986188 (10ABran-WMF) dbstore1009 has replication up to date on all 3 instances  all 3 other nodes are repooling ↑
[15:30:33] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus-streaming-updater: bump image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054582 (https://phabricator.wikimedia.org/T368010) (owner: 10DCausse)
[15:30:53] <wikibugs>	 (03Abandoned) 10Brennen Bearnes: logspam.pl: s/interests/interest/ [puppet] - 10https://gerrit.wikimedia.org/r/1054589 (owner: 10Brennen Bearnes)
[15:31:47] <wikibugs>	 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DBA, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9986200 (10MatthewVernon) Swift looks good, thanks.
[15:31:55] <wikibugs>	 (03PS6) 10CDanis: otelcol: Stop hardcoding k8s master IP addresses [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855)
[15:31:55] <wikibugs>	 (03PS1) 10CDanis: Fix opentelemetry-collector chart CI [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054594 (https://phabricator.wikimedia.org/T365855)
[15:32:27] <logmsgbot>	 !log dcausse@deploy1002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:32:32] <logmsgbot>	 !log dcausse@deploy1002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:33:19] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on alert1001 is CRITICAL: 1.004e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[15:33:39] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: 10gbit nic option for centrallog2002 - https://phabricator.wikimedia.org/T369826#9986225 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi This is done! We went with the procedure I suggested above, namely I took the host side configuration by logging back in via console...
[15:34:51] <wikibugs>	 (03CR) 10Ottomata: [C:03+1] eventbus: enable instrumentation on group 0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054357 (https://phabricator.wikimedia.org/T363587) (owner: 10Gmodena)
[15:35:03] <logmsgbot>	 !log dcausse@deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[15:36:20] <logmsgbot>	 !log dcausse@deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:37:13] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[15:37:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66646 and previous config saved to /var/cache/conftool/dbconfig/20240716-153715-root.json
[15:37:35] <papaul>	 !log reboot fpc0 on fasw-c-codfw.mgmt.codfw.wmnet
[15:37:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:38:39] <icinga-wm>	 PROBLEM - Druid coordinator on druid1011 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.druid.cli.Main server coordinator https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid
[15:39:08] <logmsgbot>	 !log dcausse@deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[15:39:22] <logmsgbot>	 !log dcausse@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:39:48] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] parsoid testing: Switch api_proxy_uri [puppet] - 10https://gerrit.wikimedia.org/r/1053651 (https://phabricator.wikimedia.org/T367949) (owner: 10Clément Goubert)
[15:41:10] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] otelcol: Stop hardcoding k8s master IP addresses [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[15:41:20] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] Fix opentelemetry-collector chart CI [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054594 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[15:41:33] <wikibugs>	 (03CR) 10CDanis: [C:03+2] Fix opentelemetry-collector chart CI [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054594 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[15:41:38] <wikibugs>	 (03CR) 10CDanis: [C:03+2] otelcol: Stop hardcoding k8s master IP addresses (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[15:41:41] <icinga-wm>	 PROBLEM - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: host 208.80.153.197, interfaces up: 50, down: 1, dormant: 0, excluded: 3, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:41:47] <icinga-wm>	 PROBLEM - Juniper virtual chassis ports on fasw-c-codfw is CRITICAL: CRIT: Down: 2 Unknown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23VCP_status
[15:43:43] <icinga-wm>	 RECOVERY - Router interfaces on pfw3-codfw is OK: OK: host 208.80.153.197, interfaces up: 58, down: 0, dormant: 0, excluded: 3, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:43:47] <icinga-wm>	 RECOVERY - Juniper virtual chassis ports on fasw-c-codfw is OK: OK: UP: 4 https://wikitech.wikimedia.org/wiki/Network_monitoring%23VCP_status
[15:44:01] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1194 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66647 and previous config saved to /var/cache/conftool/dbconfig/20240716-154401-arnaudb.json
[15:44:09] <stashbot>	 T365997: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad	 - https://phabricator.wikimedia.org/T365997
[15:44:16] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66648 and previous config saved to /var/cache/conftool/dbconfig/20240716-154415-arnaudb.json
[15:44:25] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66649 and previous config saved to /var/cache/conftool/dbconfig/20240716-154424-arnaudb.json
[15:44:38] <wikibugs>	 (03Merged) 10jenkins-bot: Fix opentelemetry-collector chart CI [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054594 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[15:44:58] <wikibugs>	 (03Merged) 10jenkins-bot: otelcol: Stop hardcoding k8s master IP addresses [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054394 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[15:45:38] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66650 and previous config saved to /var/cache/conftool/dbconfig/20240716-154537-arnaudb.json
[15:45:42] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[15:48:46] <jinxer-wm>	 FIRING: Emergency syslog message: Alert for device fasw-c-codfw.mgmt.codfw.wmnet - Emergency syslog message   - https://alerts.wikimedia.org/?q=alertname%3DEmergency+syslog+message
[15:49:10] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10fundraising-tech-ops, 13Patch-For-Review: Q1:rack/setup/install frand200[12] - https://phabricator.wikimedia.org/T367804#9986373 (10Papaul) ` papaul@fasw-c-codfw# run show interfaces ge-[0-1]/0/17 descriptions Interface       Admin Link Description ge-0/0/17       up    up...
[15:49:23] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): systemd::timer::job: Use TimeoutStartSec= [puppet] - 10https://gerrit.wikimedia.org/r/1054603 (https://phabricator.wikimedia.org/T370171)
[15:51:27] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): "CCing Bryan who added this in I7312a6130b. I opted not to rename the `max_runtime_seconds` parameter, as it already didn’t 100% match the " [puppet] - 10https://gerrit.wikimedia.org/r/1054603 (https://phabricator.wikimedia.org/T370171) (owner: 10Lucas Werkmeister (WMDE))
[15:52:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66651 and previous config saved to /var/cache/conftool/dbconfig/20240716-155221-root.json
[15:53:41] <wikibugs>	 (03Abandoned) 10JMeybohm: Add kyverno_policy_parser [deployment-charts] - 10https://gerrit.wikimedia.org/r/1052964 (https://phabricator.wikimedia.org/T368251) (owner: 10JMeybohm)
[15:53:46] <jinxer-wm>	 RESOLVED: Emergency syslog message: Device fasw-c-codfw.mgmt.codfw.wmnet recovered from Emergency syslog message   - https://alerts.wikimedia.org/?q=alertname%3DEmergency+syslog+message
[15:58:19] <elukey>	 !log uploaded spicerack_8.7.0 to apt.wikimedia.org bullseye-wikimedia
[15:58:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:39] <icinga-wm>	 RECOVERY - Druid coordinator on druid1011 is OK: PROCS OK: 1 process with command name java, args org.apache.druid.cli.Main server coordinator https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid
[15:59:06] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1194 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66652 and previous config saved to /var/cache/conftool/dbconfig/20240716-155905-arnaudb.json
[15:59:10] <stashbot>	 T365997: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad	 - https://phabricator.wikimedia.org/T365997
[15:59:19] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): "I suppose this is a somewhat risky change… several services (`git grep max_runtime_seconds`) which previously declared a max runtime but d" [puppet] - 10https://gerrit.wikimedia.org/r/1054603 (https://phabricator.wikimedia.org/T370171) (owner: 10Lucas Werkmeister (WMDE))
[15:59:21] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66653 and previous config saved to /var/cache/conftool/dbconfig/20240716-155920-arnaudb.json
[15:59:30] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66654 and previous config saved to /var/cache/conftool/dbconfig/20240716-155930-arnaudb.json
[16:00:05] <jouncebot>	 jhathaway and rzl: It is that lovely time of the day again! You are hereby commanded to deploy Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1600).
[16:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[16:00:45] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P66655 and previous config saved to /var/cache/conftool/dbconfig/20240716-160044-arnaudb.json
[16:02:55] <wikibugs>	 (03PS1) 10Aklapper: Phabricator: Update recipients of quarterly metrics mail [puppet] - 10https://gerrit.wikimedia.org/r/1054605 (https://phabricator.wikimedia.org/T370167)
[16:04:56] <wikibugs>	 (03PS1) 10Clément Goubert: parsoid::testing: remove unused file [puppet] - 10https://gerrit.wikimedia.org/r/1054607
[16:05:22] <wikibugs>	 (03CR) 10Clément Goubert: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1054607 (owner: 10Clément Goubert)
[16:05:35] <wikibugs>	 (03PS1) 10Mforns: commons-impact-analytics: bump image to v1.0.5 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054609 (https://phabricator.wikimedia.org/T369745)
[16:05:40] <wikibugs>	 (03PS1) 10Dzahn: lists: ensure list member sync only happens on the active server [puppet] - 10https://gerrit.wikimedia.org/r/1054610 (https://phabricator.wikimedia.org/T351202)
[16:09:01] <wikibugs>	 (03CR) 10Hashar: [C:03+1] "That is a great idea yes!" [puppet] - 10https://gerrit.wikimedia.org/r/1006979 (https://phabricator.wikimedia.org/T323073) (owner: 10Dzahn)
[16:11:11] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[16:11:52] <wikibugs>	 (03CR) 10Scott French: [C:03+1] parsoid::testing: remove unused file [puppet] - 10https://gerrit.wikimedia.org/r/1054607 (owner: 10Clément Goubert)
[16:13:21] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, July 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[16:13:25] <sukhe>	 +franio2001                               1H IN A 10.195.0.99
[16:13:28] <sukhe>	 +franio2002                               1H IN A 10.195.0.100
[16:13:31] <sukhe>	 +franio2003                               1H IN A 10.195.0.101
[16:13:34] <sukhe>	 pending DNS changes ^
[16:13:39] <wikibugs>	 (03PS5) 10Jdlrobson: [July 16th] Enable dark mode for logged out users (tier 1 and tier 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150)
[16:14:10] <sukhe>	 does anyone know who is working on these?
[16:14:12] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1194 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66656 and previous config saved to /var/cache/conftool/dbconfig/20240716-161411-arnaudb.json
[16:14:16] <stashbot>	 T365997: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad	 - https://phabricator.wikimedia.org/T365997
[16:14:27] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66657 and previous config saved to /var/cache/conftool/dbconfig/20240716-161426-arnaudb.json
[16:14:29] <wikibugs>	 (03CR) 10CI reject: [V:04-1] [July 16th] Enable dark mode for logged out users (tier 1 and tier 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[16:14:36] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66658 and previous config saved to /var/cache/conftool/dbconfig/20240716-161435-arnaudb.json
[16:15:00] <sukhe>	 JennH: sorry, are you working on franio?
[16:15:52] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P66659 and previous config saved to /var/cache/conftool/dbconfig/20240716-161552-arnaudb.json
[16:16:32] <JennH>	 Sukhe: not at the moment. I did set up the mgmt ips for them earlier.
[16:16:49] <sukhe>	 ah OK that might be it then
[16:17:14] <sukhe>	 I am going to merge the changes then if that's OK? because this will block any other DNS changes to be merged
[16:18:18] <wikibugs>	 (03CR) 10BryanDavis: [C:03+1] "I wonder if the "Note that this setting does not have any effect on Type=oneshot services, as they terminate immediately after activation " [puppet] - 10https://gerrit.wikimedia.org/r/1054603 (https://phabricator.wikimedia.org/T370171) (owner: 10Lucas Werkmeister (WMDE))
[16:18:32] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.dns.netbox
[16:19:45] <wikibugs>	 10SRE-tools, 06Infrastructure-Foundations, 10Spicerack, 13Patch-For-Review: Create the python-release repository - https://phabricator.wikimedia.org/T367410#9986547 (10elukey) a:03elukey
[16:20:46] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge DNS franio changes (add mgmt IPs) - sukhe@cumin1002"
[16:21:42] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge DNS franio changes (add mgmt IPs) - sukhe@cumin1002"
[16:21:42] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:21:47] <sukhe>	 JennH: merged. thanks!
[16:23:06] <sukhe>	 JennH: what happened here was that we made changes in Netbox but we didn't run the cookbook (cookbook sre.dns.netbox) and hence the changes were pending
[16:23:34] <JennH>	 Oops, my bad. ty for getting that!
[16:24:15] <sukhe>	 np at all
[16:26:11] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1002 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[16:29:17] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1194 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66660 and previous config saved to /var/cache/conftool/dbconfig/20240716-162916-arnaudb.json
[16:29:24] <stashbot>	 T365997: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad	 - https://phabricator.wikimedia.org/T365997
[16:29:32] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66661 and previous config saved to /var/cache/conftool/dbconfig/20240716-162931-arnaudb.json
[16:29:41] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66662 and previous config saved to /var/cache/conftool/dbconfig/20240716-162940-arnaudb.json
[16:30:59] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66663 and previous config saved to /var/cache/conftool/dbconfig/20240716-163059-arnaudb.json
[16:31:01] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
[16:31:03] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[16:31:04] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
[16:32:04] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 10LDAP-Access-Requests: LDAP access to the analytics-privatedata-users group for Quiddity - https://phabricator.wikimedia.org/T370091#9986638 (10Quiddity) I've read and signed the L3, and read the Responsibilities document. Thanks.
[16:39:40] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host dbproxy2006.codfw.wmnet with OS bookworm
[16:39:49] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9986718 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host dbproxy2006.codfw.wmnet with OS bookworm
[16:41:34] <wikibugs>	 (03PS1) 10Herron: wip [puppet] - 10https://gerrit.wikimedia.org/r/1054617
[16:42:42] <wikibugs>	 (03PS6) 10Jdlrobson: [July 16th] Enable dark mode for logged out users (tier 1 and tier 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150)
[16:42:49] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10fundraising-tech-ops: Q1:rack/setup/install frand200[12] - https://phabricator.wikimedia.org/T367804#9986734 (10Papaul)
[16:43:20] <wikibugs>	 (03CR) 10CI reject: [V:04-1] [July 16th] Enable dark mode for logged out users (tier 1 and tier 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[16:44:22] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1194 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66664 and previous config saved to /var/cache/conftool/dbconfig/20240716-164422-arnaudb.json
[16:44:26] <stashbot>	 T365997: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad	 - https://phabricator.wikimedia.org/T365997
[16:44:38] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66665 and previous config saved to /var/cache/conftool/dbconfig/20240716-164437-arnaudb.json
[16:44:47] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66666 and previous config saved to /var/cache/conftool/dbconfig/20240716-164446-arnaudb.json
[16:46:13] <wikibugs>	 (03PS7) 10Jdlrobson: [July 16th] Enable dark mode for logged out users (tier 1 and tier 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150)
[16:46:51] <wikibugs>	 (03CR) 10CI reject: [V:04-1] [July 16th] Enable dark mode for logged out users (tier 1 and tier 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[16:47:19] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on alert1001 is OK: (C)1e+05 gt (W)1e+04 gt 7771 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw
[16:47:33] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] verp_bounce_post_url: Switch to mw-api-int [puppet] - 10https://gerrit.wikimedia.org/r/1053650 (https://phabricator.wikimedia.org/T367949) (owner: 10Clément Goubert)
[16:48:43] <wikibugs>	 (03PS8) 10Jdlrobson: [July 16th] Enable dark mode for logged out users (tier 1 and tier 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150)
[16:50:08] <wikibugs>	 (03PS1) 10Elukey: sre.network.tls: use a different client certificate to authenticate [cookbooks] - 10https://gerrit.wikimedia.org/r/1054618 (https://phabricator.wikimedia.org/T355750)
[16:50:30] <wikibugs>	 (03CR) 10Dzahn: [V:03+1 C:03+2] "needs follow-up to ensure it does NOT also run on the failover host: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1054610" [puppet] - 10https://gerrit.wikimedia.org/r/1053399 (https://phabricator.wikimedia.org/T351202) (owner: 10Dzahn)
[16:51:16] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
[16:51:29] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
[16:51:36] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66667 and previous config saved to /var/cache/conftool/dbconfig/20240716-165135-arnaudb.json
[16:51:39] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[16:51:51] <wikibugs>	 (03CR) 10Elukey: "Folks I added more people as pebkac prevention scheme. This seems to work from a manual test on cumin1002, but lemme know if I got it wron" [cookbooks] - 10https://gerrit.wikimedia.org/r/1054618 (https://phabricator.wikimedia.org/T355750) (owner: 10Elukey)
[16:53:32] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "https://puppet-compiler.wmflabs.org/output/1054610/3251/lists2001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1054610 (https://phabricator.wikimedia.org/T351202) (owner: 10Dzahn)
[16:53:37] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2006.codfw.wmnet with reason: host reimage
[16:53:52] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "disables timers on lists2001" [puppet] - 10https://gerrit.wikimedia.org/r/1054610 (https://phabricator.wikimedia.org/T351202) (owner: 10Dzahn)
[16:56:31] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2006.codfw.wmnet with reason: host reimage
[16:57:31] <wikibugs>	 (03CR) 10EoghanGaffney: [C:03+1] lists: ensure list member sync only happens on the active server [puppet] - 10https://gerrit.wikimedia.org/r/1054610 (https://phabricator.wikimedia.org/T351202) (owner: 10Dzahn)
[17:00:04] <jouncebot>	 swfrench-wmf: It is that lovely time of the day again! You are hereby commanded to deploy MediaWiki infrastructure (UTC late). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1700).
[17:00:10] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 10LDAP-Access-Requests: LDAP access to the analytics-privatedata-users group for Quiddity - https://phabricator.wikimedia.org/T370091#9986828 (10Ottomata) Approved!
[17:00:35] <mutante>	 !log lists2001 - systemctl reset-failed after gerrit:1054610 to fix T370098
[17:00:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:00:38] <stashbot>	 T370098: SystemdUnitFailed - lists2001 - sync-list-members - https://phabricator.wikimedia.org/T370098
[17:02:53] <wikibugs>	 (03CR) 10Dzahn: "service is already effectively disabled now since yesterday at DNS level - i'm just going to wait a bit before merging these" [puppet] - 10https://gerrit.wikimedia.org/r/1006979 (https://phabricator.wikimedia.org/T323073) (owner: 10Dzahn)
[17:03:04] <swfrench-wmf>	 here - still confirming a couple of remaining items before proceeding
[17:03:24] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] Phabricator: Update recipients of quarterly metrics mail [puppet] - 10https://gerrit.wikimedia.org/r/1054605 (https://phabricator.wikimedia.org/T370167) (owner: 10Aklapper)
[17:06:05] <RoanKattouw>	 marostegui: Yes sorry I had a long-running script on euwiki, I'll see how far it got and decide whether to restart it. I was running these scripts on a per-wiki basis but apparently that isn't enough because some wikis are large enough that it takes multiple days
[17:12:21] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66668 and previous config saved to /var/cache/conftool/dbconfig/20240716-171220-arnaudb.json
[17:12:25] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[17:12:49] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[17:14:35] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
[17:14:41] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2006.codfw.wmnet with OS bookworm
[17:14:49] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9986905 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host dbproxy2006.codfw.wmnet with OS bookworm completed: - dbproxy...
[17:15:48] <wikibugs>	 (03CR) 10Dzahn: [C:04-1] "not decom'ed yet" [puppet] - 10https://gerrit.wikimedia.org/r/1053791 (https://phabricator.wikimedia.org/T363402) (owner: 10Dzahn)
[17:18:01] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] wdqs graph split: route / to miscweb microsite [puppet] - 10https://gerrit.wikimedia.org/r/1053756 (https://phabricator.wikimedia.org/T364367) (owner: 10Ryan Kemper)
[17:19:19] <jinxer-wm>	 FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:27:28] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66669 and previous config saved to /var/cache/conftool/dbconfig/20240716-172727-arnaudb.json
[17:28:50] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10fundraising-tech-ops: Q1:rack/setup/install frand200[12] - https://phabricator.wikimedia.org/T367804#9986950 (10Papaul) a:05Papaul→03Dwisehaupt @Dwisehaupt those are ready for OS install
[17:28:56] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9986955 (10Papaul)
[17:33:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:36:13] <wikibugs>	 (03PS1) 10Ottomata: Update refinery_version for canary_events, test refine, and test refine_sanitize.pp [puppet] - 10https://gerrit.wikimedia.org/r/1054623 (https://phabricator.wikimedia.org/T367949)
[17:37:45] <swfrench-wmf>	 update - I'm going to proceed with a subset of the planned depools while the remaining analytics workload is investigated
[17:38:40] <wikibugs>	 (03CR) 10Ottomata: [C:03+2] Update refinery_version for canary_events, test refine, and test refine_sanitize.pp [puppet] - 10https://gerrit.wikimedia.org/r/1054623 (https://phabricator.wikimedia.org/T367949) (owner: 10Ottomata)
[17:39:40] <wikibugs>	 (03PS2) 10Ottomata: Update refinery_version for canary_events, test refine and refine_sanitize [puppet] - 10https://gerrit.wikimedia.org/r/1054623 (https://phabricator.wikimedia.org/T367949)
[17:39:40] <logmsgbot>	 !log swfrench@cumin2002 conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=codfw [reason: Depooling ahead of turndown - T367949]
[17:39:44] <stashbot>	 T367949: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949
[17:39:47] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Update refinery_version for canary_events, test refine and refine_sanitize [puppet] - 10https://gerrit.wikimedia.org/r/1054623 (https://phabricator.wikimedia.org/T367949) (owner: 10Ottomata)
[17:40:06] <wikibugs>	 (03CR) 10Ottomata: [V:03+2 C:03+2] Update refinery_version for canary_events, test refine and refine_sanitize [puppet] - 10https://gerrit.wikimedia.org/r/1054623 (https://phabricator.wikimedia.org/T367949) (owner: 10Ottomata)
[17:40:11] <logmsgbot>	 !log swfrench@cumin2002 conftool action : set/pooled=false; selector: dnsdisc=api-ro,name=codfw [reason: Depooling ahead of turndown - T367949]
[17:42:03] <wikibugs>	 (03PS1) 10Dzahn: Revert^2 "wdqs: microsites for wdqs graph split" [puppet] - 10https://gerrit.wikimedia.org/r/1054624
[17:42:14] <wikibugs>	 (03PS1) 10Tchanders: Enable temporary accounts on testwiki and loginwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054625 (https://phabricator.wikimedia.org/T348895)
[17:42:35] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66670 and previous config saved to /var/cache/conftool/dbconfig/20240716-174235-arnaudb.json
[17:43:37] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+1] Revert^2 "wdqs: microsites for wdqs graph split" [puppet] - 10https://gerrit.wikimedia.org/r/1054624 (owner: 10Dzahn)
[17:43:58] <logmsgbot>	 !log swfrench@cumin2002 conftool action : set/pooled=false; selector: dnsdisc=appservers-rw,name=eqiad [reason: Depooling ahead of turndown - T367949]
[17:44:15] <logmsgbot>	 !log swfrench@cumin2002 conftool action : set/pooled=false; selector: dnsdisc=api-rw,name=eqiad [reason: Depooling ahead of turndown - T367949]
[17:44:15] <logmsgbot>	 !log otto@deploy1002 Started deploy [analytics/refinery@f97900c] (hadoop-test): Deploy refinery with refinery-source version 0.2.44 for mw on k8s - TEST [analytics/refinery@f97900c9]
[17:45:18] <marostegui>	 RoanKattouw: no worries, thanks for letting me know :)
[17:45:56] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] Revert^2 "wdqs: microsites for wdqs graph split" [puppet] - 10https://gerrit.wikimedia.org/r/1054624 (owner: 10Dzahn)
[17:46:11] <swfrench-wmf>	 !log appservers-rw and api-rw now resolve to failoid - T367949
[17:46:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:46:15] <stashbot>	 T367949: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949
[17:47:39] <logmsgbot>	 !log otto@deploy1002 Finished deploy [analytics/refinery@f97900c] (hadoop-test): Deploy refinery with refinery-source version 0.2.44 for mw on k8s - TEST [analytics/refinery@f97900c9] (duration: 03m 23s)
[17:47:39] <logmsgbot>	 !log otto@deploy1002 Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s [analytics/refinery@f97900c9]
[17:53:14] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[17:53:14] <wikibugs>	 (03PS1) 10Ottomata: Update refinery_version for  refine and refine_sanitize [puppet] - 10https://gerrit.wikimedia.org/r/1054629 (https://phabricator.wikimedia.org/T367949)
[17:55:20] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[17:55:35] <logmsgbot>	 !log otto@deploy1002 Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s [analytics/refinery@f97900c9] (duration: 08m 33s)
[17:55:43] <logmsgbot>	 !log otto@deploy1002 Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 2 [analytics/refinery@f97900c9]
[17:57:18] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "@Steve your change is now effectively deployed (reverted the revert). and both new sites show the SPARQL input form. seems to all work fin" [puppet] - 10https://gerrit.wikimedia.org/r/1046121 (https://phabricator.wikimedia.org/T364367) (owner: 10Stevemunene)
[17:57:42] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66671 and previous config saved to /var/cache/conftool/dbconfig/20240716-175742-arnaudb.json
[17:57:45] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
[17:57:46] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[17:57:58] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
[17:58:00] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
[17:58:13] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
[17:58:20] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66672 and previous config saved to /var/cache/conftool/dbconfig/20240716-175820-arnaudb.json
[17:58:28] <logmsgbot>	 !log otto@deploy1002 Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 2 [analytics/refinery@f97900c9] (duration: 02m 44s)
[17:58:32] <logmsgbot>	 !log otto@deploy1002 Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 3 [analytics/refinery@f97900c9]
[17:59:19] <logmsgbot>	 !log otto@deploy1002 Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 3 [analytics/refinery@f97900c9] (duration: 00m 47s)
[18:00:04] <jouncebot>	 dancy and andre: #bothumor My software never has bugs. It just develops random features. Rise for MediaWiki train - Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T1800).
[18:00:12] <dancy>	 o/
[18:00:19] <andre>	 fallback o/
[18:00:29] <dancy>	 Andre!
[18:00:46] <dancy>	 Upgrading scap first...
[18:00:50] <andre>	 no no, I'm just a bot account, I swear! :)
[18:00:50] <logmsgbot>	 !log dancy@deploy1002 Installing scap version "4.92.0" for 232 hosts
[18:01:55] <wikibugs>	 (03CR) 10Kosta Harlan: Enable temporary accounts on testwiki and loginwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054625 (https://phabricator.wikimedia.org/T348895) (owner: 10Tchanders)
[18:02:00] <wikibugs>	 (03PS1) 10TrainBranchBot: group0 wikis to 1.43.0-wmf.14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054630 (https://phabricator.wikimedia.org/T366959)
[18:02:02] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] group0 wikis to 1.43.0-wmf.14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054630 (https://phabricator.wikimedia.org/T366959) (owner: 10TrainBranchBot)
[18:02:45] <wikibugs>	 (03Merged) 10jenkins-bot: group0 wikis to 1.43.0-wmf.14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054630 (https://phabricator.wikimedia.org/T366959) (owner: 10TrainBranchBot)
[18:04:58] <dancy>	 Hmm... docker_pull_k8s is hanging on 26 nodes.
[18:05:47] <dancy>	 ah, there it goes.. That was weird.
[18:06:53] <dancy>	 hmm. something's not right.
[18:08:04] <wikibugs>	 (03CR) 10Ottomata: [C:03+2] Update refinery_version for  refine and refine_sanitize [puppet] - 10https://gerrit.wikimedia.org/r/1054629 (https://phabricator.wikimedia.org/T367949) (owner: 10Ottomata)
[18:09:45] <wikibugs>	 (03PS13) 10CDobbins: varnish: show better error for 429s [puppet] - 10https://gerrit.wikimedia.org/r/1041705 (https://phabricator.wikimedia.org/T354718)
[18:11:19] <wikibugs>	 (03PS1) 10Ottomata: Disable produce_canary_events systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/1054633 (https://phabricator.wikimedia.org/T370186)
[18:12:03] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/1054633 (https://phabricator.wikimedia.org/T370186) (owner: 10Ottomata)
[18:12:07] <wikibugs>	 (03CR) 10Tchanders: Enable temporary accounts on testwiki and loginwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054625 (https://phabricator.wikimedia.org/T348895) (owner: 10Tchanders)
[18:14:10] <logmsgbot>	 !log dancy@deploy1002 rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.14  refs T366959
[18:14:14] <stashbot>	 T366959: 1.43.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T366959
[18:15:45] <wikibugs>	 (03CR) 10Ottomata: [C:03+2] Disable produce_canary_events systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/1054633 (https://phabricator.wikimedia.org/T370186) (owner: 10Ottomata)
[18:16:04] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9987115 (10Papaul)
[18:16:26] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: Request additional mgmt IP range for frack servers - https://phabricator.wikimedia.org/T370164#9987104 (10Papaul)
[18:17:55] <wikibugs>	 (03PS1) 10Pppery: Add extra date elements for arcanist [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1054634 (https://phabricator.wikimedia.org/T363188)
[18:17:57] <wikibugs>	 (03PS1) 10Pppery: Update source strings for 2024.19 [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1054635 (https://phabricator.wikimedia.org/T363188)
[18:19:23] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:19:43] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66674 and previous config saved to /var/cache/conftool/dbconfig/20240716-181942-arnaudb.json
[18:19:47] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[18:21:47] <wikibugs>	 (03PS1) 10Ottomata: refinery - Remove produce_canary_events code [puppet] - 10https://gerrit.wikimedia.org/r/1054636 (https://phabricator.wikimedia.org/T370186)
[18:22:18] <wikibugs>	 (03PS1) 10CDanis: otelcol: use proper Calico selector syntax [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054637 (https://phabricator.wikimedia.org/T365855)
[18:26:22] <wikibugs>	 (03CR) 10Ahmon Dancy: git: remove umask from git::clone (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/927986 (https://phabricator.wikimedia.org/T338277) (owner: 10Hashar)
[18:27:29] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host dbproxy2007.codfw.wmnet with OS bookworm
[18:27:37] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9987166 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host dbproxy2007.codfw.wmnet with OS bookworm
[18:30:14] <wikibugs>	 (03PS2) 10Pppery: Add extra date elements for arcanist [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1054634 (https://phabricator.wikimedia.org/T363188)
[18:32:28] <wikibugs>	 (03PS3) 10Pppery: Add extra date elements for arcanist [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1054634 (https://phabricator.wikimedia.org/T363188)
[18:34:50] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66675 and previous config saved to /var/cache/conftool/dbconfig/20240716-183449-arnaudb.json
[18:37:29] <wikibugs>	 (03PS4) 10Pppery: Add extra date elements for arcanist [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1054634 (https://phabricator.wikimedia.org/T363188)
[18:38:26] <wikibugs>	 06SRE, 06collaboration-services, 06Traffic, 13Patch-For-Review, 10Release-Engineering-Team (Radar): implement anti-abuse features for GitLab (Move GitLab behind the CDN) - https://phabricator.wikimedia.org/T366882#9987283 (10brennen)
[18:39:57] <wikibugs>	 (03PS1) 10Ebrahim: Enable ICU provided alphabetical order in the Kurdish wiki categories [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054641 (https://phabricator.wikimedia.org/T48235)
[18:42:28] <wikibugs>	 (03PS2) 10Ebrahim: Enable ICU provided alphabetical order in the Kurdish wikis categories [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054641 (https://phabricator.wikimedia.org/T48235)
[18:43:16] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, July 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054558 (owner: 10Michael Große)
[18:43:51] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, July 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054553 (https://phabricator.wikimedia.org/T368606) (owner: 10Michael Große)
[18:44:23] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, July 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item" [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1054554 (https://phabricator.wikimedia.org/T368606) (owner: 10Michael Große)
[18:45:53] <logmsgbot>	 !log pt1979@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy2007.codfw.wmnet with OS bookworm
[18:46:00] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9987340 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host dbproxy2007.codfw.wmnet with OS bookworm executed with errors...
[18:46:00] <wikibugs>	 (03PS9) 10Kimberly Sarabia: [July 16th] Enable dark mode for logged out users (tier 1 and tier 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[18:47:16] <wikibugs>	 (03CR) 10CDanis: [C:03+2] otelcol: use proper Calico selector syntax [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054637 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[18:49:34] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[18:49:46] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[18:49:57] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66677 and previous config saved to /var/cache/conftool/dbconfig/20240716-184956-arnaudb.json
[18:50:47] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[18:51:11] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[18:51:15] <wikibugs>	 (03PS3) 10Bking: team-search-platform: migrate cirrus latencies & mem alert [alerts] - 10https://gerrit.wikimedia.org/r/1054374 (https://phabricator.wikimedia.org/T359033) (owner: 10DCausse)
[18:52:51] <wikibugs>	 (03PS4) 10Bking: team-search-platform: migrate cirrus latencies & mem alert [alerts] - 10https://gerrit.wikimedia.org/r/1054374 (https://phabricator.wikimedia.org/T359033) (owner: 10DCausse)
[18:52:54] <wikibugs>	 (03CR) 10Kosta Harlan: Enable temporary accounts on testwiki and loginwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054625 (https://phabricator.wikimedia.org/T348895) (owner: 10Tchanders)
[18:53:55] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 10LDAP-Access-Requests: LDAP access to the analytics-privatedata-users group for Quiddity - https://phabricator.wikimedia.org/T370091#9987384 (10KStineRowe_WMF) approved
[18:54:20] <wikibugs>	 (03PS10) 10Kimberly Sarabia: [July 16th] Enable dark mode for logged out users (tier 1 and tier 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[18:56:37] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
[18:56:50] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
[18:56:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2140 (T367856)', diff saved to https://phabricator.wikimedia.org/P66678 and previous config saved to /var/cache/conftool/dbconfig/20240716-185657-marostegui.json
[18:57:02] <stashbot>	 T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[18:57:43] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9987404 (10Papaul)
[18:57:44] <wikibugs>	 (03PS11) 10Jdlrobson: [July 16th] Enable dark mode for logged out users (tier 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150)
[18:57:48] <wikibugs>	 (03CR) 10Jdlrobson: [C:03+1] [July 16th] Enable dark mode for logged out users (tier 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[19:00:23] <wikibugs>	 (03PS1) 10Cwhite: admin: remove unused ssh key for cwhite [puppet] - 10https://gerrit.wikimedia.org/r/1054645
[19:01:12] <wikibugs>	 (03PS1) 10Dzahn: delete integration.mediawiki.org [dns] - 10https://gerrit.wikimedia.org/r/1054646 (https://phabricator.wikimedia.org/T361250)
[19:01:38] <wikibugs>	 (03PS1) 10Bking: elasticsearch: remove obsolete alerts [puppet] - 10https://gerrit.wikimedia.org/r/1054647 (https://phabricator.wikimedia.org/T359033)
[19:02:03] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1054647 (https://phabricator.wikimedia.org/T359033) (owner: 10Bking)
[19:05:05] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66679 and previous config saved to /var/cache/conftool/dbconfig/20240716-190504-arnaudb.json
[19:05:06] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
[19:05:10] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[19:05:20] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
[19:05:27] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66680 and previous config saved to /var/cache/conftool/dbconfig/20240716-190526-arnaudb.json
[19:06:58] <wikibugs>	 (03PS2) 10Bking: elasticsearch: remove obsolete alerts [puppet] - 10https://gerrit.wikimedia.org/r/1054647 (https://phabricator.wikimedia.org/T359033)
[19:07:39] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
[19:07:50] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9987455 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host dbproxy2008.codfw.wmnet with OS bookworm
[19:08:55] <wikibugs>	 (03PS1) 10Herron: wip [alerts] - 10https://gerrit.wikimedia.org/r/1054649
[19:09:13] <wikibugs>	 (03PS1) 10CDanis: otelcol: use proper Calico selector syntax part2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054650 (https://phabricator.wikimedia.org/T365855)
[19:09:38] <wikibugs>	 (03PS2) 10CDanis: otelcol: use proper Calico selector syntax part2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054650 (https://phabricator.wikimedia.org/T365855)
[19:11:32] <wikibugs>	 (03CR) 10Ryan Kemper: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1054647 (https://phabricator.wikimedia.org/T359033) (owner: 10Bking)
[19:12:58] <wikibugs>	 (03CR) 10CDanis: [C:03+2] otelcol: use proper Calico selector syntax part2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1054650 (https://phabricator.wikimedia.org/T365855) (owner: 10CDanis)
[19:14:01] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] "I would add this info to the commit message, otherwise looks good." [puppet] - 10https://gerrit.wikimedia.org/r/1054427 (owner: 10Slyngshede)
[19:15:36] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'.
[19:17:10] <logmsgbot>	 !log cdanis@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[19:18:29] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[19:18:37] <logmsgbot>	 !log cdanis@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[19:22:11] <wikibugs>	 (03CR) 10AOkoth: [C:03+2] vrts: fix proxy for download [cookbooks] - 10https://gerrit.wikimedia.org/r/1053761 (https://phabricator.wikimedia.org/T366078) (owner: 10AOkoth)
[19:24:42] <swfrench-wmf>	 !log depooling appservers-ro in eqiad, which is not used by remaining analytics workloads - T367949
[19:24:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:24:48] <stashbot>	 T367949: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949
[19:25:24] <logmsgbot>	 !log swfrench@cumin2002 conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=eqiad [reason: Depooling ahead of turndown - T367949]
[19:25:36] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job netbox_django in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:26:02] <wikibugs>	 (03Merged) 10jenkins-bot: vrts: fix proxy for download [cookbooks] - 10https://gerrit.wikimedia.org/r/1053761 (https://phabricator.wikimedia.org/T366078) (owner: 10AOkoth)
[19:26:11] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66681 and previous config saved to /var/cache/conftool/dbconfig/20240716-192610-arnaudb.json
[19:26:14] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[19:28:39] <jinxer-wm>	 FIRING: [2x] CirrusSearchNodeIndexingNotIncreasing: Elasticsearch instance elastic1098-production-search-eqiad is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[19:29:23] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] sre.network.tls: use a different client certificate to authenticate [cookbooks] - 10https://gerrit.wikimedia.org/r/1054618 (https://phabricator.wikimedia.org/T355750) (owner: 10Elukey)
[19:30:08] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] C:idm configure 2FA proxy endpoint. [puppet] - 10https://gerrit.wikimedia.org/r/1054502 (owner: 10Slyngshede)
[19:37:13] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[19:40:42] <wikibugs>	 (03CR) 10CDanis: [C:03+1] "thank you, nice digging" [cookbooks] - 10https://gerrit.wikimedia.org/r/1054618 (https://phabricator.wikimedia.org/T355750) (owner: 10Elukey)
[19:41:18] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66682 and previous config saved to /var/cache/conftool/dbconfig/20240716-194117-arnaudb.json
[19:43:49] <wikibugs>	 (03CR) 10Ebrahim: [July 16th] Enable dark mode for logged out users (tier 1) (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[19:56:25] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66683 and previous config saved to /var/cache/conftool/dbconfig/20240716-195624-arnaudb.json
[19:56:32] <wikibugs>	 (03CR) 10Ebrahim: [July 16th] Enable dark mode for logged out users (tier 1) (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[20:00:04] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: How many deployers does it take to do UTC late backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240716T2000).
[20:00:04] <jouncebot>	 Seawolf35, jdlrobson, and MichaelG_WMF: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:12] <urbanecm>	 i can deploy today
[20:00:20] <urbanecm>	 Seawolf35: Jdlrobson: MichaelG_WMF: around?
[20:00:34] <kimberly_sarabia>	 Hey. I'm deploying for Jobn
[20:00:36] <MichaelG_WMF>	 Around :)
[20:00:39] <kimberly_sarabia>	 Jon*
[20:00:50] <urbanecm>	 kimberly_sarabia: ack, i'll ping you once i'm done with the other two patches then?
[20:00:54] <urbanecm>	 unless you wanna drive the window
[20:00:56] <Seawolf35>	 Here, but I am on a phone so not able to debug or anything.
[20:01:17] <kimberly_sarabia>	 yep ping me whenever
[20:01:20] <urbanecm>	 sounds good
[20:01:48] <Seawolf35>	 Though I don’t think my patch should break anything spectacularly
[20:02:08] <urbanecm>	 probably not :). i can test for you, it's a change i asked for anyway :))
[20:02:12] <urbanecm>	 (thanks for the patch!)
[20:02:29] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Ensure every test-config has valid defaults [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054558 (owner: 10Michael Große)
[20:02:38] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Merge partial config with defaults [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054553 (https://phabricator.wikimedia.org/T368606) (owner: 10Michael Große)
[20:03:05] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] Merge partial config with defaults [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1054554 (https://phabricator.wikimedia.org/T368606) (owner: 10Michael Große)
[20:03:12] <wikibugs>	 (03PS7) 10Seawolf35gerrit: foundationwiki: Restrict `unfuzzy` right to autoconfirmed users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054025 (https://phabricator.wikimedia.org/T369979)
[20:03:12] <wikibugs>	 (03CR) 10Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054025 (https://phabricator.wikimedia.org/T369979) (owner: 10Seawolf35gerrit)
[20:03:22] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] foundationwiki: Restrict `unfuzzy` right to autoconfirmed users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054025 (https://phabricator.wikimedia.org/T369979) (owner: 10Seawolf35gerrit)
[20:03:45] <Seawolf35>	 Not exactly new, just lost access to my last gerrit account
[20:04:06] <wikibugs>	 (03Merged) 10jenkins-bot: foundationwiki: Restrict `unfuzzy` right to autoconfirmed users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054025 (https://phabricator.wikimedia.org/T369979) (owner: 10Seawolf35gerrit)
[20:04:21] <urbanecm>	 Seawolf35: that's unfortunate :-( . email reset didn't work?
[20:04:49] <Seawolf35>	 Uh, lost the email, that’s why I lost access after I forgot the password
[20:05:11] <logmsgbot>	 !log urbanecm@deploy1002 Started scap sync-world: Backport for [[gerrit:1054025|foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979)]]
[20:05:16] <stashbot>	 T369979: foundationwiki: Restrict `unfuzzy` right to autoconfirmed users - https://phabricator.wikimedia.org/T369979
[20:05:27] <urbanecm>	 that wasn't all tho :)
[20:05:35] <urbanecm>	 welcome back Seawolf35!
[20:05:57] <Seawolf35>	 Phone problems
[20:06:02] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Engineering, 06DC-Ops: Degraded RAID on dumpsdata1007 - https://phabricator.wikimedia.org/T369829#9987783 (10Jclark-ctr) a:03Jclark-ctr
[20:06:15] <Seawolf35>	 Every time I turn my phone off it disconnects me
[20:06:23] <RhinosF1>	 Seawolf35: btw, I don't see why you can't be on the CI whitelist. I believe the standard is basically won't upload malicious stuff.
[20:06:29] <RhinosF1>	 Seawolf35: ye irc does that
[20:08:22] <Seawolf35>	 RhinosF1  would be nice to be on the CI whitelist, certainly more convenient than waiting for CI to decide it wants to look at my code.
[20:08:39] <RhinosF1>	 I'm proposing a patch
[20:09:08] <logmsgbot>	 !log urbanecm@deploy1002 seawolf35gerrit, urbanecm: Backport for [[gerrit:1054025|foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[20:09:22] <urbanecm>	 Seawolf35: patch's on mwdebug :)
[20:09:23] <urbanecm>	 looking
[20:09:50] <urbanecm>	 does the trick
[20:09:51] <logmsgbot>	 !log urbanecm@deploy1002 seawolf35gerrit, urbanecm: Continuing with sync
[20:10:03] <Seawolf35>	 RhinosF1 My gerrit account is Seawolf35gerrit, not Seawolf35 just so you know
[20:10:09] <RhinosF1>	 Seawolf35: https://gerrit.wikimedia.org/r/c/integration/config/+/1054657
[20:10:14] <wikibugs>	 (03CR) 10Dreamy Jazz: [C:04-1] "The use of `wmgEnableIPMasking` is in `CommonSettings-labs.php`, which means that this will have no effect for production wikis." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1054625 (https://phabricator.wikimedia.org/T348895) (owner: 10Tchanders)
[20:10:17] <RhinosF1>	 I found you easily
[20:10:18] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware): Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#9987799 (10Jclark-ctr) @VRiley-WMF if you can update with 2nd network connection then hand over to @cmooney
[20:10:48] <RhinosF1>	 Has.har will probably deploy it tomorrow
[20:10:57] <RhinosF1>	 It's after 10 for him
[20:10:59] <Seawolf35>	 Thanks!
[20:11:32] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66684 and previous config saved to /var/cache/conftool/dbconfig/20240716-201131-arnaudb.json
[20:11:34] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
[20:11:37] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[20:11:47] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
[20:11:54] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66685 and previous config saved to /var/cache/conftool/dbconfig/20240716-201153-arnaudb.json
[20:12:10] <logmsgbot>	 !log swfrench@cumin2002 conftool action : set/pooled=true; selector: dnsdisc=appservers-ro,name=eqiad [reason: Repooling to concentrate clients in eqiad - T367949]
[20:12:13] <stashbot>	 T367949: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949
[20:13:42] <MichaelG_WMF>	 Are the wmf- queues slower, or is CommunityConfiguration usually at 25 minutes and I just never noticed?
[20:14:00] <RhinosF1>	 urbanecm: probably something for you after the window ^
[20:14:20] <urbanecm>	 MichaelG_WMF: gate-and-submit should be more or less the same speed for everything. it runs tests for (most) extensions.
[20:14:34] <wikibugs>	 06SRE, 10MW-on-K8s, 06serviceops, 06Traffic, and 2 others: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#9987802 (10Scott_French) Current status: * appservers-rw and api-rw are depooled everywhere, and resolve to failoid as of 17:45 UTC * api-ro is serving only...
[20:14:42] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:1054025|foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979)]] (duration: 09m 31s)
[20:14:46] <stashbot>	 T369979: foundationwiki: Restrict `unfuzzy` right to autoconfirmed users - https://phabricator.wikimedia.org/T369979
[20:15:01] <urbanecm>	 MichaelG_WMF: the gate-and-submit for the same patch in master says https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php74/11754/console : SUCCESS in 24m 04s
[20:15:58] <urbanecm>	 anyway, waiting for rest of CI
[20:15:58] <wikibugs>	 (03PS1) 10BBlack: Add disc-appservers-ro to mock_etc metafo [dns] - 10https://gerrit.wikimedia.org/r/1054658
[20:15:58] <wikibugs>	 (03PS1) 10BBlack: Switch appservers-ro to active/passive [dns] - 10https://gerrit.wikimedia.org/r/1054659
[20:15:59] <wikibugs>	 (03PS1) 10BBlack: Remove disc-appservers-ro from mock_etc geo file [dns] - 10https://gerrit.wikimedia.org/r/1054660
[20:16:11] <MichaelG_WMF>	 urbanecm: Mh. Thanks. Guess I just never noticed that and somehow associated CC with being faster due to its tests being faster than GrowthExperiments?
[20:16:41] <icinga-wm>	 PROBLEM - IPv4 ping to eqsin on ripe-atlas-eqsin is CRITICAL: CRITICAL - failed 40 probes of 793 (alerts on 35) - https://atlas.ripe.net/measurements/11645085/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[20:16:51] <urbanecm>	 MichaelG_WMF: it is faster in the `test` run. that includes only that extension's tests, and CC's tests are faster than GE's.
[20:17:29] <urbanecm>	 but gate-and-submit runs more stuff (more tests, sometimes more PHP versions when we're switching, etc.)
[20:21:04] <MichaelG_WMF>	 Yeah, that I'm aware of. Though I think `test` also includes the extensions that are the dependencies for the tested extension, which is more for CC than GE. But I guess Gate-And-Submit might just be a strict superset of that?
[20:21:28] <MichaelG_WMF>	 *more for GrowthExperiments than CommunityConfiguration
[20:21:41] <icinga-wm>	 RECOVERY - IPv4 ping to eqsin on ripe-atlas-eqsin is OK: OK - failed 28 probes of 793 (alerts on 35) - https://atlas.ripe.net/measurements/11645085/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[20:21:59] <urbanecm>	 MichaelG_WMF: yep, gate-and-submit should run https://gerrit.wikimedia.org/g/integration/config/+/327cd0d698cd8803f65b12891500cd4496dbf631/zuul/parameter_functions.py#956
[20:22:53] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy1002 using scap backport" [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054558 (owner: 10Michael Große)
[20:22:53] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy1002 using scap backport" [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054553 (https://phabricator.wikimedia.org/T368606) (owner: 10Michael Große)
[20:22:53] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy1002 using scap backport" [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1054554 (https://phabricator.wikimedia.org/T368606) (owner: 10Michael Große)
[20:23:22] <MichaelG_WMF>	 urbanecm: let's start with wmf.13?
[20:23:39] <urbanecm>	 MichaelG_WMF: i'm pulling all of them in at the same time
[20:23:40] <wikibugs>	 (03PS14) 10CDobbins: varnish: show better error for 429s [puppet] - 10https://gerrit.wikimedia.org/r/1041705 (https://phabricator.wikimedia.org/T354718)
[20:23:50] <MichaelG_WMF>	 alright
[20:23:50] <urbanecm>	 just need CI to merge
[20:25:23] <MichaelG_WMF>	 oh right, I saw the additional +2 from TrainBranchBot and my mind somehow went to Jenkins.
[20:25:31] <MichaelG_WMF>	 2 more Minutes then :)
[20:25:37] <urbanecm>	 yeah. 
[20:25:56] <wikibugs>	 (03PS6) 10BCornwall: Add public suffix list module [puppet] - 10https://gerrit.wikimedia.org/r/1054069
[20:25:56] <wikibugs>	 (03PS7) 10BCornwall: ncmonitor: Set path for public suffix domain list [puppet] - 10https://gerrit.wikimedia.org/r/1054073 (https://phabricator.wikimedia.org/T369114)
[20:27:16] <wikibugs>	 (03Merged) 10jenkins-bot: Ensure every test-config has valid defaults [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054558 (owner: 10Michael Große)
[20:27:17] <wikibugs>	 (03Merged) 10jenkins-bot: Merge partial config with defaults [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1054553 (https://phabricator.wikimedia.org/T368606) (owner: 10Michael Große)
[20:27:32] <urbanecm>	 finally
[20:27:39] <urbanecm>	 one more patch...
[20:27:56] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2008.codfw.wmnet with OS bookworm
[20:28:08] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9987852 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host dbproxy2008.codfw.wmnet with OS bookworm executed with errors...
[20:28:53] <wikibugs>	 (03Merged) 10jenkins-bot: Merge partial config with defaults [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.14) - 10https://gerrit.wikimedia.org/r/1054554 (https://phabricator.wikimedia.org/T368606) (owner: 10Michael Große)
[20:29:25] <logmsgbot>	 !log urbanecm@deploy1002 Started scap sync-world: Backport for [[gerrit:1054558|Ensure every test-config has valid defaults]], [[gerrit:1054553|Merge partial config with defaults (T368606)]], [[gerrit:1054554|Merge partial config with defaults (T368606)]]
[20:29:30] <stashbot>	 T368606: Community configuration defaults are not merged with partially-specified objects - https://phabricator.wikimedia.org/T368606
[20:29:42] <wikibugs>	 (03PS8) 10BCornwall: ncmonitor: Set path for public suffix domain list [puppet] - 10https://gerrit.wikimedia.org/r/1054073 (https://phabricator.wikimedia.org/T369114)
[20:30:41] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
[20:30:51] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy2008.codfw.wmnet with OS bookworm
[20:31:07] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9987858 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host dbproxy2008.codfw.wmnet with OS bookworm
[20:31:09] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9987859 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host dbproxy2008.codfw.wmnet with OS bookworm executed with errors...
[20:32:51] <wikibugs>	 (03PS1) 10JHathaway: pcc-puppetdb: remove java pinning [puppet] - 10https://gerrit.wikimedia.org/r/1054661 (https://phabricator.wikimedia.org/T367547)
[20:33:12] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1054661 (https://phabricator.wikimedia.org/T367547) (owner: 10JHathaway)
[20:33:13] <logmsgbot>	 !log urbanecm@deploy1002 urbanecm, migr: Backport for [[gerrit:1054558|Ensure every test-config has valid defaults]], [[gerrit:1054553|Merge partial config with defaults (T368606)]], [[gerrit:1054554|Merge partial config with defaults (T368606)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[20:33:31] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66686 and previous config saved to /var/cache/conftool/dbconfig/20240716-203331-arnaudb.json
[20:33:33] <urbanecm>	 MichaelG_WMF: can you take a look and test please? :)
[20:33:35] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[20:34:11] <MichaelG_WMF>	 urbanecm: works for both testwiki as well as eswiki \o/
[20:34:31] <urbanecm>	 yay!
[20:34:32] <logmsgbot>	 !log urbanecm@deploy1002 urbanecm, migr: Continuing with sync
[20:34:33] <MichaelG_WMF>	 that is, both wmf.14 as well as wmf.13 respectively
[20:38:57] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
[20:39:05] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9987937 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host dbproxy2008.codfw.wmnet with OS bookworm
[20:39:20] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:1054558|Ensure every test-config has valid defaults]], [[gerrit:1054553|Merge partial config with defaults (T368606)]], [[gerrit:1054554|Merge partial config with defaults (T368606)]] (duration: 09m 55s)
[20:39:24] <stashbot>	 T368606: Community configuration defaults are not merged with partially-specified objects - https://phabricator.wikimedia.org/T368606
[20:39:36] <urbanecm>	 MichaelG_WMF: and live! :)
[20:39:39] <urbanecm>	 kimberly_sarabia: over to you :)
[20:40:17] <MichaelG_WMF>	 urbanecm: Thanks, confirmed 👍
[20:40:28] <kimberly_sarabia>	 Ok I'm here
[20:42:52] <urbanecm>	 kimberly_sarabia: I thought you were going to deploy your patch?
[20:43:02] <urbanecm>	 Or do you want me to deploy for you?
[20:43:22] <kimberly_sarabia>	 yes can you deploy for me? sorry for the confusion. I haven't been trained yet on that
[20:43:52] <urbanecm>	 Oh, no problem. Sorry, I misunderstood. 
[20:43:55] <urbanecm>	 Let's get started!
[20:44:08] <wikibugs>	 (03PS7) 10BCornwall: Add public suffix list module [puppet] - 10https://gerrit.wikimedia.org/r/1054069
[20:44:08] <wikibugs>	 (03PS9) 10BCornwall: ncmonitor: Set path for public suffix domain list [puppet] - 10https://gerrit.wikimedia.org/r/1054073 (https://phabricator.wikimedia.org/T369114)
[20:44:20] <wikibugs>	 (03PS12) 10Jdlrobson: [July 16th] Enable dark mode for logged out users (tier 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150)
[20:44:42] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] [July 16th] Enable dark mode for logged out users (tier 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[20:45:00] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[20:45:23] <wikibugs>	 (03Merged) 10jenkins-bot: [July 16th] Enable dark mode for logged out users (tier 1) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1050083 (https://phabricator.wikimedia.org/T367150) (owner: 10Jdlrobson)
[20:45:53] <logmsgbot>	 !log urbanecm@deploy1002 Started scap sync-world: Backport for [[gerrit:1050083|[July 16th] Enable dark mode for logged out users (tier 1) (T367150)]]
[20:46:03] <stashbot>	 T367150: Deploy dark mode to logged-out users in tier 1 and 2 wikis on the Vector2022 and Minerva skin - https://phabricator.wikimedia.org/T367150
[20:48:27] <logmsgbot>	 !log urbanecm@deploy1002 urbanecm, jdlrobson: Backport for [[gerrit:1050083|[July 16th] Enable dark mode for logged out users (tier 1) (T367150)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[20:48:37] <urbanecm>	 kimberly_sarabia: can you test at mwdebug, please?
[20:48:38] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66687 and previous config saved to /var/cache/conftool/dbconfig/20240716-204838-arnaudb.json
[20:49:35] <kimberly_sarabia>	 urbanecm: LGTM
[20:49:43] <urbanecm>	 proceeding
[20:49:45] <logmsgbot>	 !log urbanecm@deploy1002 urbanecm, jdlrobson: Continuing with sync
[20:54:36] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:1050083|[July 16th] Enable dark mode for logged out users (tier 1) (T367150)]] (duration: 08m 43s)
[20:54:41] <stashbot>	 T367150: Deploy dark mode to logged-out users in tier 1 and 2 wikis on the Vector2022 and Minerva skin - https://phabricator.wikimedia.org/T367150
[20:54:46] <urbanecm>	 kimberly_sarabia: it's live :). 
[20:55:29] <kimberly_sarabia>	 urbanecm: Great! Thanks
[20:55:49] <urbanecm>	 no problem!
[21:03:14] <wikibugs>	 (03PS8) 10BCornwall: Add public suffix list module [puppet] - 10https://gerrit.wikimedia.org/r/1054069
[21:03:14] <wikibugs>	 (03PS10) 10BCornwall: ncmonitor: Set path for public suffix domain list [puppet] - 10https://gerrit.wikimedia.org/r/1054073 (https://phabricator.wikimedia.org/T369114)
[21:03:18] <kimberly_sarabia>	 Sorry for the dumb question but the changes I saw in mwdebug for enwiki, zhwiki, etc. are weirdly not showing in prod except for testwiki? Did we miss something?
[21:03:45] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66688 and previous config saved to /var/cache/conftool/dbconfig/20240716-210345-arnaudb.json
[21:04:05] <kimberly_sarabia>	 oops scratch that
[21:04:51] <kimberly_sarabia>	 oh never mind, still not seeing changes outside of mwdebug. let me know if anyone has ideas
[21:05:14] <wikibugs>	 (03PS9) 10BCornwall: Add public suffix list module [puppet] - 10https://gerrit.wikimedia.org/r/1054069
[21:05:14] <wikibugs>	 (03PS11) 10BCornwall: ncmonitor: Set path for public suffix domain list [puppet] - 10https://gerrit.wikimedia.org/r/1054073 (https://phabricator.wikimedia.org/T369114)
[21:18:53] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66689 and previous config saved to /var/cache/conftool/dbconfig/20240716-211852-arnaudb.json
[21:18:55] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
[21:18:57] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[21:19:08] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
[21:19:15] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66690 and previous config saved to /var/cache/conftool/dbconfig/20240716-211914-arnaudb.json
[21:21:00] <wikibugs>	 (03PS15) 10CDobbins: varnish: show better error for 429s [puppet] - 10https://gerrit.wikimedia.org/r/1041705 (https://phabricator.wikimedia.org/T354718)
[21:33:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:35:56] <wikibugs>	 (03PS1) 10Scott French: DNM: service: (appserver|api)-ro to active-passive [puppet] - 10https://gerrit.wikimedia.org/r/1054667
[21:37:12] <wikibugs>	 (03CR) 10Scott French: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1054667 (owner: 10Scott French)
[21:40:55] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66691 and previous config saved to /var/cache/conftool/dbconfig/20240716-214054-arnaudb.json
[21:40:59] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[21:46:55] <wikibugs>	 (PS16) CDobbins: varnish: show better error for 429s [puppet] - https://gerrit.wikimedia.org/r/1041705 (https://phabricator.wikimedia.org/T354718)
[21:56:02] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66692 and previous config saved to /var/cache/conftool/dbconfig/20240716-215601-arnaudb.json
[21:59:13] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2008.codfw.wmnet with OS bookworm
[21:59:26] <wikibugs>	 ops-codfw, SRE, Data-Persistence, DBA, DC-Ops: Q#:rack/setup/install dbproxy200[5-8] - https://phabricator.wikimedia.org/T362824#9988300 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host dbproxy2008.codfw.wmnet with OS bookworm executed with errors...
[22:11:09] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66693 and previous config saved to /var/cache/conftool/dbconfig/20240716-221109-arnaudb.json
[22:26:16] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66694 and previous config saved to /var/cache/conftool/dbconfig/20240716-222616-arnaudb.json
[22:26:18] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
[22:26:20] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[22:26:31] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
[22:26:39] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66695 and previous config saved to /var/cache/conftool/dbconfig/20240716-222638-arnaudb.json
[22:40:23] <tzatziki>	 !log removing 9 files for legal compliance
[22:40:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:36] <wikibugs>	 (PS11) BCornwall: Add public suffix list module [puppet] - https://gerrit.wikimedia.org/r/1054069
[22:48:15] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66696 and previous config saved to /var/cache/conftool/dbconfig/20240716-224815-arnaudb.json
[22:48:20] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[23:03:22] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P66697 and previous config saved to /var/cache/conftool/dbconfig/20240716-230322-arnaudb.json
[23:18:29] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P66698 and previous config saved to /var/cache/conftool/dbconfig/20240716-231829-arnaudb.json
[23:28:54] <jinxer-wm>	 FIRING: [2x] CirrusSearchNodeIndexingNotIncreasing: Elasticsearch instance elastic1098-production-search-eqiad is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[23:29:19] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job netbox_django in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[23:33:36] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66699 and previous config saved to /var/cache/conftool/dbconfig/20240716-233336-arnaudb.json
[23:33:41] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[23:35:27] <wikibugs>	 (CR) Cwhite: [C:+2] admin: remove unused ssh key for cwhite [puppet] - https://gerrit.wikimedia.org/r/1054645 (owner: Cwhite)
[23:37:13] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[23:38:33] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1054682
[23:38:33] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1054682 (owner: TrainBranchBot)
[23:44:09] <wikibugs>	 (PS4) Pppery: Update source strings for 2024.19 [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1054635 (https://phabricator.wikimedia.org/T363188)
[23:50:10] <wikibugs>	 (PS12) BCornwall: Add public suffix list module [puppet] - https://gerrit.wikimedia.org/r/1054069
[23:57:36] <wikibugs>	 (CR) BCornwall: "I tested on ncmonitor1001 and verified functionality." [puppet] - https://gerrit.wikimedia.org/r/1054069 (owner: BCornwall)
[23:57:45] <wikibugs>	 (PS15) BCornwall: ncmonitor: Set path for public suffix domain list [puppet] - https://gerrit.wikimedia.org/r/1054073 (https://phabricator.wikimedia.org/T369114)
[23:58:53] <wikibugs>	 (PS1) Kimberly Sarabia: skin-themes dblist is expanded to include tier 2 wikis as well as tier 1. [mediawiki-config] - https://gerrit.wikimedia.org/r/1054685 (https://phabricator.wikimedia.org/T367150)