[00:16:26] <urbanecm>	 !log mwmaint2002: Stop T315510#9312431 instances of extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php (T315510)
[00:16:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:16:30] <stashbot>	 T315510: Start maintenance script to backfill talk page comment database - https://phabricator.wikimedia.org/T315510
[00:39:06] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/972506
[00:39:12] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/972506 (owner: TrainBranchBot)
[00:57:53] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/972506 (owner: TrainBranchBot)
[01:57:35] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin1001 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:58:09] <icinga-wm>	 PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_mw-api-ext_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:12:11] <icinga-wm>	 PROBLEM - BGP status on cr1-esams is CRITICAL: BGP CRITICAL - AS6939/IPv4: Connect - HE, AS6939/IPv6: Connect - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[02:28:53] <wikibugs>	 (CR) Pppery: "Note for reviewers: This repository is not set up with Jenkins (except for L10n-bot commits), so any patches will need to be manually give" [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/969515 (https://phabricator.wikimedia.org/T294754) (owner: Pppery)
[02:34:45] <icinga-wm>	 PROBLEM - Host mr1-esams.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[02:37:07] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 195 probes of 720 (alerts on 90) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:38:12] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:45:47] <icinga-wm>	 RECOVERY - Host mr1-esams.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 84.39 ms
[02:47:55] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 60 probes of 720 (alerts on 90) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:56:09] <icinga-wm>	 RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:59:59] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin1001 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:08:12] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:53:12] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[04:01:58] <wikibugs>	 (CR) DannyS712: wm-checks-api: add PCC build outcome (1 comment) [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/969981 (owner: Hashar)
[04:07:25] <wikibugs>	 (PS1) Samwilson: planet: Add Wikimedia Australia feed [puppet] - https://gerrit.wikimedia.org/r/972534
[04:21:51] <wikibugs>	 (PS1) RLazarus: k8s-controller-sidecars: Initial release [docker-images/production-images] - https://gerrit.wikimedia.org/r/972535
[05:10:35] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:11:27] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:11:51] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.278 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:14:07] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50860 bytes in 0.102 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:27:23] <wikibugs>	 (PS2) RLazarus: k8s-controller-sidecars: Initial release [docker-images/production-images] - https://gerrit.wikimedia.org/r/972535
[05:31:00] <wikibugs>	 (PS3) RLazarus: k8s-controller-sidecars: Initial release [docker-images/production-images] - https://gerrit.wikimedia.org/r/972535
[07:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T0700)
[07:08:12] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:40:42] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add Puppet aliases for hosts running Puppet 5 and Puppet 7 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/972411 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[07:41:18] <wikibugs>	 (CR) DCausse: [C: -1] staging-eqiad: raise rdf-streaming-updater quota (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/972483 (https://phabricator.wikimedia.org/T349095) (owner: Bking)
[07:42:50] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: zookeeper::test
[07:43:20] <wikibugs>	 (CR) DCausse: rdf-streaming-updater: update values for application mode (4 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/967229 (https://phabricator.wikimedia.org/T349095) (owner: Bking)
[07:47:00] <wikibugs>	 (CR) Elukey: changeprop: set num_workers to zero (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/971225 (https://phabricator.wikimedia.org/T348950) (owner: Elukey)
[07:47:14] <wikibugs>	 (PS1) Muehlenhoff: Switch zookeeper::test to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972690 (https://phabricator.wikimedia.org/T349619)
[07:51:37] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Switch zookeeper::test to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972690 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[07:53:12] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[07:56:03] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: zookeeper::test
[07:58:36] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::turnilo::staging
[08:00:05] <jouncebot>	 Amir1, Urbanecm, and taavi: gettimeofday() says it's time for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T0800)
[08:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[08:00:05] <wikibugs>	 (PS1) Muehlenhoff: Switch analytics_cluster::turnilo::staging to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972691 (https://phabricator.wikimedia.org/T349619)
[08:00:09] <wikibugs>	 (PS1) Slyngshede: Alert on degraded MD RAID devices. [alerts] - https://gerrit.wikimedia.org/r/972692 (https://phabricator.wikimedia.org/T350694)
[08:02:51] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Switch analytics_cluster::turnilo::staging to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972691 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[08:02:56] <wikibugs>	 (PS2) Slyngshede: Alert on degraded MD RAID devices. [alerts] - https://gerrit.wikimedia.org/r/972692 (https://phabricator.wikimedia.org/T350694)
[08:06:13] <wikibugs>	 (CR) Slyngshede: "Severity might be a little high, but we can adjust that." [alerts] - https://gerrit.wikimedia.org/r/972692 (https://phabricator.wikimedia.org/T350694) (owner: Slyngshede)
[08:08:16] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::turnilo::staging
[08:11:54] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] logstash: increase heap to 4g [puppet] - https://gerrit.wikimedia.org/r/972456 (https://phabricator.wikimedia.org/T350434) (owner: Herron)
[08:16:06] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM as a starting point, after tuning/trial we can even switch to per-team alerts if desired" [alerts] - https://gerrit.wikimedia.org/r/972692 (https://phabricator.wikimedia.org/T350694) (owner: Slyngshede)
[08:16:45] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: druid::test_analytics::worker
[08:17:24] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[08:19:03] <wikibugs>	 (CR) Filippo Giunchedi: prometheus-puppet-agent-stats: this timer sometime fails (1 comment) [puppet] - https://gerrit.wikimedia.org/r/971946 (owner: Jbond)
[08:20:32] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/971187 (https://phabricator.wikimedia.org/T347593) (owner: EoghanGaffney)
[08:22:31] <wikibugs>	 (PS1) Muehlenhoff: Switch druid::test_analytics::worker to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972693 (https://phabricator.wikimedia.org/T349619)
[08:23:33] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[08:23:41] <wikibugs>	 (PS3) Filippo Giunchedi: alertmanager: add alerts-triage on /triage [puppet] - https://gerrit.wikimedia.org/r/972335 (https://phabricator.wikimedia.org/T350014)
[08:23:43] <wikibugs>	 (CR) Filippo Giunchedi: alertmanager: add alerts-triage on /triage (2 comments) [puppet] - https://gerrit.wikimedia.org/r/972335 (https://phabricator.wikimedia.org/T350014) (owner: Filippo Giunchedi)
[08:24:07] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Switch druid::test_analytics::worker to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972693 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[08:26:32] <logmsgbot>	 !log arnaudb@cumin1001 dbctl commit (dc=all): 'db1236 (re)pooling @ 15%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53164 and previous config saved to /var/cache/conftool/dbconfig/20231108-082631-arnaudb.json
[08:28:23] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: druid::test_analytics::worker
[08:33:25] <icinga-wm>	 PROBLEM - Check systemd state on ganeti3007 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:36:38] <_joe_>	 jouncebot: nowandnext
[08:36:38] <jouncebot>	 For the next 0 hour(s) and 23 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T0800)
[08:36:38] <jouncebot>	 In 0 hour(s) and 23 minute(s): MediaWiki train - Utc-0+Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T0900)
[08:37:01] <_joe_>	 urbanecm: can I commandeer the remaining time to make a mw on k8s change?
[08:37:12] <_joe_>	 given there were no backport patches
[08:37:17] <urbanecm>	 _joe_: sure thing
[08:37:17] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] mw-jobrunner: add virtualhost explicitly for jobrunning [deployment-charts] - https://gerrit.wikimedia.org/r/968955 (https://phabricator.wikimedia.org/T349796) (owner: Giuseppe Lavagetto)
[08:37:23] <_joe_>	 thanks :)
[08:38:07] <wikibugs>	 (Merged) jenkins-bot: mw-jobrunner: add virtualhost explicitly for jobrunning [deployment-charts] - https://gerrit.wikimedia.org/r/968955 (https://phabricator.wikimedia.org/T349796) (owner: Giuseppe Lavagetto)
[08:41:37] <logmsgbot>	 !log arnaudb@cumin1001 dbctl commit (dc=all): 'db1236 (re)pooling @ 30%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53165 and previous config saved to /var/cache/conftool/dbconfig/20231108-084136-arnaudb.json
[08:42:41] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Sounds fine to me!" [puppet] - https://gerrit.wikimedia.org/r/972461 (https://phabricator.wikimedia.org/T349228) (owner: Eevans)
[08:49:09] <wikibugs>	 (CR) Brouberol: [C: +1] "The code LGTM. I can't speak for the feature." [puppet] - https://gerrit.wikimedia.org/r/969341 (https://phabricator.wikimedia.org/T349910) (owner: Btullis)
[08:49:30] <logmsgbot>	 !log oblivian@deploy2002 helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
[08:49:30] <logmsgbot>	 !log oblivian@deploy2002 helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
[08:49:51] <logmsgbot>	 !log oblivian@deploy2002 helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
[08:49:53] <logmsgbot>	 !log oblivian@deploy2002 helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
[08:51:05] <moritzm>	 !log installing openjdk-8 security updates
[08:51:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:23] <wikibugs>	 (PS3) Slyngshede: Alert on degraded MD RAID devices. [alerts] - https://gerrit.wikimedia.org/r/972692 (https://phabricator.wikimedia.org/T350694)
[08:53:13] <wikibugs>	 (CR) Slyngshede: Alert on degraded MD RAID devices. (1 comment) [alerts] - https://gerrit.wikimedia.org/r/972692 (https://phabricator.wikimedia.org/T350694) (owner: Slyngshede)
[08:54:32] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.peering with action 'email' for AS: 45899
[08:54:58] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[08:55:18] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45899
[08:55:25] <moritzm>	 !log restarting archiva to pick up Java security updates
[08:55:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:56:42] <logmsgbot>	 !log arnaudb@cumin1001 dbctl commit (dc=all): 'db1236 (re)pooling @ 45%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53166 and previous config saved to /var/cache/conftool/dbconfig/20231108-085641-arnaudb.json
[08:59:57] <wikibugs>	 SRE, Data-Engineering, Data-Platform-SRE: Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in all Kafka clusters - https://phabricator.wikimedia.org/T334733 (brouberol) I saw that neither `kafka-logging` nor `kafka-test` have ACLs at all:  ` # codfw brouberol@kafka-logging2003:~$ kafka acls...
[09:00:05] <jouncebot>	 jnuche and dduvall: #bothumor My software never has bugs. It just develops random features. Rise for MediaWiki train - Utc-0+Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T0900).
[09:01:57] <wikibugs>	 (PS1) Ayounsi: sre.network.peering: Add Auto-Submitted email header [cookbooks] - https://gerrit.wikimedia.org/r/972696 (https://phabricator.wikimedia.org/T347835)
[09:02:39] <logmsgbot>	 !log oblivian@deploy2002 helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
[09:02:39] <logmsgbot>	 !log oblivian@deploy2002 helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
[09:02:51] <logmsgbot>	 !log oblivian@deploy2002 helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
[09:02:51] <logmsgbot>	 !log oblivian@deploy2002 helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
[09:03:23] <_joe_>	 jnuche: when you're done with the train, please ping me; I have further fixes to make to mw-on-k8s
[09:03:25] <jnuche>	 _joe_: morning, are you done with that mw on k8s change?
[09:03:33] <_joe_>	 yes yes I am
[09:03:36] <jnuche>	 sure thing, will do
[09:03:38] <jnuche>	 thx
[09:03:46] <wikibugs>	 SRE, serviceops: Rebuild PHP 7.4 packages for Bullseye - https://phabricator.wikimedia.org/T350767 (MoritzMuehlenhoff)
[09:03:47] <_joe_>	 but I noticed another minor bug
[09:04:30] <wikibugs>	 (PS1) TrainBranchBot: group1 wikis to 1.42.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/972697 (https://phabricator.wikimedia.org/T350080)
[09:04:32] <wikibugs>	 (CR) TrainBranchBot: [C: +2] group1 wikis to 1.42.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/972697 (https://phabricator.wikimedia.org/T350080) (owner: TrainBranchBot)
[09:05:19] <wikibugs>	 (Merged) jenkins-bot: group1 wikis to 1.42.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/972697 (https://phabricator.wikimedia.org/T350080) (owner: TrainBranchBot)
[09:05:55] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[09:11:46] <logmsgbot>	 !log arnaudb@cumin1001 dbctl commit (dc=all): 'db1236 (re)pooling @ 60%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53167 and previous config saved to /var/cache/conftool/dbconfig/20231108-091146-arnaudb.json
[09:11:52] <logmsgbot>	 !log jnuche@deploy2002 rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.4  refs T350080
[09:13:55] <wikibugs>	 SRE, serviceops: Rebuild PHP 7.4 packages for Bullseye - https://phabricator.wikimedia.org/T350767 (MoritzMuehlenhoff)
[09:16:52] <wikibugs>	 (CR) Gehel: [C: +1] "Let's try again!" [puppet] - https://gerrit.wikimedia.org/r/972250 (owner: Stevemunene)
[09:17:29] <logmsgbot>	 !log jnuche@deploy2002 Synchronized php: group1 wikis to 1.42.0-wmf.4  refs T350080 (duration: 05m 36s)
[09:18:01] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to `discovery.processed_external_sparql_query` for AndrewTavis_WMDE - https://phabricator.wikimedia.org/T350426 (Arnoldokoth) Hey. I'm having a hard time interpreting whether this is still stalled (maybe I'm misinterpreting the discussions or getting mixed mess...
[09:20:30] <wikibugs>	 (CR) Nikerabbit: [C: +1] Avoid trailing newline in qqq.json [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/969515 (https://phabricator.wikimedia.org/T294754) (owner: Pppery)
[09:22:46] <wikibugs>	 (PS1) TrainBranchBot: group0 wikis to 1.42.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/972699 (https://phabricator.wikimedia.org/T350080)
[09:22:48] <wikibugs>	 (CR) TrainBranchBot: [C: +2] group0 wikis to 1.42.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/972699 (https://phabricator.wikimedia.org/T350080) (owner: TrainBranchBot)
[09:23:27] <wikibugs>	 (CR) Stevemunene: [V: +1 C: +2] Revert "Revert "Revert "Revert "airflow-wmde: Create scap deployment source for wmde"""" [puppet] - https://gerrit.wikimedia.org/r/972250 (owner: Stevemunene)
[09:23:29] <wikibugs>	 (PS1) Arnaudb: mariadb: add db1238 and prepare db1138 retirement [puppet] - https://gerrit.wikimedia.org/r/972507 (https://phabricator.wikimedia.org/T344036)
[09:23:35] <wikibugs>	 (Merged) jenkins-bot: group0 wikis to 1.42.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/972699 (https://phabricator.wikimedia.org/T350080) (owner: TrainBranchBot)
[09:26:51] <logmsgbot>	 !log arnaudb@cumin1001 dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53168 and previous config saved to /var/cache/conftool/dbconfig/20231108-092651-arnaudb.json
[09:29:46] <logmsgbot>	 !log jnuche@deploy2002 rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.4  refs T350080
[09:30:24] <jnuche>	 I had to roll back the train
[09:30:25] <icinga-wm>	 RECOVERY - Check systemd state on ganeti3007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:30:31] <jnuche>	 _joe_: I'm done for now
[09:30:46] <_joe_>	 jnuche: thanks
[09:41:56] <logmsgbot>	 !log arnaudb@cumin1001 dbctl commit (dc=all): 'db1236 (re)pooling @ 90%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53169 and previous config saved to /var/cache/conftool/dbconfig/20231108-094156-arnaudb.json
[09:56:48] <wikibugs>	 (PS4) Arnaudb: debug: printing results when return object count > 1 [software/conftool] - https://gerrit.wikimedia.org/r/971437 (https://phabricator.wikimedia.org/T350656)
[09:57:01] <logmsgbot>	 !log arnaudb@cumin1001 dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53170 and previous config saved to /var/cache/conftool/dbconfig/20231108-095701-arnaudb.json
[09:58:34] <wikibugs>	 (PS1) Filippo Giunchedi: hieradata: idp_test entry for thanos OIDC [puppet] - https://gerrit.wikimedia.org/r/972701 (https://phabricator.wikimedia.org/T331512)
[10:00:49] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM!" [alerts] - https://gerrit.wikimedia.org/r/972692 (https://phabricator.wikimedia.org/T350694) (owner: Slyngshede)
[10:05:04] <wikibugs>	 (CR) Volans: debug: printing results when return object count > 1 (1 comment) [software/conftool] - https://gerrit.wikimedia.org/r/971437 (https://phabricator.wikimedia.org/T350656) (owner: Arnaudb)
[10:05:37] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/972696 (https://phabricator.wikimedia.org/T347835) (owner: Ayounsi)
[10:06:35] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host an-worker1111.eqiad.wmnet
[10:07:02] <wikibugs>	 (PS1) Ayounsi: Add support for non EVPN switches on spines [homer/public] - https://gerrit.wikimedia.org/r/972702 (https://phabricator.wikimedia.org/T335028)
[10:08:17] <wikibugs>	 (PS1) Muehlenhoff: Switch an-worker1111 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972703 (https://phabricator.wikimedia.org/T349619)
[10:09:17] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Switch an-worker1111 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972703 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[10:15:44] <wikibugs>	 (PS1) Hnowlan: rest-gateway: correct paths incorrectly specified in spreadsheet [deployment-charts] - https://gerrit.wikimedia.org/r/972704 (https://phabricator.wikimedia.org/T350747)
[10:16:24] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host an-worker1111.eqiad.wmnet
[10:17:41] <wikibugs>	 (PS1) Effie Mouzeli: profile:k8s::deployment_server::mediawiki: switch php catchall [puppet] - https://gerrit.wikimedia.org/r/972705 (https://phabricator.wikimedia.org/T350770)
[10:18:34] <wikibugs>	 (CR) Ayounsi: "Example diffs: https://phabricator.wikimedia.org/P53171" [homer/public] - https://gerrit.wikimedia.org/r/972702 (https://phabricator.wikimedia.org/T335028) (owner: Ayounsi)
[10:21:13] <wikibugs>	 (CR) Santiago Faci: [C: +1] "It looks good! Thank you very much!" [deployment-charts] - https://gerrit.wikimedia.org/r/972704 (https://phabricator.wikimedia.org/T350747) (owner: Hnowlan)
[10:21:28] <wikibugs>	 (CR) Ayounsi: [C: +2] sre.network.peering: Add Auto-Submitted email header [cookbooks] - https://gerrit.wikimedia.org/r/972696 (https://phabricator.wikimedia.org/T347835) (owner: Ayounsi)
[10:21:55] <wikibugs>	 (CR) Sg912: [C: +1] rest-gateway: correct paths incorrectly specified in spreadsheet [deployment-charts] - https://gerrit.wikimedia.org/r/972704 (https://phabricator.wikimedia.org/T350747) (owner: Hnowlan)
[10:22:25] <wikibugs>	 (CR) Hnowlan: [C: +2] rest-gateway: correct paths incorrectly specified in spreadsheet [deployment-charts] - https://gerrit.wikimedia.org/r/972704 (https://phabricator.wikimedia.org/T350747) (owner: Hnowlan)
[10:23:14] <wikibugs>	 (Merged) jenkins-bot: rest-gateway: correct paths incorrectly specified in spreadsheet [deployment-charts] - https://gerrit.wikimedia.org/r/972704 (https://phabricator.wikimedia.org/T350747) (owner: Hnowlan)
[10:24:01] <logmsgbot>	 !log vgutierrez@cumin1001 START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-esams and A:cp
[10:24:26] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: dumps::web::htmldumps
[10:24:33] <logmsgbot>	 !log brouberol@deploy2002 Started deploy [airflow-dags/analytics@af7f4e5]: (no justification provided)
[10:24:57] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/972701 (https://phabricator.wikimedia.org/T331512) (owner: Filippo Giunchedi)
[10:25:04] <logmsgbot>	 !log brouberol@deploy2002 Finished deploy [airflow-dags/analytics@af7f4e5]: (no justification provided) (duration: 00m 31s)
[10:25:19] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] hieradata: idp_test entry for thanos OIDC [puppet] - https://gerrit.wikimedia.org/r/972701 (https://phabricator.wikimedia.org/T331512) (owner: Filippo Giunchedi)
[10:25:53] <wikibugs>	 (Merged) jenkins-bot: sre.network.peering: Add Auto-Submitted email header [cookbooks] - https://gerrit.wikimedia.org/r/972696 (https://phabricator.wikimedia.org/T347835) (owner: Ayounsi)
[10:26:14] <wikibugs>	 (PS1) Muehlenhoff: Switch dumps::web::htmldumps to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972726 (https://phabricator.wikimedia.org/T349619)
[10:26:32] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] "Brown paper bag +1 😊" [puppet] - https://gerrit.wikimedia.org/r/972705 (https://phabricator.wikimedia.org/T350770) (owner: Effie Mouzeli)
[10:29:44] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Switch dumps::web::htmldumps to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972726 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[10:30:53] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.wikireplicas.add-wiki
[10:33:42] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::web::htmldumps
[10:38:27] <wikibugs>	 (CR) Effie Mouzeli: "PCC ok" [puppet] - https://gerrit.wikimedia.org/r/972705 (https://phabricator.wikimedia.org/T350770) (owner: Effie Mouzeli)
[10:38:31] <wikibugs>	 (CR) Effie Mouzeli: [C: +2] profile:k8s::deployment_server::mediawiki: switch php catchall [puppet] - https://gerrit.wikimedia.org/r/972705 (https://phabricator.wikimedia.org/T350770) (owner: Effie Mouzeli)
[10:40:05] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
[10:40:11] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[10:40:16] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
[10:40:17] <wikibugs>	 (CR) Ayounsi: Change 'anycast_gw' var in int config to represent type of IRB needed (2 comments) [software/homer/deploy] - https://gerrit.wikimedia.org/r/971937 (https://phabricator.wikimedia.org/T350579) (owner: Cathal Mooney)
[10:40:41] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] START helmfile.d/services/rest-gateway: apply
[10:40:49] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
[10:43:32] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::hadoop::worker
[10:44:04] <wikibugs>	 (PS8) Hashar: wm-checks-api: add PCC build outcome [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/969981
[10:47:05] <wikibugs>	 (CR) Ayounsi: [C: +1] hieradata: cloudgw: drop nfs-maps [puppet] - https://gerrit.wikimedia.org/r/971401 (https://phabricator.wikimedia.org/T350259) (owner: Majavah)
[10:47:07] <wikibugs>	 (PS1) Muehlenhoff: Switch analytics_cluster::hadoop::worker to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972728 (https://phabricator.wikimedia.org/T349619)
[10:47:12] <wikibugs>	 (CR) Hashar: [C: -1] "With `plugin.registerCustomComponent( 'check-result-expanded', PCCBuildResultElement.is )`  the `<pcc-build-result>` element ends up being" [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/969981 (owner: Hashar)
[10:48:12] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Switch analytics_cluster::hadoop::worker to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/972728 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[10:48:25] <wikibugs>	 (CR) Ayounsi: [C: +1] "Change lgtm, a couple non-blockers comments." [homer/public] - https://gerrit.wikimedia.org/r/970767 (https://phabricator.wikimedia.org/T347030) (owner: Cathal Mooney)
[10:50:31] <wikibugs>	 (PS1) Jcrespo: RemoteExecution: Remove errors from cumin logging from basic tests [software/transferpy] - https://gerrit.wikimedia.org/r/972729 (https://phabricator.wikimedia.org/T330882)
[10:51:31] <wikibugs>	 (PS2) Majavah: hieradata: cloudgw: drop nfs-maps [puppet] - https://gerrit.wikimedia.org/r/971401 (https://phabricator.wikimedia.org/T350259)
[10:52:46] <wikibugs>	 (CR) Majavah: [C: +2] hieradata: cloudgw: drop nfs-maps [puppet] - https://gerrit.wikimedia.org/r/971401 (https://phabricator.wikimedia.org/T350259) (owner: Majavah)
[10:54:11] <wikibugs>	 SRE, SRE-Access-Requests, Structured-Data-Backlog, UploadWizard: Access request to deleted image files in the backup cluster - https://phabricator.wikimedia.org/T350020 (Cparle) 1. Initial access for a one-time study is really all we need for now, and if the data was ready for us to begin work on...
[10:54:39] <wikibugs>	 (CR) Hashar: [C: -1] "From the gr-endpoint-decorator.ts:" [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/969981 (owner: Hashar)
[10:57:09] <wikibugs>	 (CR) Btullis: [C: +1] "Looks good to me, thanks." [software/transferpy] - https://gerrit.wikimedia.org/r/972433 (https://phabricator.wikimedia.org/T284150) (owner: Jcrespo)
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1100)
[11:01:13] <wikibugs>	 (03PS1) 10Jbond: utils/setup_workspace.sh: update setup options [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/972731
[11:01:36] <effie>	 oh cool right on time for MediaWiki infrastructure deploy window 
[11:01:37] <wikibugs>	 (03PS1) 10Majavah: nftables: notify correct service resource [puppet] - 10https://gerrit.wikimedia.org/r/972732
[11:01:49] <effie>	 jouncebot: next
[11:01:49] <jouncebot>	 In 2 hour(s) and 58 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1400)
[11:02:30] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] utils/setup_workspace.sh: update setup options [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/972731 (owner: 10Jbond)
[11:04:49] <logmsgbot>	 !log btullis@cumin1001 Added views for new wiki: fonwiki T347938
[11:04:49] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
[11:04:54] <stashbot>	 T347938: Prepare and check storage layer for fonwiki - https://phabricator.wikimedia.org/T347938
[11:05:12] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[11:05:34] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[11:05:35] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[11:06:43] <wikibugs>	 (03Merged) 10jenkins-bot: utils/setup_workspace.sh: update setup options [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/972731 (owner: 10Jbond)
[11:06:52] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[11:06:53] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/mw-web: apply
[11:07:19] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-web: apply
[11:07:20] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
[11:07:59] <logmsgbot>	 !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-esams and A:cp
[11:08:13] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:08:29] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
[11:08:30] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
[11:09:04] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
[11:09:05] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[11:09:06] <wikibugs>	 (03CR) 10MVernon: "Just noticed this is still sitting in the review queue at +1. Are you planning on merging it, or did you want me or Filippo to?" [puppet] - 10https://gerrit.wikimedia.org/r/945752 (owner: 10Muehlenhoff)
[11:09:30] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
[11:09:31] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
[11:10:19] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
[11:10:20] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
[11:10:50] <wikibugs>	 (03CR) 10Muehlenhoff: thanos: Avoid Ferm-specific syntax (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/945752 (owner: 10Muehlenhoff)
[11:11:12] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
[11:11:13] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/mw-misc: apply
[11:11:15] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
[11:11:16] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-misc: apply
[11:11:19] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
[11:11:20] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
[11:12:04] <logmsgbot>	 !log jiji@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
[11:12:05] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
[11:12:28] <logmsgbot>	 !log jiji@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
[11:15:14] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: analytics_cluster::hadoop::worker
[11:16:45] <wikibugs>	 (03PS1) 10Jbond: Revert "Switch analytics_cluster::hadoop::worker to Puppet 7" [puppet] - 10https://gerrit.wikimedia.org/r/972708
[11:16:52] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] Revert "Switch analytics_cluster::hadoop::worker to Puppet 7" [puppet] - 10https://gerrit.wikimedia.org/r/972708 (owner: 10Jbond)
[11:18:12] <wikibugs>	 (03PS1) 10Majavah: P:openstack: keystone: sync fernet keys over cloud-private [puppet] - 10https://gerrit.wikimedia.org/r/972737
[11:18:14] <wikibugs>	 (03PS1) 10Majavah: P:openstack: trove: use cloud-private for memcached in eqiad1 [puppet] - 10https://gerrit.wikimedia.org/r/972738
[11:18:41] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[11:19:43] <wikibugs>	 (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/352/con" [puppet] - 10https://gerrit.wikimedia.org/r/972738 (owner: 10Majavah)
[11:19:57] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50861 bytes in 0.121 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[11:22:59] <wikibugs>	 (03PS1) 10Ladsgroup: Only take one field in fetchFieldValues [extensions/PageImages] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972709 (https://phabricator.wikimedia.org/T350726)
[11:23:08] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Only take one field in fetchFieldValues [extensions/PageImages] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972709 (https://phabricator.wikimedia.org/T350726) (owner: 10Ladsgroup)
[11:23:16] <Amir1>	 jouncebot: nowandnext
[11:23:16] <jouncebot>	 For the next 0 hour(s) and 36 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1100)
[11:23:16] <jouncebot>	 In 2 hour(s) and 36 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1400)
[11:23:20] <Amir1>	 coolio
[11:23:39] <Amir1>	 jnuche: I'm about to push the fix, wanna roll the train again afterwards?
[11:26:15] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by ladsgroup@deploy2002 using scap backport" [extensions/PageImages] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972709 (https://phabricator.wikimedia.org/T350726) (owner: 10Ladsgroup)
[11:30:14] <hnowlan>	 jayme: just to mention, I've gotten the go-ahead to deploy that wikifeeds change, so your cert manager stuff will be going out with it
[11:30:19] <wikibugs>	 (03PS1) 10Majavah: cr-cloud: drop nfs-maps [homer/public] - 10https://gerrit.wikimedia.org/r/972740 (https://phabricator.wikimedia.org/T350259)
[11:30:36] <jayme>	 hnowlan: cool, thanks!
[11:32:16] <effie>	 !log stopping puppet from mc2038 
[11:32:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:33:22] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
[11:33:56] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
[11:34:10] <wikibugs>	 (03CR) 10MVernon: [C: 03+1] "Hi," [software/transferpy] - 10https://gerrit.wikimedia.org/r/972729 (https://phabricator.wikimedia.org/T330882) (owner: 10Jcrespo)
[11:35:36] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+1] Tranferrer: Enable transfers other than misc, core or x1 sections [software/transferpy] - 10https://gerrit.wikimedia.org/r/972433 (https://phabricator.wikimedia.org/T284150) (owner: 10Jcrespo)
[11:36:29] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] START helmfile.d/services/wikifeeds: apply
[11:36:53] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
[11:38:07] <moritzm>	 !log installing logrotate bugfix updates on Bullseye
[11:38:11] <effie>	 !log restarting memcached on mc2038 
[11:40:58] <wikibugs>	 (03Merged) 10jenkins-bot: Only take one field in fetchFieldValues [extensions/PageImages] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972709 (https://phabricator.wikimedia.org/T350726) (owner: 10Ladsgroup)
[11:42:02] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap: Backport for [[gerrit:972709|Only take one field in fetchFieldValues (T350726)]]
[11:42:06] <stashbot>	 T350726: [PageImages] SelectQueryBuilder::fetchFieldValues expects the query to have only one field - https://phabricator.wikimedia.org/T350726
[11:43:23] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Backport for [[gerrit:972709|Only take one field in fetchFieldValues (T350726)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[11:43:46] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Continuing with sync
[11:43:53] <jnuche>	 Amir1: sry, just came back from lunch
[11:44:01] <jnuche>	 yeah, will roll forward once it's backported
[11:44:04] <Amir1>	 no worries
[11:44:06] <jnuche>	 thanks a lot for the fix
[11:44:40] <Amir1>	 I broke it so sorry for the noise :D
[11:45:34] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.8 point update - https://phabricator.wikimedia.org/T348327 (10MoritzMuehlenhoff)
[11:47:05] <jnuche>	 ouch
[11:47:17] <wikibugs>	 (03PS1) 10Jbond: puppet: move puppet run to files [puppet] - 10https://gerrit.wikimedia.org/r/972742
[11:47:19] <wikibugs>	 (03PS1) 10Jbond: puppet: add check for /run/puppet/disabled [puppet] - 10https://gerrit.wikimedia.org/r/972743
[11:47:21] <wikibugs>	 (03PS1) 10Jbond: puppet: puppet-run fix shellcheck issues [puppet] - 10https://gerrit.wikimedia.org/r/972744
[11:48:57] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/353/con" [puppet] - 10https://gerrit.wikimedia.org/r/972744 (owner: 10Jbond)
[11:49:01] <wikibugs>	 (03PS1) 10Hnowlan: rest-gateway: increase resource limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/972746
[11:49:02] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap: Backport for [[gerrit:972709|Only take one field in fetchFieldValues (T350726)]] (duration: 07m 00s)
[11:49:09] <stashbot>	 T350726: [PageImages] SelectQueryBuilder::fetchFieldValues expects the query to have only one field - https://phabricator.wikimedia.org/T350726
[11:49:16] <Amir1>	 jnuche: deployed ^
[11:50:22] <jnuche>	 ack, rolling train to group1
[11:51:12] <wikibugs>	 (03PS1) 10TrainBranchBot: group1 wikis to 1.42.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972747 (https://phabricator.wikimedia.org/T350080)
[11:51:14] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] group1 wikis to 1.42.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972747 (https://phabricator.wikimedia.org/T350080) (owner: 10TrainBranchBot)
[11:51:57] <wikibugs>	 (03Merged) 10jenkins-bot: group1 wikis to 1.42.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972747 (https://phabricator.wikimedia.org/T350080) (owner: 10TrainBranchBot)
[11:52:43] <wikibugs>	 (03PS2) 10Jbond: puppet: puppet-run fix shellcheck issues [puppet] - 10https://gerrit.wikimedia.org/r/972744
[11:53:12] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[11:54:33] <wikibugs>	 (03CR) 10Peter Fischer: [C: 03+1] cirrus updater: Re-enable the .* route for mwapi [deployment-charts] - 10https://gerrit.wikimedia.org/r/969209 (owner: 10Ebernhardson)
[11:56:55] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/972742 (owner: 10Jbond)
[11:57:39] <wikibugs>	 (03CR) 10Muehlenhoff: puppet: add check for /run/puppet/disabled (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/972743 (owner: 10Jbond)
[11:58:24] <logmsgbot>	 !log jnuche@deploy2002 rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.4  refs T350080
[11:59:46] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] START helmfile.d/services/ipoid: apply
[11:59:55] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "fixed thanks" [puppet] - 10https://gerrit.wikimedia.org/r/972743 (owner: 10Jbond)
[12:00:16] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/972743 (owner: 10Jbond)
[12:00:21] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] DONE helmfile.d/services/ipoid: apply
[12:03:00] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] puppet: move puppet run to files [puppet] - 10https://gerrit.wikimedia.org/r/972742 (owner: 10Jbond)
[12:03:05] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] puppet: add check for /run/puppet/disabled [puppet] - 10https://gerrit.wikimedia.org/r/972743 (owner: 10Jbond)
[12:03:30] <wikibugs>	 (03CR) 10Filippo Giunchedi: prometheus-puppet-agent-stats: this timer sometime fails (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/971946 (owner: 10Jbond)
[12:04:05] <logmsgbot>	 !log jnuche@deploy2002 Synchronized php: group1 wikis to 1.42.0-wmf.4  refs T350080 (duration: 05m 40s)
[12:04:28] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] "This looks good as far as I am concerned." [puppet] - 10https://gerrit.wikimedia.org/r/968668 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[12:05:51] <wikibugs>	 (03PS1) 10Slyngshede: CI: realpath is in /bin on macOS [puppet] - 10https://gerrit.wikimedia.org/r/972749
[12:06:49] <jnuche>	 1.42.0-wmf.4 is in group1, logs look clean
[12:06:54] <jnuche>	 Amir1: thx again :)
[12:07:23] <logmsgbot>	 !log stevemunene@cumin1001 START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
[12:07:28] <Amir1>	 awesome. Sorry for the mess
[12:08:17] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] mariadb - analytics: update the ssl-ca value used by mariadb (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/968666 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[12:09:11] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm thanks" [puppet] - 10https://gerrit.wikimedia.org/r/972732 (owner: 10Majavah)
[12:09:23] <wikibugs>	 (03CR) 10Majavah: [C: 03+2] nftables: notify correct service resource [puppet] - 10https://gerrit.wikimedia.org/r/972732 (owner: 10Majavah)
[12:09:53] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] "In fact, I think I'll be bold and +2 this, then deploy it myself." [puppet] - 10https://gerrit.wikimedia.org/r/968666 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[12:10:22] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] mariadb - wikireplicas: update the ssl-ca value used by mariadb [puppet] - 10https://gerrit.wikimedia.org/r/961829 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[12:11:43] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] mariadb - wikireplicas: update the ssl-ca value used by mariadb (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/961829 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[12:14:06] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] aqs: add .../aqs/deploy/src/ to Environment [puppet] - 10https://gerrit.wikimedia.org/r/972461 (https://phabricator.wikimedia.org/T349228) (owner: 10Eevans)
[12:14:12] <wikibugs>	 (03PS2) 10Jbond: realm.pp: drop $other_site global [puppet] - 10https://gerrit.wikimedia.org/r/971461 (https://phabricator.wikimedia.org/T350008)
[12:14:22] <wikibugs>	 (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/971461 (https://phabricator.wikimedia.org/T350008) (owner: 10Jbond)
[12:14:52] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] varnishkafka: update SSL CA to use shared CA [puppet] - 10https://gerrit.wikimedia.org/r/972369 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[12:15:02] <wikibugs>	 (03PS1) 10Hnowlan: edit-analytics: bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/972812
[12:15:12] <wikibugs>	 (03PS2) 10Jbond: realm.pp: drop namservers global as it is no longer used [puppet] - 10https://gerrit.wikimedia.org/r/971423 (https://phabricator.wikimedia.org/T350008)
[12:16:17] <wikibugs>	 (03CR) 10Jbond: realm.pp: drop namservers global as it is no longer used (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/971423 (https://phabricator.wikimedia.org/T350008) (owner: 10Jbond)
[12:16:22] <wikibugs>	 (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/971423 (https://phabricator.wikimedia.org/T350008) (owner: 10Jbond)
[12:16:26] <wikibugs>	 (03CR) 10Santiago Faci: [C: 03+1] "It looks good! Thanks!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/972812 (owner: 10Hnowlan)
[12:16:39] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] edit-analytics: bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/972812 (owner: 10Hnowlan)
[12:17:37] <wikibugs>	 (03Merged) 10jenkins-bot: edit-analytics: bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/972812 (owner: 10Hnowlan)
[12:18:09] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "Setting ok_codes=[] will tell cumin to consider as successful any exit code of the underlying executed commands, as a result the return cod" [software/transferpy] - 10https://gerrit.wikimedia.org/r/972729 (https://phabricator.wikimedia.org/T330882) (owner: 10Jcrespo)
[12:18:37] <wikibugs>	 (03PS1) 10Arnaudb: haproxy: remove dbproxy1017 from production [puppet] - 10https://gerrit.wikimedia.org/r/972509 (https://phabricator.wikimedia.org/T348956)
[12:20:28] <wikibugs>	 (03PS2) 10Jbond: realm.pp: drop use_puppetdb global [puppet] - 10https://gerrit.wikimedia.org/r/971463 (https://phabricator.wikimedia.org/T350008)
[12:20:30] <wikibugs>	 (03PS2) 10Jbond: realm.pp: remove old comments [puppet] - 10https://gerrit.wikimedia.org/r/971464 (https://phabricator.wikimedia.org/T350008)
[12:20:32] <wikibugs>	 (03PS2) 10Jbond: sanitarium_multiinstance: over private_wiki and private_tables vars to hiera [puppet] - 10https://gerrit.wikimedia.org/r/971468 (https://phabricator.wikimedia.org/T350008)
[12:20:34] <wikibugs>	 (03PS2) 10Jbond: realm.pp: drop wikimail_smarthost global [puppet] - 10https://gerrit.wikimedia.org/r/971469 (https://phabricator.wikimedia.org/T350008)
[12:20:36] <wikibugs>	 (03PS2) 10Jbond: airflow: convert to pull mail_smarthosts from hiera [puppet] - 10https://gerrit.wikimedia.org/r/971471 (https://phabricator.wikimedia.org/T350008)
[12:20:38] <wikibugs>	 (03PS2) 10Jbond: realm: drop mail_smarthost global [puppet] - 10https://gerrit.wikimedia.org/r/971472 (https://phabricator.wikimedia.org/T350008)
[12:20:40] <wikibugs>	 (03PS2) 10Jbond: realm.pp: drop ntp_peers [puppet] - 10https://gerrit.wikimedia.org/r/971476 (https://phabricator.wikimedia.org/T350008)
[12:20:46] <wikibugs>	 (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/971463 (https://phabricator.wikimedia.org/T350008) (owner: 10Jbond)
[12:21:15] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] sanitarium_multiinstance: over private_wiki and private_tables vars to hiera [puppet] - 10https://gerrit.wikimedia.org/r/971468 (https://phabricator.wikimedia.org/T350008) (owner: 10Jbond)
[12:21:17] <wikibugs>	 (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/971468 (https://phabricator.wikimedia.org/T350008) (owner: 10Jbond)
[12:21:36] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [staging] START helmfile.d/services/edit-analytics: apply
[12:21:50] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
[12:22:28] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
[12:22:45] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
[12:22:49] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] START helmfile.d/services/edit-analytics: apply
[12:23:06] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
[12:24:15] <wikibugs>	 (03PS3) 10Jbond: realm.pp: drop use_puppetdb global [puppet] - 10https://gerrit.wikimedia.org/r/971463 (https://phabricator.wikimedia.org/T350008)
[12:24:15] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::hadoop::worker
[12:24:17] <wikibugs>	 (03PS3) 10Jbond: realm.pp: remove old comments [puppet] - 10https://gerrit.wikimedia.org/r/971464 (https://phabricator.wikimedia.org/T350008)
[12:24:19] <wikibugs>	 (03PS3) 10Jbond: sanitarium_multiinstance: over private_wiki and private_tables vars to hiera [puppet] - 10https://gerrit.wikimedia.org/r/971468 (https://phabricator.wikimedia.org/T350008)
[12:24:21] <wikibugs>	 (03PS3) 10Jbond: realm.pp: drop wikimail_smarthost global [puppet] - 10https://gerrit.wikimedia.org/r/971469 (https://phabricator.wikimedia.org/T350008)
[12:24:23] <wikibugs>	 (03PS3) 10Jbond: airflow: convert to pull mail_smarthosts from hiera [puppet] - 10https://gerrit.wikimedia.org/r/971471 (https://phabricator.wikimedia.org/T350008)
[12:24:25] <wikibugs>	 (03PS3) 10Jbond: realm: drop mail_smarthost global [puppet] - 10https://gerrit.wikimedia.org/r/971472 (https://phabricator.wikimedia.org/T350008)
[12:24:27] <wikibugs>	 (03PS3) 10Jbond: realm.pp: drop ntp_peers [puppet] - 10https://gerrit.wikimedia.org/r/971476 (https://phabricator.wikimedia.org/T350008)
[12:24:33] <wikibugs>	 (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/971468 (https://phabricator.wikimedia.org/T350008) (owner: 10Jbond)
[12:25:17] <wikibugs>	 (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/971471 (https://phabricator.wikimedia.org/T350008) (owner: 10Jbond)
[12:25:38] <wikibugs>	 (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/971476 (https://phabricator.wikimedia.org/T350008) (owner: 10Jbond)
[12:26:42] <wikibugs>	 (03CR) 10Jcrespo: RemoteExecution: Remove errors from cumin logging from basic tests (031 comment) [software/transferpy] - 10https://gerrit.wikimedia.org/r/972729 (https://phabricator.wikimedia.org/T330882) (owner: 10Jcrespo)
[12:28:21] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch analytics_cluster::hadoop::worker to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972815 (https://phabricator.wikimedia.org/T349619)
[12:30:20] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch analytics_cluster::hadoop::worker to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972815 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[12:32:23] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/972749 (owner: 10Slyngshede)
[12:33:35] <wikibugs>	 (03CR) 10Jbond: puppet.puppet.get_puppet_ca_hostname: return hardcoded start (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/971957 (https://phabricator.wikimedia.org/T349619) (owner: 10Jbond)
[12:33:59] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.8 point update - https://phabricator.wikimedia.org/T348327 (10MoritzMuehlenhoff)
[12:34:03] <wikibugs>	 (03CR) 10Jbond: puppet.puppet.get_puppet_ca_hostname: return hardcoded start (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/971957 (https://phabricator.wikimedia.org/T349619) (owner: 10Jbond)
[12:35:17] <wikibugs>	 (03CR) 10Majavah: puppet.puppet.get_puppet_ca_hostname: return hardcoded start (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/971957 (https://phabricator.wikimedia.org/T349619) (owner: 10Jbond)
[12:35:36] <wikibugs>	 (03CR) 10Jbond: puppet.puppet.get_puppet_ca_hostname: return hardcoded start (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/971957 (https://phabricator.wikimedia.org/T349619) (owner: 10Jbond)
[12:37:44] <Kizule>	 Hi, I'm trying to unprotect pages on Serbian Wikipedia.
[12:38:19] <Kizule>	 There is one page I can't unprotect: it exists and isn't actually protected, but when I follow a link given by a user, it shows as protected.
[12:38:32] <Kizule>	 When I click on option to open article, it opens article.
[12:38:45] <Kizule>	 Can someone remove row from protected_titles?
[12:38:52] <Kizule>	 SELECT * FROM protected_titles WHERE pt_title='Феjшл'; on Serbian Wikipedia shows this:
[12:38:59] <Kizule>	 +--------------+-----------+---------+--------------+----------------+-----------+----------------+
[12:39:00] <Kizule>	 | pt_namespace | pt_title  | pt_user | pt_reason_id | pt_timestamp   | pt_expiry | pt_create_perm |
[12:39:00] <Kizule>	 +--------------+-----------+---------+--------------+----------------+-----------+----------------+
[12:39:01] <Kizule>	 |            0 | Феjшл     |     133 |         2383 | 20111215021350 | infinity  | sysop          |
[12:39:01] <Kizule>	 +--------------+-----------+---------+--------------+----------------+-----------+----------------+
[12:40:07] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] Make serve:plugins emit a 404 for missing files [software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/972460 (owner: 10Hashar)
[12:40:39] <wikibugs>	 (03Merged) 10jenkins-bot: Make serve:plugins emit a 404 for missing files [software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/972460 (owner: 10Hashar)
[12:40:43] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] Remap serving plugins under /r/plugins/ [software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/972396 (owner: 10Hashar)
[12:41:13] <wikibugs>	 (03Merged) 10jenkins-bot: Remap serving plugins under /r/plugins/ [software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/972396 (owner: 10Hashar)
[12:41:41] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] prometheus: update ssl CA to use shared CA [puppet] - 10https://gerrit.wikimedia.org/r/972368 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[12:42:45] <Kizule>	 https://sr.wikipedia.org/w/index.php?action=edit&preload=&editintro=&title=%D0%A4%D0%B5j%D1%88%D0%BB%E2%80%8F%E2%80%8E&create=%D0%9D%D0%B0%D0%BF%D1%80%D0%B0%D0%B2%D0%B8+%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D1%83 shows me that I can change protection. When I click on that, it actually loads the existing page, which doesn't have protection.
[12:44:31] <Kizule>	 I think that there is some bug with encoding, so you can run DELETE FROM protected_titles WHERE pt_timestamp='20111215021350';
[12:44:58] <Kizule>	 There is only one page, so it won't cause chaos with replication.
[12:44:59] <Kizule>	 MariaDB [srwiki_p]> SELECT * FROM protected_titles WHERE pt_timestamp='20111215021350';
[12:45:00] <Kizule>	 +--------------+-----------+---------+--------------+----------------+-----------+----------------+
[12:45:00] <Kizule>	 | pt_namespace | pt_title  | pt_user | pt_reason_id | pt_timestamp   | pt_expiry | pt_create_perm |
[12:45:01] <Kizule>	 +--------------+-----------+---------+--------------+----------------+-----------+----------------+
[12:45:01] <Kizule>	 |            0 | Феjшл     |     133 |         2383 | 20111215021350 | infinity  | sysop          |
[12:45:02] <Kizule>	 +--------------+-----------+---------+--------------+----------------+-----------+----------------+
[12:45:02] <Kizule>	 1 row in set (0.002 sec)
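A minimal sketch for checking the suspected encoding problem, assuming only the Python 3 standard library. It percent-decodes the `title=` value from the URL pasted at 12:42:45 and lists any invisible format characters in it. The helper name `invisible_chars` is hypothetical, and whether these characters are what makes the row unreachable through the normal unprotect form is an assumption, not something confirmed in the channel.

```python
import unicodedata
from urllib.parse import unquote

# Value of the title= parameter, copied from the URL posted in the channel.
ENCODED_TITLE = "%D0%A4%D0%B5j%D1%88%D0%BB%E2%80%8F%E2%80%8E"


def invisible_chars(text):
    """Return (index, code point, Unicode name) for every format-category (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# unquote() decodes the percent-escapes as UTF-8 by default.
decoded = unquote(ENCODED_TITLE)

# Print each invisible character with its position in the decoded title.
for index, codepoint, name in invisible_chars(decoded):
    print(index, codepoint, name)
```

The decoded title ends with U+200F (RIGHT-TO-LEFT MARK) followed by U+200E (LEFT-TO-RIGHT MARK), so the string in that link is two code points longer than the five visible characters. That is consistent with the row being easier to match by pt_timestamp than by typing the title.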
[12:45:32] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] "deployed and tested by running" [puppet] - 10https://gerrit.wikimedia.org/r/972368 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[12:45:56] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] varnishkafka: update SSL CA to use shared CA [puppet] - 10https://gerrit.wikimedia.org/r/972369 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[12:50:57] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] cr-cloud: drop nfs-maps [homer/public] - 10https://gerrit.wikimedia.org/r/972740 (https://phabricator.wikimedia.org/T350259) (owner: 10Majavah)
[12:51:32] <wikibugs>	 (03CR) 10Majavah: [C: 03+2] cr-cloud: drop nfs-maps [homer/public] - 10https://gerrit.wikimedia.org/r/972740 (https://phabricator.wikimedia.org/T350259) (owner: 10Majavah)
[12:52:19] <wikibugs>	 (03Merged) 10jenkins-bot: cr-cloud: drop nfs-maps [homer/public] - 10https://gerrit.wikimedia.org/r/972740 (https://phabricator.wikimedia.org/T350259) (owner: 10Majavah)
[12:53:18] <Kizule>	 pt_timestamp 20150912171151 is bugged as well, needs removal via DB as well.
[12:54:04] <Kizule>	 And 20190908114554
[12:54:09] <wikibugs>	 (03CR) 10Slyngshede: [C: 03+2] CI: realpath is in /bin on macOS [puppet] - 10https://gerrit.wikimedia.org/r/972749 (owner: 10Slyngshede)
[12:55:57] <wikibugs>	 (03PS3) 10Jbond: prometheus-puppet-agent-stats: this timer sometime fails [puppet] - 10https://gerrit.wikimedia.org/r/971946
[12:56:15] <wikibugs>	 (03CR) 10Jbond: prometheus-puppet-agent-stats: this timer sometime fails (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/971946 (owner: 10Jbond)
[12:58:07] <wikibugs>	 (03PS1) 10Stevemunene: Revert "Revert "airflow-wmde: configure wmde airflow instance"" [puppet] - 10https://gerrit.wikimedia.org/r/972712
[13:00:09] <wikibugs>	 (03CR) 10Majavah: [C: 03+2] users: add network device access for taavi [homer/public] - 10https://gerrit.wikimedia.org/r/970850 (https://phabricator.wikimedia.org/T350267) (owner: 10Majavah)
[13:00:20] <jnuche>	 hi, I'm going to roll back the train again due to https://phabricator.wikimedia.org/T350777
[13:00:28] <jnuche>	 will do that in the next few minutes
[13:00:46] <wikibugs>	 (03Merged) 10jenkins-bot: users: add network device access for taavi [homer/public] - 10https://gerrit.wikimedia.org/r/970850 (https://phabricator.wikimedia.org/T350267) (owner: 10Majavah)
[13:03:36] <wikibugs>	 (03PS1) 10TrainBranchBot: group0 wikis to 1.42.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972816 (https://phabricator.wikimedia.org/T350080)
[13:03:38] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] group0 wikis to 1.42.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972816 (https://phabricator.wikimedia.org/T350080) (owner: 10TrainBranchBot)
[13:04:22] <wikibugs>	 (03Merged) 10jenkins-bot: group0 wikis to 1.42.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972816 (https://phabricator.wikimedia.org/T350080) (owner: 10TrainBranchBot)
[13:04:42] <logmsgbot>	 !log jbond@cumin1001 START - Cookbook sre.puppet.migrate-role for role: openldap::replica
[13:05:13] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting shell access to production to run maintenance scripts and inspect production MediaWiki tables for Nik Gkountas - https://phabricator.wikimedia.org/T350779 (10ngkountas)
[13:06:15] <wikibugs>	 (03PS1) 10Jbond: openldap::replica: migrate to puppet7 [puppet] - 10https://gerrit.wikimedia.org/r/972820 (https://phabricator.wikimedia.org/T349619)
[13:06:41] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] openldap::replica: migrate to puppet7 [puppet] - 10https://gerrit.wikimedia.org/r/972820 (https://phabricator.wikimedia.org/T349619) (owner: 10Jbond)
[13:08:57] <logmsgbot>	 !log stevemunene@cumin1001 END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid public cluster: Roll restart of Druid jvm daemons.
[13:10:39] <logmsgbot>	 !log taavi@cumin1001 START - Cookbook sre.dns.netbox
[13:10:50] <logmsgbot>	 !log jnuche@deploy2002 rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.4  refs T350080
[13:10:58] <stashbot>	 T350080: 1.42.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T350080
[13:12:39] <logmsgbot>	 !log taavi@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: free up nfs-maps IPs T350259 - taavi@cumin1001"
[13:12:43] <stashbot>	 T350259: Check if nfs-maps.wikimedia.org is still in use - https://phabricator.wikimedia.org/T350259
[13:13:34] <Kizule>	 Disregard my previous messages, I've created task: https://phabricator.wikimedia.org/T350780
[13:13:57] <icinga-wm>	 PROBLEM - Check systemd state on ganeti6001 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:14:04] <taavi>	 mutante: the sync-netbox-hiera cookbook is showing me a diff adding stewards1001, ok to merge that?
[13:14:52] <logmsgbot>	 !log taavi@cumin1001 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: free up nfs-maps IPs T350259 - taavi@cumin1001"
[13:14:52] <logmsgbot>	 !log taavi@cumin1001 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[13:15:10] <taavi>	 nevermind, I'll leave it for you to run the cookbook and merge
[13:15:18] <taavi>	 since I'm only doing a DNS change anyway
[13:19:07] <logmsgbot>	 !log jbond@cumin1001 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: openldap::replica
[13:19:09] <wikibugs>	 (03PS1) 10Hnowlan: trafficserver: revert to aqs1 for editor and pageview metrics [puppet] - 10https://gerrit.wikimedia.org/r/972822 (https://phabricator.wikimedia.org/T350708)
[13:20:05] <wikibugs>	 (03PS3) 10Jbond: puppet: add hiera_lookup function [software/spicerack] - 10https://gerrit.wikimedia.org/r/972459
[13:21:35] <wikibugs>	 (03PS2) 10Arnaudb: mariadb: add db1238 and prepare db1138 retirement [puppet] - 10https://gerrit.wikimedia.org/r/972507 (https://phabricator.wikimedia.org/T344036)
[13:25:46] <wikibugs>	 (03PS1) 10Btullis: Switch datahub to use the new an-mariadb servers instead of an-coord [deployment-charts] - 10https://gerrit.wikimedia.org/r/972823 (https://phabricator.wikimedia.org/T284150)
[13:31:57] <wikibugs>	 (03CR) 10Jcrespo: RemoteExecution: Remove errors from cumin logging from basic tests (031 comment) [software/transferpy] - 10https://gerrit.wikimedia.org/r/972729 (https://phabricator.wikimedia.org/T330882) (owner: 10Jcrespo)
[13:34:12] <moritzm>	 !log installing libxpm security updates
[13:34:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:45] <wikibugs>	 (03CR) 10Jbond: "ready for review" [software/spicerack] - 10https://gerrit.wikimedia.org/r/972459 (owner: 10Jbond)
[13:49:29] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::hadoop::worker
[13:51:16] <wikibugs>	 (03PS1) 10JMeybohm: Increase CPU limit quota of eventgate-analytics by 10 cpus [deployment-charts] - 10https://gerrit.wikimedia.org/r/972825 (https://phabricator.wikimedia.org/T350707)
[13:51:17] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[13:55:17] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
[13:55:40] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] Increase CPU limit quota of eventgate-analytics by 10 cpus [deployment-charts] - 10https://gerrit.wikimedia.org/r/972825 (https://phabricator.wikimedia.org/T350707) (owner: 10JMeybohm)
[13:55:46] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: kafka::test::broker
[13:57:26] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch kafka::test::broker to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972826 (https://phabricator.wikimedia.org/T349619)
[13:59:12] <_joe_>	 jouncebot: nowandnext
[13:59:12] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 0 minute(s)
[13:59:12] <jouncebot>	 In 0 hour(s) and 0 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1400)
[13:59:19] <_joe_>	 lol
[13:59:35] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
[13:59:53] <_joe_>	 actually, it would be great to get the replicas bump out with a deployment
[13:59:59] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
[14:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, TheresNoTime, and taavi: Time to snap out of that daydream and deploy UTC afternoon backport window. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1400).
[14:00:05] <jouncebot>	 Daimona, Kizule, and Superpes: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:14] <Daimona>	 o/
[14:00:26] <Kizule>	 \o
[14:00:34] <wikibugs>	 (03PS1) 10Effie Mouzeli: ipoid: staging cron fix [deployment-charts] - 10https://gerrit.wikimedia.org/r/972827
[14:01:03] <Kizule>	 Daimona: I think that your backported patch for wmf.4 needs to be cherry-picked into wmf.3 as well, since wmf.4 was reverted to wmf.3.
[14:01:05] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch kafka::test::broker to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972826 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[14:01:33] <Daimona>	 Kizule: ah, yes, you're right! Thanks!
[14:01:43] <_joe_>	 who's deploying today?
[14:01:47] <taavi>	 I can I guess
[14:01:58] <Daimona>	 I completely forgot about wmf.3 still being around
[14:02:13] <Kizule>	 No problem, always happy to help. :)
[14:02:14] <_joe_>	 taavi: ok, can I do a small resource bump for mw on k8s before you deploy the first patch?
[14:02:19] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] ipoid: staging cron fix [deployment-charts] - 10https://gerrit.wikimedia.org/r/972827 (owner: 10Effie Mouzeli)
[14:02:29] <_joe_>	 taavi: that will imply a small risk of failing deployment
[14:02:34] <taavi>	 _joe_: sure! lmk when I can start deploying
[14:02:36] <wikibugs>	 (03PS1) 10Daimona Eaytoy: Remove feature flag for email [extensions/CampaignEvents] (wmf/1.42.0-wmf.3) - 10https://gerrit.wikimedia.org/r/972713 (https://phabricator.wikimedia.org/T347067)
[14:02:37] <_joe_>	 but we can just safely roll it back
[14:02:38] <wikibugs>	 (03CR) 10FNegri: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/972738 (owner: 10Majavah)
[14:02:44] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] mw-api-ext, mw-web: Raise replicas 50% [deployment-charts] - 10https://gerrit.wikimedia.org/r/964457 (https://phabricator.wikimedia.org/T348122) (owner: 10Clément Goubert)
[14:02:55] <_joe_>	 taavi: as soon as ^^ is merged I guess
[14:03:00] <taavi>	 ack
[14:03:08] <wikibugs>	 (03Merged) 10jenkins-bot: ipoid: staging cron fix [deployment-charts] - 10https://gerrit.wikimedia.org/r/972827 (owner: 10Effie Mouzeli)
[14:03:47] <wikibugs>	 (03Merged) 10jenkins-bot: mw-api-ext, mw-web: Raise replicas 50% [deployment-charts] - 10https://gerrit.wikimedia.org/r/964457 (https://phabricator.wikimedia.org/T348122) (owner: 10Clément Goubert)
[14:03:57] <taavi>	 Kizule: I'm not executing random SQL commands in production on rows where I don't know where they came from, please get a proper maintenance script merged into master or get someone who knows about why those rows are there and why they need to be removed to execute the SQL by themselves
[14:03:58] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] START helmfile.d/services/ipoid: apply
[14:04:02] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] DONE helmfile.d/services/ipoid: apply
[14:04:39] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] START helmfile.d/services/ipoid: apply
[14:04:50] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] DONE helmfile.d/services/ipoid: apply
[14:05:00] <taavi>	 Daimona: why does your patch need a backport?
[14:05:23] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by taavi@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972428 (https://phabricator.wikimedia.org/T347067) (owner: 10Daimona Eaytoy)
[14:05:43] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to `discovery.processed_external_sparql_query` for AndrewTavis_WMDE - https://phabricator.wikimedia.org/T350426 (10Ottomata) > So is the way forward to add @AndrewTavis_WMDE to the analytics-search-users group?  This would accomplish the goal of the task, but is...
[14:05:44] <Daimona>	 Because the feature flag removal didn't make it in time for wmf.4
[14:05:46] <Kizule>	 taavi: "Random SQL command" is the same as in https://gerrit.wikimedia.org/g/mediawiki/core/+/e009f9acb556a56340b83eeeb05d3eedc9131bda/includes/Permissions/RestrictionStore.php#200
[14:06:01] <wikibugs>	 (03PS2) 10JMeybohm: Increase CPU limit quota of eventgate-analytics by 10 cpus [deployment-charts] - 10https://gerrit.wikimedia.org/r/972825 (https://phabricator.wikimedia.org/T350707)
[14:06:10] <Kizule>	 https://gerrit.wikimedia.org/g/mediawiki/core/+/e009f9acb556a56340b83eeeb05d3eedc9131bda/includes/page/WikiPage.php#2303
[14:06:12] <wikibugs>	 (03Merged) 10jenkins-bot: beta: Stop setting $wgCampaignEventsEnableEmail, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972428 (https://phabricator.wikimedia.org/T347067) (owner: 10Daimona Eaytoy)
[14:06:14] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kafka::test::broker
[14:06:19] <Daimona>	 I backported it so that the flag can be unset in the prod config too, without behaviour changes
[14:06:27] <Kizule>	 https://gerrit.wikimedia.org/g/mediawiki/core/+/e009f9acb556a56340b83eeeb05d3eedc9131bda/includes/title/Title.php#2459
[14:06:38] <Kizule>	 If Amir1 is around, that would be helpful to me. :)
[14:07:06] <wikibugs>	 (03PS2) 10Majavah: prod: Stop setting $wgCampaignEventsEnableEmail, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972429 (https://phabricator.wikimedia.org/T347067) (owner: 10Daimona Eaytoy)
[14:07:07] <Amir1>	 I'm around, is it urgent?
[14:07:21] <Kizule>	 Not really urgent, but it would be worth taking a look.
[14:07:32] <wikibugs>	 (03CR) 10FNegri: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/972737 (owner: 10Majavah)
[14:07:34] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by taavi@deploy2002 using scap backport" [extensions/CampaignEvents] (wmf/1.42.0-wmf.3) - 10https://gerrit.wikimedia.org/r/972713 (https://phabricator.wikimedia.org/T347067) (owner: 10Daimona Eaytoy)
[14:07:36] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by taavi@deploy2002 using scap backport" [extensions/CampaignEvents] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972260 (https://phabricator.wikimedia.org/T347067) (owner: 10Daimona Eaytoy)
[14:07:38] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by taavi@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972429 (https://phabricator.wikimedia.org/T347067) (owner: 10Daimona Eaytoy)
[14:07:38] <taavi>	 Daimona: I don't really see the point of that as compared to removing the flag in a week, but whatever :P
[14:08:20] <Kizule>	 Here is the task so you don't have to search for it. https://phabricator.wikimedia.org/T350780
[14:08:30] <wikibugs>	 (03Merged) 10jenkins-bot: prod: Stop setting $wgCampaignEventsEnableEmail, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972429 (https://phabricator.wikimedia.org/T347067) (owner: 10Daimona Eaytoy)
[14:08:37] <Daimona>	 I just wanted to get rid of that task ASAP :D
[14:09:15] <wikibugs>	 (03PS3) 10JMeybohm: Increase CPU limit quota of eventgate-analytics by 10 cpus [deployment-charts] - 10https://gerrit.wikimedia.org/r/972825 (https://phabricator.wikimedia.org/T350707)
[14:09:39] <Daimona>	 (Also, I was under the impression that we missed the train by just a few hours, but then I didn't think about wmf.3...)
[14:10:24] <wikibugs>	 (03Merged) 10jenkins-bot: Remove feature flag for email [extensions/CampaignEvents] (wmf/1.42.0-wmf.3) - 10https://gerrit.wikimedia.org/r/972713 (https://phabricator.wikimedia.org/T347067) (owner: 10Daimona Eaytoy)
[14:10:30] <wikibugs>	 (03Merged) 10jenkins-bot: Remove feature flag for email [extensions/CampaignEvents] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972260 (https://phabricator.wikimedia.org/T347067) (owner: 10Daimona Eaytoy)
[14:10:41] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[14:10:42] <logmsgbot>	 !log fnegri@cumin1001 START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bookworm
[14:10:57] <logmsgbot>	 !log taavi@deploy2002 Started scap: Backport for [[gerrit:972713|Remove feature flag for email (T347067)]], [[gerrit:972260|Remove feature flag for email (T347067)]], [[gerrit:972429|prod: Stop setting $wgCampaignEventsEnableEmail, unused (T347067)]]
[14:11:00] <stashbot>	 T347067: Remove feature flag for email participants - https://phabricator.wikimedia.org/T347067
[14:11:02] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] mariadb: update to use shared SSL CA [puppet] - 10https://gerrit.wikimedia.org/r/972363 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[14:11:09] <marostegui>	 jouncebot: next
[14:11:09] <jouncebot>	 In 0 hour(s) and 48 minute(s): Wikifunction Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1500)
[14:11:11] <_joe_>	 frankly I agree that executing SQL by hand in production if it's not reviewed is a bad policy, *especially if done repeatedly*
[14:11:22] <taavi>	 marostegui: hi, it's a backport window, I'm deploying
[14:11:41] <wikibugs>	 (03CR) 10Brouberol: [C: 03+2] Renew skein certificate every month via systemd timers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/971196 (https://phabricator.wikimedia.org/T329398) (owner: 10Brouberol)
[14:12:02] <marostegui>	 taavi: Yeah I am aware
[14:12:08] <marostegui>	 taavi: ETA?
[14:12:11] <Kizule>	 _joe_: That's understandable, and that's why I just provided resources to make the review easier and get the task done.
[14:12:11] <icinga-wm>	 RECOVERY - Check systemd state on ganeti6001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:12:22] <logmsgbot>	 !log taavi@deploy2002 taavi and daimona: Backport for [[gerrit:972713|Remove feature flag for email (T347067)]], [[gerrit:972260|Remove feature flag for email (T347067)]], [[gerrit:972429|prod: Stop setting $wgCampaignEventsEnableEmail, unused (T347067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:12:27] <taavi>	 Daimona: please test
[14:12:46] <Daimona>	 Yup, testing now
[14:12:47] <taavi>	 marostegui: a very rough guess would be about half an hour or less
[14:13:22] <taavi>	 I'll ping you when I'm done, ok?
[14:14:11] <marostegui>	 cheers
[14:14:31] <icinga-wm>	 PROBLEM - BGP status on cloudsw1-c8-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64605/IPv4: Connect - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:15:15] <icinga-wm>	 PROBLEM - BFD status on cloudsw1-c8-eqiad.mgmt is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[14:15:26] <_joe_>	 marostegui: we might be slowed down by me though :)
[14:15:37] <marostegui>	 that's ok
[14:17:20] <Daimona>	 taavi: LGTM!
[14:17:38] <taavi>	 Daimona: thanks, syncing
[14:17:43] <logmsgbot>	 !log taavi@deploy2002 taavi and daimona: Continuing with sync
[14:18:14] * _joe_ crossing fingers
[14:18:20] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "reply inline, I'll comment directly on the task" [software/transferpy] - 10https://gerrit.wikimedia.org/r/972729 (https://phabricator.wikimedia.org/T330882) (owner: 10Jcrespo)
[14:18:59] <taavi>	 it's starting sync-prod-k8s, let's see what happens
[14:19:36] <taavi>	 mw-web finished
[14:19:42] <taavi>	 so did mw-api-ext
[14:19:54] <taavi>	 _joe_: everything ok on your side?
[14:20:19] <_joe_>	 taavi: yes
[14:20:19] <taavi>	 Superpes: hi, around? yours are up next
[14:20:30] <Superpes>	 Yep I'm here taavi :D
[14:21:13] <Superpes>	 Ah, sorry, forgot to ping earlier lol
[14:22:00] <wikibugs>	 (03PS4) 10JMeybohm: Increase CPU limit quota of eventgate-analytics by 10 cpus [deployment-charts] - 10https://gerrit.wikimedia.org/r/972825 (https://phabricator.wikimedia.org/T350707)
[14:22:02] <taavi>	 Superpes: I'm a bit confused by https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/971518/, https://www.mediawiki.org/wiki/Extension:AbuseFilter shows -log-private being assigned to sysops by default
[14:22:24] <Superpes>	 Uhm Looking
[14:23:08] <Superpes>	 I also thought it was the default but didn't check (I trusted the request)
[14:23:16] <logmsgbot>	 !log taavi@deploy2002 Finished scap: Backport for [[gerrit:972713|Remove feature flag for email (T347067)]], [[gerrit:972260|Remove feature flag for email (T347067)]], [[gerrit:972429|prod: Stop setting $wgCampaignEventsEnableEmail, unused (T347067)]] (duration: 12m 19s)
[14:23:20] <stashbot>	 T347067: Remove feature flag for email participants - https://phabricator.wikimedia.org/T347067
[14:23:25] <taavi>	 Daimona: yours is live
[14:23:31] <taavi>	 although https://pl.wikipedia.org/wiki/Specjalna:Grupy_u%C5%BCytkownik%C3%B3w doesn't show it. odd
[14:23:45] <Kizule>	 taavi, Superpes: It's because it's set to false in wmf-config/abusefilter.php.
[14:24:12] <Kizule>	 Revert changes in core-Permissions.php and apply them in wmf-config/abusefilter.php.
[14:24:24] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting shell access to production to run maintenance scripts and inspect production MediaWiki tables for Nik Gkountas - https://phabricator.wikimedia.org/T350779 (10Nikerabbit) I approve. Let us know if further details are needed.
[14:24:41] <wikibugs>	 (03PS1) 10Marostegui: ProductionServices.php: Promote pc2014 to pc2 master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972831
[14:24:45] <Superpes>	 Uhm looking if there's any particular reason 
[14:25:00] <taavi>	 aha, seems to be from https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/7e95f6d842994e82708a127b7c9ab6c08964462f%5E%21/wmf-config/abusefilter.php
[14:25:16] <wikibugs>	 (03PS46) 10Bking: rdf-streaming-updater: update values for application mode [deployment-charts] - 10https://gerrit.wikimedia.org/r/967229 (https://phabricator.wikimedia.org/T349095)
[14:25:31] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[14:25:37] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Increase CPU limit quota of eventgate-analytics by 10 cpus [deployment-charts] - 10https://gerrit.wikimedia.org/r/972825 (https://phabricator.wikimedia.org/T350707) (owner: 10JMeybohm)
[14:25:47] <taavi>	 so you should just drop the current `= false` line in abusefilter.php and revert the changes from core-Permissions.php
[14:25:55] <Superpes>	 Yep doing
[14:25:57] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[14:26:00] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[14:26:08] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[14:26:22] <Superpes>	 Or is it maybe better to create another task?
[14:26:24] <Daimona>	 taavi: Thanks :)
[14:26:30] <Superpes>	 Ciao Daimona :P
[14:26:37] <wikibugs>	 (03CR) 10Marostegui: "Remember this needs to be removed from zarcillo too" [puppet] - 10https://gerrit.wikimedia.org/r/972509 (https://phabricator.wikimedia.org/T348956) (owner: 10Arnaudb)
[14:26:43] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] haproxy: remove dbproxy1017 from production [puppet] - 10https://gerrit.wikimedia.org/r/972509 (https://phabricator.wikimedia.org/T348956) (owner: 10Arnaudb)
[14:26:49] <Daimona>	 :D
[14:27:05] <logmsgbot>	 !log fnegri@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
[14:27:59] <wikibugs>	 (03Merged) 10jenkins-bot: Increase CPU limit quota of eventgate-analytics by 10 cpus [deployment-charts] - 10https://gerrit.wikimedia.org/r/972825 (https://phabricator.wikimedia.org/T350707) (owner: 10JMeybohm)
[14:28:05] <taavi>	 what would you need another task for? this is doing the exact thing they're asking for, just in a different way than what you initially went for
[14:28:16] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[14:28:18] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[14:28:37] <Superpes>	 Ahhh yep I thought you were asking to change settings for all wikis! Doing immediately :D
[14:28:46] <wikibugs>	 (03CR) 10Herron: [V: 03+1 C: 03+2] logstash: increase heap to 4g [puppet] - 10https://gerrit.wikimedia.org/r/972456 (https://phabricator.wikimedia.org/T350434) (owner: 10Herron)
[14:28:49] <wikibugs>	 (03PS1) 10Slyngshede: Improve installation and setup procedures for running locally. [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/972834
[14:30:28] <wikibugs>	 (03PS4) 10Superpes15: [plwiki] Add 'abusefilter-log-private' flag  to sysops [mediawiki-config] - 10https://gerrit.wikimedia.org/r/971518 (https://phabricator.wikimedia.org/T350509)
[14:30:53] <logmsgbot>	 !log jayme@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[14:30:55] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] trafficserver: move 15% of traffic to mw on k8s [puppet] - 10https://gerrit.wikimedia.org/r/964447 (https://phabricator.wikimedia.org/T348122) (owner: 10Clément Goubert)
[14:30:57] <wikibugs>	 (03PS2) 10Slyngshede: Improve installation and setup procedures for running locally. [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/972834
[14:31:06] <wikibugs>	 (03CR) 10Fabfur: [C: 03+1] "ok" [puppet] - 10https://gerrit.wikimedia.org/r/972822 (https://phabricator.wikimedia.org/T350708) (owner: 10Hnowlan)
[14:31:08] <wikibugs>	 (03PS5) 10Majavah: [plwiki] Add 'abusefilter-log-private' flag  to sysops [mediawiki-config] - 10https://gerrit.wikimedia.org/r/971518 (https://phabricator.wikimedia.org/T350509) (owner: 10Superpes15)
[14:31:24] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by taavi@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/971517 (https://phabricator.wikimedia.org/T350482) (owner: 10Superpes15)
[14:31:25] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:31:26] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by taavi@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/971518 (https://phabricator.wikimedia.org/T350509) (owner: 10Superpes15)
[14:31:31] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:31:58] <logmsgbot>	 !log fnegri@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
[14:32:11] <wikibugs>	 (03Merged) 10jenkins-bot: [bnwikisource] Change the wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/971517 (https://phabricator.wikimedia.org/T350482) (owner: 10Superpes15)
[14:32:14] <wikibugs>	 (03Merged) 10jenkins-bot: [plwiki] Add 'abusefilter-log-private' flag  to sysops [mediawiki-config] - 10https://gerrit.wikimedia.org/r/971518 (https://phabricator.wikimedia.org/T350509) (owner: 10Superpes15)
[14:32:24] <logmsgbot>	 !log jayme@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[14:32:27] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[14:32:28] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[14:32:34] <wikibugs>	 (03PS1) 10Marostegui: production-m5.sql: Add DROP [puppet] - 10https://gerrit.wikimedia.org/r/972835 (https://phabricator.wikimedia.org/T305114)
[14:32:38] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[14:32:38] <logmsgbot>	 !log taavi@deploy2002 Started scap: Backport for [[gerrit:971517|[bnwikisource] Change the wordmark (T350482)]], [[gerrit:971518|[plwiki] Add 'abusefilter-log-private' flag  to sysops (T350509)]]
[14:32:43] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.284 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:32:45] <stashbot>	 T350509: Add (abusefilter-log-private) right to sysop group on plwiki - https://phabricator.wikimedia.org/T350509
[14:32:45] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[14:32:45] <stashbot>	 T350482: Replace the wordmark for Bengali Wikisource - https://phabricator.wikimedia.org/T350482
[14:32:47] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50860 bytes in 0.073 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:33:39] <logmsgbot>	 !log jayme@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[14:33:55] <wikibugs>	 (03CR) 10Marostegui: "This is a noop and requires a change in the DB" [puppet] - 10https://gerrit.wikimedia.org/r/972835 (https://phabricator.wikimedia.org/T305114) (owner: 10Marostegui)
[14:34:02] <logmsgbot>	 !log taavi@deploy2002 taavi and superpes: Backport for [[gerrit:971517|[bnwikisource] Change the wordmark (T350482)]], [[gerrit:971518|[plwiki] Add 'abusefilter-log-private' flag  to sysops (T350509)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:34:18] <logmsgbot>	 !log jayme@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[14:34:19] <taavi>	 Superpes: both of your patches are available for testing
[14:34:30] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::zookeeper
[14:34:50] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] production-m5.sql: Add DROP [puppet] - 10https://gerrit.wikimedia.org/r/972835 (https://phabricator.wikimedia.org/T305114) (owner: 10Marostegui)
[14:34:57] <Superpes>	 And both look fine on WMdebug, taavi
[14:35:05] <taavi>	 ok, syncing
[14:35:07] <logmsgbot>	 !log taavi@deploy2002 taavi and superpes: Continuing with sync
[14:35:14] <Superpes>	 And thanks for the fix
[14:35:18] <Superpes>	 :)
[14:35:55] <_joe_>	 !log Running puppet on cp-text to pick up the increase in traffic to mw on k8s
[14:35:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:08] <wikibugs>	 (03PS1) 10Jforrester: Revert "Fix remaining uses of 'parent'->'super'" [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972715 (https://phabricator.wikimedia.org/T350777)
[14:37:39] <wikibugs>	 (03CR) 10Bking: rdf-streaming-updater: update values for application mode (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/967229 (https://phabricator.wikimedia.org/T349095) (owner: 10Bking)
[14:38:02] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch analytics_cluster::zookeeper to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972837 (https://phabricator.wikimedia.org/T349619)
[14:38:12] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:39:42] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch analytics_cluster::zookeeper to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972837 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[14:40:24] <logmsgbot>	 !log taavi@deploy2002 Finished scap: Backport for [[gerrit:971517|[bnwikisource] Change the wordmark (T350482)]], [[gerrit:971518|[plwiki] Add 'abusefilter-log-private' flag  to sysops (T350509)]] (duration: 07m 45s)
[14:40:30] <taavi>	 aaand it's live
[14:40:31] <stashbot>	 T350509: Add (abusefilter-log-private) right to sysop group on plwiki - https://phabricator.wikimedia.org/T350509
[14:40:31] <stashbot>	 T350482: Replace the wordmark for Bengali Wikisource - https://phabricator.wikimedia.org/T350482
[14:40:48] <taavi>	 marostegui: I'm done deploying
[14:40:55] <marostegui>	 taavi: thanks!!!
[14:41:25] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2012,2014].codfw.wmnet,pc1012.eqiad.wmnet with reason: Upgrade
[14:41:33] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] ProductionServices.php: Promote pc2014 to pc2 master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972831 (owner: 10Marostegui)
[14:41:40] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2012,2014].codfw.wmnet,pc1012.eqiad.wmnet with reason: Upgrade
[14:42:14] <wikibugs>	 (03Merged) 10jenkins-bot: ProductionServices.php: Promote pc2014 to pc2 master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972831 (owner: 10Marostegui)
[14:42:38] <wikibugs>	 (03PS1) 10Marostegui: Revert "ProductionServices.php: Promote pc2014 to pc2 master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972716
[14:42:41] <logmsgbot>	 !log marostegui@deploy2002 Started scap: Backport for [[gerrit:972831|ProductionServices.php: Promote pc2014 to pc2 master]]
[14:42:44] <wikibugs>	 (03CR) 10Marostegui: [C: 04-2] "Not ready yet" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972716 (owner: 10Marostegui)
[14:44:04] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Backport for [[gerrit:972831|ProductionServices.php: Promote pc2014 to pc2 master]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:44:27] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::zookeeper
[14:46:05] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Continuing with sync
[14:48:42] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[14:49:51] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[14:51:22] <logmsgbot>	 !log marostegui@deploy2002 Finished scap: Backport for [[gerrit:972831|ProductionServices.php: Promote pc2014 to pc2 master]] (duration: 08m 41s)
[14:51:34] <wikibugs>	 (03PS47) 10Bking: rdf-streaming-updater: update values for application mode [deployment-charts] - 10https://gerrit.wikimedia.org/r/967229 (https://phabricator.wikimedia.org/T349095)
[14:51:55] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: kubernetes::staging::worker
[14:52:51] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Revert "Fix remaining uses of 'parent'->'super'" [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972715 (https://phabricator.wikimedia.org/T350777) (owner: 10Jforrester)
[14:53:12] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[14:53:13] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:53:14] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch kubernetes::staging::worker to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972838 (https://phabricator.wikimedia.org/T349619)
[14:53:27] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[14:54:16] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch kubernetes::staging::worker to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972838 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[14:54:22] <wikibugs>	 (03CR) 10Stevemunene: [C: 03+2] Revert "Revert "airflow-wmde: configure wmde airflow instance"" [puppet] - 10https://gerrit.wikimedia.org/r/972712 (owner: 10Stevemunene)
[14:55:07] <icinga-wm>	 RECOVERY - BFD status on cloudsw1-c8-eqiad.mgmt is OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[14:55:41] <icinga-wm>	 RECOVERY - BGP status on cloudsw1-c8-eqiad.mgmt is OK: BGP OK - up: 14, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:56:55] <wikibugs>	 (03CR) 10Marostegui: Revert "ProductionServices.php: Promote pc2014 to pc2 master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972716 (owner: 10Marostegui)
[14:57:01] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] Revert "ProductionServices.php: Promote pc2014 to pc2 master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972716 (owner: 10Marostegui)
[14:57:41] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "ProductionServices.php: Promote pc2014 to pc2 master" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972716 (owner: 10Marostegui)
[14:58:05] <logmsgbot>	 !log marostegui@deploy2002 Started scap: Backport for [[gerrit:972716|Revert "ProductionServices.php: Promote pc2014 to pc2 master"]]
[14:59:00] <wikibugs>	 (03PS1) 10Marostegui: pc2014: Move it to pc3 [puppet] - 10https://gerrit.wikimedia.org/r/972841
[14:59:10] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kubernetes::staging::worker
[14:59:29] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Backport for [[gerrit:972716|Revert "ProductionServices.php: Promote pc2014 to pc2 master"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:59:37] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Continuing with sync
[15:00:06] <jouncebot>	 Deploy window Wikifunction Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1500)
[15:01:00] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] pc2014: Move it to pc3 [puppet] - 10https://gerrit.wikimedia.org/r/972841 (owner: 10Marostegui)
[15:03:13] <wikibugs>	 (03PS1) 10Arnaudb: auto_schema: add T348183.py [software] - 10https://gerrit.wikimedia.org/r/972510 (https://phabricator.wikimedia.org/T348183)
[15:03:26] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] auto_schema: add T348183.py [software] - 10https://gerrit.wikimedia.org/r/972510 (https://phabricator.wikimedia.org/T348183) (owner: 10Arnaudb)
[15:04:29] <wikibugs>	 (03CR) 10Arnaudb: "I'm not sure this is the right way to go for https://phabricator.wikimedia.org/T348183 as I run all alter table in a single command" [software] - 10https://gerrit.wikimedia.org/r/972510 (https://phabricator.wikimedia.org/T348183) (owner: 10Arnaudb)
[15:04:56] <logmsgbot>	 !log marostegui@deploy2002 Finished scap: Backport for [[gerrit:972716|Revert "ProductionServices.php: Promote pc2014 to pc2 master"]] (duration: 06m 51s)
[15:07:08] <wikibugs>	 (03CR) 10Ladsgroup: "Wrong repo: That's auto_schema itself, you need to make a MR in https://gitlab.wikimedia.org/repos/sre/schema-changes" [software] - 10https://gerrit.wikimedia.org/r/972510 (https://phabricator.wikimedia.org/T348183) (owner: 10Arnaudb)
[15:07:13] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: kubernetes::staging::master
[15:08:20] <wikibugs>	 (03CR) 10JHathaway: [C: 03+1] "looks good" [puppet] - 10https://gerrit.wikimedia.org/r/972744 (owner: 10Jbond)
[15:08:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host pc2014.codfw.wmnet with OS bookworm
[15:09:38] <wikibugs>	 (03CR) 10JHathaway: "looks good" [puppet] - 10https://gerrit.wikimedia.org/r/971461 (https://phabricator.wikimedia.org/T350008) (owner: 10Jbond)
[15:09:43] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch kubernetes::staging::master to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972843 (https://phabricator.wikimedia.org/T349619)
[15:11:03] <wikibugs>	 (03PS1) 10Arnaudb: mariadb: re-enable notifications for db1236 [puppet] - 10https://gerrit.wikimedia.org/r/972511 (https://phabricator.wikimedia.org/T344036)
[15:11:22] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch kubernetes::staging::master to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972843 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[15:11:32] <wikibugs>	 (03CR) 10Marostegui: "The commit message says prepare db1138 retirement, but there's nothing related to db1138 here, is that expected?" [puppet] - 10https://gerrit.wikimedia.org/r/972507 (https://phabricator.wikimedia.org/T344036) (owner: 10Arnaudb)
[15:11:50] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] mariadb: re-enable notifications for db1236 [puppet] - 10https://gerrit.wikimedia.org/r/972511 (https://phabricator.wikimedia.org/T344036) (owner: 10Arnaudb)
[15:12:01] <wikibugs>	 (03CR) 10Arnaudb: [C: 03+2] mariadb: re-enable notifications for db1236 [puppet] - 10https://gerrit.wikimedia.org/r/972511 (https://phabricator.wikimedia.org/T344036) (owner: 10Arnaudb)
[15:12:12] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/972834 (owner: 10Slyngshede)
[15:14:39] <wikibugs>	 (03PS3) 10Jbond: puppet: puppet-run fix shellcheck issues [puppet] - 10https://gerrit.wikimedia.org/r/972744
[15:14:57] <wikibugs>	 (03PS1) 10Arnaudb: mariadb: removing master candidacy info for db1136 [puppet] - 10https://gerrit.wikimedia.org/r/972512 (https://phabricator.wikimedia.org/T344036)
[15:16:05] <wikibugs>	 (03CR) 10Marostegui: mariadb: removing master candidacy info for db1136 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/972512 (https://phabricator.wikimedia.org/T344036) (owner: 10Arnaudb)
[15:16:25] <wikibugs>	 (03CR) 10Volans: puppet.puppet.get_puppet_ca_hostname: return hardcoded start (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/971957 (https://phabricator.wikimedia.org/T349619) (owner: 10Jbond)
[15:17:51] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] puppet: puppet-run fix shellcheck issues [puppet] - 10https://gerrit.wikimedia.org/r/972744 (owner: 10Jbond)
[15:17:54] <wikibugs>	 (03CR) 10JHathaway: [C: 03+1] "looks good" [puppet] - 10https://gerrit.wikimedia.org/r/972335 (https://phabricator.wikimedia.org/T350014) (owner: 10Filippo Giunchedi)
[15:18:23] <wikibugs>	 (03PS2) 10JMeybohm: Update api-gateway for cert-manager support [deployment-charts] - 10https://gerrit.wikimedia.org/r/972404 (https://phabricator.wikimedia.org/T300033)
[15:18:53] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kubernetes::staging::master
[15:19:40] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Update api-gateway for cert-manager support [deployment-charts] - 10https://gerrit.wikimedia.org/r/972404 (https://phabricator.wikimedia.org/T300033) (owner: 10JMeybohm)
[15:19:44] <wikibugs>	 (03PS1) 10JMeybohm: api-gateway,rest-gateway: Switch to cert-manager certificates [deployment-charts] - 10https://gerrit.wikimedia.org/r/972844 (https://phabricator.wikimedia.org/T300033)
[15:20:35] <wikibugs>	 (03PS1) 10Effie Mouzeli: ipoid: staging testing [deployment-charts] - 10https://gerrit.wikimedia.org/r/972845
[15:20:58] <wikibugs>	 (03PS2) 10Arnaudb: mariadb: removing master candidacy info for db1136 [puppet] - 10https://gerrit.wikimedia.org/r/972512 (https://phabricator.wikimedia.org/T344036)
[15:21:13] <wikibugs>	 (03CR) 10Arnaudb: mariadb: removing master candidacy info for db1136 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/972512 (https://phabricator.wikimedia.org/T344036) (owner: 10Arnaudb)
[15:21:14] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] mariadb: update to use shared SSL CA [puppet] - 10https://gerrit.wikimedia.org/r/972363 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[15:22:01] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] ipoid: staging testing [deployment-charts] - 10https://gerrit.wikimedia.org/r/972845 (owner: 10Effie Mouzeli)
[15:22:56] <wikibugs>	 (03Merged) 10jenkins-bot: ipoid: staging testing [deployment-charts] - 10https://gerrit.wikimedia.org/r/972845 (owner: 10Effie Mouzeli)
[15:23:40] <wikibugs>	 (03PS1) 10Muehlenhoff: Extend MOU for west1 [puppet] - 10https://gerrit.wikimedia.org/r/972848
[15:24:44] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Extend MOU for west1 [puppet] - 10https://gerrit.wikimedia.org/r/972848 (owner: 10Muehlenhoff)
[15:24:59] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on netflow2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:25:05] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on pc2014.codfw.wmnet with reason: host reimage
[15:25:14] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] mariadb: removing master candidacy info for db1136 [puppet] - 10https://gerrit.wikimedia.org/r/972512 (https://phabricator.wikimedia.org/T344036) (owner: 10Arnaudb)
[15:25:32] <wikibugs>	 (03CR) 10Arnaudb: [C: 03+2] mariadb: removing master candidacy info for db1136 [puppet] - 10https://gerrit.wikimedia.org/r/972512 (https://phabricator.wikimedia.org/T344036) (owner: 10Arnaudb)
[15:25:44] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[15:26:10] <wikibugs>	 (03PS48) 10Bking: rdf-streaming-updater: update values for application mode [deployment-charts] - 10https://gerrit.wikimedia.org/r/967229 (https://phabricator.wikimedia.org/T349095)
[15:26:31] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] START helmfile.d/services/ipoid: apply
[15:26:52] <logmsgbot>	 !log jiji@deploy2002 helmfile [staging] DONE helmfile.d/services/ipoid: apply
[15:26:58] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/972459 (owner: 10Jbond)
[15:27:21] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Traffic: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (10Fabfur)
[15:27:26] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[15:27:33] <logmsgbot>	 !log bking@deploy2002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[15:27:41] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2014.codfw.wmnet with reason: host reimage
[15:28:51] <wikibugs>	 (03CR) 10Volans: "Taking the liberty to abandon this as it was superseded by I5e9fd661f2be03099bd4b0c234c972093dd7cb85" [software/spicerack] - 10https://gerrit.wikimedia.org/r/497764 (owner: 10Gehel)
[15:29:02] <wikibugs>	 (03Abandoned) 10Volans: Expose failed results as part of RemoteExecutionError. [software/spicerack] - 10https://gerrit.wikimedia.org/r/497764 (owner: 10Gehel)
[15:29:29] <gehel>	 volans: thanks for the cleanup!
[15:29:38] <volans>	 :D
[15:29:47] <volans>	 yw, I came across it by chance
[15:30:11] <wikibugs>	 (03Abandoned) 10Arnaudb: auto_schema: add T348183.py [software] - 10https://gerrit.wikimedia.org/r/972510 (https://phabricator.wikimedia.org/T348183) (owner: 10Arnaudb)
[15:30:26] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] trafficserver: revert to aqs1 for editor and pageview metrics [puppet] - 10https://gerrit.wikimedia.org/r/972822 (https://phabricator.wikimedia.org/T350708) (owner: 10Hnowlan)
[15:31:10] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] mariadb - wikireplicas: update the ssl-ca value used by mariadb [puppet] - 10https://gerrit.wikimedia.org/r/961829 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[15:32:56] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
[15:33:34] <bvibber>	 !log brion running requeueTranscodes.php to batch-remove old low-res VP9 WebM transcodes (should be low impact)
[15:33:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:07] <wikibugs>	 (03PS1) 10Hnowlan: editor-analytics: bump docker image [deployment-charts] - 10https://gerrit.wikimedia.org/r/972849
[15:34:16] <wikibugs>	 (03CR) 10Majavah: [C: 04-1] mariadb - wmcs: update the ssl-ca value used by mariadb (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/968665 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[15:34:59] <jinxer-wm>	 (PuppetFailure) resolved: Puppet has failed on netflow2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:36:17] <wikibugs>	 (03CR) 10Santiago Faci: [C: 03+1] "It looks good! Thanks!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/972849 (owner: 10Hnowlan)
[15:37:51] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] editor-analytics: bump docker image [deployment-charts] - 10https://gerrit.wikimedia.org/r/972849 (owner: 10Hnowlan)
[15:38:37] <wikibugs>	 (03Merged) 10jenkins-bot: editor-analytics: bump docker image [deployment-charts] - 10https://gerrit.wikimedia.org/r/972849 (owner: 10Hnowlan)
[15:39:37] <wikibugs>	 (03PS7) 10Jbond: mariadb - wikireplicas: update the ssl-ca value used by mariadb [puppet] - 10https://gerrit.wikimedia.org/r/961829 (https://phabricator.wikimedia.org/T340741)
[15:39:39] <wikibugs>	 (03PS6) 10Jbond: mariadb - wmcs: update the ssl-ca value used by mariadb [puppet] - 10https://gerrit.wikimedia.org/r/968665 (https://phabricator.wikimedia.org/T340741)
[15:39:49] <wikibugs>	 (03PS6) 10Jbond: mariadb - analytics: update the ssl-ca value used by mariadb [puppet] - 10https://gerrit.wikimedia.org/r/968666 (https://phabricator.wikimedia.org/T340741)
[15:39:53] <wikibugs>	 (03PS6) 10Jbond: mariadb - misc: update the ssl-ca value used by mariadb [puppet] - 10https://gerrit.wikimedia.org/r/968667 (https://phabricator.wikimedia.org/T340741)
[15:39:57] <wikibugs>	 (03PS6) 10Jbond: mariadb - dedicated dbs: update the ssl-ca value used by mariadb [puppet] - 10https://gerrit.wikimedia.org/r/968668 (https://phabricator.wikimedia.org/T340741)
[15:40:01] <wikibugs>	 (03PS6) 10Jbond: mariadb - core: update the ssl-ca value used by mariadb [puppet] - 10https://gerrit.wikimedia.org/r/968669 (https://phabricator.wikimedia.org/T340741)
[15:40:35] <wikibugs>	 (03CR) 10Jbond: "fixed cheers" [puppet] - 10https://gerrit.wikimedia.org/r/968665 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[15:41:27] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [staging] START helmfile.d/services/editor-analytics: apply
[15:41:39] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
[15:42:02] <wikibugs>	 (03CR) 10Majavah: [C: 03+1] mariadb - wmcs: update the ssl-ca value used by mariadb [puppet] - 10https://gerrit.wikimedia.org/r/968665 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[15:42:22] <wikibugs>	 (03Abandoned) 10Bking: staging-eqiad: raise rdf-streaming-updater quota [deployment-charts] - 10https://gerrit.wikimedia.org/r/972483 (https://phabricator.wikimedia.org/T349095) (owner: 10Bking)
[15:43:24] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2001.codfw.wmnet
[15:43:24] <wikibugs>	 (03PS1) 10Stevemunene: Revert "Revert "airflow-wmde: Add wmde service user to the Yarn production queue"" [puppet] - 10https://gerrit.wikimedia.org/r/972718
[15:43:52] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
[15:44:22] <wikibugs>	 (03PS1) 10Stevemunene: Revert "Revert "airflow-wmde: Place airflow1007 in airflow-wmde role"" [puppet] - 10https://gerrit.wikimedia.org/r/972719
[15:44:31] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[15:45:54] <bvibber>	 !log brion running requeueTranscodes.php on mwmaint2002 to continue backfill for iOS-compatible low-res video (throttled)
[15:46:05] <wikibugs>	 (03Abandoned) 10Jcrespo: miniloader: Draft small utility to load a mydumper dump in an emergency [software/wmfbackups] - 10https://gerrit.wikimedia.org/r/863264 (https://phabricator.wikimedia.org/T319383) (owner: 10Jcrespo)
[15:48:41] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2014.codfw.wmnet with OS bookworm
[15:49:31] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bookworm
[15:50:46] <wikibugs>	 (03PS1) 10Brouberol: Setup partman reuse recipe for an-druid hosts [puppet] - 10https://gerrit.wikimedia.org/r/972851 (https://phabricator.wikimedia.org/T332604)
[15:51:19] <jinxer-wm>	 (KubernetesCalicoDown) firing: kubestage2002.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-staging&var-instance=kubestage2002.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:51:31] <wikibugs>	 (03PS1) 10Jforrester: Modify regex to reflect updated DOM [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972720 (https://phabricator.wikimedia.org/T350777)
[15:51:50] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
[15:52:07] <wikibugs>	 (03CR) 10Stevemunene: [C: 03+2] Revert "Revert "airflow-wmde: Add wmde service user to the Yarn production queue"" [puppet] - 10https://gerrit.wikimedia.org/r/972718 (owner: 10Stevemunene)
[15:53:12] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:53:33] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] mariadb - wikireplicas: update the ssl-ca value used by mariadb [puppet] - 10https://gerrit.wikimedia.org/r/961829 (https://phabricator.wikimedia.org/T340741) (owner: 10Jbond)
[15:53:59] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on ganeti1014:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:54:17] <wikibugs>	 (03CR) 10Brouberol: Setup partman reuse recipe for an-druid hosts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/972851 (https://phabricator.wikimedia.org/T332604) (owner: 10Brouberol)
[15:56:18] <jinxer-wm>	 (KubernetesCalicoDown) resolved: (2) kubestage2001.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:57:18] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
[15:57:58] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: dse_k8s::worker
[15:58:48] <wikibugs>	 (03CR) 10Jcrespo: "Thanks, I will amend this as a comment only/linting change only being noop to keep Matthew's useful comments. I will open a new one with V" [software/transferpy] - 10https://gerrit.wikimedia.org/r/972729 (https://phabricator.wikimedia.org/T330882) (owner: 10Jcrespo)
[15:59:06] <wikibugs>	 (03Abandoned) 10Jforrester: Revert "Fix remaining uses of 'parent'->'super'" [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972715 (https://phabricator.wikimedia.org/T350777) (owner: 10Jforrester)
[15:59:18] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch dse_k8s::worker to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972853 (https://phabricator.wikimedia.org/T349619)
[16:01:10] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch dse_k8s::worker to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972853 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[16:01:35] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: host reimage
[16:03:38] <wikibugs>	 (03CR) 10Stevemunene: [C: 03+2] Revert "Revert "airflow-wmde: Place airflow1007 in airflow-wmde role"" [puppet] - 10https://gerrit.wikimedia.org/r/972719 (owner: 10Stevemunene)
[16:03:59] <jinxer-wm>	 (PuppetFailure) resolved: Puppet has failed on ganeti1014:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[16:04:38] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: host reimage
[16:04:57] <James_F>	 jouncebot: nowandnext
[16:04:57] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 55 minute(s)
[16:04:58] <jouncebot>	 In 1 hour(s) and 55 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1800)
[16:05:30] <James_F>	 jnuche: OK for me to deploy that wmf.4 WBMI fix? Did you want to do so?
[16:06:39] <wikibugs>	 (03PS6) 10Kamila Součková: [WIP] add kube-state-metrics helmfile [deployment-charts] - 10https://gerrit.wikimedia.org/r/972400 (https://phabricator.wikimedia.org/T264625)
[16:06:56] <jnuche>	 James_F:  happy with you going ahead if you don't mind doing it
[16:07:05] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by jforrester@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972720 (https://phabricator.wikimedia.org/T350777) (owner: 10Jforrester)
[16:07:07] <James_F>	 Sure.
[16:09:09] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dse_k8s::worker
[16:09:09] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Modify regex to reflect updated DOM [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972720 (https://phabricator.wikimedia.org/T350777) (owner: 10Jforrester)
[16:09:31] <James_F>	 Dear Vector…
[16:09:34] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s3 on clouddb1021 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 570.13 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:09:42] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s7 on clouddb1021 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 577.88 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:10:11] <wikibugs>	 (03PS1) 10Jforrester: Skip PerformanceBudgetTest::testTotalModulesSize [skins/Vector] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972721 (https://phabricator.wikimedia.org/T350338)
[16:10:22] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by jforrester@deploy2002 using scap backport" [skins/Vector] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972721 (https://phabricator.wikimedia.org/T350338) (owner: 10Jforrester)
[16:10:28] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by jforrester@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972720 (https://phabricator.wikimedia.org/T350777) (owner: 10Jforrester)
[16:10:39] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by jforrester@deploy2002 using scap backport" [skins/Vector] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972721 (https://phabricator.wikimedia.org/T350338) (owner: 10Jforrester)
[16:10:45] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by jforrester@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972720 (https://phabricator.wikimedia.org/T350777) (owner: 10Jforrester)
[16:11:08] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: dse_k8s::master
[16:11:18] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s6 on clouddb1021 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 674.83 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:15:04] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on clouddb1021 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 901.22 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:15:54] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s6 on clouddb1021 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:15:58] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s2 on clouddb1021 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:16:04] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s3 on clouddb1021 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:16:06] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s7 on clouddb1021 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:16:06] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s4 on clouddb1021 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:16:10] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s5 on clouddb1021 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:16:32] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s1 on clouddb1021 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:16:32] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s6 on clouddb1021 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:16:52] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s8 on clouddb1021 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:17:27] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch dse_k8s::worker to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972856 (https://phabricator.wikimedia.org/T349619)
[16:17:28] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s2 on clouddb1021 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:18:04] <icinga-wm>	 PROBLEM - MariaDB read only wikireplica-s5 on clouddb1021 is CRITICAL: Could not connect to localhost:3315 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:22] <icinga-wm>	 PROBLEM - MariaDB read only s3 on clouddb1021 is CRITICAL: Could not connect to localhost:3313 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:26] <icinga-wm>	 PROBLEM - MariaDB read only s2 on clouddb1021 is CRITICAL: Could not connect to localhost:3312 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:28] <icinga-wm>	 PROBLEM - MariaDB read only s7 on clouddb1021 is CRITICAL: Could not connect to localhost:3317 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:28] <icinga-wm>	 PROBLEM - MariaDB read only s4 on clouddb1021 is CRITICAL: Could not connect to localhost:3314 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:32] <icinga-wm>	 PROBLEM - MariaDB read only wikireplica-s7 on clouddb1021 is CRITICAL: Could not connect to localhost:3317 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:32] <icinga-wm>	 PROBLEM - MariaDB read only wikireplica-s3 on clouddb1021 is CRITICAL: Could not connect to localhost:3313 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:34] <icinga-wm>	 PROBLEM - MariaDB read only wikireplica-s8 on clouddb1021 is CRITICAL: Could not connect to localhost:3318 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:40] <icinga-wm>	 PROBLEM - MariaDB read only s5 on clouddb1021 is CRITICAL: Could not connect to localhost:3315 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:46] <icinga-wm>	 PROBLEM - mysqld processes on clouddb1021 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[16:18:50] <icinga-wm>	 PROBLEM - MariaDB read only s6 on clouddb1021 is CRITICAL: Could not connect to localhost:3316 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:54] <icinga-wm>	 PROBLEM - MariaDB read only wikireplica-s2 on clouddb1021 is CRITICAL: Could not connect to localhost:3312 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:54] <icinga-wm>	 PROBLEM - MariaDB read only s1 on clouddb1021 is CRITICAL: Could not connect to localhost:3311 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:18:56] <icinga-wm>	 PROBLEM - MariaDB read only wikireplica-s1 on clouddb1021 is CRITICAL: Could not connect to localhost:3311 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:19:06] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1014.eqiad.wmnet with OS bookworm
[16:19:29] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch dse_k8s::worker to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/972856 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[16:19:36] <icinga-wm>	 PROBLEM - MariaDB read only s8 on clouddb1021 is CRITICAL: Could not connect to localhost:3318 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:19:56] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s4 on clouddb1021 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:21:12] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s3 on clouddb1021 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:21:54] <icinga-wm>	 PROBLEM - MariaDB read only wikireplica-s4 on clouddb1021 is CRITICAL: Could not connect to localhost:3314 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:22:03] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Modify regex to reflect updated DOM [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972720 (https://phabricator.wikimedia.org/T350777) (owner: 10Jforrester)
[16:22:05] * James_F sighs so very much at Vector.
[16:22:13] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] Modify regex to reflect updated DOM [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972720 (https://phabricator.wikimedia.org/T350777) (owner: 10Jforrester)
[16:22:30] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s5 on clouddb1021 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:22:34] <icinga-wm>	 RECOVERY - MariaDB read only wikireplica-s2 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 1s, read_only: True, event_scheduler: False, 11.56 QPS, connection latency: 0.019665s, query latency: 0.017255s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:22:36] <icinga-wm>	 RECOVERY - MariaDB read only s1 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 2s, read_only: True, event_scheduler: False, 11.68 QPS, connection latency: 0.405751s, query latency: 0.009362s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:22:36] <icinga-wm>	 RECOVERY - MariaDB read only wikireplica-s1 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 3s, read_only: True, event_scheduler: False, 11.77 QPS, connection latency: 0.005181s, query latency: 0.000601s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:22:56] <icinga-wm>	 RECOVERY - MariaDB read only wikireplica-s5 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 23s, read_only: True, event_scheduler: False, 11.75 QPS, connection latency: 0.005656s, query latency: 0.000409s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:08] <icinga-wm>	 RECOVERY - MariaDB read only wikireplica-s4 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 35s, read_only: True, event_scheduler: False, 21.62 QPS, connection latency: 0.004534s, query latency: 0.000499s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:16] <icinga-wm>	 RECOVERY - MariaDB read only s3 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 9s, read_only: True, event_scheduler: False, 11.48 QPS, connection latency: 0.006813s, query latency: 0.017737s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:18] <icinga-wm>	 RECOVERY - MariaDB read only s8 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 44s, read_only: True, event_scheduler: False, 20.60 QPS, connection latency: 0.005525s, query latency: 0.000557s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:20] <icinga-wm>	 RECOVERY - MariaDB read only s2 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 15s, read_only: True, event_scheduler: False, 11.78 QPS, connection latency: 0.004924s, query latency: 0.000509s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:22] <icinga-wm>	 RECOVERY - MariaDB read only s7 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 48s, read_only: True, event_scheduler: False, 11.76 QPS, connection latency: 0.004756s, query latency: 0.000520s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:22] <icinga-wm>	 RECOVERY - MariaDB read only s4 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 4s, read_only: True, event_scheduler: False, 11.77 QPS, connection latency: 0.005265s, query latency: 0.000471s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:23] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
[16:23:26] <icinga-wm>	 RECOVERY - MariaDB read only wikireplica-s7 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 52s, read_only: True, event_scheduler: False, 11.80 QPS, connection latency: 0.005719s, query latency: 0.000480s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:26] <icinga-wm>	 RECOVERY - MariaDB read only wikireplica-s3 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 18s, read_only: True, event_scheduler: False, 11.78 QPS, connection latency: 0.005648s, query latency: 0.000498s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:30] <icinga-wm>	 RECOVERY - MariaDB read only wikireplica-s8 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 56s, read_only: True, event_scheduler: False, 20.60 QPS, connection latency: 0.004390s, query latency: 0.000482s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:36] <icinga-wm>	 RECOVERY - MariaDB read only s5 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 63s, read_only: True, event_scheduler: False, 20.55 QPS, connection latency: 0.004686s, query latency: 0.000417s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:40] <icinga-wm>	 RECOVERY - mysqld processes on clouddb1021 is OK: PROCS OK: 8 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[16:23:46] <icinga-wm>	 RECOVERY - MariaDB read only s6 on clouddb1021 is OK: Version 10.6.14-MariaDB, Uptime 72s, read_only: True, event_scheduler: False, 11.61 QPS, connection latency: 0.010574s, query latency: 0.000761s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[16:23:59] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on bast1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[16:24:44] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dse_k8s::master
[16:25:06] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s8 on clouddb1021 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1502.18 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[16:25:21] <jnuche>	 Vector tests be moody today...
[16:26:25] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[16:27:04] <wikibugs>	 (03PS1) 10Krinkle: noc: fix indentation in base.css [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972857
[16:28:14] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[16:29:03] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by jforrester@deploy2002 using scap backport" [skins/Vector] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972721 (https://phabricator.wikimedia.org/T350338) (owner: 10Jforrester)
[16:29:09] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by jforrester@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972720 (https://phabricator.wikimedia.org/T350777) (owner: 10Jforrester)
[16:29:27] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] modules: add base.statsd:1.0.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/972339 (owner: 10Giuseppe Lavagetto)
[16:29:34] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 2 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[16:30:24] <wikibugs>	 (03Merged) 10jenkins-bot: modules: add base.statsd:1.0.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/972339 (owner: 10Giuseppe Lavagetto)
[16:31:14] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] base.statsd: add prestop sleep helper [deployment-charts] - 10https://gerrit.wikimedia.org/r/972340 (owner: 10Giuseppe Lavagetto)
[16:31:34] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, 10Puppet (Puppet 7.0): Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[16:31:44] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] base.statsd: add prestop sleep helper [deployment-charts] - 10https://gerrit.wikimedia.org/r/972340 (owner: 10Giuseppe Lavagetto)
[16:31:57] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
[16:33:00] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Structured-Data-Backlog, 10UploadWizard: Access request to deleted image files in the backup cluster - https://phabricator.wikimedia.org/T350020 (10jcrespo) Ok. Then it seems we are ok for the most part- I will start working then on access, as this is the first time such acc...
[16:33:38] <logmsgbot>	 !log ebernhardson@deploy2002 Started deploy [airflow-dags/search@869cca4]: Set group ownership of processed sparql queries
[16:33:59] <jinxer-wm>	 (PuppetFailure) resolved: Puppet has failed on bast1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[16:34:05] <logmsgbot>	 !log ebernhardson@deploy2002 Finished deploy [airflow-dags/search@869cca4]: Set group ownership of processed sparql queries (duration: 00m 27s)
[16:35:56] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: base.statsd: add prestop sleep helper [deployment-charts] - 10https://gerrit.wikimedia.org/r/972340
[16:36:20] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Structured-Data-Backlog, 10UploadWizard: Access request to deleted image files in the backup cluster - https://phabricator.wikimedia.org/T350020 (10jcrespo) One last thing- legal rarely comments on stuff here in public on Phab- you may want to reach them directly.
[16:37:23] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:38:16] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
[16:39:04] <wikibugs>	 (03Merged) 10jenkins-bot: Skip PerformanceBudgetTest::testTotalModulesSize [skins/Vector] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972721 (https://phabricator.wikimedia.org/T350338) (owner: 10Jforrester)
[16:39:07] <wikibugs>	 (03Merged) 10jenkins-bot: Modify regex to reflect updated DOM [extensions/WikibaseMediaInfo] (wmf/1.42.0-wmf.4) - 10https://gerrit.wikimedia.org/r/972720 (https://phabricator.wikimedia.org/T350777) (owner: 10Jforrester)
[16:39:15] <James_F>	 Finally.
[16:39:27] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.260 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:39:33] <logmsgbot>	 !log jforrester@deploy2002 Started scap: Backport for [[gerrit:972721|Skip PerformanceBudgetTest::testTotalModulesSize (T350338)]], [[gerrit:972720|Modify regex to reflect updated DOM (T350777)]]
[16:39:45] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: mediawiki: update statsd module [deployment-charts] - 10https://gerrit.wikimedia.org/r/972341
[16:39:49] <stashbot>	 T350338: Vector PerformanceBudgetTest::testTotalModulesSize CI break - https://phabricator.wikimedia.org/T350338
[16:39:49] <stashbot>	 T350777: 1.42.0-wmf.4: Structured Data on Wikimedia Commons not longer available - https://phabricator.wikimedia.org/T350777
[16:40:56] <logmsgbot>	 !log jforrester@deploy2002 jforrester: Backport for [[gerrit:972721|Skip PerformanceBudgetTest::testTotalModulesSize (T350338)]], [[gerrit:972720|Modify regex to reflect updated DOM (T350777)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[16:41:26] <logmsgbot>	 !log jforrester@deploy2002 jforrester: Continuing with sync
[16:43:44] <wikibugs>	 (03PS7) 10Kamila Součková: Add WIP kube-state-metrics deployment to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/972400 (https://phabricator.wikimedia.org/T264625)
[16:44:07] <wikibugs>	 (03CR) 10Kamila Součková: [C: 03+2] Add WIP kube-state-metrics deployment to staging (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/972400 (https://phabricator.wikimedia.org/T264625) (owner: 10Kamila Součková)
[16:44:11] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: update statsd module [deployment-charts] - 10https://gerrit.wikimedia.org/r/972341 (owner: 10Giuseppe Lavagetto)
[16:45:21] <wikibugs>	 (03Merged) 10jenkins-bot: mediawiki: update statsd module [deployment-charts] - 10https://gerrit.wikimedia.org/r/972341 (owner: 10Giuseppe Lavagetto)
[16:47:03] <logmsgbot>	 !log jforrester@deploy2002 Finished scap: Backport for [[gerrit:972721|Skip PerformanceBudgetTest::testTotalModulesSize (T350338)]], [[gerrit:972720|Modify regex to reflect updated DOM (T350777)]] (duration: 07m 29s)
[16:47:11] <stashbot>	 T350338: Vector PerformanceBudgetTest::testTotalModulesSize CI break - https://phabricator.wikimedia.org/T350338
[16:47:11] <stashbot>	 T350777: 1.42.0-wmf.4: Structured Data on Wikimedia Commons not longer available - https://phabricator.wikimedia.org/T350777
[16:47:38] <James_F>	 jnuche: Train should be unblocked
[16:48:14] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
[16:48:21] <wikibugs>	 (03CR) 10Kamila Součková: [C: 03+2] Initial commit of kube-state-metrics chart from prometheus-community (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/970425 (https://phabricator.wikimedia.org/T264625) (owner: 10Kamila Součková)
[16:49:00] <wikibugs>	 (03Merged) 10jenkins-bot: Initial commit of kube-state-metrics chart from prometheus-community [deployment-charts] - 10https://gerrit.wikimedia.org/r/970425 (https://phabricator.wikimedia.org/T264625) (owner: 10Kamila Součková)
[16:49:49] <jnuche>	 James_F: thanks a lot, that patch was painful
[16:49:55] <jnuche>	 jouncebot: nowandnext
[16:49:55] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 10 minute(s)
[16:49:55] <jouncebot>	 In 1 hour(s) and 10 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1800)
[16:50:02] <jnuche>	 ok, rolling forward
[16:50:47] <wikibugs>	 (03PS1) 10TrainBranchBot: group1 wikis to 1.42.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972861 (https://phabricator.wikimedia.org/T350080)
[16:50:49] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] group1 wikis to 1.42.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972861 (https://phabricator.wikimedia.org/T350080) (owner: 10TrainBranchBot)
[16:50:56] <wikibugs>	 (03Merged) 10jenkins-bot: Add WIP kube-state-metrics deployment to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/972400 (https://phabricator.wikimedia.org/T264625) (owner: 10Kamila Součková)
[16:51:35] <wikibugs>	 (03Merged) 10jenkins-bot: group1 wikis to 1.42.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972861 (https://phabricator.wikimedia.org/T350080) (owner: 10TrainBranchBot)
[16:51:48] <_joe_>	 jnuche: please lmk when you're done
[16:51:51] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] mw-page-content-change-enrich - bump to v1.29.0 to pick up retry logic change [deployment-charts] - 10https://gerrit.wikimedia.org/r/972463 (https://phabricator.wikimedia.org/T347884) (owner: 10Ottomata)
[16:51:59] <jnuche>	 will do
[16:52:14] <_joe_>	 I was about to make a change to mw on k8s, I'll wait for you to be done
[16:52:59] <wikibugs>	 (03Merged) 10jenkins-bot: mw-page-content-change-enrich - bump to v1.29.0 to pick up retry logic change [deployment-charts] - 10https://gerrit.wikimedia.org/r/972463 (https://phabricator.wikimedia.org/T347884) (owner: 10Ottomata)
[16:54:13] <logmsgbot>	 !log otto@deploy2002 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
[16:54:23] <logmsgbot>	 !log otto@deploy2002 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[16:54:35] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Structured-Data-Backlog, 10UploadWizard: Access request to deleted image files in the backup cluster - https://phabricator.wikimedia.org/T350020 (10jcrespo) While checking the things I need to apply the change, I need 2 additional data points-  * The list of ips where the fi...
[16:56:30] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [staging] START helmfile.d/services/editor-analytics: apply
[16:56:41] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
[16:56:55] <wikibugs>	 (03CR) 10VolkerE: [C: 03+1] noc: fix indentation in base.css [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972857 (owner: 10Krinkle)
[16:57:08] <wikibugs>	 (03CR) 10VolkerE: [C: 03+1] "Can't +2 here" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972857 (owner: 10Krinkle)
[16:57:10] <wikibugs>	 (03PS1) 10Hnowlan: editor-analytics: bump image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/972864
[16:58:03] <logmsgbot>	 !log jnuche@deploy2002 rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.4  refs T350080
[17:02:03] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: mw-debug: add statsd-exporter [deployment-charts] - 10https://gerrit.wikimedia.org/r/972342 (https://phabricator.wikimedia.org/T240685)
[17:02:05] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: mediawiki: add statsd exporter [deployment-charts] - 10https://gerrit.wikimedia.org/r/972343 (https://phabricator.wikimedia.org/T240685)
[17:03:56] <logmsgbot>	 !log jnuche@deploy2002 Synchronized php: group1 wikis to 1.42.0-wmf.4  refs T350080 (duration: 05m 52s)
[17:05:26] <jnuche>	 logs look clean and commons is showing structured data again, it's looking good
[17:05:30] <jnuche>	 _joe_: please go ahead
[17:05:38] <_joe_>	 jnuche: perfect, thanks
[17:05:45] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] mw-debug: add statsd-exporter [deployment-charts] - 10https://gerrit.wikimedia.org/r/972342 (https://phabricator.wikimedia.org/T240685) (owner: 10Giuseppe Lavagetto)
[17:06:17] <logmsgbot>	 !log kamila@deploy2002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[17:06:18] <logmsgbot>	 !log kamila@deploy2002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[17:07:05] <logmsgbot>	 !log kamila@deploy2002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[17:07:06] <logmsgbot>	 !log kamila@deploy2002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[17:07:13] <wikibugs>	 (03Merged) 10jenkins-bot: mw-debug: add statsd-exporter [deployment-charts] - 10https://gerrit.wikimedia.org/r/972342 (https://phabricator.wikimedia.org/T240685) (owner: 10Giuseppe Lavagetto)
[17:08:20] <logmsgbot>	 !log kamila@deploy2002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[17:08:57] <wikibugs>	 (03PS1) 10Effie Mouzeli: Dummy commit to bump image [software/tegola] (wmf/v0.19.x) - 10https://gerrit.wikimedia.org/r/972866 (https://phabricator.wikimedia.org/T348647)
[17:09:06] <wikibugs>	 (03PS4) 10Cathal Mooney: Change 'anycast_gw' var in int config to represent type of IRB needed [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/971937 (https://phabricator.wikimedia.org/T350579)
[17:09:09] <wikibugs>	 (03CR) 10Krinkle: noc: fix indentation in base.css (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/972857 (owner: 10Krinkle)
[17:09:12] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Traffic: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (10Fabfur)
[17:10:11] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] k8s-controller-sidecars: Initial release [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/972535 (owner: 10RLazarus)
[17:10:46] <wikibugs>	 (03CR) 10Cathal Mooney: Change 'anycast_gw' var in int config to represent type of IRB needed (032 comments) [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/971937 (https://phabricator.wikimedia.org/T350579) (owner: 10Cathal Mooney)
[17:18:56] <logmsgbot>	 !log oblivian@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[17:19:41] <logmsgbot>	 !log oblivian@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[17:20:17] <sukhe>	 !log depool cp3079.esams.wmnet for BIOS settings update
[17:20:48] <wikibugs>	 (03Abandoned) 10Effie Mouzeli: Switch from X-Real-IP to X-Client-IP [puppet] - 10https://gerrit.wikimedia.org/r/552515 (https://phabricator.wikimedia.org/T239340) (owner: 10Alexandros Kosiaris)
[17:22:43] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 1:00:00 on cp3079.esams.wmnet with reason: BIOS settings change
[17:22:59] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp3079.esams.wmnet with reason: BIOS settings change
[17:23:02] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] Dummy commit to bump image [software/tegola] (wmf/v0.19.x) - 10https://gerrit.wikimedia.org/r/972866 (https://phabricator.wikimedia.org/T348647) (owner: 10Effie Mouzeli)
[17:23:04] <wikibugs>	 (03CR) 10Kamila Součková: [C: 03+2] "recheck" [deployment-charts] - 10https://gerrit.wikimedia.org/r/972400 (https://phabricator.wikimedia.org/T264625) (owner: 10Kamila Součková)
[17:23:36] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reboot-single for host cp3079.esams.wmnet
[17:23:51] <wikibugs>	 (03Merged) 10jenkins-bot: Dummy commit to bump image [software/tegola] (wmf/v0.19.x) - 10https://gerrit.wikimedia.org/r/972866 (https://phabricator.wikimedia.org/T348647) (owner: 10Effie Mouzeli)
[17:23:54] <wikibugs>	 (03PS3) 10JMeybohm: Update api-gateway for cert-manager support [deployment-charts] - 10https://gerrit.wikimedia.org/r/972404 (https://phabricator.wikimedia.org/T300033)
[17:23:56] <wikibugs>	 (03PS2) 10JMeybohm: api-gateway,rest-gateway: Switch to cert-manager certificates [deployment-charts] - 10https://gerrit.wikimedia.org/r/972844 (https://phabricator.wikimedia.org/T300033)
[17:27:54] <wikibugs>	 (03PS2) 10Hnowlan: wmnet: add records for mw-jobrunner [dns] - 10https://gerrit.wikimedia.org/r/972394 (https://phabricator.wikimedia.org/T349796)
[17:28:55] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wmnet: add records for mw-jobrunner [dns] - 10https://gerrit.wikimedia.org/r/972394 (https://phabricator.wikimedia.org/T349796) (owner: 10Hnowlan)
[17:30:36] <wikibugs>	 (03PS4) 10JMeybohm: Update api-gateway for cert-manager support [deployment-charts] - 10https://gerrit.wikimedia.org/r/972404 (https://phabricator.wikimedia.org/T300033)
[17:30:38] <wikibugs>	 (03PS3) 10JMeybohm: api-gateway,rest-gateway: Switch to cert-manager certificates [deployment-charts] - 10https://gerrit.wikimedia.org/r/972844 (https://phabricator.wikimedia.org/T300033)
[17:30:46] <wikibugs>	 (03PS3) 10Hnowlan: wmnet: add records for mw-jobrunner [dns] - 10https://gerrit.wikimedia.org/r/972394 (https://phabricator.wikimedia.org/T349796)
[17:31:48] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wmnet: add records for mw-jobrunner [dns] - 10https://gerrit.wikimedia.org/r/972394 (https://phabricator.wikimedia.org/T349796) (owner: 10Hnowlan)
[17:43:09] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3079.esams.wmnet
[17:43:42] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.remove-downtime for cp3079.esams.wmnet
[17:43:43] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp3079.esams.wmnet
[17:46:22] <sukhe>	 !log pool cp3079.esams.wmnet
[17:51:03] <wikibugs>	 (03PS1) 10Kamila Součková: kube-state-metrics: bump chart version, add upstream version [deployment-charts] - 10https://gerrit.wikimedia.org/r/972869
[17:52:10] <logmsgbot>	 !log bking@cumin1001 END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
[17:52:21] <logmsgbot>	 !log bking@cumin1001 START - Cookbook sre.wdqs.data-reload
[17:53:00] <wikibugs>	 (03PS2) 10Kamila Součková: kube-state-metrics: bump chart version, add upstream version [deployment-charts] - 10https://gerrit.wikimedia.org/r/972869 (https://phabricator.wikimedia.org/T264625)
[17:56:30] <wikibugs>	 (03CR) 10Kamila Součková: [C: 03+2] kube-state-metrics: bump chart version, add upstream version [deployment-charts] - 10https://gerrit.wikimedia.org/r/972869 (https://phabricator.wikimedia.org/T264625) (owner: 10Kamila Součková)
[17:57:23] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: elastic relforge cluster restart - bking@cumin2002 - T350703
[17:59:04] <wikibugs>	 (03Merged) 10jenkins-bot: kube-state-metrics: bump chart version, add upstream version [deployment-charts] - 10https://gerrit.wikimedia.org/r/972869 (https://phabricator.wikimedia.org/T264625) (owner: 10Kamila Součková)
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1800)
[18:01:56] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: elastic relforge cluster restart - bking@cumin2002 - T350703
[18:05:51] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad elastic cluster restart - bking@cumin2002 - T350703
[18:12:24] <wikibugs>	 (03CR) 10DCausse: rdf-streaming-updater: update values for application mode (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/967229 (https://phabricator.wikimedia.org/T349095) (owner: 10Bking)
[18:14:00] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant Access to WMF for ecarg - https://phabricator.wikimedia.org/T350818 (10ecarg)
[18:15:38] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "thanks, Sam!" [puppet] - 10https://gerrit.wikimedia.org/r/972534 (owner: 10Samwilson)
[18:18:07] <logmsgbot>	 !log kamila@deploy2002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[18:18:32] <logmsgbot>	 !log kamila@deploy2002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[18:20:44] <mutante>	 taavi: it's new to me that I would have to run a sync-netbox-hiera cookbook in addition to using the makevm cookbook. When I run it with -c to check, it tells me I must run --dry-run too, and when I do that it says --dry-run is an unrecognized argument, so... a bit confused why it showed that to you
[18:21:19] <taavi>	 mutante: did you have makevm crash or something similar? That would explain why the last part would not have been run
[18:21:39] <taavi>	 it will prompt you before merging anything, so you can safely run it without --dry-run
[18:21:54] <mutante>	 taavi: there was a bug in the cookbook that got fixed.. but that happened with stewards2001, not 1001 .. 
[18:22:21] <mutante>	 error: unrecognized arguments: --dry-run
[18:22:32] <mutante>	 trying
[18:22:38] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test - dzahn@cumin1001"
[18:24:00] <mutante>	 yes, it does show a diff to add that host to netbox data.. for the other host that did not happen..
[18:24:10] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test - dzahn@cumin1001"
[18:24:15] <mutante>	 anyways it should be cleaned up now
[18:26:57] <wikibugs>	 (03CR) 10BryanDavis: "Will this help with my problem of sidecars in the Toolhub job pods at T292861?" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/972535 (owner: 10RLazarus)
[18:30:03] <logmsgbot>	 !log fnegri@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1006.eqiad.wmnet with OS bookworm
[18:30:26] <wikibugs>	 (03Restored) 10Bking: staging-eqiad: raise rdf-streaming-updater quota [deployment-charts] - 10https://gerrit.wikimedia.org/r/972483 (https://phabricator.wikimedia.org/T349095) (owner: 10Bking)
[18:32:13] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10User-fgiunchedi: Set nofail for raid0 recipes - https://phabricator.wikimedia.org/T350461 (10herron) I've had success mounting non-root filesystems that were unreliable (networked fs, external arrays, these kinds of things) using autofs, which these days can be done in...
[18:32:51] <wikibugs>	 (03CR) 10Bking: staging-eqiad: raise rdf-streaming-updater quota (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/972483 (https://phabricator.wikimedia.org/T349095) (owner: 10Bking)
[18:50:41] <andrewbogott>	 Krinkle, can you join us in #wikimedia-cloud-admin for help with a mcrouter/memcached/wikitech mystery?
[18:54:42] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:00:05] <jouncebot>	 jnuche and dduvall: OwO what's this, a deployment window?? Train log triage with CPT. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1900). nyaa~
[19:00:05] <jouncebot>	 jnuche and dduvall: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T1900).
[19:00:44] <wikibugs>	 (PS1) Dzahn: admin: create group for stewards VMs [puppet] - https://gerrit.wikimedia.org/r/972874 (https://phabricator.wikimedia.org/T344164)
[19:01:43] <wikibugs>	 (CR) CI reject: [V: -1] admin: create group for stewards VMs [puppet] - https://gerrit.wikimedia.org/r/972874 (https://phabricator.wikimedia.org/T344164) (owner: Dzahn)
[19:06:44] <wikibugs>	 SRE, observability: Add monitoring for nutcracker - https://phabricator.wikimedia.org/T95231 (Krinkle) Open→Declined Nutcracker for MW (apart from cloudweb, T202431) has been replaced with mcrouter.
[19:07:11] <wikibugs>	 (PS2) Dzahn: admin: create group for stewards VMs [puppet] - https://gerrit.wikimedia.org/r/972874 (https://phabricator.wikimedia.org/T344164)
[19:08:19] <wikibugs>	 (CR) Muehlenhoff: admin: create group for stewards VMs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/972874 (https://phabricator.wikimedia.org/T344164) (owner: Dzahn)
[19:11:02] <wikibugs>	 (PS3) Dzahn: admin: create group for stewards VMs [puppet] - https://gerrit.wikimedia.org/r/972874 (https://phabricator.wikimedia.org/T344164)
[19:11:11] <wikibugs>	 (CR) Dzahn: admin: create group for stewards VMs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/972874 (https://phabricator.wikimedia.org/T344164) (owner: Dzahn)
[19:12:31] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/972874 (https://phabricator.wikimedia.org/T344164) (owner: Dzahn)
[19:16:19] <wikibugs>	 SRE, Infrastructure-Foundations, Stewards-and-global-tools, collaboration-services, and 2 others: 1 VMs requested for stewards - https://phabricator.wikimedia.org/T344164 (Dzahn) >>! In T344164#9314186, @Urbanecm wrote: > Thanks @dzahn for making the VM! Following our IRC conversations, I'm putti...
[19:16:28] <wikibugs>	 (CR) Dzahn: [C: +2] admin: create group for stewards VMs [puppet] - https://gerrit.wikimedia.org/r/972874 (https://phabricator.wikimedia.org/T344164) (owner: Dzahn)
[19:18:51] <wikibugs>	 SRE, Infrastructure-Foundations, Stewards-and-global-tools, collaboration-services, and 2 others: 1 VMs requested for stewards - https://phabricator.wikimedia.org/T344164 (Urbanecm) >>! In T344164#9317517, @Dzahn wrote: >> ** A managed clone of a Git repository (https://gitlab.wikimedia.org/repos...
[19:20:14] <wikibugs>	 SRE, Infrastructure-Foundations, Stewards-and-global-tools, collaboration-services, and 2 others: 1 VMs requested for stewards - https://phabricator.wikimedia.org/T344164 (Dzahn) >>! In T344164#9317520, @Urbanecm wrote: > Will do. Is there some sort of standard/preferred location?  Not really, de...
[19:22:57] <wikibugs>	 SRE, Infrastructure-Foundations, Stewards-and-global-tools, collaboration-services, and 2 others: 1 VMs requested for stewards - https://phabricator.wikimedia.org/T344164 (Urbanecm) >>! In T344164#9317522, @Dzahn wrote: >>>! In T344164#9317520, @Urbanecm wrote: >> Will do. Is there some sort of s...
[19:24:06] <wikibugs>	 (CR) Volans: [C: -1] "Gentle reminder for Serviceops to provide feedback on this one" [cookbooks] - https://gerrit.wikimedia.org/r/967166 (https://phabricator.wikimedia.org/T341973) (owner: Volans)
[19:24:18] <wikibugs>	 (CR) Volans: "Gentle reminder to provide feedback on this one" [cookbooks] - https://gerrit.wikimedia.org/r/967628 (https://phabricator.wikimedia.org/T341973) (owner: Volans)
[19:24:33] <wikibugs>	 (CR) Volans: "Gentle reminder for Serviceops to provide feedback on this one" [cookbooks] - https://gerrit.wikimedia.org/r/967165 (https://phabricator.wikimedia.org/T341973) (owner: Volans)
[19:25:47] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad elastic cluster restart - bking@cumin2002 - T350703
[19:26:20] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to `discovery.processed_external_sparql_query` for AndrewTavis_WMDE - https://phabricator.wikimedia.org/T350426 (EBernhardson) This should now be resolved, existing partitions are owned by `analytics-privatedata-users` and new datasets going forward should also...
[19:27:37] <wikibugs>	 (PS1) Milimetric: Update mediawiki_history snapshot [puppet] - https://gerrit.wikimedia.org/r/972880
[19:32:57] <wikibugs>	 SRE, Observability-Logging: Leverage Grafana annotations to show events in graphs - https://phabricator.wikimedia.org/T222826 (herron)
[19:37:09] <wikibugs>	 (PS1) Dzahn: stewards: add git::clone of stewards/onboarding-system from gitlab [puppet] - https://gerrit.wikimedia.org/r/972882 (https://phabricator.wikimedia.org/T344164)
[19:37:35] <wikibugs>	 (CR) CI reject: [V: -1] stewards: add git::clone of stewards/onboarding-system from gitlab [puppet] - https://gerrit.wikimedia.org/r/972882 (https://phabricator.wikimedia.org/T344164) (owner: Dzahn)
[19:39:08] <wikibugs>	 (CR) Dzahn: "what does jenkins dislike..." [puppet] - https://gerrit.wikimedia.org/r/972882 (https://phabricator.wikimedia.org/T344164) (owner: Dzahn)
[19:40:12] <RhinosF1>	 mutante: it says syntax error
[19:40:31] <RhinosF1>	 Could not parse for environment *root*: Syntax error at 'ensure' (file: /srv/workspace/puppet/modules/profile/manifests/stewards.pp, line: 12, column: 9)
[19:40:33] <wikibugs>	 (CR) Eevans: [C: +2] aqs: add .../aqs/deploy/src/ to Environment [puppet] - https://gerrit.wikimedia.org/r/972461 (https://phabricator.wikimedia.org/T349228) (owner: Eevans)
[19:41:06] <bd808>	 !log Manually pinned wikitech to 1.42.0-wmf.3 via local hacks on cloudweb100{3,4}
[19:41:50] <mutante>	 RhinosF1: ah, missing a : :)
[19:42:35] <wikibugs>	 (PS2) Dzahn: stewards: add git::clone of stewards/onboarding-system from gitlab [puppet] - https://gerrit.wikimedia.org/r/972882 (https://phabricator.wikimedia.org/T344164)
[19:42:39] <wikibugs>	 (PS1) Ssingh: hiera: update authdns_servers (PCC test commit, DO NOT MERGE) [puppet] - https://gerrit.wikimedia.org/r/972883 (https://phabricator.wikimedia.org/T347054)
[19:42:50] <wikibugs>	 (CR) Ssingh: [C: -2] hiera: update authdns_servers (PCC test commit, DO NOT MERGE) [puppet] - https://gerrit.wikimedia.org/r/972883 (https://phabricator.wikimedia.org/T347054) (owner: Ssingh)
[19:43:23] <wikibugs>	 (PS2) Ssingh: hiera: update authdns_servers (PCC test commit, DO NOT MERGE) [puppet] - https://gerrit.wikimedia.org/r/972883 (https://phabricator.wikimedia.org/T347054)
[19:44:19] <bd808>	 The wikitech rollback was to find out if 1.42.0-wmf.4 is what broke OAuth owner-only auth on wikitech. And that seems to have been confirmed.
[19:44:26] <wikibugs>	 (CR) Ssingh: [V: +1] "PCC SUCCESS (CORE_DIFF 2 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/972883 (https://phabricator.wikimedia.org/T347054) (owner: Ssingh)
[19:45:11] <wikibugs>	 SRE, Infrastructure-Foundations, Stewards-and-global-tools, collaboration-services, and 2 others: 1 VMs requested for stewards - https://phabricator.wikimedia.org/T344164 (Dzahn) >>! In T344164#9317524, @Urbanecm wrote: > Sorry for the confusion, I meant for the "where you want them to be written...
[19:45:14] <RhinosF1>	 mutante: looks like that fixed it
[19:45:58] <mutante>	 RhinosF1: where did the compiler output go ? heh
[19:46:52] <wikibugs>	 (Abandoned) Ssingh: hiera: update authdns_servers (PCC test commit, DO NOT MERGE) [puppet] - https://gerrit.wikimedia.org/r/972883 (https://phabricator.wikimedia.org/T347054) (owner: Ssingh)
[19:47:49] <RhinosF1>	 mutante: PCC still works
[19:48:27] <wikibugs>	 (PS1) Ebernhardson: cirrus updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/972884
[19:49:15] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host acmechief-test1001.eqiad.wmnet with OS bookworm
[19:49:24] <wikibugs>	 SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host acmechief-test1001.eqiad.wmnet with OS bookworm
[19:51:56] <mutante>	 RhinosF1: as in "you HAVE to run it locally now" ?
[19:52:40] <wikibugs>	 SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (BCornwall)
[19:52:54] <mutante>	 no more output on https://puppet-compiler.wmflabs.org/  ?
[19:53:12] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[19:53:59] <RhinosF1>	 mutante: no still in wmcloud
[19:54:35] <wikibugs>	 (CR) RhinosF1: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/972882 (https://phabricator.wikimedia.org/T344164) (owner: Dzahn)
[19:54:49] <RhinosF1>	 mutante: did you run it
[19:55:04] <mutante>	 RhinosF1: yes, it tells me that it ran successfully but there is no more link to the output?
[19:56:00] <RhinosF1>	 mutante: should be
[19:56:01] <wikibugs>	 (PS1) BCornwall: acme_chief: Set acmechief-test1001 as active host [puppet] - https://gerrit.wikimedia.org/r/972886 (https://phabricator.wikimedia.org/T342154)
[19:56:04] <RhinosF1>	 I ran it again
[19:56:05] <wikibugs>	 (CR) Ebernhardson: [C: +2] cirrus updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/972884 (owner: Ebernhardson)
[19:57:12] <logmsgbot>	 !log eevans@cumin1001 START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Applying JVM security upgrade - eevans@cumin1001
[19:58:50] <wikibugs>	 (Merged) jenkins-bot: cirrus updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/972884 (owner: Ebernhardson)
[20:01:26] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[20:01:32] <wikibugs>	 (CR) Dzahn: [V: +1] "https://puppet-compiler.wmflabs.org/output/972882/356/stewards1001.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/972882 (https://phabricator.wikimedia.org/T344164) (owner: Dzahn)
[20:01:38] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:02:12] <mutante>	 RhinosF1: thanks, i see it now :)
[20:02:15] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
[20:02:38] <wikibugs>	 (CR) Dzahn: [V: +1 C: +2] stewards: add git::clone of stewards/onboarding-system from gitlab [puppet] - https://gerrit.wikimedia.org/r/972882 (https://phabricator.wikimedia.org/T344164) (owner: Dzahn)
[20:03:23] <mutante>	 I think what confused me was that there is now puppet 5 and puppet 7.. ack, saw that mail
[20:05:24] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
[20:06:00] <mutante>	 puppet git clones, just getting the usual "dubious ownership" warning. I kind of want to fix that instead of adding the exception for the dir.
[20:06:09] <mutante>	 happened before 
[20:07:54] <wikibugs>	 SRE, Infrastructure-Foundations, Stewards-and-global-tools, collaboration-services, and 2 others: 1 VMs requested for stewards - https://phabricator.wikimedia.org/T344164 (Dzahn) >>! In T344164#9314186, @Urbanecm wrote: > ** A managed clone of a Git repository (https://gitlab.wikimedia.org/repos/...
[20:08:03] <urbanecm>	 mutante: now that i think about it...i guess a system account would be helpful, to run the code under? especially a regular check to verify everything's in order.
[20:08:11] <logmsgbot>	 !log eevans@cumin1001 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Applying JVM security upgrade - eevans@cumin1001
[20:09:45] <mutante>	 urbanecm: not sure yet, could be solved through group ownership. all shell users are in "wikidev" group
[20:10:03] <urbanecm>	 i see
[20:10:15] <logmsgbot>	 !log eevans@cumin1001 START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*.eqiad.wmnet: Applying JVM security upgrade - eevans@cumin1001
[20:10:32] <wikibugs>	 SRE, SRE-Access-Requests, Structured-Data-Backlog, UploadWizard: Access request to deleted image files in the backup cluster - https://phabricator.wikimedia.org/T350020 (jcrespo) p:Triage→High
[20:12:48] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Traffic: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (Fabfur) Problems encountered while checking and current situation (just a note for bookkeeping):  * Virtualization is enabled on cp1102 (`egrep -q  "vmx|svm" /proc/cpuinf...
[20:13:30] <wikibugs>	 (CR) Ottomata: [C: +2] Update mediawiki_history snapshot [puppet] - https://gerrit.wikimedia.org/r/972880 (owner: Milimetric)
[20:19:44] <logmsgbot>	 !log eevans@cumin1001 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*.eqiad.wmnet: Applying JVM security upgrade - eevans@cumin1001
[20:21:26] <logmsgbot>	 !log otto@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
[20:21:35] <logmsgbot>	 !log otto@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[20:23:41] <logmsgbot>	 !log eevans@cumin1001 START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*.codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
[20:25:21] <logmsgbot>	 !log otto@deploy2002 helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
[20:26:09] <logmsgbot>	 !log otto@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[20:28:40] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief-test1001.eqiad.wmnet with OS bookworm
[20:28:49] <wikibugs>	 SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host acmechief-test1001.eqiad.wmnet with OS bookworm completed: - acmechief-test1001 (**WARN**)   - Do...
[20:30:55] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:32:17] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 9.064 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:33:21] <logmsgbot>	 !log eevans@cumin1001 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*.codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
[20:33:34] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.remove-downtime for acmechief-test1001.eqiad.wmnet
[20:33:35] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief-test1001.eqiad.wmnet
[20:37:21] <wikibugs>	 SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (BCornwall)
[20:39:01] <wikibugs>	 SRE, Traffic, GitLab (Project Migration): Migrate Traffic repositories from Gerrit to Gitlab - https://phabricator.wikimedia.org/T347623 (BCornwall)
[20:39:59] <logmsgbot>	 !log eevans@cumin1001 START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[19-21,28,31].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
[20:45:30] <wikibugs>	 (PS7) BPirkle: Reconfigure the PageViewInfo extension to use AQS 2.0 via the REST Gateway [mediawiki-config] - https://gerrit.wikimedia.org/r/968384 (https://phabricator.wikimedia.org/T348731)
[20:49:32] <wikibugs>	 SRE, Traffic, GitLab (Project Migration): Migrate Traffic repositories from Gerrit to Gitlab - https://phabricator.wikimedia.org/T347623 (BCornwall) Yes, indeed! Thanks for pointing that out.
[21:00:06] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, kindrobot, and taavi: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T2100).
[21:00:06] <jouncebot>	 bpirkle: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:23] <bpirkle>	 I'm here
[21:00:31] <kindrobot>	 I can deploy
[21:03:26] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by kindrobot@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/968384 (https://phabricator.wikimedia.org/T348731) (owner: BPirkle)
[21:04:14] <wikibugs>	 (Merged) jenkins-bot: Reconfigure the PageViewInfo extension to use AQS 2.0 via the REST Gateway [mediawiki-config] - https://gerrit.wikimedia.org/r/968384 (https://phabricator.wikimedia.org/T348731) (owner: BPirkle)
[21:04:41] <logmsgbot>	 !log kindrobot@deploy2002 Started scap: Backport for [[gerrit:968384|Reconfigure the PageViewInfo extension to use AQS 2.0 via the REST Gateway (T348731)]]
[21:06:06] <logmsgbot>	 !log kindrobot@deploy2002 bpirkle and kindrobot: Backport for [[gerrit:968384|Reconfigure the PageViewInfo extension to use AQS 2.0 via the REST Gateway (T348731)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:06:20] <kindrobot>	 bpirkle: can you confirm the changes?
[21:06:43] <bpirkle>	 Confirmed, looks as expected
[21:06:55] <kindrobot>	 Very good, syncing.
[21:07:12] <logmsgbot>	 !log kindrobot@deploy2002 bpirkle and kindrobot: Continuing with sync
[21:09:24] <wikibugs>	 (PS4) RLazarus: k8s-controller-sidecars: Initial release [docker-images/production-images] - https://gerrit.wikimedia.org/r/972535 (https://phabricator.wikimedia.org/T348284)
[21:10:05] <wikibugs>	 (CR) RLazarus: [C: +2] k8s-controller-sidecars: Initial release (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/972535 (https://phabricator.wikimedia.org/T348284) (owner: RLazarus)
[21:10:19] <wikibugs>	 (CR) RLazarus: [V: +2 C: +2] k8s-controller-sidecars: Initial release [docker-images/production-images] - https://gerrit.wikimedia.org/r/972535 (https://phabricator.wikimedia.org/T348284) (owner: RLazarus)
[21:12:59] <logmsgbot>	 !log kindrobot@deploy2002 Finished scap: Backport for [[gerrit:968384|Reconfigure the PageViewInfo extension to use AQS 2.0 via the REST Gateway (T348731)]] (duration: 08m 17s)
[21:13:12] <logmsgbot>	 !log otto@deploy2002 Started deploy [analytics/refinery@25ef91f] (hadoop-test): deploying refinery with refinery-source  0.2.25 jars for T321854 - hadoop-test [analytics/refinery@25ef91f2]
[21:13:12] <logmsgbot>	 !log eevans@cumin1001 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[19-21,28,31].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
[21:13:24] <kindrobot>	 !log finish UTC late backport window
[21:13:31] <bpirkle>	 Thank you!
[21:13:54] <kindrobot>	 No problem :)
[21:15:13] <logmsgbot>	 !log eevans@cumin1001 START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[22-24,29,32].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
[21:15:47] <wikibugs>	 (PS2) Andrew Bogott: openstack::glance::service: Switch to systemd::sysuser [puppet] - https://gerrit.wikimedia.org/r/881887 (owner: Muehlenhoff)
[21:16:13] <wikibugs>	 (CR) CI reject: [V: -1] openstack::glance::service: Switch to systemd::sysuser [puppet] - https://gerrit.wikimedia.org/r/881887 (owner: Muehlenhoff)
[21:16:21] <logmsgbot>	 !log otto@deploy2002 Finished deploy [analytics/refinery@25ef91f] (hadoop-test): deploying refinery with refinery-source  0.2.25 jars for T321854 - hadoop-test [analytics/refinery@25ef91f2] (duration: 03m 10s)
[21:20:54] <wikibugs>	 (PS3) Andrew Bogott: openstack::glance::service: Switch to systemd::sysuser [puppet] - https://gerrit.wikimedia.org/r/881887 (owner: Muehlenhoff)
[21:24:01] <wikibugs>	 (PS1) Ottomata: test/refine - update refinery jar version for analytics test cluster refine job [puppet] - https://gerrit.wikimedia.org/r/972894 (https://phabricator.wikimedia.org/T321854)
[21:43:08] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack::glance::service: Switch to systemd::sysuser [puppet] - https://gerrit.wikimedia.org/r/881887 (owner: Muehlenhoff)
[21:48:17] <logmsgbot>	 !log eevans@cumin1001 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[22-24,29,32].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
[21:50:43] <logmsgbot>	 !log eevans@cumin1001 START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[25-27,30,33].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
[21:51:22] <wikibugs>	 (PS1) Dzahn: stewards: use git:clone parameters to shared repo among several users [puppet] - https://gerrit.wikimedia.org/r/972896 (https://phabricator.wikimedia.org/T344164)
[21:54:24] <wikibugs>	 (CR) Dzahn: [C: +2] stewards: use git:clone parameters to shared repo among several users [puppet] - https://gerrit.wikimedia.org/r/972896 (https://phabricator.wikimedia.org/T344164) (owner: Dzahn)
[21:54:30] <wikibugs>	 (PS2) Dzahn: stewards: use git:clone parameters to shared repo among several users [puppet] - https://gerrit.wikimedia.org/r/972896 (https://phabricator.wikimedia.org/T344164)
[21:54:49] <wikibugs>	 (PS3) Dzahn: stewards: use git:clone parameters to share repo among several users [puppet] - https://gerrit.wikimedia.org/r/972896 (https://phabricator.wikimedia.org/T344164)
[22:00:05] <jouncebot>	 Deploy window Wikifunction Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20231108T2200)
[22:02:51] <wikibugs>	 (CR) Btullis: [C: +1] "Looks great, thanks." [puppet] - https://gerrit.wikimedia.org/r/972851 (https://phabricator.wikimedia.org/T332604) (owner: Brouberol)
[22:04:05] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to stewards-users for urbanecm - https://phabricator.wikimedia.org/T350834 (Urbanecm)
[22:06:54] <wikibugs>	 (PS1) Milimetric: Update to latest snapshot [deployment-charts] - https://gerrit.wikimedia.org/r/972897
[22:07:09] <wikibugs>	 (CR) Milimetric: [C: +2] Update to latest snapshot [deployment-charts] - https://gerrit.wikimedia.org/r/972897 (owner: Milimetric)
[22:08:17] <wikibugs>	 (Merged) jenkins-bot: Update to latest snapshot [deployment-charts] - https://gerrit.wikimedia.org/r/972897 (owner: Milimetric)
[22:08:27] <wikibugs>	 (CR) Btullis: [C: +1] Update to latest snapshot [deployment-charts] - https://gerrit.wikimedia.org/r/972897 (owner: Milimetric)
[22:08:56] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart (java 11 sec updates) - ryankemper@cumin1001 - T350703
[22:09:10] <stashbot>	 T350703: Restart Elasticsearch services for java 11 updates - https://phabricator.wikimedia.org/T350703
[22:12:26] <logmsgbot>	 !log milimetric@deploy2002 helmfile [staging] START helmfile.d/services/edit-analytics: apply
[22:12:41] <logmsgbot>	 !log milimetric@deploy2002 helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
[22:12:55] <logmsgbot>	 !log milimetric@deploy2002 helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
[22:13:05] <logmsgbot>	 !log milimetric@deploy2002 helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
[22:13:08] <logmsgbot>	 !log milimetric@deploy2002 helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
[22:13:10] <logmsgbot>	 !log milimetric@deploy2002 helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
[22:13:16] <logmsgbot>	 !log milimetric@deploy2002 helmfile [codfw] START helmfile.d/services/edit-analytics: apply
[22:13:23] <logmsgbot>	 !log milimetric@deploy2002 helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
[22:13:33] <logmsgbot>	 !log milimetric@deploy2002 helmfile [staging] START helmfile.d/services/editor-analytics: apply
[22:14:04] <logmsgbot>	 !log milimetric@deploy2002 helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
[22:17:39] <icinga-wm>	 PROBLEM - thanos.wikimedia.org tls expiry on titan1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[22:17:41] <wikibugs>	 (PS1) Milimetric: Copy version from production service [deployment-charts] - https://gerrit.wikimedia.org/r/972904
[22:17:50] <wikibugs>	 (CR) Milimetric: [C: +2] Copy version from production service [deployment-charts] - https://gerrit.wikimedia.org/r/972904 (owner: Milimetric)
[22:18:07] <icinga-wm>	 PROBLEM - SSH on titan1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[22:18:23] <icinga-wm>	 PROBLEM - thanos.wikimedia.org requires authentication on titan1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[22:18:40] <wikibugs>	 (Merged) jenkins-bot: Copy version from production service [deployment-charts] - https://gerrit.wikimedia.org/r/972904 (owner: Milimetric)
[22:19:23] <icinga-wm>	 RECOVERY - SSH on titan1002 is OK: SSH OK - OpenSSH_9.2p1 Debian-2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[22:19:39] <icinga-wm>	 RECOVERY - thanos.wikimedia.org requires authentication on titan1002 is OK: HTTP OK: Status line output matched HTTP/1.1 302 - 544 bytes in 0.057 second response time https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[22:19:47] <wikibugs>	 (PS1) Hashar: Plugin to process Puppet Catalog Compiler results [software/gerrit] (deploy/wmf/stable-3.5) - https://gerrit.wikimedia.org/r/972906
[22:20:19] <icinga-wm>	 RECOVERY - thanos.wikimedia.org tls expiry on titan1002 is OK: OK - Certificate thanos-query.discovery.wmnet will expire on Mon 20 Nov 2023 08:56:00 PM GMT +0000. https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration
[22:23:28] <logmsgbot>	 !log milimetric@deploy2002 helmfile [staging] START helmfile.d/services/editor-analytics: apply
[22:23:40] <logmsgbot>	 !log milimetric@deploy2002 helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
[22:23:46] <logmsgbot>	 !log milimetric@deploy2002 helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
[22:24:14] <logmsgbot>	 !log milimetric@deploy2002 helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
[22:24:22] <logmsgbot>	 !log milimetric@deploy2002 helmfile [codfw] START helmfile.d/services/editor-analytics: apply
[22:24:36] <logmsgbot>	 !log eevans@cumin1001 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[25-27,30,33].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
[22:24:41] <logmsgbot>	 !log milimetric@deploy2002 helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
[22:25:20] <wikibugs>	 (CR) Milimetric: [C: +2] "I see now that version was only deployed to staging, so it makes me think I should revert.  For reference later, the version deployed to p" [deployment-charts] - https://gerrit.wikimedia.org/r/972904 (owner: Milimetric)
[22:27:02] <wikibugs>	 SRE, Infrastructure-Foundations, Stewards-and-global-tools, collaboration-services, vm-requests: 1 VMs requested for stewards - https://phabricator.wikimedia.org/T344164 (Dzahn) With the last puppet change above the git repo is now shared between users and there are no more warnings about the...
[22:27:37] <wikibugs>	 SRE, Infrastructure-Foundations, Stewards-and-global-tools, collaboration-services, vm-requests: 1 VMs requested for stewards - https://phabricator.wikimedia.org/T344164 (Dzahn) Open→Resolved
[22:28:32] <wikibugs>	 SRE, Infrastructure-Foundations, Stewards-and-global-tools, collaboration-services, vm-requests: VMs requested for stewards - https://phabricator.wikimedia.org/T344164 (Dzahn)
[22:39:10] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to stewards-users and group approver role for urbanecm - https://phabricator.wikimedia.org/T350834 (Dzahn)
[22:40:49] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to stewards-users and group approver role for urbanecm - https://phabricator.wikimedia.org/T350834 (Dzahn) In addition to adding urbanecm to the "stewards-users" group this is also a request to add him as the group approver for future group additions. I approve...
[22:44:19] <wikibugs>	 (PS1) Dzahn: admin: add Martin Urbanec as group approver for stewards-users [puppet] - https://gerrit.wikimedia.org/r/972909 (https://phabricator.wikimedia.org/T350834)
[22:44:40] <wikibugs>	 (CR) CI reject: [V: -1] admin: add Martin Urbanec as group approver for stewards-users [puppet] - https://gerrit.wikimedia.org/r/972909 (https://phabricator.wikimedia.org/T350834) (owner: Dzahn)
[22:45:54] <wikibugs>	 (PS2) Dzahn: admin: add Martin Urbanec as group approver for stewards-users [puppet] - https://gerrit.wikimedia.org/r/972909 (https://phabricator.wikimedia.org/T350834)
[22:51:52] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to stewards-users and group approver role for urbanecm - https://phabricator.wikimedia.org/T350834 (Dzahn) requestor has existing shell access so no need to worry about L3, NDA, keys.. just a group addition.
[22:57:20] <wikibugs>	 (PS1) Dzahn: admin: add urbanecm to stewards-users [puppet] - https://gerrit.wikimedia.org/r/972911 (https://phabricator.wikimedia.org/T350834)
[22:58:12] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:58:59] <wikibugs>	 (PS1) Zoranzoki21: DNM Test [mediawiki-config] - https://gerrit.wikimedia.org/r/972912
[22:59:51] <wikibugs>	 (Abandoned) Zoranzoki21: DNM Test [mediawiki-config] - https://gerrit.wikimedia.org/r/972912 (owner: Zoranzoki21)
[23:00:28] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to stewards-users and group approver role for urbanecm - https://phabricator.wikimedia.org/T350834 (Dzahn) @DMburugu Namely tells me you are Martin's manager. Would you approve this access please?
[23:28:18] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart (java 11 sec updates) - ryankemper@cumin1001 - T350703
[23:28:22] <stashbot>	 T350703: Restart Elasticsearch services for java 11 updates - https://phabricator.wikimedia.org/T350703
[23:51:18] <wikibugs>	 (PS1) Andrew Bogott: wmcs-backup: also remove expired backups via delete-expired [puppet] - https://gerrit.wikimedia.org/r/972915
[23:53:13] <jinxer-wm>	 (PuppetFailure) firing: Puppet has failed on lists1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[23:59:16] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: codfw parsoid GET/200: 4.549560901581813s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=codfw&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded