[00:21:30] FIRING: [4x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:38:23] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1083481
[00:38:23] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1083481 (owner: TrainBranchBot)
[01:08:23] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1083484
[01:08:23] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1083484 (owner: TrainBranchBot)
[01:10:42] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1083481 (owner: TrainBranchBot)
[01:18:39] FIRING: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[01:37:57] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1083484 (owner: TrainBranchBot)
[02:37:16] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:51:04] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/1083486
(https://phabricator.wikimedia.org/T128546)
[03:02:16] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:21:30] FIRING: [4x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[04:30:53] (CR) Santhosh: [C:+1] Disable MT in Content Translation on Lithuanian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1083292 (https://phabricator.wikimedia.org/T364073) (owner: KartikMistry)
[05:10:42] (CR) Anzx: "as suggested on https://phabricator.wikimedia.org/T377648#10260983 you may also need to add `wgMinervaTalkAtTop` https://gerrit.wikime" [mediawiki-config] - https://gerrit.wikimedia.org/r/1083294 (https://phabricator.wikimedia.org/T377648) (owner: Hamish)
[05:18:39] FIRING: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[05:58:29] (PS2) Majavah: Drop labtestwiki config [mediawiki-config] - https://gerrit.wikimedia.org/r/1083304 (https://phabricator.wikimedia.org/T378260)
[05:58:29] (PS2) Majavah: Stop building LdapAuthentication i10n [mediawiki-config] - https://gerrit.wikimedia.org/r/1083305 (https://phabricator.wikimedia.org/T371592)
[05:58:30] (PS1) Majavah: Drop 'nonglobal' dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/1083493
[05:59:10] (CR) Majavah: "yeah I think we can drop it:
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1083493" [mediawiki-config] - https://gerrit.wikimedia.org/r/1083304 (https://phabricator.wikimedia.org/T378260) (owner: Majavah)
[06:03:27] !log taavi@cumin1002 dbctl commit (dc=all): 'depool db1169', diff saved to https://phabricator.wikimedia.org/P70590 and previous config saved to /var/cache/conftool/dbconfig/20241028-060327-taavi.json
[06:03:36] ^ replication broken, will create a task
[06:05:00] T378320
[06:05:00] T378320: db1169 replication broken - https://phabricator.wikimedia.org/T378320
[06:06:55] !log taavi@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: replication broken T378320
[06:07:09] !log taavi@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: replication broken T378320
[06:15:29] (CR) RhinosF1: [C:+1] Drop 'nonglobal' dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/1083493 (owner: Majavah)
[06:16:20] (CR) RhinosF1: [C:+1] Drop labtestwiki config [mediawiki-config] - https://gerrit.wikimedia.org/r/1083304 (https://phabricator.wikimedia.org/T378260) (owner: Majavah)
[06:20:59] (PS1) Majavah: P:mediawiki: Stop trying to run jobs on s11 [puppet] - https://gerrit.wikimedia.org/r/1083584 (https://phabricator.wikimedia.org/T378260)
[06:21:00] (PS1) Majavah: Drop config for serving labtestwiki [puppet] - https://gerrit.wikimedia.org/r/1083585 (https://phabricator.wikimedia.org/T378260)
[06:21:02] (PS1) Majavah: Drop support for s11 MariaDB section [puppet] - https://gerrit.wikimedia.org/r/1083586 (https://phabricator.wikimedia.org/T378260)
[06:29:03] (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4393/co" [puppet] - https://gerrit.wikimedia.org/r/1083584 (https://phabricator.wikimedia.org/T378260) (owner:
Majavah)
[06:31:05] (PS1) Majavah: Drop labtestwikitech return traffic term [homer/public] - https://gerrit.wikimedia.org/r/1083589 (https://phabricator.wikimedia.org/T378260)
[06:59:01] (PS1) Kosta Harlan: ContributionsPager: Fix getTemplateParams() parameter [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132)
[06:59:09] (PS1) Kosta Harlan: Fix getTemplateParams() $classes parameter [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132)
[06:59:16] (CR) CI reject: [V:-1] ContributionsPager: Fix getTemplateParams() parameter [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[06:59:19] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 28 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[06:59:30] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 28 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:00:05] Amir1, Urbanecm, and awight: How many deployers does it take to do UTC morning backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241028T0700).
[07:00:05] kostajh: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:11] hello
[07:02:07] I'll self-serve my deployment
[07:02:08] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 28 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - https://gerrit.wikimedia.org/r/1083292 (https://phabricator.wikimedia.org/T364073) (owner: KartikMistry)
[07:03:51] kostajh: let me know when your patch is done. I forgot to add my patch, did that manually.
[07:03:57] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:03:57] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:04:21] FIRING: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:04:44] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:04:44] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:05:10] kart_: ah, ok.
Sure, will let you know
[07:05:32] (CR) EarlyWarningBot: "Failed command: "composer run --timeout=0 phpunit:parallel:database --"" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:09:21] RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:10:12] (CR) EarlyWarningBot: "Failed command: "composer run --timeout=0 phpunit:parallel:database --"" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:10:39] (CR) EarlyWarningBot: "Failed command: "composer run --timeout=0 phpunit:parallel:database --"" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:12:08] Amir1 / urbanecm would it be OK to force merge https://gerrit.wikimedia.org/r/1083592, because the failing test issue is compatibility with an update to parameters defined in a core patch that I am also backporting now?
[07:13:51] The error is `Declaration of MediaWiki\CheckUser\GlobalContributions\GlobalContributionsPager::getTemplateParams($row, &$classes) should be compatible with MediaWiki\Pager\ContributionsPager::getTemplateParams($row, $classes)`
[07:14:08] !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db[1169,1234].eqiad.wmnet with reason: maintenance
[07:14:12] !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[1169,1234].eqiad.wmnet with reason: maintenance
[07:14:16] it looks like that also occurred on the patch on `master` and a `recheck` fixed the issue https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1082858
[07:14:20] cc hashar
[07:14:53] (CR) EarlyWarningBot: "Failed command: "composer run --timeout=0 phpunit:parallel:database --"" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:16:06] (CR) CI reject: [V:-1] Fix getTemplateParams() $classes parameter [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:16:31] It doesn't make sense to me why a recheck would fix that type of thing.
[07:17:08] (PS2) Kosta Harlan: Fix getTemplateParams() $classes parameter [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132)
[07:17:13] (PS2) Kosta Harlan: ContributionsPager: Fix getTemplateParams() parameter [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132)
[07:17:16] (CR) CI reject: [V:-1] ContributionsPager: Fix getTemplateParams() parameter [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:18:24] (PS3) Kosta Harlan: Fix getTemplateParams() $classes parameter [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132)
[07:19:05] (PS4) Kosta Harlan: Fix getTemplateParams() $classes parameter [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132)
[07:19:20] (PS5) Kosta Harlan: Fix getTemplateParams() $classes parameter [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132)
[07:19:35] (PS3) Kosta Harlan: ContributionsPager: Fix getTemplateParams() parameter [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132)
[07:19:47] (CR) TrainBranchBot: "Approved by kharlan@deploy2002 using scap backport" [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:19:47] (CR) TrainBranchBot: "Approved by kharlan@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:20:18] given that recheck worked on the master branch,
I'll see if trying the backport process again sorts this out
[07:20:49] kart_: if it doesn't work this time, I'll abandon my attempt and hand over to you
[07:20:59] Ack
[07:25:52] (CR) EarlyWarningBot: "Failed command: "composer run --timeout=0 phpunit:parallel:database --"" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:27:16] kart_: bah, still failing, and I'm not sure about force merging. I don't know how to gracefully exit `scap backport` so we might have to wait for tests to finish running
[07:28:13] No worries. We can wait.
[07:28:43] I'll be back in 10 minutes.
[07:32:09] (CR) EarlyWarningBot: "Failed command: "composer run --timeout=0 phpunit:parallel:database --"" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:38:05] (CR) Brouberol: [C:+1] datahubsearch: Avoid Ferm-specific syntax [puppet] - https://gerrit.wikimedia.org/r/1083149 (owner: Muehlenhoff)
[07:38:39] (CR) CI reject: [V:-1] Fix getTemplateParams() $classes parameter [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:38:41] (CR) CI reject: [V:-1] ContributionsPager: Fix getTemplateParams() parameter [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[07:38:53] (CR) Brouberol: [C:+1] "Nice, thanks!"
[puppet] - https://gerrit.wikimedia.org/r/1083157 (owner: Muehlenhoff)
[07:40:31] kart_: I think you can go ahead
[07:42:35] (CR) Elukey: [C:+2] tox: add Jenkins settings to reduce its execution time (1 comment) [software/pywmflib] - https://gerrit.wikimedia.org/r/1082524 (owner: Elukey)
[07:42:52] kostajh: thanks
[07:42:55] (Abandoned) Elukey: tests: fix outstanding CI issues [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/1082501 (owner: Elukey)
[07:43:47] (CR) TrainBranchBot: [C:+2] "Approved by kartik@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1083292 (https://phabricator.wikimedia.org/T364073) (owner: KartikMistry)
[07:44:30] (Merged) jenkins-bot: Disable MT in Content Translation on Lithuanian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1083292 (https://phabricator.wikimedia.org/T364073) (owner: KartikMistry)
[07:45:31] !log kartik@deploy2002 Started scap sync-world: Backport for [[gerrit:1083292|Disable MT in Content Translation on Lithuanian Wikipedia (T364073)]]
[07:45:45] T364073: Disable machine translation in Content Translation Tool on Lithuanian Wikipedia - https://phabricator.wikimedia.org/T364073
[07:55:56] (CR) Ayounsi: [C:+1] Drop labtestwikitech return traffic term [homer/public] - https://gerrit.wikimedia.org/r/1083589 (https://phabricator.wikimedia.org/T378260) (owner: Majavah)
[07:56:13] !log kartik@deploy2002 kartik: Backport for [[gerrit:1083292|Disable MT in Content Translation on Lithuanian Wikipedia (T364073)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[07:56:30] T364073: Disable machine translation in Content Translation Tool on Lithuanian Wikipedia - https://phabricator.wikimedia.org/T364073
[07:57:04] !log kartik@deploy2002 kartik: Continuing with sync
[07:58:11] PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv6: Connect - HE, AS6939/IPv4: Connect - HE
https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:59:49] ops-eqiad, SRE, DBA, DC-Ops: db1234 crashed - https://phabricator.wikimedia.org/T378267#10266587 (ABran-WMF) Open→In progress p:Triage→Medium It seems it has a faulty memory stick: A critical diagnostic event occurred in the memory device at A6. Contact your service provider for...
[08:00:43] ops-eqiad, SRE, DBA, DC-Ops: db1234 crashed - faulty memory stick on A6 (0x4E42) - https://phabricator.wikimedia.org/T378267#10266593 (ABran-WMF)
[08:01:02] !log Restarted CI Jenkins to update the Collapsible Sections plugin | T378327
[08:01:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:01:16] T378327: Modernize Jenkins Collapsible Console Sections plugin UI - https://phabricator.wikimedia.org/T378327
[08:01:25] !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1234.eqiad.wmnet with reason: maintenance T378267
[08:01:28] !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1234.eqiad.wmnet with reason: maintenance T378267
[08:01:37] T378267: db1234 crashed - faulty memory stick on A6 (0x4E42) - https://phabricator.wikimedia.org/T378267
[08:06:41] ops-eqiad, SRE, DBA, DC-Ops: db1234 crashed - faulty memory stick on A6 (0x4E42) - https://phabricator.wikimedia.org/T378267#10266601 (taavi) fwiw, that slot has a bit of a history: T363102
[08:07:09] (CR) Ayounsi: [C:+1] "lgtm! ideally let's have a 2nd pair of eyes review it."
[puppet] - https://gerrit.wikimedia.org/r/1082288 (https://phabricator.wikimedia.org/T376949) (owner: JHathaway)
[08:07:56] !log kartik@deploy2002 Finished scap sync-world: Backport for [[gerrit:1083292|Disable MT in Content Translation on Lithuanian Wikipedia (T364073)]] (duration: 22m 24s)
[08:08:07] T364073: Disable machine translation in Content Translation Tool on Lithuanian Wikipedia - https://phabricator.wikimedia.org/T364073
[08:10:58] kostajh: sorry the deployment window on [[Deployments]] is off by one hour due to Daylight Saving time madness
[08:11:09] it is tied to the US
[08:11:11] ops-eqiad, SRE, DBA, DC-Ops: db1234 crashed - faulty memory stick on A6 (0x4E42) - https://phabricator.wikimedia.org/T378267#10266607 (ABran-WMF) ah indeed good catch @taavi thanks >>! In T363102#9758021, @VRiley-WMF wrote: > This DIMM (A6) has been replaced and the server has been powered back...
[08:11:53] (CR) Majavah: [C:+2] Drop labtestwikitech return traffic term [homer/public] - https://gerrit.wikimedia.org/r/1083589 (https://phabricator.wikimedia.org/T378260) (owner: Majavah)
[08:12:18] kostajh: and yes please do merge them
[08:12:25] (Merged) jenkins-bot: Drop labtestwikitech return traffic term [homer/public] - https://gerrit.wikimedia.org/r/1083589 (https://phabricator.wikimedia.org/T378260) (owner: Majavah)
[08:12:37] (CR) Hashar: [C:+1] ContributionsPager: Fix getTemplateParams() parameter [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[08:12:42] (CR) Hashar: [C:+1] Fix getTemplateParams() $classes parameter [extensions/CheckUser] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083592 (https://phabricator.wikimedia.org/T378132) (owner: Kosta Harlan)
[08:12:58] hashar: can I still do this now?
[08:13:02] yes
[08:13:11] I will change the hours
[08:13:12] hashar: my patch is done.
[08:13:49] hashar: what do I need to do, manually set +2 on CR and Verified? Wait for merge, then run `scap backport` ?
[08:19:26] !log Changed UTC morning backport window from 00:00 SF to 09:00 CET (aka 08:00 UTC) | UTC morning backport window
[08:19:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:43] kostajh: hmm let me recheck
[08:19:56] I will push them
[08:21:26] thx
[08:21:30] FIRING: [4x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:23:50] (PS4) Kosta Harlan: ContributionsPager: Fix getTemplateParams() parameter [core] (wmf/1.43.0-wmf.28) - https://gerrit.wikimedia.org/r/1083591 (https://phabricator.wikimedia.org/T378132)
[08:24:14] !log Pushed https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1083592 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1083592 for wmf/1.43.0-wmf.28 / T378132 due to a dependency loop
[08:24:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:26] T378132: mw-contributions-current css class missing from Special:Contributions - https://phabricator.wikimedia.org/T378132
[08:24:41] kostajh: done!
[08:24:51] hashar: thank you!
[08:25:44] my guess is that CheckUser got broken because patches made to core do not run its tests
[08:26:37] hashar: are you syncing those patches now as well?
or should I do that with `scap backport`
[08:26:53] please do it :)
[08:27:00] I think you should be able to sync both at the same time
[08:27:36] !log Pushed https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1083592 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1083591 for wmf/1.43.0-wmf.28 / T378132 due to a dependency loop
[08:27:40] damn copy paster
[08:27:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:00] hashar: ok working on that now
[08:28:29] !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1083591|ContributionsPager: Fix getTemplateParams() parameter (T378132)]], [[gerrit:1083592|Fix getTemplateParams() $classes parameter (T378132)]]
[08:31:01] (PS1) Stevemunene: Add dummy keytabs for new presto hosts [labs/private] - https://gerrit.wikimedia.org/r/1083755 (https://phabricator.wikimedia.org/T374924)
[08:31:11] PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:31:25] !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1083591|ContributionsPager: Fix getTemplateParams() parameter (T378132)]], [[gerrit:1083592|Fix getTemplateParams() $classes parameter (T378132)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:31:47] T378132: mw-contributions-current css class missing from Special:Contributions - https://phabricator.wikimedia.org/T378132
[08:33:08] (PS2) Stevemunene: Add dummy keytabs for new presto hosts [labs/private] - https://gerrit.wikimedia.org/r/1083755 (https://phabricator.wikimedia.org/T374924)
[08:33:27] !log kharlan@deploy2002 kharlan: Continuing with sync
[08:36:11] RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 71, down: 0, dormant: 0, excluded: 0, unused: 0
https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:38:07] !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1083591|ContributionsPager: Fix getTemplateParams() parameter (T378132)]], [[gerrit:1083592|Fix getTemplateParams() $classes parameter (T378132)]] (duration: 09m 38s)
[08:38:19] T378132: mw-contributions-current css class missing from Special:Contributions - https://phabricator.wikimedia.org/T378132
[08:38:26] (PS1) Stevemunene: Add new presto hosts to presto cluster [puppet] - https://gerrit.wikimedia.org/r/1083756 (https://phabricator.wikimedia.org/T374924)
[08:38:34] hashar: done
[08:38:45] !log UTC morning deploys done
[08:38:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:08] kostajh: congrats!
[08:42:10] !log T378227: deleting broken cirrus titlesugest index dewiki_titlesuggest_1729824440
[08:42:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:17] T378227: Investigate failed Cirrus index build services on mwmaint2002 (WIP) - https://phabricator.wikimedia.org/T378227
[08:54:23] (CR) Muehlenhoff: [C:+2] ganeti-test: Enable puppet-managed /var/lib/ganeti/known_hosts for the role [puppet] - https://gerrit.wikimedia.org/r/1083165 (https://phabricator.wikimedia.org/T309724) (owner: Muehlenhoff)
[08:54:58] (PS1) Majavah: P:openstack: Remove apache httpd from cloudweb servers [puppet] - https://gerrit.wikimedia.org/r/1083757 (https://phabricator.wikimedia.org/T371378)
[08:56:17] (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4394/co" [puppet] - https://gerrit.wikimedia.org/r/1083757 (https://phabricator.wikimedia.org/T371378) (owner: Majavah)
[09:03:02] SRE, SRE-Access-Requests: Requesting access to 'deployment' for 'Joely Rooke WMDE' - https://phabricator.wikimedia.org/T378082#10266693
(JoelyRooke-WMDE) Hi all, I believe I have previously signed the NDA when I got basic LDAP access (https://phabricator.wikimedia.org/T366145) and confirmed when I got ac...
[09:07:23] SRE, LDAP-Access-Requests: Grant Access to ldap/nda for Deepesha Burse WMDE - https://phabricator.wikimedia.org/T378182#10266713 (Deepesha_WMDE) >>! In T378182#10263534, @KFrancis wrote: > Please provide Deepesha Burse's email address and I will process the NDA. Thanks! Email address: deepesha.burse@wi...
[09:11:29] !log Restarted CI Jenkins for plugin update - T378327
[09:11:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:42] T378327: Modernize Jenkins Collapsible Console Sections plugin UI - https://phabricator.wikimedia.org/T378327
[09:17:34] SRE-tools, Spicerack: Puppet module hiera_lookup not working - https://phabricator.wikimedia.org/T378331 (Volans) NEW p:Triage→High
[09:18:39] FIRING: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[09:21:28] (CR) Vgutierrez: [C:-1] haproxykafka: haproxykafka module (5 comments) [puppet] - https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) (owner: Fabfur)
[09:21:28] !log cmooney@cumin1002 START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
[09:24:23] SRE-tools, Infrastructure-Foundations, Spicerack: Puppet module hiera_lookup not working - https://phabricator.wikimedia.org/T378331#10266765 (Devrepo)
[09:32:01] ops-eqiad, SRE, Data-Persistence, Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10266787 (ABran-WMF) I've tried to reproduce what's been done in T355269 which is
quite close to what we're doing here. I might be lacking some info t...
[09:32:11] ops-codfw, SRE, Data-Persistence, DC-Ops, Patch-For-Review: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10266794 (ABran-WMF)
[09:37:40] ops-codfw, SRE, Data-Persistence, Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10266801 (ABran-WMF)
[09:41:35] (PS2) Arnaudb: mariadb: add 12 new es hosts [puppet] - https://gerrit.wikimedia.org/r/1083758 (https://phabricator.wikimedia.org/T378143)
[09:41:35] (CR) Arnaudb: "I'm unsure if preseed is needed here, please lmk if not!" [puppet] - https://gerrit.wikimedia.org/r/1083758 (https://phabricator.wikimedia.org/T378143) (owner: Arnaudb)
[09:47:06] (CR) Vgutierrez: [C:-1] haproxykafka: haproxykafka module (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) (owner: Fabfur)
[09:47:44] (PS1) Slyngshede: IDM: Redis dummy password [labs/private] - https://gerrit.wikimedia.org/r/1083764 (https://phabricator.wikimedia.org/T377728)
[09:48:55] (PS2) Slyngshede: IDM: Redis dummy password [labs/private] - https://gerrit.wikimedia.org/r/1083764 (https://phabricator.wikimedia.org/T377728)
[09:52:21] (CR) Vgutierrez: [C:-1] haproxykafka: profile and hiera files (7 comments) [puppet] - https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) (owner: Fabfur)
[09:53:57] RECOVERY - MariaDB Replica SQL: s1 on db1169 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[09:54:36] (PS3) Slyngshede: IDP: Redis dummy password [labs/private] - https://gerrit.wikimedia.org/r/1083764 (https://phabricator.wikimedia.org/T377728)
[09:59:15] SRE, Infrastructure-Foundations, netops: Ganeti network config results in additional a6uto-conf IPv6
address - https://phabricator.wikimedia.org/T378335 (cmooney) NEW p:Triage→Medium
[10:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241028T1000)
[10:00:20] (CR) Fabfur: haproxykafka: haproxykafka module (12 comments) [puppet] - https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) (owner: Fabfur)
[10:04:07] (CR) Slyngshede: [C:+2] Navigation: Fix anonymous check. [software/bitu] - https://gerrit.wikimedia.org/r/1082782 (owner: Slyngshede)
[10:06:50] (Merged) jenkins-bot: Navigation: Fix anonymous check. [software/bitu] - https://gerrit.wikimedia.org/r/1082782 (owner: Slyngshede)
[10:12:44] !log updated spicerack to v8.15.1 on cumin2002
[10:12:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:30] (PS1) Tiziano Fogli: add gmodena to dumps-root [puppet] - https://gerrit.wikimedia.org/r/1083766 (https://phabricator.wikimedia.org/T377773)
[10:20:18] (PS1) Muehlenhoff: Assign the ganeti role to ganeti2041/ganeti2042 [puppet] - https://gerrit.wikimedia.org/r/1083767 (https://phabricator.wikimedia.org/T376594)
[10:20:45] (CR) Tiziano Fogli: "We received the necessary approvals." [puppet] - https://gerrit.wikimedia.org/r/1083766 (https://phabricator.wikimedia.org/T377773) (owner: Tiziano Fogli)
[10:21:15] (CR) Hamish: "After my research on minerva manual and codew, I consider this patch is fine to close the ticket. I'll do debug when deploy, thanks!"
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083294 (https://phabricator.wikimedia.org/T377648) (owner: 10Hamish) [10:21:31] (03CR) 10Muehlenhoff: [C:03+2] Assign the ganeti role to ganeti2041/ganeti2042 [puppet] - 10https://gerrit.wikimedia.org/r/1083767 (https://phabricator.wikimedia.org/T376594) (owner: 10Muehlenhoff) [10:24:28] (03CR) 10Vgutierrez: [C:04-1] haproxykafka: haproxykafka module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [10:25:14] (03CR) 10Ayounsi: [C:03+1] "Ack" [homer/public] - 10https://gerrit.wikimedia.org/r/1082716 (https://phabricator.wikimedia.org/T378070) (owner: 10Cathal Mooney) [10:25:45] (03PS1) 10Slyngshede: P:idp add Redis database and password configuration. [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) [10:26:12] (03CR) 10Slyngshede: [V:03+2 C:03+2] IDP: Redis dummy password [labs/private] - 10https://gerrit.wikimedia.org/r/1083764 (https://phabricator.wikimedia.org/T377728) (owner: 10Slyngshede) [10:26:51] (03PS1) 10MVernon: cephadm: bump fs.aio-max-nr and kernel.pid_max [puppet] - 10https://gerrit.wikimedia.org/r/1083769 (https://phabricator.wikimedia.org/T279621) [10:26:52] (03PS1) 10MVernon: cephadm::osd fix comment typo (nfc) [puppet] - 10https://gerrit.wikimedia.org/r/1083770 [10:27:17] (03Abandoned) 10Slyngshede: P:idp Make Redis database number configurable. 
[puppet] - 10https://gerrit.wikimedia.org/r/1082711 (https://phabricator.wikimedia.org/T377937) (owner: 10Slyngshede) [10:27:22] (03CR) 10MVernon: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1083769 (https://phabricator.wikimedia.org/T279621) (owner: 10MVernon) [10:28:20] !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet [10:29:03] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4395/console" [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) (owner: 10Slyngshede) [10:29:17] !log cmooney@cumin1002 START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet [10:29:53] (03PS2) 10Fabfur: haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) [10:30:14] 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Give Dumps 1.0 access to gmodena - https://phabricator.wikimedia.org/T377773#10267004 (10tappof) 05Open→03Stalled Waiting for the patch review. 
[10:30:25] PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [10:30:28] (03CR) 10CI reject: [V:04-1] haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [10:31:07] PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [10:32:01] RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 52777 bytes in 4.656 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [10:32:15] RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.184 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [10:34:18] !log cmooney@cumin1002 START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet [10:34:38] !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet [10:34:38] jouncebot: nowandnext [10:34:38] For the next 0 hour(s) and 25 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241028T1000) [10:34:38] In 2 hour(s) and 25 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241028T1300) [10:34:49] !log cmooney@cumin1002 START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet [10:35:09] (03PS1) 10Dreamy Jazz: Specify wiki ID to ::getId call in GlobalBlockingHandler [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083772 (https://phabricator.wikimedia.org/T378085) [10:35:17] !log uploaded ircstream 0.13.0+wmf12u3 to apt.wikimedia.org (includes a fix which should hopefully reduce connection errors with bots using smart4irc) [10:35:20] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:47] (03CR) 10Dreamy Jazz: "Backporting because there has been an error on WMF production that this would fix." [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083772 (https://phabricator.wikimedia.org/T378085) (owner: 10Dreamy Jazz) [10:36:25] !log T378227: rebuilding dewiki_titlesuggest [10:36:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:41] (03CR) 10Jcrespo: [C:03+1] cephadm::osd fix comment typo (nfc) [puppet] - 10https://gerrit.wikimedia.org/r/1083770 (owner: 10MVernon) [10:36:44] (03PS1) 10Tiziano Fogli: add kartig to deploy-ml-services group [puppet] - 10https://gerrit.wikimedia.org/r/1083773 (https://phabricator.wikimedia.org/T376585) [10:36:59] T378227: Investigate failed Cirrus index build services on mwmaint2002 (WIP) - https://phabricator.wikimedia.org/T378227 [10:37:38] (03PS3) 10Fabfur: haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) [10:39:29] (03CR) 10Vgutierrez: haproxykafka: haproxykafka module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [10:39:33] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4396/co" [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) (owner: 10Slyngshede) [10:39:36] (03CR) 10Vgutierrez: [C:04-1] haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [10:39:50] !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet [10:40:18] (03CR) 10Tiziano Fogli: "We received the necessary approvals." 
[puppet] - 10https://gerrit.wikimedia.org/r/1083773 (https://phabricator.wikimedia.org/T376585) (owner: 10Tiziano Fogli) [10:41:30] 06SRE, 06Infrastructure-Foundations, 10netops: Ganeti network config results in additional a6uto-conf IPv6 address - https://phabricator.wikimedia.org/T378335#10267055 (10cmooney) [10:41:39] (03PS10) 10Vgutierrez: liberica: provide a liberica module [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) [10:45:05] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083772 (https://phabricator.wikimedia.org/T378085) (owner: 10Dreamy Jazz) [10:45:24] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4397/co" [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) (owner: 10Slyngshede) [10:45:29] 06SRE, 10SRE-Access-Requests, 06Machine-Learning-Team, 10LPL Essential (LPL Essential 2024 Jul-Sep), 13Patch-For-Review: Access to deploy recommendation API ML service for kartik - https://phabricator.wikimedia.org/T376585#10267059 (10tappof) Waiting for the patch review. [10:45:30] 06SRE, 06Infrastructure-Foundations, 10netops: Ganeti network config results in additional a6uto-conf IPv6 address - https://phabricator.wikimedia.org/T378335#10267071 (10cmooney) It seems that changing the "ip token" command from "pre-up" to "up" in /etc/network/interfaces makes things work as expected. Re... [10:46:00] !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet [10:46:18] !log cmooney@cumin1002 START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet [10:46:29] (03PS2) 10Slyngshede: P:idp add Redis database and password configuration. 
[puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) [10:47:13] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4398/console" [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) (owner: 10Slyngshede) [10:48:17] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4399/co" [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) (owner: 10Slyngshede) [10:49:08] 10ops-codfw, 06SRE, 06Data-Persistence, 10Data-Persistence-Backup, 06DC-Ops: Q1:rack/setup/install backup2012 - https://phabricator.wikimedia.org/T371984#10267075 (10jcrespo) No need to be sorry. I was a bit pushy about it because we are in a bit of a hurry due to the parent ticket, as we were soon r... [10:50:10] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4400/console" [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) (owner: 10Slyngshede) [10:50:34] !log elukey@puppetmaster1001:~$ sudo puppet cert destroy puppetboard.discovery.wmnet [10:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:51:34] 06SRE, 06Infrastructure-Foundations, 10netops: Ganeti network config results in additional a6uto-conf IPv6 address - https://phabricator.wikimedia.org/T378335#10267081 (10ayounsi) There might be some edge cases, but I think ideally we should disable the [[ https://sysctl-explorer.net/net/ipv6/autoconf/ | aut... 
[10:51:37] !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet [10:51:53] (03PS11) 10Vgutierrez: liberica: provide a liberica module [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) [10:53:35] (03CR) 10Vgutierrez: liberica: provide a liberica module (0319 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [10:56:30] (03CR) 10Vgutierrez: liberica: provide a liberica module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [10:57:11] (03CR) 10Muehlenhoff: [C:03+2] datahubsearch: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/1083149 (owner: 10Muehlenhoff) [10:57:13] (03PS4) 10Fabfur: haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) [10:57:27] (03PS7) 10Vgutierrez: profile: Provide a liberica profile [puppet] - 10https://gerrit.wikimedia.org/r/1081372 (https://phabricator.wikimedia.org/T377127) [10:58:54] !log Ran `DROP TABLE /*_*/globalblocks` on all beta wikis (excluding the centralauth DB) - T377742 [10:58:56] (03PS2) 10Fabfur: haproxykafka: profile and hiera files [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) [10:58:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:58] T377742: The 'globalblocks' table exists on local beta DBs - https://phabricator.wikimedia.org/T377742 [11:00:15] (03CR) 10Elukey: "My only worry with -next suffixes is that if the move takes a long time, we may end up in a situation in which a new OS version is out (sa" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1081989 (https://phabricator.wikimedia.org/T378128) (owner: 10Scott French) [11:02:26] (03CR) 10Dreamy Jazz: 
[C:03+2] Specify wiki ID to ::getId call in GlobalBlockingHandler [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083772 (https://phabricator.wikimedia.org/T378085) (owner: 10Dreamy Jazz) [11:02:35] (03CR) 10Dreamy Jazz: Specify wiki ID to ::getId call in GlobalBlockingHandler [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083772 (https://phabricator.wikimedia.org/T378085) (owner: 10Dreamy Jazz) [11:02:43] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083772 (https://phabricator.wikimedia.org/T378085) (owner: 10Dreamy Jazz) [11:03:26] (03CR) 10CI reject: [V:04-1] Specify wiki ID to ::getId call in GlobalBlockingHandler [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083772 (https://phabricator.wikimedia.org/T378085) (owner: 10Dreamy Jazz) [11:03:41] (03CR) 10Dreamy Jazz: [C:03+2] Specify wiki ID to ::getId call in GlobalBlockingHandler [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083772 (https://phabricator.wikimedia.org/T378085) (owner: 10Dreamy Jazz) [11:03:58] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083772 (https://phabricator.wikimedia.org/T378085) (owner: 10Dreamy Jazz) [11:05:52] !log updated spicerack to v8.15.1 on cumin1002 [11:05:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:06:16] (03PS3) 10Brouberol: kerberos: support binding additional domains to the WIKIMEDIA realm [puppet] - 10https://gerrit.wikimedia.org/r/1083775 (https://phabricator.wikimedia.org/T375716) [11:08:01] (03CR) 10Jcrespo: [C:03+1] cephadm: bump fs.aio-max-nr and kernel.pid_max [puppet] - 10https://gerrit.wikimedia.org/r/1083769 (https://phabricator.wikimedia.org/T279621) 
(owner: 10MVernon) [11:09:05] (03CR) 10MVernon: [C:03+2] cephadm: bump fs.aio-max-nr and kernel.pid_max [puppet] - 10https://gerrit.wikimedia.org/r/1083769 (https://phabricator.wikimedia.org/T279621) (owner: 10MVernon) [11:09:12] (03CR) 10MVernon: [C:03+2] cephadm::osd fix comment typo (nfc) [puppet] - 10https://gerrit.wikimedia.org/r/1083770 (owner: 10MVernon) [11:10:06] FIRING: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_badtoken) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures [11:12:01] (03CR) 10Brouberol: [C:03+1] Add dummy keytabs for new presto hosts [labs/private] - 10https://gerrit.wikimedia.org/r/1083755 (https://phabricator.wikimedia.org/T374924) (owner: 10Stevemunene) [11:14:26] RECOVERY - Disk space on archiva1002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops [11:15:06] RESOLVED: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_badtoken) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures [11:15:29] (03PS5) 10Fabfur: haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) [11:15:32] (03PS8) 10Vgutierrez: profile: Provide a liberica profile [puppet] - 10https://gerrit.wikimedia.org/r/1081372 (https://phabricator.wikimedia.org/T377127) [11:15:59] (03CR) 10Fabfur: haproxykafka: haproxykafka module (0310 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [11:16:20] 
(03PS3) 10Fabfur: haproxykafka: profile and hiera files [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) [11:17:10] (03CR) 10Vgutierrez: profile: Provide a liberica profile (0310 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1081372 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [11:19:07] (03CR) 10Fabfur: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [11:21:23] (03PS9) 10Vgutierrez: profile: Provide a liberica profile [puppet] - 10https://gerrit.wikimedia.org/r/1081372 (https://phabricator.wikimedia.org/T377127) [11:21:40] (03CR) 10Vgutierrez: "I've applied the Stdlib::Http::Status and Stdlib::IP::Address::V[46]::Nosubnet suggestions to service_from_wmflib() as well" [puppet] - 10https://gerrit.wikimedia.org/r/1081372 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [11:22:56] (03Merged) 10jenkins-bot: Specify wiki ID to ::getId call in GlobalBlockingHandler [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083772 (https://phabricator.wikimedia.org/T378085) (owner: 10Dreamy Jazz) [11:23:11] !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1083772|Specify wiki ID to ::getId call in GlobalBlockingHandler (T378085)]] [11:23:17] T378085: /wiki/Special:GlobalBlock Wikimedia\Assert\PreconditionException: Expected MediaWiki\User\UserIdentityValue to belong to the local wiki, but it belongs to 'commonswiki' - https://phabricator.wikimedia.org/T378085 [11:23:40] (03PS1) 10Elukey: role::aux_k8s::{master,worker}: add support for containerd [puppet] - 10https://gerrit.wikimedia.org/r/1083776 (https://phabricator.wikimedia.org/T378345) [11:24:56] 10SRE-tools, 06Infrastructure-Foundations: redfish: minimum version support - https://phabricator.wikimedia.org/T328593#10267253 (10ayounsi) If I understand correctly, this task is about upgrading iDRAC 
to be able to upgrade iDRAC or other firmware more easily in the future. If that's the case I don't think w... [11:25:27] !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:1083772|Specify wiki ID to ::getId call in GlobalBlockingHandler (T378085)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [11:25:41] !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync [11:25:41] (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4403/console" [puppet] - 10https://gerrit.wikimedia.org/r/1083776 (https://phabricator.wikimedia.org/T378345) (owner: 10Elukey) [11:26:20] 06SRE, 06Infrastructure-Foundations, 10netops: Ganeti network config results in additional auto-conf IPv6 address - https://phabricator.wikimedia.org/T378335#10267264 (10cmooney) [11:26:35] 06SRE, 06Infrastructure-Foundations, 10netops: Ganeti network config results in additional auto-conf IPv6 address - https://phabricator.wikimedia.org/T378335#10267269 (10cmooney) [11:26:40] (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4404/co" [puppet] - 10https://gerrit.wikimedia.org/r/1083776 (https://phabricator.wikimedia.org/T378345) (owner: 10Elukey) [11:26:46] 06SRE, 10SRE-Access-Requests, 06Machine-Learning-Team, 10LPL Essential (LPL Essential 2024 Jul-Sep), 13Patch-For-Review: Access to deploy recommendation API ML service for kartik - https://phabricator.wikimedia.org/T376585#10267267 (10Nikerabbit) Approved. [11:27:04] 06SRE, 06Infrastructure-Foundations, 10netops: Ganeti network config results in additional auto-conf IPv6 address - https://phabricator.wikimedia.org/T378335#10267251 (10cmooney) >>! In T378335#10267081, @ayounsi wrote: > There might be some edge cases, but I think ideally we should disable the [[ https://sy... 
[11:27:50] 06SRE, 06Infrastructure-Foundations, 10netops: Create cookbook to set up ganeti host network - https://phabricator.wikimedia.org/T378346 (10cmooney) 03NEW p:05Triage→03Low [11:28:07] 10ops-codfw, 06DC-Ops, 06Machine-Learning-Team: hw troubleshooting: PSU failure/power cable loose for ml-serve2009.codfw.wmnet - https://phabricator.wikimedia.org/T378347 (10klausman) 03NEW [11:29:51] (03PS2) 10Elukey: role::aux_k8s::{master,worker}: add support for containerd [puppet] - 10https://gerrit.wikimedia.org/r/1083776 (https://phabricator.wikimedia.org/T378345) [11:30:04] (03CR) 10Klausman: [C:03+1] add kartig to deploy-ml-services group [puppet] - 10https://gerrit.wikimedia.org/r/1083773 (https://phabricator.wikimedia.org/T376585) (owner: 10Tiziano Fogli) [11:30:55] !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1083772|Specify wiki ID to ::getId call in GlobalBlockingHandler (T378085)]] (duration: 07m 44s) [11:31:08] T378085: /wiki/Special:GlobalBlock Wikimedia\Assert\PreconditionException: Expected MediaWiki\User\UserIdentityValue to belong to the local wiki, but it belongs to 'commonswiki' - https://phabricator.wikimedia.org/T378085 [11:31:21] (03CR) 10Vgutierrez: haproxykafka: haproxykafka module (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [11:31:22] (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4405/co" [puppet] - 10https://gerrit.wikimedia.org/r/1083776 (https://phabricator.wikimedia.org/T378345) (owner: 10Elukey) [11:34:14] (03CR) 10Elukey: add kartig to deploy-ml-services group (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1083773 (https://phabricator.wikimedia.org/T376585) (owner: 10Tiziano Fogli) [11:36:51] (03PS1) 10Brouberol: analytics/cluster/secrets_test: disable all hdfs_secrets rendering for now [puppet] - 
10https://gerrit.wikimedia.org/r/1083777 (https://phabricator.wikimedia.org/T323692) [11:37:26] (03CR) 10CI reject: [V:04-1] analytics/cluster/secrets_test: disable all hdfs_secrets rendering for now [puppet] - 10https://gerrit.wikimedia.org/r/1083777 (https://phabricator.wikimedia.org/T323692) (owner: 10Brouberol) [11:38:45] (03PS1) 10Vgutierrez: role,site: Provide a liberica role and use it on lvs1013 [puppet] - 10https://gerrit.wikimedia.org/r/1083778 (https://phabricator.wikimedia.org/T377127) [11:39:22] (03CR) 10Brouberol: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1083777 (https://phabricator.wikimedia.org/T323692) (owner: 10Brouberol) [11:39:28] (03CR) 10CI reject: [V:04-1] role,site: Provide a liberica role and use it on lvs1013 [puppet] - 10https://gerrit.wikimedia.org/r/1083778 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [11:40:15] (03PS2) 10Brouberol: analytics/cluster/secrets_test: disable all hdfs_secrets rendering for now [puppet] - 10https://gerrit.wikimedia.org/r/1083777 (https://phabricator.wikimedia.org/T323692) [11:40:16] FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid (k8s) 1.698s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [11:40:51] (03CR) 10CI reject: [V:04-1] analytics/cluster/secrets_test: disable all hdfs_secrets rendering for now [puppet] - 10https://gerrit.wikimedia.org/r/1083777 (https://phabricator.wikimedia.org/T323692) (owner: 10Brouberol) [11:43:03] (03CR) 10David Caro: P:toolforge::proxy: use svc.toolforge.org (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1080056 (owner: 10Majavah) [11:45:16] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid (k8s) 1.698s - 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [11:45:20] (03PS2) 10Vgutierrez: role,site: Provide a liberica role and use it on lvs1013 [puppet] - 10https://gerrit.wikimedia.org/r/1083778 (https://phabricator.wikimedia.org/T377127) [11:45:38] (03PS3) 10Brouberol: analytics/cluster/secrets_test: disable all hdfs_secrets rendering for now [puppet] - 10https://gerrit.wikimedia.org/r/1083777 (https://phabricator.wikimedia.org/T323692) [11:45:58] (03CR) 10Majavah: P:toolforge::proxy: use svc.toolforge.org (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1080056 (owner: 10Majavah) [11:47:58] (03CR) 10Elukey: "Thanks! I'd just personally comment the profile include in role::analytics_test_cluster::hadoop::master, what do you think?" [puppet] - 10https://gerrit.wikimedia.org/r/1083777 (https://phabricator.wikimedia.org/T323692) (owner: 10Brouberol) [11:48:29] 06SRE, 06Infrastructure-Foundations, 10netops: Create cookbook to set up ganeti host network - https://phabricator.wikimedia.org/T378346#10267435 (10Volans) Should it be part of the `sre.ganeti.addnode` cookbook or done at reimage time? 
[11:49:32] (03CR) 10Muehlenhoff: [C:03+2] Deprecate system::role for Hadoop roles [puppet] - 10https://gerrit.wikimedia.org/r/1083157 (owner: 10Muehlenhoff) [11:49:59] (03PS1) 10Giuseppe Lavagetto: fetch_external_clouds_vendors_nets: compatibility with conftool 4.0 [puppet] - 10https://gerrit.wikimedia.org/r/1083781 (https://phabricator.wikimedia.org/T376877) [11:52:24] (03CR) 10David Caro: P:toolforge::proxy: use svc.toolforge.org (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1080056 (owner: 10Majavah) [11:55:25] (03PS1) 10Muehlenhoff: Remove incorreclty used system::role [puppet] - 10https://gerrit.wikimedia.org/r/1083783 [11:58:39] RESOLVED: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld [11:59:29] 06SRE, 06Infrastructure-Foundations, 10netops: Create cookbook to set up ganeti host network - https://phabricator.wikimedia.org/T378346#10267464 (10MoritzMuehlenhoff) >>! In T378346#10267435, @Volans wrote: > Should it be part of the `sre.ganeti.addnode` cookbook Mo, having this created (and rebooted) is... 
[11:59:57] (03CR) 10Muehlenhoff: [C:03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/1083757 (https://phabricator.wikimedia.org/T371378) (owner: 10Majavah) [12:04:03] (03PS6) 10Fabfur: haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) [12:04:03] (03PS4) 10Fabfur: haproxykafka: profile and hiera files [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) [12:06:03] <_joe_> !log uploaded conftool 4.0.0-1 to reprepro T376877 [12:06:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:06:15] T376877: Deprecate sync, add apply command to requestctl - https://phabricator.wikimedia.org/T376877 [12:08:21] (03CR) 10Vgutierrez: haproxykafka: profile and hiera files (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [12:12:00] !log upgrade irc.wikimedia.org to ircstream 0.13.0+wmf12u3 T376014 [12:12:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:14] T376014: Create and deploy a re-reimplementation of irc.wikimedia.org in Python 3 without external service deps - https://phabricator.wikimedia.org/T376014 [12:13:00] RECOVERY - MariaDB Replica Lag: s1 on db1169 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [12:16:23] (03CR) 10Vgutierrez: [V:03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4408/console" [puppet] - 10https://gerrit.wikimedia.org/r/1011167 (https://phabricator.wikimedia.org/T342398) (owner: 10Majavah) [12:18:53] (03CR) 10Majavah: [V:03+1 C:03+2] P:openstack: Remove apache httpd from cloudweb servers [puppet] - 10https://gerrit.wikimedia.org/r/1083757 (https://phabricator.wikimedia.org/T371378) (owner: 10Majavah) 
[12:21:02] (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083795 [12:21:31] FIRING: [4x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [12:23:45] (03PS1) 10Gmodena: services: page-content-change-enrich: version bump [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083796 (https://phabricator.wikimedia.org/T377938) [12:24:00] (03CR) 10Stevemunene: [V:03+2 C:03+2] Add dummy keytabs for new presto hosts [labs/private] - 10https://gerrit.wikimedia.org/r/1083755 (https://phabricator.wikimedia.org/T374924) (owner: 10Stevemunene) [12:26:29] (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1083773 (https://phabricator.wikimedia.org/T376585) (owner: 10Tiziano Fogli) [12:26:56] (03PS1) 10Gmodena: dse-k8s-services: mw-dump: version bump [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083803 (https://phabricator.wikimedia.org/T377938) [12:29:49] (03CR) 10Vgutierrez: [V:03+1] "change itself looks good but it doesn't seem to be enough: nginx http-challenges-template expects the challenges to be written on /etc/acm" [puppet] - 10https://gerrit.wikimedia.org/r/1011167 (https://phabricator.wikimedia.org/T342398) (owner: 10Majavah) [12:34:32] (03CR) 10Muehlenhoff: [C:03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/1083775 (https://phabricator.wikimedia.org/T375716) (owner: 10Brouberol) [12:34:55] (03PS4) 10Brouberol: analytics/cluster/secrets_test: disable all hdfs_secrets rendering for now [puppet] - 10https://gerrit.wikimedia.org/r/1083777 (https://phabricator.wikimedia.org/T323692) [12:36:53] (03CR) 10Brouberol: "good call & done!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1083777 (https://phabricator.wikimedia.org/T323692) (owner: 10Brouberol) [12:37:01] (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1083766 (https://phabricator.wikimedia.org/T377773) (owner: 10Tiziano Fogli) [12:37:09] (03CR) 10Brouberol: [V:03+1 C:03+2] kerberos: support binding additional domains to the WIKIMEDIA realm [puppet] - 10https://gerrit.wikimedia.org/r/1083775 (https://phabricator.wikimedia.org/T375716) (owner: 10Brouberol) [12:41:08] 10ops-codfw, 06SRE, 06DC-Ops: PowerSupplyFailure - https://phabricator.wikimedia.org/T378201#10267572 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm fixed loose cable. secured. alert cleared. [12:42:21] (03PS1) 10Muehlenhoff: Deprecate system::role for remaining SRE-Collab roles [puppet] - 10https://gerrit.wikimedia.org/r/1083805 [12:43:11] (03PS5) 10Majavah: P:acme_chief: allow enabling http-01 spport [puppet] - 10https://gerrit.wikimedia.org/r/1011167 (https://phabricator.wikimedia.org/T342398) [12:43:11] (03PS5) 10Majavah: P:wmcs::novaproxy: proxy http-01 challenges to acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/1011168 (https://phabricator.wikimedia.org/T342398) [12:44:53] (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4409/co" [puppet] - 10https://gerrit.wikimedia.org/r/1011167 (https://phabricator.wikimedia.org/T342398) (owner: 10Majavah) [12:45:28] (03CR) 10Majavah: [V:03+1] "good catch. turns out the path in the template is wrong, the directory is actually `/var/lib/acme-chief/http_challenges/`. 
that does exist" [puppet] - 10https://gerrit.wikimedia.org/r/1011167 (https://phabricator.wikimedia.org/T342398) (owner: 10Majavah) [12:45:30] 10ops-codfw, 06SRE, 06DC-Ops, 06Machine-Learning-Team: hw troubleshooting: PSU failure/power cable loose for ml-serve2009.codfw.wmnet - https://phabricator.wikimedia.org/T378347#10267576 (10Jhancock.wm) 05Open→03Resolved it was. it's secure now. [12:46:48] (03PS1) 10Muehlenhoff: Deprecate system::role for maps [puppet] - 10https://gerrit.wikimedia.org/r/1083807 [12:49:33] 10ops-codfw, 06SRE, 06DC-Ops: Degraded RAID on wikikube-worker2068 - https://phabricator.wikimedia.org/T378255#10267594 (10Jhancock.wm) the server is not showing any errors in idrac. Both drives are green but the one in physical drive bay 0 is a solid green, not a blinking green. please let us know if this n... [12:57:31] FIRING: Traffic bill over quota: Alert for device cr3-ulsfo.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [12:57:47] (03PS1) 10Santiago Faci: aqs-http-gateway chart: Removed old property ('datasource') that set the mediawiki history snapshot name. Now that part is automated and this property is no longer needed [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083810 (https://phabricator.wikimedia.org/T366157) [13:00:05] Urbanecm and TheresNoTime: That opportune time for a UTC afternoon backport window deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241028T1300). [13:00:05] cormacparle and Daimona: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:23] o/ [13:00:27] o/ [13:02:06] (03CR) 10Ssingh: "Happy to do it today :)" [puppet] - 10https://gerrit.wikimedia.org/r/1059156 (owner: 10Dzahn) [13:04:51] Any deployers around? 
[13:05:43] FIRING: [2x] IPv4AnchorUnreachable: ipv4 ping to eqiad RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable [13:06:13] (03PS2) 10Santiago Faci: aqs-http-gateway chart: Removed old property ('datasource') that set the mediawiki history snapshot name. Now that part is automated and this property is no longer needed [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083810 (https://phabricator.wikimedia.org/T366157) [13:06:40] (03PS3) 10Slyngshede: P:idp add Redis database and password configuration. [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) [13:08:26] (03PS3) 10Santiago Faci: aqs-http-gateway chart: Removed old property ('datasource') that set the mediawiki history snapshot name. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083810 (https://phabricator.wikimedia.org/T366157) [13:11:42] (03PS1) 10Muehlenhoff: Add ganeti2041/2042 to list of Ganeti nodes [puppet] - 10https://gerrit.wikimedia.org/r/1083811 [13:11:49] (03PS4) 10Slyngshede: P:idp add Redis database and password configuration. 
[puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) [13:12:56] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4410/co" [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) (owner: 10Slyngshede) [13:13:04] 06SRE, 10Ganeti, 06Infrastructure-Foundations: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018 - https://phabricator.wikimedia.org/T376594#10267669 (10MoritzMuehlenhoff) [13:13:31] (03CR) 10Muehlenhoff: [C:03+2] Add ganeti2041/2042 to list of Ganeti nodes [puppet] - 10https://gerrit.wikimedia.org/r/1083811 (owner: 10Muehlenhoff) [13:16:43] 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.6 point update - https://phabricator.wikimedia.org/T374536#10267682 (10MoritzMuehlenhoff) [13:16:54] !log installing bash/zsh updates from bookworm point release [13:16:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:31] RESOLVED: Traffic bill over quota: Alert for device cr3-ulsfo.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [13:20:43] RESOLVED: [2x] IPv4AnchorUnreachable: ipv4 ping to eqiad RIPE Atlas anchor: failures over threshold - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DIPv4AnchorUnreachable [13:22:30] PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS7195/IPv4: Idle - EdgeUno, AS7195/IPv6: Idle - EdgeUno https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [13:23:24] (03PS1) 10Arnaudb: mariadb: add db2223 [puppet] - 10https://gerrit.wikimedia.org/r/1083813 (https://phabricator.wikimedia.org/T374951) [13:24:21] @Urbanecm, @TheresNoTime Hi! Is either of you available for deploying? 
[13:24:35] Daimona: i'm here [13:25:16] Daimona: patches from https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241028T1300 i presume? [13:25:24] Yup [13:25:54] (03PS2) 10Daimona Eaytoy: Enable CampaignEvents collaboration list by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083170 (https://phabricator.wikimedia.org/T375141) [13:26:00] (03CR) 10Urbanecm: [C:03+2] Enable CampaignEvents collaboration list by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083170 (https://phabricator.wikimedia.org/T375141) (owner: 10Daimona Eaytoy) [13:26:04] (03PS3) 10Daimona Eaytoy: beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1077370 (https://phabricator.wikimedia.org/T373442) [13:26:06] (03CR) 10Urbanecm: [C:03+2] beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1077370 (https://phabricator.wikimedia.org/T373442) (owner: 10Daimona Eaytoy) [13:26:09] (03PS3) 10Daimona Eaytoy: prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1077371 (https://phabricator.wikimedia.org/T373442) [13:26:12] (03CR) 10Urbanecm: [C:03+2] prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1077371 (https://phabricator.wikimedia.org/T373442) (owner: 10Daimona Eaytoy) [13:26:17] let's do it [13:26:25] Thanks ^_^ [13:26:29] Thanks :) [13:26:37] (03PS4) 10Anzx: knwiktionary: update logo, wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083754 (https://phabricator.wikimedia.org/T360022) [13:26:39] (03Merged) 10jenkins-bot: Enable CampaignEvents collaboration list by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083170 (https://phabricator.wikimedia.org/T375141) (owner: 10Daimona Eaytoy) [13:26:50] (03PS3) 10Anzx: hewikisource: add project namespace alias [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/1083762 (https://phabricator.wikimedia.org/T378303) [13:26:54] (03Merged) 10jenkins-bot: beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1077370 (https://phabricator.wikimedia.org/T373442) (owner: 10Daimona Eaytoy) [13:26:56] (03Merged) 10jenkins-bot: prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1077371 (https://phabricator.wikimedia.org/T373442) (owner: 10Daimona Eaytoy) [13:27:18] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1083170|Enable CampaignEvents collaboration list by default (T375141)]], [[gerrit:1077370|beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442)]], [[gerrit:1077371|prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442)]] [13:27:49] T375141: Release Collaboration List MVP to all wikis with CampaignEvents extension - https://phabricator.wikimedia.org/T375141 [13:27:49] T373442: Remove feature flag for hiding invitation list special pages - https://phabricator.wikimedia.org/T373442 [13:27:53] (03CR) 10Ottomata: [C:03+1] services: page-content-change-enrich: version bump [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083796 (https://phabricator.wikimedia.org/T377938) (owner: 10Gmodena) [13:27:59] (03CR) 10Ottomata: [C:03+1] dse-k8s-services: mw-dump: version bump [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083803 (https://phabricator.wikimedia.org/T377938) (owner: 10Gmodena) [13:29:29] !log urbanecm@deploy2002 urbanecm, daimona: Backport for [[gerrit:1083170|Enable CampaignEvents collaboration list by default (T375141)]], [[gerrit:1077370|beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442)]], [[gerrit:1077371|prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [13:29:42] Daimona: can you test, 
please? [13:31:11] !log arnaudb@cumin2002 START - Cookbook sre.mysql.depool db2211 - test depool [13:31:52] !log arnaudb@cumin2002 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2211 - test depool [13:33:34] (03CR) 10Urbanecm: Add config for testing T375264 on beta (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1082809 (https://phabricator.wikimedia.org/T377988) (owner: 10Cparle) [13:33:36] !log arnaudb@cumin2002 START - Cookbook sre.mysql.pool db2211 quickly with 2 steps - test fast pool [13:34:00] volans ↑ tested depool/fast repool, lgtm :D [13:34:06] thanks! [13:34:10] <3 [13:34:26] Daimona: lgtm [13:35:05] (03CR) 10Slyngshede: P:idp add Redis database and password configuration. [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) (owner: 10Slyngshede) [13:35:24] Yup, LGTM, checked every wiki [13:35:58] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4411/co" [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) (owner: 10Slyngshede) [13:36:12] Daimona: fortunately it's not yet enabled everywhere :) [13:36:13] thanks, proceeding [13:36:16] !log urbanecm@deploy2002 urbanecm, daimona: Continuing with sync [13:36:30] Lol [13:37:15] !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti2041.codfw.wmnet to cluster codfw and group D [13:37:40] I'm picturing myself happily reporting that all wikis starting with "a" are working correctly :P [13:37:54] XD [13:37:55] And joyfully moving on to b [13:38:30] !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2041.codfw.wmnet to cluster codfw and group D [13:39:22] :D [13:40:48] (03CR) 10Fabfur: haproxykafka: haproxykafka module (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) 
(owner: 10Fabfur) [13:41:02] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1083170|Enable CampaignEvents collaboration list by default (T375141)]], [[gerrit:1077370|beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442)]], [[gerrit:1077371|prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442)]] (duration: 13m 43s) [13:41:12] Daimona: all live [13:41:21] anzx: hi, around? [13:41:23] T375141: Release Collaboration List MVP to all wikis with CampaignEvents extension - https://phabricator.wikimedia.org/T375141 [13:41:23] T373442: Remove feature flag for hiding invitation list special pages - https://phabricator.wikimedia.org/T373442 [13:41:30] (03PS4) 10Anzx: hewikisource: add project namespace alias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083762 (https://phabricator.wikimedia.org/T378303) [13:41:32] Interestingly, 's' seems to be the most common initial with 101 occurrences, followed by 'a' with 77. On the other end we have 'q' and 'x' with 5 wikis each. And yes I just looked it up. [13:41:41] TYSM urbanecm [13:41:57] any time [13:42:03] Daimona: why I am unsurprised that you would do that :D [13:42:08] urbanecm: yes i am around [13:42:10] (03PS7) 10Fabfur: haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) [13:42:30] Of course I would :P Thanks @urbanecm :) [13:43:25] (03PS5) 10Anzx: knwiktionary: update logo, wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083754 (https://phabricator.wikimedia.org/T360022) [13:43:57] ahh... of course CI doesn't run on this rebase, why am i surprised... 
[13:44:02] (03CR) 10Urbanecm: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083762 (https://phabricator.wikimedia.org/T378303) (owner: 10Anzx) [13:44:05] (03CR) 10Urbanecm: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083754 (https://phabricator.wikimedia.org/T360022) (owner: 10Anzx) [13:44:25] (03PS8) 10Fabfur: haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) [13:44:25] (03PS5) 10Fabfur: haproxykafka: profile and hiera files [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) [13:44:46] (03CR) 10CI reject: [V:04-1] hewikisource: add project namespace alias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083762 (https://phabricator.wikimedia.org/T378303) (owner: 10Anzx) [13:46:20] urbanecm: sorry, I was out on my bike and got a puncture, just back at the desk now [13:46:44] cormacparle: sorry to hear that! no worries, i'll get to you soon [13:46:58] anzx: please take a look at the CI failure [13:47:16] (03CR) 10Urbanecm: [C:03+2] knwiktionary: update logo, wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083754 (https://phabricator.wikimedia.org/T360022) (owner: 10Anzx) [13:47:35] cormacparle: can you take a look at the comment i made on your patch, please? 
[13:47:55] (03Merged) 10jenkins-bot: knwiktionary: update logo, wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083754 (https://phabricator.wikimedia.org/T360022) (owner: 10Anzx) [13:48:33] (03CR) 10Cathal Mooney: [C:03+2] Interface automation templates for pfw devices [homer/public] - 10https://gerrit.wikimedia.org/r/1082716 (https://phabricator.wikimedia.org/T378070) (owner: 10Cathal Mooney) [13:48:44] (03CR) 10Cathal Mooney: [C:03+2] Add additional ignore line to Juniper warnings for Homer [puppet] - 10https://gerrit.wikimedia.org/r/1082728 (https://phabricator.wikimedia.org/T378070) (owner: 10Cathal Mooney) [13:49:52] (03Merged) 10jenkins-bot: Interface automation templates for pfw devices [homer/public] - 10https://gerrit.wikimedia.org/r/1082716 (https://phabricator.wikimedia.org/T378070) (owner: 10Cathal Mooney) [13:49:58] !log arnaudb@cumin2002 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2211 quickly with 2 steps - test fast pool [13:50:10] (03PS1) 10Bking: stat hosts: guarantee minimum RAM% for system processes [puppet] - 10https://gerrit.wikimedia.org/r/1083815 (https://phabricator.wikimedia.org/T377734) [13:51:13] (03CR) 10Ssingh: [C:03+1] "Looks good to me. Nice and clean!" [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [13:52:17] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1083754|knwiktionary: update logo, wordmark (T360022)]] [13:52:37] T360022: Update logo for Kannada Wikisource and Wiktionary - https://phabricator.wikimedia.org/T360022 [13:53:42] (03PS2) 10Cparle: Add config for testing T375264 on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1082809 (https://phabricator.wikimedia.org/T377988) [13:53:59] anzx: how's the CI fixing going? 
[13:54:06] looking [13:54:27] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4412/co" [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) (owner: 10Slyngshede) [13:54:29] !log urbanecm@deploy2002 anzx, urbanecm: Backport for [[gerrit:1083754|knwiktionary: update logo, wordmark (T360022)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [13:55:10] (03PS5) 10Anzx: hewikisource: add project namespace alias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083762 (https://phabricator.wikimedia.org/T378303) [13:55:15] urbanecm: updated the patch as requested [13:55:21] thanks! [13:55:32] (03CR) 10Urbanecm: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083762 (https://phabricator.wikimedia.org/T378303) (owner: 10Anzx) [13:55:40] (03CR) 10Elukey: [C:03+1] Deprecate system::role for maps [puppet] - 10https://gerrit.wikimedia.org/r/1083807 (owner: 10Muehlenhoff) [13:55:45] (03CR) 10Urbanecm: [C:03+2] "LGTM, thanks!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1082809 (https://phabricator.wikimedia.org/T377988) (owner: 10Cparle) [13:55:59] anzx: please test the logo patch [13:56:32] (03CR) 10Urbanecm: [C:03+2] hewikisource: add project namespace alias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083762 (https://phabricator.wikimedia.org/T378303) (owner: 10Anzx) [13:56:54] urbanecm: logo looks good [13:57:01] thanks [13:57:07] !log urbanecm@deploy2002 Sync cancelled. 
[13:57:19] going to squeeze the other two patches in too [13:57:31] (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083762 (https://phabricator.wikimedia.org/T378303) (owner: 10Anzx) [13:57:32] (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1082809 (https://phabricator.wikimedia.org/T377988) (owner: 10Cparle) [13:57:42] (03Merged) 10jenkins-bot: Add config for testing T375264 on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1082809 (https://phabricator.wikimedia.org/T377988) (owner: 10Cparle) [13:57:45] (03Merged) 10jenkins-bot: hewikisource: add project namespace alias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083762 (https://phabricator.wikimedia.org/T378303) (owner: 10Anzx) [13:58:01] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1083754|knwiktionary: update logo, wordmark (T360022)]], [[gerrit:1083762|hewikisource: add project namespace alias (T378303)]], [[gerrit:1082809|Add config for testing T375264 on beta (T377988)]] [13:58:25] T360022: Update logo for Kannada Wikisource and Wiktionary - https://phabricator.wikimedia.org/T360022 [13:58:25] T378303: Add alias namespace for hewikisource - https://phabricator.wikimedia.org/T378303 [13:58:26] T375264: Identify uploads on Commons with external links - https://phabricator.wikimedia.org/T375264 [13:58:26] T377988: Add config to beta commons to allowing testing of external links in UW - https://phabricator.wikimedia.org/T377988 [14:01:33] !log urbanecm@deploy2002 anzx, cparle, urbanecm: Backport for [[gerrit:1083754|knwiktionary: update logo, wordmark (T360022)]], [[gerrit:1083762|hewikisource: add project namespace alias (T378303)]], [[gerrit:1082809|Add config for testing T375264 on beta (T377988)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) 
[14:01:40] testing [14:01:48] (03PS5) 10Slyngshede: P:idp add Redis database and password configuration. [puppet] - 10https://gerrit.wikimedia.org/r/1083768 (https://phabricator.wikimedia.org/T377728) [14:01:48] (03CR) 10Ssingh: [C:03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/1081372 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [14:03:54] urbanecm: checked both logo and namespaces both looks good [14:03:59] sounds good! [14:04:01] !log urbanecm@deploy2002 anzx, cparle, urbanecm: Continuing with sync [14:06:18] (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1083815 (https://phabricator.wikimedia.org/T377734) (owner: 10Bking) [14:07:56] (03CR) 10Ssingh: [C:03+1] "I forgot one thing, sorry:" [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [14:08:08] (03PS1) 10Arturo Borrero Gonzalez: openstack: designate: nova_fixed_multi: base: refactor record creation routine [puppet] - 10https://gerrit.wikimedia.org/r/1083820 (https://phabricator.wikimedia.org/T378192) [14:08:45] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1083754|knwiktionary: update logo, wordmark (T360022)]], [[gerrit:1083762|hewikisource: add project namespace alias (T378303)]], [[gerrit:1082809|Add config for testing T375264 on beta (T377988)]] (duration: 10m 43s) [14:08:49] (03CR) 10CI reject: [V:04-1] openstack: designate: nova_fixed_multi: base: refactor record creation routine [puppet] - 10https://gerrit.wikimedia.org/r/1083820 (https://phabricator.wikimedia.org/T378192) (owner: 10Arturo Borrero Gonzalez) [14:09:19] T360022: Update logo for Kannada Wikisource and Wiktionary - https://phabricator.wikimedia.org/T360022 [14:09:19] T378303: Add alias namespace for hewikisource - https://phabricator.wikimedia.org/T378303 [14:09:20] T375264: Identify uploads on Commons with external links - https://phabricator.wikimedia.org/T375264 [14:09:20] 
T377988: Add config to beta commons to allowing testing of external links in UW - https://phabricator.wikimedia.org/T377988 [14:09:40] should be all live! [14:10:03] urbanecm: thanks, please clear memcache for logos [14:10:12] it's not memcache, but will do [14:10:38] 10ops-eqiad, 06SRE, 06DBA, 06DC-Ops: db1234 crashed - faulty memory stick on A6 (0x4E42) - https://phabricator.wikimedia.org/T378267#10267888 (10VRiley-WMF) a:03VRiley-WMF [14:11:05] urbanecm: thank you! there's nothing really to test for me because the code this patch configures has been reverted (until later in the week) [14:18:29] 10ops-eqiad, 06SRE, 06DC-Ops: hw troubleshooting: server failure for cloudvirt1063.eqiad.wmnet - https://phabricator.wikimedia.org/T375372#10267912 (10Jclark-ctr) Dell has agreed to replace Mainboard working on scheduling [14:21:16] 06SRE, 10SRE-tools, 06Infrastructure-Foundations: exception raised for "sre.dns.admin show" - https://phabricator.wikimedia.org/T378039#10267919 (10Volans) p:05Triage→03Medium [14:23:18] 07Puppet, 10SRE-tools, 06DC-Ops, 06Infrastructure-Foundations, 10observability: RAID monitoring on new hardware spec requires new or updated user space cli tool - https://phabricator.wikimedia.org/T377853#10267925 (10elukey) p:05Triage→03Medium [14:25:26] PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [14:25:54] PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [14:25:57] (03CR) 10Dzahn: [C:03+1] add gmodena to dumps-root [puppet] - 10https://gerrit.wikimedia.org/r/1083766 (https://phabricator.wikimedia.org/T377773) (owner: 10Tiziano Fogli) [14:26:46] RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 52777 bytes in 2.009 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [14:26:48] 06SRE, 07SRE-Unowned, 
06Infrastructure-Foundations: Create and deploy a re-reimplementation of irc.wikimedia.org in Python 3 without external service deps - https://phabricator.wikimedia.org/T376014#10267926 (10elukey) @Ottomata thanks for the replies :) [14:27:18] RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 8922 bytes in 0.190 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [14:27:32] 06SRE, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: Automate interface configuration for pfw firewalls using Netbox data - https://phabricator.wikimedia.org/T378070#10267928 (10cmooney) 05Open→03Resolved [14:31:33] (03PS12) 10Vgutierrez: liberica: provide a liberica module [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) [14:31:33] (03PS10) 10Vgutierrez: profile: Provide a liberica profile [puppet] - 10https://gerrit.wikimedia.org/r/1081372 (https://phabricator.wikimedia.org/T377127) [14:31:33] (03PS3) 10Vgutierrez: role,site: Provide a liberica role and use it on lvs1013 [puppet] - 10https://gerrit.wikimedia.org/r/1083778 (https://phabricator.wikimedia.org/T377127) [14:31:36] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10267943 (10cmooney) >>! In T377381#10263676, @Dwisehaupt wrote: > @cmooney @Jclark-ctr There has been a request to push our ma... [14:31:43] FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS [14:32:18] (03CR) 10Vgutierrez: "no problem, thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [14:33:30] (03CR) 10Fabfur: haproxykafka: profile and hiera files (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [14:36:35] (03CR) 10Ssingh: [C:03+1] liberica: provide a liberica module [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [14:36:39] !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti2042.codfw.wmnet to cluster codfw and group D [14:36:43] RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS [14:37:16] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:37:29] !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2042.codfw.wmnet to cluster codfw and group D [14:38:33] (03CR) 10Vgutierrez: liberica: provide a liberica module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [14:46:12] (03CR) 10Muehlenhoff: [C:03+2] Deprecate system::role for maps [puppet] - 10https://gerrit.wikimedia.org/r/1083807 (owner: 10Muehlenhoff) [14:46:40] I'll run a maintenance script on eswiki to prune some dangling link recommendations. I did that before and this is just some remaining cleanup. 
No trouble expected [14:47:08] 06SRE, 10Wikimedia-Mailing-lists: Remove disabled users from internal mailing lists - https://phabricator.wikimedia.org/T161004#10267999 (10taavi) [14:48:07] PROBLEM - Host ganeti2042 is DOWN: PING CRITICAL - Packet loss = 100% [14:49:13] (03CR) 10JHathaway: [C:03+1] Remove CI ignored modules mechanism [puppet] - 10https://gerrit.wikimedia.org/r/1083293 (owner: 10Majavah) [14:50:35] RECOVERY - Host ganeti2042 is UP: PING OK - Packet loss = 0%, RTA = 30.35 ms [14:51:15] 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.10.19 - 2024.11.08): Q2:rack/setup/install wdqs202[67] - https://phabricator.wikimedia.org/T378031#10268023 (10Gehel) [14:51:31] FIRING: [5x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [14:51:50] !log T372337 - run `mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --search-index` to fix the remaining ca. 
10K dangling search index records [14:51:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:16] T372337: High number of dangling search index results at fr.wikipedia or it.wikipedia - https://phabricator.wikimedia.org/T372337 [14:52:41] 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.10.19 - 2024.11.08): Q2:rack/setup/install elastic211[0-5] - https://phabricator.wikimedia.org/T378034#10268006 (10Gehel) a:05bking→03None [14:52:43] 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.10.19 - 2024.11.08): Q2:rack/setup/install elastic211[0-5] - https://phabricator.wikimedia.org/T378034#10268014 (10Gehel) a:03bking [14:54:18] 10ops-eqiad, 06SRE, 06DC-Ops, 06Discovery-Search, and 2 others: Q2:rack/setup/install wdqs102[567] - https://phabricator.wikimedia.org/T378030#10268026 (10Gehel) [14:55:27] (03PS4) 10Vgutierrez: role,site: Provide a liberica role and use it on lvs1013 [puppet] - 10https://gerrit.wikimedia.org/r/1083778 (https://phabricator.wikimedia.org/T377127) [14:55:36] 06SRE, 06serviceops, 10Data-Platform-SRE (2024.10.19 - 2024.11.08): DegradedArray email alerts for aqs1013 and aqs1014 are firing since April 18 - https://phabricator.wikimedia.org/T373490#10268041 (10Gehel) [14:55:43] (03PS5) 10Vgutierrez: role,site: Provide a liberica role and use it on lvs1013 [puppet] - 10https://gerrit.wikimedia.org/r/1083778 (https://phabricator.wikimedia.org/T377127) [14:56:00] (03CR) 10Fabfur: haproxykafka: profile and hiera files (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [14:56:19] (03PS6) 10Vgutierrez: role,site: Provide a liberica role and use it on lvs1013 [puppet] - 10https://gerrit.wikimedia.org/r/1083778 (https://phabricator.wikimedia.org/T377127) [14:56:37] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1083778 (https://phabricator.wikimedia.org/T377127) 
(owner: 10Vgutierrez) [14:57:58] (03CR) 10Dzahn: [C:03+2] Deprecate system::role for remaining SRE-Collab roles [puppet] - 10https://gerrit.wikimedia.org/r/1083805 (owner: 10Muehlenhoff) [14:59:06] (03CR) 10Elukey: [C:03+1] analytics/cluster/secrets_test: disable all hdfs_secrets rendering for now [puppet] - 10https://gerrit.wikimedia.org/r/1083777 (https://phabricator.wikimedia.org/T323692) (owner: 10Brouberol) [15:00:11] (03CR) 10Fabfur: haproxykafka: profile and hiera files (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [15:00:51] 10ops-codfw, 06SRE, 06DC-Ops: ganeti2042 seems to have a broken CPU? (new Supermicro node) - https://phabricator.wikimedia.org/T378358 (10MoritzMuehlenhoff) 03NEW [15:01:47] (03CR) 10Vgutierrez: haproxykafka: profile and hiera files (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [15:01:49] (03PS9) 10Fabfur: haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) [15:02:03] (03PS6) 10Fabfur: haproxykafka: profile and hiera files [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) [15:02:11] (03CR) 10Dzahn: [C:03+2] profile::phabricator::migration: Fix syntax [puppet] - 10https://gerrit.wikimedia.org/r/1083148 (owner: 10Muehlenhoff) [15:02:16] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:05:11] (03PS7) 10Vgutierrez: role,site: Provide a liberica role and use it on lvs1013 [puppet] - 10https://gerrit.wikimedia.org/r/1083778 (https://phabricator.wikimedia.org/T377127) [15:05:32] (03CR) 10Vgutierrez: "check experimental" [puppet] - 
10https://gerrit.wikimedia.org/r/1083778 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez) [15:05:48] (03PS10) 10Fabfur: haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) [15:06:17] (03PS7) 10Fabfur: haproxykafka: profile and hiera files [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) [15:08:09] (03PS1) 10Slyngshede: Show currently signed in username. [software/bitu] - 10https://gerrit.wikimedia.org/r/1083835 (https://phabricator.wikimedia.org/T378344) [15:11:05] (03CR) 10Brouberol: [C:03+2] analytics/cluster/secrets_test: disable all hdfs_secrets rendering for now [puppet] - 10https://gerrit.wikimedia.org/r/1083777 (https://phabricator.wikimedia.org/T323692) (owner: 10Brouberol) [15:11:50] (03PS1) 10Michael Große: beta: enable "Surfacing structured tasks" for an early beta-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083836 (https://phabricator.wikimedia.org/T376677) [15:14:19] (03CR) 10Majavah: [C:03+2] Remove CI ignored modules mechanism [puppet] - 10https://gerrit.wikimedia.org/r/1083293 (owner: 10Majavah) [15:14:59] (03PS11) 10Fabfur: haproxykafka: haproxykafka module [puppet] - 10https://gerrit.wikimedia.org/r/1083203 (https://phabricator.wikimedia.org/T374128) [15:14:59] (03PS8) 10Fabfur: haproxykafka: profile and hiera files [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) [15:15:15] (03CR) 10Volans: "I haven't tested it but the change looks sane. 
I don't recall if there was anything reading the generated files directly" [puppet] - 10https://gerrit.wikimedia.org/r/1083781 (https://phabricator.wikimedia.org/T376877) (owner: 10Giuseppe Lavagetto) [15:15:20] (03CR) 10Volans: [C:03+1] fetch_external_clouds_vendors_nets: compatibility with conftool 4.0 [puppet] - 10https://gerrit.wikimedia.org/r/1083781 (https://phabricator.wikimedia.org/T376877) (owner: 10Giuseppe Lavagetto) [15:16:15] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply [15:16:51] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply [15:17:43] (03CR) 10Cyndywikime: [C:03+1] beta: enable "Surfacing structured tasks" for an early beta-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083836 (https://phabricator.wikimedia.org/T376677) (owner: 10Michael Große) [15:18:08] 10ops-codfw, 06SRE, 06DC-Ops: ganeti2042 seems to have a broken CPU? (new Supermicro node) - https://phabricator.wikimedia.org/T378358#10268204 (10MoritzMuehlenhoff) ganeti2042 has been removed from the Ganeti cluster, can be taken down for analysis/fixing any time. [15:20:32] (03PS1) 10Dzahn: phorge: delete role and profile, was temporary [puppet] - 10https://gerrit.wikimedia.org/r/1083839 [15:21:04] (03CR) 10Dzahn: "just want to double check it's not applied in cloud vps" [puppet] - 10https://gerrit.wikimedia.org/r/1083839 (owner: 10Dzahn) [15:21:31] FIRING: [5x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:23:16] (03CR) 10Volans: [C:03+1] "LGTM, I wonder if we should add a check/confirmation in one case." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/1077377 (https://phabricator.wikimedia.org/T373519) (owner: 10Ayounsi) [15:24:18] PROBLEM - ganeti-confd running on ganeti2042 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 110 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti [15:24:39] 06SRE: External email not being received at accountspayable@wikimedia.org - https://phabricator.wikimedia.org/T378364 (10JLam-WMF) 03NEW [15:26:58] 06SRE: External email not being received at accountspayable@wikimedia.org - https://phabricator.wikimedia.org/T378364#10268266 (10JLam-WMF) [15:27:38] PROBLEM - ganeti-noded running on ganeti2042 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti [15:27:59] 06SRE: External email not being received at accountspayable@wikimedia.org - https://phabricator.wikimedia.org/T378364#10268273 (10JLam-WMF) [15:28:46] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply [15:29:21] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply [15:30:05] (03CR) 10BryanDavis: [C:03+1] base: notify_maintainers: Don't email disabled accounts [puppet] - 10https://gerrit.wikimedia.org/r/1083295 (owner: 10Majavah) [15:30:05] jan_drewniak: Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241028T1530). Please do the needful. 
[15:30:22] !log starting portals deployment [15:30:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:11] (03CR) 10Jdrewniak: [C:03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083486 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [15:34:13] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083486 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [15:38:15] 10ops-eqiad, 06DC-Ops, 06Discovery-Search: Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368 (10RobH) 03NEW [15:39:18] 10ops-eqiad, 06DC-Ops, 06Discovery-Search: Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10268380 (10RobH) [15:39:55] 10ops-eqiad, 06DC-Ops, 06Discovery-Search: Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10268385 (10RobH) a:03bking Please note the workflow for racking tasks has changed this fiscal year, and we now require the puppet updates from the sub-team receiving the new ser... 
[15:46:11] (03CR) 10Vgutierrez: [C:03+1] "nice one, just take into account that acme-chief instance needs to be able to reach the HTTP challenge using the same URL as let's encrypt" [puppet] - 10https://gerrit.wikimedia.org/r/1011167 (https://phabricator.wikimedia.org/T342398) (owner: 10Majavah) [15:46:31] FIRING: [5x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:48:06] !log re-enable IX BGP sessions in eqiad [15:48:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:00] !log jdrewniak@deploy2002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:1046698| Bumping portals to master (T128546)]] (duration: 07m 35s) [15:49:18] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [15:51:26] !log jdrewniak@deploy2002 Synchronized portals: Wikimedia Portals Update: [[gerrit:1046698| Bumping portals to master (T128546)]] (duration: 02m 25s) [15:52:22] 06SRE, 10Wikimedia-Mailing-lists: Remove disabled users from internal mailing lists - https://phabricator.wikimedia.org/T161004#10268454 (10Dzahn) Since onboarding to mailing lists is handled by ITS, it seems logical that ITS would also handle offboarding from the same lists. So I'm not sure I agree with the... 
[16:01:20] (03PS1) 10Esanders: Set Flow to read-only on nowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083848 (https://phabricator.wikimedia.org/T377990) [16:04:03] (03CR) 10Krinkle: Profiler: introduce metrics batching and centralize socket management (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1081460 (owner: 10Cwhite) [16:04:20] 06SRE, 06serviceops, 10Data-Platform-SRE (2024.10.19 - 2024.11.08): DegradedArray email alerts for aqs1013 and aqs1014 are firing since April 18 - https://phabricator.wikimedia.org/T373490#10268574 (10bking) a:03bking [16:06:28] (03CR) 10Giuseppe Lavagetto: [C:03+1] P:conftool::client: make conftool2git_host Optional [puppet] - 10https://gerrit.wikimedia.org/r/1083274 (owner: 10Scott French) [16:06:46] (03PS9) 10Fabfur: haproxykafka: profile and hiera files [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) [16:08:43] 10ops-eqiad, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.10.19 - 2024.11.08): Q2:rack/setup/install cloudelastic101[12] - https://phabricator.wikimedia.org/T378368#10268599 (10Gehel) [16:10:12] (03CR) 10Majavah: [C:03+2] base: notify_maintainers: Don't email disabled accounts [puppet] - 10https://gerrit.wikimedia.org/r/1083295 (owner: 10Majavah) [16:10:14] (03CR) 10Muehlenhoff: phorge: delete role and profile, was temporary (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1083839 (owner: 10Dzahn) [16:10:16] (03CR) 10Majavah: [V:03+1 C:03+2] P:acme_chief: allow enabling http-01 spport [puppet] - 10https://gerrit.wikimedia.org/r/1011167 (https://phabricator.wikimedia.org/T342398) (owner: 10Majavah) [16:14:32] (03PS2) 10Dzahn: phorge: delete role and profile, was temporary [puppet] - 10https://gerrit.wikimedia.org/r/1083839 [16:14:37] 06SRE, 07SRE-Unowned, 06Infrastructure-Foundations: Create and deploy a re-reimplementation of irc.wikimedia.org in Python 3 without external service deps - 
https://phabricator.wikimedia.org/T376014#10268627 (10MoritzMuehlenhoff) Additional update: In the Grafana dashboard we saw a recurring pattern of clien... [16:15:00] (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1083839 (owner: 10Dzahn) [16:16:03] (03CR) 10Majavah: [C:03+2] P:wmcs::novaproxy: proxy http-01 challenges to acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/1011168 (https://phabricator.wikimedia.org/T342398) (owner: 10Majavah) [16:16:33] !log fnegri@cumin1002 START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet [16:20:18] !log fnegri@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet [16:20:28] FIRING: KeyholderUnarmed: 2 unarmed Keyholder key(s) on cloudcumin2001:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed [16:20:30] !log fnegri@cumin1002 START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet [16:21:53] (03PS1) 10Kosta Harlan: GlobalContributionsPager: Use Special:PermanentLink to construct link [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083853 (https://phabricator.wikimedia.org/T378155) [16:22:34] 06SRE, 06serviceops, 10Data-Platform-SRE (2024.10.19 - 2024.11.08): DegradedArray email alerts for aqs1013 and aqs1014 are firing since April 18 - https://phabricator.wikimedia.org/T373490#10268679 (10bking) Based on `/etc/wikimedia/contacts.yaml` , these hosts are owned by Data Persistence. As such, I'm re... [16:24:19] (03CR) 10Mforns: [C:03+1] "LGTM!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1083810 (https://phabricator.wikimedia.org/T366157) (owner: 10Santiago Faci) [16:25:39] 06SRE, 06serviceops: DegradedArray email alerts for aqs1013 and aqs1014 are firing since April 18 - https://phabricator.wikimedia.org/T373490#10268693 (10bking) a:05bking→03None [16:26:02] (03CR) 10Santiago Faci: [C:03+2] aqs-http-gateway chart: Removed old property ('datasource') that set the mediawiki history snapshot name. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083810 (https://phabricator.wikimedia.org/T366157) (owner: 10Santiago Faci) [16:26:17] !log fnegri@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet [16:29:18] (03Merged) 10jenkins-bot: aqs-http-gateway chart: Removed old property ('datasource') that set the mediawiki history snapshot name. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083810 (https://phabricator.wikimedia.org/T366157) (owner: 10Santiago Faci) [16:30:28] FIRING: [2x] KeyholderUnarmed: 2 unarmed Keyholder key(s) on cloudcumin1001:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed [16:32:59] !log vgutierrez@cumin1002 START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet [16:34:42] 06SRE, 06Data-Persistence, 06serviceops: DegradedArray email alerts for aqs1013 and aqs1014 are firing since April 18 - https://phabricator.wikimedia.org/T373490#10268811 (10taavi) [16:38:35] !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet [16:38:44] !log vgutierrez@cumin1002 START - Cookbook sre.hosts.reboot-single for host lvs1015.eqiad.wmnet [16:42:35] (03PS1) 10Majavah: P:wmcs::novaproxy: Fix nginx config order [puppet] - 10https://gerrit.wikimedia.org/r/1083858 (https://phabricator.wikimedia.org/T342398) [16:44:20] !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single 
(exit_code=0) for host lvs1015.eqiad.wmnet [16:44:32] !log vgutierrez@cumin1002 START - Cookbook sre.hosts.reboot-single for host lvs1014.eqiad.wmnet [16:49:14] (03CR) 10Dzahn: phorge: delete role and profile, was temporary (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1083839 (owner: 10Dzahn) [16:49:21] (03CR) 10Majavah: [C:03+2] P:wmcs::novaproxy: Fix nginx config order [puppet] - 10https://gerrit.wikimedia.org/r/1083858 (https://phabricator.wikimedia.org/T342398) (owner: 10Majavah) [16:50:08] !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1014.eqiad.wmnet [16:50:16] !log vgutierrez@cumin1002 START - Cookbook sre.hosts.reboot-single for host lvs1013.eqiad.wmnet [16:51:44] PROBLEM - ensure kvm processes are running on cloudvirt1063 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [16:53:35] (03PS1) 10Majavah: dynamicproxy: Allow creating proxy at zone apex [puppet] - 10https://gerrit.wikimedia.org/r/1083862 (https://phabricator.wikimedia.org/T342398) [16:54:22] (03CR) 10CI reject: [V:04-1] dynamicproxy: Allow creating proxy at zone apex [puppet] - 10https://gerrit.wikimedia.org/r/1083862 (https://phabricator.wikimedia.org/T342398) (owner: 10Majavah) [16:55:51] !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1013.eqiad.wmnet [16:55:52] (03PS2) 10Majavah: dynamicproxy: Allow creating proxy at zone apex [puppet] - 10https://gerrit.wikimedia.org/r/1083862 (https://phabricator.wikimedia.org/T342398) [17:00:04] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241028T1700) [17:00:04] ryankemper: That opportune time for a Wikidata Query Service weekly deploy deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241028T1700). 
[17:00:28] FIRING: [2x] KeyholderUnarmed: 2 unarmed Keyholder key(s) on cloudcumin1001:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed [17:01:05] 06SRE, 10observability, 10WMF-JobQueue: Spike in JobQueue job backlog time - https://phabricator.wikimedia.org/T378385 (10kostajh) 03NEW [17:03:44] !log fnegri@cumin1002 START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223 [17:03:58] !log fnegri@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223 [17:04:13] (03CR) 10Scott French: [C:03+2] P:conftool::client: make conftool2git_host Optional [puppet] - 10https://gerrit.wikimedia.org/r/1083274 (owner: 10Scott French) [17:04:29] 06SRE, 10observability, 10WMF-JobQueue: Spike in JobQueue job backlog time - https://phabricator.wikimedia.org/T378385#10269066 (10kostajh) [17:04:50] T375223: 2024-09-21 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223 [17:05:28] RESOLVED: [2x] KeyholderUnarmed: 2 unarmed Keyholder key(s) on cloudcumin1001:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed [17:06:39] 10ops-eqiad, 06SRE, 06DC-Ops: hw troubleshooting: server failure for cloudvirt1063.eqiad.wmnet - https://phabricator.wikimedia.org/T375372#10269070 (10fnegri) @Jclark-ctr thanks for the updates, no rush from our side. 
[17:10:21] 06SRE, 10observability, 10WMF-JobQueue: Spike in JobQueue job backlog time - https://phabricator.wikimedia.org/T378385#10269085 (10kostajh) [17:12:09] 06SRE, 10observability, 10WMF-JobQueue: Spike in JobQueue job backlog time - https://phabricator.wikimedia.org/T378385#10269089 (10Dreamy_Jazz) [17:12:49] 06SRE, 10observability, 10WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269092 (10kostajh) [17:15:29] 06SRE, 10observability, 10WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269109 (10Dreamy_Jazz) Just to note, this increase in job queue waiting time is affecting our scanning rate. It was causing the script to assume that the jobs had fa... [17:15:43] (03CR) 10Ebernhardson: "yea i think it seems reasonable to hold this patch until we release most of the plugins through their normal means." [software/opensearch/plugins] - 10https://gerrit.wikimedia.org/r/1080749 (https://phabricator.wikimedia.org/T372769) (owner: 10Ebernhardson) [17:16:39] 06SRE, 10observability, 10WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10269120 (10jcrespo) Potentially related to T378076, as concurrency was lowered to mitigate ongoing issues. [17:19:31] (03PS1) 10Majavah: dynamicproxy: Allow zones not managed in Designate [puppet] - 10https://gerrit.wikimedia.org/r/1083868 (https://phabricator.wikimedia.org/T342398) [17:21:26] (03PS6) 10MacFan4000: ExtensionDistributor: Mark 1.43 as beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1082256 (https://phabricator.wikimedia.org/T372322) [17:21:38] 06SRE, 06DBA, 07Wikimedia-production-error: Parsercache issues in codfw causing large-scale outage - https://phabricator.wikimedia.org/T378076#10269176 (10jcrespo) I linked the above task: {T378385}, as mitigations may have slowed down jobqueue processing. 
We may need to have a look, as while almost surely w... [17:23:23] (03CR) 10Jdlrobson: [C:03+1] enwiktionary: Enable mobile page tabs for non logged in users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083294 (https://phabricator.wikimedia.org/T377648) (owner: 10Hamish) [17:38:47] (03CR) 10Giuseppe Lavagetto: "Of course there is right now, any "requestctl sync" command will read the files on disk." [puppet] - 10https://gerrit.wikimedia.org/r/1083781 (https://phabricator.wikimedia.org/T376877) (owner: 10Giuseppe Lavagetto) [17:41:17] 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10269389 (10Jhancock.wm) a:05Clement_Goubert→03Jhancock.wm [17:41:31] (03PS1) 10Dreamrimmer: Enable electionadmin user group on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083870 (https://phabricator.wikimedia.org/T378287) [17:42:10] !log jiawang@deploy2002 Started deploy [airflow-dags/analytics_product@a7456f9]: deploy tsp pipelines [17:43:14] !log jiawang@deploy2002 Finished deploy [airflow-dags/analytics_product@a7456f9]: deploy tsp pipelines (duration: 01m 33s) [17:44:59] 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install mc-gp200[4-6] - https://phabricator.wikimedia.org/T376968#10269407 (10Jhancock.wm) a:05Clement_Goubert→03Jhancock.wm [17:45:35] (03CR) 10Novem Linguae: Enable electionadmin user group on enwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083870 (https://phabricator.wikimedia.org/T378287) (owner: 10Dreamrimmer) [17:52:52] (03PS10) 10Fabfur: haproxykafka: profile and hiera files [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) [17:55:23] (03PS2) 10Dreamrimmer: Enable electionadmin user group on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083870 (https://phabricator.wikimedia.org/T378287) [17:55:48] (03PS1) 10Jdlrobson: Partial Revert "Make sure 
contributor's name is on its line" [skins/MinervaNeue] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083874 (https://phabricator.wikimedia.org/T378142) [17:56:15] 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install kubestage200[3-4] - https://phabricator.wikimedia.org/T377009#10269497 (10Jhancock.wm) a:05Clement_Goubert→03Jhancock.wm [17:56:47] (03CR) 10Fabfur: haproxykafka: profile and hiera files (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [17:57:53] (03CR) 10Dreamrimmer: "Done" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083870 (https://phabricator.wikimedia.org/T378287) (owner: 10Dreamrimmer) [17:59:41] (03CR) 10Dreamrimmer: Enable electionadmin user group on enwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083870 (https://phabricator.wikimedia.org/T378287) (owner: 10Dreamrimmer) [18:01:25] (03CR) 10Fabfur: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [18:05:58] 10ops-magru, 06SRE, 06Traffic: magru: Incorrect racking for magru hosts (F-25G and Custom Config interchanged) - https://phabricator.wikimedia.org/T376737#10269580 (10RobH) https://docs.google.com/document/d/1vPpQtFutGhXY23u-ipWdYYoQN8qN8vmqCEVlIz2_IQc/edit?usp=sharing Updated with what I think is a passable... [18:08:50] 10ops-magru, 06SRE, 06Traffic: magru: Incorrect racking for magru hosts (F-25G and Custom Config interchanged) - https://phabricator.wikimedia.org/T376737#10269591 (10RobH) I'll also be out of town all next week, November 4th-8th, so I think we should schedule this for the week of my return. I'd suggest per... 
[18:09:53] (03PS3) 10Dzahn: phorge: delete role and profile, was temporary [puppet] - 10https://gerrit.wikimedia.org/r/1083839 (https://phabricator.wikimedia.org/T333885) [18:10:02] (03CR) 10Dzahn: [C:03+2] phorge: delete role and profile, was temporary [puppet] - 10https://gerrit.wikimedia.org/r/1083839 (https://phabricator.wikimedia.org/T333885) (owner: 10Dzahn) [18:11:03] 10ops-magru, 06SRE, 06Traffic: magru: Incorrect racking for magru hosts (F-25G and Custom Config interchanged) - https://phabricator.wikimedia.org/T376737#10269596 (10RobH) [18:20:09] (03PS2) 10BCornwall: Remove rsa-2048 certs from mail services [puppet] - 10https://gerrit.wikimedia.org/r/1075604 (https://phabricator.wikimedia.org/T375569) [18:20:12] (03PS2) 10BCornwall: archiva: Remove rsa-2048 certs [puppet] - 10https://gerrit.wikimedia.org/r/1075605 (https://phabricator.wikimedia.org/T375569) [18:20:33] 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[56-70] - https://phabricator.wikimedia.org/T376965#10269618 (10Jhancock.wm) a:05Clement_Goubert→03Jhancock.wm [18:21:05] (03PS1) 10Jdlrobson: Restore missing second argument to "mapState" in QuickView.vue [extensions/SearchVue] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083877 (https://phabricator.wikimedia.org/T378204) [18:21:19] 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops, 13Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10269610 (10Jhancock.wm) a:05Clement_Goubert→03Jhancock.wm [18:22:16] (03Abandoned) 10Arlolra: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1075940 (owner: 10PipelineBot) [18:22:31] (03CR) 10Mforns: [C:03+1] "LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1082800 (https://phabricator.wikimedia.org/T364398) (owner: 10Snwachukwu) [18:23:42] (03CR) 10BCornwall: [C:03+2] "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1075605 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [18:24:16] (03CR) 10Gmodena: [C:03+2] dse-k8s-services: mw-dump: version bump [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083803 (https://phabricator.wikimedia.org/T377938) (owner: 10Gmodena) [18:28:30] (03PS11) 10Fabfur: haproxykafka: profile and hiera files [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) [18:29:13] (03CR) 10CI reject: [V:04-1] Partial Revert "Make sure contributor's name is on its line" [skins/MinervaNeue] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083874 (https://phabricator.wikimedia.org/T378142) (owner: 10Jdlrobson) [18:34:30] (03CR) 10Jdlrobson: "recheck" [skins/MinervaNeue] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083874 (https://phabricator.wikimedia.org/T378142) (owner: 10Jdlrobson) [18:34:42] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 28 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [skins/MinervaNeue] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083874 (https://phabricator.wikimedia.org/T378142) (owner: 10Jdlrobson) [18:35:04] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 28 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [extensions/SearchVue] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083877 (https://phabricator.wikimedia.org/T378204) (owner: 10Jdlrobson) [18:35:06] (03Merged) 10jenkins-bot: dse-k8s-services: mw-dump: version bump [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083803 (https://phabricator.wikimedia.org/T377938) (owner: 10Gmodena) [18:40:18] 10ops-eqiad, 
06SRE, 06DBA, 06DC-Ops: db1234 crashed - faulty memory stick on A6 (0x4E42) - https://phabricator.wikimedia.org/T378267#10269782 (10VRiley-WMF) Reached out to Dell for recommendations. Currently, they would like to try replacing the memory one more time before proceeding with motherboard. Ser... [18:43:06] (03CR) 10Jdlrobson: [C:03+1] Reduce number of bucketsizes for MediaViewer (group0) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079640 (https://phabricator.wikimedia.org/T372165) (owner: 10Simon04) [18:47:21] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10269819 (10Papaul) We have been doing some testing on ouw end to better understand the issue on those servers not able to detect all the 24 disks but jus... [18:49:49] (03CR) 10Fabfur: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1083204 (https://phabricator.wikimedia.org/T374128) (owner: 10Fabfur) [18:50:32] (03CR) 10BCornwall: [V:03+2 C:03+2] archiva: Remove rsa-2048 certs [puppet] - 10https://gerrit.wikimedia.org/r/1075605 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [18:50:47] (03PS3) 10BCornwall: archiva: Remove rsa-2048 certs [puppet] - 10https://gerrit.wikimedia.org/r/1075605 (https://phabricator.wikimedia.org/T375569) [18:50:52] (03CR) 10BCornwall: [V:03+2 C:03+2] archiva: Remove rsa-2048 certs [puppet] - 10https://gerrit.wikimedia.org/r/1075605 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [18:58:21] (03PS2) 10BCornwall: durum: Remove rsa-2048 certs from nginx config [puppet] - 10https://gerrit.wikimedia.org/r/1075613 (https://phabricator.wikimedia.org/T375569) [19:02:00] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10269892 (10Jclark-ctr) @cmooney Sorry for delay i am good for nov 12th 
100GBase-CWDM4 (green handle) we have a few extra f... [19:02:27] (03CR) 10BCornwall: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4416/co" [puppet] - 10https://gerrit.wikimedia.org/r/1075613 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [19:07:18] (03CR) 10BCornwall: [V:03+2] durum: Remove rsa-2048 certs from nginx config [puppet] - 10https://gerrit.wikimedia.org/r/1075613 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [19:13:42] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply [19:13:49] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply [19:13:58] PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [19:14:36] PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [19:14:39] (03PS2) 10BCornwall: ldap: Remove rsa-2048 certs [puppet] - 10https://gerrit.wikimedia.org/r/1075607 (https://phabricator.wikimedia.org/T375569) [19:15:01] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply [19:15:08] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply [19:16:56] (03CR) 10BCornwall: [V:03+1 C:03+2] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4418/co" [puppet] - 10https://gerrit.wikimedia.org/r/1075607 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [19:17:02] 
!log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply [19:17:04] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply [19:18:01] (03CR) 10BCornwall: [V:03+2 C:03+2] ldap: Remove rsa-2048 certs [puppet] - 10https://gerrit.wikimedia.org/r/1075607 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [19:18:13] (03PS1) 10Kosta Harlan: GlobalContributionsPager: Don't display external namespace in article link [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083886 (https://phabricator.wikimedia.org/T378155) [19:18:25] (03CR) 10Gmodena: [C:03+2] services: page-content-change-enrich: version bump [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083796 (https://phabricator.wikimedia.org/T377938) (owner: 10Gmodena) [19:18:47] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 28 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083886 (https://phabricator.wikimedia.org/T378155) (owner: 10Kosta Harlan) [19:19:25] (03Merged) 10jenkins-bot: services: page-content-change-enrich: version bump [deployment-charts] - 10https://gerrit.wikimedia.org/r/1083796 (https://phabricator.wikimedia.org/T377938) (owner: 10Gmodena) [19:21:18] !log gmodena@deploy2002 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply [19:21:24] !log gmodena@deploy2002 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply [19:23:05] !log Removed RSA certificate support from ldap, archiva, durum [19:23:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:23:52] (03PS2) 10BCornwall: mirrors: Remove rsa-2048 certs from Apache config [puppet] - 10https://gerrit.wikimedia.org/r/1075617 
(https://phabricator.wikimedia.org/T375569) [19:24:13] (03CR) 10Brouberol: [C:03+1] Add new presto hosts to presto cluster [puppet] - 10https://gerrit.wikimedia.org/r/1083756 (https://phabricator.wikimedia.org/T374924) (owner: 10Stevemunene) [19:24:24] !log gmodena@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply [19:24:29] !log gmodena@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply [19:24:38] (03CR) 10BCornwall: [V:03+1 C:03+2] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4419/co" [puppet] - 10https://gerrit.wikimedia.org/r/1075617 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [19:25:09] (03CR) 10BCornwall: [V:03+2 C:03+2] mirrors: Remove rsa-2048 certs from Apache config [puppet] - 10https://gerrit.wikimedia.org/r/1075617 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [19:26:46] !log gmodena@deploy2002 helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply [19:26:49] !log gmodena@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply [19:28:49] (03PS2) 10BCornwall: dumps: Remove rsa-2048 certs from nginx config [puppet] - 10https://gerrit.wikimedia.org/r/1075610 (https://phabricator.wikimedia.org/T375569) [19:29:47] (03CR) 10BCornwall: [V:03+1 C:03+2] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4420/console" [puppet] - 10https://gerrit.wikimedia.org/r/1075610 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [19:31:14] (03CR) 10BCornwall: [V:03+1 C:03+2] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4421/co" [puppet] - 10https://gerrit.wikimedia.org/r/1075610 (https://phabricator.wikimedia.org/T375569) (owner: 
10BCornwall) [19:31:31] (03CR) 10BCornwall: [V:03+2 C:03+2] dumps: Remove rsa-2048 certs from nginx config [puppet] - 10https://gerrit.wikimedia.org/r/1075610 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [19:33:26] !log Removed RSA certificate support from mirrors, dumps (T375569) [19:33:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:55] T375569: Remove RSA certificates from puppet - https://phabricator.wikimedia.org/T375569 [19:34:59] (03PS2) 10BCornwall: tlsproxy: Remove rsa-2048 certs [puppet] - 10https://gerrit.wikimedia.org/r/1075612 (https://phabricator.wikimedia.org/T375569) [19:42:08] 06SRE, 10SRE-Access-Requests: Requesting access to 'deployment' for 'Joely Rooke WMDE' - https://phabricator.wikimedia.org/T378082#10270093 (10KFrancis) Hi all, checking my records, @JoelyRooke-WMDE does have an NDA on file for access to the WMDE LDAP Group. If this is sufficient, please proceed. [19:45:24] (03CR) 10BCornwall: [V:03+1 C:03+2] "PCC SUCCESS (NOOP 5 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - 10https://gerrit.wikimedia.org/r/1075612 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [19:46:24] (03CR) 10BCornwall: [V:03+2 C:03+2] tlsproxy: Remove rsa-2048 certs [puppet] - 10https://gerrit.wikimedia.org/r/1075612 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [19:46:31] FIRING: [4x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [19:52:27] !log Removed RSA certificate support from tlsproxy (T375569) [19:52:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:59] T375569: Remove RSA certificates from puppet - https://phabricator.wikimedia.org/T375569 [19:58:10] 
(03PS1) 10Kosta Harlan: temp accounts: Enable temp account autocreation on five pilot wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083890 (https://phabricator.wikimedia.org/T378334) [19:58:40] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, October 29 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083890 (https://phabricator.wikimedia.org/T378334) (owner: 10Kosta Harlan) [20:00:05] RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241028T2000). [20:00:05] kostajh and Jdlrobson: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [20:00:18] o/ [20:00:27] (but need to step out in 30m) [20:00:41] hello [20:00:53] Jdlrobson: do you want to start, then? 
[20:01:35] i am not currently deploy trained so was hoping to get some help [20:02:04] (03CR) 10Kosta Harlan: [C:03+2] GlobalContributionsPager: Don't display external namespace in article link [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083886 (https://phabricator.wikimedia.org/T378155) (owner: 10Kosta Harlan) [20:02:20] (03CR) 10Kosta Harlan: [C:03+2] GlobalContributionsPager: Use Special:PermanentLink to construct link [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083853 (https://phabricator.wikimedia.org/T378155) (owner: 10Kosta Harlan) [20:02:32] Jdlrobson: ah OK, I can deploy for you [20:03:25] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy2002 using scap backport" [skins/MinervaNeue] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083874 (https://phabricator.wikimedia.org/T378142) (owner: 10Jdlrobson) [20:03:25] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/SearchVue] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083877 (https://phabricator.wikimedia.org/T378204) (owner: 10Jdlrobson) [20:08:51] im also here to step in Jdlrobson if you need [20:11:05] thanks and thanks! [20:14:02] 10ops-eqiad, 06SRE, 06Data-Persistence, 10Data-Persistence-Backup, 06DC-Ops: Q1:rack/setup/install backup1012 - https://phabricator.wikimedia.org/T371416#10270225 (10wiki_willy) 05Resolved→03Open Re-opening this task, as the server has the incorrect RAID controller. We're working with Supermicro to... [20:18:03] 10ops-codfw, 06SRE, 06Data-Persistence, 10Data-Persistence-Backup, 06DC-Ops: Q1:rack/setup/install backup2012 - https://phabricator.wikimedia.org/T371984#10270249 (10wiki_willy) 05Resolved→03Open Re-opening this task, since we have the incorrect RAID controller on the server. @RobH is currently work... 
[20:21:52] (03PS1) 10Herron: otelcol-contrib: add tail_sampling config for thanos-query [puppet] - 10https://gerrit.wikimedia.org/r/1083892 (https://phabricator.wikimedia.org/T378190) [20:22:55] (03Merged) 10jenkins-bot: GlobalContributionsPager: Don't display external namespace in article link [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083886 (https://phabricator.wikimedia.org/T378155) (owner: 10Kosta Harlan) [20:22:59] (03Merged) 10jenkins-bot: GlobalContributionsPager: Use Special:PermanentLink to construct link [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083853 (https://phabricator.wikimedia.org/T378155) (owner: 10Kosta Harlan) [20:24:10] (03Merged) 10jenkins-bot: Partial Revert "Make sure contributor's name is on its line" [skins/MinervaNeue] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083874 (https://phabricator.wikimedia.org/T378142) (owner: 10Jdlrobson) [20:25:01] Jdlrobson bwang: it will probably be another 20 minutes before SearchVue's CI is done (it just started) [20:25:15] sounds good thanks! [20:25:16] 👍 [20:25:31] bwang: I think I will need to switch over to you! [20:25:55] ok sounds good [20:26:28] To test the mobile patch, you'll need to visit https://en.m.wikipedia.org/wiki/Special:Watchlist and make sure AMC mode is disabled [20:26:44] https://usercontent.irccloud-cdn.com/file/Ov1ksthz/Screenshot%202024-10-28%20at%201.26.40%E2%80%AFPM.png [20:27:05] We want to make sure the rows are clickable and take you to diff (right now they are not - only individual links such as username/title etc..) 
[20:27:24] i can likely test this one if you have issues just ping me [20:29:23] (03Merged) 10jenkins-bot: Restore missing second argument to "mapState" in QuickView.vue [extensions/SearchVue] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083877 (https://phabricator.wikimedia.org/T378204) (owner: 10Jdlrobson) [20:30:26] RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 370, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [20:30:55] !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1083874|Partial Revert "Make sure contributor's name is on its line" (T378142)]], [[gerrit:1083877|Restore missing second argument to "mapState" in QuickView.vue (T378204)]], [[gerrit:1083853|GlobalContributionsPager: Use Special:PermanentLink to construct link (T378155)]], [[gerrit:1083886|GlobalContributionsPager: Don't display external namespace in [20:30:55] article link (T378155)]] [20:31:31] T378142: watchlist doesn't link to diffs on mobile - https://phabricator.wikimedia.org/T378142 [20:31:31] T378204: Client error "TypeError: Cannot convert undefined or null to object" - https://phabricator.wikimedia.org/T378204 [20:31:32] T378155: Special:GlobalContributions ignores namespace in links - https://phabricator.wikimedia.org/T378155 [20:33:06] !log kharlan@deploy2002 jdlrobson, kharlan: Backport for [[gerrit:1083874|Partial Revert "Make sure contributor's name is on its line" (T378142)]], [[gerrit:1083877|Restore missing second argument to "mapState" in QuickView.vue (T378204)]], [[gerrit:1083853|GlobalContributionsPager: Use Special:PermanentLink to construct link (T378155)]], [[gerrit:1083886|GlobalContributionsPager: Don't display external namespace in artic [20:33:06] le link (T378155)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [20:33:19] bwang: the changes are available for testing [20:33:28] which server? 
[20:33:56] (03CR) 10Andrea Denisse: [C:03+1] "LGTM, tho I think that elaborating on the rationale behind removing the RSA certificates in the respective task is a good idea as there's " [puppet] - 10https://gerrit.wikimedia.org/r/1075615 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [20:33:57] k8s-mwdebug [20:34:38] bwang: ^ [20:34:47] thx! [20:37:00] looks good! [20:37:33] !log kharlan@deploy2002 jdlrobson, kharlan: Continuing with sync [20:37:42] bwang: thanks, syncing [20:42:15] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10270349 (10Dwisehaupt) @cmooney @Jclark-ctr Wonderful, thanks for the flexibility. We should get confirmation of the shift in... [20:42:20] !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1083874|Partial Revert "Make sure contributor's name is on its line" (T378142)]], [[gerrit:1083877|Restore missing second argument to "mapState" in QuickView.vue (T378204)]], [[gerrit:1083853|GlobalContributionsPager: Use Special:PermanentLink to construct link (T378155)]], [[gerrit:1083886|GlobalContributionsPager: Don't display external namespace in [20:42:20] article link (T378155)]] (duration: 11m 24s) [20:42:54] T378142: watchlist doesn't link to diffs on mobile - https://phabricator.wikimedia.org/T378142 [20:42:54] T378204: Client error "TypeError: Cannot convert undefined or null to object" - https://phabricator.wikimedia.org/T378204 [20:42:55] T378155: Special:GlobalContributions ignores namespace in links - https://phabricator.wikimedia.org/T378155 [20:43:46] bwang: all done [20:44:09] !log UTC late deploys done [20:44:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:46:43] thanks! 
[20:51:26] one more backport, it turns out [20:55:04] (03PS1) 10Kosta Harlan: GlobalContributionsPager: Make article link redirect to the page [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083902 (https://phabricator.wikimedia.org/T378155) [20:56:59] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083902 (https://phabricator.wikimedia.org/T378155) (owner: 10Kosta Harlan) [20:59:26] PROBLEM - SSH on prometheus1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring [21:00:04] Reedy, sbassett, Maryum, and manfredi: #bothumor My software never has bugs. It just develops random features. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241028T2100). [21:00:31] still need another 10 minutes for the last backport [21:01:31] FIRING: [6x] ProbeDown: Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [21:02:15] (03CR) 10Tchanders: [C:03+1] temp accounts: Enable temp account autocreation on five pilot wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1083890 (https://phabricator.wikimedia.org/T378334) (owner: 10Kosta Harlan) [21:07:16] FIRING: JobUnavailable: Reduced availability for job thanos-sidecar in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [21:15:16] RECOVERY - SSH on prometheus1006 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [21:16:31] FIRING: [6x] ProbeDown: Service 
centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [21:17:12] (03Merged) 10jenkins-bot: GlobalContributionsPager: Make article link redirect to the page [extensions/CheckUser] (wmf/1.43.0-wmf.28) - 10https://gerrit.wikimedia.org/r/1083902 (https://phabricator.wikimedia.org/T378155) (owner: 10Kosta Harlan) [21:17:16] RESOLVED: JobUnavailable: Reduced availability for job thanos-sidecar in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [21:17:30] !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1083902|GlobalContributionsPager: Make article link redirect to the page (T378155)]] [21:17:55] T378155: Special:GlobalContributions ignores namespace in links - https://phabricator.wikimedia.org/T378155 [21:19:41] !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1083902|GlobalContributionsPager: Make article link redirect to the page (T378155)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [21:21:13] !log kharlan@deploy2002 kharlan: Continuing with sync [21:26:24] (03CR) 10Scott French: [C:03+2] shellbox: pin all instances at live image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1082317 (https://phabricator.wikimedia.org/T375243) (owner: 10Scott French) [21:26:31] !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1083902|GlobalContributionsPager: Make article link redirect to the page (T378155)]] (duration: 09m 01s) [21:26:49] !log T372074 `sudo requestctl delete action cache-text/T372074` && `sudo requestctl delete action cache-text/T372074_wdqs_codfw_flap` [21:26:53] T378155: Special:GlobalContributions ignores namespace in links - 
https://phabricator.wikimedia.org/T378155 [21:26:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:27:46] (03Merged) 10jenkins-bot: shellbox: pin all instances at live image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1082317 (https://phabricator.wikimedia.org/T375243) (owner: 10Scott French) [21:29:30] PROBLEM - Confd vcl based reload on cp2029 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:31] PROBLEM - Confd vcl based reload on cp2037 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:31] PROBLEM - Confd vcl based reload on cp2039 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:31] PROBLEM - Confd vcl based reload on cp2027 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:31] PROBLEM - Confd vcl based reload on cp2035 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:31] PROBLEM - Confd vcl based reload on cp2041 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:31] !log UTC late deploys done, for real [21:29:32] PROBLEM - Confd vcl based reload on cp2033 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:29:40] PROBLEM - Confd vcl based reload on cp6014 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:46] PROBLEM - Confd vcl based reload on cp6015 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. 
https://wikitech.wikimedia.org/wiki/Varnish [21:29:50] PROBLEM - Confd vcl based reload on cp1108 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:54] PROBLEM - Confd vcl based reload on cp4041 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:54] PROBLEM - Confd vcl based reload on cp4043 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:54] PROBLEM - Confd vcl based reload on cp4037 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:54] PROBLEM - Confd vcl based reload on cp4044 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:54] PROBLEM - Confd vcl based reload on cp4038 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:54] PROBLEM - Confd vcl based reload on cp3067 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:55] PROBLEM - Confd vcl based reload on cp3073 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:55] PROBLEM - Confd vcl based reload on cp3071 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:56] PROBLEM - Confd vcl based reload on cp3068 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:56] PROBLEM - Confd vcl based reload on cp3072 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:57] PROBLEM - Confd vcl based reload on cp3070 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. 
https://wikitech.wikimedia.org/wiki/Varnish [21:29:57] PROBLEM - Confd vcl based reload on cp3066 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:58] PROBLEM - Confd vcl based reload on cp3069 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:58] PROBLEM - Confd vcl based reload on cp6013 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:59] PROBLEM - Confd vcl based reload on cp6012 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:29:59] PROBLEM - Confd vcl based reload on cp6009 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:00] PROBLEM - Confd vcl based reload on cp6010 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:00] PROBLEM - Confd vcl based reload on cp6011 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:01] PROBLEM - Confd vcl based reload on cp6016 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:06] PROBLEM - Confd vcl based reload on cp7005 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:06] PROBLEM - Confd vcl based reload on cp7006 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:06] PROBLEM - Confd vcl based reload on cp7003 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:06] PROBLEM - Confd vcl based reload on cp7007 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. 
https://wikitech.wikimedia.org/wiki/Varnish [21:30:06] PROBLEM - Confd vcl based reload on cp7002 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:06] PROBLEM - Confd vcl based reload on cp7008 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:06] PROBLEM - Confd vcl based reload on cp7001 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:07] PROBLEM - Confd vcl based reload on cp7004 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:10] PROBLEM - Confd vcl based reload on cp1100 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:10] PROBLEM - Confd vcl based reload on cp1110 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:10] PROBLEM - Confd vcl based reload on cp1106 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:10] PROBLEM - Confd vcl based reload on cp1102 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:10] PROBLEM - Confd vcl based reload on cp1104 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:10] PROBLEM - Confd vcl based reload on cp1112 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:11] PROBLEM - Confd vcl based reload on cp1114 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:16] PROBLEM - Confd vcl based reload on cp4040 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. 
https://wikitech.wikimedia.org/wiki/Varnish [21:30:16] PROBLEM - Confd vcl based reload on cp4042 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:18] PROBLEM - Confd vcl based reload on cp4039 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:26] PROBLEM - Confd vcl based reload on cp5018 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:26] PROBLEM - Confd vcl based reload on cp5017 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:26] PROBLEM - Confd vcl based reload on cp5019 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:26] PROBLEM - Confd vcl based reload on cp5020 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:26] PROBLEM - Confd vcl based reload on cp5023 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:26] PROBLEM - Confd vcl based reload on cp5022 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:26] PROBLEM - Confd vcl based reload on cp5024 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:27] PROBLEM - Confd vcl based reload on cp5021 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. https://wikitech.wikimedia.org/wiki/Varnish [21:30:30] PROBLEM - Confd vcl based reload on cp2031 is CRITICAL: reload-vcl failed to run since 0h, 2 minutes. 
https://wikitech.wikimedia.org/wiki/Varnish [21:30:34] (03PS1) 10BCornwall: varnish: Move wm_recv_early subroutine to inline [puppet] - 10https://gerrit.wikimedia.org/r/1083913 (https://phabricator.wikimedia.org/T370200) [21:30:36] (03PS1) 10BCornwall: varnish: Move wm_recv_purge subroutine to inline [puppet] - 10https://gerrit.wikimedia.org/r/1083914 (https://phabricator.wikimedia.org/T370200) [21:31:00] (03Abandoned) 10BCornwall: varnish: Consolidate analytics subroutines [puppet] - 10https://gerrit.wikimedia.org/r/1070688 (https://phabricator.wikimedia.org/T370200) (owner: 10BCornwall) [21:33:52] ryankemper: them alerts seem well timed to your log [21:34:17] RhinosF1: looking [21:36:08] rzl any chance you could help us w/the requestctl stuff? [21:36:51] We were trying to clean up an unused rule (see #security) and it seems we generated an invalid config [21:39:03] (03CR) 10Dzahn: [C:03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/1075608 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [21:39:20] (03CR) 10Dzahn: [C:03+1] "(can't speak for all systems here, but afaict yes :)" [puppet] - 10https://gerrit.wikimedia.org/r/1075608 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall) [21:48:26] RECOVERY - Confd vcl based reload on cp5017 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:26] RECOVERY - Confd vcl based reload on cp5019 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:26] RECOVERY - Confd vcl based reload on cp5020 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:26] RECOVERY - Confd vcl based reload on cp5022 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:26] RECOVERY - Confd vcl based reload on cp5018 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [21:48:26] RECOVERY - Confd vcl based reload on cp5023 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:26] RECOVERY - Confd vcl based reload on cp5021 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:27] RECOVERY - Confd vcl based reload on cp5024 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:30] RECOVERY - Confd vcl based reload on cp2031 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:30] RECOVERY - Confd vcl based reload on cp2039 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:30] RECOVERY - Confd vcl based reload on cp2037 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:30] RECOVERY - Confd vcl based reload on cp2029 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:31] RECOVERY - Confd vcl based reload on cp2035 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:31] RECOVERY - Confd vcl based reload on cp2033 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:31] RECOVERY - Confd vcl based reload on cp2041 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:32] RECOVERY - Confd vcl based reload on cp2027 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:40] RECOVERY - Confd vcl based reload on cp6014 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [21:48:41] !log T372074 `sudo requestctl commit` [21:48:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:48:46] RECOVERY - Confd vcl based reload on cp6015 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:50] RECOVERY - Confd vcl based reload on cp1108 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:51] Turned out it was as simple as the needing a commit after a delete :/ [21:48:54] RECOVERY - Confd vcl based reload on cp4043 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:54] RECOVERY - Confd vcl based reload on cp4038 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:54] RECOVERY - Confd vcl based reload on cp4044 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:54] RECOVERY - Confd vcl based reload on cp4037 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:54] RECOVERY - Confd vcl based reload on cp4041 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:55] RECOVERY - Confd vcl based reload on cp3070 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:55] RECOVERY - Confd vcl based reload on cp3071 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:56] RECOVERY - Confd vcl based reload on cp3073 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:56] RECOVERY - Confd vcl based reload on cp3069 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [21:48:57] RECOVERY - Confd vcl based reload on cp3066 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:57] RECOVERY - Confd vcl based reload on cp3067 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:58] RECOVERY - Confd vcl based reload on cp3072 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:58] RECOVERY - Confd vcl based reload on cp3068 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:59] RECOVERY - Confd vcl based reload on cp6011 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:48:59] RECOVERY - Confd vcl based reload on cp6010 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:00] RECOVERY - Confd vcl based reload on cp6009 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:00] RECOVERY - Confd vcl based reload on cp6013 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:01] RECOVERY - Confd vcl based reload on cp6012 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:01] RECOVERY - Confd vcl based reload on cp6016 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:06] RECOVERY - Confd vcl based reload on cp7001 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:06] RECOVERY - Confd vcl based reload on cp7008 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:06] RECOVERY - Confd vcl based reload on cp7002 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [21:49:06] RECOVERY - Confd vcl based reload on cp7006 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:06] RECOVERY - Confd vcl based reload on cp7003 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:06] RECOVERY - Confd vcl based reload on cp7004 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:06] RECOVERY - Confd vcl based reload on cp7007 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:07] RECOVERY - Confd vcl based reload on cp7005 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:10] RECOVERY - Confd vcl based reload on cp1106 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:10] RECOVERY - Confd vcl based reload on cp1100 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:10] RECOVERY - Confd vcl based reload on cp1110 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:10] RECOVERY - Confd vcl based reload on cp1104 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:10] RECOVERY - Confd vcl based reload on cp1102 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:10] RECOVERY - Confd vcl based reload on cp1114 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:11] RECOVERY - Confd vcl based reload on cp1112 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:16] RECOVERY - Confd vcl based reload on cp4040 is OK: reload-vcl successfully ran 0h, 0 minutes ago. 
https://wikitech.wikimedia.org/wiki/Varnish [21:49:16] RECOVERY - Confd vcl based reload on cp4042 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:49:18] RECOVERY - Confd vcl based reload on cp4039 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish [21:57:12] inflatador: just seeing that, still need anything? [21:58:38] rzl: we're all good here, thanks [21:58:57] 👍 [22:00:40] !log T372074 `sudo requestctl delete ipblock abuse/wdqs` && `sudo requestctl delete pattern ua/wdqs_sparql` to clean up objects removed in commit `d26fc1e910579d33d33ec3d5a192d137045eba4b` ( <-- this occurred before the requestctl commit; I just missed making the IRC log) [22:00:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:01:07] (CR) BCornwall: [C:+1] Drop labtestwikitech name [dns] - https://gerrit.wikimedia.org/r/1083306 (https://phabricator.wikimedia.org/T378260) (owner: Majavah) [22:09:54] !log ebernhardson@deploy2002 Started deploy [airflow-dags/search@99eb6f3]: T375387: update discolytics to 0.27.0 [22:10:13] T375387: Include fulltext search results Page Previews of sufficient dwell time in Search Metrics dashboard - https://phabricator.wikimedia.org/T375387 [22:10:44] !log ebernhardson@deploy2002 Finished deploy [airflow-dags/search@99eb6f3]: T375387: update discolytics to 0.27.0 (duration: 00m 50s) [22:19:28] SRE, LDAP-Access-Requests: Grant Access to ldap/nda for Deepesha Burse WMDE - https://phabricator.wikimedia.org/T378182#10270648 (KFrancis) The NDA has been sent out for signatures. I'll confirm when it's complete. 
[22:27:57] !log ebernhardson@deploy2002 Started deploy [airflow-dags/search@d85a93c]: add missing comma [22:28:33] !log ebernhardson@deploy2002 Finished deploy [airflow-dags/search@d85a93c]: add missing comma (duration: 00m 36s) [22:38:06] FIRING: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_badtoken) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures [22:39:34] (PS1) Ryan Kemper: ryankemper: add timestamps to bash history [puppet] - https://gerrit.wikimedia.org/r/1083925 [22:43:06] RESOLVED: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_badtoken) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures [22:47:18] PROBLEM - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [22:47:25] FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-parsoid_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:01:06] !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox: apply [23:01:45] !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox: apply [23:01:56] !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox-constraints: apply [23:02:34] !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply [23:02:45] !log 
swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox-media: apply [23:03:09] !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox-media: apply [23:03:20] !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply [23:03:39] !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply [23:03:50] !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox-timeline: apply [23:04:19] !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply [23:04:30] !log swfrench@deploy2002 helmfile [staging] START helmfile.d/services/shellbox-video: apply [23:05:03] !log swfrench@deploy2002 helmfile [staging] DONE helmfile.d/services/shellbox-video: apply [23:16:56] oh hi jouncebot. I still owe you a bouncer don't I. [23:17:06] jouncebot: nowandnext [23:17:07] No deployments scheduled for the next 2 hour(s) and 42 minute(s) [23:17:07] In 2 hour(s) and 42 minute(s): Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241029T0200) [23:29:58] have gotten a handful of reports on Discord of some unspecified problem accessing wikipedia, have not gotten further details. [23:47:18] RECOVERY - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [23:47:25] RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-parsoid_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed