[00:22:40] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install mc-gp200[4-6] - https://phabricator.wikimedia.org/T376968#10291293 (Jhancock.wm) mc-gp2006 nic port isn't coming up. tried different DAC cables but not coming up. will try a different port to see if it's a switch issue in the morning.
[00:51:26] serviceops, MW-on-K8s: Support bringing text files into the container for one-off maintenance scripts - https://phabricator.wikimedia.org/T376230#10291322 (RLazarus) Open→Resolved This is now supported, and documented at https://wikitech.wikimedia.org/wiki/Maintenance_scripts#Input_from_a_fil...
[01:06:24] serviceops: Consider lifting AssembleUploadChunks and PublishStashedFile out of the low-traffic consumer - https://phabricator.wikimedia.org/T379035 (Scott_French) NEW
[01:07:59] serviceops, FlaggedRevs, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10291363 (Scott_French) Thanks, @Bawolff - Yes, indeed, those both fan into the low-traffic consumer. While we don't really have a prioritization mechanism...
[02:22:19] serviceops, FlaggedRevs, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10291414 (Bawolff)
[09:01:56] serviceops: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044 (jnuche) NEW
[09:05:03] serviceops, SRE, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291704 (kostajh)
[09:06:34] serviceops, SRE, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291711 (kostajh)
[09:06:37] serviceops, SRE, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291707 (kostajh) p:Triage→Unbreak! Marking as UBN, as this is blocking train and backport deployments.
[09:20:26] serviceops, SRE, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291760 (Joe) Investigating right now. It would help to know on what server this happened. I assume the active deployment server?
[09:21:38] serviceops, SRE, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291762 (jnuche) > I assume the active deployment server? Yeah, on `deploy2002`
[09:37:39] serviceops, SRE, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291795 (Joe) Rsyslog had some tls errors so i restarted it, but i doubted it could be the real culprit, given it is reached via udp...
[10:01:16] serviceops, SRE, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291857 (Joe) I would suggest, on the short term, to just run docker with `--network=host` and then check what log calls are being m...
[10:07:07] serviceops, SRE, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291863 (jnuche) More details, successful backports were still happening yesterday, e.g.: https://sal.toolforge.org/log/6iIf-ZIBFFSC...
[10:37:50] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install kubestage200[3-4] - https://phabricator.wikimedia.org/T377009#10291974 (Clement_Goubert) Thanks @Jhancock.wm !
[11:05:36] serviceops: kubestage200[3-4] implementation tracking - https://phabricator.wikimedia.org/T377011#10292083 (Clement_Goubert) Open→In progress a:jasmine_ Hey Jasmine, Could you put these hosts in production using the procedure at https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/Add_or_remo...
[11:31:23] serviceops, SRE, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10292204 (jnuche)
[11:32:00] serviceops, SRE, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10292200 (jnuche) p:Unbreak!→Triage Train now got past the failing stage. Removing this as a blocker
[11:34:40] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10292215 (Clement_Goubert) Hostnames fixed in tasks and in puppet, sorry about that.
[13:03:25] serviceops, DC-Ops, ops-codfw, SRE: Degraded RAID on wikikube-worker2068 - https://phabricator.wikimedia.org/T378255#10292455 (Clement_Goubert) In progress→Resolved RAID is now rebuilt.
[13:12:33] hnowlan: thanks for the kartotherian reviews <3
[13:57:32] serviceops, MW-Interfaces-Team, RESTBase Sunsetting, Patch-For-Review: Switchover plan from RESTbase to REST Gateway for rest_v1/page/html and rest_v1/page/title endpoints - https://phabricator.wikimedia.org/T374683#10292649 (akosiaris) Minor hiccup aside with having to quote values in the templa...
[14:22:19] serviceops, MW-Interfaces-Team, RESTBase Sunsetting, Patch-For-Review: Switchover plan from RESTbase to REST Gateway for rest_v1/page/html and rest_v1/page/title endpoints - https://phabricator.wikimedia.org/T374683#10292741 (akosiaris) rest-gateway and CDN changes merged, I 've forced esams so I...
[14:29:50] serviceops, MW-Interfaces-Team, RESTBase Sunsetting, Patch-For-Review: Switchover plan from RESTbase to REST Gateway for rest_v1/page/html and rest_v1/page/title endpoints - https://phabricator.wikimedia.org/T374683#10292754 (daniel) >>! In T374683#10292741, @akosiaris wrote: > That `tid` change...
[15:58:13] serviceops, MW-on-K8s, Datacenter-Switchover, Patch-For-Review: Control mw-on-k8s periodic maintenance jobs with an etcd value - https://phabricator.wikimedia.org/T367118#10293139 (Clement_Goubert) Now available on deployment servers, updated every 30s from `confd`: `lang=yaml, name=/etc/helmfile...
[15:59:25] serviceops, MW-Interfaces-Team, RESTBase Sunsetting: Switchover plan from RESTbase to REST Gateway for rest_v1/page/html and rest_v1/page/title endpoints - https://phabricator.wikimedia.org/T374683#10293144 (akosiaris) Open→Resolved a:akosiaris I 'll resolve then. Thanks!
[16:25:45] anyone have a few minutes for a couple small code reviews? starting with https://gerrit.wikimedia.org/r/c/operations/dns/+/1087511
[16:27:03] I can look cdanis
[16:28:39] thanks you two
[16:39:24] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087515
[16:58:57] serviceops, SRE, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10293477 (thcipriani) >>! In T379044#10291795, @Joe wrote: > and something in eval.php tries to log the call. I guess it's something...
[17:06:36] serviceops, MW-on-K8s, Observability-Metrics, Patch-For-Review, SRE Observability (FY2024/2025-Q2): Create a per-release deployment of statsd-exporter for mw-on-k8s - https://phabricator.wikimedia.org/T365265#10293532 (lmata)
[17:11:00] serviceops, MW-on-K8s, Observability-Metrics, Grafana, SRE Observability (FY2024/2025-Q2): Gaps in Grafana graphs using Thanos - https://phabricator.wikimedia.org/T371885#10293577 (lmata)
[20:05:28] last patch :D https://gerrit.wikimedia.org/r/1087565
[20:10:02] serviceops, MW-Interfaces-Team, RESTBase Sunsetting: Switchover plan from RESTbase to REST Gateway for rest_v1/page/html and rest_v1/page/title endpoints - https://phabricator.wikimedia.org/T374683#10294212 (Legoktm) Resolved→Open Most of my bots and tools that used RESTBase are now broken, a...
[20:11:35] I don't know what's going on, but ^ is starting to get ridiculous
[20:12:23] how hard is it to proactively tell people you're going to make changes when said people have been asking for details for literally a year now
[20:40:33] rzl: swfrench-wmf: one of you maybe? https://gerrit.wikimedia.org/r/1087565
[20:46:25] cdanis: ack, looking
[20:49:24] LGTM
[21:21:36] serviceops: Migrate production Shellbox variants to PHP 8.1 - https://phabricator.wikimedia.org/T377038#10294436 (Scott_French) Alright, we're ready to start the first migration, using shellbox-syntaxhighlight as a pilot. I'd propose the following schedule to switch to 8.1, executed in parallel across eqiad...
[21:26:11] thanks <3
[21:51:52] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294547 (Jhancock.wm)
[22:00:43] serviceops, MediaWiki-Platform-Team (Radar): Regenerate UcfirstOverrides.php for PHP 7.4 -> 8.1 transition - https://phabricator.wikimedia.org/T372603#10294577 (Scott_French) Open→In progress The changes in T370934 are now live, and we have 8.1-based MediaWiki images built during scap deployments...
[22:05:18] serviceops, Charts, Wikimedia-Extension-setup, Epic, Wikimedia-extension-review-queue: Epic: Deploy Chart extension in production - https://phabricator.wikimedia.org/T369944#10294611 (CDanis)
[22:36:11] serviceops, MW-Interfaces-Team, RESTBase Sunsetting, Patch-For-Review: Switchover plan from RESTbase to REST Gateway for rest_v1/page/html and rest_v1/page/title endpoints - https://phabricator.wikimedia.org/T374683#10294694 (HCoplin-WMF) @Legoktm -- this was 100% an unintended breakage that we a...
[22:58:23] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294729 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2130.codfw.wmnet with OS bookworm
[22:58:24] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294730 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2131.codfw.wmnet with OS bookworm
[22:58:25] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294731 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2132.codfw.wmnet with OS bookworm
[22:58:27] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294732 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2133.codfw.wmnet with OS bookworm
[23:01:50] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294736 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2134.codfw.wmnet with OS bookworm
[23:01:51] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294737 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2135.codfw.wmnet with OS bookworm
[23:03:02] serviceops, MediaWiki-Platform-Team (Radar): Regenerate UcfirstOverrides.php for PHP 7.4 -> 8.1 transition - https://phabricator.wikimedia.org/T372603#10294739 (Scott_French) Alright, step 1, using the 7.4 and 8.1 flavors of the same image: ` swfrench@deploy2002:~$ mwscript-k8s --comment='Generating uct...
[23:35:05] serviceops, MW-Interfaces-Team, RESTBase Sunsetting, Patch-For-Review, Wikimedia-Incident: Switchover plan from RESTbase to REST Gateway for rest_v1/page/html and rest_v1/page/title endpoints - https://phabricator.wikimedia.org/T374683#10294780 (Legoktm) >>! In T374683#10294694, @HCoplin-WMF...
[23:55:27] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294843 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2131.codfw.wmnet with OS bookworm completed: - wi...
[23:55:28] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294844 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2133.codfw.wmnet with OS bookworm completed: - wi...
[23:55:30] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294845 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2130.codfw.wmnet with OS bookworm completed: - wi...
[23:58:55] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294848 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2132.codfw.wmnet with OS bookworm completed: - wi...
[23:58:58] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294850 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2135.codfw.wmnet with OS bookworm completed: - wi...
[23:59:00] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294851 (Jhancock.wm)
[23:59:02] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294852 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2134.codfw.wmnet with OS bookworm completed: - wi...
[23:59:05] serviceops, DC-Ops, ops-codfw, SRE: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294853 (Jhancock.wm)