[02:31:16] serviceops, Charts, Shellbox, SRE: Figure out how a shellbox instance for the Chart extension would work - https://phabricator.wikimedia.org/T370739 (Catrope) NEW
[02:37:31] serviceops, Charts, Shellbox, SRE: Figure out how a shellbox instance for the Chart extension would work - https://phabricator.wikimedia.org/T370739#10005577 (Catrope) The Chart extension is still in early development, so this is by no means the final form of the code, but for now we have a simpl...
[05:39:38] serviceops, Charts, Shellbox, SRE: Figure out how a shellbox instance for the Chart extension would work - https://phabricator.wikimedia.org/T370739#10005664 (Legoktm) > However, the Chart extension's use case would involve shelling out to a Node.js script, which would need to install dependencie...
[07:50:59] serviceops, Charts, Wikimedia-Extension-setup, Wikimedia-extension-review-queue: Deploy Chart extension in production - https://phabricator.wikimedia.org/T369944#10005804 (jijiki)
[09:17:51] kostajh: regarding your token issue
[09:18:07] do you know if those tokens are stored in memcached?
[09:18:08] serviceops, Charts, Shellbox, SRE: Figure out how a shellbox instance for the Chart extension would work - https://phabricator.wikimedia.org/T370739#10005968 (akosiaris) What @Legoktm suggested. If you have already a JSON input for that command and expect back an SVG (it looks this way judging fro...
[09:25:55] effie: I believe so, would have to double-check though
[09:27:21] kostajh: find the keygroup and anything that could be helpful (e.g. TTLs, key size), so we can take a look at whether something is going on on the datastore side
[09:44:51] It seems strange that this would impact the mobile site but not desktop
[09:48:33] hello operators of the services dem :)
[09:48:48] we're upgrading the switch in eqiad F3 later, it has these hosts:
[09:48:48] kubernetes1025, kubernetes1026, kubernetes1052, kubernetes1053, kubernetes1054, kubernetes1055, kubernetes1056, mw1496
[09:49:11] if someone could depool them beforehand that would be great; we're starting at 15:00 UTC / 17:00 CEST, so no rush
[10:00:29] serviceops, Infrastructure-Foundations, Patch-For-Review: Cleanup old Docker images running Debian Stretch/Jessie - https://phabricator.wikimedia.org/T367427#10006126 (elukey) ` $ docker run -it --rm --entrypoint /bin/bash docker-registry.wikimedia.org/python3:0.0.2-20230423 Unable to find image 'doc...
[10:07:06] topranks: on it
[10:08:06] claime: thanks!
[10:12:15] serviceops, MW-on-K8s, Observability-Logging, SRE: benthos mw-accesslog-metrics kafka lag and interpolation errors - https://phabricator.wikimedia.org/T367076#10006143 (kamila) FTR, I have reverted the buffer patch, as it shouldn't be necessary now that we have more partitions thanks to T3692...
[10:13:01] serviceops, LPL Essential, MinT, Community Wishlist (Translations), Community-Tech (Fennec Fox (Aug 12-23, 2024)): Caching service request for MinT - https://phabricator.wikimedia.org/T370755 (santhosh) NEW
[10:13:41] hello folks!
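elukey's check above starts a shell inside an old image to see which Debian release it is built on. A minimal Python sketch of the same check done non-interactively, assuming you have captured the image's /etc/os-release (for example with docker run --rm --entrypoint cat <image> /etc/os-release). The function names and the EOL_CODENAMES set are hypothetical, not part of any existing tooling; jessie images predate the VERSION_CODENAME field, hence the fallback to parsing VERSION.

```python
from __future__ import annotations

import re

# Hypothetical: the Debian releases targeted by the cleanup task (T367427).
EOL_CODENAMES = {"jessie", "stretch"}


def parse_os_release(text: str) -> dict[str, str]:
    """Parse KEY=value lines from /etc/os-release, stripping optional quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def debian_codename(text: str) -> str | None:
    """Prefer VERSION_CODENAME; fall back to the '(codename)' part of VERSION."""
    fields = parse_os_release(text)
    if fields.get("VERSION_CODENAME"):
        return fields["VERSION_CODENAME"]
    match = re.search(r"\(([a-z]+)\)", fields.get("VERSION", ""))
    return match.group(1) if match else None


def is_eol_debian(text: str) -> bool:
    """True if the os-release text describes one of EOL_CODENAMES."""
    return debian_codename(text) in EOL_CODENAMES
```

Feeding it the os-release of each tag in the registry would give the list of candidates to drop, which still needs the manual "is anything building on this?" check that follows in the log.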
[10:13:53] I found two other images on the registry running stretch: https://phabricator.wikimedia.org/T367427#10006126
[10:14:10] python3 and ruby; I am planning to drop them, just writing in here for confirmation
[11:28:26] topranks: all done
[11:29:26] claime: nice one :)
[11:32:33] elukey: I don't see any images relying on them in either production-images or CI, so LGTM
[11:33:53] super thanks!
[11:34:06] elukey: for python3, is it just this tag or the image in general?
[11:34:12] https://gerrit.wikimedia.org/r/plugins/gitiles/research/mwaddlink/+/refs/heads/main/.pipeline/blubber.yaml
[11:34:22] https://phabricator.wikimedia.org/T336682
[11:35:35] claime: ack I'll check!
[11:35:44] I think it is also fine if future builds break
[13:02:04] don't suppose anyone here is familiar with the dse-k8s-worker nodes?
[13:02:23] Ben was going to drain dse-k8s-worker1008 before the F3 switch upgrade but I think he's off
[13:03:22] topranks: I can do it
[13:04:10] claime: that would be cool; I was looking at the docs and could attempt it myself, but it's probably better if someone more familiar does it
[13:04:32] topranks: I need to get on writing a cookbook for this
[13:05:46] topranks: done
[13:06:04] you're a super-star, thanks!
[13:25:44] serviceops, MW-on-K8s: Allow running periodic jobs for mw on k8s - https://phabricator.wikimedia.org/T341555#10006732 (Clement_Goubert) == General design == - Map 1-to-1 or 1-to-n between puppet-defined `profile::mediawiki::periodic_job` and kubernetes `CronJob` objects - One helmfile release - As a `Cro...
[13:36:14] qq about image-catalog - I see that gunicorn is running on deployment nodes, does it have a UI that we can access?
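For the manual drain/undrain claime does here (and that a future cookbook would automate), a minimal Python sketch that only builds the kubectl commands for a list of hosts. Assumptions, all hypothetical: Kubernetes node names are the hosts' FQDNs under eqiad.wmnet, every listed host is a k8s node, and the helper names are made up for illustration. It does not depool anything from LVS/conftool. kubectl drain cordons the node before evicting its pods, so no separate cordon step is needed; uncordon once the switch work is done.

```python
from __future__ import annotations

import shlex
import subprocess

# Hypothetical: assumes k8s node names are the hosts' FQDNs in this domain.
DOMAIN = "eqiad.wmnet"


def node_name(host: str, domain: str = DOMAIN) -> str:
    """Turn a short hostname into an (assumed) node name; leave FQDNs alone."""
    return host if "." in host else f"{host}.{domain}"


def _kubectl(context: str | None) -> list[str]:
    # --context is kubectl's global flag for picking a kubeconfig context.
    return ["kubectl"] + (["--context", context] if context else [])


def drain_commands(hosts: list[str], context: str | None = None) -> list[list[str]]:
    """drain = cordon + evict; DaemonSet pods are left in place."""
    return [
        _kubectl(context)
        + ["drain", node_name(h), "--ignore-daemonsets", "--delete-emptydir-data"]
        for h in hosts
    ]


def uncordon_commands(hosts: list[str], context: str | None = None) -> list[list[str]]:
    """Make the nodes schedulable again after maintenance."""
    return [_kubectl(context) + ["uncordon", node_name(h)] for h in hosts]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Defaulting to a dry run keeps the sketch safe to try: it prints what would be executed for, say, the kubernetes10xx hosts in the F3 list before anyone runs it for real.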
[13:37:03] I didn't find any docs about it; I'm interested since I'd like to see if I can use it to track which images are deployed on which k8s cluster
[13:47:01] serviceops, DC-Ops, ops-eqiad, SRE: hw troubleshooting: CPU 2 machine check error detected for rdb1014.eqiad.wmnet - https://phabricator.wikimedia.org/T370633#10006824 (Clement_Goubert)
[13:48:26] elukey: I had no idea x)
[13:49:29] serviceops, Patch-For-Review: deploy1003 implementation tracking - https://phabricator.wikimedia.org/T364417#10006835 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host deploy1003.eqiad.wmnet with OS bullseye
[13:52:17] elukey: no, there is no UI unfortunately
[13:53:06] Ah actually I had an idea, because I apparently have python3-imagecatalog in my search history :D
[13:53:08] akosiaris: how does it work? I can RTM if there is anything to read
[13:55:58] elukey: I wish there was. I think Reuven might be able to help with explaining most of it, but you can get an idea by running ssh -L 3691:localhost:3691 deploy1002.eqiad.wmnet and then visiting, with curl or in the browser, the various endpoints listed in
[13:55:58] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/docker-images/imagecatalog/+/refs/heads/master/imagecatalog/web.py
[13:56:51] and there is a CLI tool btw
[13:59:28] akosiaris: ahhh okok, I thought there wasn't any UI yet; using the tunnel is fine!
[14:00:54] ah, sorry, when I said UI, I meant an actual HTML-based user interface exposed via the CDN. AFAIK this doesn't exist yet
[14:05:47] okok got it, I can follow up with Reuven about image catalog
[14:06:04] ideally I'd love a place where we have a list of which image versions are running on which k8s cluster
[15:21:51] claime: the switch upgrade is done, for when you have a moment to undrain those hosts
[15:22:02] topranks: awesome, thanks
[15:25:21] serviceops, Infrastructure-Foundations, Data-Platform-SRE (2024.07.08 - 2024.07.28), Patch-For-Review: Create a helm chart for the cloudnativepg postgresql operator - https://phabricator.wikimedia.org/T364797#10007264 (Gehel) a:brouberol→None
[15:45:31] serviceops, Citoid, VisualEditor, VisualEditor-MediaWiki-References, and 2 others: Register Citoid as a "friendly bot" with Cloudflare - https://phabricator.wikimedia.org/T370118#10007424 (akosiaris) Hi, Couple of notes here: **IP List** The blog says "These IPs must be publicly documented and...
[16:18:07] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#10007582 (Scott_French) Silenced ProbeDown for api-https:443 and appservers-https:443 for 24h: * f6f67d8d-6381-43b3-9262-9a8cf58f2b19 * ed0d352b-fb83-4bd4-...
[16:43:37] serviceops, Wikimedia-production-error: Misbehaving mw-api-ext pods serving 5xx - https://phabricator.wikimedia.org/T370425#10007654 (jijiki) >>! In T370425#10002480, @Joe wrote: > The `SIGILL` thing happened on bare metal as well, albeit quite rarely. We never properly tracked down what happened, but it...
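On the wish for a per-cluster list of image versions: a minimal Python sketch of the data shape, not how imagecatalog works (its real endpoints live in web.py and are not reproduced here). It assumes one kubectl get pods -A -o json document per cluster, whose items[].spec.containers[].image fields hold the image references, and folds them into cluster → image → sorted tags. The function names and the cluster labels in the usage are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def split_image(ref: str) -> tuple[str, str]:
    """Split 'registry/name:tag' into (name, tag); handles digests and registry ports."""
    if "@" in ref:
        name, _, digest = ref.partition("@")
        return name, digest
    slash, colon = ref.rfind("/"), ref.rfind(":")
    if colon > slash:  # a ':' after the last '/' is a tag, not a registry port
        return ref[:colon], ref[colon + 1:]
    return ref, "latest"


def images_by_cluster(pod_lists: dict[str, dict]) -> dict[str, dict[str, list[str]]]:
    """pod_lists maps a cluster name to the parsed JSON of 'kubectl get pods -A -o json'."""
    result: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for cluster, pod_list in pod_lists.items():
        images = result[cluster]  # touch the key so empty clusters still appear
        for pod in pod_list.get("items", []):
            spec = pod.get("spec", {})
            for container in spec.get("containers", []) + spec.get("initContainers", []):
                name, tag = split_image(container["image"])
                images[name].add(tag)
    return {
        cluster: {name: sorted(tags) for name, tags in sorted(images.items())}
        for cluster, images in sorted(result.items())
    }
```

Keeping tags as sets until the end means the same image running in many pods is counted once per tag, which is the view needed to spot clusters still running an old Stretch-based tag.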
[19:37:21] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#10008340 (Volans)
[19:41:34] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#10008343 (Volans) I took the liberty to add a cleanup item to the task description. If that should be part of another task feel free to move it around.
[20:55:37] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10008650 (Jclark-ctr)
[20:56:18] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#10008652 (Scott_French) Many thanks, all who helped get this out the door. At this point, the LVS service turndown is done, and we've shaken out a handful...
[20:56:41] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#10008653 (Scott_French)
[21:41:17] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10008830 (Krinkle)
[21:41:20] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10008832 (Krinkle)
[21:41:28] serviceops, MW-on-K8s, SRE, Traffic, Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536#10008833 (Krinkle)
[21:46:15] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10008846 (Krinkle)
[21:46:52] serviceops, MediaWiki-Platform-Team, Epic: Migrate Wikimedia production from PHP 8.1 to PHP 8.3 - https://phabricator.wikimedia.org/T360995#10008847 (Krinkle)
[21:48:38] serviceops, MediaWiki-Platform-Team, Epic: Migrate Wikimedia production from PHP 8.1 to PHP 8.3 - https://phabricator.wikimedia.org/T360995#10008849 (Krinkle) I've cleared some of the checkboxes copied from the PHP 8.1 upgrade task since we're updating the process for Kubernetes. I suggest we re-copy...
[21:49:11] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10008856 (Krinkle)
[22:22:09] serviceops, DC-Ops, ops-codfw, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T370672#10008928 (Jhancock.wm) a:Jhancock.wm