[00:41:39] 06serviceops, 13Patch-For-Review: Consider lifting AssembleUploadChunks and PublishStashedFile out of the low-traffic consumer - https://phabricator.wikimedia.org/T379035#10315131 (10Scott_French) a:03Scott_French Looking at `includes/api/ApiUpload.php`, it seems there are three jobs in the critical path for... [00:45:35] 06serviceops, 06Commons, 10MediaWiki-Uploading, 10UploadWizard: Repeated UploadWizard failures: "Server did not respond in time" - https://phabricator.wikimedia.org/T379462#10315135 (10Scott_French) a:03Scott_French [00:56:00] 06serviceops, 10MediaWiki-Platform-Team (Radar), 10MW-1.44-notes (1.44.0-wmf.4; 2024-11-19): Regenerate UcfirstOverrides.php for PHP 7.4 -> 8.1 transition - https://phabricator.wikimedia.org/T372603#10315140 (10Scott_French) 05In progress→03Resolved [06:38:08] 06serviceops, 13Patch-For-Review: Consider lifting AssembleUploadChunks and PublishStashedFile out of the low-traffic consumer - https://phabricator.wikimedia.org/T379035#10315359 (10Joe) The only advantage of the second option is that it's easier to reason about the max concurrency of all upload jobs, but I d... [07:07:07] 06serviceops, 06Data Products, 07Epic: SDS 2.1.1 Evaluations of 3rd part Experimentation Platform by SRE Service Ops - https://phabricator.wikimedia.org/T369174#10315403 (10Joe) >>! In T369174#10314737, @odimitrijevic wrote: > Hi @Legoktm, the linked document has been abandoned and is not longer under co... 
[10:03:30] 06serviceops, 06DC-Ops, 10ops-eqiad, 10Prod-Kubernetes: wikikube-ctrl1002 and wikikube-ctrl1003: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379717 (10JMeybohm) 03NEW [10:03:46] 06serviceops, 06DC-Ops, 10ops-eqiad, 10Prod-Kubernetes: wikikube-ctrl1002 and wikikube-ctrl1003: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379717#10315693 (10JMeybohm) [10:03:47] 06serviceops, 10Prod-Kubernetes, 07Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10315694 (10JMeybohm) [10:04:08] 06serviceops, 06DC-Ops, 10ops-eqiad, 10Prod-Kubernetes: wikikube-ctrl1002 and wikikube-ctrl1003: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379717#10315696 (10JMeybohm) [10:24:37] 06serviceops, 10Prod-Kubernetes, 07Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10315721 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.reimage-stacked-control-plane started by jayme@cumin2002 Reimaging k8s control planes of cluster wikikube-eqiad: container... [10:26:36] 06serviceops, 06DC-Ops, 10ops-eqiad, 10Prod-Kubernetes, and 2 others: wikikube-ctrl1001.eqiad.wmnet fails PXE boot - https://phabricator.wikimedia.org/T379629#10315722 (10JMeybohm) 05Open→03Resolved Resolving this. 
We're going to fix the others in T379717 [10:30:06] 06serviceops, 06DC-Ops, 10ops-codfw, 10Prod-Kubernetes, 07Kubernetes: wikikube-ctrl2002: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379719 (10JMeybohm) 03NEW [11:22:19] 06serviceops, 06DC-Ops, 10ops-eqiad: Degraded RAID on wikikube-worker1256 - https://phabricator.wikimedia.org/T379454#10315986 (10Clement_Goubert) [11:23:23] 06serviceops, 06Wikipedia-Android-App-Backlog: Timeout errors when making requests to Firebase for push notifications - https://phabricator.wikimedia.org/T379647#10315989 (10Jgiannelos) Update from debugging: * After running a local env with * squid proxy * local dns forwarder * external traffic of the... [11:25:20] 06serviceops, 06DC-Ops, 10ops-eqiad: Degraded RAID on wikikube-worker1256 - https://phabricator.wikimedia.org/T379454#10315991 (10ops-monitoring-bot) depool host wikikube-worker1256.eqiad.wmnet by cgoubert@cumin1002 with reason: Degraded RAID [11:25:51] 06serviceops, 06DC-Ops, 10ops-eqiad: Degraded RAID on wikikube-worker1256 - https://phabricator.wikimedia.org/T379454#10315993 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by cgoubert@cumin1002 depool for host wikikube-worker1256.eqiad.wmnet completed: - wikikube-worker1256.eqia... [11:26:49] 06serviceops, 06DC-Ops, 10ops-eqiad: Degraded RAID on wikikube-worker1256 - https://phabricator.wikimedia.org/T379454#10316001 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=b845f658-b5b1-44ba-b75b-ce7430a01e60) set by cgoubert@cumin1002 for 7 days, 0:00:00 on 1 host(s) and their service... [11:27:43] 06serviceops, 06DC-Ops, 10ops-eqiad: Degraded RAID on wikikube-worker1256 - https://phabricator.wikimedia.org/T379454#10316003 (10Clement_Goubert) Host depooled and downtimed, you can replace the disk when able. 
[11:41:31] 06serviceops, 10Prod-Kubernetes, 07Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10316050 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.reimage-stacked-control-plane started by jayme@cumin2002 Reimaging k8s control planes of cluster wikikube-eqiad: container... [11:51:14] 06serviceops: kubestage200[3-4] implementation tracking - https://phabricator.wikimedia.org/T377011#10316170 (10Clement_Goubert) p:05Triage→03Medium [12:28:52] 06serviceops, 06Infrastructure-Foundations, 06Machine-Learning-Team: Migrate the ownership of Docker images in production-images repo to mailing lists - https://phabricator.wikimedia.org/T373526#10316316 (10BTullis) Removing the #data-platform-sre tag because I think that our element of this has been complet... [12:28:55] 06serviceops, 13Patch-For-Review: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316325 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm [12:29:54] 06serviceops, 13Patch-For-Review: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316333 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2129.codfw.wmnet with OS bookworm [12:31:16] 06serviceops, 13Patch-For-Review: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316348 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2130.codfw.wmnet with OS bookworm [12:32:32] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316353 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2131.codfw.wmnet with OS bookworm [12:45:06] 
06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316364 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm executed with errors: - wikikube-worker2128 (**FA... [12:45:48] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316376 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm [12:54:25] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316403 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm executed with errors: - wikikube-worker2128 (**FA... [12:55:12] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316406 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm [12:59:25] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316412 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2129.codfw.wmnet with OS bookworm executed with errors: - wikikube-worker2129 (**FA... 
[13:32:58] 06serviceops, 06Content-Transform-Team-WIP, 06Wikipedia-Android-App-Backlog, 13Patch-For-Review: Timeout errors when making requests to Firebase for push notifications - https://phabricator.wikimedia.org/T379647#10316574 (10Jgiannelos) [13:33:29] 06serviceops, 06Content-Transform-Team-WIP, 06Wikipedia-Android-App-Backlog, 13Patch-For-Review: Timeout errors when making requests to Firebase for push notifications - https://phabricator.wikimedia.org/T379647#10316578 (10Jgiannelos) a:03Jgiannelos [14:15:26] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316798 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm executed with errors: - wikikube-worker2128 (**FA... [14:33:29] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316909 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm [14:37:03] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316937 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2130.codfw.wmnet with OS bookworm executed with errors: - wikikube-worker2130 (**FA... [14:37:08] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10316940 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2131.codfw.wmnet with OS bookworm executed with errors: - wikikube-worker2131 (**FA... [14:52:40] Hi, we have a connection to a replica that's not going away, I want to debug this further but the IP is wikikube (I think). 
Back then I could have just done ps aux in mwmaint
[14:52:46] | 3049098781 | wikiadmin2023 | 10.194.147.226:42062 | commonswiki | Sleep | 931 | | NULL
[14:54:13] Is there a way to figure out the container connecting from "10.194.147.226:42062"?
[14:54:51] btw, I'm sure this is a maint script, since it's connecting via wikiadmin user (not wikiuser)
[14:55:17] mw-script.codfw.6497ohz1-rgwvg
[14:55:31] unfortunately reverse DNS doesn't work because that's just 'endpoints' (listening servers) atm
[14:56:31] Amir1: Command:
[14:56:33] /usr/bin/php
[14:56:35] Args:
[14:56:37] /srv/mediawiki/multiversion/MWScript.php
[14:56:39] extensions/TimedMediaHandler/maintenance/requeueTranscodes.php
[14:56:41] --wiki=commonswiki
[14:56:43] --throttle
[14:56:45] --video
[14:56:47] --key=144p.mjpeg.mov
[14:56:49] --missing
[14:56:51] State: Running
[14:56:53] Started: Tue, 08 Oct 2024 23:40:22 +0000
[14:57:10] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317093 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2129.codfw.wmnet with OS bookworm
[14:57:43] cdanis: Thanks <3 Would you mind giving me a how-to? I'm sure this won't be the last time a maint script avoids reloading the config
[14:58:31] Amir1: do you already know any kubectl?
just want to be able to give the best answer :)
[14:59:02] I have used kubectl a bit
[14:59:11] but honestly don't know much
[15:02:05] oh wow there's actually more than that there even :D
[15:02:10] Amir1: https://phabricator.wikimedia.org/P71034
[15:02:34] in Labels you can see username=bvibber and in Annotations you can see comment: T363966
[15:02:48] thank you <3
[15:02:50] <3
[15:02:57] you're the best
[15:02:58] you can also peek the logs if you like
[15:03:06] nah, that's enough
[15:03:17] kubectl logs -n mw-script mw-script.codfw.6497ohz1-rgwvg mediawiki-6497ohz1-app
[15:03:46] in case of emergencies, how should I kill a pod?
[15:04:00] the least terrible way
[15:04:33] https://wikitech.wikimedia.org/wiki/Maintenance_scripts#Interacting_with_jobs
[15:04:46] you will actually need to kill the job, not the pod
[15:04:56] but! any running pod points back to its job
[15:05:18] in Labels there you can also see job-name=mw-script.codfw.6497ohz1
[15:05:59] and so for that one you'd do `kubectl -n mw-script delete job mw-script.codfw.6497ohz1`
[15:07:08] ah, thanks <3
[15:12:19] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317174 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2130.codfw.wmnet with OS bookworm
[15:14:37] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317194 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2128.codfw.wmnet with OS bookworm completed: - wikikube-worker2128 (**PASS**) - R...
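(Editor's sketch, not part of the log.) The kubectl exchange between 14:54 and 15:06 can be condensed into a few helper functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, the `status.podIP` field-selector lookup is one possible way to map an IP to a pod and is not necessarily what was used in the channel, and the helpers only print the commands rather than running them. The namespace, pod, and job names are the ones quoted in the chat.

```shell
#!/bin/sh
# Sketch only: every helper prints a command instead of executing it.

# Strip the ":port" suffix from a "host:port" peer, as seen in the processlist row.
peer_ip() {
    printf '%s\n' "${1%:*}"
}

# Print a lookup for the pod owning that IP, across all namespaces.
# Assumption: the status.podIP field selector; the chat does not say how the pod was found.
find_pod_cmd() {
    printf 'kubectl get pods --all-namespaces -o wide --field-selector status.podIP=%s\n' "$(peer_ip "$1")"
}

# Print the command that shows Labels and Annotations (job-name, username, comment).
describe_pod_cmd() {
    printf 'kubectl -n %s describe pod %s\n' "$1" "$2"
}

# Print the deletion command for the job, since the job (not the pod) must be killed.
delete_job_cmd() {
    printf 'kubectl -n %s delete job %s\n' "$1" "$2"
}

find_pod_cmd '10.194.147.226:42062'
describe_pod_cmd mw-script mw-script.codfw.6497ohz1-rgwvg
delete_job_cmd mw-script mw-script.codfw.6497ohz1
```

Printing rather than executing keeps the destructive step (deleting the job) a deliberate copy-and-paste; the job name comes from the pod's job-name label, as noted at 15:05:18.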
[15:15:39] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317200 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2131.codfw.wmnet with OS bookworm [15:40:05] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317338 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2129.codfw.wmnet with OS bookworm completed: - wikikube-worker2129 (**PASS**) - R... [15:44:09] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317358 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2132.codfw.wmnet with OS bookworm [15:45:03] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317371 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2133.codfw.wmnet with OS bookworm [15:53:25] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317457 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2130.codfw.wmnet with OS bookworm completed: - wikikube-worker2130 (**PASS**) - R... [15:57:00] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317495 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2131.codfw.wmnet with OS bookworm completed: - wikikube-worker2131 (**PASS**) - R... 
[16:06:22] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317559 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2134.codfw.wmnet with OS bookworm [16:07:16] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317561 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2135.codfw.wmnet with OS bookworm [16:24:57] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2132.codfw.wmnet with OS bookworm completed: - wikikube-worker2132 (**PASS**) - D... [16:29:47] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317681 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2133.codfw.wmnet with OS bookworm completed: - wikikube-worker2133 (**PASS**) - D... [16:48:36] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317789 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2134.codfw.wmnet with OS bookworm completed: - wikikube-worker2134 (**PASS**) - D... [16:50:56] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317806 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2135.codfw.wmnet with OS bookworm completed: - wikikube-worker2135 (**PASS**) - D... 
[17:22:08] 06serviceops, 10Prod-Kubernetes, 07Kubernetes: Cookbook to roll-reimage k8s nodes - https://phabricator.wikimedia.org/T377857#10317956 (10kamila) 05Open→03In progress [17:23:49] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317962 (10ops-monitoring-bot) pool host wikikube-worker[2128-2135].codfw.wmnet by cgoubert@cumin1002 with reason: New nodes [17:23:52] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317963 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by cgoubert@cumin1002 pool for host wikikube-worker[2128-2135].codfw.wmnet completed: - wikikube-worker[2128-2135].codfw.w... [17:25:56] 06serviceops: Decommission kubernetes20[07-14].codfw.wmnet - https://phabricator.wikimedia.org/T379788 (10Clement_Goubert) 03NEW [17:27:08] 06serviceops: wikikube-worker21[28-35] implementation tracking - https://phabricator.wikimedia.org/T377008#10317986 (10Clement_Goubert) 05Open→03Resolved a:03Clement_Goubert Nodes imaged and pooled, resolving. 
Decom of refreshed hosts will be tracked in T379788 [17:39:28] 06serviceops, 10Thumbor: Thumbor haproxy readiness check isn't failing on unhealthy pods - https://phabricator.wikimedia.org/T379561#10318072 (10hnowlan) 05In progress→03Resolved [17:39:51] 06serviceops, 06Infrastructure-Foundations, 10netops, 07Kubernetes: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790 (10akosiaris) 03NEW [17:45:28] 06serviceops, 10Release Pipeline, 06SRE, 07Epic, 10Release-Engineering-Team (Seen): Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901#10318127 (10akosiaris) [17:49:13] 06serviceops, 10Release Pipeline, 06SRE, 07Epic, 10Release-Engineering-Team (Seen): Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901#10318162 (10akosiaris) 05Open→03Resolved a:03akosiaris Everything that was in scope has been migrated. Whi... [17:53:43] 06serviceops, 06Data-Persistence, 10Prod-Kubernetes: Reevaluate the requirement for dedicated sessionstore/kask nodes in wikikube clusters - https://phabricator.wikimedia.org/T379599#10318225 (10akosiaris) The best I could find is T220821 and T221986. As far as I am concerned, the approach we took back the... [18:18:44] 06serviceops: Consider lifting AssembleUploadChunks and PublishStashedFile out of the low-traffic consumer - https://phabricator.wikimedia.org/T379035#10318390 (10Scott_French) Thank you very much for the second opinion and the reviews on the patches, @Joe! Since ~ 17:50 UTC today, we've been processing all thr... 
[18:18:51] 06serviceops: Consider lifting AssembleUploadChunks and PublishStashedFile out of the low-traffic consumer - https://phabricator.wikimedia.org/T379035#10318392 (10Scott_French) p:05Triage→03High [18:18:59] 06serviceops: Consider lifting AssembleUploadChunks and PublishStashedFile out of the low-traffic consumer - https://phabricator.wikimedia.org/T379035#10318393 (10Scott_French) 05Open→03In progress [18:24:07] I've never used scap3 before and I feel like I'm horribly missing like five different things [18:41:23] 06serviceops, 06Infrastructure-Foundations, 10netops, 07Kubernetes: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10318533 (10cmooney) Polling Netbox to find what switch each of those are connec... [19:18:35] 06serviceops, 06Commons, 10MediaWiki-Uploading, 10UploadWizard: Repeated UploadWizard failures: "Server did not respond in time" - https://phabricator.wikimedia.org/T379462#10318739 (10Scott_French) 05Open→03Resolved Alright, as noted in T379035#10318390, since ~ 17:50 UTC today all three job types... [19:28:51] 06serviceops, 13Patch-For-Review: Turn up PHP 8.1-flavored mw-debug k8s deployment - https://phabricator.wikimedia.org/T372604#10318811 (10Scott_French) 05Stalled→03In progress The mwdebug-next deployments are now running 8.1 and pass the "standard" suite of httpbb checks that we use to validate deployment... [20:01:58] 06serviceops, 13Patch-For-Review: Monitoring to surface "low-traffic" jobs isolation failure - https://phabricator.wikimedia.org/T378609#10319064 (10Scott_French) p:05Low→03High [20:02:07] 06serviceops, 10FlaggedRevs, 10WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10319062 (10Scott_French) a:05kostajh→03Scott_French Since the three job types critical to uploads have now been moved to dedicated consumers (T379035), the p... 
[20:02:08] 06serviceops, 13Patch-For-Review: Monitoring to surface "low-traffic" jobs isolation failure - https://phabricator.wikimedia.org/T378609#10319065 (10Scott_French) 05Open→03In progress
[20:13:54] does anyone have a quick recipe for running a tcpdump in a pod netns
[20:14:02] I could have sworn there was one on wikitech but I can't find it
[20:17:40] this is gonna be some horrid golang TLS version mismatch thing, I can feel it
[20:34:55] K$
[20:35:31] sudo nsenter -t pid -n tcpdump -ni
[20:36:15] Finding the pid can be done with docker top containername
[20:36:32] Not sure for containerd
[20:41:52] thanks akosiaris
[20:42:15] I got a recipe from some rando https://gist.github.com/johscheuer/dc20988895d6fddfd057e221d47587d3
[20:43:00] you can just do e.g. `sudo nerdctl top e851f534c871`
[20:43:05] to get a pid
[20:43:55] and then `nsenter -t $pid -n` works as usual
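(Editor's sketch, not part of the log.) The tcpdump-in-a-pod-netns recipe from 20:35 to 20:43 might be put together roughly as follows. The function names are hypothetical, and the default interface `any` is an assumption: the `-ni` argument in the chat was left without an interface. The helpers only print the commands; the PID still has to be read by hand from the `nerdctl top` output.

```shell
#!/bin/sh
# Sketch only: prints the commands from the chat instead of running them.

# Step 1: list the processes of a container to find a PID (containerd hosts).
container_top_cmd() {
    printf 'sudo nerdctl top %s\n' "$1"
}

# Step 2: run tcpdump inside the network namespace of that PID.
# Assumption: the interface defaults to "any"; the chat message omitted it.
netns_tcpdump_cmd() {
    pid=$1
    iface=${2:-any}
    printf 'sudo nsenter -t %s -n tcpdump -ni %s\n' "$pid" "$iface"
}

container_top_cmd e851f534c871
netns_tcpdump_cmd '$pid'
```

Because `nsenter -n` only switches the network namespace, the tcpdump binary is the host's own, so nothing needs to be installed in the container image.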