[12:56:40] (VarnishHighThreadCount) firing: Varnish's thread count on cp6003:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=drmrs&var-instance=cp6003 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:57:33] Traffic, Data Products, Data-Engineering, Observability-Logging, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9688020 (CodeReviewBot) gmodena opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/64...
[13:00:26] netops, Infrastructure-Foundations: eqiad-drmrs transport down (April 2024) - https://phabricator.wikimedia.org/T361825 (ayounsi) NEW
[13:00:34] netops, Infrastructure-Foundations: eqiad-drmrs transport down (April 2024) - https://phabricator.wikimedia.org/T361825#9688138 (ops-monitoring-bot) ===== Automated diagnostic for Netbox circuit ID 108 --- **Interface cr1-drmrs:xe-0/1/2** - admin-status: up - ⚠️ oper-status: down - interface-flapped:...
[13:01:10] netops, Infrastructure-Foundations: eqiad-drmrs transport down (April 2024) - https://phabricator.wikimedia.org/T361825#9688165 (ayounsi) Emailed Telxius NOC.
[13:01:40] (VarnishHighThreadCount) firing: Varnish's thread count on cp6003:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=drmrs&var-instance=cp6003 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:09:33] Traffic, Content-Transform-Team-WIP, Mobile-Content-Service, RESTBase Sunsetting, and 3 others: Setup allowed list for MCS decom - https://phabricator.wikimedia.org/T340036#9688380 (akosiaris) I guess it's about time I ask if it is ok to remove those exceptions now and return 403 to ev...
[13:16:44] Traffic, Data Products, Data-Engineering, Observability-Logging, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9688466 (gmodena) >>! In T351117#9638903, @Fabfur wrote: > @gmodena you should have some more data to play with now, while...
[14:01:40] (VarnishHighThreadCount) firing: (2) Varnish's thread count on cp6003:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=drmrs&var-instance=cp6003 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:10:48] Hi team! As part of the WDQS graph split, I was tasked with recording some traffic coming from a specific public IP, going to query.wikimedia.org, in the form of a pcap.
[14:10:48] The goal is to have some (timestamp, src_ip, src_port, dst_ip, dst_port) data to the hosting provided managing that IP, so they could identify the customer.
[14:10:48] Would you have a recommendation as to where I could run that probe, and possible do-s and don't-s? Thanks!
[14:11:02] *to send to the hosting provider
[14:15:03] brouberol: hmm that's an interesting one
[14:15:16] Host query.wikimedia.org not found: 3(NXDOMAIN)
[14:15:20] you can find this information on Turnilo though?
[14:15:36] XioNoX: wikidata
[14:16:40] ah, right, yeah it's on the CDN, so not for my layer :)
[14:16:50] XioNoX: yeah :)
[14:16:59] pcap would be tough above layer 4
[14:18:09] turnilo is a good bet I think, but it's sampled
[14:18:31] I think it depends on what we are trying to do here but yeah, brouberol, feel free to DM and discuss if that's easier
[14:18:55] otherwise saving packets from that single IP in bulk, then processing it half-manually
[14:21:40] (VarnishHighThreadCount) resolved: Varnish's thread count on cp6003:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=drmrs&var-instance=cp6003 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:47:57] Traffic, Observability-Logging: Add metrics to Benthos - https://phabricator.wikimedia.org/T361845 (Fabfur) NEW
[14:49:29] netops, DC-Ops, Infrastructure-Foundations: Take advantage of 10Gb NICs in the new network stack - https://phabricator.wikimedia.org/T360297#9688907 (cmooney) @ayounsi thanks for the patch! LGTM. Unfortunately I think the approach might not suit in a lot of cases, due to the Trident 3 port-block re...
[15:45:23] netops, Infrastructure-Foundations: eqiad-drmrs transport down (April 2024) - https://phabricator.wikimedia.org/T361825#9689177 (ops-monitoring-bot) ===== Automated diagnostic for Netbox circuit ID 108 --- **Interface cr1-drmrs:xe-0/1/2** - admin-status: up - ⚠️ oper-status: down - interface-flapped:...
[16:04:35] Traffic, DC-Ops, ops-esams, SRE, Patch-For-Review: esams text cp nvme upgrade - https://phabricator.wikimedia.org/T360430#9689289 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp3068.esams.wmnet with OS bullseye
[16:56:00] Traffic, DC-Ops, ops-esams, SRE, Patch-For-Review: esams text cp nvme upgrade - https://phabricator.wikimedia.org/T360430#9689603 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp3068.esams.wmnet with OS bullseye completed: - cp3068 (**PASS**)...
[17:11:27] Traffic, DC-Ops, ops-esams, SRE, Patch-For-Review: esams text cp nvme upgrade - https://phabricator.wikimedia.org/T360430#9689660 (Fabfur)
[17:18:00] Traffic, DC-Ops, ops-codfw, ops-eqiad, SRE-swift-storage: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9689708 (ssingh) Traffic has been reimaging hosts in esams (we have done three so far for T360430) and we observed that we didn't have...
[18:28:14] Traffic, DC-Ops, ops-codfw, ops-eqiad, SRE-swift-storage: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9690061 (ssingh) Update: I ran the firmware-upgrade cookbook on cp4052 and updated its firmware to `6.10.30.20`, did a `racreset` to...
[18:36:31] Traffic, DC-Ops, ops-codfw, ops-eqiad, SRE-swift-storage: Reimage cookbook on new eqiad hosts stuck at PXE booting - https://phabricator.wikimedia.org/T350179#9690121 (ssingh) Any other opinions/thoughts on how we can try and fix this and where? I am very happy to do the legwork but kind of l...
[18:57:41] Traffic, DC-Ops, ops-esams, SRE, Patch-For-Review: esams text cp nvme upgrade - https://phabricator.wikimedia.org/T360430#9690201 (RobH)
[19:24:08] netops, Infrastructure-Foundations, ops-codfw: codfw: use old asw switches from row A and B as msw switches in row C and D - https://phabricator.wikimedia.org/T361871 (Papaul) NEW
[20:38:59] Traffic, Data-Engineering-Radar, Patch-For-Review: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617#9690726 (CodeReviewBot) brett opened https://gitlab.wikimedia.org/repos/sre/varnishkafka/-/merge_requests/4 Release 1.1.0-4
[20:47:23] Traffic, collaboration-services: Consider separating Gitlab code management and deb building management - https://phabricator.wikimedia.org/T357719#9690739 (BCornwall) @LSobanski This is still quite painful. There's so much drift all over the branches. There's so much rebasing and cherry-picki...
[20:49:23] Traffic, Data-Engineering-Radar, Patch-For-Review: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617#9690746 (CodeReviewBot) brett merged https://gitlab.wikimedia.org/repos/sre/varnishkafka/-/merge_requests/4 Release 1.1.0-4
[20:49:43] Traffic, Data-Engineering-Radar, Patch-For-Review: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617#9690747 (CodeReviewBot) brett closed https://gitlab.wikimedia.org/repos/sre/varnishkafka/-/merge_requests/3 Release 1.1.0-4