[01:07:06] Traffic: purgeList.php does not seem to work in Wikimedia production - https://phabricator.wikimedia.org/T292810 (Tgr)
[02:02:00] Traffic, SRE: purgeList.php does not seem to work in Wikimedia production - https://phabricator.wikimedia.org/T292810 (Krinkle) https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Changing_files_in_/static https://wikitech.wikimedia.org/wiki/Backport_windows/Deployers#Purging TLDR: Purge via en.wikip...
[02:35:51] Traffic, SRE: purgeList.php does not seem to work in Wikimedia production - https://phabricator.wikimedia.org/T292810 (Tgr) Open→Invalid D'oh, thanks.
[07:17:24] Traffic, SRE, Patch-For-Review, User-ema: Experiment with single backend CDN nodes - https://phabricator.wikimedia.org/T288106 (ema)
[07:24:44] Traffic, SRE, User-ema: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 (ema)
[07:25:02] Traffic, SRE, SRE Observability (FY2021/2022-Q2), User-ema: Investigate cp5006 crash - https://phabricator.wikimedia.org/T292506 (ema)
[07:25:13] Traffic, Observability-Alerting, SRE, User-ema: Prometheus Varnish exporter alert: add runbook and link to dashboard - https://phabricator.wikimedia.org/T289974 (ema)
[07:56:34] Traffic, User-ema: ATS should alert if the number of total or active connections reached maximum - https://phabricator.wikimedia.org/T292815 (ema)
[08:20:23] Traffic, SRE: Wikipedia not accessible in Russia on 2021-10-07 16:00-17:00UTC - https://phabricator.wikimedia.org/T292776 (Zemant) its OK now!
[08:25:26] Traffic, SRE, SRE Observability, User-ema: Multiple ATS HTTP2 stats missing from Prometheus - https://phabricator.wikimedia.org/T292817 (ema)
[08:25:33] Traffic, SRE, User-ema: ATS should alert if the number of total or active connections reached maximum - https://phabricator.wikimedia.org/T292815 (ema)
[09:02:15] Traffic, SRE, User-ema: Create runbook for VarnishTrafficDrop alert, change dashboard link - https://phabricator.wikimedia.org/T292820 (ema)
[09:08:51] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (ayounsi) I currently assume that: * IX peers are mostly local, so no special care needs to happen to them ** If this happens to be incorrect we could inv...
[09:13:50] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (jbond) lgtm > using the NO-EXPORT BGP community (most likely not supported by many peers) FYI i have had a good experience using no-export at IX's, i.e....
[09:31:13] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (cmooney) >> IX peers are mostly local, so no special care needs to happen to them >> >> - If this happens to be incorrect we could investigate not sendin...
[11:53:14] Traffic, SRE, User-ema: Create runbook for VarnishTrafficDrop alert, change dashboard link - https://phabricator.wikimedia.org/T292820 (ema) p:Triage→Medium
[11:53:32] Traffic, SRE, SRE Observability, User-ema: Multiple ATS HTTP2 stats missing from Prometheus - https://phabricator.wikimedia.org/T292817 (ema) p:Triage→Medium
[11:53:45] Traffic, SRE, User-ema: ATS should alert if the number of total or active connections reached maximum - https://phabricator.wikimedia.org/T292815 (ema) p:Triage→High
[13:56:11] Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: Anycast: Add IPv6 support to bird and anycast-healthchecker (Puppet) - https://phabricator.wikimedia.org/T292737 (ssingh) The Puppet change has been merged but I am going to keep this open in case @ayounsi feels that there is something e...
[17:00:28] HTTPS, Traffic, SRE, Documentation, Performance-Team (Radar): TLS certificates renewal process - https://phabricator.wikimedia.org/T196248 (BBlack) Open→Resolved a:BBlack Added a section to https://wikitech.wikimedia.org/wiki/HTTPS about renewal which mentions aging out new manual...
[17:02:25] HTTPS, Traffic, SRE, Security: Investigate our mitigation strategy for HTTPS response length attacks - https://phabricator.wikimedia.org/T92298 (BBlack)
[17:02:36] Traffic, SRE, Goal, Performance-Team (Radar), Wikimedia-Incident: Support TLSv1.3 - https://phabricator.wikimedia.org/T170567 (BBlack) Open→Resolved a:Vgutierrez TLSv1.3 has been working for quite some time! Any other issues should be in other tickets (and are, in some cases!).
[17:17:24] Traffic, SRE, Patch-For-Review: Cleanup after varnish-be -> ats-be migration - https://phabricator.wikimedia.org/T241239 (BBlack) Open→Resolved a:ema @ema I'm going to assume we're done with all the easy cleanups here. There's one un-merged patch on this at https://gerrit.wikimedia.org/r...
[17:19:03] Traffic, SRE, Performance-Team (Radar): User traffic sometimes gets HTTP 502 from ATS - https://phabricator.wikimedia.org/T239382 (BBlack) Open→Declined Declining this one, as whatever this was, the report is now ~2 years old and everything related has changed or been refined substantially si...
[17:31:16] Traffic, SRE, Performance-Team (Radar), Sustainability (MediaWiki-MultiDC): Create HTTP verb and sticky cookie DC routing in VCL - https://phabricator.wikimedia.org/T91820 (Krinkle)
[17:31:46] Traffic, SRE, Performance-Team (Radar), Sustainability (MediaWiki-MultiDC): Create HTTP verb and sticky cookie DC routing in VCL - https://phabricator.wikimedia.org/T91820 (Krinkle)
[17:48:23] Traffic, Performance-Team, SRE: Enable webp thumbnails on all images for non-Commons wikis - https://phabricator.wikimedia.org/T269946 (Krinkle)
[17:48:29] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Krinkle)
[17:59:49] Traffic, netops, Infrastructure-Foundations, Pybal, SRE: Rename lvs* LLDP port descriptions after upgrading to stretch - https://phabricator.wikimedia.org/T192087 (BBlack)
[17:59:59] Traffic, netops, Infrastructure-Foundations, SRE: cr1-eqsin 4 onboard interfaces down - https://phabricator.wikimedia.org/T193897 (BBlack)
[18:00:06] Traffic, netops, Infrastructure-Foundations, SRE: Aug 28th: turn off 1/3 esams-knams lasers in advance of Relined PA-988002 maintenance - https://phabricator.wikimedia.org/T230448 (BBlack)
[18:00:14] Traffic, netops, Infrastructure-Foundations, SRE, ops-codfw: Interface errors on asw-d-codfw:xe-2/0/47 - https://phabricator.wikimedia.org/T193677 (BBlack)
[18:00:26] Traffic, netops, Infrastructure-Foundations, SRE, observability: Network port utilization alerts should be paging - https://phabricator.wikimedia.org/T224888 (BBlack)
[18:00:34] Traffic, netops, Infrastructure-Foundations, SRE: BGP: Investigate isolating codfw and eqiad - https://phabricator.wikimedia.org/T246721 (BBlack)
[18:00:44] Traffic, netops, Infrastructure-Foundations, SRE: Wikimedia projects not reachable for some Telecom Italia users - https://phabricator.wikimedia.org/T262869 (BBlack)
[18:00:48] Traffic, netops, Infrastructure-Foundations, SRE, Patch-For-Review: Remove multicast - https://phabricator.wikimedia.org/T257573 (BBlack)
[18:01:03] sorry for the spam, these are all just column updates deep in the weeds of things, and column-changes aren't on the menu for bulk silent actions
[18:01:22] Traffic, netops, Infrastructure-Foundations, SRE, Goal: Increase network capacity (2018-19 Q2 Goal) - https://phabricator.wikimedia.org/T207668 (BBlack)
[18:01:28] Traffic, netops, Infrastructure-Foundations, SRE, Patch-For-Review: ulsfo <-> codfw transit link flapping causing nginx availability alerts - https://phabricator.wikimedia.org/T219591 (BBlack)
[18:01:39] Traffic, netops, Infrastructure-Foundations, SRE, Wikimedia-Incident: Configure interface damping on primary links - https://phabricator.wikimedia.org/T196432 (BBlack)
[18:01:45] Traffic, netops, Infrastructure-Foundations, SRE, Patch-For-Review: Free up 185.15.59.0/24 - https://phabricator.wikimedia.org/T211254 (BBlack)
[18:01:51] Traffic, netops, Infrastructure-Foundations, SRE: IPv6 ~20ms higher ping than IPv4 to gerrit - https://phabricator.wikimedia.org/T211079 (BBlack)
[18:02:08] Traffic, netops, Infrastructure-Foundations, SRE, Patch-For-Review: cp intermittent IPsec MTU issue - https://phabricator.wikimedia.org/T195365 (BBlack)
[18:02:18] Traffic, netops, Infrastructure-Foundations, SRE, ops-ulsfo: troubleshoot cr3/cr4 link - https://phabricator.wikimedia.org/T196030 (BBlack)
[18:02:28] Traffic, netops, Infrastructure-Foundations, SRE, Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090 (BBlack)
[18:02:34] Traffic, netops, Infrastructure-Foundations, SRE, Patch-For-Review: eqiad row D switch upgrade - https://phabricator.wikimedia.org/T172459 (BBlack)
[18:30:31] Traffic, SRE: Sudden surge of requests to https://wikipedia.org/ from Telus customers - https://phabricator.wikimedia.org/T276213 (BBlack) Open→Declined Feel free to reopen/link if this is useful in a future investigation!
[18:33:39] Traffic, netops, Infrastructure-Foundations, SRE: Register as14907 dot net (or other similar domain) for network infra concerns - https://phabricator.wikimedia.org/T292866 (CDanis)
[18:35:59] netops, DNS, Infrastructure-Foundations, SRE, and 2 others: Cloud: define relationship between wikimediacloud.org domain, CIDR prefixes and netbox automation - https://phabricator.wikimedia.org/T266331 (BBlack)
[18:41:11] Traffic, netops, Data-Services, Infrastructure-Foundations, and 2 others: wikireplicas last-minute infra work to discuss / resolve - https://phabricator.wikimedia.org/T273248 (BBlack) Open→Resolved a:ayounsi
[18:51:29] Traffic, SRE, Patch-For-Review: cp_upload @ eqsin cascading failures, February 2021 - https://phabricator.wikimedia.org/T274888 (BBlack)
[18:54:08] netops, Infrastructure-Foundations, SRE: TATA SKY Broadband (AS134674) issues with connecting to upload.wikimedia.org - https://phabricator.wikimedia.org/T275234 (BBlack) Removing #Traffic as I don't think this looks actionable for our team (but might still be for netops if the conversations above ar...
[18:54:42] that TATA SKY update scared me for a second; I thought our friends were back
[19:00:58] Traffic, netops, Infrastructure-Foundations, SRE, procurement: drmrs: primary software task - https://phabricator.wikimedia.org/T282788 (BBlack)
[19:02:43] yeah unless there's some actual human commentary for me following up in some specific way, assume all the changes with (BBlack) are just meta-updates for organizing/cleanup purposes
[19:02:54] (I am closing/moving/detagging a few in clearer cases, though)
[19:09:57] Traffic, Cloud-Services, DNS, SRE: PDNS in cloud can return inconsistent answers - https://phabricator.wikimedia.org/T281700 (BBlack) As noted in the description, DNS is inconsistent in general within reasonable TTL bounds, so I don't see resolving the inconsistency being shown here as a good rea...
[19:11:40] Traffic, netops, Infrastructure-Foundations, SRE: externally-hosted NEL report forwarders for more timely report reception - https://phabricator.wikimedia.org/T292870 (CDanis) p:Triage→Low
[19:11:58] Traffic, netops, Infrastructure-Foundations, SRE: externally-hosted NEL report forwarders for more timely report reception - https://phabricator.wikimedia.org/T292870 (CDanis)
[19:21:12] Traffic, MediaWiki-General, Pybal, SRE, and 2 others: SELECT query arriving to wikidatawiki db codfw hosts causing pile ups during schema change - https://phabricator.wikimedia.org/T284981 (BBlack) We chose S:BP for those queries on the assumption that, by its nature, it would be a cheap page to...
[19:24:39] Traffic, netops, Infrastructure-Foundations, SRE: externally-hosted NEL report forwarders for more timely report reception - https://phabricator.wikimedia.org/T292870 (CDanis)
[19:25:54] Traffic, SRE: cp3059 Varnish child crash: Worker Pool Queue does not move - https://phabricator.wikimedia.org/T285953 (BBlack) Open→Resolved a:BBlack We have a new varnish version coming soon, so stale crash reports are probably of little value now.
[19:34:04] Traffic, Analytics, Analytics-Kanban: Review use of realloc in varnishkafka - https://phabricator.wikimedia.org/T287561 (BBlack) The last time I looked at the patches, I was a bit baffled and left it alone. It's not clear that there's any active issue affecting us that this will solve, and these kin...
[19:34:26] bblack: I kind of enjoy that while you're triaging and de-tagging and cleaning things up, I'm filing some wildly-speculative stuff that has been in the back of my mind for a while :)
[19:35:32] please don't file any for us if you can help it. I picked Friday because I was hoping not much else would move within our tag while I was working :)
[19:36:24] (or well, it's fine to do so too, but be aware they'll likely move quickly to the Icebox
[19:36:27] )
[19:37:18] we're taking a stab at moving away from phab as the one tool for all information flow - keeping phab to just active planned/incidental work, and moving the speculative/future-planning stuff Elsewhere
[19:37:31] cool!
[19:37:42] the stuff I just filed can probably go on a 'radar' tag if you have one
[19:38:02] (but for now, all the existing speculative stuff is moving (along with lots of other stuff) to our Icebox for now, and then it will be gathered up and organized Elsewhere later)
[19:38:04] more of an FYI for Traffic (plus ~some work required ~eventually, but probably just agreement on plans and then config code reviews?)
[19:38:06] cool
[19:38:26] Icebox sgtm
[19:42:14] Traffic, SRE: DNS Discovery for active/passive failover within a data centre - https://phabricator.wikimedia.org/T287584 (BBlack) Open→Declined Given your generous offer of declination, I think we'll take that route! :) In general, our DNS Discovery stuff really is meant to handle x-dc situation...
[19:53:48] Traffic, CirrusSearch, Discovery-Search, Infrastructure-Foundations, and 6 others: Half a million of CirrusSearch jobqueue execution errors per hour since 2021-09-30 16:02 - https://phabricator.wikimedia.org/T292291 (BBlack) Update on the ca-certificates end of this: Debian has a patch that will...
[19:55:51] Traffic, Inuka-Team, KaiOS-Wikipedia-app, SRE: Many KaiOS devices can't access WMF websites and can't use Wikipedia app - https://phabricator.wikimedia.org/T292632 (BBlack) Open→Resolved a:Vgutierrez Closing for now as I don't think there's anything we want to do on our end here. Tha...
[19:55:55] Traffic, SRE: Let's Encrypt issuance chains update - https://phabricator.wikimedia.org/T283164 (BBlack)
[21:35:37] Traffic, CirrusSearch, Discovery-Search, Infrastructure-Foundations, and 6 others: Half a million of CirrusSearch jobqueue execution errors per hour since 2021-09-30 16:02 - https://phabricator.wikimedia.org/T292291 (Legoktm) >>! In T292291#7413420, @BBlack wrote: > Update on the ca-certificates...