[00:14:12] above conversation prompted me to look into Vary: header
[00:14:24] why does it have a Vary: Authorization ?
[00:14:27] (Vary: Accept-Encoding,Cookie,Authorization)
[00:21:08] Platonides: I think that's for OAuth2, which uses that header
[00:22:08] oh
[00:22:40] https://gerrit.wikimedia.org/g/mediawiki/extensions/OAuth/+/337171c10eefcce6da834c4315e300fd26c0156c/src/SessionProvider.php#310
[00:22:41] I wasn't aware it was used for anything, but if OAuth2 uses it, it makes sense
[00:23:12] https://www.mediawiki.org/wiki/OAuth/For_Developers#Making_requests_on_the_user's_behalf_2
[00:24:32] it seems overly broad to include that on every page view, though
[00:24:39] it seems that only API needs it
[06:57:56] (EdgeTrafficDrop) firing: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org
[07:02:56] (EdgeTrafficDrop) resolved: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org
[08:04:31] netops, Infrastructure-Foundations, SRE, Patch-For-Review, Sustainability (Incident Followup): Use next-hop-self for iBGP sessions - https://phabricator.wikimedia.org/T295672 (ayounsi) As a general note we need to be careful with rolling out config fixes in reaction to unexpected issues. Even...
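A minimal Python sketch of why the Vary discussion from 00:14 to 00:24 matters for caching. It is not taken from Varnish, ATS, or MediaWiki: it assumes a shared cache that appends the request headers named in Vary to its cache key, so requests with different Authorization values get separate variants, while anonymous requests that send no Authorization header still share one. `cache_key` and `vary_for_path` are hypothetical names, and the `/w/api.php` prefix check is an assumed stand-in for the "only API needs it" idea, not how the OAuth extension actually decides.

```python
from typing import Mapping


def cache_key(url: str, vary: str, request_headers: Mapping[str, str]) -> tuple:
    """Build a cache key from the URL plus the request headers named in Vary.

    Sketch only: real caches also normalise header values (e.g. Accept-Encoding).
    """
    # Header names are case-insensitive, so compare them in lowercase.
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    # A missing header is stored as None, so "header absent" is its own variant.
    return (url,) + tuple((n, lowered.get(n)) for n in names)


def vary_for_path(path: str) -> str:
    """Hypothetical policy: only API responses vary on Authorization."""
    base = "Accept-Encoding,Cookie"
    if path.startswith("/w/api.php"):
        return base + ",Authorization"
    return base


if __name__ == "__main__":
    url = "/wiki/Main_Page"
    anon = {"Accept-Encoding": "gzip"}
    oauth = {"Accept-Encoding": "gzip", "Authorization": "Bearer abc"}
    current = "Accept-Encoding,Cookie,Authorization"
    # With the current header, anonymous readers still share one variant...
    print(cache_key(url, current, anon) == cache_key(url, current, anon))  # True
    # ...but any request carrying a token gets a separate variant of a page view.
    print(cache_key(url, current, anon) == cache_key(url, current, oauth))  # False
    # Under the hypothetical API-only policy, page views ignore Authorization.
    narrow = vary_for_path(url)
    print(cache_key(url, narrow, anon) == cache_key(url, narrow, oauth))  # True
```

The API-only variant keeps Authorization in the key where OAuth2 bearer tokens change the response, and drops it for ordinary page views, which is the trade-off raised at 00:24:32.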
[08:04:56] netops, Infrastructure-Foundations, SRE, Patch-For-Review, Sustainability (Incident Followup): Use next-hop-self for iBGP sessions - https://phabricator.wikimedia.org/T295672 (ayounsi) Open→In progress
[08:41:35] netops, Infrastructure-Foundations, SRE: Packet Drops on Eqiad ASW -> CR uplinks - https://phabricator.wikimedia.org/T291627 (ayounsi)
[08:41:43] netops, Infrastructure-Foundations, SRE: Create an alert for output discards on network devices - https://phabricator.wikimedia.org/T284593 (ayounsi) In progress→Resolved This is now set to alert to NOC through alertmanager. Added a quick mention in https://wikitech.wikimedia.org/wiki/Networ...
[10:12:56] Traffic, SRE: Upgrade pybal-test200[23] from Stretch to Buster - https://phabricator.wikimedia.org/T297187 (ema)
[10:14:42] netops, Infrastructure-Foundations, SRE, Patch-For-Review, Sustainability (Incident Followup): Use next-hop-self for iBGP sessions - https://phabricator.wikimedia.org/T295672 (cmooney) > To be clear, I agree that your proposal is a good solution however I'm wondering what's most future-proof....
[13:17:28] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (JAllemandou) > What would be the next steps? Here is a proposal: # [DE, SRE]Agree on the name of the flow :) Will it be `sflow` (this f...
[13:42:04] netops, Infrastructure-Foundations, SRE, Patch-For-Review, Sustainability (Incident Followup): Use next-hop-self for iBGP sessions - https://phabricator.wikimedia.org/T295672 (ayounsi) Ok, sounds good to me!
[14:02:53] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (ayounsi) Sounds good! 1. we can use "internal_flows" (not _netflow as netflow is a protocol). 2. can I start this anytime, or we need to...
[14:29:22] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (Ottomata) > Agree on the name of the flow : Some guidelines: https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#Event...
[14:31:47] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (JAllemandou) >>! In T263277#7552972, @ayounsi wrote: > Sounds good! > 1. we can use "internal_flows" (not _netflow as netflow is a proto...
[14:37:51] netops, Infrastructure-Foundations, SRE, SRE Observability (FY2021/2022-Q2), Sustainability (Incident Followup): Alert that should have paged via VictorOps was delayed because of partial networking outage - https://phabricator.wikimedia.org/T294166 (herron)
[14:40:21] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (Ottomata) > can I start this anytime, or we need to create the kafka topic somewhere? Not really needed, unless you need to set special t...
[15:03:41] (EdgeTrafficDrop) firing: 37% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org
[15:08:41] (EdgeTrafficDrop) resolved: (2) 59% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org
[15:35:47] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (ayounsi) `internal_network_flows` works, `network.flows.internal` too. @Ottomata indeed we do have restriction on the producer side (it'...
[15:55:14] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (BTullis) In case it helps, I came across this abandoned change from 2020: https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/6080...
[18:20:05] HTTPS, SRE, Traffic-Icebox: HTTPS for internal service traffic - https://phabricator.wikimedia.org/T108580 (Majavah)
[18:20:45] HTTPS, Cloud-Services, SRE, Traffic-Icebox, cloud-services-team (Kanban): cloudweb2001-dev: add TLS termination - https://phabricator.wikimedia.org/T263829 (Majavah) Open→Resolved a:Majavah
[18:21:11] I just closed https://phabricator.wikimedia.org/T263829, meaning that all ATS->backend connections are made using tls \o/
[18:21:31] I guess we can close https://phabricator.wikimedia.org/T210411 and https://phabricator.wikimedia.org/T108580 now?
[18:23:03] If that was really the last stop on T108580 that's amazing
[18:23:04] T108580: HTTPS for internal service traffic - https://phabricator.wikimedia.org/T108580
[18:26:57] (EdgeTrafficDrop) firing: 30% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org
[18:31:57] (EdgeTrafficDrop) firing: (4) 67% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org
[18:36:57] (EdgeTrafficDrop) resolved: (4) 67% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org
[19:42:47] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (Ottomata) Ah, right! https://phabricator.wikimedia.org/T248865#6289287 So yeah, unless we can at least control the event format, we can...
[22:39:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp5006:9331 is unreachable - https://alerts.wikimedia.org