[00:02:57] (HAProxyEdgeTrafficDrop) firing: 69% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[00:07:57] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[07:18:57] Traffic, DNS, Infrastructure-Foundations, Mail, and 2 others: Consider if to support BIMI for wiki mail - https://phabricator.wikimedia.org/T311685 (jcrespo)
[07:21:46] Traffic, DNS, Infrastructure-Foundations, Mail, and 2 others: Consider if to support BIMI for wiki mail - https://phabricator.wikimedia.org/T311685 (jcrespo) I created this when I saw someone mentioning it on discord. Ping @Vgutierrez @BBlack (I personally have no thought, I didn't know this was...
[08:19:53] Traffic, netops, Infrastructure-Foundations, SRE: Upgrade to Bird 2 - https://phabricator.wikimedia.org/T310574 (ayounsi) Open→Resolved a:ayounsi Awesome, thanks a lot @ssingh I slightly cleaned up the doc (added a mention of the bird2 upgrade) And updated the dashboard at https://g...
[08:30:05] Traffic, DNS, Fundraising-Backlog, Infrastructure-Foundations, and 3 others: Consider if to support BIMI for wiki mail - https://phabricator.wikimedia.org/T311685 (greg) The email team in fundraising has interest in this topic as well.
[08:35:36] Traffic, DNS, Fundraising-Backlog, Infrastructure-Foundations, and 3 others: Consider if to support BIMI for wiki mail - https://phabricator.wikimedia.org/T311685 (jcrespo) Probably related: T211404 T167337
[09:23:00] fyi, I updated the DNS recursor grafana dashboard - https://grafana.wikimedia.org/d/000000399/dns-recursors?orgId=1
[09:24:47] Added an "all" site selector (and use it by default), migrated to the new timeseries visualisation, stacked the values, set 0 as min, use bars instead of lines, set the default time selector to 3h
[12:54:45] XioNoX: looks nice!
[12:56:35] sukhe: I'm wondering if I should add the relevant bird panels to the dashboard or not
[12:56:53] but also I think I'm procrastinating on more important things to do
[12:56:55] or have one bird dashboard itself? not sure
[12:56:56] haha yeah
[12:57:05] it's only natural after the upgrade :P
[12:57:55] sukhe: https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1
[13:28:43] very nice and clean!
[15:16:11] Traffic, Data-Engineering-Kanban, SRE, Data Engineering Planning: Spike: Investigate creating robust alerts to notify that caching nodes are not sending traffic data - https://phabricator.wikimedia.org/T304651 (JArguello-WMF)
[15:34:48] Traffic, Data-Engineering-Kanban, SRE, Data Engineering Planning (Sprint 01): Spike: Investigate creating robust alerts to notify that caching nodes are not sending traffic data - https://phabricator.wikimedia.org/T304651 (JArguello-WMF)
[16:00:50] the next step for https://phabricator.wikimedia.org/T138093 (query parameter normalization) is to get the vmod packaged for Debian, so it can be deployed on the Beta Cluster. Is there anyone from SRE who can help with that? Since the repo layout and build process for vmods follows a template, it should be straightforward to package.
[16:27:37] Traffic, SRE, Patch-For-Review: Test ESI feasibility with current Varnish installation - https://phabricator.wikimedia.org/T308799 (Vgutierrez) @AndyRussG currently in our CDN varnish and ATS runs on the same nodes. All the communication with backend servers/applayer is performed by ats-be (see https...
[16:46:32] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Replace labstore100[67] with clouddumps100[12] - https://phabricator.wikimedia.org/T309346 (wiki_willy)
[17:09:43] Traffic, Data-Engineering, Data-Engineering-Kanban, SRE: Spike: Investigate creating robust alerts to notify that caching nodes are not sending traffic data - https://phabricator.wikimedia.org/T304651 (EChetty)
[18:01:15] Traffic, DNS, Fundraising-Backlog, Infrastructure-Foundations, and 3 others: Consider if to support BIMI for wiki mail - https://phabricator.wikimedia.org/T311685 (ssingh) p:Triage→Medium
[20:14:40] Traffic, SRE: pontoon.traffic.eqiad1.wikimedia.cloud unable to run puppet agent due to certificate mismatch - https://phabricator.wikimedia.org/T310303 (BCornwall) @Vgutierrez Indeed, do you have any reason to keep these *specific* instances around, or are you okay with a replacement?
[22:25:38] (LVSHighRX) firing: Excessive RX traffic on lvs2009:9100 (ens2f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2009 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[23:05:38] (LVSHighRX) resolved: Excessive RX traffic on lvs2009:9100 (ens2f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2009 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX