[03:39:57] (VarnishTrafficDrop) firing: 43% GET drop in text@eqsin during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[03:44:57] (VarnishTrafficDrop) resolved: 66% GET drop in text@eqsin during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[05:36:17] Traffic, DNS, SRE: Additional DNS entries for Wikilearn project (Community Development) - https://phabricator.wikimedia.org/T292537 (Vgutierrez) Stalled→In progress
[05:43:47] Traffic, DNS, SRE, Patch-For-Review: Additional DNS entries for Wikilearn project (Community Development) - https://phabricator.wikimedia.org/T292537 (Vgutierrez) In progress→Resolved a:Vgutierrez ` vgutierrez@carrot:~/wikimedia.org/operations/dns$ host _fbf735f01a612e98f20b40a80776ee...
[09:29:05] Acme-chief, Traffic: Implement a watchdog mechanism on acme-chief - https://phabricator.wikimedia.org/T292619 (Vgutierrez)
[09:30:00] Acme-chief, Traffic: Implement a watchdog mechanism on acme-chief - https://phabricator.wikimedia.org/T292619 (Vgutierrez) p:Triage→Medium
[11:20:57] (VarnishTrafficDrop) firing: 54% GET drop in text@esams during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[11:24:02] uh this looks like a pretty large decrease ^
[11:24:37] https://grafana.wikimedia.org/d/000000541/varnish-caching-last-week-comparison?viewPanel=5&orgId=1&refresh=15m&from=now-1h&to=now
[11:25:56] (VarnishTrafficDrop) resolved: 62% GET drop in text@esams during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[11:25:58] at 11:20 in text@esams we had essentially half the requests compared to one week ago
[11:34:30] (hey - please see email just sent to the team alias)
[11:37:09] question_mark: just saw that, take care <3
[12:09:50] netops, Infrastructure-Foundations: Eqiad Expansion - LVS Connectivity Options - https://phabricator.wikimedia.org/T292630 (cmooney)
[12:10:01] netops, Infrastructure-Foundations: Eqiad Expansion - LVS Connectivity Options - https://phabricator.wikimedia.org/T292630 (cmooney) p:Triage→High
[12:30:49] Traffic, Beta-Cluster-Infrastructure, SRE, Patch-For-Review: Figure out why deployment-cache-text06 keeps crashing - https://phabricator.wikimedia.org/T286502 (ema) Open→Resolved a:ema After lowering the amount of memory used for the ATS backend ram cache, there's now some more availa...
[12:44:53] Traffic, Inuka-Team, KaiOS-Wikipedia-app, SRE: Many KaiOS devices can't access WMF websites and can't use Wikipedia app - https://phabricator.wikimedia.org/T292632 (SBisson)
[13:27:50] Traffic, Inuka-Team, KaiOS-Wikipedia-app, SRE: Many KaiOS devices can't access WMF websites and can't use Wikipedia app - https://phabricator.wikimedia.org/T292632 (Vgutierrez) p:Triage→Low ack, thanks for the heads up @SBisson. I'm not familiar with KaiOS so forgive me if it's a stupid...
[13:40:19] Traffic, Analytics, Analytics-Kanban, Data-Engineering, and 6 others: Migrated Server-side EventLogging events recording http.client_ip as 127.0.0.1 - https://phabricator.wikimedia.org/T288853 (DAbad) @Ottomata can we close this ticket out now? Or is there work left?
[13:42:29] Traffic, Analytics, Analytics-Kanban, Data-Engineering, and 6 others: Migrated Server-side EventLogging events recording http.client_ip as 127.0.0.1 - https://phabricator.wikimedia.org/T288853 (Ottomata) There is still work, I haven't deployed the config change. Sorry about that. I got caught up...
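The VarnishTrafficDrop alerts for text@eqsin (03:39) and text@esams (11:20) report a percentage drop in GET requests, and the esams one was cross-checked against the same window one week earlier ("essentially half the requests"). The snippet below is a minimal sketch of that arithmetic. It assumes the drop is simply the shortfall of the current request rate relative to a baseline rate. The function names, the example rates and the 50% threshold are hypothetical and are not taken from the real alert rule, which this log does not show.

```python
def get_drop_percent(current_rate: float, baseline_rate: float) -> float:
    """Percentage by which current_rate falls short of baseline_rate.

    Hypothetical helper: the real VarnishTrafficDrop rule is not shown here.
    Returns 0.0 when traffic is at or above the baseline.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    drop = (1.0 - current_rate / baseline_rate) * 100.0
    return max(drop, 0.0)


def should_fire(current_rate: float, baseline_rate: float,
                threshold_percent: float = 50.0) -> bool:
    # Hypothetical threshold; the actual alert threshold is not stated in the log.
    return get_drop_percent(current_rate, baseline_rate) >= threshold_percent


# "essentially half the requests compared to one week ago" -> a 50% drop
print(get_drop_percent(5_000.0, 10_000.0))  # 50.0
```

Comparing against the same time one week earlier, as the last-week-comparison dashboard does, is one way to avoid flagging the normal daily traffic curve as a drop.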
[13:56:40] Traffic: RIPE Atlas monitoring of reachability & latency towards anycasted Wikidough IP - https://phabricator.wikimedia.org/T283614 (ayounsi) RIPE atlas measurements have a lot of limitations when they're recurrent and not part of the Anchor network, which make them less ideal for monitoring Wikidough. For e...
[14:20:11] netops, Infrastructure-Foundations, SRE: Eqiad Expansion - LVS Connectivity Options - https://phabricator.wikimedia.org/T292630 (cmooney) @wiki_willy One option that Dell do seem to have are the Mellanox ConnectX-3 and ConnectX-4 QSFP+ based cards. With the right module we can in theory do 4x10G off...
[14:36:30] Traffic, SRE, Patch-For-Review: Experiment with single backend CDN nodes - https://phabricator.wikimedia.org/T288106 (ema) >>! In T288106#7405557, @gerritbot wrote: > Change 726912 had a related patch set uploaded (by Ema; author: Ema): > %%%[operations/puppet@production] cache: exclude single backen...
[15:27:47] netops, Infrastructure-Foundations, SRE: Eqiad Expansion - LVS Connectivity Options - https://phabricator.wikimedia.org/T292630 (BBlack) I'm pretty sure you're right about the 2x PCIe limitation on these servers, unfortunately. What I'm not so sure about, is whether the (currently BIOS-disabled) "on...
[15:36:30] netops, Infrastructure-Foundations, SRE: Eqiad Expansion - LVS Connectivity Options - https://phabricator.wikimedia.org/T292630 (BBlack) Also note that lvs1013-16 are ~4 years old now. I don't think they were scheduled for refresh this year, but they probably would be next year (and by then we would...
[15:55:28] bblack: BTW, in case that you didn't see it: https://phabricator.wikimedia.org/T292632
[15:55:42] yeah I saw
[15:55:50] I don't think there's much we can do, though
[15:56:27] apparently the only option for KaiOS is a vendor patch (but many are out of support in every sense), or rooting your device to install the ISRG Root X1 pem file. There's no unrooted way to do so :/
[15:56:53] yeah.. sad state for those devices
[15:57:07] cause I'm assuming that they are also missing any kind of security patches
[15:57:17] yeah, for a long time
[15:57:49] the root of many such issues is the sad decisions of Vendors (OS-makers, device-makers, and carriers selling devices/services)
[15:57:56] and according to https://www.mozilla.org/en-US/security/known-vulnerabilities/firefox/
[15:58:15] there are a fair number of critical vulnerabilities potentially affecting firefox 48
[15:59:13] they can't of course offer perpetual support - but if you have an idea about a reasonable support timeline (say 3 years or whatever), and you know the economic situation of your target market(s), you can keep device prices offered within reason for those people to upgrade every few years, instead of holding onto a dead device because they can't afford a new one in time (or work on payment plan
[15:59:19] systems / subscription models with free replacement upgrades every N years, etc)
[15:59:57] there's lots of ways to solve it, but I blame Vendors for putting the users in the situations they're in
[16:17:05] bblack: vgutierrez: maybe the kaios testing lab could run their own dnsmasq or similar that mapped queries to one of the non-LE edges
[16:54:43] Traffic, Inuka-Team, KaiOS-Wikipedia-app, SRE: Many KaiOS devices can't access WMF websites and can't use Wikipedia app - https://phabricator.wikimedia.org/T292632 (SBisson) >>! In T292632#7405463, @Vgutierrez wrote: > ... does that mean that WiFi-only devices are unsupported and missing security...
[17:17:56] netops, Infrastructure-Foundations, SRE: Eqiad Expansion - LVS Connectivity Options - https://phabricator.wikimedia.org/T292630 (cmooney) Thanks for the detailed response @BBlack > And yes, the tunnel option is also risky/complex/painful, but we'll have to weigh that against all the above. Without...
[19:39:55] netops, Infrastructure-Foundations, SRE: Eqiad Expansion - LVS Connectivity Options - https://phabricator.wikimedia.org/T292630 (cmooney) > I know in the R720s, R730s etc the "on-board" network ports are on a daughterboard which is swappable. From everything I've seen online this isn't the case with...
[21:05:59] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (Papaul) We are seeing this issue because all those hosts are running an old firmware version for the IDRAC. Upgrading...