[05:22:03] netops, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (Marostegui) @Bstorm this also includes dbproxy1018 and dbproxy1019 which are the clouddb* proxies
[05:24:29] netops, Infrastructure-Foundations, SRE: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (Marostegui)
[17:14:13] ema, bblack, vgutierrez - around?
[17:18:05] Running some errands right now
[17:19:50] ah ok, there is a PS redundancy failure for one rack in eqsin, nothing on fire afaics but another pair of eyes could help
[17:20:20] Ack
[17:20:28] I'll check it as soon as I get home
[17:22:56] perfect, Arzhel is also checking
[17:26:53] Traffic, SRE, ops-eqsin, User-MediaJS: IPMI Sensor Status Power_Supply Status: Critical on various eqsin servers - https://phabricator.wikimedia.org/T286113 (elukey) ` 19:17 XioNoX: around? 19:20 elukey: what's up? 19:21 looks like mgmt router is down so mgmt network...
[17:28:12] the task is --^
[17:28:23] added all the context, it seems Equinix maintenance
[17:28:33] but if they unplug the other PS we are in trouble :)
[17:31:19] Traffic, SRE, ops-eqsin, User-MediaJS: IPMI Sensor Status Power_Supply Status: Critical on various eqsin servers - https://phabricator.wikimedia.org/T286113 (elukey) Info about a similar use case (credits to Arzhel): https://phabricator.wikimedia.org/T206861#4664474 Things to decide: 1) Do we n...
[17:40:01] Traffic, SRE, ops-eqsin, Patch-For-Review, User-MediaJS: IPMI Sensor Status Power_Supply Status: Critical on various eqsin servers - https://phabricator.wikimedia.org/T286113 (elukey) >>! In T286113#7195875, @elukey wrote: > Thanks for the ping, this seems to be a single rack problem - https:...
[17:48:50] vgutierrez: as FYI me and Arzhel decided to depool eqsin, all logged in the task
[17:49:17] maintenance should hopefully last one hour or so
[17:49:32] even if alerts have been ongoing for a bit now
[17:50:14] ah no, maintenance window UTC: SATURDAY, 03 JUL 14:00 - SATURDAY, 03 JUL 22:00
[17:50:17] sigh
[17:51:04] I added a reminder for my tomorrow morning to check on eqsin
[17:51:58] XioNoX: will do the same, thanks for the help!
[17:52:14] thank you!
[17:53:57] (VarnishTrafficDrop) firing: 59% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[17:54:17] Traffic, SRE, ops-eqsin, Patch-For-Review, User-MediaJS: IPMI Sensor Status Power_Supply Status: Critical on various eqsin servers - https://phabricator.wikimedia.org/T286113 (elukey) Me and Arzhel decided to depool eqsin, the PS redundancy failure's maintenance window seems to be: ` UTC: S...
[17:58:57] (VarnishTrafficDrop) firing: (2) 32% GET drop in text@eqsin during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[18:03:22] elukey: XioNoX ack
[18:12:05] elukey: so it looks like we don't have PS redundancy on the mgmt network in eqsin
[18:13:38] vgutierrez: so they are doing power maintenance on the power feeds, both racks are impacted not only mgmt
[18:13:54] we have not seen user impact since one PS was up etc..
[18:14:24] yup.. I've seen that as well
[18:58:57] (VarnishTrafficDrop) resolved: (2) 69% GET drop in text@eqsin during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[19:40:57] (VarnishTrafficDrop) firing: 68% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[19:45:57] (VarnishTrafficDrop) firing: (2) 68% GET drop in text@eqsin during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[19:50:57] (VarnishTrafficDrop) resolved: (2) 69% GET drop in text@eqsin during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org