[07:02:14] netops, DBA, Infrastructure-Foundations, SRE, Patch-For-Review: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (Marostegui) m1-master.eqiad.wmnet switched over to dbproxy1012 which is on row A. Once this row is done, we need to revert that.
[07:02:44] netops, DBA, Infrastructure-Foundations, SRE, Patch-For-Review: Switch buffer re-partition - Eqiad Row B - https://phabricator.wikimedia.org/T286061 (Marostegui)
[16:41:47] Traffic: Unexpected auditd service restart failure - https://phabricator.wikimedia.org/T287266 (ssingh)
[17:07:05] Traffic, SRE: Unexpected auditd service restart failure - https://phabricator.wikimedia.org/T287266 (ssingh) p:Triage→Low
[18:37:20] netops, Infrastructure-Foundations, SRE: cr2-codfw:fpc0 crash - https://phabricator.wikimedia.org/T287110 (cmooney) @Papaul replaced card and interfaces have been switched up. All seems ok. ` cmooney@re0.cr2-codfw> show chassis fpc pic-status 0 Slot 0 Online MPCE Type 3...
[18:40:35] anybody around?
[18:40:56] Papaul has replaced the faulty card in cr2-codfw and all looks ok, all interfaces back up and passing traffic.
[18:41:22] I am inclined to re-pool eqiad, but unsure of the exact best way to proceed.
[18:41:34] Should I just revert https://gerrit.wikimedia.org/r/c/operations/dns/+/703562 ?
[18:51:34] rzl maybe? as you're on clinic duty :)
[18:51:54] I'm on my phone, but it would be nice to repool eqiad before the weekend
[18:52:09] sure, happy to help
[18:52:14] Thanks XioNox
[18:52:19] thanks a lot!
[18:52:30] rzl: great
[18:52:34] topranks: yep, we'll revert that commit, feel free to send it to me for review
[18:54:44] topranks: ops/dns works just like the ops/puppet repo, you'll +2 and self-merge
[18:54:55] then ssh to any authdns server and `sudo authdns-update`
[18:55:36] cool thanks.
[18:55:54] above change reverted if you can +1
[18:56:15] already stamped it, fire when ready
[18:56:26] ok great thanks :)
[18:57:00] should I worry it says "merge conflict" on it?
[18:57:25] it'll probably rebase cleanly if you hit the rebase button in gerrit
[18:57:38] not a whole lot happening in that file lately, except for adding and removing this line repeatedly
[18:58:04] ok let's see. thanks.
[18:59:07] yep, perfect
[18:59:16] wait for jenkins to V+2 it again, and then you can go ahead
[19:00:05] all looks good with that thanks
[19:14:57] (VarnishTrafficDrop) firing: 61% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[19:34:34] netops, Infrastructure-Foundations, SRE: cr2-codfw:fpc0 crash - https://phabricator.wikimedia.org/T287110 (cmooney) Everything still looking good, eqiad re-pooled and combined stats across sites as they were but eqiad back in the pool. Resolving task.
[19:34:56] netops, Infrastructure-Foundations, SRE: cr2-codfw:fpc0 crash - https://phabricator.wikimedia.org/T287110 (cmooney) Open→Resolved
[19:49:57] (VarnishTrafficDrop) resolved: 63% GET drop in text@ during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org