[02:19:47] Mail, Infrastructure-Foundations, SRE: mx1001.wikimedia.org mail delivery timeouts - https://phabricator.wikimedia.org/T299107 (Legoktm)
[08:28:53] Mail, Infrastructure-Foundations, SRE: mx1001.wikimedia.org mail delivery timeouts - https://phabricator.wikimedia.org/T299107 (MoritzMuehlenhoff) Interesting, thanks for reverting quickly! So the mail issues in the 2012-12-03 weren't just a Heisenbug after all, but we'll probably need a less product...
[08:58:37] SRE-tools, netbox, Infrastructure-Foundations: Netbox Reports Ideas and Requests - https://phabricator.wikimedia.org/T222931 (ayounsi) >>! In T222931#7617721, @Dzahn wrote: > [...] Cf. T283483
[10:22:45] XioNoX, topranks: I need to reboot netbox1001 and netboxdb1001, given that dc ops are not yet around, do you need it currently? good to go ahead?
[10:23:35] Morris Won’t interrupt my work
[10:23:54] moritzm: yeah I'm working on it right now, can it wait a bit?
[10:24:17] moritzm: even :)
[10:24:24] like 1h max
[10:24:25] XioNoX: sure, can you ping me when you take a break or when you're ready?
[10:24:32] moritzm: will do!
[10:25:09] topranks: fortunately there's no internet worm named after me :-)
[10:25:11] yet...
[10:25:48] We all dream of being immortalised that way. One day!
[10:51:36] moritzm: I'm done!
[10:51:47] https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/753699 deployed successfully
[10:59:38] ack, starting with netboxdb1001
[11:03:34] and netbox1001 next
[12:17:47] Am I being stupid or are there restrictions on editing this wikitech page?
[12:17:48] https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty/Access_requests
[12:18:18] I went to correct a small typo but I just don't see "edit" links anywhere, I'm logged in and they are present on other wikitech pages for me.
[12:18:46] yeah, indeed. If you click "Page information" it shows that ACLs are applied
[12:19:13] which makes sense to protect against someone adding phishing-enabled info there or similar
[12:20:38] I'm not sure if I can add you or even who added me to administrators, maybe ask in the -sre channel if anyone can add you
[12:21:04] or let me know what you want to fix and I'll do it on your behalf
[12:21:30] Ok thanks Moritz, yeah makes sense to have some restrictions around it.
[12:21:43] I'll see if I can get the permission added, if not I may take you up on that offer to edit.
[12:21:44] Thanks!
[15:30:25] Mail, Infrastructure-Foundations, SRE: mx1001.wikimedia.org mail delivery timeouts - https://phabricator.wikimedia.org/T299107 (jhathaway) @MoritzMuehlenhoff my initial thought is that since we know reverting the kernel solves the issue, we could do some short reboots into the new kernel to gather mo...
[15:37:30] Mail, Infrastructure-Foundations, SRE: mx1001.wikimedia.org mail delivery timeouts - https://phabricator.wikimedia.org/T299107 (MoritzMuehlenhoff) >>! In T299107#7619785, @jhathaway wrote: > @MoritzMuehlenhoff my initial thought is that since we know reverting the kernel solves the issue, we could do...
[15:39:04] topranks, XioNoX: is flowspec1001 good to reboot?
[15:39:12] moritzm: yep
[15:39:34] it's an "on hold" project so not used for anything
[15:39:57] thx
[15:44:31] Mail, Infrastructure-Foundations, SRE: mx1001.wikimedia.org mail delivery timeouts - https://phabricator.wikimedia.org/T299107 (jhathaway) great, I'll report back what I find.
[16:31:05] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (dcaro)
[17:47:04] Mail, Infrastructure-Foundations, SRE: mx1001.wikimedia.org mail delivery timeouts - https://phabricator.wikimedia.org/T299107 (Dzahn) >>! In T299107#7618862, @MoritzMuehlenhoff wrote: > Did you by chance see whether the "Check size of conntrack table" Icinga check alerted? I checked Icinga, nothin...
[17:50:27] Mail, Infrastructure-Foundations, SRE: mx1001.wikimedia.org mail delivery timeouts - https://phabricator.wikimedia.org/T299107 (Dzahn)
[19:33:35] Mail, Infrastructure-Foundations, SRE: mx1001.wikimedia.org mail delivery timeouts - https://phabricator.wikimedia.org/T299107 (Platonides) @MoritzMuehlenhoff, did you see https://www.spinics.net/lists/stable/msg509296.html ? Apparently upstream identified the issue as 09e856d54bda5f288ef8437a90ab2b9...
[19:39:26] Mail, Infrastructure-Foundations, SRE: mx1001.wikimedia.org mail delivery timeouts - https://phabricator.wikimedia.org/T299107 (jhathaway) @Platonides that revert made it into 5.10.78, https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.10.78, so I don't believe that is the issue, since we were...
[20:52:35] netops, Continuous-Integration-Infrastructure, DC-Ops, SRE, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (hashar)
[20:58:05] netops, Continuous-Integration-Infrastructure, DC-Ops, SRE, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (hashar) @Papaul wrote: > The IDRAC on this server needs reset. Please c...