[06:38:16] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (ayounsi)
[09:01:44] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (cmooney) a:cmooney→None Sorry @Dzahn I should have updated it before now. Makes sense to re-assign to DC-Ops...
[09:10:28] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (ayounsi) Example before/after for Telia in eqiad: `lines=20 ayounsi@re0.cr2-eqiad> show route advertising-protocol bgp 80.239.132.225 inet.0: 852341 des...
[09:15:05] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (ayounsi) Confirmed with Telia's looking glass: https://lg.twelve99.net/?type=bgp&router=prs-b6&address=185.71.138.0/24
[10:42:45] jbond: hi, if you feel adventurous I am systemd::timer::job will have a `splay` parameter with https://gerrit.wikimedia.org/r/c/operations/puppet/+/731839/ ;)
[10:44:40] hashar: ack will take a look in a sec
[12:30:13] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Traffic Engineering for Anycast Ranges - https://phabricator.wikimedia.org/T288843 (ayounsi) Open→Resolved a:ayounsi A good baseline has now been applied across most of our transits. Further tuning will happen when sub-optimal...
[12:52:45] jbond: we can do https://gerrit.wikimedia.org/r/c/operations/puppet/+/731840/ , I cherry picked it on the integration puppet master and it is behind `if $::realm == 'labs'` ;)
[13:29:30] netbox, Infrastructure-Foundations: Agree how to document intra-DC patch panels in Netbox - https://phabricator.wikimedia.org/T293221 (ayounsi) I agree that's the textbook way of doing it and where we need to be in the longer term. We currently document patch panels information in both the "termination"...
[16:29:19] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[16:29:48] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[16:30:22] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[17:38:13] netbox, Infrastructure-Foundations: Agree how to document intra-DC patch panels in Netbox - https://phabricator.wikimedia.org/T293221 (RobH) I'd like to remove using the google sheet and support any alternative that moves it wholly to netbox. I think with the circuits patch panel field entry combined wi...
[18:28:01] netbox, Infrastructure-Foundations: Agree how to document intra-DC patch panels in Netbox - https://phabricator.wikimedia.org/T293221 (cmooney) @ayounsi While I agree with your analysis of the difficulties, I wonder if a hybrid approach might be possible. What we need to model for the eqiad expansion is...
[20:54:57] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) affected hosts I am ACKing right now in Icinga: contint2001.mgmt ms-fe200...
[20:56:54] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[21:36:33] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul) |ores2005.mgmt|PER430| |gerrit2001.mgmt|PER430| |ms-fe2006.mgmt|PER430| |...
[21:39:37] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[21:41:44] Puppet, Cloud-Services, Infrastructure-Foundations, SRE, and 2 others: Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885 (Dzahn) @jbond I had an actual alert for this on mwmaint1002, looked up whether we apply this in prod or only cloud so far or if...