[00:30:59] this time it failed with
[00:31:02] https://www.irccloud.com/pastebin/Aw8VubII/
[00:31:27] which I am going to ignore unless someone tells me otherwise
[08:50:26] Puppet, Infrastructure-Foundations, SRE Observability: prometheus-statsd-exporter failure to start due to invalid yaml config - https://phabricator.wikimedia.org/T302372 (fgiunchedi)
[09:23:21] Puppet, Infrastructure-Foundations, SRE Observability: prometheus-statsd-exporter failure to start due to invalid yaml config - https://phabricator.wikimedia.org/T302372 (fgiunchedi) Open→Resolved a:fgiunchedi This is done, followup at {T302373}
[09:45:14] andrewbogott: errors should never be ignored. The cookbook errored out because it checked that something was in a given state and it was not and, most importantly, it didn't complete, so many more actions that would have been performed after that step were not performed.
[09:47:00] netops, Infrastructure-Foundations, SRE, SRE Observability (FY2021/2022-Q3), User-fgiunchedi: blackbox-exporter no icmp replies on prometheus1006 for a few services - https://phabricator.wikimedia.org/T302265 (fgiunchedi) Prometheus doesn't run on VMs in eqiad/codfw (not sure if this fact was...
[10:05:08] andrewbogott: answering your specific questions:
[10:09:33] 1) https://www.irccloud.com/pastebin/n36qrG8S/ that is a function of the sre.pdus cookbooks; it has nothing to do with your reimages from yesterday. You probably wanted to refer to this one instead:
[10:09:37] https://doc.wikimedia.org/spicerack/master/api/spicerack.remote.html#spicerack.remote.RemoteHosts.wait_reboot_since
[10:11:02] that method polls the host until it can find an uptime such that uptime < now() - since, where since is the parameter passed to the method
[10:11:18] meaning that the host got rebooted after 'since'
[10:13:44] in your first reimage of cloudcontrol1004, the host never rebooted after it entered the Debian Installer, so most likely d-i had stopped because of an error or missing config in the preseed. A common cause is a partman recipe that is incompatible with the host.
[10:15:06] 2) as for https://www.irccloud.com/pastebin/Aw8VubII/ that means that the Force PXE flag, which should be automatically removed after a reboot, was not removed, meaning that at the next reboot the host would have booted into PXE instead of from local disks
[14:21:16] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (dcaro)
[14:22:22] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (dcaro) Open→Resolved a:dcaro I think this is ready to be closed! \o/ There's some related patches pending, but those are not directly these anymore.
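A minimal sketch of the check described at [10:11:02] for wait_reboot_since, under stated assumptions: this is not spicerack's actual implementation. The helper names `rebooted_since` and `wait_rebooted_since`, the `get_uptime` callable, and the `attempts`/`delay`/`clock` parameters are all hypothetical and only illustrate the condition uptime < now() - since.

```python
import time
from datetime import datetime, timedelta
from typing import Callable


def rebooted_since(uptime_seconds: float, since: datetime, now: datetime) -> bool:
    """True if the host booted after `since`, i.e. uptime < now - since."""
    return timedelta(seconds=uptime_seconds) < now - since


def wait_rebooted_since(
    get_uptime: Callable[[], float],  # hypothetical: returns host uptime in seconds
    since: datetime,
    attempts: int = 5,
    delay: float = 0.0,
    clock: Callable[[], datetime] = datetime.now,
) -> bool:
    """Poll until the reported uptime proves a reboot happened after `since`."""
    for _ in range(attempts):
        try:
            if rebooted_since(get_uptime(), since, clock()):
                return True
        except OSError:
            # Host unreachable (e.g. still in the installer): keep polling.
            pass
        time.sleep(delay)
    # The host never reported a fresh enough uptime, as in the
    # cloudcontrol1004 case where it stayed in the Debian Installer.
    return False
```

A caller would pass a `get_uptime` that reads the uptime from the remote host; if the polling runs out of attempts, the host did not reboot after `since` and the calling step should fail rather than continue.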
[14:22:24] Puppet, Infrastructure-Foundations, User-jbond: Puppet Improvements 2021/2022 - https://phabricator.wikimedia.org/T294906 (dcaro)
[14:30:44] netops, Infrastructure-Foundations, SRE, SRE Observability (FY2021/2022-Q3), User-fgiunchedi: blackbox-exporter no icmp replies on prometheus1006 for a few services - https://phabricator.wikimedia.org/T302265 (cmooney) > As far as this task goes to me it still remains a mystery why it looks l...
[14:34:37] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (dcaro)
[14:50:02] volans: thanks! I suspected that that latter issue was about pxe booting but of course the host didn't pxe boot after that warning... is it possible the flags are different on an hp server vs. a dell?
[14:51:10] no, the 0004000000 flag is the one that says force pxe at the next reboot, and should automatically clear itself upon reboot
[14:53:56] Puppet, Infrastructure-Foundations, SRE Observability: prometheus-statsd-exporter failure to start due to invalid yaml config - https://phabricator.wikimedia.org/T302372 (jhathaway) @fgiunchedi very sorry about the breakage, I wish I would have caught that in the review.
[15:15:07] netops, Discovery, Infrastructure-Foundations, SRE: Speed up network connections for Elastic hosts - https://phabricator.wikimedia.org/T301577 (bking) Per Cathal's feedback above, we are closing this ticket as he correctly stated "it represents significant risk for what seems to be scant benefit....
[15:15:44] netops, Discovery, Infrastructure-Foundations, SRE: Speed up network connections for Elastic hosts - https://phabricator.wikimedia.org/T301577 (bking) Open→Resolved
[15:26:04] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Unused puppet resources audit, 2021 - https://phabricator.wikimedia.org/T272559 (Dzahn) epic task!
kudos for finishing it
[16:01:50] Puppet, Infrastructure-Foundations, SRE Observability: prometheus-statsd-exporter failure to start due to invalid yaml config - https://phabricator.wikimedia.org/T302372 (fgiunchedi) No worries @jhathaway ! It was a combination of factors that meant deployment would fail silently too :( i.e. no puppe...
[16:34:14] netops, Infrastructure-Foundations, SRE, SRE Observability (FY2021/2022-Q3), User-fgiunchedi: blackbox-exporter no icmp replies on prometheus1006 for a few services - https://phabricator.wikimedia.org/T302265 (BBlack) >>! In T302265#7731305, @fgiunchedi wrote: > The current pings from promet...
[18:11:51] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Configuration of New Switches Eqiad Rows E-F - https://phabricator.wikimedia.org/T299758 (cmooney)
[18:39:54] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (Aklapper) assuming this is about #puppet
[18:43:43] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (Dzahn) To start with I would just like to add a bit of info that we have a history of using git submodules inside the puppet repo and not liking them and then moving away from them again, whic...
[18:45:04] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (jhathaway) >>! In T302423#7733059, @Dzahn wrote: > To start with I would just like to add a bit of info that we have a history of using git submodules inside the puppet repo and not liking the...
[18:46:46] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (jbond) @jhathaway thanks for writing this up just a few quick comments. Before commenting i would say that in my mind we have [[ https://phabricator.wikimedia.org/T265138#7041244 | four type...
[18:51:57] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (jbond) p:Triage→Medium
[18:52:26] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (jbond)
[18:52:29] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review, User-jbond: Work required to prepare for puppet 6 - https://phabricator.wikimedia.org/T265138 (jbond)
[18:58:56] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (Dzahn) >>! In T302423#7733064, @jhathaway wrote: >>>! In T302423#7733059, @Dzahn wrote: >> To start with I would just like to add a bit of info that we have a history of using git submodules i...
[20:24:43] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (jhathaway) >>! In T302423#7733067, @jbond wrote: > Before commenting i would say that in my mind we have [[ https://phabricator.wikimedia.org/T265138#7041244 | four types types of modules ]]...
[20:34:32] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (jbond) >>! In T302423#7733421, @jhathaway wrote: > According to [[ https://puppet.com/docs/puppet/6/type.html#puppet-60-type-changes | puppet's docs ]] and my own inspection of Puppet's 6.26...
[22:18:58] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-drmrs: Q3:(Need By: ASAP) rack/setup/install cr[12]-drmrs - https://phabricator.wikimedia.org/T300277 (RobH) Open→Resolved I closed out the ticket and this is now resolved.
[22:19:14] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-drmrs: Q3:(Need By: ASAP) rack/setup/install cr[12]-drmrs - https://phabricator.wikimedia.org/T300277 (RobH)