[02:08:59] (PuppetDisabled) firing: Puppet disabled on puppetserver2001:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=misc&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[02:19:44] (SystemdUnitFailed) firing: (7) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:01:59] (PuppetDisabled) firing: Puppet disabled on puppetdb1003:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=puppet&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[05:57:01] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Update network SSH keys to ssh-ed25519 - https://phabricator.wikimedia.org/T336769 (ayounsi)
[06:08:59] (PuppetDisabled) firing: Puppet disabled on puppetserver2001:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=misc&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[06:19:44] (SystemdUnitFailed) firing: (7) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:43:51] (ProbeDown) firing: (2) Service idm2001:443 has failed probes (http_idm_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#idm2001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:01:59] (PuppetDisabled) firing: Puppet disabled on puppetdb1003:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=puppet&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[07:58:59] (PuppetDisabled) resolved: Puppet disabled on puppetserver2001:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=misc&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[08:03:40] Puppet, Infrastructure-Foundations, Project-Admins, PM: Clarify Puppet tag - https://phabricator.wikimedia.org/T295221 (Aklapper) @joanna_borun I boldly edited the description at https://phabricator.wikimedia.org/project/manage/78/ - does that make sense?
[09:01:31] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero) Seeing many errors like this: ` Jun 19 09:00:07 cloudservices2004-dev pdns_server[1181224]: Received NOTIFY for codfw1dev...
[09:32:38] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero) >>! In T338778#8945972, @aborrero wrote: > Seeing many errors like this: > > ` > Jun 19 09:00:07 cloudservices2004-dev pd...
[09:36:16] puppet-compiler, Infrastructure-Foundations: Puppet compiler fails due to unset fact wmflib.is_container - https://phabricator.wikimedia.org/T338961 (jbond) >>! In T338961#8932079, @jhathaway wrote: > sorry should have more explicit, I don't seem to have access to the labs host? > > ` > $ ssh puppetmast...
[10:10:00] topranks: XioNoX: i went to deploy a homer change but i notice that there are many changes pending for lsw1-a1-codfw.mgmt.codfw.wmnet. looks like its possibly unconfiguered?
[10:10:51] jbond: I'm not 100% sure what the status of it is, it's really only in planning stage
[10:11:08] I've to rush out for my covid booster, I'll set it back to "planned" in Netbox so you can proceed and check in a while
[10:11:24] cheers
[10:11:46] ok done
[10:11:51] tthanks
[10:12:44] yeah I set the rest of those in codfw back to planned last week after initial config, I think I may have just missed that one
[10:12:56] ahh ok cheers
[10:12:56] I'll double check the diff later
[10:19:30] SRE-tools, Infrastructure-Foundations, SRE: reimage cookbook should exit cleanly if no puppet role is applied to a node - https://phabricator.wikimedia.org/T338990 (Volans) The problem is that we can't use the exit code of the NOOP run because it can be both ok and not ok with the same non-zero exit...
[10:19:44] (SystemdUnitFailed) firing: (7) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:34:16] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: ServiceLVS without monitor breaks spicerack - https://phabricator.wikimedia.org/T339243 (Volans) That was changed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/924342 without modifying spicerack although it's written o...
[10:40:49] SRE-tools, Infrastructure-Foundations, SRE: reimage cookbook should exit cleanly if no puppet role is applied to a node - https://phabricator.wikimedia.org/T338990 (jbond) > The problem is that we can't use the exit code of the NOOP run because it can be both ok and not ok with the same non-zero exit...
[10:41:59] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: ServiceLVS without monitor breaks spicerack - https://phabricator.wikimedia.org/T339243 (Clement_Goubert) I figured. I merged the change, tell me when you cut a release and we can resolve.
[10:43:51] (ProbeDown) firing: (2) Service idm2001:443 has failed probes (http_idm_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#idm2001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:59:55] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (cmooney) >>! In T338778#8946041, @aborrero wrote: > Fixed by running this in the pdns database; > > ` > update domains set master=...
[12:17:17] netops, Infrastructure-Foundations, SRE: Configure QoS marking and policy across network - https://phabricator.wikimedia.org/T339850 (cmooney) p:Triage→Medium
[12:40:50] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero) >>! In T338778#8946438, @cmooney wrote: >>>! In T338778#8946041, @aborrero wrote: >> Fixed by running this in the pdns da...
[12:45:17] puppet-compiler, Infrastructure-Foundations: Puppet compiler fails due to unset fact wmflib.is_container - https://phabricator.wikimedia.org/T338961 (hashar) > unless there are still some cloud projects reporting issues id say we could close this I still have the issue with lack of `wmflib.is_container`...
[12:55:07] puppet-compiler, Infrastructure-Foundations: Puppet compiler fails due to unset fact wmflib.is_container - https://phabricator.wikimedia.org/T338961 (jbond) >>! In T338961#8946558, @hashar wrote: >> unless there are still some cloud projects reporting issues id say we could close this > > I still have t...
[12:57:25] puppet-compiler, Infrastructure-Foundations: Puppet compiler fails due to unset fact wmflib.is_container - https://phabricator.wikimedia.org/T338961 (hashar) Open→Resolved a:jhathaway I did comment `check experimental` to trigger a new build but clicked ON THE PREVIOUS report instead then cam...
[12:57:52] netops, Infrastructure-Foundations, SRE: Configure ECMP hashing function on QFX5120 platform - https://phabricator.wikimedia.org/T339852 (cmooney) p:Triage→Medium
[12:58:05] jbond: jhathaway: indeed the PCC works with `wmflib.is_container` . I clicked the faulty report from last week instead of the new one I asked to generate a few minutes go
[12:58:08] s/go/ago
[12:58:11] tldr: FIXED! thx ;)
[12:58:24] hashar: ack
[12:58:39] fyi you can go to https://puppet-compiler.wmflabs.org/output/$gerrit_id to get a list of all pcc reports
[12:58:45] the last one will be the most recent
[12:58:47] e.g. https://puppet-compiler.wmflabs.org/output/927750
[13:19:53] nice :]
[14:00:40] #sorru session expired
[14:01:52] 00
[14:01:53] .
[14:19:44] (SystemdUnitFailed) firing: (7) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:23:51] (ProbeDown) resolved: Service idm2001:443 has failed probes (http_idm_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#idm2001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:24:03] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: switchdc SAL log entries are getting cut off because long lines are being split over IRC - https://phabricator.wikimedia.org/T285709 (Volans) p:Triage→Low
[14:29:26] SRE-tools, Observability-Logging, Spicerack: Create a cookbook for managing Logstash cluster restarts - https://phabricator.wikimedia.org/T293929 (joanna_borun)
[14:29:31] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Update network SSH keys to ssh-ed25519 - https://phabricator.wikimedia.org/T336769 (ayounsi)
[14:38:25] netops, Infrastructure-Foundations, SRE: Configure ECMP hashing function on QFX5120 platform - https://phabricator.wikimedia.org/T339852 (ayounsi) Not tested but looks like the syntax changed slightly to: ` set forwarding-options enhanced-hash-key inet ? Possible completions: + apply-grou...
[14:44:23] SRE-tools, Infrastructure-Foundations, Spicerack: Allow to dry_run RemoteHosts.wait_reboot_since() and PuppetHosts.wait_since() - https://phabricator.wikimedia.org/T311050 (joanna_borun)
[14:44:30] SRE-tools, Infrastructure-Foundations, Spicerack: IcingaHosts.wait_for_downtimed() does not honor dry_run - https://phabricator.wikimedia.org/T315537 (SLyngshede-WMF) In progress→Resolved
[14:46:04] SRE-tools, Infrastructure-Foundations, Spicerack: Allow to dry_run RemoteHosts.wait_reboot_since() and PuppetHosts.wait_since() - https://phabricator.wikimedia.org/T311050 (Volans) @JMeybohm am I interpreting correctly that you're saying that those are raising an exception because the reboot or puppe...
[15:00:23] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: ServiceLVS without monitor breaks spicerack - https://phabricator.wikimedia.org/T339243 (Volans) p:Triage→High
[15:01:30] SRE-tools, Infrastructure-Foundations, Spicerack: Spicerack: don't IRC log start/stop of cookbook - https://phabricator.wikimedia.org/T324655 (jbond) > SGTM, the ability to only log successful executions would be a win for not impactful cookbooks. @ayounsi i think i see you have updated the networkin...
[15:16:04] SRE-tools, Infrastructure-Foundations, Spicerack: Spicerack: don't IRC log start/stop of cookbook - https://phabricator.wikimedia.org/T324655 (ayounsi) I still think it would be valuable to be able to not log anything without hack. right now the cookbook fails with a success during a show operation.
[15:48:38] SRE-tools, Infrastructure-Foundations, Spicerack: Spicerack: don't IRC log start/stop of cookbook - https://phabricator.wikimedia.org/T324655 (jbond) p:Triage→Medium >>! In T324655#8947003, @ayounsi wrote: > I still think it would be valuable to be able to not log anything without hack. right...
[16:15:59] Puppet, Infrastructure-Foundations, Project-Admins, PM: Clarify Puppet tag - https://phabricator.wikimedia.org/T295221 (joanna_borun) @Aklapper looks good to me. Thank you.
[16:49:05] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: ServiceLVS without monitor breaks spicerack - https://phabricator.wikimedia.org/T339243 (Volans) The ideal solution would be to make the spicerack class accept happily any undefined parameter, the only problem with that is that a...
[16:49:25] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Update network SSH keys to ssh-ed25519 - https://phabricator.wikimedia.org/T336769 (ayounsi)
[17:14:55] SRE-tools, Infrastructure-Foundations, Observability-Logging, Spicerack: Create a cookbook for managing Logstash cluster restarts - https://phabricator.wikimedia.org/T293929 (lmata)
[18:19:44] (SystemdUnitFailed) firing: (7) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:19:44] (SystemdUnitFailed) firing: (7) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed