[01:30:05] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [03:22:15] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:30:05] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [07:22:15] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:36:02] 10SRE-tools, 06DC-Ops, 06Infrastructure-Foundations, 10observability: Improve automation for the vendor maintenance calendar - https://phabricator.wikimedia.org/T357630#9656714 (10jcrespo) [09:30:05] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [10:44:50] Any idea what this error means? [10:44:50] testvm2006:~$ sudo run-puppet-agent [10:44:51] [...] 
[10:44:51] Error: Could not send report: SSL_connect returned=1 errno=0 peeraddr=10.192.0.19:8140 state=error: certificate verify failed (self-signed certificate in certificate chain): [self-signed certificate in certificate chain for /C=US/ST=California/L=San Francisco/O=Wikimedia Foundation, Inc/OU=Cloud Services/CN=Wikimedia_Internal_Root_CA] [11:08:43] note that it's a new server being provisioned, and the first puppet run worked fine [11:11:01] <_joe_> hi, there is an alert about routinator sync since last friday, and it's both eqiad and codfw [11:22:15] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:47:48] (PuppetZeroResources) firing: Puppet has failed generate resources on testvm2006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [11:56:38] XioNoX: The certificate in /var/lib/puppet/ssl/certs seems wrong [11:57:00] The issuer is wrong [11:57:31] slyngs: how come ? :) [11:57:51] On testvm2002 the same certificate is signed by "Issuer: C = US, L = San Francisco, O = "Wikimedia Foundation, Inc", OU = SRE Foundations, CN = puppet_rsa" [11:58:08] On test2006 it's signed by "Issuer: CN = Puppet CA: palladium.eqiad.wmnet" [11:58:42] That would explain the error you're getting, I think [11:58:52] slyngs: testvm2006 you mean? [11:58:58] Yes, sorry [11:59:13] openssl x509 -noout -text -in testvm2006.codfw.wmnet.pem [11:59:15] but yeah that might explain it, however not sure where that's coming from [12:00:03] I just used the makevm cookbook [12:00:05] Is that the baked-in certificate, which should then be replaced when the client/agent runs and registers with the puppet master? 
So maybe that step failed [12:00:44] I'll try a re-image and see where that brings me [12:01:14] That's probably the easiest first step :-) [12:02:54] It might be possible to remove the SSL dir, run the agent again and then sign the new certificate on the puppetserver, but that brings us into "I'd like to ask an adult" territory [12:03:17] XioNoX: that seems like a puppet7 vs puppet5 error [12:03:39] An adult has arrived :-) [12:03:41] <_joe_> it is, your cert is issued by the puppet5 CA [12:04:07] is the role migrated? [12:04:28] <_joe_> slyngs: did you just call volans "adult"? [12:04:47] .... Yes ? [12:04:54] <_joe_> :) [12:06:53] volans: I haven't followed the puppet migration stuff :) [12:07:00] and it's not "my" role [12:07:09] I mean, Puppet role :) [12:07:29] The role for the test hosts has been migrated, so you'll need a -p 7 on the makevm / reimage cookbook [12:10:02] ah, ok [12:11:49] _joe_: I acked the rpki alert for a few days. The issue is outside of our org, but large enough to make us think that something was wrong with our setup. So kind of an edge case [12:12:00] I made the same mistake, so I should have recognized the error :-( [12:12:26] I thought puppet 7 was now the default :) [12:12:45] XioNoX / volans would either of you take a quick look https://gerrit.wikimedia.org/r/c/operations/dns/+/1013949 "just" a quick DNS change [12:13:20] slyngs: +1 [12:15:26] XioNoX: I believe reimage will complain if you don't specify the puppet version, but when it's called via makevm it defaults to 6 [12:15:33] Sorry 5 [12:15:50] noted, I'm deleting the VM and starting again [12:16:08] parser.add_argument('-p', '--puppet-version', choices=(5, 7), default=5, type=int, help='The puppet version to use when reimaging. 
One of %(choices)s.') [12:16:08] 10netops, 10Ganeti, 06Infrastructure-Foundations, 06SRE: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152#9657390 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by ayounsi@cumin1002 for hosts: `testvm2006.codfw.wmnet` - testvm2006.codfw.wmnet (**FAIL**) - Do... [12:17:46] I'm wondering if we shouldn't change it to 7 now [12:21:04] I asked Moritz and his argument was "No, because we have a cookbook for migrating from 5 to 7, but not back" [12:21:09] Something like that [12:21:48] fair! [13:10:42] 10netops, 06Infrastructure-Foundations, 06SRE: Move public-vlan host BGP peerings from CRs to top-of-rack switches in codfw - https://phabricator.wikimedia.org/T360772#9657554 (10ayounsi) > So we need to decide if this imbalance for local queries is going to be an issue. I think load is the main thing to loo... [13:30:05] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [15:06:51] 10netops, 06Infrastructure-Foundations, 10ops-codfw, 06SRE, 13Patch-For-Review: Decom asw-b-codfw switch stack - https://phabricator.wikimedia.org/T360776#9657884 (10Papaul) [15:22:15] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [15:24:46] 10Mail, 06Infrastructure-Foundations: Exim: add lists and auto-generated headers - https://phabricator.wikimedia.org/T347831#9657963 (10andrea.denisse) [15:35:50] 10CAS-SSO, 06Infrastructure-Foundations: Enable self-service IDP two-factor authentication management - https://phabricator.wikimedia.org/T359552#9658035 (10joanna_borun) [15:36:06] 
10CAS-SSO, 06Infrastructure-Foundations: Enable self-service IDP two-factor authentication management - https://phabricator.wikimedia.org/T359552#9658037 (10joanna_borun) p:05Triage→03Medium [16:57:02] 10netops, 06Infrastructure-Foundations, 10ops-codfw, 06SRE, 13Patch-For-Review: Decom asw-b-codfw switch stack - https://phabricator.wikimedia.org/T360776#9658513 (10Papaul) [17:02:45] 10netops, 06Infrastructure-Foundations, 10ops-codfw, 06SRE: Connect two hosts in codfw row A/B for switch migration testing - https://phabricator.wikimedia.org/T345803#9658554 (10Jhancock.wm) sretest2003 and 2004 have been renamed to their original server names and been offlined (including ssd removal). [17:30:05] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [18:04:16] 10SRE-tools, 06Infrastructure-Foundations, 10Spicerack, 10cloud-services-team (FY2023/2024-Q3-Q4), 13Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9658807 (10Volans) Will you take care also of debian packaging it and any... 
[18:22:32] 10netbox, 10ChangeProp, 06collaboration-services, 10GitLab, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9658856 (10bd808) [18:48:32] 10SRE-tools, 10Cloud-VPS, 10Spicerack: Support downtiming metricsinfra alerts in wmcs-cookbooks - https://phabricator.wikimedia.org/T360932 (10taavi) 03NEW [18:53:46] 10SRE-tools, 10Cloud-VPS, 06Infrastructure-Foundations, 10Spicerack, 13Patch-For-Review: Support downtiming metricsinfra alerts in wmcs-cookbooks - https://phabricator.wikimedia.org/T360932#9659005 (10taavi) a:03taavi [19:22:15] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:42:11] 10Mail, 06Infrastructure-Foundations, 06SRE: Access to DMARCIAN - https://phabricator.wikimedia.org/T356920#9659204 (10DBu-WMF) p:05Medium→03High Can I please have access to DMARC Digests as soon as possible. We are starting to see deliverability issues at Google Postmaster. Whoever has the DMARC Diges... [19:47:28] 10Mail, 06Infrastructure-Foundations, 06SRE: Access to DMARCIAN - https://phabricator.wikimedia.org/T356920#9659214 (10DBu-WMF) looks like I do not have access to ticket T330944. Can someone please grant me access. [20:33:50] 10netbox, 10ChangeProp, 06collaboration-services, 10GitLab, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9659379 (10Tgr) >>! In T360596#9652082, @Krinkle wrote: > In MediaWiki (as deployed at WMF), there exists 1 use of Red... 
[21:30:05] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [22:04:12] 10Mail, 06Infrastructure-Foundations, 06SRE: Access to DMARCIAN - https://phabricator.wikimedia.org/T356920#9659677 (10Dzahn) >>! In T356920#9659214, @DBu-WMF wrote: > looks like I do not have access to ticket T330944. Can someone please grant me access. I tried this on February 8 by asking on this ticket... [23:22:15] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed