[08:13:07] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (MoritzMuehlenhoff) >>! In T330884#8863775, @MoritzMuehlenhoff wrote: > @ayounsi There's now netflow2003 running Bookworm with FNM 1.2.4. If that works fine, we can reimage...
[08:14:24] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ayounsi) Awesome! In place is fine.
[08:25:58] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (MoritzMuehlenhoff)
[08:39:34] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host netflow4002.ulsfo.wmnet with OS bookworm
[09:34:50] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host netflow4002.ulsfo.wmnet with OS bookworm completed: - netflow4002 (**PASS**) -...
[09:37:59] there was a traceback during the reimage of netflow4002, known issue, should I open a task or report this to an existing one? https://paste.debian.net/hidden/7c27fef1/
[09:53:08] moritzm: looks like it's related to https://github.com/wikimedia/operations-software-netbox-extras/commit/095c84d58ffbd5a2c8b898ee0eb3c00452a10d54
[09:53:15] moritzm: can you open a task?
[09:53:37] beautiful function that only returns true :)
[09:53:51] so probably easy to clean up
[09:57:06] sure, will open a task in a few
[10:41:28] jbond: what do you mean by "once the async patches have been merged" in https://gerrit.wikimedia.org/r/c/operations/software/homer/+/928795 ?
[10:42:02] but the wmflib is a good idea
[10:43:01] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host netflow5002.eqsin.wmnet with OS bookworm
[10:44:07] XioNoX: https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/906065
[10:44:51] jbond: haha wow
[10:44:59] jbond: why is it much faster?
[10:49:27] XioNoX: that one is just much faster as it sends all 5 queries in parallel, not serially
[10:49:38] ok, I see
[10:49:53] moritzm: https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/929178
[10:50:55] ack, thx
[10:54:01] jbond: for the homer/netbox/gql patch I'll probably need help to update the tests
[10:54:31] can you send the link again
[10:55:05] https://gerrit.wikimedia.org/r/c/operations/software/homer/+/928795
[10:55:19] dunno if it needs the async patch to be deployed first though
[10:55:37] XioNoX: no, it shouldn't; you are not using that
[10:57:23] moritzm: fix deployed
[11:01:27] thx
[11:09:25] netops, Cloud-VPS, Infrastructure-Foundations, cloud-services-team, User-aborrero: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero)
[11:12:06] netops, Cloud-VPS, Infrastructure-Foundations, cloud-services-team, User-aborrero: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero)
[11:13:10] netops, Cloud-VPS, Infrastructure-Foundations, cloud-services-team, User-aborrero: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero) p:Triage→Medium
[11:51:13] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for
host netflow5002.eqsin.wmnet with OS bookworm completed: - netflow5002 (**PASS**) -...
[12:27:19] SRE-tools, Infrastructure-Foundations, Traffic: Write a cookbook to roll reboot cache hosts - https://phabricator.wikimedia.org/T338783 (Volans) p:Triage→Medium
[12:28:17] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host netflow6001.drmrs.wmnet with OS bookworm
[12:35:05] XioNoX: I'm just going to go grab some lunch but see here https://gerrit.wikimedia.org/r/c/operations/software/homer/+/929324
[12:35:16] it still needs a bit of work but I think it's mostly done now
[12:35:37] I think there is just a mismatch between the mocked data I have used and what's in the yaml files
[12:38:12] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (MoritzMuehlenhoff)
[13:09:20] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host netflow6001.drmrs.wmnet with OS bookworm completed: - netflow6001 (**PASS**) -...
[13:30:25] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (MoritzMuehlenhoff)
[13:46:36] awesome! thanks
[14:41:37] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero)
[14:42:09] I'm seeing more and more "This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository."
from Jenkins, for example with https://gerrit.wikimedia.org/r/c/operations/homer/public/+/929330/ any idea what's up?
[14:44:39] Puppet, Analytics-Radar, Data-Engineering-Icebox: modules/udp2log/manifests/instance/monitoring.pp has unreachable code - https://phabricator.wikimedia.org/T152104 (joanna_borun)
[14:45:41] Puppet, Cloud-VPS, Patch-For-Review: role::puppetmaster::standalone clones Git repositories as gitpuppet, git-sync-upstream overwrites them as root - https://phabricator.wikimedia.org/T152059 (joanna_borun)
[14:47:38] Puppet, Infrastructure-Foundations: Puppet resource for creating a postgresql database - https://phabricator.wikimedia.org/T96054 (jbond) Open→In progress I believe we now have this in puppet, please re-open if I missed something
[14:50:16] Puppet, Infrastructure-Foundations, PostgreSQL, User-jbond: puppetdb: tune postgress instance - https://phabricator.wikimedia.org/T287672 (jbond) Open→Resolved a:jbond unfortunately I forgot what this relates to and general performance is improved now
[14:50:22] Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578 (jbond)
[14:55:50] Puppet, Infrastructure-Foundations: Puppet resource for creating a postgresql database - https://phabricator.wikimedia.org/T96054 (jbond) In progress→Resolved a:jbond
[14:58:45] CFSSL-PKI, Puppet, Infrastructure-Foundations, SRE, User-jbond: PKI server don't reimage cleanly - https://phabricator.wikimedia.org/T270269 (joanna_borun)
[14:59:27] CFSSL-PKI, Puppet, Infrastructure-Foundations, SRE, User-jbond: PKI server don't reimage cleanly - https://phabricator.wikimedia.org/T270269 (joanna_borun) p:Medium→Low
[15:00:29] Puppet, Infrastructure-Foundations, SRE: unbound variable error when calling puppet-merge script with an explicit treeish - https://phabricator.wikimedia.org/T264014
(jbond) Open→Resolved a:jbond
[15:01:28] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE: First puppet run after reimage slow (connection timeout) - https://phabricator.wikimedia.org/T262609 (joanna_borun)
[15:01:31] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE: First puppet run after reimage slow (connection timeout) - https://phabricator.wikimedia.org/T262609 (jbond) @Volans do you know if this is still an issue
[15:10:21] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE: First puppet run after reimage slow (connection timeout) - https://phabricator.wikimedia.org/T262609 (Volans) @jbond, no idea if this is still happening, I guess we could look at a bunch of puppet run logs from the reimages and see if there...
[15:18:22] netops, Infrastructure-Foundations, SRE, Epic: [tracking] Don't keep on the public vlans hosts that don't require it - https://phabricator.wikimedia.org/T317177 (aborrero)
[15:18:32] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (Papaul)
[15:18:40] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero) Open→Stalled This is done for codfw1dev DNS servers. I'll mark this task as stalled until we work on...
[15:23:07] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (Papaul)
[15:32:56] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero)
[15:42:29] Puppet, Infrastructure-Foundations, SRE: puppetdb7 cross polonation - https://phabricator.wikimedia.org/T338811 (jbond)
[15:44:06] Puppet, Infrastructure-Foundations, SRE: puppetdb7 cross polonation - https://phabricator.wikimedia.org/T338811 (jbond)
[15:44:25] fyi I tried to describe the puppetdb issue/plan in https://phabricator.wikimedia.org/T338811 if you have a sec please give it a read and make sure it makes sense (volans moritzm)
[15:45:23] sure wilco
[15:45:48] thanks <3
[15:49:28] it seems to me that if you want to be 100% sure not to lose any data you should set min_successful_submissions=2 from the beginning, but that could cause any failure in the new stack to reflect into the current puppet runs
[15:57:20] Puppet, Infrastructure-Foundations, SRE: puppetdb7 cross polonation - https://phabricator.wikimedia.org/T338811 (jbond) p:Triage→Medium
[15:58:06] volans: not possible, that's what I was trying to say
[15:58:16] i.e. there is a contradiction between the docs and the code
[15:58:17] https://github.com/puppetlabs/puppetdb/blob/650a0d5e3cd6b4dc05035895ae7aaf220c7a4037/puppet/lib/puppet/util/puppetdb/config.rb#L103
[15:58:55] doh
[15:59:18] yes I know, I still mean to check when min_successful_submissions is decremented but have not got to that yet
[16:02:55] the only trick I can think of is to have the same server twice and maybe they check the length before and then unique it :D
[16:03:15] but probably it could cause duplicate sends...
so probably not a good idea :D
[16:12:33] Puppet, SRE, Traffic, User-jbond: In valid byte sequence: File[/etc/update-ocsp.d/hooks/trafficserver-tls-ocsp] - https://phabricator.wikimedia.org/T238198 (jbond)
[16:12:55] Puppet, SRE, Patch-For-Review, User-jbond: Create NRPE check to alert when cergen certificates are due to expire - https://phabricator.wikimedia.org/T238833 (jbond)
[16:14:04] Puppet, SRE, Traffic, User-jbond: In valid byte sequence: File[/etc/update-ocsp.d/hooks/trafficserver-tls-ocsp] - https://phabricator.wikimedia.org/T238198 (Vgutierrez) Open→Resolved a:Vgutierrez @jbond I think we can close this one
[16:14:46] Puppet, Infrastructure-Foundations, SRE: Why doesn't profile::mediawiki::nutcracker create /var/run/nutcracker/ ? - https://phabricator.wikimedia.org/T204450 (jbond) Open→Resolved a:jbond we no longer have this profile
[16:15:48] Puppet, Beta-Cluster-Infrastructure, SRE, Performance-Team (Radar): Define scap::sources in a way that is shared between prod and beta - https://phabricator.wikimedia.org/T196034 (jbond)
[16:22:18] Puppet: Module uwsgi doesn't allow passing multiple config params of same name - https://phabricator.wikimedia.org/T123809 (jbond)
[16:23:57] Mail, Puppet, Infrastructure-Foundations, SRE: Cleanup debconf handling in mailman puppet setup - https://phabricator.wikimedia.org/T144933 (jbond) @MoritzMuehlenhoff should this be closed
[16:24:57] Puppet, Infrastructure-Foundations, SRE: Puppet failures with "Attempt to assign to a reserved variable name: 'trusted'" - https://phabricator.wikimedia.org/T153246 (jbond) Open→Resolved a:jbond I'm going to close this, I'm pretty sure it's fixed now but please re-open if not
[16:25:22] Puppet, Toolforge, Documentation: Document our GridEngine set up - https://phabricator.wikimedia.org/T88733 (jbond)
[16:27:03] Puppet, Infrastructure-Foundations, User-jbond: puppetdb: filter large factsets
- https://phabricator.wikimedia.org/T287674 (jbond) Open→Resolved a:jbond We added some filters and puppetdb performance seems to have settled
[16:27:10] Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578 (jbond)
[16:28:16] Puppet, Observability-Alerting, Puppet-Infrastructure, SRE: Notification spam from "last puppet run" upon re-enabling puppet - https://phabricator.wikimedia.org/T263720 (jbond)
[16:28:53] Puppet, Infrastructure-Foundations, SRE, conftool: confd fails to start after a reimage - https://phabricator.wikimedia.org/T244477 (jbond)
[16:29:43] Packaging, Puppet, Infrastructure-Foundations, User-jbond: create puppetboard debian package - https://phabricator.wikimedia.org/T292523 (jbond) In progress→Resolved
[16:29:55] Puppet, Infrastructure-Foundations, User-jbond: Upgrade puppetboard to the latest version - https://phabricator.wikimedia.org/T292522 (jbond)
[16:30:30] Packaging, Puppet, Infrastructure-Foundations, User-jbond: Update python3-pypuppetdb package to 2.4.0 - https://phabricator.wikimedia.org/T292525 (jbond) Open→Resolved
[16:30:36] Packaging, Puppet, Infrastructure-Foundations, User-jbond: create puppetboard debian package - https://phabricator.wikimedia.org/T292523 (jbond)
[16:36:18] Puppet, Infrastructure-Foundations: Puppetdb: not refreshed on config change? - https://phabricator.wikimedia.org/T291540 (jbond) Open→Resolved a:jbond @volans I'm going to reject this and say it's better to manually disable puppet fleet wide to roll out these changes but please re-open if you...
[16:36:21] Puppet, Infrastructure-Foundations, User-jbond: Puppet Improvements 2021/2022 - https://phabricator.wikimedia.org/T294906 (jbond)
[16:48:51] CFSSL-PKI, Infrastructure-Foundations: nobody really needs or wants 521 - https://phabricator.wikimedia.org/T338822 (jbond) p:Triage→Medium
[16:52:13] CFSSL-PKI, Infrastructure-Foundations: nobody really needs or wants 521 - https://phabricator.wikimedia.org/T338822 (jbond) @Vgutierrez @BBlack I have created this ticket based on your comment that we shouldn't be using secp521r1, could you provide more information than the bug link above. specifically...
[16:55:26] Puppet, Infrastructure-Foundations, User-jbond: Puppet Improvements 2021/2022 - https://phabricator.wikimedia.org/T294906 (jbond)
[16:55:32] Puppet, Infrastructure-Foundations, SRE, User-Joe: Update puppet code to conform to puppet 4.x and later standards - https://phabricator.wikimedia.org/T181967 (jbond) Open→Resolved a:jbond closing this, it must be done now
[17:07:48] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (Papaul)
[17:08:23] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (Papaul)
[17:14:48] Puppet, Infrastructure-Foundations: Add check for puppetboard - https://phabricator.wikimedia.org/T296304 (jbond) Open→Resolved a:jbond
[17:18:31] Puppet, Infrastructure-Foundations, User-jbond: Upgrade puppetboard to the latest version - https://phabricator.wikimedia.org/T292522 (jbond) In progress→Resolved
[17:18:46] Puppet, Infrastructure-Foundations, User-jbond: Upgrade puppetboard to the latest version - https://phabricator.wikimedia.org/T292522 (jbond)
[17:18:49] Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: Decommission puppetboard[12]001 -
https://phabricator.wikimedia.org/T296744 (jbond) In progress→Resolved
[17:19:47] Puppet, Infrastructure-Foundations, Patch-For-Review: Hosts distribution across puppetmasters - https://phabricator.wikimedia.org/T291541 (jbond) for the record, with puppet 7 I plan to explore using srv records which may help with this
[17:20:07] Puppet, Infrastructure-Foundations: update hiera order in production environment - https://phabricator.wikimedia.org/T301349 (jbond) Open→Resolved a:jbond
[17:21:34] Puppet, netbox, Infrastructure-Foundations, Maps, SRE: Postgres puppet modules use MD5 for users by default - https://phabricator.wikimedia.org/T300048 (jbond) @hnowlan I did some patches to add support for this with the puppetdb upgrade. it no longer supports password changes but it does all...
[17:21:47] netbox, Infrastructure-Foundations, Maps, Puppet-Infrastructure, and 2 others: Postgres puppet modules use MD5 for users by default - https://phabricator.wikimedia.org/T300048 (jbond)
[17:24:11] puppet-compiler, Infrastructure-Foundations, SRE, Continuous-Integration-Config, Release-Engineering-Team (Seen): Figure out a way to enable volunteers to use the puppet compiler - https://phabricator.wikimedia.org/T192532 (jbond)
[17:24:37] Puppet, Infrastructure Security, Infrastructure-Foundations: puppet admin: check if additional groups in systemd::sysuser conflicts with admin.yaml - https://phabricator.wikimedia.org/T308826 (jbond)
[17:25:09] Puppet, Infrastructure-Foundations, SRE, User-jbond: puppet documentation generation is missing some compnets - https://phabricator.wikimedia.org/T271909 (jbond) Open→Resolved a:jbond
[17:25:14] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (Papaul)
[17:26:20] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review, User-jbond: Frequent puppet failures -
https://phabricator.wikimedia.org/T221529 (jbond) Open→Resolved a:jbond
[17:27:13] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Puppet should prune stale entries from sudoers.d - https://phabricator.wikimedia.org/T309268 (jbond) Open→Resolved a:jbond
[17:28:16] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (Papaul)
[17:28:23] netops, Infrastructure-Foundations, Puppet-Core, SRE, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (jbond)
[17:28:54] Puppet, Infrastructure-Foundations, SRE: Usual git mechanism for aborting commit does not work on the private puppet repo - https://phabricator.wikimedia.org/T211121 (jbond) Open→Resolved a:jbond closing, but please re-open if it's still an issue
[17:31:24] SRE-tools, Infrastructure-Foundations: Fix autorestart and debclient dependency - https://phabricator.wikimedia.org/T324229 (jbond)
[17:31:48] Puppet, Infrastructure-Foundations, User-jbond: puppetdb Investigate the expected bahaviour of the edges table - https://phabricator.wikimedia.org/T287673 (jbond) Open→Resolved a:jbond closing this, we have hopefully made it past the puppetdb issues
[17:31:53] Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578 (jbond)
[17:33:38] Puppet, Cloud-VPS, Infrastructure-Foundations, cloud-services-team, User-jbond: Audit puppet usage in cloud hosts - https://phabricator.wikimedia.org/T289658 (jbond) In progress→Resolved a:jbond going to resolve this, I think the original question was answered
[17:33:47] Puppet, Cloud Services Proposals, Cloud-VPS, Infrastructure-Foundations, and 3 others: Easing pain points caused by divergence between cloudservices and
production puppet usecases - https://phabricator.wikimedia.org/T285539 (jbond)
[17:35:23] Puppet, Puppet-Infrastructure, cloud-services-team: Reduce the effects of puppet breakage on VPS - https://phabricator.wikimedia.org/T226270 (jbond)
[19:27:42] Puppet, Infrastructure-Foundations, SRE: puppetdb7 cross polonation - https://phabricator.wikimedia.org/T338811 (MoritzMuehlenhoff)
[19:28:10] jbond: it looks good to me!
[20:04:52] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw:row A/B: rack/cable new switches - https://phabricator.wikimedia.org/T332180 (Papaul)
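
Editor's note on the 10:49:27 message ("sends all 5 queries in parallel, not serially"): the speed-up described there can be illustrated with a minimal sketch. This is not the code from the linked Gerrit changes; `fake_query`, the query names, and the 0.05 s delay are hypothetical stand-ins for a network round trip, and the sketch assumes Python 3.9+ with only the standard library.

```python
import asyncio
import time

# Hypothetical query names; the real cookbook queries are not shown in the log.
QUERIES = ["q1", "q2", "q3", "q4", "q5"]


async def fake_query(name: str, delay: float = 0.05) -> str:
    """Stand-in for one network round trip (assumption, not a real API call)."""
    await asyncio.sleep(delay)
    return f"result:{name}"


async def run_serial() -> list[str]:
    # Each query waits for the previous one, so the total time is about 5 * delay.
    return [await fake_query(q) for q in QUERIES]


async def run_parallel() -> list[str]:
    # All queries are in flight at once, so the total time is about 1 * delay.
    return list(await asyncio.gather(*(fake_query(q) for q in QUERIES)))


def timed(coro_fn):
    """Run a coroutine function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


serial_result, serial_elapsed = timed(run_serial)
parallel_result, parallel_elapsed = timed(run_parallel)
print(f"serial:   {serial_elapsed:.3f}s")
print(f"parallel: {parallel_elapsed:.3f}s")
```

`asyncio.gather` returns results in the order the awaitables were passed, so both variants produce the same list; only the wall-clock time differs.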