[07:33:15] netbox, Infrastructure-Foundations: Should we have two versions of the Juniper QFX5120-48Y in Netbox? - https://phabricator.wikimedia.org/T331519 (ayounsi) >> For what I saw the prompt is the fastest way to differentiate them from the CLI: > JUNOS 19.1R3-S2.3 Kernel 64-bit FLEX JNPR-11.0-20200618.2bc7e35...
[07:59:39] SRE-tools, DC-Ops, Infrastructure-Foundations, Sustainability (Incident Followup): PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 (MoritzMuehlenhoff) Agreed, I think we can simply resolve task.
[08:55:12] netops, Infrastructure-Foundations, SRE: Netflow/pmacct: use forwardingStatus - https://phabricator.wikimedia.org/T331707 (ayounsi) Declined→Open > We do not support forwarding status in ipfix message. > However, you may use ‘report-zero-oif-gw-on-discard’ in which Jflow can be forced to repo...
[09:31:31] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Netflow/pmacct: use forwardingStatus - https://phabricator.wikimedia.org/T331707 (ayounsi) Next steps here are: # Check with data-engineering ( @BTullis ?) if it's ok to add those 3 new keys (and what changes are needed in druid/turnilo)...
[09:36:14] SRE-tools, Infrastructure-Foundations: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (MoritzMuehlenhoff) I tried to backport FNM 1.2.4, it does some tricky things with Boost and so far I couldn't force cmake to accept Bullseye's Boost libs. Given that Bookworm is close (unstable...
[09:37:04] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Netflow/pmacct: use forwardingStatus - https://phabricator.wikimedia.org/T331707 (ayounsi)
[09:38:54] SRE-tools, Infrastructure-Foundations: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ayounsi) That's fine for me!
[09:42:06] SRE-tools, Infrastructure-Foundations: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (MoritzMuehlenhoff) >>! In T330884#8696816, @ayounsi wrote: > That's fine for me! Is it as easy as a re-image? It'll take a bit to sort out the setup, my proposal would be to add an additional...
[10:02:58] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Netflow/pmacct: use forwardingStatus - https://phabricator.wikimedia.org/T331707 (BTullis) Seems like a great idea to me. I don't have any concerns about the addition of the three new keys, or the boolean if it ends up being a computed va...
[10:58:51] Hi, we are deploying our new ceph cluster and discovered that the thirdparty/ceph-quincy repo is empty https://apt-browser.toolforge.org/bullseye-wikimedia/thirdparty/ceph-quincy/
[10:58:51] Do you have any pointers on how I could get it populated with packages? From the documentation it seems that thirdparty components are only for syncing with external repos; is there a process performing this sync, or is it a manual action?
[11:03:19] these are getting synced within reprepro: https://wikitech.wikimedia.org/wiki/Reprepro#Updating_external_repositories
[11:03:57] was the component only recently added to Puppet or was it used before for some different role?
[11:08:24] As far as I know it was recently added to Puppet (we only started installing the cluster yesterday)
[11:08:24] dcaro: do you already rely on the quincy repo from one of the cloudceph clusters?
[11:10:46] The repo config was added in January 2023, but I don't know whether the sync was performed afterwards
[11:16:55] these simply don't appear to be in use yet
[11:17:13] quincy seems to refer to Ceph 17 and cloudceph* is running 15
[11:17:31] so there should be no impact in syncing, let me do that now
[11:19:35] done, these are imported now
[11:35:03] thanks moritzm
[11:51:23] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Netflow/pmacct: use forwardingStatus - https://phabricator.wikimedia.org/T331707 (ayounsi) p:Triage→Low a:JAllemandou Thanks! Moving the task over to Joseph
[12:13:08] CFSSL-PKI, Infrastructure-Foundations: cfssl: investigate using post handshake authentication - https://phabricator.wikimedia.org/T332149 (jbond) p:Triage→Medium
[12:15:49] SRE-tools, Infrastructure-Foundations, cloud-services-team: Allow wmcs cookbooks running on cloudcuminXXXX to write to the SAL - https://phabricator.wikimedia.org/T325756 (Volans) To recap from an IRC chat, we need to define where should the automatic SAL log that spicerack emits on START/END of cook...
[12:28:37] SRE-tools, Infrastructure-Foundations: sync firmware between cumin hosts - https://phabricator.wikimedia.org/T332158 (jbond) Open→In progress p:Triage→Medium
[13:10:34] SRE-tools, Infrastructure-Foundations: sync firmware between cumin hosts - https://phabricator.wikimedia.org/T332158 (Volans) One of the simplest options could be to scp/rsync the single files right after downloading them using the keyholder ssh key for cumin.
[13:11:14] moritzm: what's the failure for regenerate_certificate()? (re: https://phabricator.wikimedia.org/T330495#8697765 )
[13:11:27] it might be simpler to fix spicerack than repackage puppet ;)
[13:11:38] lmk if I can help
[13:29:38] volans: i have not looked into it but puppet agent -t fails on the first run (requesting the puppet certificate). it looks like a possible change to the CA APIs which is preventing puppet agent 7 from talking to puppet master 5.5
[13:30:21] ah you mean it's not just an issue of the order of the commands we run
[13:30:36] but more in depth into the puppet client/CA interaction between versions
[13:30:44] one way to work around it would be to update puppet so that it manually generates the cert/key on the puppet master and then copies them over; however, if we can package 5.5 easily i think that may be better, as we are already planning for the puppet7 migration to be pure puppet 7
[13:31:23] and i feel that hitting issues this early on probably means that mixing them could end up causing us further problems in the future
[13:31:44] that said once the certs are in place puppet agent 7 and puppet master 5.5 seem to work well so \o/
[13:31:59] and JFTR, traceback is this: https://paste.debian.net/1274093/
[13:32:58] yes that is the output from `puppet agent -t` with puppet agent 7 when there are no certs
[13:33:59] did the workflow change? is there a different command to generate the CSR?
[13:34:50] haven't explored but there is no specific command to run. when running puppet agent, if there are no certs, then puppet agent will try to request one from the puppet server
[14:00:30] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q4/Q1:knams racking elevations & planning - https://phabricator.wikimedia.org/T331886 (cmooney) @RobH @ayounsi in terms of the CR to ASW connectivity I think this makes sense? |CR|CR Port|ASW|ASW Port| |------------|-------------|---...
[14:20:38] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw:row A/B: rack/cable new switches - https://phabricator.wikimedia.org/T332180 (Papaul)
[14:25:15] hi all fyi i have updated pki.discovery.wmnet to use dnsdiscovery and set it to active/active; I have tested and everything seems to be working as expected, and this should go largely unnoticed (cc jayme)
[14:29:45] thanks for the headsup!
[14:34:17] np
[15:35:00] SRE-tools, Infrastructure-Foundations, cloud-services-team: Allow wmcs cookbooks running on cloudcuminXXXX to write to the SAL - https://phabricator.wikimedia.org/T325756 (bd808) Basically wm-bot serves the same relay bot role in the WMCS SAL logging path as logmsgbot does for wiki cluster SAL loggin...
[16:30:18] SRE-tools, Infrastructure-Foundations: sync firmware between cumin hosts - https://phabricator.wikimedia.org/T332158 (jbond) this has now been added to the upgrade cookbook
[17:13:17] SRE-tools, Infrastructure-Foundations: sync firmware between cumin hosts - https://phabricator.wikimedia.org/T332158 (jbond) In progress→Resolved a:jbond
[19:31:54] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (cmooney) @papaul, I suggest we move the ports from the existing switch to the new...
[19:40:17] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q4/Q1:knams racking elevations & planning - https://phabricator.wikimedia.org/T331886 (cmooney) As per discussion on IRC we can re-use the [[ https://www.fs.com/products/36114.html?attribute=400&id=9735 | 40GBase-LR4 ]] QSFP+ optics i...