[07:39:58] CFSSL-PKI, Infrastructure-Foundations, Patch-For-Review: CFSSL gencert "remote error: tls: certificate require" - https://phabricator.wikimedia.org/T355750#9989163 (elukey) ` elukey@cumin1002:~$ sudo cookbook sre.network.tls lsw1-d1-codfw Acquired lock for key /spicerack/locks/cookbooks/sre.network.t...
[08:14:46] CFSSL-PKI, Infrastructure-Foundations, Patch-For-Review: CFSSL gencert "remote error: tls: certificate require" - https://phabricator.wikimedia.org/T355750#9989212 (ayounsi) Open→Resolved a:CDanis→elukey Yep it's all good! I manually added the host to gNMIc and metrics are proper...
[09:04:51] elukey: wow, amazing, thanks for fixing the cfssl issue :)
[09:05:25] <3
[09:05:55] brutal self nerd snipe, but I learned some things that John did :D
[09:08:46] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Request additional mgmt IP range for frack servers - https://phabricator.wikimedia.org/T370164#9989347 (ayounsi) We will need to migrate the whole range to a new prefix :( Running 2 ranges is going to be a pain long term, and would n...
[09:12:29] elukey: haha, well it is most appreciated, means I can add the new nodes in codfw and use grafana during my network move tomorrow which makes things a lot easier and more responsive :)
[09:13:29] topranks: glad that it helps!!
[09:16:06] yep, just adding the magru nodes here too :)
[09:29:15] netops, Infrastructure-Foundations, SRE: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts - https://phabricator.wikimedia.org/T368513#9989379 (ayounsi) @cmooney @fgiunchedi I'm wondering if the probe could/should be changed to a TCP handshake only or totally...
[09:46:15] netops, Infrastructure-Foundations, SRE: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts - https://phabricator.wikimedia.org/T368513#9989413 (cmooney) >>!
In T368513#9938867, @fgiunchedi wrote: > Those are SSH probes from local prometheus hosts indeed, in t...
[10:12:31] XioNoX: I've sent my refactor for netbox tests to enable netbox 4 migration
[10:26:21] volans: thx, should it be moved before 1050453 in the chain?
[10:27:01] or merged :D
[10:28:10] volans: :) should the comments be in the code? to not be lost once it's merged
[10:28:52] as you want, it's just meant to last a day or two
[10:30:32] volans: I +1ed it but I clearly don't understand it well enough for a proper review :)
[10:31:55] ack, thx
[11:08:15] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9989577 (cmooney) Open→Resolved
[12:52:52] slyngs: o/ around?
[12:53:01] Yes
[12:54:10] I'd need to do a debmonitor (server) release, afaics we are missing the 0.4.x git tags. Not blocking but I was wondering if we should/want to push them or not
[12:54:52] last one that I see is v0.3.2 (commit from John)
[12:56:18] I think we should, it's a bit nicer, if we need to do a security release in the future
[12:57:00] Oh I see, no, just tag the new release
[12:57:13] okok!
[12:59:03] I should have tagged the previous release, but didn't for some reason
[13:00:15] np! I have another question - I see that we have a debian branch plus a package called debmonitor-server on debmonitorXXXX
[13:00:39] and also https://gerrit.wikimedia.org/r/admin/repos/operations/software/debmonitor/deploy
[13:01:28] the package seems to contain few things, but I am wondering if I have to build it
[13:04:35] We don't need the debmonitor/deploy anymore.
Building the debian package yields two packages, python3-debmonitor and debmonitor-server
[13:04:54] But yes, there is no automatic building of the package
[13:06:23] elukey: then I guess we could adapt .wmfconfig to support this use case
[13:06:30] it's the first one with multiple packages
[13:06:40] and multiple packages that have to be built for different platforms
[13:06:44] server only on X, client on all
[13:07:24] slyngs: ooooooh nice!
[13:08:11] Not really sure where the nice bit came into play... but I hope it's the no need for the deploy repo
[13:13:53] I filed https://gerrit.wikimedia.org/r/c/operations/software/debmonitor/+/1054879 for the new release, I'll open a task for py-release to support it!
[13:14:09] thx
[13:27:06] netops, Infrastructure-Foundations, SRE: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts - https://phabricator.wikimedia.org/T368513#9990011 (fgiunchedi) So I looked where the probes come from, and they are part of the generic "probe mgmt network hosts for...
[13:44:37] I am seeing the following issue while building: https://phabricator.wikimedia.org/P66731
[13:45:14] dh_auto_test tries to pip install packages, but without the network it fails of course
[13:46:52] I didn't see anything in phab related to how debmonitor is built, so no idea..
[13:47:38] (running tox though kind of leads to pip installing deps..)
[13:51:50] jobo: I was perhaps going to create a master phab task (under nda) for those actions for WME?
[14:06:03] Not a bad idea, thanks topranks
[14:06:10] SRE-tools, Infrastructure-Foundations, Spicerack: Create the python-release repository - https://phabricator.wikimedia.org/T367410#9990408 (elukey)
[14:06:40] SRE-tools, Infrastructure-Foundations, Spicerack: Create the python-release repository - https://phabricator.wikimedia.org/T367410#9990413 (elukey) Open→Resolved Spicerack 8.7.0 was released by me, we made it :)
[14:06:57] Packaging, Infrastructure-Foundations: Package ipxe-qemu - https://phabricator.wikimedia.org/T369136#9990417 (elukey)
[14:07:15] ok will do
[14:51:14] ok I give up. I've been trying to find a way to use the redfish api to set a raid controller to HBA (context in https://phabricator.wikimedia.org/T358489), I've gone as far as getting to the StorageController settings, but the raid mode is supposed to be in Oem: Dell: DellStorageController: ControllerMode, and the Oem section of the data I can get from
[14:51:16] /redfish/v1/Systems/System.Embedded.1/Storage/RAID.SL.3-1/Controllers/RAID.SL.3-1/Settings is basically empty
[14:51:20] 'Oem': {'Dell': {'@odata.type': '#DellOemStorageController.v1_0_0.DellOemStorageController',
[14:51:22] 'DellStorageController': {}}}}
[14:51:40] I must be missing something really stupid, anyone have any ideas?
[14:52:52] have you already looked at cookbooks/sre/swift/convert-disks.py?
[14:54:18] yes, I've based my cookbook on it, trying to generalize it because the hardcoded values didn't work for us
[14:54:50] ack
[14:56:25] ConvertToNonRaid does not work "The operation cannot be completed either because the operation is not supported on the target device, or the RAIDType of
[14:56:27] \"MD Software RAID\" does not allow the operation."
[14:56:51] you have software raid?
[14:57:12] also I'm not here :D
[14:57:22] I shouldn't answer
[15:03:47] volans: yes
[15:04:17] claime: but you know there are office hours in ~1h for that if you want
[15:05:01] I'll be there :p
[15:42:24] filed the new proposal for the dcops ssh access: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1054894
[15:42:28] lemme know :)
[17:04:01] FIRING: NTPNoSynced: NTP not synced on dbproxy2007:9100 - https://wikitech.wikimedia.org/wiki/NTP - TODO - https://alerts.wikimedia.org/?q=alertname%3DNTPNoSynced
[19:46:06] Mail, collaboration-services, Infrastructure-Foundations, Phabricator, Release-Engineering-Team (Priority Backlog 📥): Missing some email notifications from Phabricator (2024-07-17) - https://phabricator.wikimedia.org/T370352#9992460 (Aklapper)
[19:54:22] Mail, collaboration-services, Infrastructure-Foundations, Phabricator, Release-Engineering-Team (Priority Backlog 📥): Missing some email notifications from Phabricator (2024-07-17) - https://phabricator.wikimedia.org/T370352#9992488 (Aklapper) To rule out some stuff: * Any smaller [upstream c...
[20:14:43] Mail, collaboration-services, Infrastructure-Foundations, Phabricator, Release-Engineering-Team (Priority Backlog 📥): Missing some email notifications from Phabricator (2024-07-17) - https://phabricator.wikimedia.org/T370352#9992539 (matmarex) I can see the missing notifications at https://ph...
[20:23:31] Mail, collaboration-services, Infrastructure-Foundations, Phabricator, Release-Engineering-Team (Priority Backlog 📥): Missing some email notifications from Phabricator (2024-07-17) - https://phabricator.wikimedia.org/T370352#9992564 (matmarex) p:Unbreak!→Triage Huh, some of the missin...
[20:32:08] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Request additional mgmt IP range for frack servers - https://phabricator.wikimedia.org/T370164#9992603 (cmooney) >>!
In T370164#9989347, @ayounsi wrote: > We will need to migrate the whole range to a new prefix :( Running 2 ranges is...
[20:36:04] Mail, collaboration-services, Infrastructure-Foundations, Phabricator, Release-Engineering-Team (Priority Backlog 📥): Missing some email notifications from Phabricator (2024-07-17) - https://phabricator.wikimedia.org/T370352#9992604 (matmarex) Open→Invalid Sorry about the alarm. L...
[20:37:32] Mail, collaboration-services, Infrastructure-Foundations, Phabricator, Release-Engineering-Team (Priority Backlog 📥): Missing some email notifications from Phabricator (2024-07-17) - https://phabricator.wikimedia.org/T370352#9992608 (Aklapper) Nah, better to be safe here. Glad you found t...
[21:04:16] FIRING: NTPNoSynced: NTP not synced on dbproxy2007:9100 - https://wikitech.wikimedia.org/wiki/NTP - TODO - https://alerts.wikimedia.org/?q=alertname%3DNTPNoSynced
[21:07:26] netops, Infrastructure-Foundations, SRE: Issue with subscribing to GNMI telemetry on certain QFX5120 devices - https://phabricator.wikimedia.org/T370366 (cmooney) NEW p:Triage→Low
[21:08:01] netops, Infrastructure-Foundations, SRE: Issue with subscribing to GNMI telemetry on certain QFX5120 devices - https://phabricator.wikimedia.org/T370366#9992754 (cmooney)
[21:08:03] netops, Infrastructure-Foundations, SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#9992755 (cmooney)
[21:21:07] netops, Infrastructure-Foundations, SRE: Issue creating GNMI telemetry subscription to certain QFX5120 devices - https://phabricator.wikimedia.org/T370366#9992812 (cmooney)
[21:50:32] Mail, collaboration-services, Infrastructure-Foundations, Phabricator, Release-Engineering-Team (Priority Backlog 📥): Missing some email notifications from Phabricator (2024-07-17) - https://phabricator.wikimedia.org/T370352#9992891 (brennen) We spent some time digging around in the datab...
[22:18:59] netops, Infrastructure-Foundations, SRE: Issue creating GNMI telemetry subscription to certain QFX5120 devices - https://phabricator.wikimedia.org/T370366#9992974 (cmooney)
[22:21:07] Mail, collaboration-services, Infrastructure-Foundations, Phabricator, Release-Engineering-Team (Priority Backlog 📥): Missing some email notifications from Phabricator (2024-07-17) - https://phabricator.wikimedia.org/T370352#9992981 (brennen) For future reference: https://wikitech.wikimed...
[22:24:51] netops, Infrastructure-Foundations, SRE: Issue creating GNMI telemetry subscription to certain QFX5120 devices - https://phabricator.wikimedia.org/T370366#9992984 (cmooney)
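
Editor's note on the 14:51 Redfish discussion: the message there says the RAID mode is expected under Oem: Dell: DellStorageController: ControllerMode, but the Settings payload came back with an empty DellStorageController object. The following is a minimal sketch, not the cookbook's actual code, of how that lookup could be written so that an empty or missing Oem section is reported as "not exposed" rather than raising a KeyError. The function name extract_controller_mode and the "HBA" value in the second sample are hypothetical, chosen only for illustration; the first sample mirrors the payload pasted in the log.

```python
from typing import Any, Dict, Optional


def extract_controller_mode(settings: Dict[str, Any]) -> Optional[str]:
    """Return Oem -> Dell -> DellStorageController -> ControllerMode.

    Returns None when any level of the path is missing or empty, which is
    the situation described in the log (DellStorageController == {}).
    """
    node: Any = settings
    # Walk the nested path one key at a time, bailing out on any gap.
    for key in ("Oem", "Dell", "DellStorageController"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    if not isinstance(node, dict):
        return None
    return node.get("ControllerMode")


# Payload shape as pasted in the log: the Dell section exists but is empty.
from_log = {
    "Oem": {
        "Dell": {
            "@odata.type": "#DellOemStorageController.v1_0_0.DellOemStorageController",
            "DellStorageController": {},
        }
    }
}

# Hypothetical payload in which the controller does expose the property.
# The "HBA" value is an assumption for illustration only.
populated = {"Oem": {"Dell": {"DellStorageController": {"ControllerMode": "HBA"}}}}

print(extract_controller_mode(from_log))
print(extract_controller_mode(populated))
```

A None result only says that this particular resource does not expose the property; it does not say whether the controller supports changing the mode, which is the open question in the conversation.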