[00:03:34] RESOLVED: DiskSpace: Disk space serpens:9100:/ 3.025% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=serpens - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[01:44:19] netops, Infrastructure-Foundations, SRE: Upgrade Eqiad row E-F Spines to JunOS 22.2R3 - https://phabricator.wikimedia.org/T366361#9853206 (OKJ04)
[01:44:49] SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, Spicerack, SRE: Migrate puppet merges to a cookbook - https://phabricator.wikimedia.org/T366355#9853212 (OKJ04)
[01:45:42] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Include vlans with defined IRB int in device vlans even if no port present - https://phabricator.wikimedia.org/T366348#9853219 (OKJ04)
[01:51:51] netops, Infrastructure-Foundations, SRE: Upgrade Eqiad row E-F Spines to JunOS 22.2R3 - https://phabricator.wikimedia.org/T366361#9853308 (JJMC89)
[01:52:11] SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, Spicerack, SRE: Migrate puppet merges to a cookbook - https://phabricator.wikimedia.org/T366355#9853314 (JJMC89)
[01:53:03] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Include vlans with defined IRB int in device vlans even if no port present - https://phabricator.wikimedia.org/T366348#9853321 (JJMC89)
[08:06:02] brett: what's the actual problem you're trying to solve?
[12:59:02] SRE-tools, Infrastructure-Foundations: Support creating phab tasks in wmflib.phabricator - https://phabricator.wikimedia.org/T366470 (JMeybohm) NEW
[13:30:38] netops, Infrastructure-Foundations, Patch-For-Review: Juniper: use export-format state-data json compact - https://phabricator.wikimedia.org/T362523#9854770 (ayounsi) Open→Resolved > Our engineering team has now indicated that the compact json is not supported, due to hardware limitations...
[14:19:55] SRE-tools, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: Netbox errors caused by system board replacement - https://phabricator.wikimedia.org/T358542#9855011 (Volans) p:Triage→Medium
[14:22:29] SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, Spicerack, SRE: Migrate puppet merges to a cookbook - https://phabricator.wikimedia.org/T366355#9855026 (MoritzMuehlenhoff) p:Triage→Medium
[14:26:19] hello :) XioNoX is there a need for a puppet run on some specific hosts before running sre.hosts.rename?
[14:26:51] claime: not the rename
[14:26:58] I just tried to do it for mw1358 to wikikube-worker1001, and I got a spicerack.redfish.RedfishError: PATCH https://10.65.2.21/redfish/v1/Managers/iDRAC.Embedded.1/EthernetInterfaces/NIC.1 returned HTTP 400 with message: "failed, Invalid URI"
[14:27:00] move-vlan, yes
[14:27:11] hmm
[14:27:28] could be that the iDRAC is too old
[14:27:47] ah, possibly
[14:28:34] I'll try with another machine that should be newer, I'll come back to this one later then
[14:29:04] claime: you can also try the `sre.hardware.upgrade-firmware` cookbook first
[14:29:10] XioNoX: ack
[14:32:21] claime: and the doc https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Platform-specific_documentation/Dell_Documentation#Updating_Firmware
[14:33:02] dcops can help too
[14:33:10] SRE-tools, Infrastructure-Foundations: Support creating phab tasks in wmflib.phabricator - https://phabricator.wikimedia.org/T366470#9855087 (Volans) p:Triage→Medium
[14:34:16] XioNoX: yeah it's not happy
[14:34:41] I'll check with dcops once I'm done running the others
[14:34:42] maybe idrac is too old for the upgrade script :)
[14:34:44] XioNoX: you could add a check in the cookbook
[14:35:00] volans: ah? that would be nice
[14:36:54] I think I have hit that check many times in running the upgrade-firmware cookbook
[14:37:00] it is nice
[15:41:54] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack, Patch-For-Review: Spicerack: expand Supermicro support in the Redfish module - https://phabricator.wikimedia.org/T365372#9855462 (elukey) Network config for kubernetes2054 as seen by Redfish (supermicro): ` >>> pprint(a.request("get",...
[15:43:43] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9855466 (Papaul)
[16:33:39] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack, Patch-For-Review: Spicerack: expand Supermicro support in the Redfish module - https://phabricator.wikimedia.org/T365372#9855808 (elukey) I checked the BIOS settings of kubernetes2054 (Supermicro nodes already configured by DCops) and...
[16:36:12] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack, Patch-For-Review: Spicerack: expand Supermicro support in the Redfish module - https://phabricator.wikimedia.org/T365372#9855821 (elukey) Next steps: * Refactor the provision cookbook to be less DELL specific and allow other vendors, l...
[16:46:27] Hmm the name change has not propagated has expected, some hosts are not resolving the new name (which is unexpected because sre.dns.netbox is run by the cookbook)
[16:54:29] claime: oh that's not good
[16:54:33] and weird
[16:54:35] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9855905 (Papaul)
[16:54:38] yeah
[16:54:38] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9855906 (Papaul) @cmooney all good on lsw1-d4, lsw1-c2 and lsw1-d8
[16:55:17] claime: do you have an example?
[16:55:35] is it transient or still happening?
[16:55:40] yeah, I can't resolve wikikube-worker1001.eqiad.wmnet from cumin for instance
[16:56:06] It's making puppet fail on some hosts https://puppetboard.wikimedia.org/nodes?status=failed
[16:56:34] claime: there is no wikikube-worker1001 in netbox
[16:56:50] 1002 works
[16:56:54] XioNoX: I'm an idiot...
[16:57:04] That's the one that's waiting on a idrac update
[16:57:09] so it wasn't renamed
[16:57:12] sorry for the scare
[16:57:29] I probably will have to remove it from puppet temporarily until I can actually do the rename
[16:57:45] claime: did it show an error in the rename?
[16:58:08] XioNoX: that's the one I showed you earlier
[16:58:26] spicerack.redfish.RedfishError: PATCH https://10.65.2.21/redfish/v1/Managers/iDRAC.Embedded.1/EthernetInterfaces/NIC.1 returned HTTP 400 with message: "failed, Invalid URI"
[16:58:34] ah yeah ok
[16:58:35] And the idrac cookbook update doesn't work either
[16:58:39] at least it's not a new failure :)
[16:58:41] yeah
[16:59:12] I'll patch puppet to de-rename it for now so it doesn't make puppet fail
[20:02:57] SRE-tools, SRE: Provide an utility script to replace a failed device in raid 0 array - https://phabricator.wikimedia.org/T350492#9856945 (Volans) p:Triage→Medium yeah, removing the unowned tag, doesn't seem to fit this one IMHO.