[06:57:57] netops, Infrastructure-Foundations, SRE: cr3-ulsfo flapping on July 14 - https://phabricator.wikimedia.org/T370048#9984008 (ayounsi) Open→Resolved a:ayounsi Closing this task in favor of {T364092}.
[07:01:18] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 22.4R3 - https://phabricator.wikimedia.org/T364092#9984011 (ayounsi) There has been a spike of CPU usage on cr1-eqiad (with no impact), not sure if just a coincidence.
[07:12:51] slyngs: I'm getting an addition for idp2004 from the sre.puppet.sync-netbox-hiera cookbook, ok to add?
[07:13:00] Yes please
[07:13:18] {done}
[12:05:36] netbox, Infrastructure-Foundations: Netbox : sync src/ submodule - https://phabricator.wikimedia.org/T369690#9984927 (ayounsi) Open→Resolved Thanks. I went that way, master on https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/netbox has been updated to match upstream's master...
[12:10:38] netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#9984933 (ops-monitoring-bot) Deployed netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.7 to netbox-next - ayounsi@cumin1002 - T336275
[12:27:28] volans: about your comment on https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/1050453 not sure it's worth implementing support and tests for both netbox 4 and 3 just for saving time during the transition period, what do you think?
[12:28:27] how will you test spicerack changes with the testing netbox 4 without breaking production or blocking spicerack releases?
[12:29:48] volans: answered in the CR, but similar changes were tested in Homer, here it's just 2 changes so not sure it's worth going through the whole testing
[12:30:26] same with cookbooks, we don't have a full testing platform :(
[12:36:31] if you add support for both v3 and v4 you can deploy spicerack and test the cookbooks with the test-cookbook binary
[12:40:28] replied on CR
[12:41:50] volans: can spicerack point to the netbox-next instance?
[12:42:25] it's config based, so yes passing a different config
[12:43:14] and that's supported by the test-cookbook
[12:43:19] ok
[12:44:00] I was hoping to not have to postpone the upgrade even further :)
[12:47:29] you will have to add spicerack_config_dir to the instance_params mapping in ~/cookbooks_testing/config.yaml and have a copy of the prod one where you change only the netbox address
[12:47:38] but should be easily doable
[12:51:54] lovely, the pki cloud puppetserver was stuck at more than a year ago with operations/puppet
[12:51:57] * elukey sigh
[12:53:27] gosh
[12:53:43] but miraculously all good
[12:53:56] the traffic cloud nodes now can fetch certs from pki as well
[13:07:39] CFSSL-PKI, Infrastructure-Foundations, Patch-For-Review: CFSSL gencert "remote error: tls: certificate require" - https://phabricator.wikimedia.org/T355750#9985129 (elukey) >>! In T355750#9976497, @elukey wrote: > Completely different use case: `traffic-cache-upload-bullseye.traffic.eqiad1.wikimedia....
[13:39:51] volans: I updated the CR, but not sure what's causing CI to fail. Testing it manually seems to work fine: `getattr(server, 'device_role', server.role).slug` 'server'
[13:42:34] checking
[13:42:54] the child commit is for netbox 4 tests
[13:43:38] so, because that is a mocked object, when you do getattr even the attribute that doesn't exist actually exists
[13:44:16] how do I tell it "please don't exist?" :)
[13:44:39] >>> a = object()
[13:44:41] >>> a.foo
[13:44:47] AttributeError: 'object' object has no attribute 'foo'
[13:45:04] so just tell it in _netbox_virtual_machine
[13:45:21] sorry in _netbox_host
[13:45:30] that host.role.side_effect = AttributeError
[13:46:11] you also have prospector and style complaining about 2 small things
[13:54:26] thx!
[14:15:50] volans: I tried it but still no luck :( same error
[14:18:43] I hoped it would work, then we need to set a spec for the mock
[14:19:38] what does that mean? :)
[14:20:08] something like
[14:20:09] >>> a = mock.MagicMock(spec=['a', 'b'])
[14:20:19] you pass the names of the variables that exist
[14:20:21] >>> a.c
[14:20:22] ...
[14:20:24] AttributeError: Mock object has no attribute 'c'
[14:22:43] elukey: thanks for the spicerack release <3
[14:23:35] arnaudb: still in progress, don't thank me yet :D
[14:23:54] let's consider it as cheering then :p
[14:25:29] :D
[14:25:51] I am trying to act as Riccardo but it is difficult, python doesn't like me
[14:28:28] :D
[14:28:48] volans: I'm a bit lost, where should I do that?
[14:31:10] _base_netbox_host
[14:31:14] host = mock.MagicMock()
[14:31:24] we might need to define all the allowed properties
[14:31:26] and that's a pain
[14:31:33] because there might be more than what we mock
[14:31:37] and the tests still work
[14:34:06] I might have another idea, but will need to test it
[14:40:49] cool, thanks
[14:43:28] jhathaway: are you around? I'd like to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1053650 but would very much appreciate having someone that knows what it does to help test everything is ok afterwards
[14:46:45] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9985956 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=36afd2cf-508d-4c02-a8cc-afb66ea29242) set...
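Editor's note on the mock discussion above: a minimal sketch, using only the standard library, of why the getattr fallback never triggers on a plain MagicMock and of two ways to make an attribute "not exist". The helper name role_slug and the variable names are hypothetical, chosen for illustration; only the getattr expression and the spec=[...] idea come from the chat. This is not the actual spicerack test code.

```python
from unittest import mock


def role_slug(server):
    # Hypothetical helper wrapping the expression quoted in the chat.
    return getattr(server, "device_role", server.role).slug


# 1. Plain MagicMock: any attribute is created on access, so
#    "device_role" always exists and the fallback to .role is never used.
plain = mock.MagicMock()
plain.role.slug = "server"
plain_result = role_slug(plain)  # an auto-created MagicMock, not "server"

# 2. side_effect only fires when the mock is *called*; reading the
#    attribute still succeeds, so this does not help either.
side = mock.MagicMock()
side.device_role.side_effect = AttributeError
side.role.slug = "server"
side_result = role_slug(side)  # still an auto-created MagicMock

# 3. Deleting the attribute makes later access raise AttributeError,
#    so getattr falls back to .role.
deleted = mock.MagicMock()
del deleted.device_role
deleted.role.slug = "server"
deleted_result = role_slug(deleted)

# 4. A spec given as a list of names restricts which attributes can be
#    read; anything outside the list raises AttributeError.
specced = mock.MagicMock(spec=["role"])
specced.role.slug = "server"
specced_result = role_slug(specced)

print(deleted_result, specced_result)
```

Deleting the single attribute keeps the rest of the mock permissive, which sidesteps the concern raised in the chat about having to list every allowed property in a spec. Whether that fits the existing fixtures is for the CR to decide.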
[15:03:01] XioNoX: I have an idea of a workaround that might be simple but I can't work on it today, has to be tomorrow morning, sorry
[15:03:48] volans: thanks, worst case we revert to my previous PS :) Spicerack is the last blocker to the Netbox upgrade
[15:04:58] ack
[15:04:59] thx
[15:07:22] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9986058 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=81c0aaa1-44d2-4d05-942a-66bcdfb90d2d) set...
[15:08:26] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9986071 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=58bc700a-b84d-4058-9776-9f6510239089) set...
[15:26:32] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9986154 (cmooney) Upgrade completed, all hosts back online and pinging ok. Thanks all for the assistance!
[15:30:05] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9986188 (ABran-WMF) dbstore1009 has replication up to date on all 3 instances all 3 other nodes are repooling ↑
[15:31:47] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9986200 (MatthewVernon) Swift looks good, thanks.
[16:00:02] New spicerack release https://pypi.org/project/wikimedia-spicerack/8.7.0/
[16:00:09] this was not done by Riccardo :D
[16:00:28] still some details to iron out but we are getting there
[16:00:39] cc arnaudb --^
[16:05:53] 🎉 congrats elukey
[16:19:45] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: Create the python-release repository - https://phabricator.wikimedia.org/T367410#9986547 (elukey) a:elukey
[16:47:36] CFSSL-PKI, Infrastructure-Foundations: CFSSL gencert "remote error: tls: certificate require" - https://phabricator.wikimedia.org/T355750#9986762 (elukey) I managed to get the certificate via: ` elukey@cumin1002:~$ sudo cfssl gencert -loglevel 0 -config /etc/cfssl/client-cfssl.conf -tls-remote-ca /etc/s...
[16:48:39] jhathaway: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1053650
[16:56:27] thanks claime
[16:57:23] jhathaway: there's like 5 verp bounces an hour soooo
[16:57:40] it may take a while to be sure it works, but there's no reason it shouldn't
[16:57:40] patience :)
[17:00:34] I'm handing the patience over to you because I have to go :p
[17:01:21] happy to take it!
[17:02:01] cheers <3 I'll run puppet on the other mail hosts before I go so swfrench-wmf can proceed
[17:19:19] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:16:26] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Request additional mgmt IP range for frack servers - https://phabricator.wikimedia.org/T370164#9987104 (Papaul)
[18:19:18] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed