[00:06:13] (DiskSpace) resolved: Disk space puppetmaster1001:9100:/ 5.527% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=puppetmaster1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[00:19:24] (SystemdUnitFailed) firing: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:48:35] (SystemdUnitFailed) resolved: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:58:35] (SystemdUnitFailed) firing: httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:58:35] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:53:18] netops, Infrastructure-Foundations, SRE: CRs ECMP traffic to LVS VIPs despite higher MED on backup route - https://phabricator.wikimedia.org/T348446 (ayounsi) Maybe prepending the AS on the backup LVS is easier to do than expected? i thought PyBal's development had stopped, but seeing @Vgutierrez 's [...
[08:24:35] netops, Infrastructure-Foundations, SRE: CRs ECMP traffic to LVS VIPs despite higher MED on backup route - https://phabricator.wikimedia.org/T348446 (Vgutierrez) >>! In T348446#9252677, @ayounsi wrote: > Maybe prepending the AS on the backup LVS is easier to do than expected? > i thought PyBal's devel...
[08:25:17] jhathaway: wrt the labs/private change mentioned in -sre, I can't seem to find it on gerrit ( https://gerrit.wikimedia.org/r/q/Ibebc930cdaaf35f8f9fc77ded654cb0b307ff2d9 ) by any chance did you push it without a CR? Usually we create the CR and then self C+2,V+2,submit to merge it, to at least keep track of them.
[08:26:04] related to the specific change I wonder if we should instead create a key for every dev-env on the fly instead of having a single one that is also public
[08:53:48] jbond: is https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/netbox-extras/+/refs/heads/master/customscripts/hiera_export.py still used? IIRC now it's all done by the cookbook via PQL right?
[11:17:29] volans: sorry for the late response, i had to go to the council today to sort out some documents and im still a bit naive in my expectation for punctuality with such engagements. i need to remember that a meeting at 10:30 means some undetermined point after 10:30 so best not make plans ;)
[11:17:42] im pretty sure its not needed but double checking now
[11:18:24] jbond: lol, sounds about right as mediterranean bureaucracy expectations ;)
[11:18:32] no hurry
[11:19:17] indeed :)
[12:05:39] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Consolidate Automation Templates for DC Switches - https://phabricator.wikimedia.org/T312635 (cmooney)
[12:24:47] Puppet, Infrastructure-Foundations, Puppet CI, SRE, and 2 others: update pcc with puppet 7 support - https://phabricator.wikimedia.org/T236373 (jbond) p:Low→Medium
[12:28:19] Packaging, Infrastructure-Foundations, cloud-services-team (FY2023/2024-Q1): wmfbackups packages for Debian Bookworm - https://phabricator.wikimedia.org/T347740 (jcrespo) > Do you expect the Bullseye package to work in Bookworm without the patches you mentioned? Yes.
[12:46:11] netops, Infrastructure-Foundations, SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977 (cmooney) p:Triage→Low
[12:52:50] netops, Infrastructure-Foundations, SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977 (cmooney)
[13:37:16] volans yes that was my mistake, I normally do a git review, but muscle memory got the best of me, sorry
[13:38:10] no worries, I don't know why it's set up to accept pushes at all :D
[13:38:26] yeah, disabling would be nice
[13:38:39] we should ask WMCS
[13:38:56] I guess as the owner of that repo, although the "ownership" is a bit fuzzy on that one
[13:39:08] the important part is not doing it all the time :D errors happen
[13:47:07] for sure
[13:49:03] * jbond is a big offender of directly pushing to that repo
[14:03:55] jhathaway: about the change itself, are we ok with that key being public? do we want to auto-generate it somehow?
[14:07:07] I think autogeneration is worth considering, but it is quite a bit more complex, I'm not sure it matters too much for pki certs, but is more important for ssh keys, if the dev environments are shared at some point, at present though dev envs are only local
[14:11:34] ok
[14:34:32] SRE-tools, Infrastructure-Foundations, Spicerack: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337 (dcaro)
[14:51:47] FYI age is the implementation morit.z was looking at but specifically the rage (rust implementation) https://github.com/str4d/rage
[14:54:11] Packaging of the prometheus-ethtool-exporter: https://gitlab.wikimedia.org/slyngshede/prometheus-ethtool-exporter/-/tree/master/debian
[14:56:33] SRE-tools, Infrastructure-Foundations, Spicerack: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337 (dcaro) Hmm, elasticsearch-curator also tries to build and install pyyaml 3 if it's not there, and failing for me on python3.11. Note that...
[14:56:35] netops, Infrastructure-Foundations, SRE: CRs ECMP traffic to LVS VIPs despite higher MED on backup route - https://phabricator.wikimedia.org/T348446 (cmooney) >>! In T348446#9252677, @ayounsi wrote: > Maybe prepending the AS on the backup LVS is easier to do than expected? > i thought PyBal's developm...
[15:11:10] thx jbond
[15:17:03] noprobs
[15:40:07] Mail, Infrastructure-Foundations, SRE, Znuny, collaboration-services: OTRS/mail: investigate why "T=remote_smtp_signed: all hosts for 'ticket.wikimedia.org' have been failing for a long time" - https://phabricator.wikimedia.org/T297160 (LSobanski) a:Arnoldokoth No reports for two years. L...
[15:41:15] netops, Cloud-VPS, Infrastructure-Foundations, SRE, Patch-For-Review: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (dcaro) a:dcaro
[15:42:49] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (dcaro)
[16:03:36] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:03:35] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:03:35] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:03:35] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:14:13] (DiskSpace) firing: Disk space puppetmaster1001:9100:/ 5.948% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=puppetmaster1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace