[07:52:01] SRE-tools, homer, Infrastructure-Foundations, Patch-For-Review: Validate (and document) Homer config files - https://phabricator.wikimedia.org/T272688 (ayounsi) Open→Resolved a:ayounsi All done!
[09:09:23] netops, Infrastructure-Foundations: Arista engagement - new vendor evaluation/testing. - https://phabricator.wikimedia.org/T290716 (cmooney)
[09:10:13] netops, Infrastructure-Foundations: Arista engagement - new vendor evaluation/testing. - https://phabricator.wikimedia.org/T290716 (cmooney) p:Triage→Medium
[09:33:06] now that the bullseye/Exim setup is working on the test server, Keith and I are planning to reimage mx2001.wikimedia.org on Monday. To prevent external mail delivery while the server is being reimaged, we need port 25 filtered on the routers while the reimage is in progress (so that external mail servers resort to mx1001 only)
[09:33:22] topranks, XioNoX: is either of you available for this on Monday afternoon?
[09:33:39] yep
[09:33:43] +1
[09:34:48] great, will ping you on Monday then
[09:35:20] just need to add the IP to https://gerrit.wikimedia.org/r/c/operations/homer/public/+/710943
[11:23:48] FYI: I'm starting a transfer of about 1.8 TB between aqs1004 and aqs1011 - Hope it won't cause you any alert noise or issues. It's using transfer.py from cumin1001.
[11:25:11] XioNoX: topranks: fyi ^
[11:26:21] Thanks. Should be fine yeah, both connected at 1G so unlikely to overwhelm the core.
[11:26:46] If we get any alerts for BW usage on those particular ports we'll know what it is - so thanks for the heads up!
[11:30:05] Cool. It's good to find out the best place to let people know..
[11:31:34] 2nd best place after #wikimedia-sre-foundations-netops-notify-transfer ;)
[11:38:03] Awesome :-) Bookmarked.
[12:04:29] Puppet, Infrastructure-Foundations: Temporary failures for prometheus_puppet_agent_stats - https://phabricator.wikimedia.org/T290726 (fgiunchedi)
[12:21:17] CAS-SSO, Infrastructure-Foundations, Observability-Metrics, SRE, and 3 others: Sign-in links from Grafana dashboards don't work when not signed into SSO - https://phabricator.wikimedia.org/T269272 (fgiunchedi) Is this still an issue @RLazarus ? I can't reproduce it anymore
[12:24:18] netops, Infrastructure-Foundations, Observability-Metrics, SRE: replace check_ripe_atlas Python script with a check_prometheus backed by atlasexporter data - https://phabricator.wikimedia.org/T251155 (fgiunchedi) This should be a prometheus-native alert in `alerts.git` nowadays
[13:09:04] netops, Infrastructure-Foundations: Mellanox engagement - new vendor evaluation/testing. - https://phabricator.wikimedia.org/T290732 (cmooney)
[13:14:29] Puppet, Infrastructure-Foundations: Temporary failures for prometheus_puppet_agent_stats - https://phabricator.wikimedia.org/T290726 (jbond) I wonder if we should just drop the git_sha from here and concentrate on getting that data into logstash?
[14:32:02] Puppet, Infrastructure-Foundations: Temporary failures for prometheus_puppet_agent_stats - https://phabricator.wikimedia.org/T290726 (fgiunchedi) I tend to agree, since we have a path forward now with logstash + puppet reports might as well back out of the git_sha in prometheus metrics (and eliminate the...