[06:59:51] netbox, Infrastructure-Foundations, SRE, Patch-For-Review: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack) - https://phabricator.wikimedia.org/T209182 (akosiaris) Hi everyone, Since the last comment is from 2 years ago from a person no longer with t...
[07:25:21] netbox, Infrastructure-Foundations, SRE, Patch-For-Review: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack) - https://phabricator.wikimedia.org/T209182 (Volans) Open→Resolved a:Volans This can be solved, was just forgotten AFAICT. We do us...
[10:49:21] Hello. Could I ask for some guidance please? Data Engineering has been looking at building a new conda environment as a Debian package, which we will apply on all of the hadoop workers.
[10:49:22] This is similar to what we currently do with anaconda-wmf (https://gerrit.wikimedia.org/r/plugins/gitiles/operations/debs/anaconda-wmf/%2B/refs/heads/debian) and will hopefully replace that in time.
[10:50:19] Currently the deb file for this new conda environment is being built using GitLab-CI and there is a .deb file produced as an artifact there: https://gitlab.wikimedia.org/repos/data-engineering/conda-base-env
[10:50:38] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F - https://phabricator.wikimedia.org/T304989 (cmooney) @nskaggs anyone with access to Netbox and ability to run homer (which I believe should be most of SRE) shoul...
[10:50:55] https://gitlab.wikimedia.org/repos/data-engineering/conda-base-env/-/packages
[10:52:51] My question is: Is it permissible for us to add this package to apt.wikimedia.org with reprepro, or should we be looking at rebuilding it outside of GitLab-CI (e.g. on build2001.codfw.wmnet)?
[10:53:34] cc moritzm ^^^ for authoritative answer
[10:55:00] my understanding is that in the current state we don't trust artifacts generated by CI for production. I think it is the same for GitLab but I'm not 100% sure. I know that the capability to generate trusted artifacts from CI has been discussed for a long time though.
[11:13:27] Gotcha. Thanks volans. I know that there are a couple of places where we build jar files with jenkins and then host them on archiva for use in analytics jobs - but maybe conda environments and deb files need more stringent controls.
[11:29:21] hi, I'm looking at moving some icinga checks to prometheus/alertmanager and have them open tasks, specifically checking mgmt ssh, my current plan is to issue a puppetdb query to ask for physical hosts and use their mgmt dns records, is that the right approach? should/can I use netbox data instead ?
[11:29:31] forgot to link the task, T225140
[11:29:31] T225140: Icinga alerts that should open tasks instead of alerting - https://phabricator.wikimedia.org/T225140
[11:31:56] godog: how would prometheus chech the SSH connection?
[11:31:59] *check
[11:33:19] I don't think a puppetdb approach would be sound, as we have in puppetdb only hosts with an OS running and puppet running, but we want to reach the mgmt interface (hence monitoring it) also when the OS is not yet/anymore there or the host is broken IMHO
[11:34:18] netbox is a more authoritative source, for example in spicerack you can get the mgmt FQDN with https://doc.wikimedia.org/spicerack/master/api/spicerack.netbox.html#spicerack.netbox.NetboxServer.mgmt_fqdn
[11:34:20] volans: blackbox exporter would connect and check for the ssh banner, more or less like icinga does now
[11:34:39] where does it run?
[11:35:13] ATM on prometheus hosts, but we can run it from whichever host(s)
[11:35:19] got it
[11:37:18] good point re: netbox vs puppetdb, how would you recommend making the list of mgmt interfaces available to prometheus ?
[11:37:23] (happy to discuss on task too)
[11:37:35] also what is your expected workflow? get all the mgmt host FQDNs or get the mgmt FQDN of a given hostname?
[11:38:18] the former, I think a list of mgmt fqdn will do
[11:39:19] ok, so I see various approaches here if you have a task I can comment more in detail after lunch
[11:39:42] for sure, thank you! https://phabricator.wikimedia.org/T225140
[11:40:01] that's the general task for "alerts that should open tasks but don't yet"
[11:40:16] ack
[11:40:17] will do
[11:40:36] thank you! I'll open a subtask to move mgmt checks to prometheus now fwiw
[11:41:13] ok I'll look for that one
[11:41:26] https://phabricator.wikimedia.org/T310266
[12:10:35] btullis: I think we're already doing this, so adding further ones seems fine to me? https://debmonitor.wikimedia.org/packages/anaconda-wmf
[12:10:53] or are these different in some manner?
[12:12:45] Cool, I think the only difference really is the manner of the build. anaconda-wmf is built by hand on a build server according to these instructions: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/debs/anaconda-wmf/+/refs/heads/debian/README.debian.md
[12:14:25] The new conda-base-env is built by GitLab-CI under docker, but it uses our wikimedia-buster base image to do so and everything is open, so I reckon that it's safe enough.
[12:14:53] There won't be any automatic step between the build and reprepro - that would be a manual operation I think.
[12:18:13] yeah, in this specific case it seems totally fine. it's not that the previous build step had any meaningful control over the 10G of Conda files either :-)
[12:21:15] Great, thanks.
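For the mgmt SSH check discussed between 11:29 and 11:41, a minimal sketch of the two pieces mentioned: a list of mgmt FQDNs handed to Prometheus, and a probe that connects and looks for the SSH banner. This is an illustration under assumptions, not the setup the participants settled on: the function names, the `mgmt_ssh` label and the example hostname are hypothetical, the target list is written in Prometheus's file-based service discovery JSON shape, and how the FQDNs are exported from Netbox is left open, as it is in the conversation. In the plan described above the probing itself would be done by blackbox exporter; the Python probe only shows the idea.

```python
import json
import socket


def file_sd_targets(mgmt_fqdns, port=22, labels=None):
    """Build a Prometheus file_sd document: a list with one target group."""
    # De-duplicate and sort so the generated file is stable between runs.
    targets = sorted({f"{fqdn}:{port}" for fqdn in mgmt_fqdns})
    return [{"targets": targets, "labels": dict(labels or {})}]


def looks_like_ssh_banner(line):
    """SSH identification strings start with 'SSH-' (RFC 4253, section 4.2)."""
    return line.startswith(b"SSH-")


def probe_ssh_banner(host, port=22, timeout=5.0):
    """Connect and report whether the first line received is an SSH banner.

    Simplification: a server may send other lines before its identification
    string; this sketch only inspects the first line.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            first_line = sock.makefile("rb").readline(256)
    except OSError:
        # Refused, unreachable, DNS failure or timeout all count as "down".
        return False
    return looks_like_ssh_banner(first_line)


if __name__ == "__main__":
    # Hypothetical hostname and label, for illustration only.
    groups = file_sd_targets(
        ["host1001.mgmt.example.org"], labels={"job": "mgmt_ssh"}
    )
    print(json.dumps(groups, indent=2))
```

Generating the target file from an inventory export rather than from puppetdb matches the point made at 11:33:19: the mgmt interface should stay monitored even when the host has no OS or puppet running.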
[16:15:51] netbox, Infrastructure-Foundations: netbox network report improvment - https://phabricator.wikimedia.org/T310299 (ayounsi) p:Triage→Low
[20:13:55] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F - https://phabricator.wikimedia.org/T304989 (nskaggs) Thanks for the explanation. I just want to make sure if not a cookbook, then a runbook at least to make it v...
[21:34:04] Mail, Infrastructure-Foundations, SRE, Epic: Move most (all?) exim personal aliases to WMF ITS - https://phabricator.wikimedia.org/T122144 (Dzahn) Stalled→Open
[21:50:25] Mail, Infrastructure-Foundations, SRE, Epic: Move most (all?) exim personal aliases to WMF ITS - https://phabricator.wikimedia.org/T122144 (Dzahn) I talked with Jesse about all this. We agreed I will follow-up about the last few things, you Faidon, also mentioned in our mail. cpt-leads@, techchom...
[21:50:45] Mail, Infrastructure-Foundations, SRE, Epic: Move most (all?) exim personal aliases to WMF ITS - https://phabricator.wikimedia.org/T122144 (Dzahn) Open→In progress
[23:31:04] Puppet, Infrastructure-Foundations, Patch-For-Review: Package 'cgroup-bin' has no installation candidate on Debian 11 (modules/mediawiki/manifests/cgroup.pp) - https://phabricator.wikimedia.org/T309449 (Legoktm) Open→Resolved