[08:53:21] alert2001 alert, maybe a network or vendor glitch?
[08:53:51] I was able to reach wikitech static from here, but maybe something failed between it and codfw
[08:58:20] there are also various systemctl units failed on alert2001 (as it's not checked by icinga), but I can't look at them right now
[09:02:54] I will have a quick look
[09:03:05] wmf_auto_restart_ircecho.service and wmf_auto_restart_prometheus-icinga-am.service
[09:03:23] so that to me would point to more of a net failure on alert2001 side
[09:04:57] both of those sound like something that would only be running on the active host (alert1001), which might explain why they're failing
[09:09:20] indeed there were network request failures from alert2001
[09:09:30] at the same time as firewall drops
[09:43:18] FYI the sync_check_icinga_contacts.service is failing because keyholder has not been re-armed after the reboot
[09:48:17] i just rearmed the keyholder on alert2001
[10:58:32] FYI all, I have deployed a change restricting the list of ports allowed through the proxy https://gerrit.wikimedia.org/r/c/operations/puppet/+/753016
[11:04:52] Just as a heads up: Recently, scp has switched to defaulting to sftp-the-protocol-and-subsystem when doing copies. Gerrit does not allow/have the sftp subsystem, so the commands the gerrit webui might give you won't work, unless you also add -O to that command line. I _think_ this is only relevant for Bookworm and onwards (and other distros that closely track OpenSSH releases).
[11:05:36] Might be useful to amend the WMF-specific .ssh/config to use the equivalent Option to -O. Looking into that currently
[11:06:59] is it possible to make gerrit support the new defaults, instead of requiring every user to add a workaround?
[11:07:16] I have too little knowledge about Gerrit to answer that
[11:07:54] e.g. I don't know if the ssh server there is implemented by gerrit itself or just a restricted openssh server
[11:10:21] *sigh* of course -O for scp does not have an equivalent ssh_config option
[11:24:32] o/
[11:24:47] klausman: Gerrit has a Java based SSH server which is listening on port 29418
[11:25:31] it only supports a few Gerrit commands and git related commands such as receive-pack / upload-pack
[11:26:25] if there is a command shown by Gerrit web UI which is invalid I am definitely happy to have task filed against #gerrit and investigate / report back to upstream
[11:27:53] also recent OpenSSH versions disable some algo for the key exchange https://phabricator.wikimedia.org/T276486 it has the workaround
[11:27:54] weeelll
[11:28:01] Host gerrit.wikimedia.org
[11:28:01] PubkeyAcceptedKeyTypes +ssh-rsa
[11:28:18] the command works for old versions, but newer versions need -O (which old versions would reject)
[11:28:44] So changing the command in the webui would not really work, since all you get is to choose who to break
[11:29:22] The KEX options work because there are ssh_config options for those. The -O option for scp does not have an equivalent ssh_config option
[11:29:43] :-\
[11:29:47] Exactly.
[11:29:58] at least upstream has fixed in Gerrit 3.6 (we run 3.4 now)
[11:30:12] So 3.6 does support sftp?
[11:30:14] I plan to upgrade to 3.5 end of may / beginning of June then to 3.6
[11:30:29] as for sftp , I never heard of using sftp with git
[11:30:41] Well, it's not git per se. Hang on
[11:30:50] https://gerrit.wikimedia.org/r/admin/repos/research/ores/wheels
[11:31:06] Note how on the ssh tab, the clone command is followed by an scp command
[11:31:22] This is because that repo needs Git LFS hooks
[11:31:34] The scp command fails, since [see above]
[11:31:53] AH
[11:35:00] then that hook is to add the Change-Id in the commit messages
[11:39:15] guess one could use `scp -o PubkeyAcceptedKeyTypes=+ssh-rsa`
[11:39:32] But it's not the KEX that's the problem
[11:39:45] then reading https://man.openbsd.org/scp I notice they have changed to use SFTP protocol instead of SCP bah
[11:39:53] exactly.
[11:40:14] and there is no config-file option to change back, only the command line option -O
[11:40:37] (to scp, not ssh)
[11:41:00] so that is a different issue than T276486 which is about key exchange
[11:41:01] T276486: gerrit's sshd is incompatible with RSA pubkeys + Fedora 33 clients (and future versions of OpenSSH proper) - https://phabricator.wikimedia.org/T276486
[11:41:30] Gerrit upstream is updating the Java ssh daemon in Gerrit 3.6 "maybe" that comes with sftp support
[11:42:48] I am also puzzled that OpenSSH upstream didn't implement fallback-to-scp logic
[11:43:10] forces folks to switch :D
[11:43:24] I guess in most cases the remote server supports sftp just fine
[11:44:49] klausman: could you file a task about newish openssh switching to sftp and our Gerrit not supporting it ?
[11:45:23] I would need a recent openssh client to test it out against a newish Gerrit version locally, then I can file the task upstream if needed
[11:51:12] * hashar off for lunch
[11:54:45] will do! also, lunch
[13:12:40] klausman: they want to deprecate the scp protocol entirely because it has a bunch of gross stuff in it (like needing shell globs to be quoted, for instance)
[13:12:48] there was some traffic on the openssh-dev list about this recently
[13:16:02] klausman: i don't have an scp that has `-O`; on my workstation or on cumin1001
[13:17:18] kormat: you don't have 9.0 then
[13:17:26] yeah, just found it in the release note
[13:17:28] *notes
[13:18:01] ahh. and now i know what 'bookworm' refers to.
[13:22:15] Yeah, I'm aware of the whys and wherefores of OpenSSH getting rid of scp-the-protocol, and the reasons are sound. But I wish they'd done it in a more cross-compatible way.
[13:30:15] klausman: yeah, IME they're not great at that, sometimes even just wrt forwards/backwards-compat (see also T253824)
[13:30:16] T253824: planned upstream deprecation of the ssh-rsa signing algorithm (RSA with SHA-1) - https://phabricator.wikimedia.org/T253824
[13:30:18] it is unfortunate
[15:05:01] Is anyone aware of the maxmind invoicing problem, is there a phabricator task?
[15:08:07] https://phabricator.wikimedia.org/T302864#7751417 lists Data Engineering as the new point of contact, best to reach out to Olja Dimitrjevic
[15:08:14] jhathaway: I think that the most relevant recent ticket is this: https://phabricator.wikimedia.org/T302864 but that is marked as resolved - What's the current issue?
[15:09:41] btullis: issue looks very similar, invoicing has lapsed again
[15:10:13] moritzm: thanks, I'll reach out to Olja
[15:20:01] jhathaway: there is a thread in slack let me work out how to get a slack link
[15:20:13] https://wikimedia.slack.com/archives/C037238JCSC/p1651590995035509
[15:20:26] jbond: thanks
[17:27:31] Why is geoip-database actually installed on so many hosts? https://debmonitor.wikimedia.org/packages/geoip-database
[17:27:37] I don't see a reason for that in puppet
[17:28:01] and when I removed it from an install host and ran puppet it doesn't come back either.
[17:28:17] was that pulled in by default some time in the past?
[17:28:51] asking because we got that email saying install1003 tried to run it.. but I don't see a cron for that either.. whut
[18:09:52] mutante: it appears to be part of our base package set?
[18:09:55] $ git grep geoip-database|cat
[18:09:56] modules/apt/files/base_packages.bullseye:geoip-database
[18:09:58] modules/apt/files/base_packages.buster:geoip-database
[18:10:00] modules/apt/files/base_packages.stretch:geoip-database
[18:12:36] jhathaway: ah! interesting. that would explain why it does not get reinstalled after removal. But it does not explain why it actually tried to run an update
[18:12:45] and I think we don't want that (anymore)
[18:12:51] true
[18:12:58] all hosts are supposed to get that from "volatile"
[18:13:04] and only the puppetmasters pull it
[18:13:09] yeah, seems strange to have that as part of a base package set
[18:13:18] I think at some point in the past that was changed
[18:13:47] now I am wondering if "install1003" was in that email because it was a random host out of many
[18:14:02] or because it has a public IP
[18:14:11] either way I purged it from install*
[18:15:47] will upload a patch to remove it from base.. but .. I still find it a bit mysterious. just having that package doesn't mean you get our license
[18:16:57] dpkg -L geoip-database does have /usr/share/GeoIP/GeoIP.dat though
[18:18:48] and on install1003 there is no more /usr/share/GeoIP now because I used --purge
[18:20:37] finally '/usr/bin/geoipupdate' does not exist on those hosts that still do have the package installed. and that is the update command. I am confused
[18:21:29] did geoipupdate get removed as a side effect of purging geoip-database?
[18:22:22] no, because "'/usr/bin/geoipupdate' (No such file or directory)" on 97 ms* hosts
[18:23:06] but the same hosts have ii geoip-database and ii libgeoip1:amd64
[18:24:08] kind of want to purge it from everything but also wait if that email appears again now, from a host that is not install1003 or if it doesn't happen again
[18:25:28] ah, right, so "Package['geoipupdate']," is distinct from package 'geoip-database'
[18:25:58] and geoipupdate is only installed on puppetmasters as it was expected for all of them : https://debmonitor.wikimedia.org/packages/geoipupdate
[18:26:07] which still does not explain why install* tried an update
[18:27:01] Is anyone besides us (search team) still using nginx for TLS proxy, or has everyone else gone over to envoy? Curious as the version of nginx that ships w/bullseye doesn't like the "ssl_ecdhe_curve" directive
[18:28:01] inflatador: nginx is used for example on conf* (etcd) hosts, profile::etcd::tlsproxy but they are just not bullseye yet
[18:28:33] thanks mutante ! I'm picking my way thru nginx docs ( http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_ecdh_curve )
[18:28:58] looks like it's "ssl_ecdh_curve" without the "e", but in practice that doesn't seem to work either.
[18:29:06] inflatador: I don't know whether they are supposed to be replaced by envoy though. Possibly
[18:30:01] we do need to move eventually, but would rather do that once we're done with all the OS and ES version upgrades
[18:30:30] inflatador: https://phabricator.wikimedia.org/rOPUP9a3645877fe32273152cff908e5e884dc53c8001
[18:30:38] swift disabled it for that same reason
[18:30:43] not in bullseye it said
[18:31:13] looks like they added "ssl_ecdhe_curve => false,"
[18:31:16] ah, thanks much!
[18:41:20] mutante: I can't find any reason why install1003 would try to download the geoip database
[18:49:16] jhathaway: I realized why it's install1003 just now
[18:49:22] webproxy.eqiad.wmnet is an alias for install1003.wikimedia.org.
[18:49:28] they will see the proxy IP
[18:49:37] could have been any host behind it
[18:50:02] so I think in reality it was puppetmaster
[18:52:27] that gets us into the second part of this.. why did the expired license matter..even when we stopped trying to get v1 databases.. but there is already a mail thread about that going on
[18:52:33] with their sales
[18:59:17] taking a longer lunch break, bbl
[19:03:24] mutante: that makes sense
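
The `scp -O` point from the 11:04 to 11:40 discussion can be sketched as a small helper that decides per client whether the flag is needed. This is only a sketch under assumptions that are not in the log: the client version is taken from the banner that `ssh -V` prints, 9.0 is treated as the release where scp started defaulting to SFTP (as the 13:17 message suggests), and the function names, the `example` user and the paths are hypothetical. Port 29418 is the Gerrit SSH port mentioned at 11:24.

```python
import re


def parse_openssh_version(banner):
    """Return (major, minor) from a banner such as 'OpenSSH_9.0p1 Debian-1'."""
    match = re.search(r"OpenSSH_(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("not an OpenSSH version banner: %r" % banner)
    return int(match.group(1)), int(match.group(2))


def gerrit_scp_command(banner, source, destination, port=29418):
    """Build an scp argument list, adding -O only for SFTP-by-default clients.

    Assumption: clients at 9.0 or later default to SFTP and accept -O,
    while older clients still default to the legacy protocol and may
    reject the flag, so it is left out for them.
    """
    command = ["scp", "-p", "-P", str(port)]
    if parse_openssh_version(banner) >= (9, 0):
        # Force the legacy scp protocol, since Gerrit has no sftp subsystem.
        command.insert(1, "-O")
    return command + [source, destination]


if __name__ == "__main__":
    # Hypothetical banners; in practice read the output of `ssh -V`.
    for banner in ("OpenSSH_8.4p1 Debian-5", "OpenSSH_9.0p1 Debian-1"):
        args = gerrit_scp_command(
            banner,
            "example@gerrit.wikimedia.org:hooks/commit-msg",
            ".git/hooks/",
        )
        print(" ".join(args))
```

Since there is no ssh_config equivalent for `-O`, a wrapper like this (or a shell alias) is one of the few places the choice can live on the client side.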