[10:03:34] good morning
[10:05:13] it seems all backups are failing?
[10:05:55] yeah, I created https://phabricator.wikimedia.org/T351617
[10:06:32] they may be running but not reporting, I will check
[10:06:59] yeah, I didn't spend much time on it given that you were coming back today
[10:07:46] last s1 dump is from 2023-11-21--00-00-05
[10:08:14] and last snapshot is from 2023-11-20--00-00-01
[10:08:49] so the backups are happening, but with 0 monitoring
[10:09:38] "Can't connect to MySQL server on 'localhost' ([Errno 111] Connection refused)"
[10:09:47] :-(
[10:11:07] it is not reading the new config file, despite the new config file being correct
[10:11:21] is that related to the puppet migration?
[10:11:35] I don't know yet
[10:11:59] I am an idiot
[10:12:08] stats_file: '/etc/wmfbackups/statistics.cnf'
[10:12:27] I changed the mysql connection file
[10:12:37] but I am still pointing to the old one
[10:12:57] so it loads 0 config and tries connecting with default parameters (localhost)
[10:13:07] so yeah, just a puppet fix will work
[10:26:11] I am applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/976158, which should solve the monitoring issue
[10:26:20] great!
[10:27:15] or wait
[10:27:26] maybe I should stop hardcoding such a variable
[10:55:26] jynus: am I able to progress with migrating the backup roles?
[10:55:42] jbond: I am fixing the last issue right now
[10:55:53] jynus: ack, thanks
[10:56:04] marostegui: any db stuff I can migrate, roles or hosts?
[10:56:31] jbond: No, I need time to check stuff
[10:56:47] marostegui: ack
[11:09:06] jbond: not sure if important, but I got a 500 error on cumin2002 while running puppet
[11:09:53] I'll double check, but it's probably transient
[11:10:12] yeah, not worried about that
[11:10:26] just in case there was still tuning needed for load or something
[11:10:53] jynus: I think the load is good now, but we have an issue when puppet-merge runs
[11:11:07] ah, interesting
[11:11:34] knowing that is already useful to me
[11:20:14] jynus: the task is https://phabricator.wikimedia.org/T350809 (just reopened it)
[11:21:09] thank you, again - not worried about it, just knowing it can happen and why is already useful
[11:58:50] https://phabricator.wikimedia.org/T351617#9348521
[13:21:11] See the recoveries ongoing now on -operations
[14:24:36] Hi folks, could I get a +1 to expand our envoy rollout to one more codfw node and one eqiad node, please? I aim to deploy tomorrow morning, assuming the existing envoy node behaves itself overnight. https://gerrit.wikimedia.org/r/c/operations/puppet/+/976229
[14:25:03] As well as the existing swift monitoring, you can see the new envoy graphs for the one codfw-swift node: https://grafana.wikimedia.org/d/VTCkm29Wz/envoy-telemetry?orgId=1&var-datasource=codfw%20prometheus%2Fops&var-destination=All&var-origin=swift&var-origin_instance=All
[14:37:13] Emperor: done!
[14:46:18] Emperor: lgtm, but out of curiosity, why does it require a reimage?
[14:55:20] because the nginx puppetry doesn't have a present/absent parameter you can use to remove all the nginx puppet resources
[14:55:42] (so rather than trying to remove them all by hand, just reimage to start from a clean slate)
[14:56:54] ah, gotcha
[15:00:01] urandom: you may be the wrong person to ask, but it came up in a CR you sent my way, so: why is profile::installserver::preseed::preseed_per_hostname: set in hieradata/role/common/apt_repo.yaml?
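The question just above comes down to how role-scoped hiera data is resolved: in this layout, a key set in hieradata/role/common/apt_repo.yaml is applied to hosts running role(apt_repo), which is why an installserver profile key living there looks surprising. The sketch below is a simplified, hypothetical Python illustration of that role-scoped lookup only; the real hierarchy has many more levels and is defined in the puppet repo's hiera configuration, and the role_lookup helper, the relative "hieradata" path, and the PyYAML usage are all invented for illustration.

    # Hypothetical, simplified model of role-scoped hiera lookup; not the real
    # hierarchy, which has more levels (host, site, common, ...).
    from pathlib import Path

    import yaml  # PyYAML

    HIERADATA = Path("hieradata")

    def role_lookup(key: str, role: str):
        """Return a key's value from hieradata/role/common/<role>.yaml, if present."""
        role_file = HIERADATA / "role" / "common" / f"{role}.yaml"
        if not role_file.exists():
            return None
        data = yaml.safe_load(role_file.read_text()) or {}
        return data.get(key)

    # With the key living in apt_repo.yaml, role(apt_repo) hosts pick it up from
    # that file; an installserver host would only see it via its own role file.
    key = "profile::installserver::preseed::preseed_per_hostname"
    print(role_lookup(key, "apt_repo"))       # the configured value
    print(role_lookup(key, "installserver"))  # None, unless also set in installserver.yaml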
[15:02:09] there was an email to ops@ about a week ago; netboot.cfg is now being generated, to make it less error-prone
[15:02:53] OK, but why in the apt_repo hiera file?
[15:03:04] oh, right... yeah, I wondered about that myself
[15:03:36] do the install server and the apt repo run on the same machine?
[15:04:00] that wouldn't necessarily justify it, but it might explain it?
[15:05:08] no - role(installserver) is install[12]004,3003,[456]002; apt_repo is apt[12]001 and apt1002
[15:06:08] brouberol: do you know why apt_repo.yaml was used for this hiera stuff rather than an installserver hiera file?
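Going back to the statistics reporting failure from the morning: as described at 10:09-10:13, the backups themselves were running, but the reporting step was still pointed at the old connection file, so it loaded no configuration and fell back to client defaults, producing the "Can't connect to MySQL server on 'localhost'" error. Below is a minimal, hypothetical Python sketch of that failure mode; the read_connection_options helper, the [client] section name, and the fallback values are illustrative, not the actual wmfbackups code.

    # Hypothetical sketch of the failure mode: reading connection options from an
    # ini-style .cnf file that is missing (or has no [client] section) silently
    # yields an empty dict, so the caller ends up with library defaults.
    import configparser

    DEFAULTS = {"host": "localhost", "port": 3306}  # typical MySQL client defaults

    def read_connection_options(path: str) -> dict:
        """Return the [client] options from an ini-style .cnf file, or {}."""
        parser = configparser.ConfigParser()
        parser.read(path)  # configparser silently ignores files it cannot open
        if not parser.has_section("client"):
            return {}
        return dict(parser.items("client"))

    # Using the stats_file path quoted in the log; when the configured path is
    # stale (the real connection file was moved or renamed), the result is empty,
    # the connection is attempted against localhost, and on a host with no local
    # MySQL server that shows up as "[Errno 111] Connection refused".
    options = {**DEFAULTS, **read_connection_options("/etc/wmfbackups/statistics.cnf")}
    print(options)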