[01:09:31] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 25.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:11:21] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[08:23:43] jynus: db2093 (db_inventory) was upgraded to 10.6 last week, did the backup run fine there?
[08:24:07] it runs tonight, sorry
[08:24:13] but I can do a test run now
[08:24:24] no no
[08:24:26] don't worry
[08:24:27] no rush
[08:50:43] https://phabricator.wikimedia.org/P43590
[08:54:14] marostegui: I am thinking of scheduling the es backup later this week, I wonder when a good time would be? As you may need to repool es servers after the maintenance (?)
[09:03:22] jynus: how long does it take?
[09:03:26] cause you maybe can run it now?
[09:03:32] And it might be finished before tomorrow's maintenance?
[09:03:42] If not, anytime after the 7th maintenance could be good
[09:03:43] Up to you :)
[09:04:21] it takes 32h+
[09:05:20] the thing is, if the window gets long, you may need extra time for repools
[09:05:53] so maybe something like 2 hours after the window may be safer
[09:15:15] yeah
[09:15:19] that sounds good
[09:36:03] this is long-overdue configurability that will make it easier to reschedule backups: https://gerrit.wikimedia.org/r/c/operations/puppet/+/886833
[09:42:41] oh that is useful indeed
[09:54:20] I wonder if we should test the 10.6 upgrade with the 2 backup replicas first, rather than the misc?
[09:54:39] yeah, so db2093 is done
[09:54:45] if the backup works fine, I will upgrade db1115
[09:54:48] that way I can put them down without affecting production
[09:54:55] then we can check with a backup source if you want
[09:54:56] yeah, it did, see the paste I did
[09:55:29] Oh sorry, missed that
[09:55:38] the idea is to do a full cycle (backup - deletion - recovery) on a non-trivial host
[09:55:40] Then I am going to upgrade db1115, which is orchestrator too :)
[09:55:55] as db_inventory is very very small
[09:56:00] yep
[09:56:45] we could also do some backup sources, there are some that are redundant
[10:09:50] this will delay es codfw backups 24 hours, just in case: https://gerrit.wikimedia.org/r/c/operations/puppet/+/886834
[10:36:47] Orchestrator is going to be unavailable for a bit
[10:40:57] orchestrator is now back
[12:32:05] marostegui: I'm sorry to bother you, but I'm having trouble finding out why these systemd service files aren't present on db1108 after a reboot.
[12:33:51] did anything change?
[12:33:59] Lots about these instances is present, but I'm expecting to find `/etc/systemd/system/mariadb@analytics_meta.service` and it's just not there. https://github.com/wikimedia/operations-puppet/blob/production/hieradata/role/common/mariadb/misc/analytics/backup.yaml
[12:34:21] I haven't changed anything, just rebooted.
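For reference, a minimal sketch of the kind of check implied by the question above, assuming a multi-instance host laid out like db1108 (the instance name comes from the conversation; the exact unit and socket paths depend on the puppet module):

    # Is the instantiated unit defined and loaded at all?
    ls /etc/systemd/system/mariadb@*.service
    systemctl list-units --all 'mariadb@*'
    # Each running instance also gets its own socket under /run/mysqld/
    ls -l /run/mysqld/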
[12:34:51] yeah, I don't think we have touched that instance in years
[12:34:59] At least the /srv content for both of them is there
[12:37:36] I saw nothing weird on the host, but puppet was failing earlier on some hosts, so maybe try rerunning puppet and/or reinstalling the mariadb package to see if there was some glitch or something
[12:37:45] I did a puppet run
[12:37:47] Just in case
[12:38:38] btullis: so when you did the stop mariadb@XX it was all fine?
[12:38:41] so the units were present?
[12:39:05] ii wmf-mariadb104 10.4.18-1 that's old
[12:39:41] I'm afraid that all I did was the reboot-single cookbook. I didn't think to check whether the unit files were present. I knew I'd have to start the slave on each instance manually after the reboot, but otherwise Icinga was green.
[12:39:43] ah right, this host is buster
[12:40:01] btullis: but does that stop mariadb?
[12:41:28] The matomo instance is present
[12:41:57] Oh..
[12:42:01] I started it
[12:42:34] root@db1108:/srv# systemctl start mariadb@analytics_meta
[12:42:35] root@db1108:/srv#
[12:42:46] root@db1108:/srv# mysql -S /run/mysqld/mysqld.analytics_meta.sock -e "start slave"
[12:42:46] root@db1108:/srv#
[12:43:29] It looks fine to me
[12:43:30] root@db1108:/srv# journalctl -xe -u mariadb@analytics_meta | grep -i 3352
[12:43:30] Feb 06 12:42:31 db1108 mysqld[27192]: Version: '10.4.22-MariaDB-log' socket: '/run/mysqld/mysqld.analytics_meta.sock' port: 3352 MariaDB Server
[12:43:46] Not sure what you were looking at, but the instances start ok
[12:44:12] Oh, I'm so sorry for wasting your time.
[12:44:30] No no!
[12:44:32] Not at all :)
[12:47:30] I think I forgot that the service itself doesn't start on boot. In my mind it was only the replication threads that don't start automatically. Then I couldn't see the systemd instantiated aliases, probably just fat fingers.
[12:48:25] btullis: there is config to do so, but I think it is only like that on cloud; on production, data > availability
[12:50:06] Gotcha. Many thanks both. I feel slightly foolish, but if that's the worst of it that's a great outcome :-)
[12:50:43] not foolish at all, I believe services not starting by default is an unexpected result, but it's done on purpose
[12:51:22] especially unexpected if you weren't part of the discussion of why it is like that by default AND didn't consciously change the config yourself
[12:53:52] btullis: yeah, we don't start mariadb on boot on purpose, it is better that way, so in case of crashes or unexpected reboots we don't get possibly corrupted data going unnoticed in production
[12:55:05] +1 thanks. I've got a few mariadb-related tasks to do in the next few months, so hopefully the muscle memory will get a bit stronger too.
[12:55:12] I guess one thing that could be done is to modify the existing reboot or create a new one specific for mariadb hosts
[12:55:23] *script
[12:55:37] I believe there was some WIP script already
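A rough sketch of the kind of post-reboot sequence being discussed, not the actual cookbook or WIP script; the instance names are examples taken from the db1108 conversation above, and the commands mirror the ones pasted there:

    # Bring each mariadb instance back up after a reboot (they intentionally
    # do not start on boot), then restart replication on each one.
    for inst in analytics_meta matomo; do
        systemctl start "mariadb@${inst}"
        mysql -S "/run/mysqld/mysqld.${inst}.sock" -e "START SLAVE"
    done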
[14:50:11] What's the proxy patch, https://gerrit.wikimedia.org/r/c/operations/puppet/+/886904?
[14:50:20] jynus: this is it https://gerrit.wikimedia.org/r/c/operations/puppet/+/886904
[14:50:31] looking
[14:51:12] nitpick: move*D*
[14:54:38] One thing I am seeing is binlog_format | MIXED on the new primary, so you will have to switch it and close the binlog file manually
[14:54:59] (I know you know this, just making sure it was in your mind)
[14:55:19] yeah, it is fine
[14:57:24] checked IPs, port, hosts, etc.
[14:57:32] in fact I am going to change it now that it is just a replica
[14:58:31] doing it directly is ok, just me being ultra pedantic on reviews :-D
[14:58:42] hehe I know, that's good
[16:31:23] ugh, I have forgotten all the SQL I once knew
[16:31:40] I never knew any
[16:33:02] ;p
[16:37:49] WITH cd AS (SELECT * FROM codfw.object WHERE deleted == 1), ed AS (SELECT * FROM eqiad.object WHERE deleted == 1) SELECT cd.name FROM cd LEFT JOIN ed USING(name) WHERE ed.name IS NULL;
[16:38:00] ^-- feels gross
[16:38:19] [sqlite]
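One possible tidier formulation of the query above is a set difference with EXCEPT instead of a LEFT JOIN anti-join; it assumes the same attached codfw/eqiad databases (the database file names below are hypothetical) and that losing duplicate names is fine, since EXCEPT de-duplicates.

    # The main database and ATTACH paths are hypothetical stand-ins for the
    # files used in the conversation above.
    sqlite3 objects.db "ATTACH 'codfw.db' AS codfw;
    ATTACH 'eqiad.db' AS eqiad;
    SELECT name FROM codfw.object WHERE deleted = 1
    EXCEPT
    SELECT name FROM eqiad.object WHERE deleted = 1;"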