[08:31:12] <federico3>	 Amir1: can i flip s4 codfw master?
[09:01:25] <Amir1>	 Go for it
[09:12:48] <federico3>	 ok, running the flip
[09:18:31] <federico3>	 ok the flip is done, running the schema changes on db2179
[09:27:53] <Amir1>	 \o/
[09:29:16] <Amir1>	 I try to do s4 and s1 in eqiad now
[09:45:54] <federico3>	 ok the 2 schema changes are done on 2179
[09:47:21] <federico3>	 however before pooling it in we might want to swap weights with another db because it has main=0 api=300 
[09:47:25] <federico3>	 Amir1: ^^^
[09:47:59] <Amir1>	 sounds good to me. Can you pick a normal random replica in codfw and swap the wieghts?
[09:48:22] <Amir1>	 a replica that is only pooled in main group (not api/vslow/dump)
[09:49:16] <federico3>	 yep
[09:49:59] <federico3>	 if we want to avoid jolting the hosts we would have to depool both hosts tho
[09:50:35] <Amir1>	 I think it's fine for now, right now is not the peak in the day and codfw is the secondary dc and gets a lot less queries 
[09:50:46] <federico3>	 ok
[09:51:05] <Amir1>	 I kinda like the word jolting :D 
[09:51:27] <federico3>	 switching with db2237 then
[09:52:06] <Amir1>	 Thanks
[09:53:37] <federico3>	 Amir1: the derivative of acceleration :D https://en.wikipedia.org/wiki/Jerk_%28physics%29
[10:06:14] <federico3>	 Amir1: see -operations
[10:11:43] <federico3>	 Amir1: are all these hosts lagging together? https://grafana-rw.wikimedia.org/d/bd60e6f6-11fc-47f4-a6ba-109c1aed251d/federico-s-mariadb-replication-dash?folderUid=Wagp6Ryik&forceLogin=true&from=now-3h&orgId=1&timezone=utc&to=now
[10:25:41] <Amir1>	 federico3: sorry, was dealing with the incident in _security
[10:26:26] <federico3>	 ah - well i have a nice backlog to read
[11:49:26] <Amir1>	 btw, jynus you might like this a bit: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=db1157&var-datasource=000000026&var-cluster=mysql&from=now-90d&to=now&timezone=utc&viewPanel=panel-12 https://usercontent.irccloud-cdn.com/file/IkYDazav/grafik.png
[11:49:41] <Amir1>	 a 10% reduction in number of files in s3
[11:50:00] <Amir1>	 stuff like T397367
[11:50:00] <stashbot>	 T397367: Drop unneeded empty tables from wikis - https://phabricator.wikimedia.org/T397367
[11:50:23] <Amir1>	 that should make the growth of backups tracking db slightly slower
[11:53:44] <jynus>	 that's cool
[11:54:22] <jynus>	 let me take a break, I need to focus on some backup stuff I need to deploy, but I need some time for a coffee
[11:55:01] <Amir1>	 go for it. Thanks for helping with the incident
[12:02:01] <Amir1>	 rebooting db1215 (zarcillo master) for kernel upgrade
[12:04:45] <federico3>	 Amir1: ack
[12:09:09] <federico3>	 it's back and the ui reconnected by itself
[12:28:40] <Amir1>	 I ran a check via debmonitor and puppet and it seems db2207 candidate master of s2 in codfw also is on the old thing. I quickly upgrade it
[12:30:36] <Amir1>	 thankfully this is the only host whether master or candidate master
[13:01:33] <federico3>	 was it missing in the list in the task? 
[13:01:59] <Amir1>	 somehow
[13:02:01] <Amir1>	 but meh
[13:03:27] <jynus>	 I forgot to say: db1176 probably should be in read_only=0 (it doesn't matter, it is a test host, but that's how alerts are configured)
[13:03:56] <jynus>	 either that or change the section config, but I think a set global is just easier
[13:04:31] <jynus>	 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=db1176&service=MariaDB+read+only+test-s4
[13:04:32] <Amir1>	 done
[13:05:16] <jynus>	 it is mostly nice to test thinks like switchovers in a more realistic config
[13:05:21] <jynus>	 *things
[13:47:00] <Emperor>	 "yay" the trixie kernel has changed what it puts into /dev/disk/by-path 
[13:54:36] <Amir1>	 rebooting db1217, the only replica of several misc sections
[13:56:42] <Emperor>	 by-path - T404351 
[13:56:43] <stashbot>	 T404351: swift_disks fact needs to cope with change in /dev/disk/by-path in trixie - https://phabricator.wikimedia.org/T404351
[14:04:58] <urandom>	 Emperor: I was wondering if that were possible
[14:31:20] <jynus>	 i will set db2185 (zarcillo codfw) as read only, I think had the same issue as test-s4
[14:31:31] <jynus>	 unless you tell me not to
[15:04:43] <urandom>	 sobanski: so, what do you need from me to create a phab template/form?  subject, body text, ...?
[15:05:08] <urandom>	 sobanski: and what is yours (and any others) opinion on using a tag for design reviews?
[15:05:30] <urandom>	 I've always been told that we should be conservative about creating tags
[15:17:14] <Platonides>	 any ideas why a mariadb might increase ibdata1 hundreds of GB in spite of using innodb_per_table ?
[15:17:35] <Platonides>	 (confirmed with show variables, by matching .frm with ibd files and by querying INNODB_SYS_TABLES)
[15:18:18] <Platonides>	 some redo logs maybe? (but supposedly that's at ib_logfile0)
[15:19:40] <Platonides>	 I suspect of l10n_cache table which -for some reason- seems to be deleted and recreated constantly, but still that doesn't seem to explain extending using the system tablespace and so much
[16:33:48] <jinxer-wm>	 FIRING: PuppetFailure: Puppet has failed on ms-be2068:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:52:03] <jasmine_>	 hi data persistence, we ran into the following error [0] earlier today  during 00-downtime-db-readonly-checks - just wanted to ask if this might be related to the s4/s1 work earlier today
[17:52:12] <jasmine_>	 https://www.irccloud.com/pastebin/grEAaH8y/
[17:56:56] <jasmine_>	 get_core_dbs() also returns similarly for 03-set-db-readonly
[17:57:13] <jasmine_>	 https://www.irccloud.com/pastebin/f8cKh6Gf/
[18:58:36] <jasmine_>	 update! you can disregard the above!)  swfrench-wmf tracked down that x3 just hasn't been added to spicerack at [0] yet, which explains the outdated expected values above, so we should be good 
[18:58:36] <jasmine_>	 [0] - https://gerrit.wikimedia.org/g/operations/software/spicerack/+/fc6039579363d238d0f62279859999d1cb398c17/spicerack/mysql.py#29
[20:34:03] <jinxer-wm>	 FIRING: PuppetFailure: Puppet has failed on ms-be2068:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure