[08:31:12] Amir1: can I flip the s4 codfw master?
[09:01:25] Go for it
[09:12:48] ok, running the flip
[09:18:31] ok, the flip is done, running the schema changes on db2179
[09:27:53] \o/
[09:29:16] I'll try to do s4 and s1 in eqiad now
[09:45:54] ok, the 2 schema changes are done on db2179
[09:47:21] however, before pooling it in we might want to swap weights with another db, because it has main=0 api=300
[09:47:25] Amir1: ^^^
[09:47:59] sounds good to me. Can you pick a normal random replica in codfw and swap the weights?
[09:48:22] a replica that is only pooled in the main group (not api/vslow/dump)
[09:49:16] yep
[09:49:59] if we want to avoid jolting the hosts, we would have to depool both hosts though
[09:50:35] I think it's fine for now; right now is not the peak of the day, and codfw is the secondary DC and gets a lot fewer queries
[09:50:46] ok
[09:51:05] I kinda like the word jolting :D
[09:51:27] switching with db2237 then
[09:52:06] Thanks
[09:53:37] Amir1: the derivative of acceleration :D https://en.wikipedia.org/wiki/Jerk_%28physics%29
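
[editor's note: for readers unfamiliar with the weight swap discussed above, here is a rough sketch of how it might look with dbctl. The host names come from the log; the weight values and the use of the interactive `edit` workflow are assumptions for illustration, not a record of what was actually run.]

    # Hypothetical weight swap between db2179 (main=0 api=300) and db2237
    # (pooled only in the main group). Values and workflow are illustrative.

    # Inspect the current pooling state and weights of both instances.
    dbctl instance db2179 get
    dbctl instance db2237 get

    # Edit each instance's per-section/per-group weights by hand
    # (opens the instance's JSON config in $EDITOR): give db2237's main
    # weight to db2179, and db2179's api weight to db2237.
    dbctl instance db2179 edit
    dbctl instance db2237 edit

    # Review the pending change, then commit it so it reaches MediaWiki.
    dbctl config diff
    dbctl config commit -m "Swap weights between db2179 and db2237 in s4"

[editor's note: as discussed at 09:49:59, doing this on live hosts "jolts" them; depooling both first would avoid shifting query load onto a host with cold caches in one step.]
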
[10:06:14] Amir1: see -operations
[10:11:43] Amir1: are all these hosts lagging together? https://grafana-rw.wikimedia.org/d/bd60e6f6-11fc-47f4-a6ba-109c1aed251d/federico-s-mariadb-replication-dash?folderUid=Wagp6Ryik&forceLogin=true&from=now-3h&orgId=1&timezone=utc&to=now
[10:25:41] federico3: sorry, was dealing with the incident in _security
[10:26:26] ah - well, I have a nice backlog to read
[11:49:26] btw, jynus, you might like this a bit: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=db1157&var-datasource=000000026&var-cluster=mysql&from=now-90d&to=now&timezone=utc&viewPanel=panel-12 https://usercontent.irccloud-cdn.com/file/IkYDazav/grafik.png
[11:49:41] a 10% reduction in the number of files in s3
[11:50:00] stuff like T397367
[11:50:00] T397367: Drop unneeded empty tables from wikis - https://phabricator.wikimedia.org/T397367
[11:50:23] that should make the growth of the backups tracking db slightly slower
[11:53:44] that's cool
[11:54:22] let me take a break; I need to focus on some backup stuff I need to deploy, but I need some time for a coffee
[11:55:01] go for it. Thanks for helping with the incident
[12:02:01] rebooting db1215 (zarcillo master) for kernel upgrade
[12:04:45] Amir1: ack
[12:09:09] it's back and the UI reconnected by itself
[12:28:40] I ran a check via debmonitor and puppet, and it seems db2207, the candidate master of s2 in codfw, is also on the old kernel. I'll quickly upgrade it
[12:30:36] thankfully this is the only such host, whether master or candidate master
[13:01:33] was it missing from the list in the task?
[13:01:59] somehow
[13:02:01] but meh
[13:03:27] I forgot to say: db1176 should probably be set to read_only=0 (it doesn't matter, it is a test host, but that's how the alerts are configured)
[13:03:56] either that or change the section config, but I think a SET GLOBAL is just easier
[13:04:31] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=db1176&service=MariaDB+read+only+test-s4
[13:04:32] done
[13:05:16] it is mostly nice to test things like switchovers in a more realistic config
[13:47:00] "yay", the trixie kernel has changed what it puts into /dev/disk/by-path
[13:54:36] rebooting db1217, the only replica of several misc sections
[13:56:42] by-path - T404351
[13:56:43] T404351: swift_disks fact needs to cope with change in /dev/disk/by-path in trixie - https://phabricator.wikimedia.org/T404351
[14:04:58] Emperor: I was wondering if that were possible
[14:31:20] I will set db2185 (zarcillo codfw) as read-only; I think it had the same issue as test-s4
[14:31:31] unless you tell me not to
[15:04:43] sobanski: so, what do you need from me to create a phab template/form? subject, body text, ...?
[15:05:08] sobanski: and what is your (and anyone else's) opinion on using a tag for design reviews?
[15:05:30] I've always been told that we should be conservative about creating tags
[15:17:14] any ideas why a mariadb might grow ibdata1 by hundreds of GB in spite of using innodb_file_per_table?
[15:17:35] (confirmed with show variables, by matching .frm with .ibd files, and by querying INNODB_SYS_TABLES)
[15:18:18] some redo logs maybe? (but supposedly that's in ib_logfile0)
[15:19:40] I suspect the l10n_cache table, which -for some reason- seems to be deleted and recreated constantly, but that still doesn't seem to explain why it would extend the system tablespace, and by so much (see the query sketch at the end of this log)
[16:33:48] FIRING: PuppetFailure: Puppet has failed on ms-be2068:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:52:03] hi data persistence, we ran into the following error [0] earlier today during 00-downtime-db-readonly-checks - just wanted to ask if this might be related to the s4/s1 work earlier today
[17:52:12] https://www.irccloud.com/pastebin/grEAaH8y/
[17:56:56] get_core_dbs() also returns similarly for 03-set-db-readonly
[17:57:13] https://www.irccloud.com/pastebin/f8cKh6Gf/
[18:58:36] update! (you can disregard the above) swfrench-wmf tracked down that x3 just hasn't been added to spicerack at [0] yet, which explains the outdated expected values above, so we should be good
[18:58:36] [0] - https://gerrit.wikimedia.org/g/operations/software/spicerack/+/fc6039579363d238d0f62279859999d1cb398c17/spicerack/mysql.py#29
[20:34:03] FIRING: PuppetFailure: Puppet has failed on ms-be2068:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
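
[editor's note: a minimal sketch of the ibdata1 checks mentioned at 15:17, for readers who want to reproduce them. It assumes shell access to the affected host and a local mysql client with sufficient privileges. In MariaDB/MySQL, rows in INNODB_SYS_TABLES with SPACE = 0 live inside the system tablespace (ibdata1); InnoDB's own dictionary tables will also show up there, which is normal.]

    # Confirm file-per-table is actually enabled.
    sudo mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table'"

    # List tables whose data still lives in the system tablespace
    # (SPACE = 0 means the table is stored inside ibdata1).
    sudo mysql -e "SELECT NAME, SPACE FROM information_schema.INNODB_SYS_TABLES WHERE SPACE = 0"

    # Undo logs are a common cause of ibdata1 growth even with
    # file-per-table enabled (e.g. long-running transactions holding back
    # purge); the history list length gives a rough indication.
    sudo mysql -e "SHOW ENGINE INNODB STATUS\G" | grep "History list length"

[editor's note: the constant drop-and-recreate churn of l10n_cache suspected at 15:19:40 would not by itself place data in ibdata1 as long as file-per-table was enabled when the table was created, which is what makes undo log growth a plausible alternative explanation.]
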