[09:00:54] jynus: let me know when I could stop the backup source for m1
[09:03:23] any time until monday night
[09:03:51] nice!
[09:03:54] Doing it now then, thanks!
[09:04:04] marostegui: when do you think I could take an extra es backups?
[09:04:10] jynus: whenever you want
[09:04:22] no upgrades going on those dbs ?
[09:04:45] I've done most of them I think, some may be pending, but not urgent
[09:04:49] you can proceed as you want
[09:05:12] I think I will leave it running at the end of the day
[09:05:17] excellent
[09:06:40] heads up that I will deploy new grants for that to 1 es6,es7 host for that
[09:06:43] ok!
[09:11:25] FIRING: SystemdUnitFailed: prometheus-mysqld-exporter.service on db1250:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:12:13] ^ expected
[09:12:18] I am setting up the host
[10:01:25] RESOLVED: SystemdUnitFailed: prometheus-mysqld-exporter.service on db1250:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:02:29] jynus: m1 source back up
[10:04:15] if you cloned from it, remember to remove the dump user but without binlog!
[10:04:58] ah good point!
[10:05:03] I will drop the user entirely then?
[10:05:07] yep
[10:05:33] I plan to add that to automation on the higher level recovery
[10:06:10] it won't cause any problem, but better to remove the possibility of accidentally backing up the wrong host
[10:07:01] done
[10:30:13] taking a break before fireworks
[12:33:34] _joe_: FWIW, the thumbnail steps is behind a ratio based feature flag (ratio will make it randomly split them based on hash of the image name) so I'm planning to roll it out really slowly to avoid melting thumbor/swift
[12:33:42] it might take a month or so at least
[15:41:40] I have started new backups from the new hosts
[15:42:02] I don't see how that could cause issues, as it is literally the same code and config, but giving a heads up
[15:42:19] that should last ~8 hours
[15:42:40] I will give on call a heads up too
[16:13:18] just did it
[16:13:26] https://phabricator.wikimedia.org/T387892#10605906
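
Note on the [12:33:34] message: a ratio-based flag keyed on a hash of the image name could look roughly like the sketch below. This is only an illustration of the general technique, not the actual MediaWiki or Thumbor code; the function name `in_rollout`, the choice of SHA-256, and the 64-bit bucket size are all assumptions made for the example.

```python
import hashlib


def in_rollout(image_name: str, ratio: float) -> bool:
    """Hypothetical helper: decide whether an image is inside the rollout.

    The same image name always lands in the same bucket, so raising
    `ratio` only ever adds images to the rollout and never removes any.
    """
    digest = hashlib.sha256(image_name.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer bucket in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # The image is in the rollout if its bucket falls below ratio * 2**64.
    return bucket < ratio * 2**64


if __name__ == "__main__":
    # Example file names are made up for the demonstration.
    for name in ["Example.jpg", "Another_file.png", "Third.svg"]:
        print(name, in_rollout(name, 0.05))
```

Because the decision depends only on the name and the ratio, slowly increasing the ratio over weeks moves a stable, growing subset of images to the new behaviour, which fits the goal of not overloading thumbor/swift all at once.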