[06:45:25] db1240 backup load process failed at 04:24 UTC :-(
[07:03:27] and puppet ran?
[07:06:47] I'm going to disable it just in case, but I don't think that affects regular dbs
[07:06:53] or dbprov hosts
[07:26:53] Amir1: I have finished https://phabricator.wikimedia.org/T366982
[08:53:12] marostegui: Thanks <3
[09:41:56] Checking better the import may have been successful, but just failed on a table with:
[09:41:59] ERROR 1265 (01000) at line 2884: Data truncated for column 'fa_major_mime' at row 3273
[09:49:30] which is weird because that would mean there is an enum somewhere with more values
[09:50:15] is cumin1002 down or is it my connection?
[09:51:06] it looks like the connection between cumin1002 and zarcillo
[09:52:45] $ ping zarcillo-master.eqiad.wmnet
[09:52:46] PING db1215.eqiad.wmnet ( 56(84) bytes of data.
[09:52:46] 64 bytes from db1215.eqiad.wmnet ( icmp_seq=1 ttl=63 time=0.636 ms
[09:52:55] from cumin1002 I have no trouble
[09:53:53] yeah looks like it is back
[10:16:31] arnaudb: if you're still interested in reviewing https://wikitech.wikimedia.org/wiki/MariaDB/Rebooting_a_host I think I made all the changes I had in mind :)
[10:16:44] * arnaudb checks
[10:20:57] dhinus: amazing doc!
[10:42:33] arnaudb: thanks :)
[10:42:47] the skeleton was already there, I just added many more details
[12:15:25] FIRING: [2x] SystemdUnitFailed: envoyproxy.service on moss-fe1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:29:40] there seems to be a bug/race in our envoy setup, in that new nodes end up with an empty envoy config.
[12:30:07] (which you can fix by running /usr/local/sbin/build-envoy-config -c /etc/envoy )
[12:30:25] FIRING: [2x] SystemdUnitFailed: envoyproxy.service on moss-fe1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:35:25] RESOLVED: [2x] SystemdUnitFailed: envoyproxy.service on moss-fe1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
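
Note on the 09:41 to 09:49 exchange: ERROR 1265 on an ENUM column such as 'fa_major_mime' usually means a row carries a value that the target column's ENUM definition does not list. A minimal sketch of comparing two column definitions follows. The helper names and both ENUM definitions are hypothetical, invented for illustration; they are not taken from the real schema or from the failed import.

```python
import re


def parse_enum_values(column_type: str) -> list[str]:
    """Extract the allowed values from a column type such as "enum('a','b')"."""
    match = re.fullmatch(
        r"\s*enum\((.*)\)\s*", column_type, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError(f"not an ENUM type: {column_type!r}")
    # Values are single-quoted; a literal quote inside a value is doubled ('').
    raw = re.findall(r"'((?:[^']|'')*)'", match.group(1))
    return [value.replace("''", "'") for value in raw]


def values_missing_from_target(source_type: str, target_type: str) -> list[str]:
    """Return source ENUM values that the target column does not define."""
    allowed = set(parse_enum_values(target_type))
    return [v for v in parse_enum_values(source_type) if v not in allowed]


# Hypothetical definitions, for illustration only.
source = "enum('unknown','application','audio','image','text','video','chemical')"
target = "enum('unknown','application','audio','image','text','video')"

print(values_missing_from_target(source, target))
```

In practice the two type strings could be copied from the COLUMN_TYPE column of information_schema.COLUMNS on the source and target hosts, assuming both are reachable.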