[10:19:32] lunch
[13:19:29] o/
[15:02:18] \o
[15:06:13] o/
[15:07:21] errand
[15:23:00] Running the data transfers cookbook from wcqs2001 to wcqs1001 ATM, let me know if y'all see any problems
[16:03:32] workout, back in ~40
[16:04:05] Oh, and the data transfer cookbook finished, will move to the next host
[16:59:16] journal on wcqs1002 is 337G vs 443G on wcqs2001 & wcqs1001 do definitely something stopped the transfer prematurely
[16:59:39] s/do/so
[16:59:41] yeah, confirmed, service on 1001 has been up since 15:55 and I started the bad cookbook at 16:04:22
[16:59:57] no logs of it touching anything but icinga
[17:00:15] wondering if the transfer technique we use is resilient to network hiccups
[17:00:26] so 1002 failed, but I don't think it had anything to do with the cookbook I immediately cancelled
[17:00:34] we might perhaps at least do a quick size check from the cookbook
[17:00:51] inflatador: probably not then, thanks for checking
[17:00:53] Sounds like a good idea
[17:01:22] I'll start a ticket for that, in the meantime I'll retry the xfer with 1001 as source this time
[17:02:14] thanks!
[17:17:48] well, the xfer finished, but disk space is still much smaller on wcqs1002
[17:18:56] it's even smaller now: 88G
[17:19:01] weird...
[17:20:22] trying again with codfw2001 as source again
[17:37:07] lunch/errands, back in ~1h
[17:46:18] dinner
[18:33:33] inflatador: pairing if you're around meet.google.com/eki-rafx-cxi
[18:41:34] ebernhardson ryankemper ticket for cookbook update https://phabricator.wikimedia.org/T321605
[20:24:55] ryankemper did you reboot wcqs2001 ? I saw an alert go by for NFS mounts on it
[20:25:16] inflatador: yes
[20:26:59] Looks like the alerts are for "labstore1006/1007" which I believe are decommissioned hosts, so probably no action needed
[20:56:54] okay wcqs restarts are all done
[20:56:59] s/restarts/reboots
[21:09:33] ryankemper ACK, will try to run the data-transfer cookbook shortly
[21:46:19] good news: we hit our error message and it appears to display correctly; bad news: we hit our error msg ;)
[21:46:31] `RuntimeError: Dest filesize of 259426516992 does not match source filesize of 443175731200`
[21:46:49] hmm, so i guess the question is now why has the transfer become flaky
[21:47:03] this is inside the DC?
[21:47:07] I'll give it one more try before I fold the tent
[21:47:08] Yeah
[21:47:32] sounds good
[21:50:37] I updated https://phabricator.wikimedia.org/T316236 with a list of hosts that have the correct data
[21:51:02] Let me check if we stop puppet before we start these xfers. Maybe we're getting bitten by ferm rules
[21:52:20] lucky us, the latest run failed immediately
[21:59:58] :S
[22:10:38] I'm out, but left a few notes on the puppet/ferm theory https://phabricator.wikimedia.org/T321605#8343826
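
The `RuntimeError` quoted at 21:46:31 is the "quick size check" proposed at 17:00:34 and tracked in T321605. As a rough illustration of that idea only, here is a minimal sketch, not the actual Wikimedia cookbook code: it assumes ssh access and GNU `stat` on both hosts, and the host names, journal path, and function names are hypothetical placeholders.

```python
#!/usr/bin/env python3
"""Sketch: verify a file transfer by comparing source and destination sizes.

Hypothetical illustration of the post-transfer size check discussed above;
host names, the journal path, and helper names are placeholders, not the
real data-transfer cookbook.
"""
import subprocess


def remote_file_size(host: str, path: str) -> int:
    """Return the size in bytes of `path` on `host`, via ssh + GNU stat."""
    result = subprocess.run(
        ["ssh", host, "stat", "--format=%s", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def verify_transfer(source: str, dest: str, path: str) -> None:
    """Raise RuntimeError when the destination size does not match the source."""
    src_size = remote_file_size(source, path)
    dst_size = remote_file_size(dest, path)
    if src_size != dst_size:
        raise RuntimeError(
            f"Dest filesize of {dst_size} does not match source filesize of {src_size}"
        )


if __name__ == "__main__":
    # Placeholder hosts and path, loosely modeled on the transfer in the log above.
    verify_transfer(
        "wcqs2001.codfw.wmnet",
        "wcqs1001.eqiad.wmnet",
        "/srv/query_service/wikidata.jnl",
    )
```

A check like this only catches truncated copies; it would not explain the repeated failures themselves, which is why the puppet/ferm theory in T321605#8343826 is still open.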