[07:15:03] o/
[07:56:03] o/
[07:56:39] inflatador / ryankemper: (for when you're around) there are a bunch of unassigned shards on relforge. No emergency, but could you have a look today?
[08:00:15] See T324939
[08:00:15] T324939: Multiple unassigned shards on the Elasticsearch relforge cluster - https://phabricator.wikimedia.org/T324939
[10:09:53] Errand + lunch
[11:13:34] lunch
[13:52:32] gehel :eyes on relforge
[13:54:45] o/
[14:09:03] OK, relforge is fixed...looks like replica allocation was left disabled after a failed cookbook run. will update docs and close ticket
[14:09:19] inflatador: thanks!
[14:32:57] pfischer: we're in https://meet.google.com/usq-jygh-toi with David and Andrew
[15:01:00] doctor appointment in 15, may be a bit late to triage mtg
[15:23:24] errand
[15:23:40] (won't be around for triage)
[16:01:06] and triage is starting in https://meet.google.com/eki-rafx-cxi
[16:02:35] pfischer: problem connecting?
[17:55:51] lunch/errands, back in ~90m
[19:15:58] ryankemper: can we reschedule our 1:1? I still have kids to put in bed and I haven't eaten dinner yet.
[19:16:15] * gehel was too optimistic about the number of meetings this evening
[19:16:32] dinner
[19:17:01] gehel: Sounds good. Want to find a time that works on wednesday (or tues) and throw it on my calendar?
[19:19:40] I'll do that! Thanks for the flexibility, and sorry for the last minute cancellation
[20:48:51] finally back
[20:49:13] I just had the most stereotypical American vehicle registration experience ;(
[21:45:37] the codfw and eqiad WDQS data reloads both got all the way to the end and then decided to throw the error "--kafka-timestamp should be set when reloading commons or wikidata". I don't think that's a real error but we should probably fix that ;)
[21:47:43] inflatador: That might well be a real error. Can you keep the servers depooled until David is around to confirm?
[21:48:18] Did this happen during the short WDQS outage? And maybe during the restart by ryankemper?
[21:48:39] * gehel should really not be on IRC at this time. Going back to not working and sorry for the interruption.
[21:48:43] gehel no, this was from a separate task https://phabricator.wikimedia.org/T316236
[21:48:57] anyway yeah, go home! ;P
[21:49:04] Oh, WCQS, not WDQS
[21:49:41] actually there's one for wdqs too, I just gave the wrong task ;(
[21:49:53] https://phabricator.wikimedia.org/T323096
[21:50:29] ryankemper: did a forced restart of all the WDQS servers earlier today. That would have caused issues if there was a reload in progress.
[21:51:39] gehel: it would have caused issues for codfw but not for eqiad, and it sounds like from inflatador's message that both eqiad and codfw had that same error
[21:52:11] oh wait the earlier messages might have been for WCQS. brian and I can poke around in our pairing soon
[21:52:56] we can delay repooling per g-ehel's suggestion, mainly want to do some log diving to see exactly what happened
[21:53:15] pretty sure the kafka timestamp only speeds up reloads and is not a requirement, but we can wait and check with David
[21:55:12] inflatador: Actually I'd wager that without the kafka timestamp it won't work properly. Since flink is a stateful stream processor it needs to know the correct kafka offset to start from so it knows what changes to apply to the journal file. Not 100% sure but I think my hunch is right
[21:55:48] That being said it might not require a separate reload; if the journal file is there and the updater hasn't started then we can probably set it ourselves before starting the updater
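The reasoning at 21:55:12 (a stateful Flink consumer needs a starting Kafka offset, and a timestamp such as the one passed via `--kafka-timestamp` can be turned into that offset) can be illustrated with a small sketch. It assumes Kafka's documented lookup-by-timestamp semantics (the consumer `offsetsForTimes` lookup returns, per partition, the earliest offset whose record timestamp is greater than or equal to the requested timestamp). The `Record` type and `offset_for_timestamp` function are hypothetical and are not part of the WDQS updater or any Kafka client; how the updater actually consumes the flag is not shown in this log.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Record:
    """One record in a single Kafka partition (hypothetical model for illustration)."""
    offset: int
    timestamp_ms: int  # record timestamp, milliseconds since the epoch


def offset_for_timestamp(partition: Sequence[Record], target_ms: int) -> Optional[int]:
    """Return the earliest offset whose timestamp is >= target_ms.

    Mirrors the documented semantics of Kafka's offsetsForTimes lookup and
    returns None when no record at or after target_ms exists. A linear scan
    keeps the sketch simple; a real broker consults a time index instead.
    """
    for record in sorted(partition, key=lambda r: r.offset):
        if record.timestamp_ms >= target_ms:
            return record.offset
    return None


if __name__ == "__main__":
    log = [Record(100, 1_000), Record(101, 2_000), Record(102, 3_000)]
    # A dump taken at t=1_500 ms would resume consuming from offset 101.
    print(offset_for_timestamp(log, 1_500))  # prints 101
    print(offset_for_timestamp(log, 5_000))  # prints None
```

In plain Kafka clients, a consumer with no committed offset falls back to its `auto.offset.reset` policy (`earliest` or `latest`), so starting without a timestamp could mean replaying too much history or skipping changes made after the dump. Whether the WDQS updater behaves this way is the open question left for David in the log.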