[07:15:03] <dcausse>	 o/
[07:56:03] <gehel>	 o/
[07:56:39] <gehel>	 inflatador / ryankemper: (for when you're around) there are a bunch of unassigned shards on relforge. No emergency, but could you have a look today?
[08:00:15] <gehel>	 See T324939
[08:00:15] <stashbot>	 T324939: Multiple unassigned shards on the Elasticsearch relforge cluster - https://phabricator.wikimedia.org/T324939
[10:09:53] <gehel>	 Errand + lunch 
[11:13:34] <dcausse>	 lunch
[13:52:32] <inflatador>	 <o/
[13:52:53] <inflatador>	 gehel :eyes: on relforge
[13:54:45] <dcausse>	 o/
[14:09:03] <inflatador>	 OK, relforge is fixed...looks like replica allocation was left disabled after a failed cookbook run. will update docs and close ticket 
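For reference, "replica allocation left disabled" usually means the Elasticsearch cluster setting `cluster.routing.allocation.enable` was left at `primaries` (or `none`), so replica shards stay unassigned. The log does not say exactly which value the failed cookbook left behind, or whether it was set as a persistent or transient setting, so the sketch below is an assumption, not the cookbook's code. The endpoint `ES_URL` is a hypothetical placeholder for the relforge cluster. Passing `None` clears the override in both scopes, which restores the default value `all`; clearing both matters because a transient value takes precedence over a persistent one.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint; the real relforge address is not given in the log.
ES_URL = "http://localhost:9200"
SETTING = "cluster.routing.allocation.enable"
ALLOWED_MODES = {"all", "primaries", "new_primaries", "none"}


def allocation_settings_body(mode: Optional[str] = None) -> dict:
    """Build the JSON body for PUT /_cluster/settings.

    mode=None clears the setting in both persistent and transient scope,
    so the cluster falls back to the default ("all"). Any other value is
    written as a persistent setting.
    """
    if mode is None:
        # JSON null removes an explicitly set cluster setting.
        return {"persistent": {SETTING: None}, "transient": {SETTING: None}}
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported allocation mode: {mode!r}")
    return {"persistent": {SETTING: mode}}


def set_allocation(mode: Optional[str] = None, base_url: str = ES_URL) -> dict:
    """Send the settings update and return the cluster's JSON response."""
    payload = json.dumps(allocation_settings_body(mode)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Building the body separately from sending it keeps the request shape easy to inspect before running `set_allocation()` against a live cluster.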
[14:09:19] <gehel>	 inflatador: thanks!
[14:32:57] <gehel>	 pfischer: we're in https://meet.google.com/usq-jygh-toi with David and Andrew
[15:01:00] <inflatador>	 doctor appointment in 15, may be a bit late to triage mtg
[15:23:24] <dcausse>	 errand
[15:23:40] <dcausse>	 (won't be around for triage)
[16:01:06] <gehel>	 and triage is starting in https://meet.google.com/eki-rafx-cxi
[16:02:35] <gehel>	 pfischer: problem connecting?
[17:55:51] <inflatador>	 lunch/errands, back in ~90m
[19:15:58] <gehel>	 ryankemper: can we reschedule our 1:1? I still have kids to put in bed and I haven't eaten dinner yet.
[19:16:15] * gehel was too optimistic about the number of meetings this evening 
[19:16:32] <dcausse>	 dinner
[19:17:01] <ryankemper>	 gehel: Sounds good. Want to find a time that works on wednesday (or tues) and throw it on my calendar?
[19:19:40] <gehel>	 I'll do that! Thanks for the flexibility, and sorry for the last-minute cancellation 
[20:48:51] <inflatador>	 finally back
[20:49:13] <inflatador>	 I just had the most stereotypical American vehicle registration experience ;(
[21:45:37] <inflatador>	 the codfw and eqiad WDQS data reloads both got all the way to the end and then decided to throw the error "--kafka-timestamp should be set when reloading commons or wikidata" . I don't think that's a real error, but we should probably fix that ;)
[21:47:43] <gehel>	 inflatador: That might well be a real error. Can you keep the servers depooled until David is around to confirm?
[21:48:18] <gehel>	 Did this happen during the short WDQS outage? And maybe during the restart by ryankemper?
[21:48:39] * gehel should really not be on IRC at this time. Going back to not working and sorry for the interruption.
[21:48:43] <inflatador>	 gehel no, this was from a separate task https://phabricator.wikimedia.org/T316236 
[21:48:57] <inflatador>	 anyway yeah, go home! ;P
[21:49:04] <gehel>	 Oh, WCQS, not WDQS
[21:49:41] <inflatador>	 actually there's one for wdqs too, I just gave the wrong task ;(
[21:49:53] <inflatador>	 https://phabricator.wikimedia.org/T323096
[21:50:29] <gehel>	 ryankemper did a forced restart of all the WDQS servers earlier today. That would have caused issues if there was a reload in progress.
[21:51:39] <ryankemper>	 gehel: it would have caused issues for codfw but not for eqiad, and it sounds like from inflatador's message that both eqiad and codfw had that same error
[21:52:11] <ryankemper>	 oh wait the earlier messages might have been for WCQS. brian and I can poke around in our pairing soon
[21:52:56] <inflatador>	 we can delay repooling per g-ehel's suggestion, mainly want to do some log diving to see exactly what happened
[21:53:15] <inflatador>	 pretty sure the kafka timestamp only speeds up reloads and is not a requirement, but we can wait and check with David
[21:55:12] <ryankemper>	 inflatador: Actually I'd wager that without the kafka timestamp it won't work properly. Since flink is a stateful stream processor, it needs to know the correct kafka offset to start from so it knows what changes to apply to the journal file. Not 100% sure, but I think my hunch is right
[21:55:48] <ryankemper>	 That being said, it might not require a separate reload; if the journal file is there and the updater hasn't started, then we can probably set it ourselves before starting the updater
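To illustrate ryankemper's hunch: a stream consumer restarting after a data reload needs a start position, and a timestamp (such as the one passed via `--kafka-timestamp`) only becomes useful once it is translated into a per-partition Kafka offset. Kafka's `offsetsForTimes` lookup is documented to return the earliest offset whose timestamp is greater than or equal to the requested one, and Flink's `KafkaSource` can start from a timestamp via `OffsetsInitializer.timestamp(...)`. The log does not show how the WDQS streaming updater actually consumes this flag, so the sketch below only models that lookup over an in-memory partition. The function name and the `(offset, timestamp_ms)` record layout are hypothetical, and it assumes timestamps never decrease as offsets grow (real Kafka partitions do not guarantee this).

```python
import bisect
from typing import List, Optional, Tuple


def offset_for_timestamp(records: List[Tuple[int, int]], ts_ms: int) -> Optional[int]:
    """Return the earliest offset whose timestamp is >= ts_ms, or None.

    records: (offset, timestamp_ms) pairs for a single partition, ordered
    by offset. Assumes timestamps are non-decreasing with offset, which
    lets a binary search find the first record at or after ts_ms.
    None means every record is older than ts_ms (nothing to replay yet).
    """
    timestamps = [ts for _, ts in records]
    i = bisect.bisect_left(timestamps, ts_ms)
    return records[i][0] if i < len(records) else None


# A toy partition: a reload from a dump taken at t=1800 ms should resume
# from offset 102, the first change the dump could not have included.
partition = [(100, 1000), (101, 1500), (102, 2000), (103, 2000), (104, 2600)]
start_offset = offset_for_timestamp(partition, 1800)
```

Starting earlier than this offset would re-apply changes already in the dump, and starting later would silently skip edits, which is why a missing timestamp is worth confirming before repooling.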