[08:52:58] wdqs1010 reload status: chunk 1022 out of 1047, and then 11 lexeme chunks to go; we're getting really close
[08:53:11] Keep your fingers crossed!
[10:50:19] lunch
[11:01:37] lunch 2
[11:25:04] Lunch + relocation
[11:31:36] dcausse: Would you have time to look into the airflow deployment with me? https://gerrit.wikimedia.org/r/c/wikimedia/discovery/analytics/+/889579 passes now, BTW.
[12:44:44] pfischer: sure, I'm available when you want to do the deploy
[13:41:08] gehel: could you subscribe Peter to discovery-alerts?
[13:44:42] don
[13:44:45] done
[13:53:16] thanks!
[14:08:36] rebooting Mac for updates
[14:13:44] o/
[14:33:15] * gehel will skip the unmeeting today, conflicting ERC workshop
[14:40:15] dcausse: airflow DAG “wcqs_streaming_updater_reconcile_hourlyschedule” ran successfully :-)
[14:40:25] \o/
[14:44:49] back
[14:47:33] pfischer / dcausse: T317202 is in "needs reporting", but the attached CR https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/856507/ is not yet merged. So I'm assuming this should go back to "needs review"?
[14:47:34] T317202: Model the update document used by the CirrusSearch Update Pipeline - https://phabricator.wikimedia.org/T317202
[14:48:37] hm, yes, I think we might want to keep this patch open while the pipeline is being worked on, as things might change while we implement features
[14:48:52] so either "waiting" or "needs review" is fine by me
[14:53:40] Looks like the new PostgresDB for the new airflow instance is ready to be created, re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/889572/
[14:54:03] we might need some other puppet changes before this works, checking...
[15:07:11] weekly update posted: https://app.asana.com/0/0/1203994009038960
[15:08:12] inflatador: I've pinged Olja about getting some support on creating that airflow instance. In an ideal world, I think we should just request a new instance from data engineering and let them provide it to us.
[15:08:27] we're not in an ideal world yet :)
[15:08:55] gehel thanks. I don't know enough to know what I don't know in airflow-land ;)
[15:15:42] sigh... the more I look, the less I understand how MW can stop refreshing links when there are loops...
[15:19:12] seems related to ParserOutput cache time and page.page_links_updated...
[15:48:12] dcausse: is this blocking your work? Should you just let Platform Engineering sort it out?
[15:49:48] gehel: it's me being curious; it somewhat relates to the update pipeline and to me trying to better understand how re-renders happen and are triggered in MW
[15:50:28] makes sense!
[15:52:02] I've pinged the tech mgmt channel on Slack. I'm not sure if anyone really feels ownership of JobQueue these days. Mat will follow up on his side.
[15:53:04] I think Timo or Aaron might know what happens (reading the codebase)
[15:56:09] working out, will be a little late to unmtg probably
[16:37:03] back
[16:38:16] I have to give a 5m presentation on my job to my son's class (8 yr olds) today, any suggestions?
[17:39:21] lunch, back in ~1h
[18:57:29] back
[19:25:00] heading to my son's class, back in ~1h
[19:32:10] I was wrong, it starts in an hour
[20:00:59] The data reload cookbook failed. Not sure if the data is corrupt though; what's a good way to check? (wdqs1010 is the host)
[20:04:57] error message here: https://phabricator.wikimedia.org/P44684
[20:07:09] Looks like the error probably came from https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/wdqs/data-reload.py#L191
[20:07:55] ebernhardson: I have to leave soon (back in ~90m), but are you around this afternoon to help finish this up?
My gut feeling is that this is a recoverable error
[20:09:45] Looks like we set 'timestamp' at https://github.com/wikimedia/operations-cookbooks/blob/0c38b23886f6e4f352c56f137daa7ade33553f9e/cookbooks/sre/wdqs/data-reload.py#LL94C21-L94C21 but then call 'timestampS' here: https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/wdqs/data-reload.py#L132
[20:49:46] back
[21:25:59] inflatador: seems like a cookbook error indeed; will set the timestamp manually and start the updater
[21:30:55] type checks & mypy would have helped here, I guess
[21:38:32] dcausse: excellent! Let me know how it goes re: updater. Is it a matter of wrong data types, or are we just using an incorrect variable name?
[21:42:10] inflatador: I think it's a problem of type, passing str instead of datetime, I think
[21:42:41] I've set the offsets manually, will restart the updater now
[21:56:09] OK, got it
[22:00:50] sigh... used the wrong script to set up offsets; it's consuming 32 days of backlog instead of 17 days. Should be no big deal, but it will take a bit more time to catch up
[22:08:54] Still... this is the closest we've been in a while!
[22:09:26] Also, cloudelastic restart failed due to timeout connecting to omega. I think this is just a transient error... trying again
[22:13:38] yes, I think we're getting close; let's wait for the backfill to finish (should take the whole weekend) before claiming victory :)
[22:14:21] ACK, have a great weekend!
[22:14:59] thx, you too!
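(Editor's note on the cookbook failure discussed above: it came down to a variable-name slip, `timestamp` stored but `timestampS` referenced, combined with a str being passed where a datetime was expected. A minimal, hypothetical Python sketch follows; none of these names come from the actual cookbook, and it only illustrates why the "type checks & mypy would have helped" remark holds.)

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch, not the actual cookbook code: the reload cookbook
# stored a value under one name ('timestamp') but later referenced
# 'timestampS', ultimately passing a str where a datetime was expected.
# With annotations like these, a mypy run would flag both mistakes
# (undefined name, wrong argument type) before the cookbook ever ran.

def kafka_timestamp_ms(ts: datetime) -> int:
    """Epoch milliseconds, the form consumer offsets are seeded from."""
    return int(ts.timestamp() * 1000)

reload_start: datetime = datetime(2023, 2, 17, 20, 0, tzinfo=timezone.utc)

print(kafka_timestamp_ms(reload_start))  # 1676664000000

# mypy rejects both of these, mirroring the two bugs in the cookbook:
#   kafka_timestamp_ms("2023-02-17T20:00:00Z")  # str is not datetime
#   kafka_timestamp_ms(reload_startS)           # undefined name

# The later offset mishap (replaying 32 days of backlog instead of 17) is
# the same failure mode one layer up: the updater's start position is just
# a timestamp, so seeding it from the wrong reference date silently replays
# extra events.
updater_seed: datetime = reload_start - timedelta(days=17)
print(kafka_timestamp_ms(updater_seed))
```

Running `mypy` over such a file catches the name and type errors statically; at runtime the str would only fail when something tried to do datetime arithmetic on it, exactly the kind of late failure the cookbook hit.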