[07:48:08] o/
[10:11:59] o/ dcausse: welcome back :-)
[10:12:10] thanks! :)
[10:13:15] still catching up on emails and notifications but please let me know if there's anything you're blocked on that I could help with
[10:13:58] dcausse: nothing urgent, but I'd like to chat with you about a plan for splitting the WDQS graph at some point.
[10:14:07] sure
[10:14:17] reading through the notes atm
[11:22:22] lunch
[14:01:58] o/
[14:03:27] o/
[15:45:07] \o
[15:46:46] o/
[16:38:44] dcausse: hi! thank you for the quick review re: the alerts. A sanity check: I see there are no wdqs swift metrics in codfw, does that ring a bell? Or does the streaming updater actually write to codfw swift too and we're just not reporting metrics?
[16:38:49] or is it eqiad-only?
[17:00:21] inflatador: could I get you to copy /var/log/syslog* on snapshot1008 into my homedir and change the perms so I can read it? maybe messages*, kern* and daemon* as well. Doing a quick look into the dump failure ticket and we have nothing in logstash for the wikidata failure. I don't have great ideas, but it seems like the system killed our task and I'm hoping to find an oom killer or some such
[17:01:32] ebernhardson: ACK, looking now
[17:03:21] with all other failures cirrus would have some error, log things to logstash, and then print an error message into the script output. On this one the script output simply stops, which suggests a process kill to me
[17:06:57] ebernhardson: I dumped /var/log/* into 'log' in your homedir, LMK if you need anything else
[17:07:10] inflatador: thanks!
[17:14:19] sadly no useful info :( nothing in any of the logs around the time our script died.
[17:17:33] godog: hm... the updater writes to thanos.discovery.wmnet
[17:17:52] if it's swift_account_stats_bytes_total I'm not sure how it's collected
[17:18:35] perhaps this alert should be global?
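The OOM hunt described above (scanning copied syslog/kern/daemon logs for signs that the kernel killed the dump process) could be scripted as a small sketch like the following. The filenames in the usage example are hypothetical; the match patterns are the usual kernel OOM-killer wording, which varies somewhat by kernel version.

```python
import re
from pathlib import Path

# Phrases the kernel typically logs when the OOM killer fires.
# Hedged: exact wording differs across kernel versions.
OOM_PATTERNS = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)

def find_oom_events(log_paths):
    """Return (path, line) pairs for log lines that look like OOM-killer activity."""
    hits = []
    for path in map(Path, log_paths):
        for line in path.read_text(errors="replace").splitlines():
            if OOM_PATTERNS.search(line):
                hits.append((str(path), line))
    return hits

# Hypothetical usage against the copied logs:
#   find_oom_events(Path.home().glob("log/syslog*"))
```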
[17:18:56] Not sure if there is any work for us to do here but it affected wikidata so FYI https://phabricator.wikimedia.org/T330906
[17:19:01] dcausse: ah got it, thank you! that explains it, I'll adjust tomorrow
[17:19:12] thanks!
[17:20:59] inflatador: thanks for the info, this should not have affected our services
[17:21:34] we never use the wikidata entity IRI to access its data
[17:25:56] related: T226453
[17:25:56] T226453: Concept URI in sidebar on Wikidata uses HTTP instead of HTTPS - https://phabricator.wikimedia.org/T226453
[17:26:15] I don't think it makes sense to change it now
[17:26:32] it'd be a massive migration
[17:29:11] dcausse: thanks for the context, it came up at the SRE mtg and I wanted to get it out there just in case
[17:29:23] sure, thanks!
[17:54:04] moving T331127 to current work, I think it's a regression I caused
[17:54:05] T331127: phantom redirects lingering in incategory searches after page moves - https://phabricator.wikimedia.org/T331127
[18:41:08] dinner
[19:28:45] gehel: running 2' late
[19:29:14] Ack
[19:32:14] Just putting a note to myself here: we need to adapt https://gerrit.wikimedia.org/r/c/operations/puppet/+/878128 to work with our new airflow instance
[19:35:15] inflatador: as long as you're doing reminders, I noticed in Peter's patch we need to do https://gerrit.wikimedia.org/r/c/operations/puppet/+/574539 for the new instance as well
[19:35:39] some files that some of the dags need that puppet deploys to the old instance
[19:36:54] ebernhardson: thanks, feel free to pass any patches along, sorry we have not been proactive with the puppet deploys lately
[19:45:25] * inflatador eventually notices that this patch is actually merged ;)
[19:45:51] but it will need to be adapted for the new instance etc etc
[19:51:18] yeah, that's the patch I did for airflow 1 a few years ago; we need the same files deployed to the new instance. I can look over how the new version needs to be written.
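The swift-metrics question above (is swift_account_stats_bytes_total reported per site, or only from eqiad?) could be sanity-checked against the Thanos endpoint mentioned in the conversation via the standard Prometheus HTTP query API. This is a sketch only: the `site` label name is an assumption about how the metrics are externally labeled, and `sites_reporting` just parses a query response.

```python
import urllib.parse

# Endpoint from the discussion; query path is the standard Prometheus HTTP API.
THANOS_URL = "https://thanos.discovery.wmnet"

def build_query_url(metric, site=None, base=THANOS_URL):
    """Build an instant-query URL, optionally filtered by a (assumed) `site` label."""
    selector = metric if site is None else f'{metric}{{site="{site}"}}'
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": selector})

def sites_reporting(result_json):
    """Extract the set of `site` label values from a Prometheus query response."""
    return {s["metric"].get("site") for s in result_json["data"]["result"]}
```

Fetching `build_query_url("swift_account_stats_bytes_total")` and feeding the JSON to `sites_reporting` would show whether codfw is absent from the data or just from the dashboard/alert scoping.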
I looked briefly but I'm not 100% sure where the mysql credentials definition should go
[19:56:22] ebernhardson: cool, I'm available in an hour or so if you wanna go thru it together
[20:02:06] lunch, back in ~1h
[20:54:50] back
[21:05:20] hmm, maybe I just skip this part... been pondering how in airflow 1 we had spark available during the airflow test suite and would push hql statements through spark's sql parser to validate that rendered hql is syntactically correct, but in this airflow 2 deployment spark is not part of the existing dependencies and adding it at test time would probably be a bit heavy
[21:07:11] (it would require having a jvm available, and the multi-hundred-MB spark image). Pondering if there is some lighter-weight way to accomplish the same goal...
[21:43:49] ebernhardson: I'm up at https://meet.google.com/fde-tbpf-wqh if you wanna look at the airflow/puppet stuff (no pressure, just rubber-ducking at this point)
[21:55:23] > push hql statements through spark's sql parser to validate rendered hql is syntactically correct
[21:55:46] this sounds like the realm of the job, not the scheduler, ya?
[22:32:20] ottomata: well, here it's just a plain HiveOperator(...)
[22:32:39] ottomata: so if we want to verify the jobs when deployed won't totally blow up, the airflow test would need to validate the syntax
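One "lighter-weight" direction for the problem pondered above (validating rendered HQL at test time without pulling in a JVM and the Spark image) is a cheap smoke test rather than a real parser. The sketch below is deliberately crude and is not equivalent to Spark's SQL parser: it only checks that a statement starts with a plausible HiveQL verb and that quotes and parentheses balance. The verb list is an assumption, not exhaustive HiveQL; a real setup might still prefer a proper dialect-aware parser.

```python
# Assumed, non-exhaustive list of HiveQL statement-leading keywords.
HQL_VERBS = ("select", "insert", "create", "drop", "alter", "with", "set", "use", "msck")

def hql_sanity_errors(hql):
    """Return a list of problems found in a rendered HQL statement (empty = plausible)."""
    errors = []
    if not hql.strip().lower().startswith(HQL_VERBS):
        errors.append("does not start with a known HiveQL verb")
    depth = 0
    in_quote = None  # tracks ' or " so parens inside string literals are ignored
    for ch in hql:
        if in_quote:
            if ch == in_quote:
                in_quote = None
        elif ch in ("'", '"'):
            in_quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                errors.append("unbalanced parentheses")
                break
    if depth > 0:
        errors.append("unclosed parenthesis")
    if in_quote:
        errors.append("unclosed quote")
    return errors
```

An airflow test could render each HiveOperator's `hql` field and assert `hql_sanity_errors(rendered) == []`; it would catch templating accidents (truncated statements, mismatched quoting) while letting genuinely bad SQL through, which is the trade-off of skipping the real parser.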