[00:24:18] PROBLEM - Check unit status of eventlogging_to_druid_network_flows_internal_daily on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_network_flows_internal_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:28:28] ottomata: any chance that event schema of mine could be merged?
[00:32:50] PROBLEM - Check unit status of eventlogging_to_druid_network_flows_internal-sanitization_daily on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_network_flows_internal-sanitization_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:49:36] !log Rerun failed webrequest jobs (text and upload, 2022-01-19T19:00
[07:49:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:01:07] Data-Engineering: Investigate releasing historical top-pageview-per-country data - https://phabricator.wikimedia.org/T299627 (JAllemandou)
[10:13:15] Analytics-Clusters, Data-Engineering, Data-Engineering-Kanban, Cassandra, and 3 others: Investigate high levels of garbage collection on new AQS nodes - https://phabricator.wikimedia.org/T298516 (BTullis) That's great. Thanks for the updated patch Eric. I'm happy for you to go ahead and upgrade a...
[10:26:41] Analytics-Clusters, Data-Engineering, Data-Engineering-Kanban, Cassandra, and 3 others: Investigate high levels of garbage collection on new AQS nodes - https://phabricator.wikimedia.org/T298516 (BTullis) I've merged and deployed the patch, then checked with a puppet run on aqs1010. The new versi...
[11:10:37] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Hive query failure in Jupyter notebook on stat1005 - https://phabricator.wikimedia.org/T297734 (BTullis) I have now created a patch to redirect parquet logs to the console and to reduce their verbosity. It will require testing on the test c...
[11:36:34] !log temporarily disabling puppet on servers with hive installed T297734
[11:36:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:36:37] T297734: Hive query failure in Jupyter notebook on stat1005 - https://phabricator.wikimedia.org/T297734
[11:46:21] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Hive query failure in Jupyter notebook on stat1005 - https://phabricator.wikimedia.org/T297734 (BTullis) No problems detected in the test cluster. Re-enabling puppet for all other hive enabled servers.
[11:58:24] !log re-enabled puppet on all hive nodes, deploying the updated log4j configuration for parquet
[11:58:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:10:54] Data-Engineering, Data-Engineering-Kanban: Hive query failure in Jupyter notebook on stat1005 - https://phabricator.wikimedia.org/T297734 (BTullis) @jwang - Please could you try your query again and see if the situation is any better now? Once again, apologies for the delay in fixing this issue.
[12:33:57] Analytics-Radar, PM: Fix Analytics workflow for #Analytics-EventLogging tasks - https://phabricator.wikimedia.org/T274490 (Aklapper) Open→Declined I guess this got superseded by T295397
[13:23:16] taking a break
[13:45:59] Data-Engineering, Data-Engineering-Kanban: Check home/HDFS leftovers of christinedk - https://phabricator.wikimedia.org/T297461 (Ottomata) Done. ` sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r /user/christinedk 22/01/20 13:44:51 INFO fs.TrashPolicyDefault: Moved: 'hdfs://analytics-hadoop/user/c...
[13:46:16] Data-Engineering, Data-Engineering-Kanban: Check home/HDFS leftovers of christinedk - https://phabricator.wikimedia.org/T297461 (Ottomata) Open→Resolved a:Ottomata
[14:08:06] (CR) Mforns: "Thanks for the thorough thoughts on my comments!" [analytics/refinery] - https://gerrit.wikimedia.org/r/747065 (https://phabricator.wikimedia.org/T297679) (owner: Awight)
[14:20:07] heya teammm, could an SRE please have a quick look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/753052 and merge if OK? :]
[14:20:34] * btullis lookin'
[14:20:53] (Abandoned) Mforns: Add airflow DAG for anomaly detection (POC) [analytics/refinery] - https://gerrit.wikimedia.org/r/702668 (https://phabricator.wikimedia.org/T285692) (owner: Mforns)
[14:21:29] gotcha mforns !
[14:21:40] hey ottomata :] thanks!
[14:21:56] thanks btullis as well!
[14:24:57] Looks OK to me. How would I go about checking that the `--execute='6726880685f7b96b02a55ed7513d78c5'` is the right security checksum?
[14:26:06] btullis: If you execute the same command without the --execute flag, then it does a DRY-RUN and it prints all files and partitions it would delete, and it gives you the correct execute token.
[14:26:31] btullis: it's a mechanism to force a DRY-RUN before EXECUTION, and to freeze params in puppet
[14:28:00] mforns: Cool. Thanks. This is as analytics on an-launcher1002, right?
[14:28:10] btullis: yes
[14:31:44] Got it. `Security checksum (use --help for more information): 6726880685f7b96b02a55ed7513d78c5`
[14:32:32] btullis: yes!
[14:32:51] it's just an argument checksum
[14:33:35] I like it.
[14:33:52] to prevent unintentional changes, or un-tested changes
[14:34:05] since it's data deletion...
[15:58:30] mforns: HELlLLOoOOOoooOOOo
[16:04:37] mforns: afk for a few mins, but if you have time would love to catch up on airflow stuff, and also brain bounce the spark+skein interface i'm working on.
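Editor's note on the 14:24-14:34 exchange: the "argument checksum" idea (a dry run prints a token derived from the arguments, and the real run only proceeds when `--execute` carries that same token) can be sketched roughly as below. This is an illustration only, not the refinery script's code. The log does not say how the real token is computed, so the use of MD5 over the joined arguments is an assumption (chosen because it yields a 32-character hex string like the one quoted), and the flag names `--older-than` and `--database` in the usage note are hypothetical.

```python
import hashlib


def argument_checksum(args):
    """Hash every argument except --execute itself (assumed scheme: MD5 hex digest)."""
    relevant = [a for a in args if not a.startswith("--execute")]
    return hashlib.md5("\x00".join(relevant).encode("utf-8")).hexdigest()


def run(args):
    """Dry-run unless --execute=<token> matches the checksum of the other arguments."""
    token = argument_checksum(args)
    supplied = [a.split("=", 1)[1] for a in args if a.startswith("--execute=")]
    if not supplied:
        # No token given: only report what would happen and print the token to use.
        return f"DRY-RUN: nothing deleted. Security checksum: {token}"
    if supplied[0] != token:
        # Arguments changed since the dry run (or the token was mistyped): refuse.
        raise ValueError("security checksum mismatch; re-run without --execute")
    return "EXECUTE: arguments match the dry-run checksum"
```

Usage, with hypothetical flags: `run(["--older-than=90", "--database=event"])` returns the dry-run message containing the token; appending `--execute=<that token>` lets the call proceed, while changing any other argument makes the old token invalid. That is the "freeze params" property mentioned at 14:26:31.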
[16:08:23] (PS5) Awight: Sanitize additional event streams [analytics/refinery] - https://gerrit.wikimedia.org/r/747065 (https://phabricator.wikimedia.org/T297679)
[16:08:28] (CR) Awight: Sanitize additional event streams (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/747065 (https://phabricator.wikimedia.org/T297679) (owner: Awight)
[16:18:44] ottomata: :] heyaaa yess, let's sync up whenever it's good for you, I was reading your task
[16:23:55] (CR) Awight: "This change is ready for review." [analytics/refinery] - https://gerrit.wikimedia.org/r/755724 (owner: Awight)
[16:24:15] mforns: ready, bc?
[16:24:45] ottomata: ok! omw
[17:08:29] Data-Engineering, Data-Engineering-Kanban: Send cassandra3 (new hosts) logs to logstash - https://phabricator.wikimedia.org/T297460 (BTullis) Open→Resolved
[17:08:32] Analytics-Clusters, Data-Engineering, Data-Engineering-Kanban, Cassandra, and 2 others: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (BTullis)
[17:09:36] Data-Engineering, Data-Engineering-Kanban, Superset, Epic: Presto/Superset User Experience Improvement - https://phabricator.wikimedia.org/T294259 (BTullis)
[18:02:49] Analytics, Analytics-Wikistats, I18n: WikiReportsLocalizations.pm still fetches language names from SVN - https://phabricator.wikimedia.org/T64570 (Aklapper) @odimitrijevic: Could you please answer the last comment (or find out who knows)? Thanks in advance!
[18:08:14] * addshore gently pokes ottomata toward https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/745914 ;)
[18:13:42] Data-Engineering, Data-Engineering-Kanban, Product-Analytics (Kanban): Test log file and error notification - https://phabricator.wikimedia.org/T295733 (BTullis) I spent a while going through all of the links in the chain with @Mayakp.wiki and I now understand //why// the logs are not appearing in th...
[18:31:48] addshore: looking!
[18:31:54] <3
[18:32:50] (CR) Ottomata: [C: +1] Add analytics/mwcli/command_execution [schemas/event/secondary] - https://gerrit.wikimedia.org/r/745914 (https://phabricator.wikimedia.org/T293583) (owner: Addshore)
[18:33:01] addshore: +1, i think you should be able to merge
[18:33:12] <3
[18:33:19] oh, i should be able to merge!
[18:33:24] better log back into gerrit then! :D
[18:34:01] (CR) Addshore: [C: +2] Add analytics/mwcli/command_execution [schemas/event/secondary] - https://gerrit.wikimedia.org/r/745914 (https://phabricator.wikimedia.org/T293583) (owner: Addshore)
[18:34:47] (Merged) jenkins-bot: Add analytics/mwcli/command_execution [schemas/event/secondary] - https://gerrit.wikimedia.org/r/745914 (https://phabricator.wikimedia.org/T293583) (owner: Addshore)
[18:58:52] Data-Engineering, Superset: Document and share Superset Hive Date Filter Guidance - https://phabricator.wikimedia.org/T299681 (odimitrijevic)
[19:01:38] Data-Engineering, Superset: Document and share Superset Hive Date Filter Guidance - https://phabricator.wikimedia.org/T299681 (odimitrijevic) @Iflorez putting this on your radar. Also adding @JAllemandou who has more context.
[19:22:15] Anyone with a quick pointer to docs to find the events I am logging and make sure they look right? :)
[19:34:09] (CR) Mforns: "Thanks a lot for this cleaning!!" [analytics/refinery] - https://gerrit.wikimedia.org/r/755724 (owner: Awight)
[19:38:18] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/747065 (https://phabricator.wikimedia.org/T297679) (owner: Awight)
[20:08:10] addshore: https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#In_production
[20:13:47] mforns: yt?
[20:13:56] heya ottomata
[20:14:01] trying to understand this templating thing and how it interacts with kwargs, its really weird.
[20:14:04] maybe you can help?
[20:14:20] I can try!
[20:14:22] bc?
[20:14:25] ya
[20:14:30] omw
[20:16:18] hmm, i should be sending them to https://intake-analytics.wikimedia.org/v1/events?hasty=true right?
[20:16:23] perhaps I should look at this in the morning
[20:41:02] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Run Datahub on test cluster - https://phabricator.wikimedia.org/T299703 (razzi)
[20:46:51] addshore yes that should be it
[20:46:55] lets see if i can help.
[20:47:08] i believe i should have sent a bunch of events
[20:47:16] things like
[20:47:18] `{"$schema":"/analytics/mwcli/command_execution/1.0.0","command":"docker docker-compose","dt":"2022-01-20T20:14:01.319Z","meta":{"stream":"mwcli.command_execution"},"version":"latest"}`
[20:47:33] dont see them in the event stream, or validation errors, but it could be an issue on my end!
[20:48:18] addshore:
[20:48:21] did you add stream config?
[20:48:21] https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#Deployment
[20:48:33] No!
[20:48:46] I need to do that even if my thing has nothing to do with mediawiki right? :P
[20:48:47] (you don't need to do the register for EventLogging ext part), but you do need to declare your stream
[20:48:49] yes
[20:48:55] cool, will do, thanks! :)
[20:49:08] we just use mediawiki config to manage the streams, there is even an API
[20:49:38] curl https://meta.wikimedia.org/w/api.php?action=streamconfigs
[21:00:36] Data-Engineering, Superset: Document and share Superset Hive Date Filter Guidance - https://phabricator.wikimedia.org/T299681 (Mayakp.wiki) Hi @odimitrijevic and @JAllemandou we are in the process of changing the date range filter for the health metrics dashboards T298578. Would love to use this ! and im...
[21:01:04] that config is nhuge
[21:01:06] *huge
[21:01:16] almost! try with ?all_settings=true :p
[21:01:20] but you can filter
[21:01:41] should i just pick a key? and I guess I am aiming for == eventgate-analytics streams == ?
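Editor's note on the 20:16-21:01 exchange: the steps discussed (build an event shaped like the 20:47:18 example, POST it to the intake URL from 20:16:18, and confirm that the stream named in `meta.stream` is declared in stream config) might look roughly like the sketch below. Assumptions, none of them confirmed by the log: that the intake endpoint accepts a single JSON object as the request body, that the `streamconfigs` API takes a `streams` filter parameter, and that its response can be reduced to a mapping keyed by stream name. Nothing here is sent over the network; the functions only build the request and the lookup URL.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Both URLs are quoted from the log above.
INTAKE_URL = "https://intake-analytics.wikimedia.org/v1/events?hasty=true"
STREAMCONFIGS_API = "https://meta.wikimedia.org/w/api.php"


def build_event(command, dt):
    """Build an event with the same fields as the example pasted at 20:47:18."""
    return {
        "$schema": "/analytics/mwcli/command_execution/1.0.0",
        "command": command,
        "dt": dt,
        "meta": {"stream": "mwcli.command_execution"},
        "version": "latest",
    }


def build_intake_request(event):
    """Prepare (but do not send) a JSON POST to the intake endpoint."""
    body = json.dumps(event).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(INTAKE_URL, data=body, headers=headers, method="POST")


def streamconfigs_url(stream):
    """URL for looking up one stream; the `streams` filter is an assumption."""
    query = urlencode({"action": "streamconfigs", "format": "json", "streams": stream})
    return f"{STREAMCONFIGS_API}?{query}"


def is_declared(event, stream_configs):
    """True if the event's stream appears in a {stream_name: settings} mapping."""
    return event["meta"]["stream"] in stream_configs
```

A request built this way could be passed to `urllib.request.urlopen`. The `is_declared` check is only a local convenience for catching the situation described at 20:48:21, where events were sent before the stream had been declared.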
[21:01:51] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/EventStreamConfig/#streamconfig-mw-api-endpoint
[21:01:55] oh
[21:01:57] all you need to do is
[21:02:20] exactly this
[21:02:20] https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#Stream_Configuration
[21:02:30] all you need is your stream name, and then those two settings
[21:03:24] so i go from title of `analytics/mwcli/command_execute` to `analytics.mwcli.command_execute` ? or just `mwcli.command_execute` ?
[21:03:49] looks like you have
[21:03:50] "stream":"mwcli.command_execution"
[21:03:51] in your examples
[21:03:58] so probably just mwcli.command_execution
[21:03:59] and
[21:04:10] yes, schema_title : analytics/mwcli/command_execute
[21:04:10] aaah right, `mwcli.command_execute` indeed in the examples
[21:06:34] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/755794 feels about right then :)
[21:31:05] +1 addshore :)
[21:39:13] hmmm mforns joal just thought of a possible reason we can't use skein
[21:39:17] keytabs...
[21:39:21] unless....
[21:39:23] hmm
[21:39:31] no maybe it will be ok, because of the airflow keytab service?
[21:39:39] hmm
[21:41:35] ahhh hm, skein has a keytab param.
[21:41:35] hm