[05:25:15] Analytics, Data-Engineering, Inuka-Team, Product-Analytics: Superset timeouts for KaiOS dashboard - https://phabricator.wikimedia.org/T277320 (razzi) I also found that the dashboard was slow but no timeout.
[06:14:22] Data-Engineering, Data-Engineering-Kanban, User-razzi: Triage Superset Dashboard Timeouts - https://phabricator.wikimedia.org/T294768 (razzi) The only dashboard that times out consistently is the [IP Masking Dashboard](https://superset.wikimedia.org/superset/dashboard/148/) reported by @Iflorez. The...
[06:18:46] Data-Engineering, Data-Engineering-Kanban, User-razzi: Triage Superset Dashboard Timeouts - https://phabricator.wikimedia.org/T294768 (razzi) My recommendations: - Put fewer charts on a single dashboard (IP Masking has 25, I'd recommend no more than a page full of charts, or ~6 charts. Dashboards ca...
[10:05:15] Analytics, Data-Engineering, Product-Analytics, Structured-Data-Backlog, and 4 others: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834 (Gehel) a:AKhatun_WMF→JAllemandou
[10:57:15] Data-Engineering, DBA, Infrastructure-Foundations, Puppet: Split mariadb::dbstore_multiinstance into 2 separate roles (backup sources and analytics) - https://phabricator.wikimedia.org/T296285 (jcrespo)
[10:57:39] Data-Engineering, DBA, Infrastructure-Foundations, Puppet: Split mariadb::dbstore_multiinstance into 2 separate roles (backup sources and analytics) - https://phabricator.wikimedia.org/T296285 (jcrespo)
[10:58:49] Data-Engineering, DBA, Infrastructure-Foundations, Puppet: Split mariadb::dbstore_multiinstance into 2 separate roles (backup sources and analytics) - https://phabricator.wikimedia.org/T296285 (jcrespo) ^What do you think #data-engineering people?
[11:03:33] Data-Engineering, DBA, Infrastructure-Foundations, Puppet: Split mariadb::dbstore_multiinstance into 2 separate roles (backup sources and analytics) - https://phabricator.wikimedia.org/T296285 (BTullis) Thanks @jcrespo - I'm happy with that proposed change and with the naming convention. > ...pro...
[11:06:14] Data-Engineering, DBA, Infrastructure-Foundations, Puppet: Split mariadb::dbstore_multiinstance into 2 separate roles (backup sources and analytics) - https://phabricator.wikimedia.org/T296285 (jcrespo) > it will give us greater flexibility if and when we want the dbstore* and db* configurations...
[11:10:36] Data-Engineering, DBA, Infrastructure-Foundations, Puppet: Split mariadb::dbstore_multiinstance into 2 separate roles (backup sources and analytics) - https://phabricator.wikimedia.org/T296285 (BTullis) Understood, thanks. Well I'm on-board with it.
[11:15:29] I need to restart `presto-server.service` and `oozie.service` on an-coord1001 today, as part of T295673 - There aren't currently any oozie jobs running at the moment, according to Hue.
[11:15:41] I think I'm OK just to restart these services, right?
[11:16:46] This says to contact us: https://wikitech.wikimedia.org/wiki/Service_restarts#Oozie
[11:16:46] > to pause its Bundles/Coordinators/Workflows to avoid any failure in the Hadoop cluster
[11:18:35] ...but I can't find any reference to pausing here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie/Administration
[11:18:35] Do I still need to do this if I have verified that no jobs are running?
[11:26:44] mforns: Should I merge and deploy this change today? https://gerrit.wikimedia.org/r/c/operations/puppet/+/740233
[11:41:11] btullis: nono oozie can be restarted anytime, it saves its state to the db
[11:41:34] there may be some corner cases if it is actually saving on the db, but I have never experienced any
[11:43:32] elukey: Great, thanks. Good to know. I'll try to update the docs to make that a bit clearer then.
[11:49:35] !log btullis@an-coord1001:~$ sudo systemctl restart oozie.service
[11:49:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:49:55] !log btullis@an-coord1001:~$ sudo systemctl restart presto-server.service
[11:49:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:56:37] !log roll-restarting the cassandra services on the aqs cluster. (Not the aqs_next cluster)
[11:56:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:09:05] Hi btullis :) sorry to be late for the party :S
[12:09:13] thanks elukey for the answer
[13:45:12] Analytics-Radar, SRE, SRE Observability, Wikimedia-Logstash, and 2 others: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (fgiunchedi)
[14:15:20] joal: The party doesn't start until you get here :-)
[14:19:46] Our team is having fun with Superset, but wondering why we can't add each other as "owners" of a dataset. Seems that only a_team members can be added?
[14:21:52] awight: Is it that the drop-down list of users doesn't contain who you hope to add? If so, that's a known (and recently discovered) issue.
[14:33:01] btullis: Yes--actually, I found my team in the users list in most places in the interface, but in dataset -> edit dataset -> owners it seems we cannot add commoners.
[14:38:24] awight: Great. In fact, I don't think that it's based on social status at all (although that would be amusing). It's just a bug in Superset: https://phabricator.wikimedia.org/T292262#7489015
[14:38:32] https://github.com/apache/superset/issues/16883
[14:39:31] There's a fix ready to go in Superset version 1.4.0 but that's not quite out yet. The workaround, I believe, is to use the legacy datasource editor until this is fixed.
[14:41:18] Hey team, Naé is sick at home, I shall be here at meetings but probably not more
[14:43:37] joal: Best wishes to Naé.
[14:53:02] Thanks btullis - she sleeps now but has a bad cough :(
[15:03:34] btullis, mforns: the alert for the analytics_delayed sanitization is due to too small a difference in time between the main sanitization and the monitor job - On a usual basis the sanitization takes a few minutes, now with the backfilling of searchsatisfaction it takes between 30 and 40 minutes regularly, and on the day of the alert it took more than 1h
[15:03:59] hmmm
[15:04:11] thanks for looking into that joal :]
[15:05:35] btw, btullis re. your question, I think this can be merged whenever possible: https://gerrit.wikimedia.org/r/c/operations/puppet/+/740233
[15:14:11] btullis: Wow, thanks for the information! The workaround will be fine for us.
[15:37:24] Analytics, Analytics-Kanban, Data-Engineering-Kanban, Event-Platform, and 5 others: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (Milimetric) I'm sorry this update is a bit late. I'm calling this task done, but I'll update the description with the...
[15:37:54] awight: A pleasure. 1.4 should be out soon, but in the meantime I suppose I should update the Superset page in wikitech with this workaround.
[15:38:55] Analytics, Analytics-Kanban, Data-Engineering-Kanban, Event-Platform, and 5 others: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (Milimetric)
[15:39:27] k, I owed everyone an update on that task, I commented, moved to done, and updated the description. I'm out for now but I'll be back later to work on the wikistats messaging bug.
[15:48:00] thank you dan!
[15:49:18] Analytics, Analytics-Kanban, Data-Engineering-Kanban, Event-Platform, and 5 others: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (Ottomata) Thank you Dan! I'll link to your comment from {T120242}.
[15:53:16] Analytics, DBA, Event-Platform, WMF-Architecture-Team: Consistent MediaWiki state change events | MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) In https://phabricator.wikimedia.org/T215001#7523796 @Milimetric did some analysis on missing revision creat...
[17:02:34] Analytics, Data-Engineering, Data-Engineering-Kanban, User-razzi: Presto error in Superset - https://phabricator.wikimedia.org/T292879 (Ottomata) Yeah a new task maybe makes sense.
[17:13:29] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, User-razzi: Setup Presto UI in production - https://phabricator.wikimedia.org/T292087 (razzi) In progress→Resolved This is done and documented on https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto/Admini...
[17:13:33] Analytics, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review, User-razzi: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (razzi)
[17:52:37] Data-Engineering, Data-Engineering-Kanban, Epic: Presto/Superset User Experience Improvement - https://phabricator.wikimedia.org/T294259 (JAllemandou)
[17:52:39] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Epic: Analytics Presto improvements - https://phabricator.wikimedia.org/T266639 (JAllemandou)
[17:53:12] razzi: I just closed the task about presto partitions as declined with a comment
[17:53:25] Analytics, Patch-For-Review: Presto should warn or prevent users from querying without Hive partition predicates - https://phabricator.wikimedia.org/T273004 (JAllemandou) Open→Declined We have decided not to pursue this road: putting a limit to the number of partitions that presto can query at o...
[18:01:27] mforns: i was able to move the config.py stuff to Artifact factory methods as you suggested: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/commit/955b7e7f83e3a313741b07fc3a90802dad6f8760#6f02979cab5b4fa5c207b24fd1ad15fec4124b2c_127_135
[18:18:12] (PS2) DCausse: rdf-streaming-updater: add a "reconcile" operation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/737429 (https://phabricator.wikimedia.org/T279541)
[18:19:43] ottomata: great thank you :]
[19:29:54] (CR) Mforns: "Thanks for this patch!" [analytics/refinery] - https://gerrit.wikimedia.org/r/737471 (https://phabricator.wikimedia.org/T287255) (owner: Jenniferwang)
[19:42:52] (CR) Jenniferwang: "Hi, thanks for the review." [analytics/refinery] - https://gerrit.wikimedia.org/r/737471 (https://phabricator.wikimedia.org/T287255) (owner: Jenniferwang)
[19:47:11] joal, looking at the refine monitor for the delayed sanitization, I see the data it checks has a 2h45m offset, so unless the sanitization takes that amount of time, it should not alert, right?
[19:48:21] https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/analytics/refinery/job/refine_sanitize.pp#L136
[20:07:10] mforns:
[20:07:11] https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/wmf_airflow_lib/wmf_airflow_lib/README.md#wmf-artifact-cache-cli
[20:07:23] pip install -e .
[20:07:45] wmf-artifact-cache status ../config/artifact_config.yaml ../analytics/artifacts.yaml
[20:07:49] etc.
[20:29:59] lookin
[20:31:33] Hey mforns :)
[20:31:40] heya
[20:32:58] mforns: it's true that the job has an offset in terms of dates it checks, but as the job is daily, even with 2h offset, if the sanitization job hasn't finished it still leaves 22 hours checked not sanitized - right?
[20:33:29] that is right! of course
[20:33:58] the problem is not with the offset of checked dates, but with the offset of starting time :)
[20:34:02] so more important than the hour lag, is the execution time of the monitor in this case, since it's daily
[20:34:11] got it
[20:35:08] then, let's move back to date-offset=0, and say starting-time-offset=2h no?
[20:35:10] mforns: And this not being easy to grasp at first glance, it'd be awesome if you could add a comment to the file :)
[20:35:35] sure!
[20:35:50] is 2 hours starting-time-offset good for you?
[20:36:56] mforns: I hope 2 hours offset for starting time is good enough! if it's not we need to check why the job takes so long :) Also, if we don't add an offset for checked-dates but have one for starting point, we're gonna end up with issues, as the computation of the since and until is done relative to the job starting point, right?
[20:37:49] So we need both an updated offset for start-time, and another update for checked dates so that those dates match the refined dates by the main job -
[20:38:12] Man this is tedious :)
[20:38:20] hmmm
[20:38:59] i think you're totally right!
[20:39:15] ok, will change
[20:39:30] Thanks a lot mforns :)
[20:41:17] Analytics, Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, User-Urbanecm: 502, connect failed for intake-analytics.wikimedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T289029 (AlexisJazz) Resolved→Open It's broken again.
[20:53:02] joal, btullis: the last attempt: https://gerrit.wikimedia.org/r/c/operations/puppet/+/740931
[21:02:09] ottomata: artifact code LGTM! :]
[22:12:14] Analytics, Data-Engineering, Data-Engineering-Kanban, Desktop Improvements, and 3 others: Add agent_type and access_method to sticky header instrumentation - https://phabricator.wikimedia.org/T294246 (nray) a:nray→Edtadros
[22:12:20] Analytics, Data-Engineering, Data-Engineering-Kanban, Desktop Improvements, and 3 others: Add agent_type and access_method to sticky header instrumentation - https://phabricator.wikimedia.org/T294246 (nray) a:Edtadros→cjming
[22:50:04] Analytics, Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, User-Urbanecm: 502, connect failed for intake-analytics.wikimedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T289029 (Urbanecm) Open→Resolved Did the same thing as before, works again now.
[23:44:38] Data-Engineering, Research: Consider adding more namespaces to Clickstream dataset - https://phabricator.wikimedia.org/T296359 (Isaac)
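
Note on the 20:32-20:37 exchange about the sanitization monitor: the point that "since" and "until" are computed relative to the job's starting point can be sketched roughly as below. This is only an illustration, not the refinery or puppet code. The function name `checked_window`, the 24h/0h window, the 2h delay and the date are all hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta


def checked_window(start, since_hours, until_hours):
    """Return the (since, until) interval a job looks at when started at `start`.

    Hypothetical helper: both bounds are expressed as hours before the start time.
    """
    return (start - timedelta(hours=since_hours),
            start - timedelta(hours=until_hours))


# Hypothetical main sanitization run: starts at midnight, covers the previous 24 hours.
main_start = datetime(2021, 1, 2, 0, 0)  # arbitrary date, for illustration only
main_window = checked_window(main_start, since_hours=24, until_hours=0)

# The monitor starts later so that the main job has time to finish.
delay_hours = 2  # hypothetical starting-time offset
monitor_start = main_start + timedelta(hours=delay_hours)

# Delaying only the start shifts the checked window by the same amount,
# so the monitor would look at 2 hours the main job has not sanitized yet.
naive_window = checked_window(monitor_start, since_hours=24, until_hours=0)

# Adding the same delay to the checked-dates offsets realigns both windows.
aligned_window = checked_window(monitor_start,
                                since_hours=24 + delay_hours,
                                until_hours=0 + delay_hours)

print("main   :", main_window)
print("naive  :", naive_window)
print("aligned:", aligned_window)
```

Under these assumptions the monitor only checks the same hours as the main job when the start-time offset and the checked-dates offset are moved together, which is the conclusion reached at 20:37:49.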