[00:00:32] RECOVERY - Check unit status of hadoop-namenode-backup-fetchimage on an-master1002 is OK: OK: Status of the systemd unit hadoop-namenode-backup-fetchimage https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:24:36] RECOVERY - Check unit status of monitor_refine_event on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:29:45] joal: Hi! I added the spark testing repo as a dependency as David had instructed. I'm running into this now: `Failure to find com.holdenkarau:spark-testing-base_2.11:jar:2.4.4_0.11.0 in http://repo.artima.com/releases/ was cached in the local repository, resolution will not be reattempted until the update interval of artima has elapsed or updates are forced`.
[07:29:45] Not sure what to do from my side. Thanks!
[07:49:48] (PS24) Joal: Update to spark-3 and scala-2.12 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897
[07:51:50] Hi tanny411 - can you try with this version please: https://mvnrepository.com/artifact/com.holdenkarau/spark-testing-base_2.11/2.4.4_0.14.0
[07:52:26] tanny411: this version is available from maven central, while I think the one you wish to use is not, and therefore our package management system (archiva) doesn't manage to get it
[07:54:57] (PS1) Aqu: Fix returned error code in HDFSArchiver [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798392
[07:55:58] (CR) Joal: [C: +1] "LGTM! Sorry for having caught this in previous review :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798392 (owner: Aqu)
[07:57:48] (CR) Joal: [C: +2] "Merging" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798392 (owner: Aqu)
[08:05:35] (CR) Joal: [C: +2] "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798392 (owner: Aqu)
[08:07:21] (Merged) jenkins-bot: Fix returned error code in HDFSArchiver [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798392 (owner: Aqu)
[08:40:55] Analytics-Radar, Dumps-Generation, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): page_restrictions field incomplete in current and historical dumps - https://phabricator.wikimedia.org/T251411 (Ladsgroup)
[08:41:23] Analytics-Radar, Dumps-Generation, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): page_restrictions field incomplete in current and historical dumps - https://phabricator.wikimedia.org/T251411 (Ladsgroup)
[08:41:45] (CR) Aqu: [C: +1] "👍" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897 (owner: Joal)
[09:03:39] Heads-up, I'm going to merge and deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/791663 and then perform a rolling restart of cassandra in the aqs cluster soon.
[09:04:22] It should be a no-op, but it's enabling inter-dc encryption (prior to bringing the 2nd dc online).
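The 09:03 heads-up mentions a rolling restart of cassandra in the aqs cluster. "Rolling" here means restarting one node at a time and waiting for it to report healthy before touching the next, so the rest of the cluster keeps serving. A minimal Python sketch of that loop, assuming hypothetical `restart` and `is_healthy` hooks supplied by the caller; this is not the cookbook the team actually ran.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
) -> List[str]:
    """Restart hosts one at a time, waiting for each to become healthy.

    `restart` and `is_healthy` are hypothetical hooks (e.g. wrappers around
    systemctl and a node status check); this sketch does not talk to
    Cassandra itself. Returns the hosts in the order they were restarted.
    Raises RuntimeError and stops the rollout if a host does not recover.
    """
    done: List[str] = []
    for host in hosts:
        restart(host)
        deadline = time.monotonic() + timeout_s
        # Poll until this node is healthy; never move on while it is down.
        while not is_healthy(host):
            if time.monotonic() >= deadline:
                raise RuntimeError(f"{host} did not become healthy; aborting rollout")
            time.sleep(poll_s)
        done.append(host)
    return done
```

Aborting on the first unhealthy node (rather than skipping it) is a deliberate choice in this sketch: continuing would risk having two nodes down at once.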
[09:13:42] (CR) Joal: Update to spark-3 and scala-2.12 (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897 (owner: Joal)
[09:24:24] (PS25) Joal: Update to spark-3 and scala-2.12 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897
[09:28:45] (CR) CI reject: [V: -1] Update to spark-3 and scala-2.12 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897 (owner: Joal)
[09:54:53] (CR) Snwachukwu: Add projectview hql scripts to analytics/refinery/hql path. (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/797240 (https://phabricator.wikimedia.org/T309023) (owner: Snwachukwu)
[10:01:42] (PS26) Joal: Update to spark-3 and scala-2.12 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897
[10:18:09] PROBLEM - Hadoop NodeManager on an-worker1139 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[10:20:20] --^ This was killed by an out-of-memory error.
[10:20:25] https://www.irccloud.com/pastebin/1vllonPP/
[10:21:02] !log restarted hadoop-yarn-nodemanager on an-worker1139
[10:21:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:22:11] RECOVERY - Hadoop NodeManager on an-worker1139 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[10:43:12] Data-Engineering, Data-Engineering-Kanban, Cassandra: Enable Cassandra encryption (inter-node & client) - https://phabricator.wikimedia.org/T307798 (BTullis) The change deployment and rolling restart are complete. The cookbook had an interesting failure, in that sometimes the cassandra services didn...
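The PROBLEM/RECOVERY pair at 10:18 and 10:22 comes from a process-count check: it goes CRITICAL when no `java` process carrying the NodeManager class in its arguments is running. A minimal Python sketch of that logic, reproducing the message format seen in the alerts; the function name and the `(command, args)` input shape are hypothetical, and this is not the actual monitoring plugin (which may also treat too many processes as a problem).

```python
from typing import Iterable, Tuple

NM_CLASS = "org.apache.hadoop.yarn.server.nodemanager.NodeManager"


def check_nodemanager(procs: Iterable[Tuple[str, str]],
                      command: str = "java",
                      args: str = NM_CLASS) -> Tuple[str, str]:
    """Count processes whose command name matches and whose args contain `args`.

    `procs` is a hypothetical list of (command_name, full_args) pairs, for
    example gathered from `ps -eo comm=,args=`. Returns (state, message):
    CRITICAL when nothing matches, OK when at least one process matches.
    """
    n = sum(1 for comm, full in procs if comm == command and args in full)
    state = "OK" if n >= 1 else "CRITICAL"
    noun = "process" if n == 1 else "processes"
    return state, f"PROCS {state}: {n} {noun} with command name {command}, args {args}"
```

An OOM-killed NodeManager simply disappears from the process table, which is why a plain process count is enough to catch the 10:18 incident.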
[11:01:08] Data-Engineering, Data-Services, Thai-Sites, User-bd808, cloud-services-team (Kanban): user_properties_anon view not being created/maintained consistently on wikireplicas due to lack of meta_p in all sections - https://phabricator.wikimedia.org/T294652 (Bebiezaza)
[11:49:28] Data-Engineering, Airflow: Install spark3 in analytics clusters - https://phabricator.wikimedia.org/T295072 (Antoine_Quhen) Experimental Spark3 is in use for 1 job triggered by Airflow: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/wmf_airflow_common/config/experimental_spa...
[11:55:13] (PS27) Joal: Update to spark-3 and scala-2.12 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897
[12:00:00] (CR) Joal: [V: +2 C: +2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/797513 (https://phabricator.wikimedia.org/T309057) (owner: Gerrit maintenance bot)
[12:26:59] (CR) Joal: "Minor comments inline - It would also be good to copy the refinery/hive/projectview/hourly/create_projectview_hourlytable.hql file in the " [analytics/refinery] - https://gerrit.wikimedia.org/r/797240 (https://phabricator.wikimedia.org/T309023) (owner: Snwachukwu)
[12:31:21] (PS1) Joal: Add missing create table scripts in hql folder [analytics/refinery] - https://gerrit.wikimedia.org/r/798643
[12:32:59] Ok here we go team - merging the spark3 code for a release later in the day
[12:33:12] (CR) Joal: [C: +2] "Merging for release" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897 (owner: Joal)
[12:33:15] \\o//
[12:33:42] * joal sweats heavily at all the following stuff that will need to be synchronized
[12:35:55] Data-Engineering, Airflow: Install spark3 in analytics clusters - https://phabricator.wikimedia.org/T295072 (Ottomata) Very cool!
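The 07:29 resolution failure and the 07:51 fix come down to which repository actually hosts a given artifact version. Maven Central uses the standard Maven 2 layout: `groupId` with dots turned into slashes, then `artifactId`, `version`, and the file `artifactId-version.packaging`. By Scala cross-build convention the `_2.11` suffix on `spark-testing-base_2.11` is the Scala binary version, so the scala-2.12 update merged at 12:33 implies `_2.12` artifacts. A minimal Python sketch, with hypothetical helper names, that only computes a Central URL and the Scala suffix; it does not check that the file exists or query archiva, and it assumes the four-part `group:artifact:packaging:version` form printed in the error (no classifier).

```python
from typing import NamedTuple, Optional

CENTRAL = "https://repo1.maven.org/maven2"


class Coordinate(NamedTuple):
    group_id: str
    artifact_id: str
    packaging: str
    version: str


def parse_coordinate(coord: str) -> Coordinate:
    """Parse a four-part 'group:artifact:packaging:version' string."""
    group_id, artifact_id, packaging, version = coord.split(":")
    return Coordinate(group_id, artifact_id, packaging, version)


def central_url(c: Coordinate, base: str = CENTRAL) -> str:
    """Maven 2 repository layout: group path / artifact / version / file."""
    group_path = c.group_id.replace(".", "/")
    filename = f"{c.artifact_id}-{c.version}.{c.packaging}"
    return f"{base}/{group_path}/{c.artifact_id}/{c.version}/{filename}"


def scala_binary_version(artifact_id: str) -> Optional[str]:
    """Return the Scala suffix of a cross-built artifact ('2.11'), or None."""
    _, sep, suffix = artifact_id.rpartition("_")
    return suffix if sep else None
```

Running `central_url` on the 0.11.0 coordinate from the error and on the 0.14.0 one from the 07:51 link gives two concrete URLs to compare when checking which version Central actually serves.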
[12:41:16] (Merged) jenkins-bot: Update to spark-3 and scala-2.12 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897 (owner: Joal)
[12:41:45] hello folks, rolling out the change for the default fixed kafka uid/gid (no-op)
[12:41:57] joal: good luck with spark3 :)
[12:42:11] elukey: ack, many thanks.
[12:46:16] Analytics, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 2 others: Problem with delay caused by intake-analytics.wikimedia.org - https://phabricator.wikimedia.org/T295427 (EChetty) a:phuedx→BTullis
[12:52:42] Data-Engineering, serviceops, Patch-For-Review: Move kafka clusters to fixed uid/gid - https://phabricator.wikimedia.org/T296982 (elukey) Open→Resolved Change is rolled out everywhere, and now we have sane defaults in `profile::kafka::broker`.
[12:57:11] Data-Engineering, Airflow: Migrate the projectview jobs - https://phabricator.wikimedia.org/T305844 (Snwachukwu)
[12:57:13] Data-Engineering-Kanban, Airflow, Patch-For-Review: Add copies of projectview hql script to analytics/refinery/hql path - https://phabricator.wikimedia.org/T309023 (Snwachukwu)
[13:04:16] thanks elukey <3
[13:12:52] (PS1) Joal: Bump version to 0.2.0-SNAPSHOT to release 0.2.0 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798662 (https://phabricator.wikimedia.org/T291386)
[13:13:05] ottomata, aqu - if any of you has a minute --^
[13:13:20] I need this to release refinery-source v0.2.0
[13:13:29] ok
[13:16:28] (CR) Aqu: [C: +1] "✔" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798662 (https://phabricator.wikimedia.org/T291386) (owner: Joal)
[13:16:39] Thanks aqu :)
[13:17:46] (CR) Joal: [C: +2] "Merging for release" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798662 (https://phabricator.wikimedia.org/T291386) (owner: Joal)
[13:17:58] (CR) Joal: [V: +2 C: +2] Bump version to 0.2.0-SNAPSHOT to release 0.2.0 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798662 (https://phabricator.wikimedia.org/T291386) (owner: Joal)
[13:18:40] Data-Engineering, Data-Engineering-Kanban: Draft initial data storage platform and place budget hold for Q2 - https://phabricator.wikimedia.org/T308318 (BTullis) I have completed a draft of the design document for this MVP. [[https://docs.google.com/document/d/1dhAlABcM08zMcw9u01qwukhnw2bf6jQ9rKsRkuRRjd...
[13:18:58] !log Release refinery-source v0.2.0 to archiva
[13:19:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:19:39] Actually I almost forgot the changelog.md - thank you wikitech
[13:24:19] (PS1) Joal: Bump changelog.md for version 0.2.0 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798666
[13:24:31] aqu: if I may again :S --^
[13:26:11] (CR) Aqu: [C: +2] "✔" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798666 (owner: Joal)
[13:27:41] Thank you!
[13:30:35] (CR) Aqu: "Delete:" [analytics/refinery] - https://gerrit.wikimedia.org/r/798643 (owner: Joal)
[13:39:19] (Merged) jenkins-bot: Bump changelog.md for version 0.2.0 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/798666 (owner: Joal)
[13:44:34] (CR) Joal: Add missing create table scripts in hql folder (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/798643 (owner: Joal)
[13:44:47] Starting build #105 for job analytics-refinery-maven-release-docker
[13:59:25] Project analytics-refinery-maven-release-docker build #105: SUCCESS in 14 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/105/
[14:02:24] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Fix turnilo after upgrade - https://phabricator.wikimedia.org/T308778 (BTullis) Moving this ticket to paused, whilst we wait for news from the upstream.
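The 13:12 patch title, "Bump version to 0.2.0-SNAPSHOT to release 0.2.0", follows the Maven convention that a `-SNAPSHOT` version is a mutable development version and the release drops the suffix. A minimal Python sketch of that mapping, plus one common choice for the next development version (bumping the last numeric component). The function names are hypothetical, and whether refinery's release job picks the next version this way is an assumption.

```python
SNAPSHOT = "-SNAPSHOT"


def release_version(dev_version: str) -> str:
    """Drop the -SNAPSHOT suffix: '0.2.0-SNAPSHOT' -> '0.2.0'."""
    if not dev_version.endswith(SNAPSHOT):
        raise ValueError(f"not a snapshot version: {dev_version}")
    return dev_version[: -len(SNAPSHOT)]


def next_development_version(release: str) -> str:
    """Bump the last dotted component and re-add -SNAPSHOT.

    Assumes the last component is a plain integer: '0.2.0' -> '0.2.1-SNAPSHOT'.
    """
    parts = release.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts) + SNAPSHOT
```

Rejecting a non-snapshot input in `release_version` mirrors why the bump patch had to merge before the release: releasing from an already-released version would be a mistake.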
[14:10:32] Analytics-Wikistats, Data-Engineering, NFDI: Is it possible to setup wikistats for a new wiki? - https://phabricator.wikimedia.org/T308253 (JArguello-WMF) Open→Declined
[14:49:36] Data-Engineering-Kanban, Data-Catalog: Spike: Evaluate datahub schema versioning support - https://phabricator.wikimedia.org/T307716 (EChetty)
[16:07:35] Data-Engineering, Airflow: Migrate the referrer job - https://phabricator.wikimedia.org/T305842 (Antoine_Quhen) a:Antoine_Quhen
[16:14:01] Data-Engineering, Data-Engineering-Kanban, Airflow: Migrate the referrer job - https://phabricator.wikimedia.org/T305842 (Antoine_Quhen)
[16:43:12] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Evaluate DataHub as a Data Catalog - https://phabricator.wikimedia.org/T299703 (Milimetric) >>! In T299703#7663941, @BTullis wrote: > `source: > type: "kafka" > config: > connection: > bootstrap: "kafka-jumbo1001.eqiad.wmnet:9092"...
[16:47:22] Analytics-Radar, Machine-Learning-Team, Patch-For-Review: Upgrade ROCm to 4.5 - https://phabricator.wikimedia.org/T295661 (elukey) Time flies and both ROCm and tensorflow-io got several releases. https://github.com/tensorflow/io/releases/tag/v0.23.0 is out and contains the pull request that I made f...
[17:05:22] Data-Engineering, Data-Engineering-Kanban, Product-Analytics, Superset, Patch-For-Review: Upgrade Superset to 1.4.2 - https://phabricator.wikimedia.org/T304972 (mpopov)
[17:22:13] Data-Engineering, Equity-Landscape: Grants Metrics Transformation - https://phabricator.wikimedia.org/T306620 (ntsako) Raw grants data loaded under ` ntsako.grants `
[17:45:45] Hey mforns - your opinion is more than welcome on that one: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/62
[17:46:01] lookin'!
[17:56:44] joal: left some comments!
[17:56:52] Thanks mforns :)
[18:13:25] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/797199 (https://phabricator.wikimedia.org/T308767) (owner: Snwachukwu)
[18:13:33] (PS2) Mforns: Fix api hql file. [analytics/refinery] - https://gerrit.wikimedia.org/r/797199 (https://phabricator.wikimedia.org/T308767) (owner: Snwachukwu)
[18:13:38] (CR) Mforns: [V: +2] Fix api hql file. [analytics/refinery] - https://gerrit.wikimedia.org/r/797199 (https://phabricator.wikimedia.org/T308767) (owner: Snwachukwu)
[18:33:44] !Log Deploying refinery, regular weekly deployment
[18:34:48] !log Deploying refinery, regular weekly deployment
[18:34:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:14:44] Data-Engineering, Data-Engineering-Kanban, Cassandra: Enable Cassandra encryption (inter-node & client) - https://phabricator.wikimedia.org/T307798 (Eevans) >>! In T307798#7952577, @BTullis wrote: > The change deployment and rolling restart are complete. > > The cookbook had an interesting failure,...
[19:54:47] !log Deployed refinery using scap, then deployed onto hdfs successfully.
[19:54:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:38:00] Data-Engineering, CheckUser, MW-1.38-notes (1.38.0-wmf.26; 2022-03-14), MW-1.39-notes (1.39.0-wmf.7; 2022-04-11), and 3 others: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004 (Zabe)
[21:45:12] Data-Engineering, Event-Platform, Generated Data Platform, Patch-For-Review: [Shared Event Platform] Ability to use Event Platform streams in Flink without boilerplate - https://phabricator.wikimedia.org/T308356 (Ottomata) Spent some time writing some tests and trying to make `KafkaEventDynamicTa...