[00:10:05] (CR) Xcollazo: Clean up and parameterize SQL code for Common Impact Metrics. (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1016796 (https://phabricator.wikimedia.org/T358681) (owner: Xcollazo)
[00:29:19] (PS17) Xcollazo: Clean up and parameterize SQL code for Common Impact Metrics. [analytics/refinery] - https://gerrit.wikimedia.org/r/1016796 (https://phabricator.wikimedia.org/T358681)
[00:32:34] (CR) Xcollazo: [V:+2] "Patch set 17 fixes `commons_edits` so that when there is an anonymous or redacted `user_name`, the chosen keyword does not clash with allo" [analytics/refinery] - https://gerrit.wikimedia.org/r/1016796 (https://phabricator.wikimedia.org/T358681) (owner: Xcollazo)
[06:28:29] Data-Engineering: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602 (SGupta-WMF) NEW
[06:30:32] Data-Engineering: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602#9716362 (SGupta-WMF)
[06:37:30] (CR) Joal: [C:+1] "LGTM" [analytics/refinery] - https://gerrit.wikimedia.org/r/1019932 (owner: Aleksandar Mastilovic)
[06:42:16] Good morning btullis - I tried to ping you yesterday but you probably missed it :)
[06:43:12] btullis: my question is about GPUs - Do we still have some of them in machines belonging to the hadoop cluster, or are they all gone, living a happy life of non-hadoopy hardware?
[07:23:48] (CR) Joal: Productionize CommonsCategoryGraphBuilder for CIM project (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1015013 (https://phabricator.wikimedia.org/T358681) (owner: Mforns)
[07:24:56] Data-Engineering (Q4 2024 April 1st - June 30th): [Maintenance] Migrate ReportUpdater browser queries to Airflow - https://phabricator.wikimedia.org/T354552#9716543 (JAllemandou) a:JAllemandou→amastilovic
[07:25:08] Quarry, Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407#9716536 (KCVelaga_WMF) @fnegri any approximate on when this might be prioritized. This will be very helpful for creati...
[07:26:06] Data-Engineering (Q4 2024 April 1st - June 30th): Fix and validate browser report DAG and queries - https://phabricator.wikimedia.org/T362201#9716551 (JAllemandou) →Duplicate dup:T354552
[07:26:20] Data-Engineering (Q4 2024 April 1st - June 30th): [Maintenance] Migrate ReportUpdater browser queries to Airflow - https://phabricator.wikimedia.org/T354552#9716548 (JAllemandou)
[07:59:48] Data-Engineering: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602#9716620 (SGupta-WMF)
[08:01:48] Data-Engineering, SRE, SRE-Access-Requests: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602#9716624 (SGupta-WMF)
[08:27:56] joal: Apologies for missing the ping yesterday. Yes, we still have two GPUs in the Hadoop cluster. They are on an-worker1100 and an-worker1101. We recently updated the node labels to match. See: T361225
[08:27:56] T361225: Update GPU labels in Hadoop 's Yarn config - https://phabricator.wikimedia.org/T361225
[08:28:41] Data-Engineering, SRE, SRE-Access-Requests: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602#9716689 (BTullis) a:BTullis
[08:31:41] ack! thanks a lot btullis :)
[08:32:49] btullis: would you know by any chance if someone still uses the fifo queue for those GPUs? I have a PR about a new yarn queue to help with small jobs, and I dropped the fifo queue assuming it was not used anymore
[08:35:37] Quarry, Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407#9716712 (dcaro) @fnegri just verifying, the `quarry_readonly` user only has to have access to the public databases (no...
[08:51:08] (that is me, heating up stat1009 for a few hours)
[08:57:31] awight: Thanks for the heads-up :-)
[09:00:51] joal: I'm not actually aware of anyone actively using them, but equally I don't have much evidence that they are not. We can look at Grafana here and I notice that the GPU usage is flat on an-worker1100, but consistently spiking between 2% and 8% on an-worker1101. https://grafana.wikimedia.org/d/ZAX3zaIWz/amd-rocm-gpu?orgId=1&var-source=eqiad%20prometheus%2Fops&var-instance=an-worker1101:9100&from=now-30d&to=now
[09:01:19] That may just mean that the card needs resetting.
[09:01:54] Should we look at removing these GPUs from Hadoop altogether, along with the fifo queue?
[09:16:30] !log restarting mapreduce history service on an-master1003 for T356382
[09:16:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:37:22] btullis: if no one uses them, I think it'd be a good idea - as for the fifo queue, I'm gonna send a message on slack to ask if anyone uses it, and if I don't get an answer before tomorrow, let's remove it :)
[10:30:46] Quarry, Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407#9717305 (fnegri) @KCVelaga_WMF I'm sorry there was no progress on this so far, it is still in my backlog. I plan to fi...
[10:53:49] (CR) Aqu: "Reviewed again." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1003745 (https://phabricator.wikimedia.org/T356363) (owner: Joal)
[10:55:40] (PS4) Sg912: Add queries to format commons impact metrics data as dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/1019845 (https://phabricator.wikimedia.org/T358701) (owner: Mforns)
[10:57:22] (PS26) Joal: Extract RefineSingleApp code from Refine [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1003745 (https://phabricator.wikimedia.org/T356363)
[10:57:36] (CR) Joal: Extract RefineSingleApp code from Refine (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1003745 (https://phabricator.wikimedia.org/T356363) (owner: Joal)
[11:02:10] !log upgrade datahub to v0.12.1 T361688
[11:02:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:02:13] T361688: Upgrade datahub to v0.12.1 - https://phabricator.wikimedia.org/T361688
[11:08:35] (CR) Joal: [C:+1] "I also found something to be fixed in how we write reports: when there are empty values, spark-csv writes double quotes: "" while the original" [analytics/refinery] - https://gerrit.wikimedia.org/r/1019363 (owner: Aleksandar Mastilovic)
[11:14:00] Data-Engineering, SRE, SRE-Access-Requests: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602#9717418 (WDoranWMF) Approved
[12:07:09] Quarry, Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407#9717627 (KCVelaga_WMF) @fnegri that'll be amazing, thank you! Also, a quick question, does this also enable [[ https:/...
[12:15:42] Quarry, Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407#9717683 (fnegri) Superset must be configured separately, but it can reuse the same credentials.
[12:23:18] Quarry, Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407#9717714 (dcaro) >>! In T348407#9717305, @fnegri wrote: > @KCVelaga_WMF I'm sorry there was no progress on this so far,...
[12:31:43] Quarry, Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): Create db user for Quarry with readonly access to public ToolsDB databases - https://phabricator.wikimedia.org/T348407#9717779 (fnegri) > We might want to give a different user to avoid confusion (ex. who is running this huge query that...
[12:55:09] Data-Engineering, Cassandra, Data-Persistence, Data-Platform-SRE: Encrypt Airflow connections to AQS Cassandra - https://phabricator.wikimedia.org/T362181#9717806 (lbowmaker)
[12:55:43] Data-Engineering, Data-Platform: Add movement insights group/users to MWH denormalize job alerts - https://phabricator.wikimedia.org/T357472#9717812 (lbowmaker) Open→Resolved
[12:59:12] Data-Engineering (Q4 2024 April 1st - June 30th), Data Products: Modify ClickStreamBuilder pipeline to cope with pagelinks schema changes - https://phabricator.wikimedia.org/T355588#9717838 (lbowmaker)
[13:00:22] Data-Engineering, SRE, SRE-Access-Requests: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602#9717857 (ssingh)
[13:06:24] Data-Engineering, SRE, SRE-Access-Requests: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602#9717899 (ssingh)
[13:13:45] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602#9717931 (BTullis)
[13:15:31] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602#9717940 (BTullis) I have created the principal for Surbhi. ` btullis@krb1001:~$ sudo sudo manage_principals.py get sg912 get_principal: Principal...
[13:19:06] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602#9717943 (ssingh) >>! In T362602#9717940, @BTullis wrote: > I have created the principal for Surbhi. > ` > btullis@krb1001:~$ sudo sudo manage_pri...
[13:31:57] (PS18) Mforns: Clean up and parameterize SQL code for Common Impact Metrics. [analytics/refinery] - https://gerrit.wikimedia.org/r/1016796 (https://phabricator.wikimedia.org/T358681) (owner: Xcollazo)
[13:52:43] (PS1) Kai Nissen (WMDE): Fix typo in property attribute [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1020222
[13:53:08] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting kerberos identity for Surbhi Gupta - https://phabricator.wikimedia.org/T362602#9718122 (ssingh) Open→Resolved Marking this as resolved; if `kinit` doesn't work for you or if there are any issues, please re-open t...
[13:58:34] (CR) Kai Nissen (WMDE): "I stumbled upon a typo in the schema and thought I'd quickly patch it instead of reporting it. Please have a look." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1020222 (owner: Kai Nissen (WMDE))
[14:01:01] (PS2) Kai Nissen (WMDE): Fix typo in property attribute [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1020222
[14:07:42] (CR) Aleksandar Mastilovic: [V:+2 C:+2] "Merging..." [analytics/refinery] - https://gerrit.wikimedia.org/r/1019932 (owner: Aleksandar Mastilovic)
[14:20:44] Data-Engineering, Foundational Technology Requests, Patch-For-Review: Enable the Marketing Campaigns Reporting plugin for matomo - https://phabricator.wikimedia.org/T319013#9718321 (CodeReviewBot) btullis opened https://gitlab.wikimedia.org/repos/data-engineering/matomo/plugin-marketingcampaignsr...
[14:21:27] Data-Engineering, Foundational Technology Requests, Patch-For-Review: Enable the Marketing Campaigns Reporting plugin for matomo - https://phabricator.wikimedia.org/T319013#9718343 (CodeReviewBot) btullis merged https://gitlab.wikimedia.org/repos/data-engineering/matomo/plugin-marketingcampaignsr...
[14:40:49] !log failed back HDFS namenode from an-master1004 to an-master1003.
[14:40:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:41:01] https://www.irccloud.com/pastebin/0bcXy8YF/
[14:52:24] (PS1) Peter Fischer: cirrussearch/update_pipeline/update add change_type.PAGE_RERENDER_UPSERT enum constant [schemas/event/primary] - https://gerrit.wikimedia.org/r/1020259 (https://phabricator.wikimedia.org/T358599)
[15:00:48] !log kicked off a rolling restart of the hadoop worker datanode and nodemanager process for T356382
[15:00:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:18:33] Data-Engineering: Package request: install elixir and erlang-otp to the analytics clients - https://phabricator.wikimedia.org/T362678 (awight) NEW
[15:19:56] Data-Engineering, Cassandra: Set up regular-repairs for AQS cassandra cluster tables - https://phabricator.wikimedia.org/T297944#9718762 (Eevans) p:High→Low We've made the upgrade to 4.x already, and we did so without a migration. If I've understood the context above, that was the reason for ele...
[15:20:46] Data-Engineering, Cassandra: Set up regular-repairs for AQS cassandra cluster tables - https://phabricator.wikimedia.org/T297944#9718768 (Eevans)
[15:24:50] (CR) Ebernhardson: [C:+2] cirrussearch/update_pipeline/update add change_type.PAGE_RERENDER_UPSERT enum constant [schemas/event/primary] - https://gerrit.wikimedia.org/r/1020259 (https://phabricator.wikimedia.org/T358599) (owner: Peter Fischer)
[15:27:34] (Merged) jenkins-bot: cirrussearch/update_pipeline/update add change_type.PAGE_RERENDER_UPSERT enum constant [schemas/event/primary] - https://gerrit.wikimedia.org/r/1020259 (https://phabricator.wikimedia.org/T358599) (owner: Peter Fischer)
[15:29:45] Data-Engineering: Package request: install elixir and erlang-otp to the analytics clients - https://phabricator.wikimedia.org/T362678#9718841 (awight)
[15:34:35] Data-Engineering: Package request: install elixir and erlang-otp to the analytics clients - https://phabricator.wikimedia.org/T362678#9718900 (awight) Some of these packages already appear in debmonitor: * https://debmonitor.wikimedia.org/source-packages/erlang * https://debmonitor.wikimedia.org/source-pack...
[16:08:09] We're running another ~3 hour job on stat1009.
[16:44:09] (CR) Aleksandar Mastilovic: [V:+2 C:+2] "Makes sense. Let me create a separate patch/commit for this and address it in all reportupdater DAG queries at once." [analytics/refinery] - https://gerrit.wikimedia.org/r/1019363 (owner: Aleksandar Mastilovic)
[17:03:45] (PS1) Aleksandar Mastilovic: Update converted reportupdater DAG queries to correct CSV options [analytics/refinery] - https://gerrit.wikimedia.org/r/1020291
[17:09:08] (CR) Joal: [C:+1] "LGTM! Thank you for this" [analytics/refinery] - https://gerrit.wikimedia.org/r/1020291 (owner: Aleksandar Mastilovic)
[17:12:56] (CR) Aleksandar Mastilovic: [V:+2 C:+2] "Ready to merge" [analytics/refinery] - https://gerrit.wikimedia.org/r/1020291 (owner: Aleksandar Mastilovic)
[17:22:44] Data-Engineering, Data-Platform-SRE: Package request: install elixir and erlang-otp to the analytics clients - https://phabricator.wikimedia.org/T362678#9719727 (lbowmaker)
[19:27:50] (CR) Aqu: [C:+2] Extract RefineSingleApp code from Refine [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1003745 (https://phabricator.wikimedia.org/T356363) (owner: Joal)
[19:39:38] (PS1) Aqu: Update changelog.md for 0.2.35 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1020338
[19:41:43] (CR) Aqu: [V:+2 C:+2] Update changelog.md for 0.2.35 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1020338 (owner: Aqu)
[20:01:40] Starting build #1 for job analytics-refinery-maven-release
[20:08:15] !log Weekly deploy of refinery using scap, then deployed onto hdfs
[20:08:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:18:12] Project analytics-refinery-maven-release build #1: SUCCESS in 16 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release/1/
[20:26:54] Starting build #1 for job analytics-refinery-update-jars
[20:28:20] (PS1) Maven-release-user: Add refinery-source jars for v0.2.35 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1019783
[20:28:20] Project analytics-refinery-update-jars build #1: SUCCESS in 1 min 26 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars/1/
[20:28:38] (CR) Aqu: [C:+2] Add refinery-source jars for v0.2.35 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1019783 (owner: Maven-release-user)
[20:28:40] (CR) Aqu: [V:+2 C:+2] Add refinery-source jars for v0.2.35 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1019783 (owner: Maven-release-user)
[20:55:59] (PS6) Mforns: Productionize CommonsCategoryGraphBuilder for CIM project [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1015013 (https://phabricator.wikimedia.org/T358681)
[22:14:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage