[06:18:17] hello folks :)
[06:20:47] dcausse: o/ if you are around, log size issue with an-airflow1001, otherwise I can drop some old stuff
[06:26:21] Analytics, SRE, Patch-For-Review: Trash cleanup cron spams on an-test hosts - https://phabricator.wikimedia.org/T286442 (elukey) @BTullis we have been doing it manually for the stat100x boxes so far, nothing on puppet!
[06:41:21] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (elukey) >>! In T266641#7288128, @BTullis wrote: > I was unaware of some of the limitations of the community edition of...
[06:42:53] ouch this problem with alluxio is a serious one --^ :(
[06:45:53] Analytics-Radar, Dumps-Generation: xmldatadumps dumpstatus.json files only readable by root - https://phabricator.wikimedia.org/T287989 (elukey) I think that this is resolved @ArielGlenn! Thanks for the help :)
[06:59:59] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Add 6 worker nodes to the HDFS Namenode config of the Analytics Hadoop cluster - https://phabricator.wikimedia.org/T275767 (elukey) Great work! Going to add some notes to previous posts inline: >>! In T275767#7300801, @BTullis wrote: > I am...
[07:06:26] Analytics-Clusters, Analytics-Radar, SRE, SRE Observability (FY2021/2022-Q1): Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (Volans) Thanks for the fix.
[07:07:07] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Add analytics-presto.eqiad.wmnet CNAME for Presto coordinator failover - https://phabricator.wikimedia.org/T273642 (elukey) @BTullis have we tested a failover from an-coord1001 to an-coord1002? I am aware of the parent task's follow ups, but I a...
[07:08:58] Analytics: Update ROCm version on GPU instances. - https://phabricator.wikimedia.org/T287267 (elukey) @odimitrijevic yes definitely we can work on it (either by ourselves or working with Ben/Razzi if they are not super busy with other projects). Lemme know :)
[07:13:14] Analytics-Clusters, Analytics-Kanban: Set up an-web1001 and decommission thorium - https://phabricator.wikimedia.org/T285355 (elukey) @RKemper Great work on helm, but I am now wondering what are the expectations of availability for thorium/an-web1001 from the link recommendation point of view. For exampl...
[07:31:09] elukey: looking (relatedly Erik is looking into fixing this for real by shipping logs to hdfs instead)
[07:32:02] dcausse: ack! We could also think about logstash as option (but I am sure that you folks are aware of it :D)
[07:33:11] elukey: yes but I think the point is to let airflow still have access to them so that when you browse the task failures from the ui you can see the errors
[07:33:34] dcausse: ahhh lovely thanks for the explanation
[08:16:42] Analytics, Traffic: Review use of realloc in varnishkafka - https://phabricator.wikimedia.org/T287561 (elukey) Sorry I missed the questions before, trying to add some context. I just added a question in the patch, will try to follow up as much as possible (I am familiar with the code base but I haven't t...
[08:45:10] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (Dzahn) sorry, I uploaded the changes above to this task by accident. They belong to T281538.
[08:58:18] isaacj_: Is this hive performance still an issue for you? I've just run a hive test query without any delay, but feel free to let me know if you're still having issues.
[09:01:48] (PS1) GoranSMilovanovic: _lib/_pckg, pipes [analytics/wmde/WD/WikidataAnalytics] - https://gerrit.wikimedia.org/r/714536
[09:02:05] (CR) GoranSMilovanovic: [V: +2 C: +2] _lib/_pckg, pipes [analytics/wmde/WD/WikidataAnalytics] - https://gerrit.wikimedia.org/r/714536 (owner: GoranSMilovanovic)
[09:16:29] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Add 6 worker nodes to the HDFS Namenode config of the Analytics Hadoop cluster - https://phabricator.wikimedia.org/T275767 (BTullis) OK, so I think I'm ready to go with the addition of these six nodes. My only outstanding concern is what happens...
[09:22:41] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Add 6 worker nodes to the HDFS Namenode config of the Analytics Hadoop cluster - https://phabricator.wikimedia.org/T275767 (elukey) Some notes :) * it may happen that the first puppet run doesn't finish cleanly (namely reporting some errors). A...
[09:24:01] elukey: welcome back from your holiday.
[09:26:05] I have a quick question about the HDFS web UI. Wikitech says to port forward to 50070 on an-master1001 (or the active namenode?) I'm not getting through to it. Am I missing something?
[09:27:07] btullis: o/
[09:27:22] I think that the docs are stale, there is now an https port
[09:27:28] it should be in the 50xxx range
[09:28:11] (PS1) GoranSMilovanovic: T283575 [analytics/wmde/WD/WikidataAnalytics] - https://gerrit.wikimedia.org/r/714540
[09:30:11] Ah, do I need a SOCKS proxy instead of an SSH port forward?
[09:31:19] nono I think a port change just be enough
[09:31:35] 50470
[09:31:37] via https
[09:31:40] it should work
[09:32:28] ssh -L 50470:localhost:50470 an-master1001.eqiad.wmnet
[09:32:45] the first page has the css broken for some reason, but if you click anywhere it should refresh ok
[09:33:34] the "Datanodes" tab shows how balanced the nodes are
[09:35:04] Gotcha. I am in. Thanks. Will update Wikitech. I see reference to 50470 on here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Journalnode_process but not on the main administration page.
[09:37:01] ack thanks!
[10:04:32] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Add 6 worker nodes to the HDFS Namenode config of the Analytics Hadoop cluster - https://phabricator.wikimedia.org/T275767 (BTullis) The only issue observed during first puppet run was that hadoop services tried to start before installing the Ja...
[10:15:21] Analytics-Clusters, Analytics-Kanban: Move the Analytics infrastructure to Debian Buster - https://phabricator.wikimedia.org/T234629 (BTullis)
[10:29:27] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Add 6 worker nodes to the HDFS Namenode config of the Analytics Hadoop cluster - https://phabricator.wikimedia.org/T275767 (BTullis) 84 nodes in service. {F34618694} I will start the balancer via the systemd service file and tail the log. I'd ra...
[10:30:52] !log btullis@an-launcher1002:~$ sudo systemctl start hdfs-balancer.service
[10:30:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:32:40] The six new nodes are in the hadoop cluster. I've started a rebalancing operation, so I'm actively monitoring performance and looking out for errors.
[10:41:52] nice! Don't worry too much about the balancer thing, it will take time to gently move blocks here and there
[10:42:03] (it may also require multiple runs)
[10:42:22] we have enough space on other workers that this is not super urgent
[10:42:24] :)
[10:42:34] (going to lunch)
[10:47:29] Ack. Thanks again. Will just keep half an eye on the hdfs metrics and get back to presto, alluxio and kerberos stuff for now.
[10:58:04] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Add 6 worker nodes to the HDFS Namenode config of the Analytics Hadoop cluster - https://phabricator.wikimedia.org/T275767 (BTullis) The rebalancing operation is proceeding nicely. Each node is copying data from the network at around 190 MB/s. I...
[11:08:30] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Add analytics-presto.eqiad.wmnet CNAME for Presto coordinator failover - https://phabricator.wikimedia.org/T273642 (BTullis) At the moment we haven't updated the presto configuration in production, because this CR is still active: https://gerrit...
[12:57:06] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Add analytics-presto.eqiad.wmnet CNAME for Presto coordinator failover - https://phabricator.wikimedia.org/T273642 (elukey) Nice work, thanks a lot for the summary!
[14:14:50] (CR) Joal: [V: +2 C: +2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/714402 (owner: Joal)
[14:44:52] (PS1) Michael DiPietro: Add database autocompletion [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714573 (https://phabricator.wikimedia.org/T287471)
[14:44:58] (PS1) Michael DiPietro: Revert "Add database autocompletion" [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714574
[14:45:04] (PS1) Michael DiPietro: add stop query function [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714575 (https://phabricator.wikimedia.org/T71037)
[14:45:10] (PS1) Michael DiPietro: quarry stop button to follow current status [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714576 (https://phabricator.wikimedia.org/T289348)
[14:45:16] (PS1) Michael DiPietro: Add database autocompletion [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714577 (https://phabricator.wikimedia.org/T287471)
[15:03:41] (CR) Andrew Bogott: [C: +1] Add database autocompletion [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714573 (https://phabricator.wikimedia.org/T287471) (owner: Michael DiPietro)
[15:17:26] (CR) Michael DiPietro: [C: +2] Add database autocompletion [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714573 (https://phabricator.wikimedia.org/T287471) (owner: Michael DiPietro)
[15:17:50] (CR) Michael DiPietro: [C: +2] Revert "Add database autocompletion" [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714574 (owner: Michael DiPietro)
[15:18:02] (Merged) jenkins-bot: Add database autocompletion [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714573 (https://phabricator.wikimedia.org/T287471) (owner: Michael DiPietro)
[15:18:27] (Merged) jenkins-bot: Revert "Add database autocompletion" [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714574 (owner: Michael DiPietro)
[15:19:07] (CR) Michael DiPietro: [V: +2 C: +2] Revert "Add database autocompletion" [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714574 (owner: Michael DiPietro)
[15:19:15] (CR) Michael DiPietro: [C: +2] add stop query function [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714575 (https://phabricator.wikimedia.org/T71037) (owner: Michael DiPietro)
[15:19:48] (Merged) jenkins-bot: add stop query function [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714575 (https://phabricator.wikimedia.org/T71037) (owner: Michael DiPietro)
[15:20:08] (CR) Mforns: "Hi, thanks for submitting this change!" [analytics/refinery] - https://gerrit.wikimedia.org/r/713570 (owner: Jenniferwang)
[15:20:14] (CR) Michael DiPietro: [C: +2] quarry stop button to follow current status [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714576 (https://phabricator.wikimedia.org/T289348) (owner: Michael DiPietro)
[15:20:59] (Merged) jenkins-bot: quarry stop button to follow current status [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714576 (https://phabricator.wikimedia.org/T289348) (owner: Michael DiPietro)
[15:22:02] (CR) Michael DiPietro: [C: +2] Add database autocompletion [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714577 (https://phabricator.wikimedia.org/T287471) (owner: Michael DiPietro)
[15:22:41] (Merged) jenkins-bot: Add database autocompletion [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714577 (https://phabricator.wikimedia.org/T287471) (owner: Michael DiPietro)
[15:28:55] Quarry: Find somewhere else (not NFS) to store Quarry's resultsets - https://phabricator.wikimedia.org/T178520 (Bstorm)
[15:30:38] Quarry: Dev environment should have separate database to test against - https://phabricator.wikimedia.org/T287902 (Bstorm)
[15:31:02] Quarry: Dev environment should have separate database to test against - https://phabricator.wikimedia.org/T287902 (Bstorm) a:Bstorm
[15:33:24] (CR) Mforns: "Thanks for submitting this change!" [analytics/refinery] - https://gerrit.wikimedia.org/r/713644 (https://phabricator.wikimedia.org/T287255) (owner: MNeisler)
[15:41:14] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (BTullis) I'm still trying to work through all of the pieces so I could be completely wrong, but I think that we //migh...
[15:45:47] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (JAllemandou) >>! In T266641#7288128, @BTullis wrote: > I was unaware of some of the limitations of the community editi...
[16:22:56] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (elukey) There is something that I don't understand about the licenses though. https://www.alluxio.io/pricing/ seems to...
[17:02:35] (PS1) Jenniferwang: T287256: add UniversalLanguageSelector to sanitized event database Change-Id: I0eee231f95cceb9949a4a2b77f5774760f045b58 [analytics/refinery] - https://gerrit.wikimedia.org/r/714607
[17:06:08] (PS1) Michael DiPietro: .gitreview: associate local 'buster' branch with gerrit 'buster' branch [analytics/quarry/web] (buster-2) - https://gerrit.wikimedia.org/r/714608
[17:06:14] (PS1) Michael DiPietro: upgrade quarry to python 3.7 [analytics/quarry/web] (buster-2) - https://gerrit.wikimedia.org/r/714609 (https://phabricator.wikimedia.org/T288528)
[17:06:20] (PS1) Michael DiPietro: updating branch [analytics/quarry/web] (buster-2) - https://gerrit.wikimedia.org/r/714610
[17:15:42] (Abandoned) Jenniferwang: T287256: add UniversalLanguageSelector to sanitized event database Change-Id: I0eee231f95cceb9949a4a2b77f5774760f045b58 [analytics/refinery] - https://gerrit.wikimedia.org/r/714607 (owner: Jenniferwang)
[17:27:50] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (BTullis) It looks like the open source version has got a [[https://github.com/Alluxio/alluxio/blob/55cc3b70278672ca076...
[18:11:56] Analytics, Better Use Of Data, Event-Platform, Metrics-Platform, Product-Data-Infrastructure: Client-side error logging should use Elastic Common Schema (ECS) fields when possible - https://phabricator.wikimedia.org/T267602 (colewhite) >>! In T267602#7294918, @Mholloway wrote: > Does {T272238...
[18:17:49] (Abandoned) Michael DiPietro: .gitreview: associate local 'buster' branch with gerrit 'buster' branch [analytics/quarry/web] (buster-2) - https://gerrit.wikimedia.org/r/714608 (owner: Michael DiPietro)
[18:17:55] (Abandoned) Michael DiPietro: upgrade quarry to python 3.7 [analytics/quarry/web] (buster-2) - https://gerrit.wikimedia.org/r/714609 (https://phabricator.wikimedia.org/T288528) (owner: Michael DiPietro)
[18:18:01] (Abandoned) Michael DiPietro: updating branch [analytics/quarry/web] (buster-2) - https://gerrit.wikimedia.org/r/714610 (owner: Michael DiPietro)
[18:20:30] (PS1) Jenniferwang: T287256: add UniversalLanguageSelector to sanitized event database Change-Id: I0eee231f95cceb9949a4a2b77f5774760f045b58 [analytics/refinery] - https://gerrit.wikimedia.org/r/714613
[18:22:59] (Abandoned) Jenniferwang: T287256: add UniversalLanguageSelector to sanitized event database Change-Id: I0eee231f95cceb9949a4a2b77f5774760f045b58 [analytics/refinery] - https://gerrit.wikimedia.org/r/714613 (owner: Jenniferwang)
[18:23:07] (CR) Bearloga: [C: +1] Add the mediawiki_pref_diff event platform stream to the allowlist (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/713644 (https://phabricator.wikimedia.org/T287255) (owner: MNeisler)
[18:26:35] (PS1) Jenniferwang: add UniversalLanguageSelector to sanitized event database Bug: T287256 Change-Id: I0eee231f95cceb9949a4a2b77f5774760f045b58 [analytics/refinery] - https://gerrit.wikimedia.org/r/714614 (https://phabricator.wikimedia.org/T287256)
[18:30:27] (Abandoned) Jenniferwang: add UniversalLanguageSelector to sanitized event database Bug: T287256 Change-Id: I0eee231f95cceb9949a4a2b77f5774760f045b58 [analytics/refinery] - https://gerrit.wikimedia.org/r/714614 (https://phabricator.wikimedia.org/T287256) (owner: Jenniferwang)
[18:32:14] (PS2) Bearloga: add UniversalLanguageSelector to sanitized event database [analytics/refinery] - https://gerrit.wikimedia.org/r/713570 (https://phabricator.wikimedia.org/T287256) (owner: Jenniferwang)
[18:47:14] (PS3) Jenniferwang: add UniversalLanguageSelector to sanitized event database [analytics/refinery] - https://gerrit.wikimedia.org/r/713570 (https://phabricator.wikimedia.org/T287256)
[18:49:32] (PS4) Jenniferwang: add UniversalLanguageSelector to sanitized event database [analytics/refinery] - https://gerrit.wikimedia.org/r/713570 (https://phabricator.wikimedia.org/T287256)
[18:50:42] Analytics, FR-Tech-Analytics, Privacy Engineering: event.WikipediaPortal referer modification - https://phabricator.wikimedia.org/T279952 (sguebo_WMF) >>! In T279952#7034307, @mforns wrote: > A couple comments. > > 1) The other day, talking with the team, we thought we Analytics could take this task...
[19:01:35] mforns: Hello :)
[19:04:31] (CR) Jenniferwang: "Thanks for the review. Have updated based on your comments." [analytics/refinery] - https://gerrit.wikimedia.org/r/713570 (https://phabricator.wikimedia.org/T287256) (owner: Jenniferwang)
[19:04:56] Analytics-Clusters, Analytics-Kanban: Set up an-web1001 and decommission thorium - https://phabricator.wikimedia.org/T285355 (RKemper) @elukey Agreed, that's definitely something we should hash out with them. I'll hop in their channel and ask about it and report back here as well as in the (fka) analytic...
[19:05:22] (CR) Bearloga: Add the mediawiki_pref_diff event platform stream to the allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/713644 (https://phabricator.wikimedia.org/T287255) (owner: MNeisler)
[19:07:16] Analytics, FR-Tech-Analytics, Privacy Engineering: event.WikipediaPortal referer modification - https://phabricator.wikimedia.org/T279952 (mforns) @sguebo_WMF Hi! I agree with you that masking the referer URL mitigates the privacy risk. For instance, if someone came from a video streaming site and t...
[19:08:13] hello joal :)
[19:08:22] troubleshooooooting?
[19:08:24] mforns: how is now for a quick talk?
[19:08:28] yess
[19:08:38] omw to bc
[19:08:40] I'm shouting trouble, yeah - LOUDLY!
[19:08:48] xD
[19:18:12] (CR) Bearloga: Add the mediawiki_pref_diff event platform stream to the allowlist (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/713644 (https://phabricator.wikimedia.org/T287255) (owner: MNeisler)
[19:49:57] (CR) Mforns: Add the mediawiki_pref_diff event platform stream to the allowlist (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/713644 (https://phabricator.wikimedia.org/T287255) (owner: MNeisler)
[19:53:41] (CR) Mforns: [V: +2 C: +2] "Thanks for the changes! LGTM" [analytics/refinery] - https://gerrit.wikimedia.org/r/713570 (https://phabricator.wikimedia.org/T287256) (owner: Jenniferwang)
[20:27:28] Analytics, Patch-For-Review, Product-Analytics (Kanban), Readers-Web-Backlog (Tracking): Add UniversalLanguageSelector to the allowlist - https://phabricator.wikimedia.org/T287256 (jwang)
[20:31:34] (Abandoned) MNeisler: Add the mediawiki_pref_diff event platform stream to the allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/713644 (https://phabricator.wikimedia.org/T287255) (owner: MNeisler)
[20:33:18] Analytics, Patch-For-Review, Product-Analytics (Kanban): Add mediawiki_pref_diff to the allowlist - https://phabricator.wikimedia.org/T287255 (MNeisler) Blocked on work that will be done in T289622
[20:58:33] (PS1) Michael DiPietro: celery update [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714632
[22:31:06] Analytics, DC-Ops, ops-eqiad: (Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (RobH)
[22:31:58] Analytics, DC-Ops, ops-eqiad: (Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (RobH)
[22:32:05] Analytics, DC-Ops, ops-eqiad: (Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (RobH)
[22:33:43] Analytics, DC-Ops, ops-eqiad: (Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (RobH)
[22:33:58] Analytics, DC-Ops, ops-eqiad: (Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (RobH) a:Jclark-ctr