[08:20:07] Heya btullis - May I ask for a quick help?
[08:44:09] joal: Yes, happy to help. Apols for delay.
[08:45:13] np btullis - elukey chimed in so he started the thing :)
[08:46:00] Cool. What was the thing, just out of interest?
[08:46:07] hello btullis! We just merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/719680/ but another one for role::aqs_next is needed (Joseph on it), I'll leave the rest to you!
[08:46:11] deploying new druid source to AQS
[08:46:23] I basically only merged a puppet change :)
[08:46:37] (so we need to run puppet again, plus the cookbook)
[08:47:09] Ah yes, that's the regular change we have to do every month, isn't it?
[08:47:30] yes btullis
[08:49:00] OK, great. I'll handle the next commit and the cookbook then. Thanks for stepping in elukey.
[08:49:39] s/commit/merge/
[08:50:21] btullis: I just sent a new patch for aqs-next
[08:50:39] Joseph has too many karma credits to get back that when I can I try to help :D
[08:50:52] :)
[09:03:40] btullis: Heya - how is it going on the restart front?
[09:04:03] btullis: I need to leave for some errand and wonder if I'll be needed or not
[09:05:00] Half way through a puppet run on the 12 hosts. (Running sequentially from cumin) and then I'll start the cookbook. I think you're fine to run the errand. I'll post further progress here.
[09:05:28] there is a step to check aqs1004 once depooled though
[09:05:40] ack btullis - Can you tell me if the cookbook has run for aqs1004 for instance? I usually test it after it's done
[09:07:11] Oh right, sorry I missed that. I haven't started the cookbook yet. I can wait until you're back if you like and we can look at it together. Is that the best thing?
[09:07:39] btullis: we can do the test now if ok for you - after my test on aqs1004 it's ready for you to finalize
[09:08:19] btullis: it's usually a matter of seconds
[09:08:49] OK, do you want to Meet at the same time, or just chat here?
[09:09:00] chat is fine :)
[09:09:10] except if you prefer meet btullis :)
[09:10:28] No, that's fine. I have this output for you from the cookbook.
[09:10:28] > >>> Please test aqs on the canary.
[09:10:47] Confirmed btullis - all good for me :)
[09:11:01] Proceeding. Many thanks.
[09:11:19] nah nah, thank you btullis - sorry for the time pressure
[09:11:38] And thank you also elukey for starting the process :)
[09:11:46] gone for errand - back in a bit
[09:11:56] No worries at all. It's useful for me to understand all the processes.
[09:38:51] Analytics, Cassandra, Data-Engineering, Data-Engineering-Kanban, and 2 others: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (hnowlan)
[09:39:17] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Platform Team Workboards (Platform Engineering Reliability): Upgrade the Cassandra AQS cluster to Cassandra 3.11 - https://phabricator.wikimedia.org/T255141 (hnowlan)
[11:02:42] Analytics, Event-Platform, Platform Engineering: EventStreams sending same data over and over (page links change) - https://phabricator.wikimedia.org/T290211 (daniel) Some observations off the top of my head: * If a link update (more specifically, a RefreshLinksJob) fails, it will be re-scheduled. Th...
[11:45:35] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (BTullis) As per the discussion on the patch, I too am not keen to set up passwordless SSH access from masters to worker...
[12:14:11] PROBLEM - Throughput of EventLogging EventError events on alert1001 is CRITICAL: 122.9 ge 30 https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Administration https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=13&fullscreen&orgId=1
[12:21:51] joal: Hey! Have you heard about our DSE hackathon yet?
[12:21:55] RECOVERY - Throughput of EventLogging EventError events on alert1001 is OK: (C)30 ge (W)20 ge 0.8214 https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Administration https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=13&fullscreen&orgId=1
[12:22:11] I've added "un-fork analytics/gobblin" as a potential project: https://docs.google.com/document/d/1g1tPPWuiOTNBsH5-vK-7BEb_Esal3PWmHCosIwPTvd8/edit#heading=h.o01z7iss34px
[12:22:48] Hi gehel :) it's been mentioned in the team, but not in a deep enough way for me to memorize/have ideas about it :)
[12:23:24] There is nothing deep about it yet. We'll take time together to work on whatever seems to make sense.
[12:24:05] I remembered we talked about reorganizing our Gobblin fork, so I thought that might be an interesting project (I might be the only one to find playing with Maven configuration interesting)
[12:24:59] huhu :) Can't speak for others gehel, but I have spent quite some time with Gobblin lately, and would probably try something else, even if the idea is bright :)
[12:25:06] I'll obviously help as needed :)
[12:25:35] gehel: I was thinking about the possibility of experimenting with https://wikitech.wikimedia.org/wiki/Puppet/Pontoon for Analytics and Search
[12:26:35] for Gobblin - anything that has not been upstreamed could be, reducing as much as possible WMF-specific things (if possible)
[12:27:40] elukey: first time I hear about pontoon! Seems pretty cool!
[12:27:45] Feel free to add it to the list!
[12:30:03] I'll add comments to what you are writing :)
[12:30:20] :)
[12:30:28] I'll let you take it over from here!
[12:38:04] done!
[12:44:30] thanks!
[12:45:06] gehel: thank you for the gobblin idea - it would be great to unfork :)
[15:05:27] Analytics, Platform Team Workboards (Image Suggestion API): Airflow collaborations - https://phabricator.wikimedia.org/T282033 (mforns)
[15:05:40] Analytics: Agree on a repository structure for Airflow-related code - https://phabricator.wikimedia.org/T290664 (mforns)
[16:00:02] heya a-team :] I created this task ^ for us to discuss the repository structure for Airflow-related code. I didn't want to invade your inbox by subscribing you directly, but please feel free to subscribe and discuss! Thanks :-)
[16:07:45] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Platform Team Workboards (Platform Engineering Reliability): Upgrade the Cassandra AQS cluster to Cassandra 3.11 - https://phabricator.wikimedia.org/T255141 (BTullis) a:hnowlan→BTullis
[16:07:59] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Upgrade Superset to 1.2 - https://phabricator.wikimedia.org/T288115 (razzi)
[16:08:19] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Upgrade Superset to 1.2 - https://phabricator.wikimedia.org/T288115 (razzi) 1.3 is out!! https://pypi.org/project/apache-superset/
[16:08:43] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Upgrade Superset to 1.3 - https://phabricator.wikimedia.org/T288115 (razzi)
[16:09:15] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Platform Team Workboards (Platform Engineering Reliability): Upgrade the Cassandra AQS cluster to Cassandra 3.11 - https://phabricator.wikimedia.org/T255141 (BTullis) p:Triage→High
[16:36:45] FYI: I have started the import of a 600 GB table into cassandra3. The biggest previous has been 11 GB.
[16:37:19] This is: `local_group_default_T_mediarequest_per_file/data` and the snapshot that I'm importing should be 1/4 of the entire table.
[16:38:12] btullis: I looked a bit in sstableloader and indeed, it reshuffles data as needed (which means it probably does for us)
[16:38:32] Great, thanks.
[16:39:13] Analytics: Agree on a repository structure for Airflow-related code - https://phabricator.wikimedia.org/T290664 (odimitrijevic) p:Triage→High a:mforns
[16:41:35] Analytics, Analytics-Wikistats: Wikimedia Statistics - Horizontal (time) axis wrongly formatted when the option "Monthly" is choosen - https://phabricator.wikimedia.org/T290551 (odimitrijevic) p:Triage→Low
[16:43:37] Analytics, Analytics-Kanban, Patch-For-Review: Update mediawiki-history jobs spark settings - https://phabricator.wikimedia.org/T290469 (odimitrijevic) p:Triage→High
[16:45:11] Analytics, Analytics-EventLogging, Event-Platform, Wikidata, and 3 others: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy - https://phabricator.wikimedia.org/T290303 (odimitrijevic) p:Triage→High a:mforns
[16:46:55] Analytics-EventLogging, Analytics-Radar, Epic: Review and evolve client environment around EventLogging - https://phabricator.wikimedia.org/T240462 (odimitrijevic)
[16:47:37] Analytics: Jupyter notebook logs should appear in Logstash - https://phabricator.wikimedia.org/T288348 (BTullis) p:Triage→Low a:BTullis This is a follow-up to {T287339}
[16:48:15] Analytics, Analytics-Kanban: Check AQS with cassandra (serving + data) - https://phabricator.wikimedia.org/T290068 (odimitrijevic) p:Triage→High
[16:52:21] Analytics, Data-Engineering: SPIKE - Will Hadoop 3 container support help us for Airflow deployment pipelines? - https://phabricator.wikimedia.org/T288247 (odimitrijevic) Container support has been backported to the version that we are using - 2.10
[16:52:49] Analytics: Check home/HDFS leftovers of fdans - https://phabricator.wikimedia.org/T290231 (odimitrijevic) p:Triage→High
[16:55:09] Analytics: Check home/HDFS leftovers of jkatz - https://phabricator.wikimedia.org/T287235 (odimitrijevic) @kzimmerman does any of the data above need to be kept and transferred to someone else?
[16:55:41] Analytics, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users group for Abban Dunne - https://phabricator.wikimedia.org/T289775 (odimitrijevic) Approved
[17:11:10] joal: To confirm, after this table imports I will switch to a model of completing a *table* import (from all remaining snapshots), as opposed to completing each snapshot in sequence.
[17:12:09] btullis: let's spend a minute in the cave to make sure we're on same page? I actually also have a couple of questions :)
[17:20:39] btullis: actually, let's sync on that tomorrow - it's late already
[17:30:35] Analytics: Check home/HDFS leftovers of jkatz - https://phabricator.wikimedia.org/T287235 (kzimmerman) @odimitrijevic I think we should delete it. There's nothing that would be in use for other data workflows.
[17:55:45] * mforns going out for some errands, will be back later tonight
[19:18:17] Analytics: Agree on a repository structure for Airflow-related code - https://phabricator.wikimedia.org/T290664 (gmodena) Hey @mforns thanks for starting this. To keep complexity low to begin with, I'd be keen to start with a monorepo like the pattern proposed in Option 1. IMHO it's easier to split things...
[19:41:03] (PS1) Andrew Bogott: Added test_health.py [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359)
[19:44:40] (CR) jerkins-bot: [V: -1] Added test_health.py [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359) (owner: Andrew Bogott)
[19:48:47] (PS2) Andrew Bogott: Added test_health.py [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359)
[19:50:04] (CR) jerkins-bot: [V: -1] Added test_health.py [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359) (owner: Andrew Bogott)
[20:13:55] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Add dimensions to editors_daily dataset - https://phabricator.wikimedia.org/T256050 (mpopov) p:High→Medium Lowering the priority on this. @cchen can you please touch on this in the transition of your dataset & high-level m...
[20:15:38] Analytics: Agree on a repository structure for Airflow-related code - https://phabricator.wikimedia.org/T290664 (mforns) @gmodena thanks for chiming in! > To keep complexity low to begin with, I'd be keen to start with a monorepo like the pattern proposed in Option 1. IMHO it's easier to split things up, ra...
[20:45:12] Analytics, Product-Analytics: Internal nbviewer instance for sharing notebooks among 'wmf' and 'nda' members - https://phabricator.wikimedia.org/T290693 (mpopov) p:Triage→Low
[20:47:07] Analytics, Analytics-SWAP: Functionality to share & view SWAP notebooks - https://phabricator.wikimedia.org/T156934 (mpopov)
[20:48:15] Analytics, Analytics-SWAP, Product-Analytics: Functionality to share & view notebooks - https://phabricator.wikimedia.org/T156934 (mpopov) p:Medium→Low
[20:48:48] Analytics, Analytics-SWAP, Product-Analytics: Functionality to share & view notebooks - https://phabricator.wikimedia.org/T156934 (mpopov)
[20:48:51] Analytics, Product-Analytics: Internal nbviewer instance for sharing notebooks among 'wmf' and 'nda' members - https://phabricator.wikimedia.org/T290693 (mpopov)
[20:49:43] Analytics, Product-Analytics: Internal nbviewer instance for sharing notebooks among 'wmf' and 'nda' members - https://phabricator.wikimedia.org/T290693 (mpopov)
[21:01:26] (CR) Andrew Bogott: "recheck" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359) (owner: Andrew Bogott)
[21:47:54] Analytics, Event-Platform, Platform Engineering: EventStreams sending same data over and over (page links change) - https://phabricator.wikimedia.org/T290211 (Pchelolo) a:Pchelolo
[21:48:20] Analytics, Product-Analytics: Internal nbviewer instance for sharing notebooks among 'wmf' and 'nda' members - https://phabricator.wikimedia.org/T290693 (Urbanecm) When I need this sort of safe storage, I make use of https://people.wikimedia.org/~urbanecm/nda. people.wikimedia.org allows anyone with prod...
[22:06:21] Analytics, Product-Analytics (Kanban): [REQUEST] Investigate decrease in New Registered Users - https://phabricator.wikimedia.org/T289799 (Iflorez)
[22:07:32] Analytics-Radar, Product-Analytics (Kanban): [REQUEST] Investigate decrease in New Registered Users - https://phabricator.wikimedia.org/T289799 (Iflorez)
[22:12:55] Analytics-Radar, Product-Analytics (Kanban): [REQUEST] Investigate decrease in New Registered Users - https://phabricator.wikimedia.org/T289799 (Iflorez) Hi All, I'm investigating this. Following my recent check-in with @mpopov, my next step is to touch base with @nettrom_WMF when he returns on Monday...
[22:36:03] (CR) Bstorm: add stop status (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/719567 (https://phabricator.wikimedia.org/T289349) (owner: Michael DiPietro)
[22:48:23] (CR) Bstorm: "As is, this still seems to provide a "failed" status in my local testing. I deleted my images and made sure it remade them." [analytics/quarry/web] - https://gerrit.wikimedia.org/r/719567 (https://phabricator.wikimedia.org/T289349) (owner: Michael DiPietro)
[22:50:01] (CR) Bstorm: add stop status (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/719567 (https://phabricator.wikimedia.org/T289349) (owner: Michael DiPietro)
[22:54:07] (CR) Bstorm: "That compiled js should be generated with nunjucks. In the readme, it's supposed to go like: `nunjucks-precompile quarry/web/static/templa" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/719567 (https://phabricator.wikimedia.org/T289349) (owner: Michael DiPietro)
[22:55:09] (CR) Bstorm: add stop status (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/719567 (https://phabricator.wikimedia.org/T289349) (owner: Michael DiPietro)