[00:15:13] (CR) Gergő Tisza: [C: +2] Add navigation_type action_data [schemas/event/secondary] - https://gerrit.wikimedia.org/r/761742 (https://phabricator.wikimedia.org/T301486) (owner: MewOphaswongse)
[00:15:50] (Merged) jenkins-bot: Add navigation_type action_data [schemas/event/secondary] - https://gerrit.wikimedia.org/r/761742 (https://phabricator.wikimedia.org/T301486) (owner: MewOphaswongse)
[00:49:48] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Wikistats pageview data missing counts for Mobile App pageviews on Commons, going back to 2020-11 - https://phabricator.wikimedia.org/T299439 (SNowick_WMF) Thanks for looking into this @JAllemandou....
[00:50:58] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban, and 4 others: Wikistats pageview data missing counts for Mobile App pageviews on Commons, going back to 2020-11 - https://phabricator.wikimedia.org/T299439 (SNowick_WMF)
[01:34:17] (CR) Sharvaniharan: Add a required variable to app analytics fragment (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/761452 (https://phabricator.wikimedia.org/T299239) (owner: Sharvaniharan)
[01:51:19] (PS11) Sharvaniharan: Add a required variable to app analytics fragment [schemas/event/secondary] - https://gerrit.wikimedia.org/r/761452
[01:51:49] (CR) Sharvaniharan: Add a required variable to app analytics fragment (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/761452 (owner: Sharvaniharan)
[08:23:04] (PS1) Aqu: Migrate wikidata/item_page_link/weekly [analytics/refinery/source] - https://gerrit.wikimedia.org/r/761876 (https://phabricator.wikimedia.org/T300023)
[08:58:44] (CR) Gehel: [C: +1] "LGTM" [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/761400 (owner: Joal)
[10:12:32] Analytics-Clusters, Data-Engineering, Data-Engineering-Kanban, Cassandra, and 3 others: Switch over the Cassandra AQS cluster to the new hosts - https://phabricator.wikimedia.org/T297803 (BTullis) I've created a patch to remove the old cassandra 2 hosts from conftool, so that they cannot be accid...
[10:17:17] Data-Engineering, Data-Catalog: Data Catalog Deployment Plan [Mile Stone 2] - https://phabricator.wikimedia.org/T299888 (BTullis) Here is the work that I have done so far on the [[https://docs.google.com/document/d/1EDXwh4WPDp-nYzV-Rvy01x8s1fb9drxLnzihRrgNAeM/edit|Design Document for the Data Catalog MVP...
[10:20:40] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Deploy DataHub in MVP phase - https://phabricator.wikimedia.org/T301385 (BTullis) a:BTullis
[10:21:56] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (BTullis) a:BTullis I am beginning to look at this task now.
[10:22:17] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (BTullis) p:Triage→High
[10:22:34] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Deploy DataHub in MVP phase - https://phabricator.wikimedia.org/T301385 (BTullis) p:Triage→High
[10:23:10] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Define the Kubernetes Deployments for Datahub - https://phabricator.wikimedia.org/T301454 (BTullis)
[10:23:12] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (BTullis)
[11:05:00] (CR) Phuedx: Metrics Platform event schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[12:35:00] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:35:16] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (akosiaris) Adding Release Engineering team for their awareness and help with the integration/config repo that will be required...
[12:41:51] Data-Engineering, FR-Tech-Analytics, Privacy Engineering: event.WikipediaPortal referer modification - https://phabricator.wikimedia.org/T279952 (EYener) Hi all! Pinging @mforns who was helpful with the original whitelist request from phab ticket T273246. I recall that we had some discussion over whe...
[12:45:47] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:40:31] Data-Engineering-Radar, Gerrit-Privilege-Requests, Release-Engineering-Team: Requesting membership of the analytics group in gerrit for 'btullis' - https://phabricator.wikimedia.org/T300631 (akosiaris) Adding a couple of people for more visibility
[13:41:14] Data-Engineering-Radar, Gerrit-Privilege-Requests, Release-Engineering-Team: Requesting membership of the analytics group in gerrit for 'btullis' - https://phabricator.wikimedia.org/T300631 (BTullis) Apologies for the nudge, but I'd really appreciate it if we could grant these rights please. I'd like...
[13:54:05] Data-Engineering-Radar, Gerrit-Privilege-Requests, Release-Engineering-Team: Requesting membership of the analytics group in gerrit for 'btullis' - https://phabricator.wikimedia.org/T300631 (Majavah) > I'd like to create a new repo in gerrit under analytics/ and I can't do so at the moment. Project...
[13:59:39] Data-Engineering-Radar, Gerrit-Privilege-Requests, Release-Engineering-Team: Requesting membership of the analytics group in gerrit for 'btullis' - https://phabricator.wikimedia.org/T300631 (BTullis) OK, thanks @Majavah - I'll add my request to that page now.
[14:09:58] (PS2) Michael DiPietro: minikube helm chart [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761631 (https://phabricator.wikimedia.org/T301469)
[14:15:21] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (akosiaris) >>! In T300977#7700987, @Ottomata wrote: > Hahah, maybe what we should do is excludelist the internal doma...
[14:15:29] (CR) jerkins-bot: [V: -1] minikube helm chart [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761631 (https://phabricator.wikimedia.org/T301469) (owner: Michael DiPietro)
[14:17:41] Data-Engineering-Radar, Gerrit-Privilege-Requests, Release-Engineering-Team: Requesting membership of the analytics group in gerrit for 'btullis' - https://phabricator.wikimedia.org/T300631 (BTullis) I've added my repository request to [[https://www.mediawiki.org/wiki/Gerrit/New_repositories/Requests...
[14:21:47] (CR) Ottomata: "Sounds good, this latest patch will work. Keeping is_anon and deprecating it in favor of another." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/761452 (owner: Sharvaniharan)
[14:24:46] (CR) Ottomata: Metrics Platform event schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[14:30:30] (PS3) Michael DiPietro: minikube helm chart [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761631 (https://phabricator.wikimedia.org/T301469)
[14:33:41] (CR) jerkins-bot: [V: -1] minikube helm chart [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761631 (https://phabricator.wikimedia.org/T301469) (owner: Michael DiPietro)
[14:33:52] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (akosiaris) >>! In T300977#7701074, @mpopov wrote: > A couple of questions/comments: > >>>! In T300977#7700842, @jbon...
[14:44:20] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (BTullis) Thanks @akosiaris - I've done several things to starting kicking this off: * I've requested a new Gerrit repository...
[14:47:57] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis)
[14:47:59] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Deploy DataHub in MVP phase - https://phabricator.wikimedia.org/T301385 (BTullis)
[14:49:14] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis) a:BTullis Here is the [[https://docs.google.com/document/d/1EDXwh4WPDp-nYzV-Rvy01x8s1fb9drxLnzihRrgNAeM/edit|design document]] for this MVP deployment.
[14:50:25] Data-Engineering-Kanban, Data-Catalog: Set up opensearch cluster for datahub - https://phabricator.wikimedia.org/T301382 (BTullis)
[14:50:27] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Deploy DataHub in MVP phase - https://phabricator.wikimedia.org/T301385 (BTullis)
[14:50:29] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis)
[14:51:01] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Proof-of-concept Karapace as Confluent schema registry replacement - https://phabricator.wikimedia.org/T301386 (BTullis)
[14:51:03] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Deploy DataHub in MVP phase - https://phabricator.wikimedia.org/T301385 (BTullis)
[14:51:05] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis)
[14:51:27] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (BTullis)
[14:51:29] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Deploy DataHub in MVP phase - https://phabricator.wikimedia.org/T301385 (BTullis)
[14:51:31] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis)
[14:51:49] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Define the Kubernetes Deployments for Datahub - https://phabricator.wikimedia.org/T301454 (BTullis)
[14:51:51] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Deploy DataHub in MVP phase - https://phabricator.wikimedia.org/T301385 (BTullis)
[14:51:53] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis)
[14:52:19] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Define LVS load-balancing for OpenSearch cluster - https://phabricator.wikimedia.org/T301458 (BTullis)
[14:52:21] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Deploy DataHub in MVP phase - https://phabricator.wikimedia.org/T301385 (BTullis)
[14:52:23] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis)
[14:52:43] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Define LVS load-balancing for OpenSearch cluster - https://phabricator.wikimedia.org/T301458 (BTullis)
[14:52:45] Data-Engineering-Kanban, Data-Catalog: Set up opensearch cluster for datahub - https://phabricator.wikimedia.org/T301382 (BTullis)
[14:53:01] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Configure MariaDB database for DataHub on an-coord1001 - https://phabricator.wikimedia.org/T301459 (BTullis)
[14:53:03] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis)
[14:53:25] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Update DNS for the DataHub MVP services - https://phabricator.wikimedia.org/T301460 (BTullis)
[14:53:27] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Deploy DataHub in MVP phase - https://phabricator.wikimedia.org/T301385 (BTullis)
[14:53:29] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis)
[14:54:32] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Configure CAS-SSO authentication for the DataHub frontend - https://phabricator.wikimedia.org/T301462 (BTullis)
[14:54:34] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Deploy DataHub in MVP phase - https://phabricator.wikimedia.org/T301385 (BTullis)
[14:54:36] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis)
[14:55:16] Sorry about all the phab-spam.
[14:59:01] Data-Engineering, Airflow: Add data-quality to airflow DAGs' name - https://phabricator.wikimedia.org/T300054 (mforns) @JAllemandou I changed the name from data quality to anomaly detection, because this job family (covered by the AnomalyDetection DAG factory) includes things like the traffic anomaly che...
[14:59:04] Data-Engineering, Data-Catalog, Epic: Set up karapace instance for datahub - https://phabricator.wikimedia.org/T301562 (BTullis)
[14:59:56] Data-Engineering, Airflow: [Airflow] Add DAG subfolder name to error email's subject - https://phabricator.wikimedia.org/T300054 (mforns)
[15:03:03] Data-Engineering, Data-Catalog, Epic: Set up karapace instance for datahub - https://phabricator.wikimedia.org/T301562 (BTullis)
[15:04:23] Data-Engineering, Airflow: [Airflow] Add DAG subfolder name to error email's subject - https://phabricator.wikimedia.org/T300054 (mforns) Actually, having all tags in the error email subject, might not be a good idea... It would tie us to using just 1 tag per DAG, otherwise the email subject would be too...
[15:05:57] Data-Engineering, Data-Catalog, Epic: Set up karapace instance for datahub - https://phabricator.wikimedia.org/T301562 (BTullis) a:BTullis→razzi Assigning to @razzi if that's OK.
[15:06:49] !log set hive.warehouse.subdir.inherit.perms = false - T291664
[15:06:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:06:52] T291664: Set hive.warehouse.subdir.inherit.perms to false - https://phabricator.wikimedia.org/T291664
[15:08:12] ottomata: --^ Nice!
[15:08:46] i thought about not merging it before weekend..but it should only ever matter when people make new databases
[15:08:48] in hive
[15:08:54] annnnnd the fallout is easy to fix if it doesn't work
[15:08:59] so MERGED. :)
[15:09:15] Bosh!
[15:10:05] Data-Engineering, Data-Catalog, Epic: Create debian package of karapace - https://phabricator.wikimedia.org/T301565 (BTullis)
[15:10:17] Data-Engineering, Data-Catalog: Create debian package of karapace - https://phabricator.wikimedia.org/T301565 (BTullis)
[15:10:49] Data-Engineering, Data-Catalog: Set up karapace instance for datahub - https://phabricator.wikimedia.org/T301562 (BTullis)
[15:11:32] Data-Engineering, Data-Catalog: Create debian package of karapace - https://phabricator.wikimedia.org/T301565 (BTullis) a:razzi→None
[15:11:51] Analytics, Data-Engineering, Event-Platform, Patch-For-Review, Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata) @ovasileva curious for an update, how's this going?
[15:23:01] joal: o/ i have a little space today, let me know if i can help with gobblin or anything
[15:49:49] Data-Engineering, Data-Engineering-Kanban, Airflow, Epic: Airflow MVP - https://phabricator.wikimedia.org/T288263 (mforns) This should also be done no?
[15:58:35] Data-Engineering-Kanban, Data-Catalog: Set up opensearch cluster for datahub - https://phabricator.wikimedia.org/T301382 (BTullis) I added `datahubsearch` as a new prefix for servers here: https://wikitech.wikimedia.org/wiki/SRE/Infrastructure_naming_conventions#Servers So we can call these servers: data...
[16:09:13] Data-Engineering, Data-Engineering-Kanban, Airflow, Discovery-Search, and 4 others: Write an Airflow job converting commons structured data dump to Hive - https://phabricator.wikimedia.org/T299059 (mforns)
[16:14:21] Data-Engineering, Airflow: [Airflow] Research, discuss and decide on DAG/task dependencies VS. success/failure files (Oozie style) - https://phabricator.wikimedia.org/T301568 (mforns)
[16:23:57] Data-Engineering: [Anomaly detection] Allow for custom email alert content - https://phabricator.wikimedia.org/T301571 (mforns)
[16:37:23] Data-Engineering: [Anomaly detection] Create a heatmap view in Superset - https://phabricator.wikimedia.org/T301572 (mforns)
[16:44:04] Hi ottomata - thanks for offering, I have not restarted my gobblin-metrics stuff yet :S
[16:45:59] Data-Engineering, Airflow: [Airflow] Add DAG subfolder name to error email's subject - https://phabricator.wikimedia.org/T300054 (JAllemandou) Thanks a lot for looking into this @mforns :) Prefixing is not "pretty", but it's a low-tech win :)
[17:03:24] (PS24) Phuedx: Metrics Platform event schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[17:55:27] hi all! quick question... is any data about requests served by each data center stored beyond the 90-day purge window anywhere? thx in advance!!
[18:16:14] Data-Engineering, FR-Tech-Analytics, Privacy Engineering: event.WikipediaPortal referer modification - https://phabricator.wikimedia.org/T279952 (mforns) Hey @EYener :-) The country_code is indeed already in the geocoded_data field, so I don't think that there are any privacy concerns in allow-listin...
[18:17:32] Data-Engineering, FR-Tech-Analytics, Privacy Engineering: event.WikipediaPortal referer modification - https://phabricator.wikimedia.org/T279952 (EYener) Awesome thank you @mforns! @JMando and I can tackle this over the next few weeks.
[18:18:19] Data-Engineering, FR-Tech-Analytics, Privacy Engineering: event.WikipediaPortal referer modification - https://phabricator.wikimedia.org/T279952 (mforns) :+1:
[18:18:19] ottomata joal btullis ^ ? (apologies for the Friday ping!)
[18:18:52] AndyRussG: not that I know of? I don't think datacenter is kept in the pageview metrics
[18:18:58] and we don't keep any raw webrequests after 90 days
[18:19:05] event data does have source datacenter as a partition
[18:19:19] so you probably could get it for any of those, like revision-creates
[18:19:25] but not for just general webrequests
[18:20:57] ottomata ohh event data has it, fantastic... definitely helpful!!!
[18:22:37] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (jeena) Linking this tutorial for the deployment pipeline in case you find it helpful: https://wikitech.wikimedia.org/wiki/Depl...
[18:24:21] ottomata ah right and some events also produce database-stored traces... so with enough of those, we could get a pretty reliable estimate of the recent data loss by country, wiki project, access method, maybe even some other variables, even beyond the webrequest purge window!
[18:44:57] ottomata anyway much stuff to possibly dig into, thx so much, hugely appreciated!!
[18:47:32] ottomata: have you seen the Airflow Skein logs issue? I responded to it in the email thread, any ideas?
[18:49:08] Data-Engineering-Radar, Gerrit-Privilege-Requests, Release-Engineering-Team: Requesting membership of the analytics group in gerrit for 'btullis' - https://phabricator.wikimedia.org/T300631 (thcipriani) Open→Resolved a:thcipriani > I think that I should be in the 'Analytics' group in gerr...
[19:02:23] Data-Engineering-Radar, Gerrit-Privilege-Requests, Release-Engineering-Team: Requesting membership of the analytics group in gerrit for 'btullis' - https://phabricator.wikimedia.org/T300631 (BTullis) Many thanks.
[19:11:17] (PS4) Michael DiPietro: minikube helm chart [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761631 (https://phabricator.wikimedia.org/T301469)
[19:14:55] (CR) jerkins-bot: [V: -1] minikube helm chart [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761631 (https://phabricator.wikimedia.org/T301469) (owner: Michael DiPietro)
[19:19:07] Data-Engineering, SRE: Trash cleanup cron spams on an-test hosts - https://phabricator.wikimedia.org/T286442 (Dzahn)
[20:01:39] (PS1) Michael DiPietro: noop [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761984