[05:37:47] Data-Engineering, Data-Engineering-Kanban: reset kerberos password - https://phabricator.wikimedia.org/T303146 (odimitrijevic) p:Triage→High
[05:43:33] Data-Engineering, MW-on-K8s, serviceops: IPInfo MediaWiki extension depends on presence of maxmind db in the container/host - https://phabricator.wikimedia.org/T288375 (odimitrijevic)
[11:00:14] Data-Engineering, Data-Catalog: Streamline CI for our fork of DataHub - https://phabricator.wikimedia.org/T303381 (BTullis)
[11:56:45] Data-Engineering, MediaWiki-extensions-WikimediaEvents, Product-Analytics: Remove InputDeviceDynamics EventLoggingSchemas entry - https://phabricator.wikimedia.org/T302896 (phuedx) >>! In T302896#7761270, @kzimmerman wrote: > @phuedx is there something here that might impact our work in Product Analy...
[12:18:33] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis)
[12:18:36] Data-Engineering, Data-Catalog: Streamline CI for our fork of DataHub - https://phabricator.wikimedia.org/T303381 (BTullis)
[12:20:45] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Patch-For-Review: Define the Helm charts and helmfile deployments for Datahub - https://phabricator.wikimedia.org/T301454 (BTullis) I have now got the helm-lint checks to pass in CI: https://gerrit.wikimedia.org/r/c/operations/deployment-ch...
[14:08:00] Data-Engineering, Data-Catalog: Streamline CI for our fork of DataHub - https://phabricator.wikimedia.org/T303381 (Ottomata) Another idea: don't build the project when building the images. Our ‘fork’ would just have some tooling to build (and upload?) a release of the build artifacts. A separate repo...
[14:18:14] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Define LVS load-balancing for OpenSearch cluster - https://phabricator.wikimedia.org/T301458 (BTullis) I'm not sure yet why it's not working.
This is my test against the backend server: ` btullis@lvs1019:~$ curl http://datahubsearch.svc.eqiad....
[14:20:46] milimetric: yt?
[14:21:19] trying to figure out why this coord is late
[14:21:20] https://hue.wikimedia.org/hue/jobbrowser/#!id=0019083-210107075406929-oozie-oozi-C
[14:21:33] hey ottomata
[14:21:36] I can help on that
[14:21:37] the mediawiki_private are sqoop jobs?
[14:21:40] oh okay thanks joal
[14:21:49] no they aren't sqoop
[14:21:49] hm
[14:22:48] ottomata: this job is dependent on geoeditors_monthly job, and the latter has been re-written as an airflow job
[14:23:07] ottomata: it's my bad not having tracked that dependency in my evaluation of risk
[14:23:18] ah!
[14:34:35] Data-Engineering, Data-Catalog: Streamline CI for our fork of DataHub - https://phabricator.wikimedia.org/T303381 (BTullis) Yes, that sounds like it would work. In fact I'm not sure that we're going to have to modify the code much anyway for the MVP. The one file that I know we will have to change is the...
[14:37:41] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) The helm charts and helmfile deployment are now passing the CI `helm-lint` stage.
[14:42:56] o/ ottomata, I'm back from dropping off kids ~9:30 every day, today a bit later
[14:43:20] can I still help or did jo beat me to it?
[14:43:42] milimetric: looks like jo al had the answer, no worries!
[14:45:10] joal: okay so nothing to be done?
[14:46:20] .7
[14:46:22] uff
[14:46:23] :)
[14:48:17] ottomata: I'll sync with mforns about the job we moved to airflow (IIRC it had issues), and will ask Ntsako to work on the following jobs if possible
[14:48:25] okay gr8
[14:48:25] ty
[14:48:38] joal: I'm here, can I help?
[14:49:05] Hi mforns :)
[14:49:12] helloooo :]
[14:49:23] I have questions about the geoeditors-monthly job moved to airflow
[14:49:32] ok
[14:49:36] wanna cave?
[14:49:57] mforns: arf, Naé just woke up - will ping you later
[14:50:03] sorry
[14:50:03] ok, no problemo
[14:50:09] 👍
[15:01:13] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Define LVS load-balancing for OpenSearch cluster - https://phabricator.wikimedia.org/T301458 (BTullis) OK, all I had to do was to restart the `opensearch_1@datahub` service on the three datahubsearch servers. They were already configured with `...
[15:16:30] Analytics-Radar, Data-Engineering, Event-Platform, MediaWiki-Recent-changes, and 2 others: Remove deprecated RCFeedEngine support - https://phabricator.wikimedia.org/T250628 (Krinkle) Next step (after branch cut) removal of the deprecated portions.
[15:17:10] Analytics-Radar, Data-Engineering, Event-Platform, MediaWiki-Recent-changes, and 2 others: Remove deprecated RCFeedEngine support - https://phabricator.wikimedia.org/T250628 (Krinkle)
[15:56:07] Data-Engineering, Data-Engineering-Kanban, Airflow: [Airflow] Create success_file operator - https://phabricator.wikimedia.org/T303405 (ntsako)
[15:58:46] Data-Engineering, Data-Catalog: Set up karapace instance for datahub - https://phabricator.wikimedia.org/T301562 (BTullis) a:razzi→BTullis I have created the VM in T301563 - Now proceeding to boot the machine and make a suitable role.
[16:05:10] milimetric: mforns task generator idea: https://gist.github.com/ottomata/c4e81ac91f215c24e4553d874d9543b1
[16:07:37] I like!
[16:14:19] Data-Engineering, ContentTranslation, Language-analytics, Product-Analytics: Abuse filter analytics dashboard is broken - https://phabricator.wikimedia.org/T302970 (MNeisler) Update: @Amire80 was unable to fix. He tried running `hdfs dfs -chmod -R o+r /user/amire80/data/cx_abuse_filter_daily` to...
[16:15:48] ottomata / mforns: hm, both >> and << return the right operand.
So that's pretty weird, in that solution we'd have to be careful to always return the task furthest downstream
[16:15:57] *right operand: rightmost
[16:16:33] !log fix group ownership of wmf_product.db/pageviews_corrected/year=2022/month=2 after reverting T291664 - sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /user/hive/warehouse/wmf_product.db/pageviews_corrected/year=2022/month=2
[16:16:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:16:36] T291664: Set hive.warehouse.subdir.inherit.perms to false - https://phabricator.wikimedia.org/T291664
[16:34:00] Data-Engineering, Data-Catalog: Set up karapace instance for datahub - https://phabricator.wikimedia.org/T301562 (BTullis)
[16:56:47] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (JMeybohm) >>! In T303049#7753274, @BTullis wrote: > How can I tell what the source IP address(es) of my services will be, as seen by the back...
[17:04:26] re: airflow task generator, I was thinking something like this, close to Andrew's. I think in either style we wouldn't have to set the dag for each task, and give flexibility to the callers: https://gist.github.com/milimetric/df0b4779bea4b1d0e8cdfb9bf867a328
[17:17:15] interesting, except you need to set the dag somewhere
[17:21:50] ottomata, btullis o/ are we doing the sync today?
[17:22:43] elukey: I think so, unless you're busy. We could skip if you like. I haven't got anything that I'm burning to discuss.
[17:22:47] yes let's do it
[17:22:58] even if just 15 mins to say hi
[17:23:21] ack
[17:30:59] elukey: btullis
[17:31:03] i changed the meeting url
[17:31:08] so it doesn't use batcave
[17:31:19] meet.google.com/fhr-ymka-izz
[17:46:58] hi mforns - I have an airflow related question - would you have a minute?
[17:47:28] joal: in a meeting, can I ping you when done?
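The 16:15 remark about `>>` and `<<` can be illustrated with a small stand-in. This is only a sketch: the `Task` class below is hypothetical and is not the Airflow operator class; it just mimics the behaviour described in the chat, where both operators hand back their right operand.

```python
class Task:
    """Hypothetical stand-in for an Airflow task; not the real Airflow class."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, other):
        # Record that `other` runs after `self`.
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def set_upstream(self, other):
        # Record that `other` runs before `self`.
        other.set_downstream(self)

    def __rshift__(self, other):
        # a >> b: b runs after a; the expression evaluates to b.
        self.set_downstream(other)
        return other

    def __lshift__(self, other):
        # a << b: b runs before a; the expression still evaluates to b.
        self.set_upstream(other)
        return other


a, b, c = Task("a"), Task("b"), Task("c")
last = a >> b >> c      # chain a -> b -> c; `last` is c, the furthest downstream

x, y = Task("x"), Task("y")
result = x << y         # y -> x; `result` is y, which is upstream of x
```

With `>>` chains the value of the expression happens to be the furthest-downstream task, but with `<<` it is the upstream one, which is why a task generator built on these return values would need to pick its return value deliberately.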
[17:47:33] sure!
[18:01:54] (CR) Bearloga: [C: +2] Add a required field in mobile_apps fragment [schemas/event/secondary] - https://gerrit.wikimedia.org/r/766897 (owner: Sharvaniharan)
[18:02:34] (Merged) jenkins-bot: Add a required field in mobile_apps fragment [schemas/event/secondary] - https://gerrit.wikimedia.org/r/766897 (owner: Sharvaniharan)
[18:04:00] joal: ping?
[18:04:07] batcave mforns ?
[18:04:13] omw!
[18:13:01] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) > Those will be the IP ranges of the different k8s clusters (the non-ML ones). You can look those up in netbox: https://netbox.wikim...
[18:19:28] !log btullis@ganeti1024:~$ sudo gnt-instance start karapace1001.eqiad.wmnet (T301562)
[18:19:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:19:30] T301562: Set up karapace instance for datahub - https://phabricator.wikimedia.org/T301562
[18:20:08] Data-Engineering, Data-Catalog: Set up karapace instance for datahub - https://phabricator.wikimedia.org/T301562 (BTullis) p:Triage→High Booted karapace1001 into insetup role.
[18:24:32] Data-Engineering-Kanban, Data-Catalog, Patch-For-Review: Set up opensearch cluster for datahub - https://phabricator.wikimedia.org/T301382 (BTullis) Open→Resolved
[18:24:34] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis)
[18:24:45] milimetric: in your idea where would you pass the dag to the tasks?
[18:25:37] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Update DNS for the DataHub MVP services - https://phabricator.wikimedia.org/T301460 (BTullis) p:Triage→High a:BTullis
[18:26:08] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Configure CAS-SSO authentication for the DataHub frontend - https://phabricator.wikimedia.org/T301462 (BTullis) p:Triage→High a:BTullis
[18:28:16] joal: am here if you want to talk gobblin; i don't have anything to figure out atm other than my troubles with pushing metrics
[18:32:30] !log fix group ownership of wmf_product.db/global_markets_pageviews/year=2022/month=2 after reverting T291664 - sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /user/hive/warehouse/wmf_product.db/global_markets_pageviews/year=2022/month=2
[18:32:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:32:33] T291664: Set hive.warehouse.subdir.inherit.perms to false - https://phabricator.wikimedia.org/T291664
[18:33:40] !log fix group ownership of wmf_product.db/new_editors/cohort=2021-12 after reverting T291664 - sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /user/hive/warehouse/wmf_product.db/new_editors/cohort=2021-12
[18:33:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:34:39] Data-Engineering, Data-Engineering-Kanban, Airflow: Low Risk Oozie Migration: session length - https://phabricator.wikimedia.org/T300029 (Antoine_Quhen)
[18:37:18] ottomata: here!
[18:37:23] ottomata: batcave?
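The `!log` entries above all run the same `hdfs dfs -chgrp -R` command with a different warehouse path. As a rough sketch, the command line could be assembled like this; the helper name `chgrp_command` is made up for illustration, and the snippet only builds the argument list without running anything.

```python
import shlex

# Hypothetical helper: builds (but does not run) the chgrp command seen in the log.
def chgrp_command(path, group="analytics-privatedata-users"):
    return [
        "sudo", "-u", "hdfs",
        "kerberos-run-command", "hdfs",
        "hdfs", "dfs", "-chgrp", "-R", group, path,
    ]

path = "/user/hive/warehouse/wmf_product.db/new_editors/cohort=2021-12"
cmd = chgrp_command(path)
printable = shlex.join(cmd)  # shell-safe rendering, e.g. for pasting into a !log line
```

Keeping the command as a list rather than a single string avoids quoting mistakes if it is later passed to `subprocess.run`.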
[18:44:49] OH
[18:44:50] coming
[19:07:08] Data-Engineering, Data-Engineering-Kanban, ContentTranslation, Language-analytics, Product-Analytics: Abuse filter analytics dashboard is broken - https://phabricator.wikimedia.org/T302970 (odimitrijevic) p:Medium→High
[19:07:16] Data-Engineering, Data-Engineering-Kanban, ContentTranslation, Language-analytics, Product-Analytics: Abuse filter analytics dashboard is broken - https://phabricator.wikimedia.org/T302970 (odimitrijevic)
[19:07:45] Data-Engineering, Data-Engineering-Kanban: Check home/HDFS leftovers of rhuang-ctr - https://phabricator.wikimedia.org/T302194 (odimitrijevic)
[19:10:00] Data-Engineering: Migrate to MaxMind GeoIP2 - https://phabricator.wikimedia.org/T302989 (odimitrijevic)
[19:15:16] Data-Engineering, Data-Engineering-Kanban, ContentTranslation, Language-analytics, Product-Analytics: Abuse filter analytics dashboard is broken - https://phabricator.wikimedia.org/T302970 (MNeisler) a:MNeisler→None
[19:15:46] Data-Engineering: Procure MaxMind GeoIP2 Database License - https://phabricator.wikimedia.org/T303453 (odimitrijevic)
[19:16:30] Data-Engineering, Data-Engineering-Kanban: Procure MaxMind GeoIP2 Database License - https://phabricator.wikimedia.org/T303453 (odimitrijevic) p:Triage→High a:odimitrijevic
[19:35:10] Data-Engineering: Migrate to MaxMind GeoIP2 - https://phabricator.wikimedia.org/T302989 (odimitrijevic)
[19:35:12] Data-Engineering, Data-Engineering-Kanban: Procure MaxMind GeoIP2 Database License - https://phabricator.wikimedia.org/T303453 (odimitrijevic)
[19:37:32] Data-Engineering: Download the Maxmind Geoip2 Databases - https://phabricator.wikimedia.org/T303461 (odimitrijevic)
[19:38:47] Data-Engineering: Switch Matamo to using GeoIP2 databases - https://phabricator.wikimedia.org/T303462 (odimitrijevic)
[19:39:19] Data-Engineering: Modify Matamo to use GeoIP2 databases -
https://phabricator.wikimedia.org/T303462 (odimitrijevic)
[19:39:34] Data-Engineering: Modify Refine jobs to use GeoIP2 databases - https://phabricator.wikimedia.org/T303463 (odimitrijevic)
[19:39:49] Data-Engineering: Deprecate GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464 (odimitrijevic)
[19:40:14] Data-Engineering: Purge GeoIP2 datasets as per the licensing agreement - https://phabricator.wikimedia.org/T303465 (odimitrijevic)
[19:41:05] Data-Engineering, Spike: Identify Opportunities in using the new GeoIP2 databases - https://phabricator.wikimedia.org/T303466 (odimitrijevic)
[19:41:48] Data-Engineering, Spike: Identify Opportunities in using the new GeoIP2 databases - https://phabricator.wikimedia.org/T303466 (odimitrijevic)
[19:41:50] Data-Engineering: Purge GeoIP2 datasets as per the licensing agreement - https://phabricator.wikimedia.org/T303465 (odimitrijevic)
[19:41:52] Data-Engineering: Modify Matamo to use GeoIP2 databases - https://phabricator.wikimedia.org/T303462 (odimitrijevic)
[19:41:54] Data-Engineering: Download the Maxmind Geoip2 Databases - https://phabricator.wikimedia.org/T303461 (odimitrijevic)
[19:41:56] Data-Engineering: Migrate to MaxMind GeoIP2 - https://phabricator.wikimedia.org/T302989 (odimitrijevic)
[19:41:58] Data-Engineering: Modify Refine jobs to use GeoIP2 databases - https://phabricator.wikimedia.org/T303463 (odimitrijevic)
[20:00:12] Data-Engineering, Data-Engineering-Kanban: Procure MaxMind GeoIP2 Database License - https://phabricator.wikimedia.org/T303453 (odimitrijevic) Here are the subscriptions currently on our account: * GeoIP Legacy Region Database 2012-02-28 2022-04-07 * GeoIP2 and GeoIP Legacy City Database 2011-10-24 2022...
[20:00:25] ottomata: I'm not sure what the most airflow-ey way to do it is, but it seems like yours (with DAG(...)
as dag:) is popular
[20:19:10] (PS1) Aqu: Migrate session_length/daily from Oozie to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/769512 (https://phabricator.wikimedia.org/T300029)
[20:35:18] (PS1) Aqu: Migrate session_length/daily from Oozie to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/769515 (https://phabricator.wikimedia.org/T300029)
[20:35:56] milimetric: I mean, in your example, you need to tell the Task/Operator what dag they are a part of either when you instantiate them, or once later by calling task.dag(dag)
[20:36:26] (PS1) Aqu: Migrate session_length/daily from Oozie to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/769516 (https://phabricator.wikimedia.org/T300029)
[20:38:29] (PS2) Aqu: Migrate session_length/daily from Oozie to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/769512 (https://phabricator.wikimedia.org/T300029)
[20:39:05] (Abandoned) Aqu: Migrate session_length/daily from Oozie to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/769512 (https://phabricator.wikimedia.org/T300029) (owner: Aqu)
[20:39:09] (Abandoned) Aqu: Migrate session_length/daily from Oozie to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/769516 (https://phabricator.wikimedia.org/T300029) (owner: Aqu)
[20:39:44] Data-Engineering: Migrate to MaxMind GeoIP2 - https://phabricator.wikimedia.org/T302989 (odimitrijevic)
[20:41:51] Data-Engineering, Data-Engineering-Kanban, Airflow, Epic: Define and implement archiving for Airflow - https://phabricator.wikimedia.org/T300039 (Antoine_Quhen) a:Antoine_Quhen
[20:49:11] Data-Engineering, Data-Engineering-Kanban, Airflow: Variabilization of existing jobs - https://phabricator.wikimedia.org/T303473 (Antoine_Quhen)
[21:00:19] ottomata: I saw examples like with DAG ...: (instantiate tasks) and it looked like in that context they just assign themselves to the dag created by `with`,
no?
[21:05:02] oh, maybe milimetric.
[21:05:54] !log fix group ownership of cchen.db/new_editors/cohort=2021-12 after reverting T291664 - sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /user/hive/warehouse/cchen.db/new_editors/cohort=2021-12
[21:05:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:05:57] T291664: Set hive.warehouse.subdir.inherit.perms to false - https://phabricator.wikimedia.org/T291664
[21:32:09] Data-Engineering, Privacy Engineering, Privacy: Privacy review for dataset publishing (Wikidata topic -> pageview data) - https://phabricator.wikimedia.org/T303304 (JFishback_WMF)
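The question at 21:00, whether tasks created inside a `with DAG(...) as dag:` block attach themselves to that dag, comes down to a context-manager pattern. The sketch below shows the general mechanism using hypothetical `Dag` and `Task` stand-ins; it is not Airflow's implementation, only an assumption about how such implicit assignment can work.

```python
class Dag:
    """Hypothetical stand-in for a DAG; keeps a stack of DAGs opened by `with`."""

    _active = []

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.tasks = {}

    def __enter__(self):
        Dag._active.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        Dag._active.pop()
        return False  # do not swallow exceptions

    @classmethod
    def current(cls):
        # The innermost DAG currently open, or None outside any `with` block.
        return cls._active[-1] if cls._active else None


class Task:
    """Hypothetical task: uses the explicit `dag` argument, else the active DAG."""

    def __init__(self, task_id, dag=None):
        self.task_id = task_id
        self.dag = dag if dag is not None else Dag.current()
        if self.dag is not None:
            self.dag.tasks[task_id] = self


with Dag("example_dag") as dag:
    t1 = Task("extract")   # no dag argument; picks up the enclosing DAG
    t2 = Task("load")

orphan = Task("orphan")    # created outside the block, so no DAG is assigned
```

Under this pattern a task generator called inside the `with` block would not need a `dag` parameter at all, while one called outside it would still have to be told the dag explicitly.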