[02:04:22] Analytics-Data-Problem, Data-Engineering, Data Products, Pageviews-API: Missed pageview data over API - https://phabricator.wikimedia.org/T370108#10181909 (Mayakp.wiki) Hi @VirginiaPoundstone , Ive added the #analytics-data-problem tag to this task, and I think that's the best course of action fo...
[09:03:01] Data-Engineering, Data Products, DBA, wikitech.wikimedia.org, Schema-change: Please drop globalblocks table from labswiki - https://phabricator.wikimedia.org/T375783#10182256 (Dreamy_Jazz) Once `labswiki` is merged into the production cluster, I presume we should be able to use the `centralau...
[10:55:07] Data-Engineering, Data Products, DBA, wikitech.wikimedia.org, Schema-change: Please drop globalblocks table from labswiki - https://phabricator.wikimedia.org/T375783#10182553 (Ladsgroup) Open→Stalled We will do that in October 1. Let's mark this stalled until then.
[11:25:55] Data-Engineering, Data Pipelines, Data Products, Dumps-Generation, and 2 others: MediaWiki Dumps XML - Provide attribute to indicate that user is temporary account in exported content - https://phabricator.wikimedia.org/T365693#10182681 (kostajh) >>! In T365693#10175611, @xcollazo wrote: > Can we...
[12:46:11] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27), Temporary accounts (Blockers to minor pilot wiki deployment): Generate a list of Superset users affected by changes to IP masking/temp users - https://phabricator.wikimedia.org/T347510#10183018 (BTullis) Open→Resolved a:BTu...
[13:02:23] Data-Engineering, Cassandra, Data Pipelines, Data-Platform-SRE (2024.09.28 - 2024.10.18), Patch-For-Review: Create puppet defined type for adding/updating/deleting secrets or other small files on HDFS - https://phabricator.wikimedia.org/T323692#10183087 (Gehel)
[13:03:42] Data-Engineering (Q1 2024 July 1st - September 30th), Data-Platform-SRE (2024.09.28 - 2024.10.18): Deploy the HDFS synchronizer service to the dse-k8s cluster - https://phabricator.wikimedia.org/T371994#10183097 (Gehel)
[13:06:54] Data-Engineering (Q1 2024 July 1st - September 30th), Prod-Kubernetes, serviceops, Data-Platform-SRE (2024.09.28 - 2024.10.18), and 3 others: Migrate Search Platform-owned helm charts to Calico Network Policies - https://phabricator.wikimedia.org/T373195#10183111 (Gehel)
[13:08:30] Data-Engineering, Data-Platform-SRE (2024.09.28 - 2024.10.18): Some wikibase tables not available in commonswiki_p - https://phabricator.wikimedia.org/T298452#10183138 (Gehel)
[13:09:20] Data-Engineering, Data-Platform-SRE (2024.09.28 - 2024.10.18): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10183158 (Gehel)
[13:41:10] Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board): Implement Airflow Dataset class for RestExternalTaskSensor - https://phabricator.wikimedia.org/T372647#10183283 (Ottomata) > it does not care about whether there's an actual file in the s3 bucket, or whether the contents of...
[15:15:30] Data-Engineering, Data-Engineering-Jupyter, CAS-SSO, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): Allow login to JupyterHub via CAS - https://phabricator.wikimedia.org/T260386#10183628 (BTullis)
[15:17:44] Data-Engineering, Data-Engineering-Jupyter, CAS-SSO, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): Allow login to JupyterHub via CAS - https://phabricator.wikimedia.org/T260386#10183636 (BTullis) a:BTullis I have some ideas about how we might improve our current...
[16:32:32] Data-Engineering (Q1 2024 July 1st - September 30th), Product-Analytics, Patch-For-Review: [SPIKE] Experiment with approaches for a incremental updates of MediaWiki data in the Data Lake - https://phabricator.wikimedia.org/T370354#10183884 (Ottomata) Parking these links for future reference: - https:...
[17:08:38] Data-Engineering, Data-Platform-SRE (2024.09.28 - 2024.10.18): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10184060 (amastilovic) >>! In T368033#10172189, @brouberol wrote: > Additional thought: once we migrate to `KubernetesExecutor` instead of `LocalExecutor`, th...
[18:01:55] Data-Engineering, Data-Platform-SRE (2024.09.28 - 2024.10.18): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10184240 (brouberol) @amastilovic I'm interested! Do you have something I could read to know more?
[20:53:28] Data-Engineering, Dumps 2.0: [Iceberg Migration] Extend Iceberg table maintenance mechanism to support data rewrite - https://phabricator.wikimedia.org/T373694#10184796 (xcollazo) Open→In progress a:xcollazo
[20:53:56] Data-Engineering, Dumps 2.0 (Kanban Board): [Iceberg Migration] Extend Iceberg table maintenance mechanism to support data rewrite - https://phabricator.wikimedia.org/T373694#10184802 (xcollazo)
[21:02:39] Data-Engineering: [SPIKE] Learn and document how to use Flink-CDC from MediaWiki MariaDB locally - https://phabricator.wikimedia.org/T373144#10184826 (Ottomata) I've been playing with this today, and I noticed this: https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/flink-sources/overv...
[22:32:23] Data-Engineering, Data-Platform-SRE (2024.09.28 - 2024.10.18): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10184980 (amastilovic) >>! In T368033#10184240, @brouberol wrote: > @amastilovic I'm interested! Do you have something I could read to know more? Yep - here'...
[22:49:09] Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board): Implement Airflow Dataset class for RestExternalTaskSensor - https://phabricator.wikimedia.org/T372647#10184996 (amastilovic) >>! In T372647#10183283, @Ottomata wrote: >> it does not care about whether there's an actual file...
[23:17:44] Data-Engineering, Data-Platform-SRE: Airflow scheduler and webserver logs should be readable by airflow instance admins - https://phabricator.wikimedia.org/T304615#10185013 (BTullis) Open→Declined This ticket is no longer relevant, based on the work in {T362788}. Scheduler and webserver logs...
[23:34:55] Data-Engineering, Data Pipelines: Reimage an-test-client1001.eqiad.wmnet - https://phabricator.wikimedia.org/T324127#10185025 (BTullis) Open→Resolved a:BTullis This host has been replaced by an-test-client1002, so I think that this ticket can be closed.
[23:36:23] Data-Engineering, Cassandra, Data Pipelines: Encrypt Spark-Cassandra connection - https://phabricator.wikimedia.org/T310820#10185030 (BTullis) I think that this is complete now, isn't it? Encryption to cassandra is enabled by default.
[23:55:14] Data-Engineering, Data-Engineering-Jupyter: Notebook machine to double as RStudio Server? - https://phabricator.wikimedia.org/T190769#10185046 (BTullis) @mpopov is this requirement still valid, as far as you are concerned? For context, I'm reviewing the current Jupyter offering and I am investigating wh...
[23:59:22] Data-Engineering, Data-Platform-SRE: LVS in Analytics VLANs - https://phabricator.wikimedia.org/T288750#10185050 (BTullis)