[00:07:40] Data-Engineering, Wmfdata-Python, Movement-Insights (FY25-26 H1): spark.run returns SQL dates as Python datetime.date, not Pandas datetime - https://phabricator.wikimedia.org/T384548#11096581 (nshahquinn-wmf)
[00:08:27] Data-Engineering, Wmfdata-Python, Movement-Insights (FY25-26 H1): mariadb.run returns SQL dates as Python datetime.date, not Pandas datetime - https://phabricator.wikimedia.org/T384546#11096584 (nshahquinn-wmf)
[00:09:39] Data-Engineering, Wmfdata-Python, Movement-Insights (FY25-26 H1): Deprecate Wmfdata's Hive module - https://phabricator.wikimedia.org/T384541#11096585 (nshahquinn-wmf) a:nshahquinn-wmf
[00:11:43] Data-Engineering, Data-Engineering-Radar, Wmfdata-Python, Movement-Insights (FY25-26 H1): Convert existing Wmfdata docstrings to a standard format - https://phabricator.wikimedia.org/T380742#11096591 (nshahquinn-wmf)
[00:12:00] Data-Engineering, Data-Engineering-Radar, Wmfdata-Python, Movement-Insights (FY25-26 H1): Convert existing Wmfdata docstrings to a standard format - https://phabricator.wikimedia.org/T380742#11096592 (nshahquinn-wmf) a:nshahquinn-wmf
[00:12:39] Data-Engineering, Data-Engineering-Radar, Wmfdata-Python, Movement-Insights (FY25-26 H1): Fix lint issues in Wmfdata-Python and make the linting CI blocking - https://phabricator.wikimedia.org/T381657#11096593 (nshahquinn-wmf)
[00:12:42] Data-Engineering, Data-Engineering-Radar, Wmfdata-Python, Movement-Insights (FY25-26 H1): Fix lint issues in Wmfdata-Python and make the linting CI blocking - https://phabricator.wikimedia.org/T381657#11096594 (nshahquinn-wmf) a:nshahquinn-wmf
[00:13:04] Data-Engineering, Wmfdata-Python, Movement-Insights (FY25-26 H1): spark.run returns SQL dates as Python datetime.date, not Pandas datetime - https://phabricator.wikimedia.org/T384548#11096595 (nshahquinn-wmf) a:nshahquinn-wmf
[00:13:13] Data-Engineering, Wmfdata-Python, Movement-Insights (FY25-26 H1): mariadb.run returns SQL dates as Python datetime.date, not Pandas datetime - https://phabricator.wikimedia.org/T384546#11096596 (nshahquinn-wmf) a:nshahquinn-wmf
[00:17:08] Data-Engineering, Data-Engineering-Icebox, Wmfdata-Python, Documentation, Movement-Insights (FY25-26 H1): Publish HTML docs for Wmfdata-Python on doc.wikimedia.org - https://phabricator.wikimedia.org/T298178#11096604 (nshahquinn-wmf)
[00:56:13] Data-Engineering, DBA, Patch-For-Review, Schema-change-in-production: Add new rc_name_source_patrolled_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T402010#11096717 (Ladsgroup) a:Ladsgroup
[06:55:58] Data-Engineering, Wmfdata-Python, Movement-Insights (FY25-26 H1), Patch-For-Review: Deprecate Wmfdata's Hive module - https://phabricator.wikimedia.org/T384541#11096921 (nshahquinn-wmf) Open→In progress
[07:01:32] Data-Engineering, Data-Engineering-Radar, Wmfdata-Python: Suppress tracebacks for Kerberos errors - https://phabricator.wikimedia.org/T345219#11096931 (nshahquinn-wmf)
[07:01:38] Data-Engineering, Data-Engineering-Radar, Wmfdata-Python: Retrieve host & port info when connecting to MariaDB replicas on the cluster - https://phabricator.wikimedia.org/T340472#11096933 (nshahquinn-wmf)
[07:01:42] Data-Engineering, Data-Engineering-Radar, Wmfdata-Python: Let user specify cnf to use when connecting to MariaDB - https://phabricator.wikimedia.org/T340469#11096934 (nshahquinn-wmf)
[07:01:44] Data-Engineering, Data-Engineering-Radar, Wmfdata-Python, Epic: Enable Wmfdata-Python to access MariaDB replicas from the cluster - https://phabricator.wikimedia.org/T340467#11096935 (nshahquinn-wmf)
[07:01:46] Data-Engineering, Data-Engineering-Icebox, Wmfdata-Python: Add utility functions for converting between various time formats - https://phabricator.wikimedia.org/T273209#11096938 (nshahquinn-wmf)
[07:01:49] Data-Engineering, Data-Engineering-Icebox, Wmfdata-Python: Update all Wmfdata-Python run functions to have consistent API - https://phabricator.wikimedia.org/T273197#11096940 (nshahquinn-wmf)
[07:01:50] Data-Engineering, Data-Engineering-Icebox, Wmfdata-Python: Update all run functions to allow specifying date and index columns - https://phabricator.wikimedia.org/T273208#11096939 (nshahquinn-wmf)
[07:33:41] Data-Engineering (Q1 FY25/26 July 1st - September 30th), EventStreams, Event-Platform: EventStreams: duplicate events from double compute (wdqs/rdf) streams - https://phabricator.wikimedia.org/T396564#11096994 (dcausse) >>! In T396564#11095076, @tchin wrote: > @dcausse fyi I just deployed eventstream...
[10:09:49] Data-Engineering, Data-Engineering-Radar, AbuseFilter, Schema-change, and 2 others: AbuseFilter abuse_filter_log table: Store IP addresses as hex values - https://phabricator.wikimedia.org/T395612#11097450 (OKryva-WMF)
[10:20:07] Data-Engineering, DBA, Patch-For-Review, Schema-change-in-production: Add new rc_name_source_patrolled_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T402010#11097468 (Ladsgroup)
[12:00:05] Data-Engineering (Q1 FY25/26 July 1st - September 30th): Update Blunderbuss wikitech documentation - https://phabricator.wikimedia.org/T402290 (Ahoelzl) NEW
[14:20:50] Data-Engineering, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Ensure that blunderbuss uses the minimum HDFS file system permissions required - https://phabricator.wikimedia.org/T401103#11098569 (BTullis) I started by thinking that we needed a POSIX uid/gid for blunderbuss, so I cre...
[14:28:17] Data-Engineering, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Ensure that blunderbuss uses the minimum HDFS file system permissions required - https://phabricator.wikimedia.org/T401103#11098618 (BTullis) I have created this new keytab. ` root@krb1002:~# mkdir /srv/kerberos/keytabs/...
[14:29:50] (CR) Aqu: [C:+1] "Looking good." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1179754 (https://phabricator.wikimedia.org/T402229) (owner: Xcollazo)
[14:34:10] (CR) Xcollazo: [C:+2] Remove extra newlines from XML revision rendering from MW Dumper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1179754 (https://phabricator.wikimedia.org/T402229) (owner: Xcollazo)
[14:46:48] (Merged) jenkins-bot: Remove extra newlines from XML revision rendering from MW Dumper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1179754 (https://phabricator.wikimedia.org/T402229) (owner: Xcollazo)
[14:51:42] Data-Engineering, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Ensure that blunderbuss uses the minimum HDFS file system permissions required - https://phabricator.wikimedia.org/T401103#11098783 (BTullis) Before changing how it works, I will make sure that all of the existing files...
[14:55:02] Data-Engineering, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Ensure that blunderbuss uses the minimum HDFS file system permissions required - https://phabricator.wikimedia.org/T401103#11098805 (BTullis) Looks good. It has the principal `blunderbuss/blunderbuss.discovery.wmnet@WIKI...
[14:59:22] Data-Engineering (Q1 FY25/26 July 1st - September 30th), EventStreams, Discovery-Search (2025.07.25 - 2025.08.15), Event-Platform, Patch-For-Review: EventStreams: duplicate events from double compute (wdqs/rdf) streams - https://phabricator.wikimedia.org/T396564#11098831 (dcausse)
[14:59:46] Data-Engineering (Q1 FY25/26 July 1st - September 30th), EventStreams, Discovery-Search (2025.07.25 - 2025.08.15), Event-Platform, Patch-For-Review: EventStreams: duplicate events from double compute (wdqs/rdf) streams - https://phabricator.wikimedia.org/T396564#11098833 (dcausse) Open...
[15:10:10] Data-Engineering (Q1 FY25/26 July 1st - September 30th): Investigate why Blunderbuss cache artifacts can have different file permissions - https://phabricator.wikimedia.org/T402315 (amastilovic) NEW
[15:22:57] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review: Refine to Hive with Airflow – Post-Migration Cleanup - https://phabricator.wikimedia.org/T392698#11098965 (Antoine_Quhen)
[15:26:32] Data-Engineering (Q1 FY25/26 July 1st - September 30th): airflow-dags: Move Dockerfile linter to standard linter image and stage - https://phabricator.wikimedia.org/T402204#11098983 (amastilovic) Open→Resolved
[15:47:20] Data-Engineering (Q1 FY25/26 July 1st - September 30th): Investigate why Blunderbuss cache artifacts can have different file permissions - https://phabricator.wikimedia.org/T402315#11099085 (BTullis) Here is a full file listing from after a forced cache warm. `lines=15 hdfs@an-launcher1002:~$ hdfs dfs -ls -R...
[15:47:44] Data-Engineering (Q1 FY25/26 July 1st - September 30th): Investigate why Blunderbuss cache artifacts can have different file permissions - https://phabricator.wikimedia.org/T402315#11099086 (BTullis) I reset the permissions with the following command: ` hdfs@an-launcher1002:~$ hdfs dfs -chmod 644 $(hdfs dfs...
[15:47:58] Data-Engineering (Q1 FY25/26 July 1st - September 30th): Investigate why Blunderbuss cache artifacts can have different file permissions - https://phabricator.wikimedia.org/T402315#11099087 (BTullis)
[15:48:02] Data-Engineering, Data-Platform-SRE (2025.07.26 - 2025.08.15), Patch-For-Review: Ensure that blunderbuss uses the minimum HDFS file system permissions required - https://phabricator.wikimedia.org/T401103#11099088 (BTullis)
[15:58:20] Data-Engineering, AbuseFilter, DBA, Schema-change-in-production: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906#11099132 (FCeratto-WMF) p:Triage→Medium a:FCeratto-WMF
[16:44:37] Data-Engineering, Wmfdata-Python, Movement-Insights (FY25-26 H1), Patch-For-Review: Deprecate Wmfdata's Hive module - https://phabricator.wikimedia.org/T384541#11099357 (nshahquinn-wmf) In progress→Resolved
[16:46:33] Data-Engineering (Q1 FY25/26 July 1st - September 30th): Blunderbuss: Move Hadoop/HDFS XML configuration into Helm deployment chart - https://phabricator.wikimedia.org/T402323 (amastilovic) NEW
[16:47:22] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Data-Platform-SRE: Blunderbuss: Move Hadoop/HDFS XML configuration into Helm deployment chart - https://phabricator.wikimedia.org/T402323#11099371 (BTullis)
[16:48:00] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Data-Platform-SRE: Blunderbuss: Move Hadoop/HDFS XML configuration into Helm deployment chart - https://phabricator.wikimedia.org/T402323#11099372 (amastilovic) Use the following example of how these values should be saved in `deployment-charts` re...
[17:00:26] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Data-Platform-SRE: Blunderbuss: Move Hadoop/HDFS XML configuration into Helm deployment chart - https://phabricator.wikimedia.org/T402323#11099416 (BTullis) I've added #data-platform-sre because we have used this technique before, for spark-history...
[17:10:21] Data-Engineering: Mitigate consequences of Gobblin hiccups generating late events and alerts - https://phabricator.wikimedia.org/T402324 (Antoine_Quhen) NEW
[17:40:35] (CR) Btullis: [V:+2 C:+2] Remove obsolete scap target [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/1176289 (https://phabricator.wikimedia.org/T390941) (owner: Btullis)
[17:50:03] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review: Investigate why Blunderbuss cache artifacts can have different file permissions - https://phabricator.wikimedia.org/T402315#11099662 (BTullis) Open→Resolved p:Triage→High This is now fixed. We found that the um...
[20:15:31] (PS1) Xcollazo: Add support for multiple compression algorithms to MW Dumper. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1180212
[20:16:42] (PS2) Xcollazo: Add support for multiple compression algorithms to MW Dumper. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1180212 (https://phabricator.wikimedia.org/T402209)
[20:35:32] (PS3) Xcollazo: Add support for multiple compression algorithms to MW Dumper. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1180212 (https://phabricator.wikimedia.org/T402209)
[21:45:12] Data-Engineering, DBA, Schema-change-in-production: Add new rc_name_source_patrolled_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T402010#11100455 (Ladsgroup)
[21:45:56] Data-Engineering, DBA, Schema-change-in-production: Add new rc_name_source_patrolled_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T402010#11100457 (Ladsgroup) I'm going to run this live (not with replication though). This is too fast.
[21:50:21] Data-Engineering, DBA, Schema-change-in-production: Add new rc_name_source_patrolled_timestamp index to recentchanges table in wmf production - https://phabricator.wikimedia.org/T402010#11100469 (Ladsgroup)
[23:20:02] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Data-Platform, Movement-Insights: Consider making the Automata heuristics private - https://phabricator.wikimedia.org/T402336#11100682 (nshahquinn-wmf) I feel strongly about transparency and there are many Foundation-internal things that I wish...