[08:13:47] hey folks! I'd need to restart postgres on an-db* for https://phabricator.wikimedia.org/T374240
[08:13:54] any specific procedure to follow?
[08:32:30] elukey: Could we wait just a little while for an-db1001, please? I can certainly do it this week, but it requires a restart of all 7 airflow schedulers because they don't have postgres retry logic.
[08:32:58] btullis: no problem anytime!
[08:53:19] Great, thanks.
[08:54:23] !restarted postgresql@13-main.service on an-db1002 for T374240
[09:42:35] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27): an-launcher1002 /srv filling up mostly because of logs from dynamic mapped Airflow tasks - https://phabricator.wikimedia.org/T370437#10129166 (BTullis) Unfortunately, I don't see the 30 GB free on the volume group behind `/srv`. ` btullis@an-...
[09:56:11] Data-Engineering, Dumps 2.0, Event-Platform: [SPIKE] how can we support Spark producer/consumers in Event Platform - https://phabricator.wikimedia.org/T374341 (gmodena) NEW
[10:03:32] Data-Engineering, Dumps 2.0, Event-Platform: [SPIKE] how can we support Spark producer/consumers in Event Platform - https://phabricator.wikimedia.org/T374341#10129226 (dcausse)
[10:19:03] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27): an-launcher1002 /srv filling up mostly because of logs from dynamic mapped Airflow tasks - https://phabricator.wikimedia.org/T370437#10129286 (BTullis) At the moment, I think that the most effective short-term solution would be to prune some...
[10:46:04] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27), Kubernetes, Patch-For-Review: Migrate dse cluster off of Pod Security Policies - https://phabricator.wikimedia.org/T369492#10129338 (brouberol) The `restricted` PSS has been enforced for all namespaces in `dse-k8s-eqiad`.
[11:08:44] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27), Kubernetes, Patch-For-Review: Migrate dse cluster off of Pod Security Policies - https://phabricator.wikimedia.org/T369492#10129413 (brouberol) Open→Resolved I've run puppet on both Kube masters `lang=diff brouberol@dse-k8...
[11:26:39] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27): an-launcher1002 /srv filling up mostly because of logs from dynamic mapped Airflow tasks - https://phabricator.wikimedia.org/T370437#10129463 (BTullis) Oh, I see. There's nothing wrong with the logrotate fragment, but we have a dedicated syst...
[11:32:55] Data-Engineering, Data-Platform, DBA, Schema-change-in-production: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742#10129479 (Ladsgroup)
[11:44:51] Data-Engineering, Data Pipelines, Data-Platform-SRE (2024.09.06 - 2024.09.27): [Airflow] Add log rotation to scheduler logs - https://phabricator.wikimedia.org/T315326#10129512 (BTullis) p:Triage→High a:BTullis I will claim this task and start working on it, based on the investigation ben...
[11:49:12] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10129526 (Ladsgroup) I'm taking over and doing s8 in eqiad now.
[11:49:42] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10129527 (Ladsgroup)
[11:53:04] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27), Kubernetes: Migrate dse cluster off of Pod Security Policies - https://phabricator.wikimedia.org/T369492#10129535 (brouberol) Resolved→Open Reopening as we've found out that our Spark operator does not support setting `seccompProf...
[12:44:45] Data-Engineering, Dumps 2.0, Event-Platform: [SPIKE] how can we support Spark producer/consumers in Event Platform - https://phabricator.wikimedia.org/T374341#10129665 (Ottomata) I think for both of the listed use cases, what is mostly needed is producer support. See also: https://wikitech.wikime...
[12:54:17] Data-Engineering, Dumps 2.0, Event-Platform: [SPIKE] how can we support Spark producer/consumers in Event Platform - https://phabricator.wikimedia.org/T374341#10129713 (dcausse)
[13:29:18] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27), Kubernetes, Patch-For-Review: Migrate dse cluster off of Pod Security Policies - https://phabricator.wikimedia.org/T369492#10129831 (brouberol) Open→Resolved
[13:45:21] Data-Engineering, Event-Platform: Update eventutilities_python wrappers to support Flink 1.20 - https://phabricator.wikimedia.org/T374359 (gmodena) NEW
[13:47:50] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27): an-launcher1002 /srv filling up mostly because of logs from dynamic mapped Airflow tasks - https://phabricator.wikimedia.org/T370437#10129936 (BTullis) I see now that I had already allocated the 30 GB of free space in T370392#9993829 I have...
[13:48:44] Data-Engineering (Q1 2024 July 1st - September 30th), Java-Scala-Standardization, Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review: Migrate wmf-jvm-parent-pom and supporting components to the Maven group on Gitlab - https://phabricator.wikimedia.org/T369901#10129943 (Gehel) Op...
[13:49:55] Data-Engineering (Q1 2024 July 1st - September 30th), Data-Platform-SRE, Discovery-Search, Java-Scala-Standardization, Epic: [Epic] Replace Archiva with Gitlab artifact repositories - https://phabricator.wikimedia.org/T367315#10129951 (Gehel)
[13:51:14] Data-Engineering (Q1 2024 July 1st - September 30th), Discovery-Search, Java-Scala-Standardization, Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review: Update parent pom to disable fetching dependencies from Archiva and use Gitlab i... - https://phabricator.wikimedia.org/T367404#10129959
[13:51:36] Data-Engineering (Q1 2024 July 1st - September 30th), Data-Platform-SRE, Discovery-Search, Java-Scala-Standardization, Epic: [Epic] Replace Archiva with Gitlab artifact repositories - https://phabricator.wikimedia.org/T367315#10129961 (Gehel)
[14:11:06] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10130050 (rook) https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/JA4F2K4EBEC3CMS54JDTJBMRAPKND2NN/
[14:13:47] Data-Engineering, Dumps 2.0 (Kanban Board), Patch-For-Review: Flink job to enrich reconciliation events - https://phabricator.wikimedia.org/T368787#10130063 (Ahoelzl)
[14:25:52] (PS1) Milimetric: Implement custom jdbc datasource [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1071624 (https://phabricator.wikimedia.org/T372677)
[14:26:26] Data-Engineering, cloud-services-team, Cloud-Services-Origin-User: WMCS-roots paging responsibilities - https://phabricator.wikimedia.org/T344608#10130139 (fnegri) p:Triage→Medium
[14:26:47] Data-Engineering, cloud-services-team, Cloud-Services-Origin-User: WMCS-roots paging responsibilities - https://phabricator.wikimedia.org/T344608#10130140 (fnegri)
[14:34:24] (PS2) Milimetric: Implement custom jdbc datasource [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1071624 (https://phabricator.wikimedia.org/T372677)
[14:34:27] (PS1) Xcollazo: Don't track Mac or IntelliJ temp files. [analytics/refinery] - https://gerrit.wikimedia.org/r/1071626
[14:36:11] (PS3) Milimetric: Implement custom jdbc datasource [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1071624 (https://phabricator.wikimedia.org/T372677)
[14:38:24] (CR) CI reject: [V:-1] Implement custom jdbc datasource [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1071624 (https://phabricator.wikimedia.org/T372677) (owner: Milimetric)
[14:41:49] (PS1) Xcollazo: Sqoop the content table for all wikis [analytics/refinery] - https://gerrit.wikimedia.org/r/1071629 (https://phabricator.wikimedia.org/T374280)
[14:44:31] Quarry: upgrade ansible - https://phabricator.wikimedia.org/T374362 (rook) NEW
[14:49:53] Quarry: upgrade ansible - https://phabricator.wikimedia.org/T374362#10130274 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/69
[14:55:25] Quarry: upgrade ansible - https://phabricator.wikimedia.org/T374362#10130306 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/69
[14:56:41] Quarry: upgrade ansible - https://phabricator.wikimedia.org/T374362#10130299 (rook) Open→Resolved a:rook
[15:00:08] (PS4) Milimetric: Implement custom jdbc datasource [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1071624 (https://phabricator.wikimedia.org/T372677)
[15:02:23] Quarry: Upgrade to Ansible 10.3.0 - https://phabricator.wikimedia.org/T374362#10130341 (Aklapper)
[15:11:47] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10130397 (Nemo_bis) > The query itself will remain, so getting fresh results should be nothing more than a submit query away. That's not quite accurate when the purpose of the query is to get trends, for example in the numbe...
[15:26:24] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10130460 (rook) >>! In T360041#10130397, @Nemo_bis wrote: >> The query itself will remain, so getting fresh results should be nothing more than a submit query away. > > That's not quite accurate when the purpose of the query...
[15:32:45] Data-Engineering, Discovery-Search (Current work): Datahub - ingest Hive discovery database - https://phabricator.wikimedia.org/T374118#10130526 (Gehel) p:Triage→Medium We'll need to decide what is relevant to expose. And check the permissions / access.
[15:32:49] Data-Engineering, Discovery-Search (Current work): Datahub - ingest Hive discovery database - https://phabricator.wikimedia.org/T374118#10130529 (Gehel)
[15:38:32] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10130581 (SD0001) Is there a benefit to doing this? According to T178520, disk usage was 112 GB in 2017. I seem to recall it being around 195G last time I checked. Although now, it appears to have mysteriously shrunk to 100G:...
[15:51:01] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10130639 (rook) The issue is not one of size, or suspicion that people may think the data is fresh, but the data itself. Periodically there are tickets opened regarding data that has been removed from the wikis but remains in...
[16:00:48] (CR) Joal: [C:+1] "LGTM - But consider I know nothing about this :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/1071629 (https://phabricator.wikimedia.org/T374280) (owner: Xcollazo)
[16:08:11] (CR) Milimetric: [V:+2 C:+2] "cool, let the content flow" [analytics/refinery] - https://gerrit.wikimedia.org/r/1071629 (https://phabricator.wikimedia.org/T374280) (owner: Xcollazo)
[16:16:32] (CR) Ottomata: [C:+2] Don't track Mac or IntelliJ temp files. [analytics/refinery] - https://gerrit.wikimedia.org/r/1071626 (owner: Xcollazo)
[16:16:35] (CR) Ottomata: [V:+2 C:+2] Don't track Mac or IntelliJ temp files. [analytics/refinery] - https://gerrit.wikimedia.org/r/1071626 (owner: Xcollazo)
[16:42:46] (PS2) Clare Ming: Update Metrics Platform common fragment: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1066880 (https://phabricator.wikimedia.org/T366802)
[16:58:42] (PS1) Clare Ming: Update Metrics Platform app base with common fragment bump [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071655 (https://phabricator.wikimedia.org/T366802)
[16:59:05] (Abandoned) Clare Ming: Update Metrics Platform app base schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1066889 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[17:04:54] (PS1) Clare Ming: Update Metrics Platform web base with common fragment bump [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071657 (https://phabricator.wikimedia.org/T366802)
[17:06:33] (Abandoned) Clare Ming: Update Metrics Platform web base with common fragment bump [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071657 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[17:06:45] (Abandoned) Clare Ming: Update Metrics Platform app base with common fragment bump [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071655 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[17:11:41] (PS3) Clare Ming: Update Metrics Platform common fragment major bump: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1066880 (https://phabricator.wikimedia.org/T366802)
[17:16:27] (PS1) Clare Ming: Update Metrics Platform app base major version bump with common updates: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071662 (https://phabricator.wikimedia.org/T366802)
[17:16:56] (CR) Clare Ming: "i lied - abandoned in favor of https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/1071662" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1066889 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[17:19:43] (PS1) Clare Ming: Update Metrics Platform web base major version bump with common updates: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071663 (https://phabricator.wikimedia.org/T366802)
[17:20:12] (CR) CI reject: [V:-1] Update Metrics Platform web base major version bump with common updates: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071663 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[17:28:33] (PS1) Clare Ming: Exclude Metrics Platform versions from compatibility errors due to typo correction introduced in earlier fragment on maxLength property. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071666 (https://phabricator.wikimedia.org/T366802)
[17:31:56] (Abandoned) Clare Ming: Update Metrics Platform web base schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1066883 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[17:32:20] (CR) Clare Ming: "this will pass CI once https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/1071666 is merged" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071663 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[17:40:46] (CR) Santiago Faci: "Change looks good but I'm not sure about two files I have mentioned" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1066880 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[18:46:46] (PS4) Clare Ming: Update Metrics Platform common fragment major bump: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1066880 (https://phabricator.wikimedia.org/T366802)
[18:47:33] Is there any way for me to run a superset presto query with a longer timeout?
[18:47:38] (CR) Clare Ming: Update Metrics Platform common fragment major bump: (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1066880 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[18:48:45] (PS2) Clare Ming: Update Metrics Platform app base major version bump with common updates: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071662 (https://phabricator.wikimedia.org/T366802)
[18:49:01] (PS2) Clare Ming: Update Metrics Platform web base major version bump with common updates: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071663 (https://phabricator.wikimedia.org/T366802)
[18:49:30] (CR) CI reject: [V:-1] Update Metrics Platform web base major version bump with common updates: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071663 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[18:52:25] (PS2) Clare Ming: Exclude Metrics Platform versions from compatibility errors due to typo correction introduced in earlier fragment on maxLength property. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071666 (https://phabricator.wikimedia.org/T366802)
[19:31:20] (CR) Santiago Faci: [C:+2] "Looks good!" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071666 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[19:31:46] (Merged) jenkins-bot: Exclude Metrics Platform versions from compatibility errors due to typo correction introduced in earlier fragment on maxLength property. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071666 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[19:32:10] (PS5) Clare Ming: Update Metrics Platform common fragment major bump: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1066880 (https://phabricator.wikimedia.org/T366802)
[19:33:04] Data-Engineering, Temporary accounts: Update Data Engineering-owned products that may be affected by IP Masking - https://phabricator.wikimedia.org/T326875#10131683 (kostajh)
[19:33:10] (PS3) Clare Ming: Update Metrics Platform app base major version bump with common updates: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071662 (https://phabricator.wikimedia.org/T366802)
[19:33:17] (PS3) Clare Ming: Update Metrics Platform web base major version bump with common updates: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071663 (https://phabricator.wikimedia.org/T366802)
[19:35:22] (CR) Santiago Faci: [C:+2] "Looks good!" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1066880 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[19:36:10] (Merged) jenkins-bot: Update Metrics Platform common fragment major bump: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1066880 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[19:42:44] (CR) Santiago Faci: [C:+2] "Looks good!" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071662 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[19:42:51] (CR) Santiago Faci: "Looks good!" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071663 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[19:43:12] (Merged) jenkins-bot: Update Metrics Platform app base major version bump with common updates: [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1071662 (https://phabricator.wikimedia.org/T366802) (owner: Clare Ming)
[19:48:54] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10131748 (brouberol) >>! In T368033#9919695, @amastilovic wrote: >>>! In T368033#9912923, @BTullis wrote: >> In some ways, the fundamental question is: do we...
[20:00:43] (PS5) Milimetric: Implement custom jdbc datasource [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1071624 (https://phabricator.wikimedia.org/T372677)
[20:03:08] Data-Engineering, Wikimedia Enterprise, Wikimedia Enterprise Engineering, Event-Platform: Provide data on whether the file itself was changed in mediawiki.page_change.v1 event - https://phabricator.wikimedia.org/T373644#10131783 (Ottomata)
[20:03:12] Data-Engineering, tech-decision-forum, Event-Platform: MediaWiki Event Carried State Transfer - Problem Statement - https://phabricator.wikimedia.org/T291120#10131784 (Ottomata)
[20:03:58] Data-Engineering, Temporary accounts, Event-Platform: Update Data Engineering-owned products that may be affected by IP Masking - https://phabricator.wikimedia.org/T326875#10131785 (gmodena)
[20:05:19] (PS6) Milimetric: Implement custom jdbc datasource [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1071624 (https://phabricator.wikimedia.org/T372677)
[20:07:51] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10131793 (brouberol) Another idea would be to fork `git-sync` and introduce a REST API through which we could trigger a sync. Having looked at the code, it sh...
[20:16:33] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10131836 (Ottomata) @brouberol I know this is not exactly the same thing, but there must be some synergy between this and {T365659}. Most of the requiremen...
[20:21:34] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10131850 (brouberol) Final idea of the evening: if we require a 2-step deploy, then we could simply redeploy the scheduler with the sha1 ref of `airflow-dags`...
[20:34:11] Data-Engineering (Q1 2024 July 1st - September 30th): Retry package added needs the types-retry 0.9.9.4 typing stub - https://phabricator.wikimedia.org/T374396 (Snwachukwu) NEW
[21:02:31] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10131990 (Krinkle) Is there a recommended place to paste Quarry results in a way that 1) doesn't automatically expire, 2) is human-readable, and 3) has CSV/JSON export? If we don't recommend such a place, I assume we go from...
[23:40:03] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10132239 (rook) I apologize I have yet to understand the interest in old data. The above seems to be suggesting that if the data is retained for 90 days, it would be copied all over the web to be read later. Results that are...