[00:06:26] Data-Engineering, Data-Persistence, Data-Platform-SRE (2024.02.12 - 2024.03.03): Migrate dbstore* hosts to 10.6 - https://phabricator.wikimedia.org/T356961#9578675 (BTullis)
[00:13:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[00:14:23] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[00:19:23] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[00:49:15] (HdfsRpcQueueLength) firing: RPC call queue length on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[00:54:15] (HdfsRpcQueueLength) resolved: RPC call queue length on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[00:56:06] (PS5) Aleksandar Mastilovic: Add HQL files for CX report [analytics/refinery] - https://gerrit.wikimedia.org/r/1003928
[09:44:52] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710#9579625 (brouberol)
[10:27:22] (CR) Btullis: [C: +2] Add stat1010 ans stat1011 to hdfs_tools target [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/1005538 (https://phabricator.wikimedia.org/T354526) (owner: Stevemunene)
[10:27:26] (CR) Btullis: [V: +2 C: +2] Add stat1010 ans stat1011 to hdfs_tools target [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/1005538 (https://phabricator.wikimedia.org/T354526) (owner: Stevemunene)
[10:28:11] (CR) Btullis: [C: +1] Include fix for table schema previews of presto tables with array columns [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/999569 (https://phabricator.wikimedia.org/T356477) (owner: Brouberol)
[10:55:19] (CR) Joal: "One nit - I'll push a patch for this in order to unlock the train." [analytics/refinery] - https://gerrit.wikimedia.org/r/1003928 (owner: Aleksandar Mastilovic)
[11:04:14] (PS6) Joal: Add HQL files for CX abuse filter report [analytics/refinery] - https://gerrit.wikimedia.org/r/1003928 (owner: Aleksandar Mastilovic)
[11:11:07] (CR) Joal: [C: +1] "Marking comments as done. Waiting for another +1 as I added a change myself." [analytics/refinery] - https://gerrit.wikimedia.org/r/1003928 (owner: Aleksandar Mastilovic)
[11:12:17] Data-Engineering (Sprint 9), ChangeProp, observability, service-runner, Event-Platform: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics - https://phabricator.wikimedia.org/T350180#9579873 (gmodena) I am taking a stab at this tasks, because we need gc and memory i...
[11:14:45] Data-Engineering (Sprint 9), ChangeProp, observability, service-runner, Event-Platform: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics - https://phabricator.wikimedia.org/T350180#9579885 (gmodena) @Jdforrester-WMF FWIW I saw you started deprecation work in https...
[11:30:07] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710#9579927 (brouberol)
[11:31:04] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710#9210812 (brouberol)
[11:31:18] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710#9210812 (brouberol)
[11:34:56] Data-Engineering, Data-Platform-SRE (2024.02.12 - 2024.03.03): Migrate bare-metal superset services over to Kubernetes - https://phabricator.wikimedia.org/T358569#9579955 (brouberol)
[11:47:01] Data-Engineering, Data-Platform-SRE: Cleanup superset related resources from puppet - https://phabricator.wikimedia.org/T358570#9579993 (brouberol)
[11:47:26] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710#9580004 (brouberol)
[12:53:03] Data-Engineering, MediaWiki-extensions-EventLogging, Metrics Platform Backlog, Data Products (Data Products Sprint 10), Technical-Debt: Fix public documentation for mw.eventLog.submit() and dispatch() - https://phabricator.wikimedia.org/T357003#9525001 (phuedx) This was done by @apaskulin as...
[12:53:21] Data-Engineering, MediaWiki-extensions-EventLogging, Metrics Platform Backlog, Data Products (Data Products Sprint 10), Technical-Debt: Fix public documentation for mw.eventLog.submit() and dispatch() - https://phabricator.wikimedia.org/T357003#9580130 (phuedx)
[12:53:36] Data-Engineering, MediaWiki-extensions-EventLogging, Metrics Platform Backlog, Data Products (Data Products Sprint 10), Technical-Debt: Fix public documentation for mw.eventLog.submit() and dispatch() - https://phabricator.wikimedia.org/T357003#9580132 (phuedx) a:phuedx
[13:39:07] Data-Engineering, Data-Platform-SRE (2024.02.12 - 2024.03.03), Patch-For-Review: [superset-k8s] Find a solution for the requestctl-generator html page - https://phabricator.wikimedia.org/T356490#9580333 (brouberol) a:brouberol
[13:48:10] Data-Engineering (Sprint 9), ChangeProp, observability, service-runner, Event-Platform: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics - https://phabricator.wikimedia.org/T350180#9580360 (Jdforrester-WMF) >>! In T350180#9579885, @gmodena wrote: > @Jdforrester-WM...
[14:05:30] (CR) Brouberol: [C: +2] Include fix for table schema previews of presto tables with array columns [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/999569 (https://phabricator.wikimedia.org/T356477) (owner: Brouberol)
[14:05:33] (CR) Brouberol: [V: +2 C: +2] Include fix for table schema previews of presto tables with array columns [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/999569 (https://phabricator.wikimedia.org/T356477) (owner: Brouberol)
[14:06:15] Data-Engineering, Data-Platform-SRE (2024.02.12 - 2024.03.03), Patch-For-Review: Migrate bare-metal superset services over to Kubernetes - https://phabricator.wikimedia.org/T358569#9580398 (brouberol)
[14:09:04] Data-Engineering (Sprint 9), ChangeProp, observability, service-runner, Event-Platform: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics - https://phabricator.wikimedia.org/T350180#9580404 (tchin) If it's to a point where we even need to use a new name, might as w...
[14:11:05] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710#9580407 (brouberol)
[14:11:07] Data-Engineering, Data-Platform-SRE (2024.02.12 - 2024.03.03), Patch-For-Review: [superset-k8s] Find a solution for the requestctl-generator html page - https://phabricator.wikimedia.org/T356490#9580406 (brouberol) Open→Resolved
[14:15:08] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710#9580437 (brouberol)
[14:18:46] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710#9580439 (BTullis) Added the new kerberos principals. ` root@krb1001:~# kadmin.local addprinc -randkey superset/superset-next.svc.eqiad.wmnet@WI...
[14:25:56] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710#9580470 (BTullis) Created keytab files: ` root@krb1001:/srv/kerberos/keytabs# mkdir -p superset-next.svc.eqiad.wmnet/superset root@krb1001:/srv...
[14:52:05] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710#9580640 (BTullis) We rolled this out to superset-next and then we updated the database settings in the UI as shown. {F42181026,width=80%}
[14:52:13] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710#9580643 (BTullis)
[15:08:35] Data-Engineering (Sprint 9), ChangeProp, observability, service-runner, Event-Platform: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics - https://phabricator.wikimedia.org/T350180#9580712 (Jdforrester-WMF) Update: We've got access back, and v4.0.0 is finally rele...
[15:18:31] brouberol, btullis: puppet is failing on the idp-test hosts, seems related to your superset/DSE work, is that known? https://paste.debian.net/hidden/7280e0f7/
[15:19:22] oops, let me fix that real quick
[15:19:26] thanks for the report
[15:19:33] cool, thanks
[15:21:56] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1006942
[15:23:26] +1d
[15:30:05] all fixed, except for idp-test1003 which fails for bookworm related reasons
[15:41:39] (PS1) DLynch: Add an app_install_id column to editattemptstep [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1006951 (https://phabricator.wikimedia.org/T353911)
[15:42:22] (CR) DLynch: "Alternative would be to mix in `fragment/analytics/product_metrics/app` to editattemptstep, but that adds a bunch of app-related fields th" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1006951 (https://phabricator.wikimedia.org/T353911) (owner: DLynch)
[16:11:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[16:22:52] Data-Engineering, Data-Platform-SRE (2024.02.12 - 2024.03.03): Alerts Review: determine if we can use Prometheus to alert based on historical datasets - https://phabricator.wikimedia.org/T357537#9580912 (bking) Open→In progress a:bking
[16:30:49] (PS1) Joal: Add deny-list option to import_mediawiki_dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/1006957 (https://phabricator.wikimedia.org/T357859)
[16:39:42] (CR) TChin: [C: +1] Add HQL files for CX abuse filter report [analytics/refinery] - https://gerrit.wikimedia.org/r/1003928 (owner: Aleksandar Mastilovic)
[16:40:07] (PS2) Joal: Add deny-list option to import_mediawiki_dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/1006957 (https://phabricator.wikimedia.org/T357859)
[16:41:04] Data-Engineering, Privacy Engineering: Fix CI/CD issues in the differential-privacy repository - https://phabricator.wikimedia.org/T358601#9581008 (Htriedman)
[16:45:37] (PS3) Joal: Add deny-list option to import_mediawiki_dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/1006957 (https://phabricator.wikimedia.org/T357859)
[16:47:41] (PS4) Joal: Add deny-list option to import_mediawiki_dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/1006957 (https://phabricator.wikimedia.org/T357859)
[17:11:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[18:05:49] (PS1) Aleksandar Mastilovic: Add HQL query files for the "pingback" report [analytics/refinery] - https://gerrit.wikimedia.org/r/1006970
[18:10:35] (CR) Aleksandar Mastilovic: ""Pingback" report queries migrated to be used in an Airflow DAG." [analytics/refinery] - https://gerrit.wikimedia.org/r/1006970 (owner: Aleksandar Mastilovic)
[18:13:08] (CR) Aqu: [C: +1] "All good." [analytics/refinery] - https://gerrit.wikimedia.org/r/1003928 (owner: Aleksandar Mastilovic)
[18:14:56] !log deploying eventstreams
[18:14:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:22:05] (PS3) Sbisson: Fix Wikistories schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1006060 (https://phabricator.wikimedia.org/T343183)
[18:23:55] (CR) TChin: [C: +2] Add HQL files for CX abuse filter report [analytics/refinery] - https://gerrit.wikimedia.org/r/1003928 (owner: Aleksandar Mastilovic)
[18:24:41] (CR) TChin: [V: +2 C: +2] Add HQL files for CX abuse filter report [analytics/refinery] - https://gerrit.wikimedia.org/r/1003928 (owner: Aleksandar Mastilovic)
[19:59:21] (CR) Xcollazo: [C: +1] "Two minor non-blocking comments below. LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/1006957 (https://phabricator.wikimedia.org/T357859) (owner: Joal)
[20:12:47] (PS5) Joal: Add deny-list option to import_mediawiki_dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/1006957 (https://phabricator.wikimedia.org/T357859)
[20:13:08] (CR) Joal: "Thanks for reviewing @xcollazo :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/1006957 (https://phabricator.wikimedia.org/T357859) (owner: Joal)
[21:26:57] (CR) Shay Nowick: [C: +2] "LGTM" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1006951 (https://phabricator.wikimedia.org/T353911) (owner: DLynch)
[21:27:30] (Merged) jenkins-bot: Add an app_install_id column to editattemptstep [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1006951 (https://phabricator.wikimedia.org/T353911) (owner: DLynch)
[23:23:54] (CR) Neil Shah-Quinn (WMF): Add deny-list option to import_mediawiki_dumps (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1006957 (https://phabricator.wikimedia.org/T357859) (owner: Joal)