[00:13:40] (PS3) Snwachukwu: Add Dynamic Pivot job for reportupdater reports [analytics/refinery/source] - https://gerrit.wikimedia.org/r/995271 (https://phabricator.wikimedia.org/T354552)
[00:22:29] (CR) CI reject: [V: -1] Add Dynamic Pivot job for reportupdater reports [analytics/refinery/source] - https://gerrit.wikimedia.org/r/995271 (https://phabricator.wikimedia.org/T354552) (owner: Snwachukwu)
[08:15:38] good morning 👋🏾
[08:33:10] morning!
[08:52:53] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Create a helm chart for Superset - https://phabricator.wikimedia.org/T352166 (brouberol) Open→Resolved
[08:52:56] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710 (brouberol)
[08:52:59] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710 (brouberol)
[08:53:01] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Patch-For-Review: Create helmfile deployment files for superset and superset-next - https://phabricator.wikimedia.org/T353790 (brouberol) Open→Resolved
[09:08:31] (CR) Stevemunene: [C: +1] Include fix for table schema previews of presto tables with array columns [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/999569 (https://phabricator.wikimedia.org/T356477) (owner: Brouberol)
[09:10:37] Data-Engineering, Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Apply our patches not yet merged upstream to the supserset codebase in our Docker image - https://phabricator.wikimedia.org/T356477 (CodeReviewBot) brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/sup...
[10:09:14] Data-Engineering, MediaWiki-extensions-EventLogging, Metrics Platform Icebox, Epic: [EPIC] Deprecate EventLogging::schemaValidate() - https://phabricator.wikimedia.org/T317793 (phuedx)
[10:28:09] Data-Engineering, Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Patch-For-Review: [superset k8s] Update public domain DNS records to make them point to the DSE Kubernetes ingress - https://phabricator.wikimedia.org/T356482 (brouberol) ` brouberol@dns1004:~$ host superset-k8s.wikimedia.org superset-k...
[11:22:59] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (gmodena) > TBD on final stream name in T314956: [Event Platform] Declare webrequest as an Event Platform stream, but the c...
[11:41:05] Data-Engineering, Wikidata, Wikidata-Termbox, serviceops, and 3 others: Migrate Termbox SSR from Node 16 to 18 - https://phabricator.wikimedia.org/T355685 (Lucas_Werkmeister_WMDE) > Patches are up for review! Looks alright to me – I think if another SRE can review the general changes, we can try...
[12:14:52] hello, FYI Puppet CA certificate stat1005.eqiad.wmnet will expire in 4d 22h and should be renewed (there is a cookbook for that ;) )
[12:14:59] See https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=alertname%3DPuppetCertificateAboutToExpire
[12:16:21] volans: thanks, let me have a look
[12:19:32] all done. I'm keeping an eye on the alert to see whether it resolves
[12:19:49] great, thanks a lot
[12:20:48] it's all gone. Thanks for the report 👍
[12:45:49] Data-Engineering, Data-Platform-SRE ( 2024.02.12 - 2024.03.03): [superset k8s] Update public domain DNS records to make them point to the DSE Kubernetes ingress - https://phabricator.wikimedia.org/T356482 (brouberol) Open→Resolved
[12:45:51] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710 (brouberol)
[12:46:04] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710 (brouberol)
[12:46:06] Data-Engineering, Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Patch-For-Review: [superset k8s] Add entries to the puppet service catalog - https://phabricator.wikimedia.org/T356483 (brouberol) Open→Resolved
[12:46:44] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710 (brouberol)
[12:56:31] Data-Engineering, CX-cxserver, Citoid, Content-Transform-Team-WIP, and 11 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118 (MSantos)
[12:56:53] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Configure OIDC Authentication for Superset on K8S - https://phabricator.wikimedia.org/T353794 (brouberol) a:brouberol
[13:02:35] Data-Engineering, Metrics Platform Backlog, Data Products (Data Products Sprint 09), Event-Platform (Sprint 09): [SPIKE] Draft of Mediawiki extension proposal for Metrics Platform Instrumentation (& Experimentation) - https://phabricator.wikimedia.org/T355599 (phuedx)
[13:03:10] Data-Engineering, Metrics Platform Backlog, Data Products (Data Products Sprint 09), Event-Platform (Sprint 09): [SPIKE] Draft of Mediawiki extension proposal for Metrics Platform Instrumentation (& Experimentation) - https://phabricator.wikimedia.org/T355599 (phuedx) >>! In T355599#9501776, @phu...
[13:08:34] (PS4) Snwachukwu: Add Dynamic Pivot job for reportupdater reports [analytics/refinery/source] - https://gerrit.wikimedia.org/r/995271 (https://phabricator.wikimedia.org/T354552)
[13:19:32] (CR) CI reject: [V: -1] Add Dynamic Pivot job for reportupdater reports [analytics/refinery/source] - https://gerrit.wikimedia.org/r/995271 (https://phabricator.wikimedia.org/T354552) (owner: Snwachukwu)
[13:53:37] Data-Engineering, Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Configure ingress internal DNS records - https://phabricator.wikimedia.org/T356481 (brouberol) Resolved→Open
[13:53:40] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710 (brouberol)
[13:56:43] Data-Engineering, Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Configure ingress internal DNS records - https://phabricator.wikimedia.org/T356481 (brouberol) It seems that https://superset-k8s.wikimedia.org and https://superset-next-k8s.wikimedia.org return 502 errors. It might be expected for https://...
[13:57:08] Data-Engineering (Sprint 8), EventStreams, Prod-Kubernetes, serviceops, and 2 others: eventstreams regularly uses more than 95% of its memory limit - https://phabricator.wikimedia.org/T357005 (gmodena) >>! In T357005#9531775, @tchin wrote: > Looking at the logs, this seems to coincide with the re...
[14:02:03] Data-Engineering, Data Pipelines: CI/CD Pipeline Implementation - https://phabricator.wikimedia.org/T304929 (lbowmaker) Open→Declined CI/CD documented here: https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Airflow/Developer_guide#CI
[14:03:30] Data-Engineering, Data Pipelines: [NEEDS GROOMING] Support migration of simple (Hive > Hive) jobs - https://phabricator.wikimedia.org/T333006 (lbowmaker) Open→Resolved a:lbowmaker Duplicate
[14:04:45] Data-Engineering, Data Pipelines: [Airflow] Research, discuss and decide on DAG/task dependencies VS. success/failure files (Oozie style) - https://phabricator.wikimedia.org/T301568 (lbowmaker) Open→Resolved a:lbowmaker
[14:05:49] Data-Engineering, Data Pipelines: Investigate datahub stack trace on an-airflow1004.eqiad.wmnet - https://phabricator.wikimedia.org/T332822 (lbowmaker) Open→Resolved a:lbowmaker Resolving, we have moved forward with different versions of Airflow since this ticket was created. Can re-open if i...
[14:07:54] Data-Engineering, Data Pipelines: Airflow: pin dependency versions to prevent long installs - https://phabricator.wikimedia.org/T309046 (lbowmaker) Open→Resolved a:lbowmaker Resolved here: https://phabricator.wikimedia.org/T311111#8980704
[14:08:34] Data-Engineering, EventStreams, MediaWiki-General, Privacy Engineering, and 3 others: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page - https://phabricator.wikimedia.org/T354577 (DannyS712) a:DannyS712→None Not sure when I'm going to have time f...
[14:19:39] Data-Engineering, Data Pipelines: Production Airflow dags should be moved to the shared repo - https://phabricator.wikimedia.org/T295807 (lbowmaker) Open→Resolved a:lbowmaker
[14:25:13] Data-Engineering: Requesting Kerberos access for ahoelzl - https://phabricator.wikimedia.org/T345961 (lbowmaker) Open→Resolved a:lbowmaker Resolved here: https://phabricator.wikimedia.org/T345959
[14:35:07] Data-Engineering, Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Check home/HDFS leftovers of daniram - https://phabricator.wikimedia.org/T355108 (brouberol) a:brouberol
[14:40:38] Data-Engineering, Data Pipelines: CI/CD Pipeline Design - https://phabricator.wikimedia.org/T304926 (lbowmaker) Open→Resolved a:lbowmaker https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Airflow/Developer_guide#CI
[14:47:02] Data-Engineering, Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Check home/HDFS leftovers of daniram - https://phabricator.wikimedia.org/T355108 (brouberol) Open→Resolved **Linux boxes** `daniram` only had personal data on `stat1004.eqiad.wmnet`. I moved it into `/home/piccardi/daniram-data` on...
[14:50:58] Data-Engineering, Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Check home/HDFS leftovers of shubhankar - https://phabricator.wikimedia.org/T355501 (brouberol) Open→Resolved a:brouberol I didn't find anything in HDFS indeed: ` brouberol@an-master1003:~$ sudo kerberos-run-command hdfs hdfs dfs...
[14:53:58] Data-Platform-SRE: Improve Elastic operation macros/tmux - https://phabricator.wikimedia.org/T357142 (Gehel) p:Triage→Medium
[14:54:23] Data-Platform-SRE: Monitor Elastic S3 repository status - https://phabricator.wikimedia.org/T357146 (Gehel) p:Triage→Medium
[14:55:02] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Monitor Elastic S3 repository status - https://phabricator.wikimedia.org/T357146 (Gehel)
[15:08:51] Analytics, Data-Engineering, EventStreams, Wikidata, and 2 others: Expose rdf-streaming-updater.mutation content through EventStreams - https://phabricator.wikimedia.org/T294133 (Lucas_Werkmeister_WMDE) AFAIK: The “legacy” updater queries the recent changes [via the API](https://www.mediawiki.org...
[15:12:33] Data-Engineering, Data Pipelines: [NEEDS GROOMING} Airflow development instances should be available on demand - https://phabricator.wikimedia.org/T295814 (lbowmaker) Open→Declined
[15:37:15] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Check home/HDFS leftovers of jbond - https://phabricator.wikimedia.org/T352511 (brouberol) Open→Resolved a:brouberol I couldn't find any data in JBond's personal HDFS directory: ` brouberol@an-master1003:~$ sudo kerberos-run-command hdfs hdfs dfs -ls...
[15:37:36] Data-Engineering, Data Products: Adapt Sqoop to pagelinks schema change - https://phabricator.wikimedia.org/T345771 (JAllemandou) Thank you so much @Ladsgroup for the recap.
[15:42:38] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Remove nickifeajika from analytics-privatedata-users - https://phabricator.wikimedia.org/T353665 (brouberol) Open→Resolved a:brouberol This seems to have been solved by @MoritzMuehlenhoff : ` ~/wmf/puppet superset-oidc *5 ?1 ❯ g log --grep nickifeaji...
[15:44:10] Data-Engineering, Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Check home/HDFS leftovers of nickifeajika - https://phabricator.wikimedia.org/T354241 (brouberol) a:brouberol I found data on stat boxes as well as in HDFS. Should I delete it, or place it in someone else's home directory? If so, who?
[15:46:44] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Set requests (not limits) for cirrus-streaming-updater in k8s - https://phabricator.wikimedia.org/T348350 (brouberol) a:brouberol
[16:00:00] Data-Engineering, Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Check home/HDFS leftovers of nickifeajika - https://phabricator.wikimedia.org/T354241 (MoritzMuehlenhoff) @Miriam @fkaelin Can you provide some advice here?
[16:07:26] Data-Engineering, MediaWiki-General, Event-Platform, Patch-For-Review: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817 (Ottomata) > PHP execution. > Afaik PHP execution is limited for security reasons to only s...
[16:08:08] Data-Engineering, MediaWiki-General, Event-Platform, Patch-For-Review: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817 (Ottomata)
[16:11:46] Data-Engineering, CX-cxserver, Citoid, Content-Transform-Team-WIP, and 11 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118 (Sbailey) a:Sbailey
[16:12:35] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work): Rebuild and deploy textify plugin - https://phabricator.wikimedia.org/T356651 (Gehel)
[16:22:40] Data-Platform-SRE: RdfStreamingUpdaterSpaceUsageTooHigh - https://phabricator.wikimedia.org/T356698 (Gehel)
[16:27:23] Data-Platform-SRE, Data-Platform: Audit Search Platform-owned WMCS accounts - https://phabricator.wikimedia.org/T357162 (Gehel)
[16:28:51] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Set requests (not limits) for cirrus-streaming-updater in k8s - https://phabricator.wikimedia.org/T348350 (brouberol) @RKemper This is what I found. I started to take a look at our pods. The `flink-main-container` of our `flink-producer` has the following resour...
[16:35:08] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Movement-Insights: Create a DataHub group for the Movement Insights team - https://phabricator.wikimedia.org/T354211 (brouberol) Open→Resolved a:brouberol All done! {F41868334} I've also taken care of the member removal from the specified groups.
[17:26:14] (PS1) Aleksandar Mastilovic: Adding report updater CX queries [analytics/refinery] - https://gerrit.wikimedia.org/r/1002588 (https://phabricator.wikimedia.org/T356424)
[17:29:03] (CR) Aleksandar Mastilovic: "Something nice" [analytics/refinery] - https://gerrit.wikimedia.org/r/1002588 (https://phabricator.wikimedia.org/T356424) (owner: Aleksandar Mastilovic)
[17:30:02] (CR) Joal: Adding report updater CX queries (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1002588 (https://phabricator.wikimedia.org/T356424) (owner: Aleksandar Mastilovic)
[17:33:55] (PS2) Aleksandar Mastilovic: Adding report updater CX queries [analytics/refinery] - https://gerrit.wikimedia.org/r/1002588 (https://phabricator.wikimedia.org/T356424)
[17:35:03] (CR) Aleksandar Mastilovic: "I acknowledged your comment, Joseph!" [analytics/refinery] - https://gerrit.wikimedia.org/r/1002588 (https://phabricator.wikimedia.org/T356424) (owner: Aleksandar Mastilovic)
[17:36:23] (CR) Joal: [C: +1] Adding report updater CX queries [analytics/refinery] - https://gerrit.wikimedia.org/r/1002588 (https://phabricator.wikimedia.org/T356424) (owner: Aleksandar Mastilovic)
[17:36:41] (CR) Joal: [V: +2 C: +2] Adding report updater CX queries [analytics/refinery] - https://gerrit.wikimedia.org/r/1002588 (https://phabricator.wikimedia.org/T356424) (owner: Aleksandar Mastilovic)
[17:46:35] (CR) Joal: Adding report updater CX queries [analytics/refinery] - https://gerrit.wikimedia.org/r/1002588 (https://phabricator.wikimedia.org/T356424) (owner: Aleksandar Mastilovic)
[18:26:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[19:09:03] (GobblinKafkaRecordsExtractedNotEqualRecordsExpected) firing: Gobblin job event_default ingested an unexpected number of records for a Kafka topic partition. ...
[19:09:04] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=event_default&var-kafka_topic=eqiad.mediawiki.cirrussearch.page_rerender.v1&viewPanel=4 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[19:48:49] Data-Platform-SRE: RdfStreamingUpdaterSpaceUsageTooHigh - https://phabricator.wikimedia.org/T356698 (Gehel) p:Triage→High
[19:49:01] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): RdfStreamingUpdaterSpaceUsageTooHigh - https://phabricator.wikimedia.org/T356698 (Gehel)
[20:09:03] (GobblinKafkaRecordsExtractedNotEqualRecordsExpected) resolved: Gobblin job event_default ingested an unexpected number of records for a Kafka topic partition. ...
[20:09:04] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=event_default&var-kafka_topic=eqiad.mediawiki.cirrussearch.page_rerender.v1&viewPanel=4 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[20:11:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[20:34:37] Data-Engineering, Data-Platform-SRE: Implement periodical cleaning of Airflow databases - https://phabricator.wikimedia.org/T322036 (lbowmaker)
[20:35:40] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work), Patch-For-Review: Rebuild and deploy textify plugin - https://phabricator.wikimedia.org/T356651 (CodeReviewBot) ebernhardson opened https://gitlab.wikimedia.org/repos/search-platform/cirrussearch-elasticsearch-image/-/merg...
[20:35:48] Data-Engineering, Fundraising Tech - Chaos Crew, Fundraising-Backlog, MediaWiki-Core-Tests, and 3 others: CentralNotice failing in browser test on master - https://phabricator.wikimedia.org/T354977 (XenoRyet) Open→Resolved
[20:37:25] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work), Patch-For-Review: Rebuild and deploy textify plugin - https://phabricator.wikimedia.org/T356651 (EBernhardson) Released the plugin as -wmf12. Patch above updates the .deb to use the newest versions. MR also up on gitlab t...
[22:59:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[23:39:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage