[07:48:33] (CR) Elukey: [C: +2] oozie: add cache_status to webrequest's druid indexations [analytics/refinery] - https://gerrit.wikimedia.org/r/858561 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[07:48:40] (CR) Elukey: [V: +2 C: +2] oozie: add cache_status to webrequest's druid indexations [analytics/refinery] - https://gerrit.wikimedia.org/r/858561 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[07:51:46] o/
[07:51:52] I added the above patch to https://etherpad.wikimedia.org/p/analytics-weekly-train
[08:16:39] Data-Engineering, Data-Services, cloud-services-team (Kanban): clouddb* hosts with ipv6 access timeout from cumin - https://phabricator.wikimedia.org/T323550 (Marostegui)
[08:16:49] Data-Engineering, Data-Services, cloud-services-team (Kanban): clouddb* hosts with ipv6 access timeout from cumin - https://phabricator.wikimedia.org/T323550 (Marostegui) p:Triage→High
[08:19:12] (VarnishkafkaNoMessages) firing: (4) varnishkafka on cp1085 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[08:24:12] (VarnishkafkaNoMessages) resolved: (4) varnishkafka on cp1085 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[08:31:32] Data-Engineering, CheckUser, MW-1.38-notes (1.38.0-wmf.26; 2022-03-14), MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), and 3 others: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004 (Marostegui)
[08:52:15] (PS1) Phedenskog: Add cumuluative layout schema. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/859433 (https://phabricator.wikimedia.org/T281103)
[09:18:07] (PS1) Stevemunene: Fix turnilo after upgrade. Upgrade to vesion 1.38.2 [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/859436 (https://phabricator.wikimedia.org/T308778)
[09:46:54] (CR) Btullis: [V: +2 C: +2] "Looks good to me. I'll merge until we get Steve's membership of the analytics group in gerrit completed." [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/859436 (https://phabricator.wikimedia.org/T308778) (owner: Stevemunene)
[09:57:10] Data-Engineering, Data-Services, cloud-services-team (Kanban): clouddb* hosts with ipv6 access timeout from cumin - https://phabricator.wikimedia.org/T323550 (Marostegui) Just to make it clear, after a timeout I would assume it reverts to the ipv4 resolve and it ends up working, but it takes minutes....
[10:58:59] PROBLEM - MegaRAID on an-worker1090 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:09:57] RECOVERY - MegaRAID on an-worker1090 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:18:53] (PS1) Volans: oozie, druid: add aggregated_time_firstbyte [analytics/refinery] - https://gerrit.wikimedia.org/r/859463
[11:22:10] Analytics, Analytics-Wikistats, Data-Engineering: Anonymous edits - https://phabricator.wikimedia.org/T323562 (ChristianKl)
[11:40:38] PROBLEM - MegaRAID on an-worker1090 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:52:59] ACKNOWLEDGEMENT - MegaRAID on an-worker1090 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough Btullis T318659 - Added more downtime, but replacement batteries are on their way https://wikitech.wikimedia.org/wiki/MegaCli%23M
[11:52:59] ng
[12:22:33] RECOVERY - MegaRAID on an-worker1090 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[12:29:19] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for bnwikiquote - https://phabricator.wikimedia.org/T319190 (BTullis) @Marostegui and @Ladsgroup - Would you mind helping with this please, or pointing me t...
[13:27:29] hello folks
[13:27:32] just created https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid/Alerts#Druid_webrequest_sampled_live_Supervisor
[13:40:55] and also https://gerrit.wikimedia.org/r/c/operations/alerts/+/859502 to add alerts (on team-sre, not on team-data-eng)
[13:41:02] lemme know if it makes sense
[13:43:53] (CR) Elukey: [C: +1] "It makes sense to me but I'll let DE folks to chime in if anything is missing!" [analytics/refinery] - https://gerrit.wikimedia.org/r/859463 (owner: Volans)
[13:54:36] (PS2) Sergio Gimeno: Add user new impact data to the impact homepagemodule [schemas/event/secondary] - https://gerrit.wikimedia.org/r/859113 (https://phabricator.wikimedia.org/T323160)
[13:56:59] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for tlwikiquote - https://phabricator.wikimedia.org/T317111 (Marostegui) @BTullis there were some steps missing (which I assumed were done before) at T31711...
[14:07:04] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for bnwikiquote - https://phabricator.wikimedia.org/T319190 (Marostegui) There's indeed a step being missed there which is usually ran by DBAs: ` set sessio...
[14:24:57] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for bnwikiquote - https://phabricator.wikimedia.org/T319190 (BTullis) Thanks @Marostegui - No, as far as I am aware, running that cookbook is the only thing...
[14:29:06] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for tlwikiquote - https://phabricator.wikimedia.org/T317111 (BTullis) Many thanks @Marostegui - Proceeding now. ` sudo cookbook sre.wikireplicas.add-wiki --...
[14:36:02] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for bnwikiquote - https://phabricator.wikimedia.org/T319190 (Marostegui) Open→Resolved >>! In T319190#8413096, @BTullis wrote: > Thanks @Marostegui...
[14:36:16] (PS1) Phedenskog: Add largest contentful paint schema. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/859534 (https://phabricator.wikimedia.org/T281022)
[14:39:34] (CR) CI reject: [V: -1] Add largest contentful paint schema. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/859534 (https://phabricator.wikimedia.org/T281022) (owner: Phedenskog)
[14:56:41] (CR) Phedenskog: [C: -1] "Timo: Hmm, what do you think, should we keep CLS and LCP in navtiming schema or make one schema for each? I think its valuable to have the" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/859433 (https://phabricator.wikimedia.org/T281103) (owner: Phedenskog)
[15:03:10] * joal is happy :) https://grafana.wikimedia.org/d/000000585/hadoop?orgId=1&viewPanel=28
[15:03:20] (HdfsTotalFilesHeap) resolved: Total files on the analytics-hadoop HDFS cluster are more than the heap can support. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_total_files_and_heap_size - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=28&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsTotalFilesHeap
[15:07:02] (CR) Phedenskog: [C: -1] "Lets wait with this and first decide if we should add this directly in havtiming instead." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/859534 (https://phabricator.wikimedia.org/T281022) (owner: Phedenskog)
[15:12:20] (HdfsTotalFilesHeap) resolved: Total files on the analytics-hadoop HDFS cluster are more than the heap can support. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_total_files_and_heap_size - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=28&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsTotalFilesHeap
[15:31:26] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Isaac) > I don't know if I totally follow either, but there is more context the initial collab design doc see "Do w...
[16:21:16] (Abandoned) Phedenskog: Add cumuluative layout schema. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/859433 (https://phabricator.wikimedia.org/T281103) (owner: Phedenskog)
[16:21:45] (Abandoned) Phedenskog: Add largest contentful paint schema. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/859534 (https://phabricator.wikimedia.org/T281022) (owner: Phedenskog)
[16:31:15] Analytics-Jupyter, Data-Engineering-Planning, Product-Analytics, Data Pipelines (Sprint 04), Patch-For-Review: Add support for jupyterlab on conda-analytics - https://phabricator.wikimedia.org/T321088 (xcollazo) All right, https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/...
[16:50:57] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for bclwikiquote - https://phabricator.wikimedia.org/T316456 (BTullis) a:BTullis
[16:51:18] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for igwikiquote - https://phabricator.wikimedia.org/T314639 (BTullis) a:BTullis
[17:04:16] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for tlwikiquote - https://phabricator.wikimedia.org/T317111 (BTullis) a:BTullis
[17:28:30] (CR) Joal: [V: +2 C: +2] "LGTM! Merging" [analytics/refinery] - https://gerrit.wikimedia.org/r/859463 (owner: Volans)
[17:31:22] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for tlwikiquote - https://phabricator.wikimedia.org/T317111 (BTullis) Open→Resolved
[17:31:44] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for bclwikiquote - https://phabricator.wikimedia.org/T316456 (BTullis) Open→Resolved
[17:37:23] Data-Engineering-Kanban, Data-Engineering-Planning, Data Pipelines: Optimization of conda-analytics deb package - https://phabricator.wikimedia.org/T318397 (Antoine_Quhen) Now, I think like you: it's not worth the time spent maintaining 2 packages. I'm not even sure about putting some time into opti...
[18:00:06] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for igwikiquote - https://phabricator.wikimedia.org/T314639 (BTullis) Open→Resolved
[19:48:40] Data-Engineering, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0: Pageviews Service - https://phabricator.wikimedia.org/T288296 (BPirkle)
[19:48:46] Data-Engineering, API Platform (Sprint 01), AQS2.0, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Pageviews: Implement Unit Tests - https://phabricator.wikimedia.org/T299735 (BPirkle) Stalled→Open This is now unblocked.
[20:01:42] Data-Engineering-Planning, Data Pipelines: NEW FEATURE REQUEST: Upgrade superset to 1.5.2 - https://phabricator.wikimedia.org/T323458 (xcollazo) Here is a compare between current prod version (1.4.2) to 1.5.2: https://github.com/apache/superset/compare/1.4.2...1.5.2 Here is the CHANGELOG: https://github....
[20:13:31] (PS4) Milimetric: [WIP] Stream revision topics into iceberg table [analytics/refinery/source] - https://gerrit.wikimedia.org/r/858344 (https://phabricator.wikimedia.org/T322326)
[20:17:55] (CR) CI reject: [V: -1] [WIP] Stream revision topics into iceberg table [analytics/refinery/source] - https://gerrit.wikimedia.org/r/858344 (https://phabricator.wikimedia.org/T322326) (owner: Milimetric)
[21:18:05] Data-Engineering-Planning, Data Pipelines, Discovery-Search: Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (xcollazo)
[21:32:33] Data-Engineering-Planning, Data Pipelines, Discovery-Search: Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (xcollazo) For sizing purposes, tasks that I see for Data Eng. on this ticket: * Have a m...