[00:58:51] Data-Engineering (Sprint 8), Patch-For-Review: [Maintenance] Migrate cx ReportUpdater job - https://phabricator.wikimedia.org/T356424 (lbowmaker)
[00:59:21] Data-Engineering (Sprint 8), Patch-For-Review: [Maintenance] Migrate cx ReportUpdater job - https://phabricator.wikimedia.org/T356424 (lbowmaker)
[00:59:24] Data-Engineering, Data Pipelines: [Airflow Migration] Migrate 1+ reportupdater jobs - https://phabricator.wikimedia.org/T307540 (lbowmaker)
[01:00:53] Data-Engineering: [Maintenance] Migrate pingback to Airflow - https://phabricator.wikimedia.org/T357372 (lbowmaker)
[02:29:18] Analytics-Radar, Data-Engineering, Data Products, Metrics Platform Backlog: mw.user.generateRandomSessionId should return a UUID - https://phabricator.wikimedia.org/T266813 (Ottomata) I think because it was on the Event Platform board, but doesn't have anything really to do with Event Platform....
[03:29:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[05:34:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[05:35:23] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[05:40:08] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[08:30:00] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Infrastructure-Foundations, Puppet-Core, SRE, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[09:00:49] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (brouberol) a:brouberol
[09:02:03] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (brouberol) We seem to have an elastic repository for Bookworm, so I'm going to attempt a bookworm reimage, as we only run logstash on these hosts. ` brouberol@apt1001:~...
[09:03:51] !log attempting a reimage of apifeatureusage1001 to bookworm - T346053
[09:03:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:03:56] T346053: Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053
[09:04:37] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm
[09:20:49] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm executed with errors:...
[09:26:20] volans (as a spicerack connoisseur): I attempted to reimage a host to bookworm, which failed at first puppet run time, due to missing debian packages. I then aborted the reimage process, and attempted to run a 2nd reimaging with --os bullseye, but this fails, as cumin cannot run on the host probably due to unconfigured ssh.
[09:26:59] brouberol: what's the current status of the host?
[09:27:02] this host is ultimately a ganeti VM, so I'm thinking, is there a way to force rebuild it from scratch?
[09:27:18] you can decommission it and restart from scratch :D
[09:27:47] it's running and puppet hasn't run on it, so I'm taking it's a pretty fresh VM
[09:28:40] so that'd be sre.hosts.decommission on this host, and then sre.hosts.provision ?
[09:28:50] no, no provision
[09:29:12] that's for physical hosts for their first setup
[09:29:38] but also i'm not sure why it should be failing
[09:29:49] the reimage doesn't ssh before d-i IIRC
[09:30:04] what's the hostname and from which cumin host did you run it?
[09:30:20] the first reimage failed due to https://puppetboard.wikimedia.org/report/apifeatureusage1001.eqiad.wmnet/0652a2bb556e5dc154abe72c8992a388d175ca5e, and I then aborted the reimage process
[09:30:37] which ran from cumin1002, on apifeatureusage1001
[09:31:40] so for the first one I guess T353392 is possibly related?
[09:31:41] T353392: Ensure Elastic stack works on bookworm - https://phabricator.wikimedia.org/T353392
[09:31:50] same java deps
[09:32:51] brouberol: the solution is easy
[09:33:18] yes, that was an oversight on my part. I checked that we had bookworm-wikimedia/thirdparty/elastic710 available on our apt servers, and assumed it meant that installing elastic on bookworm would work, but we're missing a java package it seems
[09:33:22] you need to pass --new
[09:34:20] because in the first reimage it was removed from puppetdb
[09:34:50] so, assuming I'd like to attempt a reimage to bullseye instead, that'd be `cookbook sre.hosts.reimage --os bullseye -t T346053 apifeatureusage1001 --new` ?
[09:34:50] T346053: Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053
[09:35:07] yes but let me check one thing first
[09:35:12] I see you passed --new
[09:35:13] already
[09:35:34] the command that is failing is
[09:35:34] puppet lookup --render-as s --compile --node apifeatureusage1001.eqiad.wmnet profile::puppet::agent::force_puppet7
[09:35:41] yep, I was going to say, and it failed with "Host apifeatureusage1001.eqiad.wmnet was found in PuppetDB but --new was set. Are you sure you want to proceed? The --new option will be unset"
[09:35:45] that is run on the puppetserver
[09:35:48] and the failure is because:
[09:35:48] Error: Could not run: Could not find resource 'Package[openjdk-11-jdk]' in parameter 'require' (file: /srv/puppet_code/environments/production/modules/logstash/manifests/init.pp, line: 49)
[09:36:29] right, because we haven't published openjdk-11 packages for bookworm it seems
[09:36:52] but your last run was for bullseye
[09:37:40] maybe a hiera setting?
[09:37:43] that deps is dynamic
[09:39:20] ah no I see that's hardcoded to java_package => 'openjdk-11-jdk',
[09:39:25] in modules/profile/manifests/apifeatureusage/logstash.pp
[09:39:28] it is yes
[09:39:36] At this point, I think I'd rather start from scratch and reimage that VM to bullseye, for which we have available packages. is that something I can do with the state of the VM?
[09:40:17] but you already did that
[09:40:18] Executing cookbook sre.hosts.reimage with args: ['--o
[09:40:18] s', 'bullseye', '-t', 'T346053', 'apifeatureusage1001', '--new']
[09:40:19] T346053: Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053
[09:40:47] yep, and it failed on a cumin execution error, like I said
[09:40:58] yes to execute the puppet lookup I mentioned above
[09:41:01] on the puppetserver
[09:41:05] has nothing to do with ssh to the host
[09:41:31] it does that to autodetect if puppet 5 or puppet 7 should be setup
[09:42:54] brouberol: ahhh I might have found it
[09:43:05] the logstash class adds: require => Package[$java_package],
[09:43:17] but if you didn't define that in your catalog, puppet fails to compile
[09:43:40] also the first puppet failure, it doesn't find Package[openjdk-11-jdk]
[09:43:43] in the catalog
[09:43:54] the package might even exist in bookworm
[09:44:00] it would fail the same way
[09:45:03] Let me have a look at the manifest real quick
[09:46:19] ooh, I see
[09:46:27] the default openjdk package for bookworm is 'bookworm' => [{'version' => '17', 'variant' => 'jdk'}],
[09:46:49] so the Package[openjdk-11-jdk] dependency isn't met at all
[09:47:27] tell you what, I think it'd be easier to decom the VM altogether and reprovision it on bullseye
[09:48:05] there's no data on it, nothing is currently running on it either
[09:50:02] if I were to do that, what I don't know is how to provision a new VM in the first place. Would reimage --new do that?
[09:50:58] no, there is a makevm cookbook for that, you have to follow https://wikitech.wikimedia.org/wiki/Ganeti#Create_a_VM
[09:51:04] but it shouldn't be needed at all
[09:51:14] let me check one quick thing
[09:51:25] thanks for the time btw
[09:54:52] brouberol: could you retry to run the reimage cookbook for bookworm and --new?
[09:55:04] yes sir
[09:55:50] and I guess puppet 7 if asked (I can't recall if it does) :D
[09:55:54] ah now it's asking with what version of puppet it should install the server
[09:55:59] not sure your cluster is
[09:56:09] *in which version your cluster is
[09:56:58] yeah I've removed the host from puppetdb (with --new the lookup shouldn't happen anyway)
[09:57:00] looking at scrollback, I didn't see "this server was provisioned w/ puppet 7" when ssh-ing on the old node, so I'm going to assume 5 actually
[09:57:21] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm
[09:57:39] we can check hiera
[09:58:32] if it's a new cluster go for 7 directly
[09:58:37] one thing less to migrate
[09:59:23] but yeah I don't see profile::puppet::agent::force_puppet7 set to true for this cluster or host
[09:59:27] AFAICT
[10:07:49] sorry, back from meeting. Puppet is running ATM, although I'm expecting it'll fail the same way
[10:08:48] oh wait, what did I suggest you :facepalm:
[10:09:44] I meant bullseye :/
[10:10:03] * volans hates similar names for versions
[10:10:46] so to recap what's happening, when you interrupted the reimage with bookworm because of the puppet failure, the host had already started the first puppet run and hence sent facts to puppetdb
[10:11:12] hahaha, no worries. I'm coffee starved, I should have seen that as well
[10:11:32] at the next run with bullseye, the reimage cookbook noticed that the host is already existing and tried to autodetect the puppet version to install with a puppet lookup on puppetserver
[10:12:39] that failed because it couldn't compile the catalog with the current values
[10:13:08] ...so if this run fails as expected (let's see), let me know and I can remove it again from puppetdb and you can do the reimage with bullseye and --new
[10:14:15] sorry for the additional mis-step
[10:16:50] no worries at all, I'm grateful for the time really
[10:18:27] anytime :)
[10:21:59] lmk if I should redo the puppetdb cleanup
[10:23:35] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm executed with errors:...
[10:24:02] yes please!
[10:24:30] done
[10:24:31] https://puppetboard.wikimedia.org/node/apifeatureusage1001.eqiad.wmnet
[10:24:41] (empty on the right column, no facts)
[10:25:50] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm
[10:41:52] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bookworm executed with errors:...
[10:42:30] brouberol: you did bookworm again?
[10:47:32] * brouberol puts head in hands
[10:47:55] sorry, it's one of these mornings. I was up all night due to the kiddo, and I can't seem to do the right thing
[10:48:34] no worries, we're even :D
[10:48:37] let me delete it again
[10:49:02] ok done, now you can reimage with *bullseye* and *--new* :-P
[10:49:33] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bullseye
[10:49:44] done! It only took me 56 tries
[10:49:48] thanks again
[10:49:55] rotfl
[11:20:06] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host apifeatureusage1001.eqiad.wmnet with OS bullseye completed: - apifeatu...
[11:21:30] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host apifeatureusage2001.codfw.wmnet with OS bullseye
[11:22:17] yay
[11:23:17] <3
[11:50:43] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host apifeatureusage2001.codfw.wmnet with OS bullseye completed: - apifeatu...
[11:51:13] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate apifeatureusage hosts to Bullseye or later - https://phabricator.wikimedia.org/T346053 (brouberol) Open→Resolved
[11:51:19] Data-Platform-SRE, Epic: [Epic] Migrate all Search Platform servers to Debian Bullseye - https://phabricator.wikimedia.org/T323921 (brouberol)
[12:04:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[12:33:34] Data-Engineering, MediaWiki-extensions-EventLogging, Metrics Platform Backlog, Data Products (Data Products Sprint 09), Technical-Debt: Fix public documentation for mw.eventLog.submit() and dispatch() - https://phabricator.wikimedia.org/T357003 (phuedx)
[12:37:49] Data-Engineering, Metrics Platform Backlog, Data Products (Data Products Sprint 09), Spike: [SPIKE] Remove mentions of MetricsClient#dispatch() and the monoschema from documentation - https://phabricator.wikimedia.org/T355046 (phuedx)
[12:38:10] Data-Engineering, Data-Platform, MediaWiki-extensions-EventLogging, Metrics Platform Backlog, Epic: Deprecate and remove MetricsClient#dispatch() - https://phabricator.wikimedia.org/T352969 (phuedx)
[12:52:20] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Patch-For-Review: Check log rotation settings on airflow instances - https://phabricator.wikimedia.org/T339015 (Stevemunene) Spent some time looking at [[ https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/a...
[12:59:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[13:04:18] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Bring stat1011 into service - https://phabricator.wikimedia.org/T354526 (Stevemunene) a:BTullis→Stevemunene
[15:39:06] Data-Engineering: Airflow mapped tasks UI & metrics - https://phabricator.wikimedia.org/T357430 (Antoine_Quhen)
[15:39:47] Data-Engineering: Airflow mapped tasks UI & metrics - https://phabricator.wikimedia.org/T357430 (Antoine_Quhen)
[15:39:49] Data-Engineering (Sprint 8): [Refine Refactoring] Orchestrate Airflow execution of navigationtiming from config store - https://phabricator.wikimedia.org/T356360 (Antoine_Quhen)
[15:59:59] Team, I’ll be 5min late to the sync
[16:00:00] Data-Engineering, Wikidata, Wikidata-Termbox, serviceops, and 3 others: Migrate Termbox SSR from Node 16 to 18 - https://phabricator.wikimedia.org/T355685 (akosiaris) Patches have been deployed, simple curl tests as well as `service-checker-swagger` checks have passed. I double checked the diff,...
[16:04:33] Data-Engineering: [Dataset Config Store] Deploy poc to dse-k8s - https://phabricator.wikimedia.org/T357434 (lbowmaker)
[16:12:34] Data-Engineering, Data-Platform-SRE ( 2024.02.12 - 2024.03.03): [Iceberg Migration] P.O.C. on Iceberg sensor using Postgres table to keep status of updates - https://phabricator.wikimedia.org/T340466 (lbowmaker)
[16:20:28] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Patch-For-Review: Check log rotation settings on airflow instances - https://phabricator.wikimedia.org/T339015 (Gehel) Open→Resolved
[16:27:04] Data-Engineering: Remove wikidata from this historical dumps process - https://phabricator.wikimedia.org/T357438 (lbowmaker)
[16:40:49] Data-Platform-SRE, superset.wikimedia.org: Prod Superset down, showing HTTP 500 instead - https://phabricator.wikimedia.org/T350718 (Krinkle)
[16:45:20] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), CirrusSearch, Discovery-Search (Current work): SUP: Production - https://phabricator.wikimedia.org/T354595 (bking)
[16:52:26] Data-Engineering, Data Pipelines, Product-Analytics: Add Product-Analytics Announcements to Airflow job for notifications - https://phabricator.wikimedia.org/T301281 (mpopov) @Mayakp.wiki: you and others are already in product-analytics-announce@, but the alerts aren't sent to that. If you're referr...
[16:57:25] Analytics, AQS2.0, Tech-Docs-Team, Data Products (Epics Timeline), and 3 others: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (apaskulin)
[17:05:13] Data-Engineering, Metrics Platform Backlog, Data Products (Data Products Sprint 09), Spike: [SPIKE] Remove mentions of MetricsClient#dispatch() and the monoschema from documentation - https://phabricator.wikimedia.org/T355046 (VirginiaPoundstone) p:Triage→High
[17:07:03] Data-Engineering, MediaWiki-extensions-EventLogging: Migrate EventLogging to JSDoc - https://phabricator.wikimedia.org/T357444 (apaskulin)
[17:12:06] Data-Engineering, MediaWiki-extensions-EventLogging, Metrics Platform Backlog, Data Products (Data Products Sprint 09), Technical-Debt: Fix public documentation for mw.eventLog.submit() and dispatch() - https://phabricator.wikimedia.org/T357003 (apaskulin) Great! I've opened {T357444}. I can...
[17:28:34] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Epic: Migrate an-test-ui1001 to bullseye - https://phabricator.wikimedia.org/T357448 (brouberol)
[17:29:08] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Epic: Migrate an-test-ui1001 to bullseye - https://phabricator.wikimedia.org/T357448 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brouberol@cumin1002 for host an-test-ui1001.eqiad.wmnet with OS bullseye
[18:29:48] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): Migrate an-test-ui1001 to bullseye - https://phabricator.wikimedia.org/T357448 (brouberol)
[18:42:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[18:44:06] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Patch-For-Review: Migrate an-test-ui1001 to bullseye - https://phabricator.wikimedia.org/T357448 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brouberol@cumin1002 for host an-test-ui1001.eqiad.wmnet with OS bullseye completed: - an-test...
[18:44:17] Data-Platform-SRE, Epic: Upgrade the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (brouberol)
[18:44:19] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Patch-For-Review: Migrate an-test-ui1001 to bullseye - https://phabricator.wikimedia.org/T357448 (brouberol) Open→Resolved
[18:47:07] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Patch-For-Review: Migrate an-test-ui1001 to bullseye - https://phabricator.wikimedia.org/T357448 (brouberol) p:Triage→Medium
[18:50:20] Data-Engineering, Data Pipelines, Product-Analytics: Add Product-Analytics Announcements to Airflow job for notifications - https://phabricator.wikimedia.org/T301281 (Mayakp.wiki) thank you !! we can continue the conversations on Slack.
[19:12:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[19:13:18] Data-Engineering: Turn off ReportUpdater jobs no longer used - https://phabricator.wikimedia.org/T357419 (lbowmaker)
[19:23:00] Data-Platform-SRE ( 2024.02.12 - 2024.03.03): RdfStreamingUpdaterSpaceUsageTooHigh - https://phabricator.wikimedia.org/T356698 (bking) Open→Invalid Duplicate of T356313, closing...
[19:36:17] Data-Engineering, Data-Engineering-Wikistats, Data Pipelines, Data Products, and 4 others: Merge ks-Arab and ks-Deva to ks - https://phabricator.wikimedia.org/T314476 (Winston_Sung) In progress→Open
[19:42:20] Data-Engineering, Data-Platform: Enable notifications - https://phabricator.wikimedia.org/T357462 (Mayakp.wiki)
[19:44:12] (PS5) Gmodena: development: Add webrequest schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/983898 (https://phabricator.wikimedia.org/T314956) (owner: Ottomata)
[19:45:11] (CR) CI reject: [V: -1] development: Add webrequest schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/983898 (https://phabricator.wikimedia.org/T314956) (owner: Ottomata)
[19:50:40] Data-Engineering, Data-Platform: Enable notifications for completion of Hive table snapshots - https://phabricator.wikimedia.org/T357462 (Mayakp.wiki)
[19:51:03] Data-Engineering, Data-Platform: Enable notifications for completion of Hive table snapshots - https://phabricator.wikimedia.org/T357462 (Mayakp.wiki) p:Triage→Medium
[19:51:18] Data-Engineering, Data-Platform, Movement-Insights: Enable notifications for completion of Hive table snapshots - https://phabricator.wikimedia.org/T357462 (Mayakp.wiki)
[19:58:36] Data-Engineering, Event-Platform, Patch-For-Review: [Event Platform] Declare webrequest as an Event Platform stream - https://phabricator.wikimedia.org/T314956 (gmodena) @Fabfur and I would like to start some integration tests in the short term. I moved the `webrequest` schema from GA to `development...
[20:26:08] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), CirrusSearch, Discovery-Search (Current work): SUP: Production - https://phabricator.wikimedia.org/T354595 (bking)
[20:28:06] Data-Platform-SRE ( 2024.02.12 - 2024.03.03), CirrusSearch, Discovery-Search (Current work): SUP: Production - https://phabricator.wikimedia.org/T354595 (bking) In progress→Resolved Per SRE pairing conversation, we only have one outstanding item, "add production releases to cirrus-streaming-u...
[20:32:35] Data-Engineering: [Dataset Config Store] Setup initial CI checks - https://phabricator.wikimedia.org/T357468 (lbowmaker)
[20:44:01] Data-Engineering, Data-Platform, Movement-Insights: Add movement insights group/users to MWH denormalize job alerts - https://phabricator.wikimedia.org/T357472 (lbowmaker)
[20:44:39] Data-Engineering, Data-Platform, Movement-Insights: Enable notifications for completion of Hive table snapshots - https://phabricator.wikimedia.org/T357462 (lbowmaker) Created subtask for short term work. We will keep parent ticket for longer term work.
[23:39:42] Data-Engineering, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Movement-Insights: Unique Devices seasonal trends on small projects - https://phabricator.wikimedia.org/T344381 (Mayakp.wiki) Wanted to note here that the smaller project families that were seeing increases during December...
[23:44:15] Data-Engineering: [NEEDS GROOMING][SPIKE] Extract refine schema management into a dedicated tool - https://phabricator.wikimedia.org/T356762 (Ottomata) Oh and in case you haven't seen it: [[ https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics...