[00:04:28] (SystemdUnitFailed) firing: (2) wmf_auto_restart_prometheus-mysqld-exporter@staging.service Failed on dbstore1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:01:11] Data-Engineering, Product-Analytics, Patch-For-Review: Propagate field descriptions from event schemas to Hive event tables - https://phabricator.wikimedia.org/T307040 (Ottomata) > we should decide sometime soon Aye, prob a different ticket. > syncing the stuff people put in Datahub back to the cod...
[01:12:38] Data-Engineering, MediaWiki-General, Event-Platform, Patch-For-Review: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817 (Ottomata) > Volume > peak request rate was ~1900 requests/s. I expect most of are from th...
[01:14:17] Data-Engineering, Movement-Insights, Traffic, Patch-For-Review: Identify and label prefetch proxy data in our traffic - https://phabricator.wikimedia.org/T346463 (Ottomata) IIRC, the decision was to wait until the new year, so as not to risk a mistake while people were out on holidays. I'm about...
[01:28:08] Data-Engineering, MediaWiki-General, Event-Platform, Patch-For-Review: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817 (Ottomata) > peak request rate was ~1900 requests/s. Oh, that turnilo chart is per hour (I...
[01:42:50] Data-Engineering, Product-Analytics, Event-Platform: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata) @SNowick_WMF, are latest versions of apps still sending the various MobileApp* events? I see a few events coming in, but maybe those are just f...
[04:04:28] (SystemdUnitFailed) firing: (2) wmf_auto_restart_prometheus-mysqld-exporter@staging.service Failed on dbstore1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:19:33] Data-Platform-SRE (2023/24 Q3 Milestone 1): Restart Search Platform-owned services for Java 8 / Java 11 security updates - https://phabricator.wikimedia.org/T350703 (MoritzMuehlenhoff) >>! In T350703#9436291, @RKemper wrote: > @MoritzMuehlenhoff This should be all done. Let us know if you see any rogue java...
[08:09:14] (SystemdUnitFailed) firing: (2) wmf_auto_restart_prometheus-mysqld-exporter@staging.service Failed on dbstore1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:33:28] Data-Engineering, MediaWiki-General, Event-Platform, Patch-For-Review: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817 (phuedx) >>! In T353817#9436883, @Ottomata wrote: > Oh, that turnilo chart is per hour (I t...
[09:44:47] Data-Engineering, Dumps-Generation: Migrate Dumps Snapshot hosts from Buster to Bullseye - https://phabricator.wikimedia.org/T325228 (MoritzMuehlenhoff)
[09:55:02] Data-Platform-SRE (2023/24 Q3 Milestone 1), Patch-For-Review: Create a helm chart for Superset - https://phabricator.wikimedia.org/T352166 (akosiaris)
[10:09:14] (SystemdUnitFailed) firing: (2) wmf_auto_restart_prometheus-mysqld-exporter@staging.service Failed on dbstore1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:13:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[13:12:11] Analytics, Data-Engineering, Data-Engineering-Wikistats, Data Products (Data Products Sprint 05): Wikistats - incorrect number of content articles for Latvian Wikipedia - https://phabricator.wikimedia.org/T354074 (Sfaci) Taking a look at the code, this issue seems to be related with something sim...
[13:25:27] Data-Engineering, Data Products, MediaWiki-extensions-EventLogging, Metrics Platform Backlog, and 2 others: EventLoggingTest::testDispatch fails when time ticks within the test run - https://phabricator.wikimedia.org/T353243 (phuedx)
[13:39:45] Data-Engineering, Data-Platform, MediaWiki-extensions-EventLogging, Metrics Platform Backlog, Epic: Deprecate and remove MetricsClient#dispatch() - https://phabricator.wikimedia.org/T352969 (phuedx)
[13:53:23] Data-Engineering, Data-Platform, MediaWiki-extensions-EventLogging: Remove EventLogging::submitMetricsEvent() - https://phabricator.wikimedia.org/T354419 (phuedx)
[13:53:46] Data-Engineering, Data-Platform, MediaWiki-extensions-EventLogging, good first task: Remove EventLogging::submitMetricsEvent() - https://phabricator.wikimedia.org/T354419 (phuedx)
[14:01:05] Data-Engineering, Data Products, MediaWiki-extensions-EventLogging, Metrics Platform Backlog, and 2 others: EventLoggingTest::testDispatch fails when time ticks within the test run - https://phabricator.wikimedia.org/T353243 (phuedx) Hello friends! Sorry for not seeing this task sooner. wikimedia...
[14:09:29] (SystemdUnitFailed) firing: user-runtime-dir@43623.service Failed on stat1007:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:10:51] Data-Platform-SRE, DBA, Infrastructure-Foundations, Puppet-Core, and 3 others: Revert dbstore migration from puppet7 to puppet5 - https://phabricator.wikimedia.org/T354411 (BTullis) a:BTullis I've got no problem with this. I think that I can run the **rollback** steps from T349619.
[14:12:04] Data-Platform-SRE (2023/24 Q3 Milestone 1), DBA, Infrastructure-Foundations, Puppet-Core, and 3 others: Revert dbstore migration from puppet7 to puppet5 - https://phabricator.wikimedia.org/T354411 (BTullis)
[14:12:14] Data-Platform-SRE (2023/24 Q3 Milestone 1), DBA, Infrastructure-Foundations, Puppet-Core, and 3 others: Revert dbstore migration from puppet7 to puppet5 - https://phabricator.wikimedia.org/T354411 (Marostegui) I am not sure if that'll bring us everything back or we'll need to do something with th...
[14:14:33] (CR) Xcollazo: [C: +1] Add iceberg version of aqs_hourly table [analytics/refinery] - https://gerrit.wikimedia.org/r/982869 (https://phabricator.wikimedia.org/T352669) (owner: TChin)
[14:45:27] Data-Engineering, Product-Analytics, Event-Platform: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (mpopov) 0 results: - https://github.com/search?q=repo%3Awikimedia%2Fapps-android-wikipedia+mobilewikiapp&type=code - https://github.com/search?q=repo%3A...
[14:47:28] Data-Engineering, Product-Analytics, Event-Platform: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata) Okay great! Thank you.
[15:03:28] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[16:21:00] Analytics, Data-Engineering, Data-Engineering-Wikistats, Data Products (Data Products Sprint 05): Wikistats - incorrect number of content articles for Latvian Wikipedia - https://phabricator.wikimedia.org/T354074 (Sfaci) After merging the patch and deploy the service to the staging environment (w...
[16:56:28] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[17:58:20] Data-Engineering (Sprint 6): [Iceberg Migration] Migrate interlanguage tables to Iceberg - https://phabricator.wikimedia.org/T352671 (tchin) a:tchin
[18:07:14] Data-Platform-SRE, Wikidata, Wikidata-Query-Service: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator - https://phabricator.wikimedia.org/T350784 (bking) a:bking
[18:09:29] (SystemdUnitFailed) firing: user-runtime-dir@43623.service Failed on stat1007:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:10:05] (PS8) TChin: Add iceberg version of aqs_hourly table [analytics/refinery] - https://gerrit.wikimedia.org/r/982869 (https://phabricator.wikimedia.org/T352669)
[18:10:23] (PS1) TChin: Add iceberg version of interlanguage_navigation table [analytics/refinery] - https://gerrit.wikimedia.org/r/986839 (https://phabricator.wikimedia.org/T352671)
[18:11:33] (PS2) TChin: Add iceberg version of interlanguage_navigation table [analytics/refinery] - https://gerrit.wikimedia.org/r/986839 (https://phabricator.wikimedia.org/T352671)
[18:23:56] Data-Engineering (Sprint 6), Patch-For-Review: [Iceberg Migration] Migrate interlanguage tables to Iceberg - https://phabricator.wikimedia.org/T352671 (tchin) TIL when setting the compression codec to snappy, Iceberg doesn't end the files in hdfs with `.snappy.parquet`. I had to check if the format was c...
[18:32:18] Data-Engineering (Sprint 6), Patch-For-Review: [Iceberg Migration] Migrate interlanguage tables to Iceberg - https://phabricator.wikimedia.org/T352671 (tchin) `INSERT OVERWRITE` with `PARTITION` also doesn't work anymore because Iceberg uses hidden partitioning so had to enable Spark's dynamic overwrite h...
[18:56:26] Data-Platform-SRE (2023/24 Q3 Milestone 1), Wikidata, Wikidata-Query-Service: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator - https://phabricator.wikimedia.org/T350784 (bking)
[19:32:24] Data-Platform-SRE (2023/24 Q3 Milestone 1), Wikidata, Wikidata-Query-Service: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator - https://phabricator.wikimedia.org/T350784 (bking)
[19:41:28] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[22:02:29] Analytics, Data-Engineering, EventStreams, Privacy Engineering, and 3 others: EventStreams should redact predetermined wiki articles - https://phabricator.wikimedia.org/T354456 (Ottomata)
[22:09:29] (SystemdUnitFailed) firing: user-runtime-dir@43623.service Failed on stat1007:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed