[00:16:42] (SystemdUnitFailed) firing: hardsync-published.service Failed on an-web1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:18:04] PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hardsync-published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:30:06] RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:36:42] (SystemdUnitFailed) resolved: hardsync-published.service Failed on an-web1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:15:14] Data-Engineering, Data-Engineering-Wikistats, Data Products, Data Pipelines (Sprint 12): Non-mobile UAs on mobile (2g/gprs, etc) IP-blocks - https://phabricator.wikimedia.org/T58628 (MarkAHershberger) I'm not sure what you should do with this since I haven't thought about it in 10 years. Mobile...
[02:26:16] (PS4) David Martin: Add wikilambda_zobject_labels & wikilambda_zobject_function_join queries to sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/937568 (https://phabricator.wikimedia.org/T341724)
[04:43:16] (PS4) David Martin: Create wikilambda_zobject_labels & wikilambda_zobject_function_join tables in Hive [analytics/refinery] - https://gerrit.wikimedia.org/r/937997 (https://phabricator.wikimedia.org/T341729)
[04:51:12] (CR) David Martin: Create wikilambda_zobject_labels & wikilambda_zobject_function_join tables in Hive (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/937997 (https://phabricator.wikimedia.org/T341729) (owner: David Martin)
[05:31:42] (SystemdUnitFailed) firing: wmf_auto_restart_airflow-scheduler@analytics_test.service Failed on an-test-client1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:10:24] Data-Platform-SRE, API Platform, Anti-Harassment, Content-Transform-Team, and 18 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (dcaro)
[08:11:49] Data-Platform-SRE, API Platform, Anti-Harassment, Content-Transform-Team, and 18 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (dcaro)
[09:11:36] Data-Platform-SRE, Patch-For-Review: Migrate analytics_test airflow instance to bullseye an-test-client1002 - https://phabricator.wikimedia.org/T341700 (Stevemunene)
[09:14:07] Data-Platform-SRE, Patch-For-Review: Migrate analytics_test airflow instance to bullseye an-test-client1002 - https://phabricator.wikimedia.org/T341700 (Stevemunene) The airflow services are running ok on `an-test-client1002` with zero errors/alerts. Next is to find an alternative to how we ensure the...
[09:31:42] (SystemdUnitFailed) firing: wmf_auto_restart_airflow-scheduler@analytics_test.service Failed on an-test-client1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:47:50] Data-Platform-SRE, Patch-For-Review: Migrate analytics_test airflow instance to bullseye an-test-client1002 - https://phabricator.wikimedia.org/T341700 (BTullis) >>! In T341700#9014893, @Stevemunene wrote: > Next is to find an alternative to how we ensure the airflow scheduler is not present on more th...
[09:48:34] Data-Platform-SRE, Patch-For-Review: Migrate analytics_test airflow instance to bullseye an-test-client1002 - https://phabricator.wikimedia.org/T341700 (BTullis)
[09:48:36] Data-Platform-SRE: Upgrade Airflow instances to Bullseye - https://phabricator.wikimedia.org/T335261 (BTullis)
[10:26:06] Data-Platform-SRE: Upgrade the spark YARN shuffler service on Hadoop workers from version 2 to 3 - https://phabricator.wikimedia.org/T332765 (BTullis) Open→Resolved
[10:26:27] Data-Platform-SRE: an-worker1145 has a problem - https://phabricator.wikimedia.org/T341481 (BTullis) Open→Resolved
[11:04:20] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:06:42] (SystemdUnitFailed) firing: (2) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:16:24] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:16:42] (SystemdUnitFailed)
firing: (2) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:20:42] Analytics-Radar, Data-Engineering-Icebox, Discovery-Search, Reading-Admin, and 3 others: Image Classification Working Group - https://phabricator.wikimedia.org/T215413 (Miriam)
[14:46:30] Data-Platform-SRE: Decide whether to migrate from Presto to Trino - https://phabricator.wikimedia.org/T266640 (BTullis) p:High→Low Reducing the priority of this ticket to low. It's still something that we might want to do, but for now the performance issues with Presto have been resolved.
[14:47:49] Data-Platform-SRE: Decide whether to migrate from Presto to Trino - https://phabricator.wikimedia.org/T266640 (BTullis)
[14:47:52] Data-Engineering-Planning, Data Pipelines, Shared-Data-Infrastructure: [Iceberg] Debianize and install iceberg support for Spark, Presto, and optionally Hive - https://phabricator.wikimedia.org/T311738 (BTullis)
[14:47:57] Data-Platform-SRE: SPIKE: Spin up a Test Trino instance (Evaluate Trino) - https://phabricator.wikimedia.org/T324011 (BTullis) Open→Declined Declining this ticket. We still have T266640 which we can use to track any evaluation work regarding Trino.
[14:50:21] Data-Platform-SRE: Research and test methods for accessing kerberized services from spark running on the DSE K8S cluster - https://phabricator.wikimedia.org/T330162 (BTullis) Open→Resolved
[15:16:42] (SystemdUnitFailed) firing: wmf_auto_restart_airflow-scheduler@analytics_test.service Failed on an-test-client1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:18:19] ^ stevemunene It looks like we have an auto-restart for the airflow-scheduler on an-test-client1001 - but it didn't auto-restart. Would you like me to check this, or will you?
[15:18:50] It's probably a good thing that it didn't restart, given that you've moved the scheduler to an-test-client1002, but I just thought I'd check.
[15:19:25] (CR) Mforns: Create wikilambda_zobject_labels & wikilambda_zobject_function_join tables in Hive (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/937997 (https://phabricator.wikimedia.org/T341729) (owner: David Martin)
[15:23:39] (CR) Mforns: [C: +1] "Looks good to me! Although I haven't ever touched this code, I think. I'll let Dan confirm!
+1" [analytics/refinery] - https://gerrit.wikimedia.org/r/937568 (https://phabricator.wikimedia.org/T341724) (owner: David Martin)
[16:04:35] (PS6) Kimberly Sarabia: Fix editattemptstep ref [schemas/event/secondary] - https://gerrit.wikimedia.org/r/934032 (https://phabricator.wikimedia.org/T337270)
[16:07:47] (PS7) Kimberly Sarabia: Fix editattemptstep ref [schemas/event/secondary] - https://gerrit.wikimedia.org/r/934032 (https://phabricator.wikimedia.org/T337270)
[16:41:23] Data-Engineering, Data-Platform-SRE: Stop and remove oozie services - https://phabricator.wikimedia.org/T341893 (BTullis)
[16:47:53] (PS1) Kimberly Sarabia: Update web_ab_test_enrollment schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/938280 (https://phabricator.wikimedia.org/T337270)
[16:48:22] Data-Engineering, Data-Platform-SRE: Deprecate Hue and stop the services - https://phabricator.wikimedia.org/T341895 (BTullis)
[16:48:38] Data-Engineering, Data-Platform-SRE: Deprecate Hue and stop the services - https://phabricator.wikimedia.org/T341895 (BTullis)
[17:03:12] (PS5) David Martin: Create wikilambda_zobject_labels & wikilambda_zobject_function_join tables in Hive [analytics/refinery] - https://gerrit.wikimedia.org/r/937997 (https://phabricator.wikimedia.org/T341728)
[17:06:49] (PS1) Kimberly Sarabia: Update web ui scroll [schemas/event/secondary] - https://gerrit.wikimedia.org/r/938284 (https://phabricator.wikimedia.org/T337270)
[17:09:21] (CR) Kimberly Sarabia: "Parsing out Web team owned schema for review from 934032" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/938284 (https://phabricator.wikimedia.org/T337270) (owner: Kimberly Sarabia)
[17:10:01] (CR) Kimberly Sarabia: "web team owned schema for review" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/938280 (https://phabricator.wikimedia.org/T337270) (owner: Kimberly Sarabia)
[17:18:53] (CR) Kimberly Sarabia: "Hi Thomas and
Gabriele - Would one of you be able to review this in place of Otto?" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/934032 (https://phabricator.wikimedia.org/T337270) (owner: Kimberly Sarabia)
[18:00:35] (PS6) David Martin: Create wikilambda_zobject_labels & wikilambda_zobject_function_join tables in Hive [analytics/refinery] - https://gerrit.wikimedia.org/r/937997 (https://phabricator.wikimedia.org/T341728)
[18:07:36] Data-Platform-SRE, Data-Catalog, Patch-For-Review: Create Airflow Pipeline for Ingesting/Updating Superset Data - https://phabricator.wikimedia.org/T309622 (BTullis) p:Triage→Medium
[18:09:32] (CR) David Martin: Create wikilambda_zobject_labels & wikilambda_zobject_function_join tables in Hive (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/937997 (https://phabricator.wikimedia.org/T341728) (owner: David Martin)
[18:42:29] Data-Platform-SRE, DC-Ops, SRE, ops-eqiad: Q3:rack/setup/install an-worker11[49-56] - https://phabricator.wikimedia.org/T327295 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host an-worker1153.eqiad.wmnet with OS bullseye
[19:16:43] (SystemdUnitFailed) firing: wmf_auto_restart_airflow-scheduler@analytics_test.service Failed on an-test-client1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:38:53] Data-Platform-SRE, DC-Ops, SRE, ops-eqiad: Q3:rack/setup/install an-worker11[49-56] - https://phabricator.wikimedia.org/T327295 (Jhancock.wm) @Jclark-ctr I need your assistance next time you're onsite at Eqiad. These servers do not have a network connection on the 1st port of the NIC card. Can yo...
[19:57:51] (HdfsRpcQueueLength) firing: RPC call queue length on the analytics-hadoop cluster is too high.
- https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[20:07:51] (HdfsRpcQueueLength) resolved: RPC call queue length on the analytics-hadoop cluster is too high. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Namenode_RPC_length_queue - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsRpcQueueLength
[21:06:37] I generated a list of articles that are popular in 10 big wikis, maybe someone on the analytics team will find it useful: https://meta.wikimedia.org/wiki/User:Danilo.mac/Popular_articles_in_big_wikis
[22:59:42] Data-Platform-SRE, Data Engineering and Event Platform Team, GitLab (Project Migration), Release-Engineering-Team (Priority Backlog 📥): Migrate analytics/datahub pipeline to GitLab - https://phabricator.wikimedia.org/T341194 (BTullis) p:Triage→High
[23:08:39] Data-Engineering, Data-Platform-SRE, Epic: Analytics Platform Future State Planing - https://phabricator.wikimedia.org/T302728 (BTullis)
[23:10:38] Data-Platform-SRE, SRE, SRE Observability: dropped packets to kafkamon 9000/tcp - https://phabricator.wikimedia.org/T238794 (BTullis)
[23:11:10] Data-Platform-SRE: Upgrade Druid to latest upstream (> 0.20.1) - https://phabricator.wikimedia.org/T278056 (BTullis)
[23:12:05] Data-Platform-SRE, Patch-For-Review, User-MoritzMuehlenhoff: Automate kerberos credential creation and management to ease the creation of testing infrastructure - https://phabricator.wikimedia.org/T292389 (BTullis)
[23:12:52] Data-Platform-SRE, conftool: an-launcher1002: failed services - https://phabricator.wikimedia.org/T330652 (BTullis)
[23:13:46]
Data-Platform-SRE, Data-Services, cloud-services-team: Drop several views from ptwikisource - https://phabricator.wikimedia.org/T332596 (BTullis)
[23:14:36] Data-Platform-SRE, DBA, cloud-services-team: Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6 - https://phabricator.wikimedia.org/T334651 (BTullis)
[23:15:13] Data-Platform-SRE, DBA: Migrate dbstore1005 to MariaDB 10.6 - https://phabricator.wikimedia.org/T334652 (BTullis)
[23:16:13] Data-Platform-SRE, Data-Persistence, Security: Use user-specific passwords for accessing Analytics MariaDB replica databases - https://phabricator.wikimedia.org/T120532 (BTullis)
[23:16:43] (SystemdUnitFailed) firing: wmf_auto_restart_airflow-scheduler@analytics_test.service Failed on an-test-client1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:16:45] Data-Platform-SRE, decommission-hardware: decommission db1108.eqiad.wmnet - https://phabricator.wikimedia.org/T336254 (BTullis)
[23:17:27] Data-Platform-SRE: Reset GPUs on stat machines - https://phabricator.wikimedia.org/T336784 (BTullis)
[23:19:20] Data-Engineering, Data-Engineering-Jupyter, Data-Platform-SRE, Product-Analytics: Remove anaconda-wmf package from the cluster - https://phabricator.wikimedia.org/T337963 (BTullis)
[23:22:00] Data-Engineering, Data-Platform-SRE: Missconfigured proxies on data-engineering hosts - https://phabricator.wikimedia.org/T326302 (BTullis)
[23:22:33] Data-Platform-SRE: Bring an-coord100[3-4] into service - https://phabricator.wikimedia.org/T336045 (BTullis)
[23:23:32] Data-Engineering, Data-Platform-SRE, SRE, observability, and 2 others: Kafka 2.x Upgrade Plan - https://phabricator.wikimedia.org/T302610 (BTullis)
[23:24:48] Data-Engineering: Connect Kafka to the MVP [Mile Stone 5] -
https://phabricator.wikimedia.org/T299899 (BTullis) Open→Resolved
[23:26:15] Data-Platform-SRE: Add Authentication/Encryption to Kafka Jumbo's clients - https://phabricator.wikimedia.org/T250146 (BTullis)
[23:28:05] Data-Platform-SRE, User-Elukey: Only hdfs (or authenticated user) should be able to run Druid indexing jobs - https://phabricator.wikimedia.org/T192959 (BTullis)
[23:28:53] Data-Engineering, Data-Platform-SRE, Cassandra, Pageviews-API, User-Elukey: Improve user management for AQS Cassandra - https://phabricator.wikimedia.org/T142073 (BTullis)
[23:29:25] Data-Engineering, Data-Platform-SRE: Enforce authentication for Druid datasources - https://phabricator.wikimedia.org/T255545 (BTullis)
[23:29:59] Data-Engineering, Data-Platform-SRE: Verify if Turnilo can pull data from Druid using Kerberos/TLS - https://phabricator.wikimedia.org/T250485 (BTullis)
[23:30:41] Data-Platform-SRE: Enforce authentication for Druid datasources - https://phabricator.wikimedia.org/T255545 (BTullis)
[23:31:04] Data-Platform-SRE: Verify if Turnilo can pull data from Druid using Kerberos/TLS - https://phabricator.wikimedia.org/T250485 (BTullis)
[23:31:36] Data-Platform-SRE: Enforce authentication for Kafka Jumbo Topics - https://phabricator.wikimedia.org/T255543 (BTullis)
[23:32:12] Data-Engineering, Data-Platform-SRE: Define priorities for HDFS data to be backed up - https://phabricator.wikimedia.org/T283261 (BTullis)
[23:33:42] Data-Engineering, Data-Platform-SRE: Turnilo "Display Druid query" gives "general error" - https://phabricator.wikimedia.org/T273685 (BTullis)
[23:34:13] Data-Platform-SRE: Allow users to differentiate their JupyterHub logs in Logstash - https://phabricator.wikimedia.org/T293243 (BTullis)
[23:35:19] Data-Engineering, Data-Platform-SRE: Enforce authentication and authorization for webrequest_* topics in Kafka jumbo-eqiad cluster - https://phabricator.wikimedia.org/T294264 (BTullis)
[23:36:40] Data-Engineering, Data-Platform-SRE, Data-Services, cloud-services-team, Epic: Plan a replacement for wiki replicas that is better suited to typical OLAP use cases than the MediaWiki OLTP schema - https://phabricator.wikimedia.org/T215858 (BTullis)
[23:37:46] Data-Engineering, Data-Engineering-Jupyter, Data-Platform-SRE, CAS-SSO, and 2 others: Allow login to JupyterHub via CAS - https://phabricator.wikimedia.org/T260386 (BTullis)
[23:38:37] Data-Engineering, Data-Engineering-Jupyter, Data-Platform-SRE, Product-Analytics: Functionality to share & view notebooks - https://phabricator.wikimedia.org/T156934 (BTullis)
[23:39:25] Data-Engineering, Data-Engineering-Jupyter, Data-Platform-SRE: Autocomplete is very slow (unusable) in Newpyter - https://phabricator.wikimedia.org/T290008 (BTullis)
[23:41:08] Data-Platform-SRE: Create conda .deb and docker image - https://phabricator.wikimedia.org/T304450 (BTullis)
[23:42:48] Data-Engineering, Data-Platform-SRE: Prometheus metrics for Spark 3 - https://phabricator.wikimedia.org/T298666 (BTullis)
[23:44:16] Data-Engineering, Data-Platform-SRE, SRE, SRE Observability: Grant IdempotentWrite Kafka Cluster ACL to User:ANONYOUS in all Kafka clusters - https://phabricator.wikimedia.org/T334733 (BTullis)
[23:46:00] Data-Engineering, Data-Platform-SRE, Infrastructure-Foundations, Event-Platform: > ~1 request/second to intake-logging.wikimedia.org times out at the traffic/service interface - https://phabricator.wikimedia.org/T264021 (BTullis)
[23:48:28] Data-Engineering, Data-Platform-SRE, Data Engineering and Event Platform Team, Data Pipelines, Spike: Explore Containerization Solutions for DE Applications - https://phabricator.wikimedia.org/T288254 (BTullis)
[23:48:38] Data-Engineering, Data-Platform-SRE, Data Engineering and Event Platform Team, SRE, and 3 others: DRY kafka broker declaration in helmfiles -
https://phabricator.wikimedia.org/T253058 (BTullis)
[23:49:17] Data-Platform-SRE: Analytics coordinator failover improvements - https://phabricator.wikimedia.org/T280905 (BTullis)
[23:50:11] Data-Engineering, Data-Platform-SRE: NEW BUG REPORT remove mysql databases from SQLLab - https://phabricator.wikimedia.org/T337056 (BTullis)
[23:51:31] Data-Engineering: Upgrade Presto to access UDF library improvements - https://phabricator.wikimedia.org/T295589 (BTullis) This is now done, since we have presto version 0.281 deployed in production.
[23:51:47] Data-Platform-SRE, Data Pipelines (Sprint 14): Upgrade Presto to release that aligns with Iceberg 1.2.1 - https://phabricator.wikimedia.org/T337335 (BTullis)
[23:51:53] Data-Engineering: Upgrade Presto to access UDF library improvements - https://phabricator.wikimedia.org/T295589 (BTullis) Open→Resolved a:BTullis
[23:53:56] Data-Platform-SRE, SRE, User-MoritzMuehlenhoff: Hadoop MapReduce port range cannot be configured to a fixed range - https://phabricator.wikimedia.org/T111433 (BTullis)
[23:54:46] Data-Platform-SRE: Bring druid10[09-11] into service - https://phabricator.wikimedia.org/T336042 (BTullis)
[23:55:20] Data-Platform-SRE: Decommission kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T336044 (BTullis)
[23:55:47] Data-Platform-SRE: Bring kafka-jumbo10[09-15] into service - https://phabricator.wikimedia.org/T336041 (BTullis)
[23:56:12] Data-Platform-SRE: Decommission druid100[4-6] - https://phabricator.wikimedia.org/T336043 (BTullis)
[23:56:48] Data-Platform-SRE: Bring stat1010 into service with GPU from stat1005 - https://phabricator.wikimedia.org/T336040 (BTullis)
[23:59:19] Data-Engineering, Data-Platform-SRE, Data Engineering and Event Platform Team, SRE-OnFire, and 4 others: Uneven CPU throttling of eventgate-analytics under load - https://phabricator.wikimedia.org/T325068 (BTullis)