[00:08:13] (DiskSpace) firing: Disk space stat1005:9100:/ 5.689% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[00:22:38] Data-Engineering, Event-Platform Value Stream (Sprint 12), Patch-For-Review: mediawiki-event-enrichment: issue async requests from ProcessFunction - https://phabricator.wikimedia.org/T332948 (Ottomata) Today, I deployed with Flink 1.17 and python 3.9. I did this mostly because it looked like it was...
[00:23:51] Data-Engineering, Event-Platform Value Stream (Sprint 12), Patch-For-Review: mediawiki-event-enrichment: issue async requests from ProcessFunction - https://phabricator.wikimedia.org/T332948 (Ottomata) I wonder if we should explore [[ https://flink.apache.org/2022/05/06/exploring-the-thread-mode-in-p...
[00:26:54] Data-Engineering, Event-Platform Value Stream (Sprint 12), Patch-For-Review: mediawiki-event-enrichment: issue async requests from ProcessFunction - https://phabricator.wikimedia.org/T332948 (Ottomata) [[ https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/memory/mem_trouble/#o...
[01:18:31] (SystemdUnitFailed) firing: (19) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:54:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[03:09:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[04:08:13] (DiskSpace) firing: Disk space stat1005:9100:/ 5.648% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[04:55:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[05:10:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[05:18:33] (SystemdUnitFailed) firing: (19) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:53:27] (SystemdUnitFailed) firing: (20) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:58:27] (SystemdUnitFailed) firing: (20) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:08:27] (SystemdUnitFailed) firing: (21) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:08:13] (DiskSpace) firing: Disk space stat1005:9100:/ 5.683% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[08:48:27] (SystemdUnitFailed) firing: (20) jupyter-btullis-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:01:51] (Abandoned) Hashar: [Full dump analysis] Reduce edits_only and reverts_only intricacy [analytics/wikistats] - https://gerrit.wikimedia.org/r/118436 (owner: Nemo bis)
[09:01:55] (Abandoned) Hashar: Archives are downloaded in .txt.gz format: fix matching and opening [analytics/wikistats] - https://gerrit.wikimedia.org/r/92066 (owner: Nemo bis)
[09:01:58] (Abandoned) Hashar: Remove all trailing whitespace [analytics/wikistats] - https://gerrit.wikimedia.org/r/145862 (owner: Nemo bis)
[09:02:01] (Abandoned) Hashar: Comment some path tests which overrode standard ones [analytics/wikistats] - https://gerrit.wikimedia.org/r/118261 (owner: Nemo bis)
[09:08:13] (PS1) Hashar: Archive repository [analytics/wikistats] - https://gerrit.wikimedia.org/r/914713 (https://phabricator.wikimedia.org/T332004)
[09:08:33] (CR) CI reject: [V: -1] Archive repository [analytics/wikistats] - https://gerrit.wikimedia.org/r/914713 (https://phabricator.wikimedia.org/T332004) (owner: Hashar)
[09:08:36] (CR) Hashar: [V: +2 C: +2] Archive repository [analytics/wikistats] - https://gerrit.wikimedia.org/r/914713 (https://phabricator.wikimedia.org/T332004) (owner: Hashar)
[09:19:43] Analytics-Radar, Data-Engineering, Data-Engineering-Wikistats, Diffusion-Repository-Administrators, and 4 others: Archive analytics/wikistats - https://phabricator.wikimedia.org/T332004 (hashar)
[09:21:02] Analytics-Radar, Data-Engineering, Data-Engineering-Wikistats, Diffusion-Repository-Administrators, and 4 others: Archive analytics/wikistats - https://phabricator.wikimedia.org/T332004 (hashar)
[09:22:05] Analytics-Radar, Data-Engineering, Data-Engineering-Wikistats, Diffusion-Repository-Administrators, and 4 others: Archive analytics/wikistats - https://phabricator.wikimedia.org/T332004 (hashar)
[09:23:03] Analytics-Radar, Data-Engineering, Data-Engineering-Wikistats, Diffusion-Repository-Administrators, and 4 others: Archive analytics/wikistats - https://phabricator.wikimedia.org/T332004 (hashar) I have archived the repository and in Diffusion. Deleted the Github mirror and removed the repository...
[09:33:02] Analytics-Radar, Data-Engineering, Data-Engineering-Wikistats, Diffusion-Repository-Administrators, and 4 others: Archive analytics/wikistats - https://phabricator.wikimedia.org/T332004 (hashar)
[09:37:33] Analytics-Radar, Data-Engineering, Data-Engineering-Wikistats, Diffusion-Repository-Administrators, and 4 others: Archive analytics/wikistats - https://phabricator.wikimedia.org/T332004 (hashar) Open→Resolved a:hashar I could not find anything related in Wikidata. Thus I think the las...
[09:43:27] (SystemdUnitFailed) firing: (20) jupyter-btullis-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:03:28] (SystemdUnitFailed) firing: (20) jupyter-btullis-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:08:27] (SystemdUnitFailed) firing: (20) jupyter-btullis-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:28:51] Data-Engineering, Data-Persistence, IP Masking: Adding user_is_temp to the user table - https://phabricator.wikimedia.org/T333223 (JayCano) The conversation about `user_type`/`actor_type` is out of scope for the work that we are currently doing. The problem we are trying to solve with this ticket is...
[10:37:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[10:48:28] (SystemdUnitFailed) firing: (20) jupyter-btullis-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:08:27] (SystemdUnitFailed) firing: (20) jupyter-btullis-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:32:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[12:08:13] (DiskSpace) firing: Disk space stat1005:9100:/ 5.206% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[14:03:18] (PS2) Milimetric: Adapt virtualpageview druid scripts to spark [analytics/refinery] - https://gerrit.wikimedia.org/r/912360 (https://phabricator.wikimedia.org/T334105)
[14:14:15] (PS3) Aqu: Migrate geoeditors monthly Druid ingestion to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/913136 (https://phabricator.wikimedia.org/T334101)
[14:14:53] (CR) Aqu: [V: +2 C: +2] "Thanks for the reviews!" [analytics/refinery] - https://gerrit.wikimedia.org/r/913136 (https://phabricator.wikimedia.org/T334101) (owner: Aqu)
[14:20:23] (CR) Aqu: "You may convert the script from the Hive syntax to the Spark one: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/911890" [analytics/refinery] - https://gerrit.wikimedia.org/r/912360 (https://phabricator.wikimedia.org/T334105) (owner: Milimetric)
[14:26:42] (PS3) Milimetric: Adapt virtualpageview druid scripts to spark [analytics/refinery] - https://gerrit.wikimedia.org/r/912360 (https://phabricator.wikimedia.org/T334105)
[14:27:39] (CR) Milimetric: "Done. I don't agree with the coalesce because I don't think it matters how many files we have in the temp table, and we should let Spark " [analytics/refinery] - https://gerrit.wikimedia.org/r/912360 (https://phabricator.wikimedia.org/T334105) (owner: Milimetric)
[14:29:12] Data-Engineering, SRE, SRE Observability, Event-Platform Value Stream (Sprint 12): Grant IdempotentWrite Kafka Cluster ACL to User:ANONYOUS in all Kafka clusters - https://phabricator.wikimedia.org/T334733 (Ottomata) Done for Kafka main. We should do this for Kafka logging as well, so that when...
[14:29:50] Data-Engineering, Event-Platform Value Stream (Sprint 12): eventutilities-python should support using Kafka TLS ports - https://phabricator.wikimedia.org/T331526 (Ottomata)
[14:46:29] Data-Engineering, SRE, SRE Observability, Event-Platform Value Stream (Sprint 12): Grant IdempotentWrite Kafka Cluster ACL to User:ANONYOUS in all Kafka clusters - https://phabricator.wikimedia.org/T334733 (Ottomata) Done for logging clusters, and we all done!
[14:57:38] (PS1) Nick Ifeajika: test for knowledge-gaps loading [analytics/refinery] - https://gerrit.wikimedia.org/r/914799
[14:57:40] (CR) Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [analytics/refinery] - https://gerrit.wikimedia.org/r/914799 (owner: Nick Ifeajika)
[14:59:57] Data-Engineering, Metrics-Platform-Planning, Product-Analytics, WMF-Architecture-Team, Event-Platform Value Stream (Sprint 12): Major (API) versioning of Event Platform streams - https://phabricator.wikimedia.org/T332212 (Ottomata) Alright, no objections and May 1 has come and gone. I've doc...
[15:08:31] (SystemdUnitFailed) firing: (19) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:08:13] (DiskSpace) firing: Disk space stat1005:9100:/ 5.148% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[16:24:14] Data-Engineering, Event-Platform Value Stream (Sprint 12), Patch-For-Review: mediawiki-event-enrichment: issue async requests from ProcessFunction - https://phabricator.wikimedia.org/T332948 (Ottomata) I deployed a [[ https://gitlab.wikimedia.org/repos/data-engineering/mediawiki-event-enrichment/-/ta...
[16:57:03] (CR) Aqu: [C: +2] Migrate unique devices druid loading queries to Airflow/SparkSQL [analytics/refinery] - https://gerrit.wikimedia.org/r/910092 (https://phabricator.wikimedia.org/T334096) (owner: Mforns)
[16:57:25] thanks aqu!
[17:32:12] Data-Engineering, Event-Platform Value Stream (Sprint 12), Patch-For-Review: mediawiki-event-enrichment: issue async requests from ProcessFunction - https://phabricator.wikimedia.org/T332948 (Ottomata) I decided to try again resetting `python.fn-execution-bundle.size` to its default of 1000. At fir...
[17:35:21] (CR) Milimetric: [V: +2 C: +2] Migrate queries for webrequest_sampled_128 to /hql (Airflow/Spark3) [analytics/refinery] - https://gerrit.wikimedia.org/r/911890 (https://phabricator.wikimedia.org/T334106) (owner: Mforns)
[17:38:54] (CR) Milimetric: [C: +2] Fix HiveToDruid to allow for non-partitioned source tables. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/910094 (https://phabricator.wikimedia.org/T334096) (owner: Mforns)
[17:47:16] (Merged) jenkins-bot: Fix HiveToDruid to allow for non-partitioned source tables. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/910094 (https://phabricator.wikimedia.org/T334096) (owner: Mforns)
[17:55:09] Quarry: superset not showing all user queries - https://phabricator.wikimedia.org/T335903 (rook)
[18:13:08] (CR) Mforns: [V: +2] Migrate unique devices druid loading queries to Airflow/SparkSQL [analytics/refinery] - https://gerrit.wikimedia.org/r/910092 (https://phabricator.wikimedia.org/T334096) (owner: Mforns)
[18:13:55] (PS1) Milimetric: Update changelog.md with v0.2.14 changes [analytics/refinery/source] - https://gerrit.wikimedia.org/r/914880
[18:16:37] (CR) Milimetric: [V: +2 C: +2] Update changelog.md with v0.2.14 changes [analytics/refinery/source] - https://gerrit.wikimedia.org/r/914880 (owner: Milimetric)
[18:23:32] (Merged) jenkins-bot: Update changelog.md with v0.2.14 changes [analytics/refinery/source] - https://gerrit.wikimedia.org/r/914880 (owner: Milimetric)
[18:26:06] Starting build #119 for job analytics-refinery-maven-release-docker
[18:33:46] Quarry: superset not showing all user queries - https://phabricator.wikimedia.org/T335903 (rook) Perhaps related https://github.com/apache/superset/issues/20604
[18:33:58] Quarry, cloud-services-team (FY2022/2023-Q4): Consider moving Quarry to be an installation of a community supported analytics tool - https://phabricator.wikimedia.org/T169452 (rook)
[18:34:00] Quarry: superset not showing all user queries - https://phabricator.wikimedia.org/T335903 (rook)
[18:39:33] Project analytics-refinery-maven-release-docker build #119: SUCCESS in 13 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/119/
[19:08:31] (SystemdUnitFailed) firing: (19) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:42:15] Data-Engineering-Planning, Patch-For-Review, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 12): Upgrade Hadoop test cluster to Bullseye - https://phabricator.wikimedia.org/T329363 (BTullis)
[19:42:17] Data-Engineering-Planning, Shared-Data-Infrastructure, Epic: Upgrade the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (BTullis)
[20:08:13] (DiskSpace) firing: Disk space stat1005:9100:/ 5.089% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1005 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[20:10:27] Data-Engineering, Data-Persistence, IP Masking: Adding user_is_temp to the user table - https://phabricator.wikimedia.org/T333223 (daniel) >>! In T333223#8823190, @JayCano wrote: > The conversation about `user_type`/`actor_type` is out of scope for the work that we are currently doing. The problem we...
[20:23:43] Starting build #78 for job analytics-refinery-update-jars-docker
[20:24:02] (PS1) Maven-release-user: Add refinery-source jars for v0.2.14 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/914810
[20:24:02] Project analytics-refinery-update-jars-docker build #78: SUCCESS in 19 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/78/
[20:29:43] Data-Engineering, Data Pipelines: Update Sqoop for externallinks table changes - https://phabricator.wikimedia.org/T335917 (Milimetric)
[20:36:07] (CR) Milimetric: [V: +2 C: +2] Add refinery-source jars for v0.2.14 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/914810 (owner: Maven-release-user)
[20:38:39] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 12): Flink Enrichment monitoring - https://phabricator.wikimedia.org/T328925 (Ottomata) Been doing [[ https://grafana.wikimedia.org/goto/LEnBpfsVk?orgId=1 | lots of work on the dashboard ]] while debugging memory issues for T332948.
[21:43:45] !log deployed refinery-source and refinery to prepare for launching new airflow druid jobs
[21:43:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:24:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[22:32:06] Data-Engineering, Advanced-Search, All-and-every-Wikisource, ArticlePlaceholder, and 65 others: Remove unnecessary targets definitions - https://phabricator.wikimedia.org/T328497 (Jdlrobson)
[22:44:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[23:08:31] (SystemdUnitFailed) firing: (19) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed