[01:00:29] (SystemdUnitFailed) firing: (10) jupyter-stevemunene-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:02:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[05:05:13] (SystemdUnitFailed) firing: (10) jupyter-stevemunene-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:32:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[09:05:13] (SystemdUnitFailed) firing: (10) jupyter-stevemunene-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:09:39] --˄ was me doing some tests for T333511
[09:09:40] T333511: jupyterhub-conda.service failure after hadooptest client bullseye upgrade - https://phabricator.wikimedia.org/T333511
[09:12:04] stevemunene: o/ puppet seems broken on some stat nodes, seems due to a user deletion that is not proceeding well. Do you have time to check later on?
[09:13:40] thanks for the alert elukey having a look
[09:44:10] these are their jupyterhub server services
[10:33:28] all good on the stat servers elukey
[10:57:50] Data-Engineering, Data-Persistence, IP Masking: Adding user_is_temp to the user table - https://phabricator.wikimedia.org/T333223 (daniel) If we are thinking about adding `user_type`, I'd rather add `actor_type` to the actor table. That way, we'd finally have a good way to identify system users (so f...
[11:03:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[11:19:27] (PS1) Aqu: Migrate geoeditors monthly Druid ingestion to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/913136 (https://phabricator.wikimedia.org/T334101)
[11:23:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[11:35:30] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (fgiunchedi)
[12:04:00] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (MoritzMuehlenhoff)
[12:19:26] Data-Engineering, Anti-Harassment, Event-Platform Value Stream, Privacy Engineering, and 3 others: Exposing revIDs (nothing more) of deleted/suppressed edits for research to respect their removal - https://phabricator.wikimedia.org/T200559 (Ottomata) Is the [[ https://stream.wikimedia.org/?doc#/s...
[12:43:36] Data-Engineering, Data-Persistence, IP Masking: Adding user_is_temp to the user table - https://phabricator.wikimedia.org/T333223 (Ottomata) > If we wanted to add a user_type attribute, we would need much more work and discussion than the conversation we just had on this ticket. It's something that s...
[12:58:59] Data-Engineering, Event-Platform Value Stream, Discovery-Search (Current work), Patch-For-Review: Add support for redirects - https://phabricator.wikimedia.org/T325315 (Ottomata)
[13:05:13] (SystemdUnitFailed) firing: (10) jupyter-stevemunene-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:51:12] Data-Engineering, Data-Persistence, IP Masking: Adding user_is_temp to the user table - https://phabricator.wikimedia.org/T333223 (Tgr) I brought up the `user_type`/`actor_type` flag because my perception was that schema changes of that kind are prohibitively slow, so not doing it now when we need to...
[15:53:25] Data-Engineering, Event-Platform Value Stream: Fix eventutillites_python stream_manager error_sink configuration - https://phabricator.wikimedia.org/T335591 (Ottomata)
[15:55:08] Data-Engineering, Event-Platform Value Stream: Fix eventutillites_python stream_manager error_sink configuration - https://phabricator.wikimedia.org/T335591 (Ottomata)
[15:57:56] Data-Engineering, Data-Persistence, IP Masking: Adding user_is_temp to the user table - https://phabricator.wikimedia.org/T333223 (Ottomata) > They are identifiable via User::isSystemUser() Right, but this has the same problem as a user is temp regex: it is only available in MediaWiki PHP. If we wan...
[16:14:21] Data-Engineering, Data-Persistence, IP Masking: Adding user_is_temp to the user table - https://phabricator.wikimedia.org/T333223 (Ladsgroup) List of system users is quite small and doesn't change often. We can simply expose it via an API call (if it's not already) so you wouldn't need to rely on mw...
[17:05:13] (SystemdUnitFailed) firing: (10) jupyter-stevemunene-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:23:52] Data-Engineering-Planning, XTools, Chinese-Sites, Data Pipelines (Sprint 12): Run maintain-views on zhwiki, newiki - https://phabricator.wikimedia.org/T334041 (lbowmaker)
[19:24:21] Data-Engineering-Planning, XTools, Chinese-Sites, Data Pipelines (Sprint 12): Run maintain-views on zhwiki, newiki - https://phabricator.wikimedia.org/T334041 (lbowmaker)
[20:00:37] Quarry, cloud-services-team (FY2022/2023-Q4): Consider moving Quarry to be an installation of a community supported analytics tool - https://phabricator.wikimedia.org/T169452 (nskaggs)
[20:52:08] Data-Engineering, Advanced-Search, All-and-every-Wikisource, ArticlePlaceholder, and 65 others: Remove unnecessary targets definitions - https://phabricator.wikimedia.org/T328497 (Jdlrobson) p:Triage→High @kostajh https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/9...
[21:05:13] (SystemdUnitFailed) firing: (10) jupyter-stevemunene-singleuser-conda-analytics.service Failed on an-test-client1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:33:11] Data-Engineering, DBA, Infrastructure-Foundations, Machine-Learning-Team, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (colewhite)