[00:37:29] PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: session-c624.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:29:13] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:04:37] PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: session-c624.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:08:25] PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: session-c624.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:42:04] !log Rerun failed refine_event job
[07:42:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:44:23] RECOVERY - Check systemd state on an-worker1132 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:45:06] !log reset failed session-c624.scope as last issue was on March 14 on an-worker1132
[07:45:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:52:40] Analytics-Radar, Data-Engineering-Icebox, Machine-Learning-Team, Patch-For-Review: Upgrade ROCm to 4.5 - https://phabricator.wikimedia.org/T295661 (elukey) Upstream already reached 5.x, we should probably upgrade to a more recent version as well to keep up and have better support (especially if w...
[10:16:15] (CR) Aqu: "Do you mind merging the 2 first arguments into 1?" [analytics/refinery] - https://gerrit.wikimedia.org/r/900389 (https://phabricator.wikimedia.org/T330200) (owner: Snwachukwu)
[12:08:04] Data-Engineering-Planning, DBA, Data Pipelines, Infrastructure-Foundations, and 9 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (MatthewVernon)
[12:45:44] Data-Engineering-Planning, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 10): Deploy ceph mon processes to data-engineering cluster - https://phabricator.wikimedia.org/T330149 (BTullis) The monitor processes are now running. ` btullis@cephosd1001:~$ sudo ceph -s cluster: id: 6d4278e1-ea4...
[13:14:25] Data-Engineering-Planning: Airflow ArchiveOperator should have a number of retries of 0 - https://phabricator.wikimedia.org/T332216 (lbowmaker)
[13:14:48] Data-Engineering-Planning, Data Pipelines: Airflow ArchiveOperator should have a number of retries of 0 - https://phabricator.wikimedia.org/T332216 (lbowmaker)
[13:16:36] Data-Engineering-Planning: Airflow skein hook shouldn't fail when not managing to gather yarn logs - https://phabricator.wikimedia.org/T332215 (lbowmaker)
[13:16:53] Data-Engineering-Planning, Data Pipelines: Airflow skein hook shouldn't fail when not managing to gather yarn logs - https://phabricator.wikimedia.org/T332215 (lbowmaker)
[15:49:11] Analytics, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (JArguello-WMF)
[16:20:16] Data-Engineering-Planning, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 10): Deploy ceph mon and mgr processes to data-engineering cluster - https://phabricator.wikimedia.org/T330149 (BTullis)
[16:21:38] Data-Engineering-Planning, Data Pipelines: Load wmf.unique_editors_by_country_monthly into Druid for access in Turnilo & Superset - https://phabricator.wikimedia.org/T330436 (JArguello-WMF)
[16:23:57] Data-Engineering-Planning, Data Pipelines: Deprecate old mobile datasets - https://phabricator.wikimedia.org/T329310 (JArguello-WMF)
[16:26:23] Analytics, Data-Engineering-Planning, Data Pipelines: Add cawiki to clickstream dataset - https://phabricator.wikimedia.org/T327982 (JArguello-WMF)
[16:27:41] Analytics, Data-Engineering-Planning, Data Pipelines: Add cawiki to clickstream dataset - https://phabricator.wikimedia.org/T327982 (JArguello-WMF) For this ticket we will need to coordinate with other teams, for example, Research
[16:36:38] Data-Engineering-Icebox: Add redirects in bot detection features - https://phabricator.wikimedia.org/T326337 (JArguello-WMF)
[16:37:32] Data-Engineering-Planning, Data Pipelines: Use RDD checkpointing in Mediawiki-History spark job - https://phabricator.wikimedia.org/T331003 (JArguello-WMF)
[17:39:04] Analytics, Data-Engineering, Event-Platform Value Stream, EventStreams, and 2 others: Expose rdf-streaming-updater.mutation content through EventStreams - https://phabricator.wikimedia.org/T294133 (Sj) Thanks @Gehel for catching the duplicate. Suggested edits for this (merged) issue: * The merge...
[19:12:21] Data-Engineering-Planning: Assign Superset sql_labs access through customer roles - https://phabricator.wikimedia.org/T331160 (lbowmaker)
[19:12:37] Data-Engineering-Planning, Data Pipelines: Assign Superset sql_labs access through customer roles - https://phabricator.wikimedia.org/T331160 (lbowmaker)
[19:38:07] Data-Engineering, Data-Engineering-Jupyter, Infrastructure-Foundations, CAS-SSO, User-MoritzMuehlenhoff: Allow login to JupyterHub via CAS - https://phabricator.wikimedia.org/T260386 (Mayakp.wiki) The new Okta dashboard looks swell !! now I was wondering... if only I had Jupyter notebooks on...