[00:02:49] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:09:05] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:12:15] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:19:51] PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [00:21:53] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:23:31] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:34:37] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:41:01] RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [00:49:09] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:55:37] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:02:03] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:11:45] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:12:49] PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [01:21:27] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:24:07] (03PS3) 10Cicalese: Update pingback MediaWiki versions to include new values [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/879595 (https://phabricator.wikimedia.org/T326825) [01:35:59] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:40:49] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:50:31] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:55:23] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:04:51] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:08:03] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:11:11] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:22:21] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:25:33] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:36:51] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:40:03] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:50:02] 10Data-Engineering: an-worker1125 has been flapping non-stop - https://phabricator.wikimedia.org/T327042 (10Ladsgroup) [02:51:25] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:56:15] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:04:19] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:10:47] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:22:07] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:30:37] RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [03:31:51] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:39:55] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:51:15] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:57:45] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:57:59] (03CR) 10Cicalese: "I have tested this on stat1007, and it performs as desired." [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/879595 (https://phabricator.wikimedia.org/T326825) (owner: 10Cicalese) [04:02:27] PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [04:07:25] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:10:39] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:21:57] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:28:25] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:38:11] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:39:47] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:52:45] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:57:39] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:08:57] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:12:11] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:21:53] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:27:15] RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [05:28:21] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:39:43] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:41:19] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:51:05] PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:53:19] PROBLEM - Hadoop NodeManager on an-worker1125 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.64.21.8: Connection reset by peer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [05:56:33] PROBLEM - Hadoop DataNode on an-worker1125 is CRITICAL: NRPE: Call to popen() failed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_Datanode_process [05:56:33] PROBLEM - Hadoop NodeManager on an-worker1125 is CRITICAL: NRPE: Call to popen() failed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [05:58:19] PROBLEM - puppet last run on an-worker1125 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.64.21.8: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [05:59:07] PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [06:56:21] 10Data-Engineering-Planning, 10Data Pipelines (Sprint 05-06): Update Airflow DAGs code to make it compatible with version V2.3.4 of Airflow - https://phabricator.wikimedia.org/T309552 (10Antoine_Quhen) [07:52:22] (03PS8) 10Kosta Harlan: image-suggestions-feedback: Bump to version 2.0.0 [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) [07:52:39] (03PS9) 10Kosta Harlan: image-suggestions-feedback: Bump to version 2.0.0 [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) [08:38:01] RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [08:46:24] !log powercycle an-worker1125 - soft lockup traces registered in the tty, host frozen [08:46:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:50:17] RECOVERY - SSH on an-worker1125 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [08:50:19] RECOVERY - Hadoop DataNode on an-worker1125 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_Datanode_process [08:50:19] RECOVERY - Hadoop NodeManager on an-worker1125 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [08:51:23] RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:51:57] PROBLEM - puppet last run on an-worker1125 is CRITICAL: CRITICAL: Puppet last ran 2 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [08:53:12] 1125 up and running, it should recover soon-ish in theory [08:57:33] RECOVERY - puppet last run on an-worker1125 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [08:58:50] (HdfsCorruptBlocks) firing: HDFS corrupt blocks detected on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_corrupt_blocks - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=39&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCorruptBlocks [09:01:20] I think that --^ it is only a temporary issue due to the long downtime of the datanode, in the past it went away after a couple of hours [09:31:07] PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [10:20:18] 10Analytics-Wikistats, 10Data-Engineering-Planning, 10Data Pipelines: Non-mobile UAs on mobile (2g/gprs, etc) IP-blocks - https://phabricator.wikimedia.org/T58628 (10BTullis) I believe that we could obtain an estimate on user's connection speeds via the maxmind geoip databases to which we have access. From t... [10:52:27] elukey: Many thanks for picking those up. I'll keep an eye on the corrupt blocks one. Hopefully you're right. [10:55:03] 10Data-Engineering: an-worker1125 has been flapping non-stop - https://phabricator.wikimedia.org/T327042 (10BTullis) 05Open→03Resolved a:03BTullis The host has now been restarted. See https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log#2023-01-16 Apologies for the noise. We haven't usually seen nois... [11:08:08] btullis: np! Kafka test has also been restarted, new pki certs picked up etc.. [11:08:11] all good [12:20:34] 10Data-Engineering, 10Event-Platform Value Stream: Remove CommentFormatter from EventFactory constructor, or otherwise make its usage optional - https://phabricator.wikimedia.org/T327065 (10kostajh) [12:29:58] 10Data-Engineering, 10Event-Platform Value Stream, 10Growth-Team (Current Sprint): Remove CommentFormatter from EventFactory constructor, or otherwise make its usage optional - https://phabricator.wikimedia.org/T327065 (10kostajh) a:03kostajh [12:30:14] 10Data-Engineering, 10Event-Platform Value Stream, 10Growth-Team (Current Sprint): Remove CommentFormatter from EventFactory constructor, or otherwise make its usage optional - https://phabricator.wikimedia.org/T327065 (10kostajh) [12:38:24] 10Data-Engineering-Planning, 10Epic, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (10BTullis) I have been reading about progress with the way in which Ceph is containerized lately. My initial reasea... [12:58:50] (HdfsCorruptBlocks) firing: HDFS corrupt blocks detected on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_corrupt_blocks - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=39&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCorruptBlocks [13:00:44] 10Data-Engineering, 10Event-Platform Value Stream, 10Growth-Team (Current Sprint), 10Patch-For-Review: Remove CommentFormatter from EventFactory constructor, or otherwise make its usage optional - https://phabricator.wikimedia.org/T327065 (10kostajh) 05Open→03In progress [13:18:50] (HdfsCorruptBlocks) resolved: HDFS corrupt blocks detected on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_corrupt_blocks - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=39&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCorruptBlocks [13:29:02] --^ the corrupt blocks check recovered. \o/ [14:21:27] (03PS10) 10Snwachukwu: [WIP] Refactor and Expand External referer classification [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) [14:24:13] (03PS11) 10Snwachukwu: [WIP] Refactor and Expand External referer classification [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) [14:25:59] (03CR) 10CI reject: [V: 04-1] [WIP] Refactor and Expand External referer classification [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) (owner: 10Snwachukwu) [14:37:32] 10Quarry, 10Tool-tsreports: RSS feeds - https://phabricator.wikimedia.org/T60830 (10Aklapper) p:05Triage→03Low [15:11:38] 10Data-Engineering-Planning, 10Epic, 10Patch-For-Review, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (10BTullis) I'll tentatively update the description to say that the decision has bveen made to... [15:12:22] 10Data-Engineering-Planning, 10Epic, 10Patch-For-Review, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (10BTullis) [15:28:59] Starting build #16 for job wikimedia-event-utilities-maven-release-docker [15:32:09] Project wikimedia-event-utilities-maven-release-docker build #16: 09SUCCESS in 3 min 10 sec: https://integration.wikimedia.org/ci/job/wikimedia-event-utilities-maven-release-docker/16/ [15:32:50] (03PS12) 10Snwachukwu: Refactor and Expand External referer classification [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) [15:35:03] (03CR) 10CI reject: [V: 04-1] Refactor and Expand External referer classification [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) (owner: 10Snwachukwu) [16:02:22] (03CR) 10Snwachukwu: Refactor and Expand External referer classification (0311 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) (owner: 10Snwachukwu) [16:06:52] 10Data-Engineering-Planning, 10Data Pipelines (Sprint 07): Include EU Registered Country in the canonical country database - https://phabricator.wikimedia.org/T324995 (10EChetty) [16:08:00] 10Data-Engineering-Planning, 10Data Pipelines (Sprint 07): Build Druid Operator - https://phabricator.wikimedia.org/T309996 (10EChetty) [16:10:32] 10Data-Engineering-Planning, 10Data Pipelines (Sprint 07): Update Airflow DAGs code to make it compatible with version V2.3.4 of Airflow - https://phabricator.wikimedia.org/T309552 (10EChetty) [16:11:44] 10Data-Engineering-Planning, 10Data Pipelines (Sprint 07), 10Patch-For-Review, 10SecTeam-Processed, 10Vuln-VulnComponent: Upgrade Puppet code to make Airflow configuration files compatible with version 2.3.4 - https://phabricator.wikimedia.org/T315580 (10EChetty) [16:15:08] folks an-airflow1002 has only 3GB left in the root partition :( [16:36:17] 10Data-Engineering-Planning, 10Data Pipelines: Provide aggregated user device data per-country - https://phabricator.wikimedia.org/T325306 (10EChetty) [16:46:27] 10Data-Engineering-Planning, 10Data Pipelines: NEW FEATURE REQUEST: Dataset with active and non-active Wikis - https://phabricator.wikimedia.org/T323662 (10EChetty) [16:50:39] 10Data-Engineering-Planning, 10Data Pipelines: Fix mediawiki-history page computation for deleted pages having the same title - https://phabricator.wikimedia.org/T320860 (10EChetty) p:05Triage→03Low [16:58:53] 10Data-Engineering-Planning, 10Epic, 10Patch-For-Review, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (10MatthewVernon) I do think upstream haven't covered themselves in glory here - only cephadm f... [18:41:14] RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [19:12:58] PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [19:23:32] RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [20:05:50] PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [20:16:24] RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [21:19:54] PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring