[00:02:49] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:09:05] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:12:15] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:19:51] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[00:21:53] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:23:31] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:34:37] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:41:01] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[00:49:09] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:55:37] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:02:03] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:11:45] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:12:49] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[01:21:27] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:24:07] <wikibugs>	 (PS3) Cicalese: Update pingback MediaWiki versions to include new values [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/879595 (https://phabricator.wikimedia.org/T326825)
[01:35:59] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:40:49] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:50:31] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:55:23] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:04:51] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:08:03] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:11:11] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:22:21] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:25:33] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:36:51] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:40:03] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:50:02] <wikibugs>	 Data-Engineering: an-worker1125 has been flapping non-stop - https://phabricator.wikimedia.org/T327042 (Ladsgroup)
[02:51:25] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:56:15] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:04:19] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:10:47] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:22:07] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:30:37] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[03:31:51] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:39:55] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:51:15] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:57:45] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:57:59] <wikibugs>	 (CR) Cicalese: "I have tested this on stat1007, and it performs as desired." [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/879595 (https://phabricator.wikimedia.org/T326825) (owner: Cicalese)
[04:02:27] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[04:07:25] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:10:39] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:21:57] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:28:25] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:38:11] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:39:47] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:52:45] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:57:39] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:08:57] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:12:11] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:21:53] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:27:15] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[05:28:21] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:39:43] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:41:19] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:51:05] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1125 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:53:19] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1125 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.64.21.8: Connection reset by peer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[05:56:33] <icinga-wm>	 PROBLEM - Hadoop DataNode on an-worker1125 is CRITICAL: NRPE: Call to popen() failed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_Datanode_process
[05:56:33] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1125 is CRITICAL: NRPE: Call to popen() failed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[05:58:19] <icinga-wm>	 PROBLEM - puppet last run on an-worker1125 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.64.21.8: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[05:59:07] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:56:21] <wikibugs>	 Data-Engineering-Planning, Data Pipelines (Sprint 05-06): Update Airflow DAGs code to make it compatible with version V2.3.4 of Airflow - https://phabricator.wikimedia.org/T309552 (Antoine_Quhen)
[07:52:22] <wikibugs>	 (PS8) Kosta Harlan: image-suggestions-feedback: Bump to version 2.0.0 [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925)
[07:52:39] <wikibugs>	 (PS9) Kosta Harlan: image-suggestions-feedback: Bump to version 2.0.0 [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925)
[08:38:01] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:46:24] <elukey>	 !log powercycle an-worker1125 - soft lockup traces registered in the tty, host frozen
[08:46:27] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:50:17] <icinga-wm>	 RECOVERY - SSH on an-worker1125 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:50:19] <icinga-wm>	 RECOVERY - Hadoop DataNode on an-worker1125 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_Datanode_process
[08:50:19] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1125 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[08:51:23] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1125 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:51:57] <icinga-wm>	 PROBLEM - puppet last run on an-worker1125 is CRITICAL: CRITICAL: Puppet last ran 2 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:53:12] <elukey>	 1125 up and running, it should recover soon-ish in theory
[08:57:33] <icinga-wm>	 RECOVERY - puppet last run on an-worker1125 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:58:50] <jinxer-wm>	 (HdfsCorruptBlocks) firing: HDFS corrupt blocks detected on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_corrupt_blocks - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=39&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCorruptBlocks
[09:01:20] <elukey>	 I think that --^ it is only a temporary issue due to the long downtime of the datanode, in the past it went away after a couple of hours
[09:31:07] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[10:20:18] <wikibugs>	 Analytics-Wikistats, Data-Engineering-Planning, Data Pipelines: Non-mobile UAs on mobile (2g/gprs, etc) IP-blocks - https://phabricator.wikimedia.org/T58628 (BTullis) I believe that we could obtain an estimate on user's connection speeds via the maxmind geoip databases to which we have access. From t...
[10:52:27] <btullis>	 elukey: Many thanks for picking those up. I'll keep an eye on the corrupt blocks one. Hopefully you're right.
[10:55:03] <wikibugs>	 Data-Engineering: an-worker1125 has been flapping non-stop - https://phabricator.wikimedia.org/T327042 (BTullis) Open→Resolved a:BTullis The host has now been restarted. See https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log#2023-01-16 Apologies for the noise. We haven't usually seen nois...
[11:08:08] <elukey>	 btullis: np! Kafka test has also been restarted, new pki certs picked up etc..
[11:08:11] <elukey>	 all good
[12:20:34] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Remove CommentFormatter from EventFactory constructor, or otherwise make its usage optional - https://phabricator.wikimedia.org/T327065 (kostajh)
[12:29:58] <wikibugs>	 Data-Engineering, Event-Platform Value Stream, Growth-Team (Current Sprint): Remove CommentFormatter from EventFactory constructor, or otherwise make its usage optional - https://phabricator.wikimedia.org/T327065 (kostajh) a:kostajh
[12:30:14] <wikibugs>	 Data-Engineering, Event-Platform Value Stream, Growth-Team (Current Sprint): Remove CommentFormatter from EventFactory constructor, or otherwise make its usage optional - https://phabricator.wikimedia.org/T327065 (kostajh)
[12:38:24] <wikibugs>	 Data-Engineering-Planning, Epic, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (BTullis) I have been reading about progress with the way in which Ceph is containerized lately.  My initial reasea...
[12:58:50] <jinxer-wm>	 (HdfsCorruptBlocks) firing: HDFS corrupt blocks detected on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_corrupt_blocks - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=39&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCorruptBlocks
[13:00:44] <wikibugs>	 Data-Engineering, Event-Platform Value Stream, Growth-Team (Current Sprint), Patch-For-Review: Remove CommentFormatter from EventFactory constructor, or otherwise make its usage optional - https://phabricator.wikimedia.org/T327065 (kostajh) Open→In progress
[13:18:50] <jinxer-wm>	 (HdfsCorruptBlocks) resolved: HDFS corrupt blocks detected on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_corrupt_blocks - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=39&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCorruptBlocks
[13:29:02] <btullis>	 --^ the corrupt blocks check recovered. \o/
[14:21:27] <wikibugs>	 (PS10) Snwachukwu: [WIP] Refactor and Expand External referer classification [analytics/refinery/source] - https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769)
[14:24:13] <wikibugs>	 (PS11) Snwachukwu: [WIP] Refactor and Expand External referer classification [analytics/refinery/source] - https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769)
[14:25:59] <wikibugs>	 (CR) CI reject: [V: -1] [WIP] Refactor and Expand External referer classification [analytics/refinery/source] - https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) (owner: Snwachukwu)
[14:37:32] <wikibugs>	 Quarry, Tool-tsreports: RSS feeds - https://phabricator.wikimedia.org/T60830 (Aklapper) p:Triage→Low
[15:11:38] <wikibugs>	 Data-Engineering-Planning, Epic, Patch-For-Review, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (BTullis) I'll tentatively update the description to say that the decision has been made to...
[15:12:22] <wikibugs>	 Data-Engineering-Planning, Epic, Patch-For-Review, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (BTullis)
[15:28:59] <wmf-insecte>	 Starting build #16 for job wikimedia-event-utilities-maven-release-docker
[15:32:09] <wmf-insecte>	 Project wikimedia-event-utilities-maven-release-docker build #16: SUCCESS in 3 min 10 sec: https://integration.wikimedia.org/ci/job/wikimedia-event-utilities-maven-release-docker/16/
[15:32:50] <wikibugs>	 (PS12) Snwachukwu: Refactor and Expand External referer classification [analytics/refinery/source] - https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769)
[15:35:03] <wikibugs>	 (CR) CI reject: [V: -1] Refactor and Expand External referer classification [analytics/refinery/source] - https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) (owner: Snwachukwu)
[16:02:22] <wikibugs>	 (CR) Snwachukwu: Refactor and Expand External referer classification (11 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) (owner: Snwachukwu)
[16:06:52] <wikibugs>	 Data-Engineering-Planning, Data Pipelines (Sprint 07): Include EU Registered Country in the canonical country database - https://phabricator.wikimedia.org/T324995 (EChetty)
[16:08:00] <wikibugs>	 Data-Engineering-Planning, Data Pipelines (Sprint 07): Build Druid Operator - https://phabricator.wikimedia.org/T309996 (EChetty)
[16:10:32] <wikibugs>	 Data-Engineering-Planning, Data Pipelines (Sprint 07): Update Airflow DAGs code to make it compatible with version V2.3.4 of Airflow - https://phabricator.wikimedia.org/T309552 (EChetty)
[16:11:44] <wikibugs>	 Data-Engineering-Planning, Data Pipelines (Sprint 07), Patch-For-Review, SecTeam-Processed, Vuln-VulnComponent: Upgrade Puppet code to make Airflow configuration files compatible with version 2.3.4 - https://phabricator.wikimedia.org/T315580 (EChetty)
[16:15:08] <elukey>	 folks an-airflow1002 has only 3GB left in the root partition :(
[16:36:17] <wikibugs>	 Data-Engineering-Planning, Data Pipelines: Provide aggregated user device data per-country - https://phabricator.wikimedia.org/T325306 (EChetty)
[16:46:27] <wikibugs>	 Data-Engineering-Planning, Data Pipelines: NEW FEATURE REQUEST: Dataset with active and non-active Wikis - https://phabricator.wikimedia.org/T323662 (EChetty)
[16:50:39] <wikibugs>	 Data-Engineering-Planning, Data Pipelines: Fix mediawiki-history page computation for deleted pages having the same title - https://phabricator.wikimedia.org/T320860 (EChetty) p:Triage→Low
[16:58:53] <wikibugs>	 Data-Engineering-Planning, Epic, Patch-For-Review, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (MatthewVernon) I do think upstream haven't covered themselves in glory here - only cephadm f...
[18:41:14] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:12:58] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:23:32] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[20:05:50] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[20:16:24] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1086 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[21:19:54] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1086 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring