[00:08:09] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:13:59] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1089 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[00:20:09] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:22:05] <icinga-wm>	 PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hardsync-published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:25:23] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1089 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[00:31:39] <icinga-wm>	 RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:39:19] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:53:43] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:12:55] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:24:55] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:44:07] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:56:09] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:07:53] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1089 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[02:15:17] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:27:15] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:46:27] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:58:25] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:04:51] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1089 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[03:17:35] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:29:35] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:48:05] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:02:29] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:21:37] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:33:37] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:51:11] <wikibugs>	 (PS1) DLynch: New schema: editattemptsblocked [schemas/event/secondary] - https://gerrit.wikimedia.org/r/820908 (https://phabricator.wikimedia.org/T310390)
[04:51:41] <wikibugs>	 (CR) CI reject: [V: -1] New schema: editattemptsblocked [schemas/event/secondary] - https://gerrit.wikimedia.org/r/820908 (https://phabricator.wikimedia.org/T310390) (owner: DLynch)
[04:52:47] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:53:23] <wikibugs>	 (CR) DLynch: "It occurred to me that `platform` and `interface` are redundant, and I have omitted `platform` as a result." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/820908 (https://phabricator.wikimedia.org/T310390) (owner: DLynch)
[04:58:39] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1089 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[04:58:56] <wikibugs>	 (PS2) DLynch: New schema: editattemptsblocked [schemas/event/secondary] - https://gerrit.wikimedia.org/r/820908 (https://phabricator.wikimedia.org/T310390)
[05:04:49] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:23:59] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:35:57] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:44:13] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1089 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[05:55:07] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:07:09] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:26:23] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:38:25] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:41:15] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1089 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:57:31] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:09:29] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:57:29] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:00:57] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1089 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:09:25] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:13:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[08:28:39] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:35:05] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1089 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:40:37] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:53:27] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[08:53:57] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[08:57:51] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1089 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:58:42] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[09:02:11] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:14:11] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:33:21] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:40:28] <elukey>	 btullis: o/
[09:40:50] <elukey>	 I didn't get from the etcd code review if the ml_etcd srv records need to be updated to unblock the work or not
[09:57:21] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:15:30] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1089 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[10:18:58] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:26:54] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:34:50] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:36:36] <btullis>	 elukey: no, I didn't think you need to change the ml_serve records to unblock the dse-k8s-etcd work.
[10:37:29] <btullis>	 It's only if you ever wished to switch that cluster from cergen/puppetCA to cfssl/PKI then you would need to.
[10:46:16] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:52:06] <elukey>	 btullis: ahhh okok nice, I'll try to do it in the future :)
[11:08:57] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:16:45] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:43:42] <btullis>	 !log rebooting an-worker1102 due to kernel soft lockups
[11:43:44] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:03:44] <icinga-wm>	 PROBLEM - Host an-worker1102 is DOWN: PING CRITICAL - Packet loss = 100%
[12:25:36] <icinga-wm>	 RECOVERY - Host an-worker1102 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[12:26:22] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:34:51] <jinxer-wm>	 (HdfsCorruptBlocks) firing: HDFS corrupt blocks detected on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_corrupt_blocks - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=39&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCorruptBlocks
[12:49:49] <btullis>	 ^^ investigating this corruptblock alert now
[12:55:09] <btullis>	 https://www.irccloud.com/pastebin/56qPrmIe/
[12:55:46] <btullis>	 According to this: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_corrupt_blocks the alert may be a false positive.
[12:55:50] <ottomata>	 hi, was about to look too (have meeting soon tho),
[12:55:57] <ottomata>	 an-worker1102 flapped a lot over the weekend
[12:56:01] <ottomata>	 over last night really
[12:56:42] <btullis>	 Yeah, I just rebooted an-worker1002 and it seems better. It was a soft CPU lockup I think. However, the corrupt blocks alert appeared just after an-worker1002 booted again.
[12:58:26] <ottomata>	 1102?
[12:58:42] <btullis>	 Yep, sorry fat fingered typo.
[12:58:49] <ottomata>	 okay 
[12:58:50] <ottomata>	 coo
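
Editor's note: the flapping mentioned above can be counted straight from the icinga-wm lines in this log. The following is a minimal sketch, assuming the notification format stays as it appears here; the pattern and the `count_problems` helper are hypothetical names for illustration, not part of any Wikimedia tooling.

```python
import re
from collections import Counter

# Assumed format, taken from the icinga-wm lines in this log:
# [HH:MM:SS] <icinga-wm> PROBLEM|RECOVERY - <check> on <host> is <STATE>: ...
ALERT_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] <icinga-wm>\s+"
    r"(?P<state>PROBLEM|RECOVERY) - (?P<check>.+?) on (?P<host>\S+) is "
)


def count_problems(lines):
    """Count PROBLEM notifications per (host, check) pair.

    Host up/down lines ("Host X is DOWN") have no "<check> on <host>"
    part, so this pattern deliberately skips them.
    """
    counts = Counter()
    for line in lines:
        match = ALERT_RE.match(line)
        if match and match.group("state") == "PROBLEM":
            counts[(match.group("host"), match.group("check"))] += 1
    return counts


# A few lines copied from this log (documentation URLs omitted).
sample = [
    "[00:20:09] <icinga-wm>\t PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service",
    "[00:39:19] <icinga-wm>\t RECOVERY - Check systemd state on an-worker1102 is OK: OK - running: The system is fully operational",
    "[00:53:43] <icinga-wm>\t PROBLEM - Check systemd state on an-worker1102 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service",
    "[12:03:44] <icinga-wm>\t PROBLEM - Host an-worker1102 is DOWN: PING CRITICAL - Packet loss = 100%",
]

problem_counts = count_problems(sample)
for (host, check), n in problem_counts.most_common():
    print(f"{host}: {check}: {n} PROBLEM notifications")
```

A high PROBLEM count for one host and check within a single day is a rough signal of flapping; it does not say anything about the cause.
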
[12:59:56] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1089 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[13:01:06] <btullis>	 ^^ These megaraid battery failures are all really annoying. We've got a whole batch of hadoop worker nodes where the RAID battery is starting to fail at roughly the same time. This is about the 4th one.
[13:33:52] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1089 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[14:21:28] <ottomata>	 this is the battery?
[14:23:04] <btullis>	 Yes, the backup battery on the RAID controller card in each host. When the charge is too low it reduces the performance, from WriteBack to WriteThrough.
[14:26:25] <ottomata>	 huh, what can we do?  escalate to dcops?  can we get new batteries?
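
Editor's note: with write-back caching the controller acknowledges a write once it is in cache, which is why it depends on the backup battery; in write-through mode each write waits for the disk. The sketch below pulls the per-logical-drive policies out of the MegaRAID alert text seen in this log. It assumes the message wording shown here, and `parse_policies` is a hypothetical helper, not the actual Icinga check.

```python
def parse_policies(message):
    """Return the write cache policies listed after 'currently using:'.

    Returns an empty list when the message has no such list, which is
    the case for the RECOVERY text in this log.
    """
    marker = "currently using: "
    if marker not in message:
        return []
    tail = message.split(marker, 1)[1]
    # Drop the trailing documentation URL, if present.
    tail = tail.split(" http", 1)[0]
    return [item.strip() for item in tail.split(",") if item.strip()]


# Alert text rebuilt from the PROBLEM lines in this log (13 logical drives).
alert = (
    "CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: "
    + ", ".join(["WriteThrough"] * 13)
    + " https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring"
)

policies = parse_policies(alert)
degraded = [p for p in policies if p != "WriteBack"]
print(f"{len(degraded)} of {len(policies)} logical drives are not in WriteBack mode")
```
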
[14:53:04] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1089 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[15:17:04] <wikibugs>	 Analytics-Clusters, Data Engineering Planning, Voice & Tone: Rename geoeditors_blacklist_country - https://phabricator.wikimedia.org/T259804 (odimitrijevic)
[15:24:30] <aqu>	 Hi btullis, this is the patch to review  https://gerrit.wikimedia.org/r/c/operations/puppet/+/813278 plz . I scoped the modifications on the test cluster. So we can merge it, and I can test Spark3 on 
[15:25:00] <btullis>	 aqu: Thanks. Will look at it now.
[15:28:32] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1089 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[15:49:30] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1089 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:35:56] <wikibugs>	 (CR) Vivian Rook: [C: +2] Escape '|' from wikitable output [analytics/quarry/web] - https://gerrit.wikimedia.org/r/816254 (https://phabricator.wikimedia.org/T308362) (owner: WelpThatWorked)
[16:40:27] <wikibugs>	 (Merged) jenkins-bot: Escape '|' from wikitable output [analytics/quarry/web] - https://gerrit.wikimedia.org/r/816254 (https://phabricator.wikimedia.org/T308362) (owner: WelpThatWorked)
[16:41:16] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1089 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:42:49] <wikibugs>	 Quarry, Patch-For-Review, good first task: Escape special characters in results - https://phabricator.wikimedia.org/T308362 (rook) Open→Resolved
[16:46:47] <btullis>	 aqu: That's merged, but it failed to run on some hosts: 
[16:46:51] <btullis>	 `Error: Failed to apply catalog: Parameter source failed on File[/etc/spark3/conf/spark-env.sh]: Cannot use relative URLs '#!/usr/bin/env bash`
[16:48:35] <btullis>	 Ah, looks like it's here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/813278/12/modules/profile/manifests/hadoop/spark3.pp#147 I will patch it now.
[16:59:03] <btullis>	 aqu: I've deployed the fix now, so you're free to test whether or not your changes work as expected.
[17:00:56] <aqu>	 I've noticed the patch. Thanks. Will check now.
[17:10:21] <wikibugs>	 (CR) Michael Große: "I ran this on stat1008 with the command in the comment and got plausible results:" [analytics/refinery] - https://gerrit.wikimedia.org/r/817837 (owner: Michael Große)
[17:15:57] <wikibugs>	 (PS1) Vivian Rook: Switch string and pipe [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821283 (https://phabricator.wikimedia.org/T308362)
[17:45:20] <wikibugs>	 (CR) RhinosF1: [C: -1] Switch string and pipe (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821283 (https://phabricator.wikimedia.org/T308362) (owner: Vivian Rook)
[17:57:43] <wikibugs>	 (CR) Vivian Rook: Switch string and pipe (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821283 (https://phabricator.wikimedia.org/T308362) (owner: Vivian Rook)
[18:04:27] <wikibugs>	 (CR) RhinosF1: [C: -1] Switch string and pipe (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821283 (https://phabricator.wikimedia.org/T308362) (owner: Vivian Rook)
[18:13:51] <wikibugs>	 (PS2) Vivian Rook: Switch string and pipe [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821283 (https://phabricator.wikimedia.org/T308362)
[18:14:15] <wikibugs>	 (CR) Vivian Rook: Switch string and pipe (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821283 (https://phabricator.wikimedia.org/T308362) (owner: Vivian Rook)
[18:27:56] <ottomata>	 aqu: btullis  i can't totally recall, but was that patch ready to merge?
[18:28:09] <ottomata>	 i think there are still issues with the .deb package
[18:28:09] <ottomata>	 https://phabricator.wikimedia.org/T309227#8079678
[18:31:16] <wikibugs>	 (CR) RhinosF1: [C: +1] Switch string and pipe [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821283 (https://phabricator.wikimedia.org/T308362) (owner: Vivian Rook)
[18:39:47] <btullis>	 ottomata: I thought that it was ready to merge; that was what I took from Antoine's standup anyway.
[18:41:41] <ottomata>	 i guess it's got a guard on it now, but i think the .deb package isn't quite working.  so the puppet maybe is okay?
[18:41:51] <ottomata>	 been a while though.
[18:45:01] <btullis>	 Yeah, apologies if I jumped the gun. I thought that this was just facilitating further testing on the test cluster. The .deb itself  can be iterated separately.
[18:51:16] <icinga-wm>	 RECOVERY - MegaRAID on an-worker1089 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:05:25] <wikibugs>	 (PS3) RhinosF1: mypy: add to tox [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821244
[19:09:20] <wikibugs>	 (CR) CI reject: [V: -1] mypy: add to tox [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821244 (owner: RhinosF1)
[19:11:42] <wikibugs>	 Data-Engineering, Event Metrics, GrowthExperiments-CommunityConfiguration, MediaWiki-extensions-EventLogging, and 2 others: editgrowthconfig schema: '' should NOT have additional properties, - https://phabricator.wikimedia.org/T314173 (Ottomata)
[19:11:47] <wikibugs>	 Data-Engineering, Event Metrics, GrowthExperiments-CommunityConfiguration, MediaWiki-extensions-EventLogging, and 2 others: editgrowthconfig schema: '' should NOT have additional properties, - https://phabricator.wikimedia.org/T314173 (Ottomata)
[19:12:01] <wikibugs>	 Data-Engineering, Event Metrics, GrowthExperiments-CommunityConfiguration, MediaWiki-extensions-EventLogging, and 2 others: editgrowthconfig schema: '' should NOT have additional properties, - https://phabricator.wikimedia.org/T314173 (Ottomata)
[19:19:09] <wikibugs>	 (PS4) RhinosF1: mypy: add to tox [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821244
[19:23:21] <wikibugs>	 (CR) CI reject: [V: -1] mypy: add to tox [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821244 (owner: RhinosF1)
[19:25:10] <icinga-wm>	 PROBLEM - MegaRAID on an-worker1089 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:27:00] <wikibugs>	 (PS5) RhinosF1: mypy: add to tox [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821244
[19:54:18] <wikibugs>	 (PS6) RhinosF1: mypy: add to tox [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821244
[19:54:36] <wikibugs>	 (PS7) RhinosF1: mypy: add to tox [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821244
[19:58:14] <wikibugs>	 (CR) CI reject: [V: -1] mypy: add to tox [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821244 (owner: RhinosF1)
[19:59:07] <wikibugs>	 (PS8) RhinosF1: mypy: add to tox [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821244
[19:59:33] <wikibugs>	 (CR) RhinosF1: "will finish in morning. brain get sleepy" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/821244 (owner: RhinosF1)
[20:04:34] <wikibugs>	 Quarry, cloud-services-team (Kanban): quarry-nfs-1 went down; quarry is offline - https://phabricator.wikimedia.org/T302154 (RhinosF1) Open→Resolved Not happened since. Closing per IRC.
[20:11:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[20:45:44] <wikibugs>	 Analytics, Analytics-Wikistats, Data-Engineering: WikiStats in Uzbek - https://phabricator.wikimedia.org/T314477 (EChetty)
[20:46:05] <wikibugs>	 Analytics, Analytics-Wikistats, Data-Engineering: WikiStats in Uzbek - https://phabricator.wikimedia.org/T314477 (EChetty)
[20:48:14] <wikibugs>	 Analytics, Analytics-Wikistats, Data-Engineering: WikiStats in Uzbek - https://phabricator.wikimedia.org/T314477 (JArguello-WMF)
[21:26:14] <ottomata>	 milimetric:  thanks for the ping.  
[21:26:31] <ottomata>	 they look like they are re-running successfully now
[21:26:34] <milimetric>	 nice
[21:33:53] <btullis>	 Oh dear, I'm so sorry for the mess I made.
[21:35:29] <ottomata>	 np!  btullis  not your fault!  it's been weeks since we looked at that. iirc antoine was off right before offsite too, and he and I have not synced up
[22:50:49] <wikibugs>	 Analytics-Wikistats, Data-Engineering: WikiStats in Uzbek - https://phabricator.wikimedia.org/T314477 (Aklapper) @EChetty: Please keep/add valid code project tags such as #Analytics-Wikistats which allow finding tasks related to code bases, not to end up in a big unmaintainable pile of only some-team-in-...
[23:25:22] <wikibugs>	 (PS4) Ottomata: WIP - Add new mediawiki entity fragments, and use them in new mediawiki page change schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/807565 (https://phabricator.wikimedia.org/T308017)
[23:25:58] <wikibugs>	 (CR) CI reject: [V: -1] WIP - Add new mediawiki entity fragments, and use them in new mediawiki page change schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/807565 (https://phabricator.wikimedia.org/T308017) (owner: Ottomata)
[23:26:19] <wikibugs>	 (CR) Ottomata: "Update: latest patch uses entity subobjects for each entity in the page change schema.  page, revision, actor, etc.  See also example 2 he" [schemas/event/primary] - https://gerrit.wikimedia.org/r/807565 (https://phabricator.wikimedia.org/T308017) (owner: Ottomata)