[00:00:55] RECOVERY - Check systemd state on an-worker1127 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:13:21] PROBLEM - Check systemd state on an-worker1127 is CRITICAL: CRITICAL - degraded: The following units failed: user-runtime-dir@116.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:14:59] RECOVERY - Check systemd state on an-worker1127 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:22:27] PROBLEM - Check systemd state on an-worker1127 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service,user-runtime-dir@116.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:03:37] PROBLEM - puppet last run on an-worker1127 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:21:04] RECOVERY - Check systemd state on an-worker1127 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:27:06] PROBLEM - Check systemd state on an-worker1127 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service,user-runtime-dir@116.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:41:55] Data-Engineering, Event-Platform: Improve EventGate's error message when the client's HTTP Content-Type is not the one expected - https://phabricator.wikimedia.org/T313202 (elukey)
[08:53:56] hello folks
[08:54:05] for some reason puppet on an-worker1127 gets stuck in
[08:54:06] Error: Facter: error while resolving custom facts in /var/lib/puppet/lib/facter/lvm_support.rb: command timed out after 60 seconds.
[08:54:16] I tried to kill the agent and run manually, same thing
[08:54:56] ahh ok I see, the dmesg is full of "INFO: rcu_sched self-detected stall on CPU"
[10:55:17] RECOVERY - puppet last run on an-worker1127 is OK: OK: Puppet is currently disabled (re-sync postgres), not alerting. Last run 13 hours ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[11:32:41] PROBLEM - puppet last run on an-worker1127 is CRITICAL: CRITICAL: Puppet last ran 14 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[16:34:55] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (Mayakp.wiki) Hi @JArguello-WMF / @EChetty Can we rename this task to improve discoverability in the future...
[18:04:32] Analytics-Jupyter, Data-Engineering: Cannot import Numpy in new Conda environment on stat1008 - https://phabricator.wikimedia.org/T313249 (nshahquinn-wmf)