[00:17:07] RECOVERY - Check unit status of monitor_refine_eventlogging_analytics on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_analytics https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:30:23] PROBLEM - Check unit status of monitor_refine_eventlogging_analytics on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_analytics https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:01:37] (CR) AntiCompositeNumber: "recheck" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/752034 (owner: EpicPupper)
[10:41:56] !log restart hive daemons on an-coord1002 (after my last upgrade/rollback of packages the prometheus agent settings were not picked up, so no metrics)
[10:41:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:46:29] PROBLEM - Hive Server on an-coord1002 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hive.service.server.HiveServer2 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive
[10:49:57] uff
[10:51:05] RECOVERY - Hive Server on an-coord1002 is OK: PROCS OK: 1 process with command name java, args org.apache.hive.service.server.HiveServer2 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive
[10:51:21] !log start hive-server2 on an-coord1002 - failed to connect to the metastore due to restart
[10:51:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log