[05:59:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[06:04:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[10:05:02] SandraEbele: o/
[10:05:19] (CR) Krinkle: [C: +2] navtiming: Add cumulative layout shift and largest contentful paint. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/859968 (https://phabricator.wikimedia.org/T281022) (owner: Phedenskog)
[10:06:20] SandraEbele: I have updated the webrequest live druid supervisor following the wikitech procedure
[10:06:54] (Merged) jenkins-bot: navtiming: Add cumulative layout shift and largest contentful paint. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/859968 (https://phabricator.wikimedia.org/T281022) (owner: Phedenskog)
[10:07:15] !log refresh the webrequest-sampled-live druid supervisor after https://gerrit.wikimedia.org/r/c/analytics/refinery/+/859463
[10:07:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:16:18] (PS14) Aqu: Add HdfsXMLFsImageConverter to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168)
[10:19:29] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:27:10] (PS15) Aqu: Add HdfsXMLFsImageConverter to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168)
[10:31:53] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:24:48] !log restart turnilo on an-tool1007 to pick up new settings for webrequest_sampled_live
[11:24:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:02:56] Analytics: Kerberos Principal for pfischer - https://phabricator.wikimedia.org/T323822 (pfischer)
[15:07:02] PROBLEM - SSH on an-coord1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:29:34] !log reset the bmc on an-coord1002
[15:29:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:03:02] (Abandoned) David Caro: db: Added a script to generate a DB schema from the models [analytics/quarry/web] - https://gerrit.wikimedia.org/r/711133 (https://phabricator.wikimedia.org/T288523) (owner: David Caro)
[16:07:52] RECOVERY - SSH on an-coord1002.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:25:43] Data-Engineering-Planning, Data Pipelines: Back-fill Wikidata reliability Graphite metrics - https://phabricator.wikimedia.org/T321838 (Antoine_Quhen)
[17:37:13] Data-Engineering-Radar, Cassandra: Bootstrap new Cassandra nodes (eqiad) - https://phabricator.wikimedia.org/T307802 (Eevans)
[22:51:26] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:01:36] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state