[11:58:44] FIRING: LiftWingServiceErrorRate: ...
[11:58:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=enwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:25:15] FIRING: ORESFetchScoreJobKafkaLag: Kafka consumer lag for ORESFetchScoreJob over threshold for past 1h. ...
[12:25:15] - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#Kafka_Consumer_lag_-_ORESFetchScoreJobKafkaLag_alert - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&orgId=1&to=now&var-cluster=main-codfw&var-consumer_group=cpjobqueue-ORESFetchScoreJob&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DORESFetchScoreJobKafkaLag
[12:28:44] RESOLVED: LiftWingServiceErrorRate: ...
[12:28:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=enwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[12:35:15] RESOLVED: ORESFetchScoreJobKafkaLag: Kafka consumer lag for ORESFetchScoreJob over threshold for past 1h. ...
[12:35:15] - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#Kafka_Consumer_lag_-_ORESFetchScoreJobKafkaLag_alert - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&orgId=1&to=now&var-cluster=main-codfw&var-consumer_group=cpjobqueue-ORESFetchScoreJob&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DORESFetchScoreJobKafkaLag
[17:12:16] Machine-Learning-Team: vscode remote ssh into ml-lab freezes - https://phabricator.wikimedia.org/T377067 (calbon) NEW
[17:38:05] Machine-Learning-Team: vscode remote ssh into ml-lab freezes - https://phabricator.wikimedia.org/T377067#10223577 (calbon)
[19:16:44] FIRING: LiftWingServiceErrorRate: ...
[19:16:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=enwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[19:41:44] RESOLVED: LiftWingServiceErrorRate: ...
[19:41:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=enwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[19:44:15] FIRING: ORESFetchScoreJobKafkaLag: Kafka consumer lag for ORESFetchScoreJob over threshold for past 1h. ...
[19:44:15] - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#Kafka_Consumer_lag_-_ORESFetchScoreJobKafkaLag_alert - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&orgId=1&to=now&var-cluster=main-codfw&var-consumer_group=cpjobqueue-ORESFetchScoreJob&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DORESFetchScoreJobKafkaLag
[19:59:15] RESOLVED: ORESFetchScoreJobKafkaLag: Kafka consumer lag for ORESFetchScoreJob over threshold for past 1h. ...
[19:59:15] - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#Kafka_Consumer_lag_-_ORESFetchScoreJobKafkaLag_alert - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&orgId=1&to=now&var-cluster=main-codfw&var-consumer_group=cpjobqueue-ORESFetchScoreJob&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DORESFetchScoreJobKafkaLag
[22:07:43] FIRING: LiftWingServiceErrorRate: ...
[22:07:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=enwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[22:17:43] RESOLVED: LiftWingServiceErrorRate: ...
[22:17:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=codfw%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=enwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate