[00:01:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [00:03:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [00:06:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [00:08:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [00:15:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [00:16:41] PROBLEM - Check systemd state on db2180 is CRITICAL: CRITICAL - degraded: The following units failed: ipmiseld.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:25:31] (03PS1) 10RLazarus: Combine linkrecommendation SLO metrics into one cross-datacenter value [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/916680 (https://phabricator.wikimedia.org/T278083) [00:27:59] (03CR) 10RLazarus: "As discussed on phab." [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/916680 (https://phabricator.wikimedia.org/T278083) (owner: 10RLazarus) [00:29:09] RECOVERY - Check systemd state on db2180 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:39:29] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/915802 [00:39:33] (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/915802 (owner: 10TrainBranchBot) [00:40:50] (03CR) 10Krinkle: mediawiki: Add auto_prepend_file to PHP config_cli (take 2) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/910882 (https://phabricator.wikimedia.org/T253547) (owner: 10Krinkle) [00:45:34] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [00:46:37] PROBLEM - Check systemd state on db2180 is CRITICAL: CRITICAL - degraded: The following units failed: ipmiseld.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:55:54] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/915802 (owner: 10TrainBranchBot) [00:59:15] RECOVERY - Check systemd state on db2180 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:05:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [01:10:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [01:16:03] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:18:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [01:53:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [01:53:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [01:54:04] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [01:59:04] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [02:07:54] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:08:31] (Access port speed <= 100Mbps) firing: Alert for device asw2-b-eqiad.mgmt.eqiad.wmnet - Access port speed <= 100Mbps - https://alerts.wikimedia.org/?q=alertname%3DAccess+port+speed+%3C%3D+100Mbps [02:18:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [02:23:31] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:04:58] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (3) wdqs2005:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [03:05:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [03:08:28] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (6) wdqs2004:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [03:10:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [03:13:31] (NodeTextfileStale) firing: Stale textfile for lists1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [03:29:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [03:34:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [04:08:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [04:09:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [04:13:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [04:17:08] 10SRE, 10Gerrit, 10Release-Engineering-Team, 10serviceops-collab, 10Patch-For-Review: gerrit1003 service implementation task - https://phabricator.wikimedia.org/T326368 (10Dzahn) >>! In T326368#8829791, @hashar wrote: > I'd like the new host to be added first as a replica rather than an entirely new prim... [04:19:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [04:50:03] (ProbeDown) firing: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [04:51:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [04:55:03] (ProbeDown) resolved: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [04:56:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [05:16:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [05:21:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [05:49:35] PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 2 (gerrit1001, ...), Fresh: 124 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [05:56:35] PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/suggest/title/{title}/{from}/{to} (Suggest a target title for the given source title and language pairs) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX [05:57:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [05:58:01] RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX [06:02:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [06:08:31] (Access port speed <= 100Mbps) firing: Alert for device asw2-b-eqiad.mgmt.eqiad.wmnet - Access port speed <= 100Mbps - https://alerts.wikimedia.org/?q=alertname%3DAccess+port+speed+%3C%3D+100Mbps [06:50:27] !log jelto@cumin1001 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade [06:53:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [06:58:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230506T0700) [07:04:58] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (3) wdqs2005:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [07:07:52] !log jelto@cumin1001 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade [07:08:28] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (6) wdqs2004:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [07:10:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [07:13:31] (NodeTextfileStale) firing: Stale textfile for lists1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [07:16:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [07:25:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [07:39:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [07:44:54] !log jelto@cumin1001 END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade [07:45:11] ^ failed cookbook expected due to https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/915434 [07:49:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [07:50:00] !log jelto@cumin1001 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade [07:50:04] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [07:55:04] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [08:00:59] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [08:02:17] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [08:03:11] !log jelto@cumin1001 END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade [08:03:37] ^ failed cookbook expected due to https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/915434 [08:05:17] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 49992 bytes in 0.053 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [08:05:33] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.261 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [08:18:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [08:23:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [08:35:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [08:51:26] !log jelto@cumin1001 END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade [08:55:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [09:51:49] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 126 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [10:07:06] 10SRE, 10RESTBase, 10RESTBase-API, 10Traffic: REST API is not invalidating caches after template and/or module changes - https://phabricator.wikimedia.org/T335770 (10akosiaris) > As for the reason restarting the router fixed it, I suspect restarting the router might have changed your IP address and, in tur... [10:08:31] (Access port speed <= 100Mbps) firing: Alert for device asw2-b-eqiad.mgmt.eqiad.wmnet - Access port speed <= 100Mbps - https://alerts.wikimedia.org/?q=alertname%3DAccess+port+speed+%3C%3D+100Mbps [10:28:18] (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [10:33:18] (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [10:35:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [10:36:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [10:40:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [11:03:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [11:04:58] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (3) wdqs2005:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [11:08:28] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (6) wdqs2004:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [11:13:31] (NodeTextfileStale) firing: Stale textfile for lists1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [11:21:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [11:26:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [11:28:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [11:30:13] PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv6: Active - Anycast, AS64605/IPv6: Idle - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [11:47:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [12:23:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [12:28:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [12:29:19] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [12:34:19] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkProcessingLatencyIsHigh [12:58:36] 10SRE, 10RESTBase, 10RESTBase-API, 10Traffic: REST API is not invalidating caches after template and/or module changes - https://phabricator.wikimedia.org/T335770 (10akosiaris) Re-reading my answer, I realized that I have left a window open to suggest that a Belgian ISP might be doing a MITM attack. Let me... [13:42:49] (RdfStreamingUpdaterNotEnoughTaskSlots) firing: The flink session cluster rdf-streaming-updater in codfw (k8s) does not have enough task slots - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterNotEnoughTaskSlots [13:47:49] (RdfStreamingUpdaterNotEnoughTaskSlots) resolved: The flink session cluster rdf-streaming-updater in codfw (k8s) does not have enough task slots - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterNotEnoughTaskSlots [14:08:31] (Access port speed <= 100Mbps) firing: Alert for device asw2-b-eqiad.mgmt.eqiad.wmnet - Access port speed <= 100Mbps - https://alerts.wikimedia.org/?q=alertname%3DAccess+port+speed+%3C%3D+100Mbps [14:26:11] (03PS3) 10Andrew Bogott: toolforge k8s: permit access to clouds.yaml [puppet] - 10https://gerrit.wikimedia.org/r/916589 (https://phabricator.wikimedia.org/T330759) [14:26:13] (03PS3) 10Andrew Bogott: grid_configurator: use mwopenstackclients library [puppet] - 10https://gerrit.wikimedia.org/r/916588 (https://phabricator.wikimedia.org/T330759) [14:26:15] (03PS3) 10Andrew Bogott: wmcs prometheus: include 'OPENSTACK->CLOUD' in prometheus config [puppet] - 10https://gerrit.wikimedia.org/r/916590 (https://phabricator.wikimedia.org/T330759) [14:26:17] (03PS1) 10Andrew Bogott: codfw1dev: enforce scope and default policies [puppet] - 10https://gerrit.wikimedia.org/r/916725 (https://phabricator.wikimedia.org/T330759) [14:28:27] (03CR) 10Andrew Bogott: [C: 03+2] toolforge k8s: permit access to clouds.yaml [puppet] - 10https://gerrit.wikimedia.org/r/916589 (https://phabricator.wikimedia.org/T330759) (owner: 10Andrew Bogott) [14:28:46] (03CR) 10CI reject: [V: 04-1] grid_configurator: use mwopenstackclients library [puppet] - 10https://gerrit.wikimedia.org/r/916588 (https://phabricator.wikimedia.org/T330759) (owner: 10Andrew Bogott) [15:04:58] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (3) wdqs2005:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [15:08:28] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (6) wdqs2004:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [15:13:31] (NodeTextfileStale) firing: Stale textfile for lists1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [15:17:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [15:30:09] PROBLEM - Check systemd state on doc1002 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-doc2001.codfw.wmnet.service,rsync-doc-doc2002.codfw.wmnet.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:46:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [15:51:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [16:05:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [16:14:18] (03PS1) 10Majavah: apt: unattendedupgrades: upgrade osbpo packages too [puppet] - 10https://gerrit.wikimedia.org/r/916736 [16:25:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [16:26:25] RECOVERY - Check systemd state on doc1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:42:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [16:43:25] (03PS2) 10Andrew Bogott: codfw1dev: enforce scope and default policies [puppet] - 10https://gerrit.wikimedia.org/r/916725 (https://phabricator.wikimedia.org/T330759) [16:43:27] (03PS4) 10Andrew Bogott: grid_configurator: use mwopenstackclients library [puppet] - 10https://gerrit.wikimedia.org/r/916588 (https://phabricator.wikimedia.org/T330759) [16:43:29] (03PS4) 10Andrew Bogott: wmcs prometheus: include 'OPENSTACK->CLOUD' in prometheus config [puppet] - 10https://gerrit.wikimedia.org/r/916590 (https://phabricator.wikimedia.org/T330759) [16:45:26] (03CR) 10Andrew Bogott: [C: 03+2] codfw1dev: enforce scope and default policies [puppet] - 10https://gerrit.wikimedia.org/r/916725 (https://phabricator.wikimedia.org/T330759) (owner: 10Andrew Bogott) [16:45:54] (03CR) 10CI reject: [V: 04-1] grid_configurator: use mwopenstackclients library [puppet] - 10https://gerrit.wikimedia.org/r/916588 (https://phabricator.wikimedia.org/T330759) (owner: 10Andrew Bogott) [17:07:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [17:10:03] (ProbeDown) firing: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [17:15:03] (ProbeDown) resolved: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [17:22:08] (03PS5) 10Andrew Bogott: grid_configurator: use mwopenstackclients library [puppet] - 10https://gerrit.wikimedia.org/r/916588 (https://phabricator.wikimedia.org/T330759) [17:22:10] (03PS5) 10Andrew Bogott: wmcs prometheus: include 'OPENSTACK->CLOUD' in prometheus config [puppet] - 10https://gerrit.wikimedia.org/r/916590 (https://phabricator.wikimedia.org/T330759) [17:22:12] (03PS1) 10Andrew Bogott: Revert "codfw1dev: enforce scope and default policies" [puppet] - 10https://gerrit.wikimedia.org/r/916781 [17:23:50] (03CR) 10Andrew Bogott: [C: 03+2] Revert "codfw1dev: enforce scope and default policies" [puppet] - 10https://gerrit.wikimedia.org/r/916781 (owner: 10Andrew Bogott) [17:24:39] (03CR) 10CI reject: [V: 04-1] grid_configurator: use mwopenstackclients library [puppet] - 10https://gerrit.wikimedia.org/r/916588 (https://phabricator.wikimedia.org/T330759) (owner: 10Andrew Bogott) [17:38:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [17:48:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [18:08:31] (Access port speed <= 100Mbps) firing: Alert for device asw2-b-eqiad.mgmt.eqiad.wmnet - Access port speed <= 100Mbps - https://alerts.wikimedia.org/?q=alertname%3DAccess+port+speed+%3C%3D+100Mbps [18:13:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [18:23:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [18:41:43] (03PS1) 10Majavah: toolforge: wmcs-package-build: support .git suffix in URLs [puppet] - 10https://gerrit.wikimedia.org/r/916787 [18:42:07] (03CR) 10CI reject: [V: 04-1] toolforge: wmcs-package-build: support .git suffix in URLs [puppet] - 10https://gerrit.wikimedia.org/r/916787 (owner: 10Majavah) [18:42:19] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [18:45:30] (03PS1) 10Majavah: P:toolforge: fix toolforge-cli config file location [puppet] - 10https://gerrit.wikimedia.org/r/916788 [18:45:32] (03PS1) 10Majavah: P:toolforge::k8s::client: remove toolforge-webservice link [puppet] - 10https://gerrit.wikimedia.org/r/916789 [18:45:34] (03PS1) 10Majavah: P:toolforge: actually install toolforge-cli package [puppet] - 10https://gerrit.wikimedia.org/r/916790 [18:45:36] (03PS1) 10Majavah: P:toolforge: install toolforge-logs-cli [puppet] - 10https://gerrit.wikimedia.org/r/916791 (https://phabricator.wikimedia.org/T336057) [18:45:38] (03PS1) 10Majavah: P:toolforge: merge jobs_framework_cli to toolforge_cli [puppet] - 10https://gerrit.wikimedia.org/r/916792 [18:46:52] (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/41075/console" [puppet] - 10https://gerrit.wikimedia.org/r/916792 (owner: 10Majavah) [18:48:45] (03CR) 10CI reject: [V: 04-1] P:toolforge: fix toolforge-cli config file location [puppet] - 10https://gerrit.wikimedia.org/r/916788 (owner: 10Majavah) [18:49:00] (03CR) 10CI reject: [V: 04-1] P:toolforge::k8s::client: remove toolforge-webservice link [puppet] - 10https://gerrit.wikimedia.org/r/916789 (owner: 10Majavah) [18:49:09] (03CR) 10CI reject: [V: 04-1] P:toolforge: actually install toolforge-cli package [puppet] - 10https://gerrit.wikimedia.org/r/916790 (owner: 10Majavah) [18:49:33] (03PS2) 10Majavah: P:toolforge: fix toolforge-cli config file location [puppet] - 10https://gerrit.wikimedia.org/r/916788 [18:49:35] (03PS2) 10Majavah: P:toolforge::k8s::client: remove toolforge-webservice link [puppet] - 10https://gerrit.wikimedia.org/r/916789 [18:49:37] (03PS2) 10Majavah: P:toolforge: actually install toolforge-cli package [puppet] - 10https://gerrit.wikimedia.org/r/916790 [18:49:39] (03PS2) 10Majavah: P:toolforge: install toolforge-logs-cli [puppet] - 10https://gerrit.wikimedia.org/r/916791 (https://phabricator.wikimedia.org/T336057) [18:49:41] (03PS2) 10Majavah: P:toolforge: merge jobs_framework_cli to toolforge_cli [puppet] - 10https://gerrit.wikimedia.org/r/916792 [18:50:42] (03PS2) 10Majavah: toolforge: wmcs-package-build: support .git suffix in URLs [puppet] - 10https://gerrit.wikimedia.org/r/916787 [18:57:19] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [19:04:58] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (3) wdqs2005:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [19:08:28] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (6) wdqs2004:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [19:11:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [19:13:31] (NodeTextfileStale) firing: Stale textfile for lists1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale [19:19:59] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [19:21:27] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.239 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [19:21:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [19:35:49] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [19:40:20] (03PS1) 10Majavah: Remove -pwb images [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/916795 [19:50:49] (RdfStreamingUpdaterFlinkJobUnstable) resolved: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [20:04:04] (RdfStreamingUpdaterFlinkJobUnstable) firing: WDQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/gCFgfpG7k/flink-session-cluster - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable [20:27:15] (03Abandoned) 10DannyS712: Consolidate user rights assignments, part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/580414 (https://phabricator.wikimedia.org/T239771) (owner: 10DannyS712) [20:27:19] (03Abandoned) 10DannyS712: Consolidate user rights assignments, part 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/562396 (https://phabricator.wikimedia.org/T239771) (owner: 10DannyS712) [20:27:45] (03Abandoned) 10DannyS712: Remove obsolete schema migration settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/602022 (owner: 10DannyS712) [21:25:13] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [21:26:39] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.252 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [22:08:31] (Access port speed <= 100Mbps) firing: Alert for device asw2-b-eqiad.mgmt.eqiad.wmnet - Access port speed <= 100Mbps - https://alerts.wikimedia.org/?q=alertname%3DAccess+port+speed+%3C%3D+100Mbps [23:04:58] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (3) wdqs2005:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [23:08:28] (RdfStreamingUpdaterHighConsumerUpdateLag) firing: (6) wdqs2004:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [23:13:31] (NodeTextfileStale) firing: Stale textfile for lists1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale