[00:08:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:26:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:33:03] (KubernetesAPILatency) firing: High Kubernetes API latency (PUT customresourcedefinitions) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlstaging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [00:38:03] (KubernetesAPILatency) resolved: High Kubernetes API latency (PUT customresourcedefinitions) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlstaging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [00:38:22] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/938331 [00:38:28] (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/938331 (owner: 10TrainBranchBot) [00:53:07] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/938331 (owner: 10TrainBranchBot) [01:53:23] (SystemdUnitFailed) firing: (2) load-categories-daily.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:08:23] (JobUnavailable) firing: (4) Reduced availability for job cloud_dev_pdns in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:28:23] (JobUnavailable) firing: (4) Reduced availability for job cloud_dev_pdns in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:36:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [04:34:49] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:45:17] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:36:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [05:53:23] (SystemdUnitFailed) firing: (2) load-categories-daily.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:28:23] (JobUnavailable) firing: (2) Reduced availability for job cloud_dev_pdns in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230716T0700) [07:57:32] (03CR) 10Kaleem Bhatti: [C: 03+1] "Please add to review I'm unable to add I dont know command how to submit with git" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/937922 (https://phabricator.wikimedia.org/T268203) (owner: 10Kaleem Bhatti) [09:26:03] PROBLEM - Check systemd state on cumin2002 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_mw-api-ext_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:31:59] PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [09:36:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [09:53:23] (SystemdUnitFailed) firing: (2) load-categories-daily.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:23:13] RECOVERY - Check systemd state on cumin2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:24:27] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [10:28:23] (JobUnavailable) firing: (2) Reduced availability for job cloud_dev_pdns in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [11:04:42] (03CR) 10Kaleem Bhatti: [C: 03+1] "someone please start review for this I don't commands please" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/937922 (https://phabricator.wikimedia.org/T268203) (owner: 10Kaleem Bhatti) [12:32:28] (03PS3) 10Anzx: Add appendix namespace aliases on huwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/938315 (https://phabricator.wikimedia.org/T341926) [13:24:03] (ProbeDown) firing: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [13:29:03] (ProbeDown) resolved: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [13:36:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [13:53:25] (SystemdUnitFailed) firing: (2) load-categories-daily.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:08:23] (JobUnavailable) firing: (4) Reduced availability for job cloud_dev_pdns in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:14:23] (JobUnavailable) firing: (4) Reduced availability for job cloud_dev_pdns in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:04:46] (03CR) 10Paladox: "This change is ready for review." [software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/938318 (owner: 10Paladox) [17:20:58] !log starting rsync of sql/xml dumps files, pulling from dumpsdata1004, running in ariel screen session on dumpsdata1007, bw limited to 1G [17:21:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:22:06] for those following along, this is prep to swapping out the current sql/xml dums fallback nfs share, running buster , for one running bullseye; that should make all production nfs shares be on bullseye [17:36:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [17:54:22] (SystemdUnitFailed) firing: (2) load-categories-daily.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:18:23] (JobUnavailable) firing: (2) Reduced availability for job cloud_dev_pdns in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:43:18] (KubernetesAPILatency) firing: High Kubernetes API latency (POST nodes) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [18:48:18] (KubernetesAPILatency) resolved: High Kubernetes API latency (POST nodes) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [21:36:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [21:58:23] (SystemdUnitFailed) firing: (2) load-categories-daily.service Failed on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:18:23] (JobUnavailable) firing: (2) Reduced availability for job cloud_dev_pdns in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [22:57:10] (03CR) 10Tacsipacsi: [C: 03+1] "Thanks!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/938315 (https://phabricator.wikimedia.org/T341926) (owner: 10Anzx)