[00:34:11] (PS1) Urbanecm: cswikibooks: Enable visualeditor for all users [mediawiki-config] - https://gerrit.wikimedia.org/r/890173 (https://phabricator.wikimedia.org/T330015)
[00:34:40] PROBLEM - Check systemd state on logstash1023 is CRITICAL: CRITICAL - degraded: The following units failed: run-dashboards-backup.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:19:00] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[01:23:03] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[02:07:45] (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:22:45] (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:23:47] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[05:27:49] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[06:22:45] (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:30:17] (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[06:35:17] (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[08:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230219T0800)
[09:07:33] SRE, Traffic, IPv6: Start a pure IPv6 web site for wikimedia services - https://phabricator.wikimedia.org/T330020 (Peachey88)
[09:24:02] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[09:28:04] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[10:22:45] (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:13:21] (PS1) Andrew Bogott: cloud-vps vm backup: don't run purge job until backup job completes [puppet] - https://gerrit.wikimedia.org/r/890185 (https://phabricator.wikimedia.org/T330022)
[13:23:18] (PS2) Andrew Bogott: cloud-vps vm backup: don't run purge job until backup job completes [puppet] - https://gerrit.wikimedia.org/r/890185 (https://phabricator.wikimedia.org/T330022)
[13:27:11] SRE, Traffic, IPv6: Start a pure IPv6 web site for wikimedia services - https://phabricator.wikimedia.org/T330020 (I) >>! In T330020#8627903, @MrAureliusR wrote: > For Chinese users that want to access Wikipedia, aren't projects like Tor Snowflake effective? Unfortunately, the Chinese government has...
[13:28:08] (CR) Andrew Bogott: [C: +2] cloud-vps vm backup: don't run purge job until backup job completes [puppet] - https://gerrit.wikimedia.org/r/890185 (https://phabricator.wikimedia.org/T330022) (owner: Andrew Bogott)
[13:28:47] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[13:32:48] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[13:46:03] SRE, Traffic, IPv6: Start a pure IPv6 web site for wikimedia services - https://phabricator.wikimedia.org/T330020 (MrAureliusR) >>! In T330020#8628041, @I wrote: >>>! In T330020#8627903, @MrAureliusR wrote: >> For Chinese users that want to access Wikipedia, aren't projects like Tor Snowflake effecti...
[14:17:32] SRE, DNS: Let all requests from mainland China will be processed to codfw/esams/drmrs - https://phabricator.wikimedia.org/T330024 (I)
[14:22:45] (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:45:22] (PS1) Stang: zhwiki(books|quote): Enable block feature for AbuseFilter [mediawiki-config] - https://gerrit.wikimedia.org/r/890187 (https://phabricator.wikimedia.org/T330026)
[15:50:09] (PS2) Stang: zhwiki(books|quote): Enable block feature for AbuseFilter [mediawiki-config] - https://gerrit.wikimedia.org/r/890187 (https://phabricator.wikimedia.org/T330026)
[15:50:36] (PS3) Stang: zhwiki(books|quote): Enable block feature for AbuseFilter [mediawiki-config] - https://gerrit.wikimedia.org/r/890187 (https://phabricator.wikimedia.org/T330026)
[15:52:20] PROBLEM - Check systemd state on kubestagemaster1001 is CRITICAL: CRITICAL - degraded: The following units failed: kube-controller-manager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:01:16] (Nonwrite HTTP requests with primary DB writes alert) firing: Nonwrite HTTP requests with primary DB writes alert - https://alerts.wikimedia.org/?q=alertname%3DNonwrite+HTTP+requests+with+primary+DB+writes+alert
[17:23:44] That’s new
[17:29:03] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[17:33:03] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[18:17:28] PROBLEM - Check systemd state on kubestagemaster2001 is CRITICAL: CRITICAL - degraded: The following units failed: kube-controller-manager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:21:16] (Nonwrite HTTP requests with primary DB writes alert) firing: (2) Nonwrite HTTP requests with primary DB writes alert - https://alerts.wikimedia.org/?q=alertname%3DNonwrite+HTTP+requests+with+primary+DB+writes+alert
[18:22:45] (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:36:16] (Nonwrite HTTP requests with primary DB writes alert) resolved: Nonwrite HTTP requests with primary DB writes alert - https://alerts.wikimedia.org/?q=alertname%3DNonwrite+HTTP+requests+with+primary+DB+writes+alert
[20:41:30] SRE, DNS, Traffic: Let all requests from mainland China will be processed to codfw/esams/drmrs - https://phabricator.wikimedia.org/T330024 (Bugreporter)
[21:31:57] (ProbeDown) firing: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:33:47] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[21:36:57] (ProbeDown) resolved: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:37:48] ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T328832 (phaultfinder)
[22:22:45] (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable