[00:14:20] PROBLEM - MariaDB Replica Lag: s1 on db1140 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 843.52 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [00:17:39] !log brett@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS bullseye [00:17:45] 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin1001 for host cp4050.ulsfo.wmnet with OS bullseye completed: - cp4050 (**PASS**) - Removed from Puppet and PuppetDB if present -... [00:21:56] ^^ I guess it's backuping from db1140, db2141 the other backup host has the exact replication lag [00:22:04] but not sure why it's alerting then [00:26:00] PROBLEM - MariaDB Replica Lag: s4 on db2099 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1546.44 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [00:31:26] PROBLEM - MariaDB Replica Lag: s3 on db1102 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1869.96 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [00:34:40] PROBLEM - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is CRITICAL: 193 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [00:35:47] !log brett@cumin1001 conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet [00:36:21] !log brett@cumin1001 conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet [00:36:24] RECOVERY - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is OK: (C)100 gt (W)50 gt 19 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [00:37:06] 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10BCornwall) [00:44:03] (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [01:13:29] (03PS2) 10Arlolra: Try to determine what's adding to Parsoid init times [core] (wmf/1.40.0-wmf.20) - 10https://gerrit.wikimedia.org/r/884138 [01:13:45] (03CR) 10Arlolra: Try to determine what's adding to Parsoid init times (031 comment) [core] (wmf/1.40.0-wmf.20) - 10https://gerrit.wikimedia.org/r/884138 (owner: 10Arlolra) [01:53:00] (03CR) 10Ladsgroup: [C: 03+1] drop_cul_user_text_T328086.py: New schema change [software/schema-changes] - 10https://gerrit.wikimedia.org/r/884221 (https://phabricator.wikimedia.org/T328086) (owner: 10Marostegui) [02:09:18] RECOVERY - MariaDB Replica Lag: s1 on db1140 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [02:10:45] (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:17:02] RECOVERY - MariaDB Replica Lag: s3 on db1102 is OK: OK slave_sql_lag Replication lag: 0.20 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [02:20:45] (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:21:14] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid [02:22:56] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [02:37:50] RECOVERY - MariaDB Replica Lag: s4 on db2099 is OK: OK slave_sql_lag Replication lag: 0.21 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [02:49:03] (ProbeDown) firing: (4) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [02:58:43] 10SRE, 10vm-requests: : VMs requested for - https://phabricator.wikimedia.org/T328184 (10Ontomoly2) [03:10:11] (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert [04:36:47] (03CR) 10Subramanya Sastry: Try to determine what's adding to Parsoid init times (031 comment) [core] (wmf/1.40.0-wmf.20) - 10https://gerrit.wikimedia.org/r/884138 (owner: 10Arlolra) [05:24:48] 10SRE, 10LDAP-Access-Requests: Grant Access to 'cn=nda or cn=wmf' for ekalkst - https://phabricator.wikimedia.org/T328145 (10Ekalkst) volunteer? Thank you for your time Eric Kalkstein ________________________________ From: Dzahn Sent: Friday, January 27, 2023 3:38 PM To:... [06:20:45] (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [06:49:03] (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [06:59:07] (03CR) 10Marostegui: [C: 03+2] drop_cul_user_text_T328086.py: New schema change [software/schema-changes] - 10https://gerrit.wikimedia.org/r/884221 (https://phabricator.wikimedia.org/T328086) (owner: 10Marostegui) [06:59:31] (03Merged) 10jenkins-bot: drop_cul_user_text_T328086.py: New schema change [software/schema-changes] - 10https://gerrit.wikimedia.org/r/884221 (https://phabricator.wikimedia.org/T328086) (owner: 10Marostegui) [07:10:11] (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert [08:01:42] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:31:01] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for Abhas - https://phabricator.wikimedia.org/T328015 (10JanWMF) approved [10:13:58] (KubernetesAPILatency) firing: High Kubernetes API latency (UPDATE certificaterequests) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [10:18:58] (KubernetesAPILatency) resolved: High Kubernetes API latency (UPDATE certificaterequests) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [10:20:45] (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [10:49:03] (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [10:50:55] 10SRE, 10LDAP-Access-Requests: Grant Access to 'cn=nda or cn=wmf' for ekalkst - https://phabricator.wikimedia.org/T328145 (10Dzahn) >>! In T328145#8566393, @Ekalkst wrote: > volunteer? Alright, thanks! That would mean it's about the "nda" group (not the "wmf" group). That was to clarify that. Volunteer acces... [10:52:38] mutante: https://ldap.toolforge.org/user/Ekalkst [10:52:45] So just never logged in at Wikitech [10:53:00] They do not need nda there [10:54:19] 10SRE, 10LDAP-Access-Requests: Grant Access to 'cn=nda or cn=wmf' for ekalkst - https://phabricator.wikimedia.org/T328145 (10RhinosF1) Why do you need grafana access? You seem to have 0 previous engagement with the WMF. What is the purpose and who approved your work? I don’t see this being granted on the in... [11:10:11] (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert [11:48:56] PROBLEM - snapshot of s3 in codfw on backupmon1001 is CRITICAL: snapshot for s3 at codfw (db2139) taken more than 3 days ago: Most recent backup 2023-01-25 11:41:40 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup [12:00:16] RECOVERY - Check systemd state on maps2009 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:05:30] PROBLEM - Check systemd state on maps2009 is CRITICAL: CRITICAL - degraded: The following units failed: planet_sync_tile_generation-gis.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:27:27] (03CR) 10Superpes15: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884333 (https://phabricator.wikimedia.org/T328194) (owner: 10Superpes15) [12:52:53] (03PS3) 10Superpes15: Allow AbuseFilter to block IPs and users on itwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884333 (https://phabricator.wikimedia.org/T328194) [12:54:52] (03PS4) 10Superpes15: Allow AbuseFilter to block IPs and users on itwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884333 (https://phabricator.wikimedia.org/T328194) [13:59:55] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [14:04:55] (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag [14:20:45] (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:49:03] (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [14:55:11] (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: (2) Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert [15:15:11] (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) resolved: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert [15:35:45] (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:40:45] (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:10:45] (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:28:49] 10SRE, 10vm-requests: : VMs requested for - https://phabricator.wikimedia.org/T328199 (10Themindcoder) [16:29:10] (03PS3) 10Arlolra: Try to determine what's adding to Parsoid init times [core] (wmf/1.40.0-wmf.20) - 10https://gerrit.wikimedia.org/r/884138 [16:31:20] (03CR) 10Arlolra: Try to determine what's adding to Parsoid init times (031 comment) [core] (wmf/1.40.0-wmf.20) - 10https://gerrit.wikimedia.org/r/884138 (owner: 10Arlolra) [16:34:47] (03PS2) 10Giuseppe Lavagetto: mediawiki: adapt rsyslog parsing of slowlog to ecs 1.11 [deployment-charts] - 10https://gerrit.wikimedia.org/r/884360 [16:48:24] (03PS4) 10Arlolra: Try to determine what's adding to Parsoid init times [core] (wmf/1.40.0-wmf.20) - 10https://gerrit.wikimedia.org/r/884138 (https://phabricator.wikimedia.org/T328201) [16:51:50] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid [16:52:10] PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on alert1001 is CRITICAL: 1.001e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw [16:53:36] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [17:28:18] PROBLEM - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 80, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [17:31:52] PROBLEM - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 80, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [17:32:23] (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert [17:43:08] PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on alert1001 is CRITICAL: 1.016e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw [18:02:30] PROBLEM - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on alert1001 is CRITICAL: 1.005e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw [18:49:03] (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [19:32:18] RECOVERY - Kafka MirrorMaker main-eqiad_to_main-codfw max lag in last 10 minutes on alert1001 is OK: (C)1e+05 gt (W)1e+04 gt 2015 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=codfw+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_main-codfw [19:52:23] (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: (2) Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert [20:10:45] (JobUnavailable) firing: Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [20:40:32] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 140, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:56:36] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 141, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:57:08] (03PS1) 10Urbanecm: [Growth] Remove wgGERecentChangesUnstarredMenteesFilterEnabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884427 [21:07:16] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 140, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [21:12:38] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 141, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [21:42:36] (03PS2) 10Urbanecm: [Growth] Remove wgGERecentChangesUnstarredMenteesFilterEnabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884427 [21:44:07] (03CR) 10Urbanecm: [C: 04-1] Allow AbuseFilter to block IPs and users on itwikiversity (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884333 (https://phabricator.wikimedia.org/T328194) (owner: 10Superpes15) [21:52:23] (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) resolved: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert [21:58:26] (03PS5) 10Superpes15: Allow AbuseFilter to block IPs and users on itwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884333 (https://phabricator.wikimedia.org/T328194) [21:58:42] (03PS6) 10Superpes15: Allow AbuseFilter to block IPs and users on itwikiversity and fixing alphabetical order [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884333 (https://phabricator.wikimedia.org/T328194) [21:59:11] (03CR) 10Superpes15: Allow AbuseFilter to block IPs and users on itwikiversity and fixing alphabetical order (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884333 (https://phabricator.wikimedia.org/T328194) (owner: 10Superpes15) [22:32:26] (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert [22:38:15] 10SRE: On wikimediastatus.net, graph tooltips disappear when trying to scroll them - https://phabricator.wikimedia.org/T328206 (10Rummskartoffel) [22:49:03] (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [23:31:35] 10SRE: On wikimediastatus.net, graph tooltips disappear when trying to scroll them - https://phabricator.wikimedia.org/T328206 (10Peachey88) [23:31:43] 10SRE, 10Incident Tooling: wikimediastatus.net help popups are unreadable - https://phabricator.wikimedia.org/T327201 (10Peachey88)