[07:01:17] (PS5) Joal: Update mediawiki-history page computation [analytics/refinery/source] - https://gerrit.wikimedia.org/r/842922 (https://phabricator.wikimedia.org/T318589)
[11:12:59] (PS1) Gerrit maintenance bot: Add shn.wikibooks to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/844551 (https://phabricator.wikimedia.org/T321283)
[11:13:26] (PS1) Gerrit maintenance bot: Add guw.wikiquote to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/844553 (https://phabricator.wikimedia.org/T321289)
[11:13:52] (PS1) Gerrit maintenance bot: Add as.wikiquote to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/844555 (https://phabricator.wikimedia.org/T321295)
[12:04:00] (PS6) Joal: Update mediawiki-history page and user computation [analytics/refinery/source] - https://gerrit.wikimedia.org/r/842922 (https://phabricator.wikimedia.org/T318589)
[12:21:52] Data-Engineering-Planning, SRE, Traffic: Add a rolled-up cache_status field to druid webrequest_sampled_128 - https://phabricator.wikimedia.org/T319344 (LSobanski)
[12:28:29] (CR) Joal: [V: +2 C: +2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/844551 (https://phabricator.wikimedia.org/T321283) (owner: Gerrit maintenance bot)
[12:29:47] (CR) Joal: [V: +2 C: +2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/844553 (https://phabricator.wikimedia.org/T321289) (owner: Gerrit maintenance bot)
[12:30:22] (CR) Joal: [V: +2 C: +2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/844555 (https://phabricator.wikimedia.org/T321295) (owner: Gerrit maintenance bot)
[12:42:12] (VarnishkafkaNoMessages) firing: (3) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[12:47:12] (VarnishkafkaNoMessages) resolved: (4) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[13:08:31] PROBLEM - Host aqs1013.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[13:10:27] PROBLEM - Host an-conf1002.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[13:10:31] PROBLEM - Host an-test-worker1002.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[13:11:36] --^ It looks like there is another management switch either throwing errors or being worked on. Because these are `.mgmt` hosts there is nothing that we have to worry about with these.
[13:13:40] ack btullis - thanks for letting us know :)
[13:34:11] RECOVERY - Host an-conf1002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.07 ms
[13:43:03] ACKNOWLEDGEMENT - Host an-test-worker1002.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Cathal Mooney Hosts down due to failure of rack C5 mgmt switch msw-c5-eqiad - The acknowledgement expires at: 2022-10-21 13:42:26.
[13:43:03] ACKNOWLEDGEMENT - SSH on aqs1013.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds Cathal Mooney Hosts down due to failure of rack C5 mgmt switch msw-c5-eqiad - The acknowledgement expires at: 2022-10-21 13:42:26. https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:43:03] ACKNOWLEDGEMENT - Host aqs1013.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Cathal Mooney Hosts down due to failure of rack C5 mgmt switch msw-c5-eqiad - The acknowledgement expires at: 2022-10-21 13:42:26.
[14:00:16] RECOVERY - Host aqs1013.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.08 ms
[14:02:08] RECOVERY - Host an-test-worker1002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.93 ms
[14:03:48] Data-Engineering-Planning, Patch-For-Review, Shared-Data-Infrastructure (Sprint 03): Add spark and spark-operator images to operations/docker-images/production-images - https://phabricator.wikimedia.org/T318730 (BTullis) I'm now attempting the first production build of these containers. Hopefully the...
[14:30:31] Data-Engineering-Planning, Event-Platform Value Stream: Event Platform and DataHub Integration - https://phabricator.wikimedia.org/T318863 (mpopov) @EChetty is this essentially a duplicate of {T307040}?
[15:48:16] Data-Engineering-Operations, Data-Engineering-Planning, Mail, SRE: Change the analytics-alerts email alias to a mailman distribution list - https://phabricator.wikimedia.org/T315486 (BTullis)
[15:49:36] Data-Engineering-Operations, Data-Engineering-Planning, Mail, SRE: Change the analytics-alerts email alias to a mailman distribution list - https://phabricator.wikimedia.org/T315486 (BTullis) > Mostly, don't worry about sudo, I can create it for you. Just noting that it will be "whatever-name-ale...
[15:50:20] Data-Engineering-Operations, Data-Engineering-Planning, Mail, SRE: Change the analytics-alerts email alias to a mailman distribution list - https://phabricator.wikimedia.org/T315486 (BTullis) Resolved→Open
[15:50:46] Data-Engineering-Operations, Data-Engineering-Planning, Mail, SRE: Change the analytics-alerts email alias to a mailman distribution list - https://phabricator.wikimedia.org/T315486 (Dzahn) Thank you very much for doing this, @BTullis
[15:52:31] Data-Engineering-Operations, Data-Engineering-Planning, Mail, SRE: Change the analytics-alerts email alias to a mailman distribution list - https://phabricator.wikimedia.org/T315486 (Dzahn) @BTullis Done! Please check your mail. ` [lists1001:~] $ sudo mailman-wrapper create --owner btullis@wik...
[16:08:03] Data-Engineering-Planning, Shared-Data-Infrastructure, Data Pipelines (Sprint 03): Create Plan for Spark 2 Deprecation - https://phabricator.wikimedia.org/T318367 (EChetty)
[16:08:24] Data-Engineering-Planning, Data Pipelines (Sprint 03), Patch-For-Review, Technical-Debt: Prepare the fsimage - https://phabricator.wikimedia.org/T321167 (Antoine_Quhen) https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/175
[16:29:20] Data-Engineering-Operations, Data-Engineering-Planning, Mail, SRE: Change the analytics-alerts email alias to a mailman distribution list - https://phabricator.wikimedia.org/T315486 (Dzahn) The very last step would then be to remove the line from the puppetized exim aliases in the private repo.
[21:58:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[22:03:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[22:37:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[22:42:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:03:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp3054 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=esams%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp3054%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:04:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp2035 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:05:56] Data-Engineering, Product-Analytics: Identify imported revisions in mediawiki_history - https://phabricator.wikimedia.org/T221482 (nshahquinn-wmf) >>! In T221482#8330593, @Milimetric wrote: > then just mark all the revisions that have much larger revision ids than their parent (via `rev_parent_id` as `re...
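The heuristic Milimetric proposes in the comment quoted on T221482 (mark revisions whose revision id is much larger than their parent's, found via `rev_parent_id`) can be sketched roughly as below. This is a hypothetical illustration, not code from the task or from refinery: the `Revision` record, the `flag_possible_imports` helper and the `GAP_THRESHOLD` value are all assumptions, because the quoted comment is truncated and does not say how large a gap counts as "much larger".

```python
from typing import Iterable, List, NamedTuple, Optional


class Revision(NamedTuple):
    """Hypothetical minimal view of a mediawiki_history revision row."""
    rev_id: int
    rev_parent_id: Optional[int]  # 0 or None when the revision has no parent


# Hypothetical cut-off: the quoted comment does not say how large a gap
# counts as "much larger", so this value is purely illustrative.
GAP_THRESHOLD = 1_000_000


def flag_possible_imports(revs: Iterable[Revision],
                          threshold: int = GAP_THRESHOLD) -> List[int]:
    """Return the rev_ids that exceed their parent's rev_id by more than threshold."""
    flagged = []
    for rev in revs:
        if not rev.rev_parent_id:
            continue  # page creations have no parent to compare against
        if rev.rev_id - rev.rev_parent_id > threshold:
            flagged.append(rev.rev_id)
    return flagged


revs = [Revision(100, 0), Revision(101, 100), Revision(5_000_000, 101)]
print(flag_possible_imports(revs))  # [5000000]
```

A fixed absolute gap is only one reading of "much larger"; a ratio or a per-wiki threshold would be equally plausible, and the truncated comment does not settle which.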
[23:08:12] (VarnishkafkaNoMessages) resolved: (6) varnishkafka on cp2031 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:09:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2035 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:10:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp1085 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqiad%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp1085%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:15:12] (VarnishkafkaNoMessages) resolved: (3) varnishkafka on cp1085 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:29:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:34:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:37:13] (VarnishkafkaNoMessages) firing: varnishkafka on cp4037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:42:13] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:46:13] (VarnishkafkaNoMessages) firing: varnishkafka on cp4037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:51:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages