[03:35:02] PROBLEM - Check unit status of refinery-sqoop-whole-mediawiki on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refinery-sqoop-whole-mediawiki https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [03:37:12] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: refinery-sqoop-whole-mediawiki.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:22:55] Analytics-Clusters, Voice & Tone: Rename geoeditors_blacklist_country - https://phabricator.wikimedia.org/T259804 (Aklapper) [09:12:12] (VarnishkafkaNoMessages) firing: ... [09:12:12] varnishkafka for instance cp2031:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-source=eventlogging&var-cp_cluster=cache_text&var-instance=cp2031:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [09:20:42] All of these varnishkafka alerts are false positives and can be ignored. They are caused by ongoing work in codfw. They will go away when I have finished T300246 [09:20:43] T300246: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 [09:39:48] Data-Engineering, Event-Platform: jsonschema-tools tests should fail if schema $id does not match title or path - https://phabricator.wikimedia.org/T300404 (EChetty) [09:56:12] .11 [09:56:14] uff :) [09:57:37] Always nice to have you around elukey, even accidentally :-) [09:57:56] ahahaha thanks <3 [09:58:36] btullis: if you need any help with dse-etcd let me know, I am around this afternoon [09:58:40] I'll review also the PKI patch :) [09:59:42] Great, thanks. 
I spoke to j.bond about whether to use cergen or PKI for this - and you can guess the answer he gave :-) [10:00:20] I am 100% supporting this :) [10:00:37] I hope sooner or later to keep going with the kafka work, that is basically ready to go :D [10:01:31] there is a big movement this morning to depool codfw services, including an etcd node, so let's make sure first that the change is a no-op for all non-dse etcd nodes etc.. [10:01:40] just to avoid raising eyebrows here and there :D [10:02:54] Agreed. This PCC run shows that the only difference is a parameter change, which defaults to false on all nodes: https://puppet-compiler.wmflabs.org/pcc-worker1001/1384/ [10:10:29] (CR) Emil Chetty: [C: +1] "Lgtm" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/819043 (https://phabricator.wikimedia.org/T314151) (owner: Phuedx) [10:12:23] super [10:12:32] btullis: I'll review it after lunch! [12:08:30] Analytics, Analytics-Wikistats, Data-Engineering: Confusion in two names of Kashmiri language; - https://phabricator.wikimedia.org/T314476 (Tajamul9) [12:17:08] Analytics, Analytics-Wikistats, Data-Engineering: WikiStats in Uzbek - https://phabricator.wikimedia.org/T314477 (Nataev) [12:34:31] (CR) Michael Große: Add metric_id column to Wikidata EntitySchema text HQL (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/817837 (owner: Michael Große) [13:12:27] (VarnishkafkaNoMessages) firing: ... 
[13:12:27] varnishkafka for instance cp2031:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-source=eventlogging&var-cp_cluster=cache_text&var-instance=cp2031:9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [13:42:12] (VarnishkafkaNoMessages) firing: (3) varnishkafka for instance cp2031:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [13:52:12] (VarnishkafkaNoMessages) firing: (5) varnishkafka for instance cp2031:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [13:57:12] (VarnishkafkaNoMessages) firing: (5) varnishkafka for instance cp2031:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [14:02:12] (VarnishkafkaNoMessages) firing: (4) varnishkafka for instance cp2029:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [14:07:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [14:12:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - 
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [14:17:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [14:27:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [14:27:17] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [14:37:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [14:37:32] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [14:37:36] btullis: I guess those varnishkafka alerts are the codfw depoolings that Luca mentioned? But why is the same server firing multiple times? 
[14:47:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2029:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [14:52:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2029:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [14:57:12] (VarnishkafkaNoMessages) firing: (6) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [15:02:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [15:07:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [15:12:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [15:22:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [15:24:34] milimetric: Good question. 
I guess maybe one or two messages might be getting through and resetting the counter like this? https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus%2Fops&var-source=eventlogging&var-cp_cluster=cache_text&var-instance=cp2027:9132&viewPanel=14&from=1659534707524&to=1659538955894 [15:25:23] to depooled hosts! [15:25:37] * milimetric turns head sideways like puppy [15:27:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [15:28:35] I cannot explain it, but I've just suggested expediting T300246 into the current sprint so that I can fix it asap. [15:28:36] T300246: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 [15:34:11] PROBLEM - Host aqs2005.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:34:12] PROBLEM - Host aqs2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:34:12] PROBLEM - Host aqs2008.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:34:12] PROBLEM - Host aqs2007.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:37:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [15:47:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [16:02:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - 
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [16:04:54] Quarry, Regression, Wikimania-Hackathon-2022, good first task: Bad resultset number case is not handled - https://phabricator.wikimedia.org/T218470 (rook) [16:05:37] RECOVERY - Host aqs2006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.86 ms [16:05:41] Quarry, Wikimania-Hackathon-2022, good first task: Define in a single place the pseudoname of unnamed queries - https://phabricator.wikimedia.org/T197029 (rook) [16:05:43] Data-Engineering-Kanban, Data Pipelines, Data Engineering Planning (Sprint 01): [Airflow] Refactor HDFSArchiveOperator to run in Skein - https://phabricator.wikimedia.org/T310542 (Snwachukwu) Resolved→Open [16:06:27] RECOVERY - Host aqs2005.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.75 ms [16:06:27] RECOVERY - Host aqs2008.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.72 ms [16:06:27] RECOVERY - Host aqs2007.mgmt is UP: PING OK - Packet loss = 0%, RTA = 38.34 ms [16:07:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [16:07:41] PROBLEM - Host furud is DOWN: PING CRITICAL - Packet loss = 100% [16:08:59] Data-Engineering-Kanban, Data Pipelines, Data Engineering Planning (Sprint 02): [Airflow] Refactor HDFSArchiveOperator to run in Skein - https://phabricator.wikimedia.org/T310542 (Snwachukwu) [16:10:04] (CR) Vivian Rook: "recheck" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/816254 (https://phabricator.wikimedia.org/T308362) (owner: WelpThatWorked) [16:12:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - 
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [16:14:23] (CR) CI reject: [V: -1] Escape '|' from wikitable output [analytics/quarry/web] - https://gerrit.wikimedia.org/r/816254 (https://phabricator.wikimedia.org/T308362) (owner: WelpThatWorked) [16:16:09] Data-Engineering-Radar, Event-Platform, Generated Data Platform, Data Engineering Planning (Sprint 02), Patch-For-Review: Add Event Platform timestamp JSONSchema -> Flink type support - https://phabricator.wikimedia.org/T310495 (Ottomata) [16:16:25] PROBLEM - Host furud.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:25:24] ok btullis: https://gerrit.wikimedia.org/r/c/operations/puppet/+/820160 [16:26:55] PROBLEM - Host furud is DOWN: PING CRITICAL - Packet loss = 100% [16:31:05] milimetric: Can you test the canary aqs1010 now please? [16:32:06] The instructions here are helpful, but it would be even more helpful if it said what the expected output should be? https://wikitech.wikimedia.org/wiki/Analytics/Systems/AQS#Deploy_new_History_snapshot_for_Wikistats_Backend [16:32:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [16:32:41] I'm getting an empty set of results, but I'm not 100% sure whether that's good or bad. [16:32:52] https://www.irccloud.com/pastebin/UiqWhOcu/ [16:34:39] RECOVERY - Host furud.mgmt is UP: PING OK - Packet loss = 0%, RTA = 45.05 ms [16:35:17] RECOVERY - Host furud is UP: PING OK - Packet loss = 0%, RTA = 31.61 ms [16:36:38] btullis: sorry that's weird... 
it shouldn't be empty, there should be data there with the new snapshot, trying to figure out what's going on [16:39:18] (CR) Vivian Rook: "recheck" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/816254 (https://phabricator.wikimedia.org/T308362) (owner: WelpThatWorked) [16:39:35] btullis: is it because aqs1010 is not in this list? https://gerrit.wikimedia.org/r/c/operations/puppet/+/820160/1/hieradata/role/common/aqs.yaml#122 [16:39:53] that looks like it's just aqs1004 through aqs1009, which I thought were decommissioned [16:41:21] They're not yet decommissioned, we're waiting on the airflow migration of the cassandra loading jobs before we decom them. [16:42:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [16:45:13] It definitely mentions the correct snapshot name in /etc/aqs/config/yaml [16:45:26] The aqs service was restarted successfully 14 minutes ago. [16:45:49] Could there be something wrong with the snapshot, given that it finished more quickly than usual this month? [16:47:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [16:47:51] btullis: yeah, that's what I'm investigating. Anyway, the snapshot seems fine for the old data, so that is the most likely scenario, that it doesn't have good new data or something [16:50:43] Should I cancel the aqs rolling restart cookbook, do you think? 
[16:52:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [16:57:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [16:57:25] milimetric: Should we revert the mw_history change? [16:58:11] Data-Engineering, API Platform, Platform Engineering Roadmap, User-Eevans: Pageviews integration testing - https://phabricator.wikimedia.org/T299735 (BPirkle) [16:59:52] Data-Engineering, API Platform, Platform Engineering Roadmap, User-Eevans: Pageviews integration testing - https://phabricator.wikimedia.org/T299735 (BPirkle) I updated the list in the task description with tests that have been completed. @codebug , please correct anything I got wrong. , I notic... [17:00:06] btullis: yeah, let's revert. I'm looking at the snapshot and it looks 100% ok, so it's super weird [17:00:26] OK, I'll do a puppet change now. [17:02:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [17:03:29] https://gerrit.wikimedia.org/r/c/operations/puppet/+/820167 [17:08:20] any oh man, sqoop's down too [17:08:23] *broken [17:09:06] Oh no, like how? [17:09:50] I've reverted the mw_history, applied the reverted config, restarted the aqs on aqs1010 and repooled it. 
[17:12:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [17:12:35] milimetric: Is it this from an-launcher1002? [17:12:39] https://www.irccloud.com/pastebin/rO3nDyx5/ [17:12:48] From `journalctl -u refinery-sqoop-whole-mediawiki.service` [17:13:02] btullis: yea, that's an easy rerun, but just another thing [17:13:25] hm, or maybe those are broken by the templatelinks migration [17:14:09] btullis: no worries, none of this is super urgent, I'll muddle through stuff today with Sandra and make use of your help tomorrow [17:14:32] OK, cool. Many thanks. [17:16:51] Data-Engineering-Kanban, Data Engineering Planning, Data Pipelines: Investigate why airflow sensor tasks fail without sending errors - https://phabricator.wikimedia.org/T311976 (xcollazo) [17:17:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [17:27:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [17:28:35] (PS1) Vivian Rook: ci test, do not merge [analytics/quarry/web] - https://gerrit.wikimedia.org/r/820170 [17:42:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [17:48:51] I'm looking to increase the partition count on 
.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite. I can manage the bits about getting data into the right partitions (happens manually from cpjobqueue config), but might need help actually changing the live topic. Suspect it amounts to a `kafka-topics.sh --alter ...` call from appropriate places. Would anyone be able to [17:48:53] help with that sometime (later) today? [17:49:02] re: T314426 [17:49:02] T314426: Job queue for writes to cloudelastic falling behind - https://phabricator.wikimedia.org/T314426 [17:50:57] i can help ebernhardson [17:51:40] ebernhardson: i can do that, let me know when I should. [17:52:06] ottomata: any time is fine, i won't be deploying the supporting code until later today as some bits have to go through the mediawiki backport window [17:52:22] I will also its okay to do that now? [17:52:24] its okay* [17:52:26] ? [17:52:27] ottomata: yea [17:52:28] okay [17:52:47] https://wikitech.wikimedia.org/wiki/Kafka/Administration#Alter_topic_partitions_number doing [17:52:49] i added (later) because i didn't want to seem like i need someone to drop what they are doing just this moment, but if you have a moment now thats great [17:52:58] ya now is good [17:54:13] ebernhardson: those topics currently have 5 partitions [17:54:32] you still want increased to 6? [17:54:41] ottomata: yup, 6 partitions total [17:55:03] okay [17:56:24] as far as I can tell there's nothing wrong with this 2022-07 snapshot of mw history. Maybe there was some weird race condition between when the config updated and when the restart actually happened...? [17:56:42] ebernhardson: done. [17:57:07] ottomata: thanks! 
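(Editor's note on the partition change discussed at 17:48-17:56.) The `kafka-topics.sh --alter ...` call mentioned above can be sketched as follows. This is only an illustrative sketch, not the command that was actually run in the channel: it assumes the stock Apache Kafka `kafka-topics.sh` CLI with the `--bootstrap-server` flag (older releases take `--zookeeper` instead), and the helper name, broker address and topic name are hypothetical placeholders. The procedure actually followed is the one on the wikitech Kafka/Administration page linked in the log. The check reflects the fact that Kafka only allows a topic's partition count to grow.

```python
from typing import List


def build_alter_partitions_command(
    topic: str,
    current_partitions: int,
    target_partitions: int,
    bootstrap_server: str,
) -> List[str]:
    """Return the argv for a kafka-topics.sh call that raises a topic's partition count.

    Hypothetical helper for illustration only. Kafka can increase the number of
    partitions of a topic but cannot decrease it, so anything that is not a
    strict increase is rejected before a command is built.
    """
    if target_partitions <= current_partitions:
        raise ValueError(
            f"partition count can only be increased "
            f"(current={current_partitions}, target={target_partitions})"
        )
    return [
        "kafka-topics.sh",
        "--bootstrap-server", bootstrap_server,  # assumed broker address
        "--alter",
        "--topic", topic,
        "--partitions", str(target_partitions),
    ]


if __name__ == "__main__":
    # Placeholder topic and broker; the log's change was from 5 to 6 partitions.
    cmd = build_alter_partitions_command("example.topic", 5, 6, "localhost:9092")
    print(" ".join(cmd))
```

The command is only built and printed here, not executed, so it can be reviewed before anyone runs it against a live cluster. Note that adding partitions changes which partition a given key hashes to, which is why the log mentions handling the partitioning side separately in the cpjobqueue config.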
[18:02:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [18:12:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [18:17:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [18:22:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2029:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [18:27:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [18:42:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [18:47:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [18:52:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka 
for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [18:57:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [19:07:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [19:27:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [19:52:08] Data-Engineering, Event Metrics, EventStreams, GrowthExperiments, and 2 others: editgrowthconfig schema: '' should NOT have additional properties, - https://phabricator.wikimedia.org/T314173 (Tgr) Whoops, looks like our [[https://logstash.wikimedia.org/app/dashboards#/view/41cae700-bfbe-11eb-85b7... 
[19:57:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [20:07:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [20:17:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [20:27:08] Data-Engineering, Event Metrics, EventStreams, GrowthExperiments-CommunityConfiguration, and 3 others: editgrowthconfig schema: '' should NOT have additional properties, - https://phabricator.wikimedia.org/T314173 (Urbanecm_WMF) p:Triage→High a:Urbanecm_WMF >>! In T314173#8129392, @Tg... 
[20:27:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [20:27:30] Data-Engineering, Event Metrics, EventStreams, GrowthExperiments-CommunityConfiguration, and 3 others: editgrowthconfig schema: '' should NOT have additional properties, - https://phabricator.wikimedia.org/T314173 (Urbanecm_WMF) [21:17:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [21:22:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [21:27:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [21:32:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [21:37:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [21:47:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 
is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [22:17:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [22:27:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [22:32:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [22:42:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [22:55:49] PROBLEM - SSH on druid1006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [23:12:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [23:22:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - 
https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [23:32:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [23:42:12] (VarnishkafkaNoMessages) firing: (7) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [23:47:12] (VarnishkafkaNoMessages) firing: (8) varnishkafka for instance cp2027:9132 is not logging cache_text requests from eventlogging - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [23:56:26] RECOVERY - SSH on druid1006.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook