[00:29:40] FIRING: LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[00:34:40] RESOLVED: LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[00:38:40] FIRING: LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[00:43:40] RESOLVED: LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[00:56:40] FIRING: LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[01:01:40] RESOLVED: LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[01:05:40] FIRING: LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[01:10:40] RESOLVED: LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[02:52:40] FIRING: LogstashIndexingFailures: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIndexingFailures
[02:56:31] ^ silenced - will look closer tomorrow am
[08:03:12] FIRING: ThanosQueryRangeLatencyHigh: Thanos Query has high latency for queries. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query - https://alerts.wikimedia.org/?q=alertname%3DThanosQueryRangeLatencyHigh
[08:04:12] FIRING: ThanosQueryInstantLatencyHigh: Thanos Query has high latency for queries. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query - https://alerts.wikimedia.org/?q=alertname%3DThanosQueryInstantLatencyHigh
[08:08:12] RESOLVED: ThanosQueryRangeLatencyHigh: Thanos Query has high latency for queries. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query - https://alerts.wikimedia.org/?q=alertname%3DThanosQueryRangeLatencyHigh
[08:09:12] RESOLVED: [2x] ThanosQueryInstantLatencyHigh: Thanos Query has high latency for queries. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://alerts.wikimedia.org/?q=alertname%3DThanosQueryInstantLatencyHigh
[13:18:25] FIRING: SystemdUnitFailed: grafana-loki.service on grafana2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:38:25] RESOLVED: SystemdUnitFailed: grafana-loki.service on grafana2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:05:39] cwhite: sorry to pick on you again - logstash1037 is in rack F1, which we'll be upgrading today; is that ok?
[14:05:58] also, do you know anything about titan1001? it's also in that rack
[14:06:52] topranks: logstash1037 is good to go
[14:07:13] cwhite: cool, thanks
[14:07:42] for titan, it's a thanos component and I'm not sure offhand what is needed, if anything, to make it ok for isolation
[14:07:52] godog might know, if around?
[14:09:59] hey, yeah I'm around
[14:10:10] depooling is sufficient
[14:10:30] topranks: would you remind me what time the work is today?
[14:11:07] godog: thanks
[14:11:44] it was scheduled for 14:00 UTC - so 10 mins ago - but I can wait until you are ready
[14:12:08] topranks: yeah, all good, I've just depooled the host
[14:12:10] go ahead
[14:12:11] I was tied up with meetings so I'm a little behind myself
[14:12:17] ok, great, thanks!
[14:34:29] godog, cwhite: upgrade is complete, thanks for your help!
[14:34:53] topranks: cheers
[14:34:57] I've repooled titan1001
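For context on the depool/repool step above: pooled state at WMF lives in conftool/etcd and is normally toggled with the confctl CLI (or the pool/depool wrappers on the host itself). Below is a minimal sketch of the same check, depool, maintenance, repool flow, assuming the `confctl select ... get` / `set/pooled=yes|no` syntax documented on Wikitech; the Python subprocess wrapper and the use of titan1001 as the target are illustrative only, not how it was actually done here.

```python
#!/usr/bin/env python3
"""Illustrative sketch only: wrap the conftool CLI to depool/repool a host
around maintenance. Assumes confctl's documented
`select 'name=...' get|set/pooled=yes|no` syntax."""
import subprocess

HOST = "titan1001.eqiad.wmnet"  # hypothetical target, matching the log above


def confctl(*args: str) -> str:
    # Run confctl and return its stdout; raises CalledProcessError on failure.
    return subprocess.run(
        ["confctl", *args], check=True, capture_output=True, text=True
    ).stdout


def set_pooled(host: str, pooled: bool) -> None:
    state = "yes" if pooled else "no"
    confctl("select", f"name={host}", f"set/pooled={state}")


if __name__ == "__main__":
    print(confctl("select", f"name={HOST}", "get"))  # show current state
    set_pooled(HOST, False)  # depool before the rack work starts
    # ... maintenance window: host may be unreachable here ...
    set_pooled(HOST, True)   # repool once the host is back and healthy
```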
[14:49:07] hi folks. Traffic gets this alert, which seems to flap: SLOMetricAbsent traffic (thanos-rule haproxy critical haproxy-combined thanos)
[14:49:22] any idea what the cause is? IIRC, I saw herron doing something about it but I can't find it now
[14:51:46] hey sukhe, having a look - it's meant to alert that the underlying metric has a gap
[14:52:27] thanks keith! not urgent
[17:58:57] sukhe: hey, coming back around to this -- it looks like a false positive and an issue with thanos-rule; created https://phabricator.wikimedia.org/T369854 for tracking
[17:59:27] thanks herron!
[17:59:35] let me know if you need anything from our side
[17:59:48] ack, will do, thx!
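To make the "underlying metric has a gap" explanation concrete: an SLOMetricAbsent-style alert fires when the series feeding an SLO recording rule stops returning samples. Below is a minimal sketch of checking for such a gap against the standard Prometheus /api/v1/query HTTP API, which Thanos Query also exposes; the base URL and the slo:haproxy_availability:ratio metric name are made-up placeholders, not the real names behind this alert.

```python
#!/usr/bin/env python3
"""Minimal sketch: ask a Prometheus-compatible endpoint (e.g. Thanos Query)
whether an SLO source metric has gone absent recently.
THANOS_URL and METRIC are hypothetical placeholders."""
import requests

THANOS_URL = "https://thanos.example.org"                 # placeholder endpoint
METRIC = 'slo:haproxy_availability:ratio{site="eqiad"}'   # placeholder metric


def metric_is_absent(lookback: str = "10m") -> bool:
    # absent_over_time() returns a one-element vector only when no samples
    # were seen in the lookback window, i.e. exactly the "gap" condition.
    resp = requests.get(
        f"{THANOS_URL}/api/v1/query",
        params={"query": f"absent_over_time({METRIC}[{lookback}])"},
        timeout=10,
    )
    resp.raise_for_status()
    return bool(resp.json()["data"]["result"])


if __name__ == "__main__":
    if metric_is_absent():
        print("gap detected: SLO source metric returned no samples")
    else:
        print("metric present; an alert in this state would be a false positive")
```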
[20:12:03] Hello folks! I'm a software engineer on the Catalyst team at the foundation. As we're getting closer to production, we want to start storing our logs outside of the vanilla Kubernetes log aggregation.
[20:13:18] Our understanding is that logstash.wikimedia.org is only for production wikis and services, but are there any options for folks on Cloud VPS, or do we need to bring our own ELK (or whatever)?
[21:18:11] Hi kindrobot! You are correct: that link is for production services and is out of reach for cloud-vps. We do not operate anything on cloud-vps except for an experimental cluster that backs deployment-prep.
[23:33:32] Would you be interested in a second consumer for your experimental cluster? We're making CI tooling, so the stakes are pretty low.
[23:39:59] I'm down to explore the idea.
[23:42:30] Nice! I can take that possibility back to the team to see what they think. What would you need from us? Would a synchronous meeting be helpful?
[23:43:20] If my teammates are amenable to the idea, we can discuss the data and operational requirements. :)
[23:43:57] Can I get back to you with more concrete info?
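As a side note on the "bring our own ELK" option discussed above: whichever cluster ends up indexing Catalyst's logs, emitting one JSON object per line keeps them easy to ship and index with any ELK/OpenSearch-style stack. A minimal stdlib-only sketch follows; the field names and the "catalyst" logger name are illustrative, not a required schema.

```python
#!/usr/bin/env python3
"""Minimal stdlib-only sketch: emit one JSON object per log line so any
ELK/OpenSearch-style aggregator can index it. Field names are illustrative."""
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        # One JSON document per line; timestamps in UTC ISO 8601.
        return json.dumps({
            "@timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("catalyst")  # hypothetical service name
log.addHandler(handler)
log.setLevel(logging.INFO)

if __name__ == "__main__":
    log.info("CI environment created")
```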