Wikimedia IRC logs browser - #wikimedia-observability

2024-07-18 05:27:26 <jinxer-wm> FIRING: SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 06:43:07 <godog> mmhh it looked like the statograph_post alerts continued from the other day
2024-07-18 06:45:34 <godog> opening a task
2024-07-18 06:50:01 <godog> https://phabricator.wikimedia.org/T370386
2024-07-18 07:02:25 <jinxer-wm> RESOLVED: SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 07:07:25 <jinxer-wm> FIRING: SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 10:47:25 <jinxer-wm> RESOLVED: SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 10:51:18 <Lucas_WMDE> hi! I could use a bit of help with Grafana (or maybe Graphite) for https://phabricator.wikimedia.org/T359248 :)
2024-07-18 10:51:32 <Lucas_WMDE> basically, at https://grafana.wikimedia.org/d/TUJ0V-0Zk/wikidata-alerts?from=1721286000000&to=1721293200000&viewPanel=34 we have a metric that was accidentally tracked as seconds rather than milliseconds until this morning (train deployment)
2024-07-18 10:51:53 <Lucas_WMDE> and I’m looking for a way, if it’s possible, to show both the old and new values correctly in the same panel
2024-07-18 10:52:25 <jinxer-wm> FIRING: SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 10:52:38 <Lucas_WMDE> something like, before this timestamp, multiply the value by 1000; or, show query A up to the timestamp and query B afterwards (and it’s the same query except one is multiplied by 1000)
2024-07-18 10:52:50 <Lucas_WMDE> but so far I haven’t found anything like that 😔
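
For context, a minimal sketch of the kind of time-spliced query Lucas_WMDE describes above, assuming Graphite's scale(), timeSlice() and group() render functions; the Graphite endpoint, metric path and cutover time below are placeholders, not the actual panel's values, and this was not adopted in the thread.

```python
# Sketch only: stitch a metric that switched from seconds to milliseconds at a
# known cutover time, by scaling the old portion by 1000 and grouping it with
# the unscaled new portion. Endpoint, metric path and cutover are hypothetical.
import requests

GRAPHITE = "https://graphite.example.org/render"  # placeholder render endpoint
METRIC = "MediaWiki.example.latency.p99"          # placeholder metric path
CUTOVER = "06:00 20240718"                        # approximate train deploy time

target = (
    "group("
    f"scale(timeSlice({METRIC},'00:00 20240101','{CUTOVER}'),1000),"
    f"timeSlice({METRIC},'{CUTOVER}','now')"
    ")"
)

resp = requests.get(GRAPHITE, params={"target": target, "from": "-7d", "format": "json"})
resp.raise_for_status()
for series in resp.json():
    print(series["target"], len(series["datapoints"]), "datapoints")
```
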
2024-07-18 11:54:04 <jinxer-wm> RESOLVED: SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 11:54:15 <jinxer-wm> FIRING: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
2024-07-18 11:55:24 <jinxer-wm> FIRING: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed
2024-07-18 11:59:57 <jinxer-wm> FIRING: SLOMetricAbsent: <no value> - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
2024-07-18 12:06:33 <godog> Lucas_WMDE: yeah if the metric name is the same I don't know how we'd do that
2024-07-18 12:10:19 <jinxer-wm> RESOLVED: LogstashNoLogsIndexed: Logstash logs are not being indexed by Elasticsearch - https://wikitech.wikimedia.org/wiki/Logstash#No_logs_indexed - https://grafana.wikimedia.org/d/000000561/logstash?var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashNoLogsIndexed
2024-07-18 12:14:27 <jinxer-wm> RESOLVED: SLOMetricAbsent: <no value> - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
2024-07-18 12:14:55 <jinxer-wm> FIRING: [2x] SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 12:15:11 <jinxer-wm> FIRING: [2x] LogstashKafkaConsumerLag: Too many messages in logging-codfw for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
2024-07-18 12:27:40 <jinxer-wm> RESOLVED: LogstashKafkaConsumerLag: Too many messages in logging-eqiad for group logstash7-codfw - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
2024-07-18 12:31:25 <Lucas_WMDE> godog: alright, thanks. then we can just live with it, I think
2024-07-18 12:32:25 <jinxer-wm> FIRING: [2x] SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 12:33:52 <godog> Lucas_WMDE: bummer I realize :(
2024-07-18 14:47:25 <jinxer-wm> RESOLVED: SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 14:52:26 <jinxer-wm> FIRING: SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 18:52:26 <jinxer-wm> FIRING: SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 19:11:17 <denisse> ^ I think this may be the same error we faced yesterday, I'm looking for the relevant stacktrace.
2024-07-18 19:32:09 <denisse> I sent the sigquit signal to an instance of the benthos service that was stuck.
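
For reference, a minimal sketch of sending SIGQUIT to a stuck process by PID; the pgrep-based PID lookup and process name below are hypothetical. For a Go binary such as Benthos, the runtime's default SIGQUIT handler dumps all goroutine stack traces before exiting, which is why the signal is useful when hunting for a stacktrace.

```python
# Sketch only: send SIGQUIT to a stuck process. For Go programs (Benthos is
# written in Go), the runtime responds to SIGQUIT by dumping goroutine stack
# traces to stderr and then exiting, so the traceback ends up in the logs.
import os
import signal
import subprocess

# Hypothetical lookup of the stuck instance's PID (oldest matching process).
pid = int(subprocess.check_output(["pgrep", "-o", "benthos"]).split()[0])

os.kill(pid, signal.SIGQUIT)
```
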
2024-07-18 19:54:43 <jinxer-wm> FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 20:09:43 <jinxer-wm> RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 20:20:43 <jinxer-wm> FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 20:30:43 <jinxer-wm> RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 20:39:46 <jinxer-wm> FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 20:44:48 <jinxer-wm> RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 21:14:43 <jinxer-wm> FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 21:29:43 <jinxer-wm> RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 21:45:43 <jinxer-wm> FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 21:50:43 <jinxer-wm> RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 21:57:43 <jinxer-wm> FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 22:27:43 <jinxer-wm> RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 22:38:43 <jinxer-wm> FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 22:42:26 <jinxer-wm> RESOLVED: SystemdUnitFailed: statograph_post.service on alert2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2024-07-18 23:08:43 <jinxer-wm> FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 23:18:43 <jinxer-wm> RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 23:36:43 <jinxer-wm> FIRING: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
2024-07-18 23:41:43 <jinxer-wm> RESOLVED: BenthosKafkaConsumerLag: Too many messages in logging-eqiad for group benthos-mw-accesslog-metrics - TODO - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DBenthosKafkaConsumerLag
