[00:05:25] (SystemdUnitFailed) resolved: logrotate.service on moss-be1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:15:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.459s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:20:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 1.032s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:20:45] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 909.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:25:30] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 1.07s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:34:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 
1.282s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:37:49] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1015446 [00:37:49] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1015446 (owner: TrainBranchBot) [00:39:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 1.252s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:39:45] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 804.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:44:30] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 1.234s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:50:45] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.027s -
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [00:59:54] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1015446 (owner: TrainBranchBot) [01:00:45] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 1000ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [01:12:45] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 841.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [01:14:39] (CirrusSearchNodeIndexingNotIncreasing) firing: Elasticsearch instance elastic2090-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing [01:17:45] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 1.27s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded -
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [01:23:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 900.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [01:33:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 916.1ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [01:36:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 863.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [01:46:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 815.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [01:52:23] SRE, Infrastructure-Foundations: Reduce 'root' Email Noise by Migrating Reprepro Emails to Google Group -
https://phabricator.wikimedia.org/T361262#9672098 (andrea.denisse) Hi @bd808 thanks for kindly sharing your insights. I appreciate the time you’ve taken to engage with the points I’ve raised. Howeve... [01:53:58] (RdfStreamingUpdaterHighConsumerUpdateLag) resolved: (2) wdqs1013:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [01:55:47] SRE, ChangeProp, collaboration-services, GitLab, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9672100 (Diskdance) [02:09:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 873.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [02:14:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 887.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [02:19:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 837.6ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid -
https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [02:32:16] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:34:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 935ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [02:37:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 916.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [02:37:20] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:38:05] SRE, ChangeProp, collaboration-services, GitLab, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9672123 (Cstone) [02:41:45] (SwiftTooManyMediaUploads) firing: Too many codfw mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw -
https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [02:42:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 916.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [02:44:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 851.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [02:57:20] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:09:41] (CR) Krinkle: planet: Use a more human date format than "Friday, 03 2023 March" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/902832 (owner: Krinkle) [03:11:45] (SwiftTooManyMediaUploads) resolved: Too many codfw mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [03:19:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 839.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded -
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [03:19:40] (SystemdUnitFailed) firing: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:23:41] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [04:13:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 899.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [04:18:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 838.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [04:21:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 972ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - 
https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [04:51:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 800.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [05:01:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 822.8ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [05:06:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 817.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [05:11:20] SRE, Infrastructure-Foundations: Reduce 'root' Email Noise by Migrating Reprepro Emails to Google Group - https://phabricator.wikimedia.org/T361262#9672227 (Aklapper) The last comment is reading some things into its previous comment which could also be taken differently, and unfortunately seems to make s...
[05:14:54] (CirrusSearchNodeIndexingNotIncreasing) firing: Elasticsearch instance elastic2090-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing [05:40:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 848.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [05:45:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 823.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [05:47:09] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance [05:47:17] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance [05:47:25] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1174 (T352010)', diff saved to https://phabricator.wikimedia.org/P59009 and previous config saved to /var/cache/conftool/dbconfig/20240329-054724-ladsgroup.json [05:47:30] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010 [06:00:04] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240329T0600) 
[06:01:45] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [06:14:20] (CR) Hashar: [C:+1] "We had that added on Gerrit some years ago and having a view of the Apache scoreboard is definitely a plus when something stops responding" [puppet] - https://gerrit.wikimedia.org/r/1014610 (owner: Dzahn) [06:31:45] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [06:32:16] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:34:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 862ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [06:39:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 816.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [06:41:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 915.5ms -
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [06:46:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 915.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240329T0700) [07:00:34] (CR) Brouberol: "This looks good! I have another DRY suggestion, but after that I think we can merge!"
[deployment-charts] - https://gerrit.wikimedia.org/r/1014655 (https://phabricator.wikimedia.org/T360531) (owner: Btullis) [07:01:45] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [07:04:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.059s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:09:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 815.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:11:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.201s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:16:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 909.4ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid -
https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:19:25] (SystemdUnitFailed) resolved: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:22:11] (PS1) Aklapper: phabricator: Fix SafeConfigParser Python DeprecationWarning [puppet] - https://gerrit.wikimedia.org/r/1015474 [07:23:41] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [07:26:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 931.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:31:45] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads [07:48:07] SRE, Infrastructure-Foundations, Mail: Access to DMARCIAN - https://phabricator.wikimedia.org/T356920#9672323 (Aklapper) @jhathaway: Another question is why the task is in S4 (Hardware Procurement) while it seems to have nothing to do with hardware procurement?
[08:01:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 825.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [08:36:32] !log repooling wdqs1013 (T360993) [08:36:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:36:36] T360993: WDQS lag propagation to wikidata not working as intended - https://phabricator.wikimedia.org/T360993 [09:14:54] (CirrusSearchNodeIndexingNotIncreasing) firing: Elasticsearch instance elastic2090-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing [09:18:05] !log filippo@cumin2002 START - Cookbook sre.puppet.migrate-host for host alert1001.wikimedia.org [09:18:28] !log filippo@cumin2002 END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host alert1001.wikimedia.org [09:21:48] !log filippo@cumin2002 START - Cookbook sre.puppet.migrate-host for host alert1001.wikimedia.org [09:22:05] !log filippo@cumin2002 END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host alert1001.wikimedia.org [09:24:32] !log filippo@cumin2002 START - Cookbook sre.puppet.migrate-host for host alert1001.wikimedia.org [09:30:07] !log filippo@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host alert1001.wikimedia.org [10:04:45] SRE-Sprint-Week-Sustainability-March2023, SRE-swift-storage, Data-Persistence, Sustainability (Incident Followup): 2022-08-24 swift incident (tracking) -
https://phabricator.wikimedia.org/T317358#9672332 (Aklapper) Open→Declined [10:06:32] SRE-tools, Cloud-VPS, Infrastructure-Foundations, Spicerack: spicerack.puppet.PuppetHostsError: Unable to find CSR fingerprints for all hosts, detected errors are: Another puppet instance is already running and the waitforlock setting is set to 0; e... - https://phabricator.wikimedia.org/T361218#9672481 [10:06:40] SRE-tools, Cloud-VPS, Infrastructure-Foundations, Spicerack: spicerack.puppet.PuppetHostsError: Unable to find CSR fingerprints for all hosts, detected errors are: Another puppet instance is already running and the waitforlock setting is set to 0; e... - https://phabricator.wikimedia.org/T361218#9672483 [10:07:00] (CR) DCausse: [C:+1] cirrus: More reliable reporting of reindexing status [deployment-charts] - https://gerrit.wikimedia.org/r/1014593 (owner: Ebernhardson) [10:07:21] (PS1) Filippo Giunchedi: puppetserver: use client certs for naggen2/puppetdb [puppet] - https://gerrit.wikimedia.org/r/1015511 (https://phabricator.wikimedia.org/T358506) [10:07:35] (CR) Filippo Giunchedi: [C:+2] "I've tested this and it works, it'll also unbreak puppet on alert2001 so I'm going ahead" [puppet] - https://gerrit.wikimedia.org/r/1015511 (https://phabricator.wikimedia.org/T358506) (owner: Filippo Giunchedi) [10:07:53] SRE, SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, Puppet (Puppet 7.0): Re-images sometimes fail as the cert request goes to the wrong puppet master - https://phabricator.wikimedia.org/T353558#9672500 (Volans) Open→Resolved a:Volans Many things have chan...
[10:08:03] (CR) Filippo Giunchedi: [C:+2] alert: Update hiera entries for alert1001 to use Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1003531 (https://phabricator.wikimedia.org/T358506) (owner: Andrea Denisse) [10:08:19] wikibugs is sleeping on the job, those actions are old at this point [10:08:55] SRE, Infrastructure-Foundations, Patch-For-Review, SRE Observability (FY2023/2024-Q3): Fix the Alert hosts Puppet catalogue to be compatible with Puppet 7 - https://phabricator.wikimedia.org/T358506#9672528 (fgiunchedi) Stalled→Resolved I pushed forward with this to be in a sta... [10:09:41] (CR) Filippo Giunchedi: "I'm not involved in any of those hosts, removing myself as a reviewer" [puppet] - https://gerrit.wikimedia.org/r/1015392 (owner: Andrew Bogott) [10:10:40] godog: it's known. Bd.808 has been fighting with it. [10:23:22] hah, thanks RhinosF1 [10:32:16] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:45:41] (PS1) Filippo Giunchedi: logging: move jaeger index cleanup to curator [puppet] - https://gerrit.wikimedia.org/r/1015520 (https://phabricator.wikimedia.org/T344953) [10:45:42] (PS1) Filippo Giunchedi: jaeger: disable index cleaner [deployment-charts] - https://gerrit.wikimedia.org/r/1015521 (https://phabricator.wikimedia.org/T344953) [10:54:17] SRE, conftool, Data-Persistence, Infrastructure-Foundations: Integrate dbctl IP changes as part of VLAN changes. - https://phabricator.wikimedia.org/T360029#9672570 (cmooney) >>! In T360029#9653766, @Ladsgroup wrote: >>>! In T360029#9635703, @Ladsgroup wrote: >> It might sound revolutionary but I...
[11:12:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 829.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [11:17:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 836.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [11:23:41] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [11:35:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 843.2ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [11:45:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 872.3ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [11:56:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 892.5ms - 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [12:02:17] 06SRE, 06Infrastructure-Foundations: Reduce 'root' Email Noise by Migrating Reprepro Emails to Google Group - https://phabricator.wikimedia.org/T361262#9672596 (10andrea.denisse) Hi @Aklapper , thanks for your comment. > and unfortunately seems to make some assumptions Could you elaborate on which assumption... [12:06:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 845.8ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [12:08:13] 06SRE, 06Infrastructure-Foundations: 14Reduce 'root' Email Noise by Migrating Reprepro Emails to Google Group - 14https://phabricator.wikimedia.org/T361262#9672601 (10andrea.denisse) 05Open→03Invalid 14I'm closing this task, I regret documenting the issue to try to solve it in order to help unclutter... 
[12:09:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 954.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [12:24:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 852.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [12:27:51] (03CR) 10Cwhite: [C:03+1] "Looks good! Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1015520 (https://phabricator.wikimedia.org/T344953) (owner: 10Filippo Giunchedi) [12:29:30] (03CR) 10Cwhite: [C:03+1] jaeger: disable index cleaner [deployment-charts] - 10https://gerrit.wikimedia.org/r/1015521 (https://phabricator.wikimedia.org/T344953) (owner: 10Filippo Giunchedi) [12:32:45] 06SRE, 13Patch-For-Review, 10SRE Observability (FY2023/2024-Q3): 14Upgrade alert* hosts to Bookworm - 14https://phabricator.wikimedia.org/T333615#9672673 (10andrea.denisse) 05In progress→03Resolved 14Special thanks to @fgiunchedi for their help and support. [12:43:58] 10ops-codfw, 06SRE, 10observability: titan200[12] RAM/SSD upgrade coordination - https://phabricator.wikimedia.org/T361229#9672690 (10Jhancock.wm) @herron yes, that works great. thanks! [12:46:09] (03CR) 10Andrew Bogott: "ok! 
I added you as you're an admin in several of those projects; you can remove yourself if you want to avoid future spam :)" [puppet] - 10https://gerrit.wikimedia.org/r/1015392 (owner: 10Andrew Bogott) [12:54:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 922.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [12:59:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 920.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [13:03:05] 10ops-codfw, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware), 13Patch-For-Review: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9672738 (10Andrew) 05Resolved→03Open a:05Jhancock.wm→03Andrew [13:05:34] (03PS1) 10Andrew Bogott: cloudbackup: put new cloudbackup hosts into service with minimal test load [puppet] - 10https://gerrit.wikimedia.org/r/1015526 (https://phabricator.wikimedia.org/T356216) [13:06:03] (03CR) 10CI reject: [V:04-1] cloudbackup: put new cloudbackup hosts into service with minimal test load [puppet] - 10https://gerrit.wikimedia.org/r/1015526 (https://phabricator.wikimedia.org/T356216) (owner: 10Andrew Bogott) [13:14:54] (CirrusSearchNodeIndexingNotIncreasing) firing: Elasticsearch instance elastic2090-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - 
https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing [13:32:18] !log bking@cumin2002 START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw [13:32:20] !log bking@cumin2002 END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw [13:40:41] (03CR) 10Andrew Bogott: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/1015526 (https://phabricator.wikimedia.org/T356216) (owner: 10Andrew Bogott) [13:48:53] (03CR) 10Elukey: "Missing license headers in the new yaml file :)" [puppet] - 10https://gerrit.wikimedia.org/r/1015354 (https://phabricator.wikimedia.org/T273507) (owner: 10JMeybohm) [13:52:14] (03CR) 10Andrew Bogott: [C:03+2] cloudbackup: put new cloudbackup hosts into service with minimal test load [puppet] - 10https://gerrit.wikimedia.org/r/1015526 (https://phabricator.wikimedia.org/T356216) (owner: 10Andrew Bogott) [14:00:07] !log eevans@cumin1002 START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase1027.eqiad.wmnet with reason: Decommissioning — T354561 [14:00:20] T354561: Decommission restbase10[19-27] - https://phabricator.wikimedia.org/T354561 [14:00:21] !log eevans@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1027.eqiad.wmnet with reason: Decommissioning — T354561 [14:02:09] (CirrusSearchNodeIndexingNotIncreasing) resolved: Elasticsearch instance elastic2090-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing [14:15:28] (03PS1) 10EoghanGaffney: Revert "[gitlab] Narrow scope of gitlab backup rsync commands" [puppet] - 
10https://gerrit.wikimedia.org/r/1015546 [14:17:50] (03CR) 10Filippo Giunchedi: [C:03+1] Revert "[gitlab] Narrow scope of gitlab backup rsync commands" [puppet] - 10https://gerrit.wikimedia.org/r/1015546 (owner: 10EoghanGaffney) [14:18:37] (03CR) 10EoghanGaffney: [C:03+2] Revert "[gitlab] Narrow scope of gitlab backup rsync commands" [puppet] - 10https://gerrit.wikimedia.org/r/1015546 (owner: 10EoghanGaffney) [14:27:05] (03PS1) 10Elukey: Rework the amd-pytorch22's image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1015530 (https://phabricator.wikimedia.org/T360638) [14:28:03] (03PS2) 10Elukey: Rework the amd-pytorch22's image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1015530 (https://phabricator.wikimedia.org/T360638) [14:32:16] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:35:03] (03PS1) 10Andrew Bogott: backy2: apply fix-backy2-crypto-imports on Bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1015535 (https://phabricator.wikimedia.org/T356216) [14:35:58] (03CR) 10Andrew Bogott: [C:03+2] backy2: apply fix-backy2-crypto-imports on Bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1015535 (https://phabricator.wikimedia.org/T356216) (owner: 10Andrew Bogott) [14:37:20] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:40:47] 10ops-codfw, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware), 13Patch-For-Review: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9672959 (10Andrew) These are now set up and should start running a few 
backup jobs over the weekend. I need to check back and make su... [14:59:27] 06SRE, 10Continuous-Integration-Infrastructure, 10observability, 13Patch-For-Review, 10Release-Engineering-Team (Seen): 14Export zuul metrics to Prometheus - 14https://phabricator.wikimedia.org/T233089#9672969 (10hashar) 05Open→03Declined [15:08:42] (03PS1) 10Eevans: targets: Remove decommissioned hosts [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/1015538 (https://phabricator.wikimedia.org/T354561) [15:09:00] (03CR) 10Eevans: [C:03+1] targets: Remove decommissioned hosts [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/1015538 (https://phabricator.wikimedia.org/T354561) (owner: 10Eevans) [15:11:40] 06SRE, 10Continuous-Integration-Infrastructure, 10observability, 13Patch-For-Review, 10Release-Engineering-Team (Seen): 14Export zuul metrics to Prometheus - 14https://phabricator.wikimedia.org/T233089#9673024 (10colewhite) 14Do we have a solution for this yet? I fear that by ignoring it dashboards... 
[15:21:22] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T352010)', diff saved to https://phabricator.wikimedia.org/P59012 and previous config saved to /var/cache/conftool/dbconfig/20240329-152122-ladsgroup.json [15:21:26] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010 [15:23:41] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [15:36:30] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P59013 and previous config saved to /var/cache/conftool/dbconfig/20240329-153629-ladsgroup.json [15:51:37] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P59014 and previous config saved to /var/cache/conftool/dbconfig/20240329-155137-ladsgroup.json [16:06:45] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T352010)', diff saved to https://phabricator.wikimedia.org/P59015 and previous config saved to /var/cache/conftool/dbconfig/20240329-160644-ladsgroup.json [16:06:47] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance [16:06:52] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010 [16:07:00] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance [16:07:08] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1191 (T352010)', diff saved to https://phabricator.wikimedia.org/P59016 and previous config saved to /var/cache/conftool/dbconfig/20240329-160707-ladsgroup.json [16:11:09] (03PS1) 
10Elukey: profile::pki::multirootca::monitoring: add workaround for python3-crypto [puppet] - 10https://gerrit.wikimedia.org/r/1015541 (https://phabricator.wikimedia.org/T360595) [16:13:14] (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1760/console" [puppet] - 10https://gerrit.wikimedia.org/r/1015541 (https://phabricator.wikimedia.org/T360595) (owner: 10Elukey) [16:13:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [16:14:06] !log updated wikitech-static to 1.41.1 [16:14:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:14:26] (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1761/console" [puppet] - 10https://gerrit.wikimedia.org/r/1015541 (https://phabricator.wikimedia.org/T360595) (owner: 10Elukey) [16:51:27] !log amastilovic@deploy1002 Started deploy [airflow-dags/analytics@e6892f4]: (no justification provided) [16:51:54] !log amastilovic@deploy1002 Finished deploy [airflow-dags/analytics@e6892f4]: (no justification provided) (duration: 00m 26s) [17:03:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 819ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [17:13:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 814.9ms - 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [17:15:25] (03PS3) 10Elukey: Rework the amd-pytorch22's image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1015530 (https://phabricator.wikimedia.org/T360638) [17:17:49] (03PS4) 10Elukey: Rework the amd-pytorch22's image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1015530 (https://phabricator.wikimedia.org/T360638) [17:18:21] (03CR) 10Elukey: "Tested with Revert Risk ML, and it works nicely. The final size of the image is ~13G, no duplication of torch in various layers." [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1015530 (https://phabricator.wikimedia.org/T360638) (owner: 10Elukey) [17:18:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [17:21:07] (03CR) 10Jforrester: [C:03+1] Deploy partial action blocks everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1015373 (https://phabricator.wikimedia.org/T353496) (owner: 10Tchanders) [17:23:04] (03PS5) 10Elukey: Rework the amd-pytorch22's image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1015530 (https://phabricator.wikimedia.org/T360638) [17:32:44] (03CR) 10Ladsgroup: [C:03+1] profile::pki::multirootca::monitoring: add workaround for python3-crypto [puppet] - 10https://gerrit.wikimedia.org/r/1015541 (https://phabricator.wikimedia.org/T360595) (owner: 10Elukey) [17:36:39] (03CR) 10Dzahn: [C:03+2] "Thanks! 
personally I slightly prefer to include profiles in the role rather than profile in another profile" [puppet] - 10https://gerrit.wikimedia.org/r/1014610 (owner: 10Dzahn) [17:39:25] (03CR) 10Dzahn: [C:03+2] "this will install the package (we fixed it recently so it also works on bookworm, fwiw). and then we will need another step to add config " [puppet] - 10https://gerrit.wikimedia.org/r/1014610 (owner: 10Dzahn) [18:00:00] 06SRE, 10Continuous-Integration-Infrastructure, 10observability, 13Patch-For-Review, 10Release-Engineering-Team (Seen): Export zuul metrics to Prometheus - https://phabricator.wikimedia.org/T233089#9673528 (10thcipriani) 05Declined→03Open >>! In T233089#9673024, @colewhite wrote: > Do we have a solut... [18:09:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 867.5ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [18:11:45] (03CR) 10Cwhite: [C:03+1] "LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1015541 (https://phabricator.wikimedia.org/T360595) (owner: 10Elukey) [18:32:16] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:33:47] (03PS2) 10Krinkle: Move etcd.php from wmf-config/ to src/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/891733 [18:33:58] (03PS3) 10Krinkle: Move etcd.php from wmf-config/ to src/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/891733 (https://phabricator.wikimedia.org/T308932) [18:37:35] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:49:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 871.1ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [18:50:15] (03PS1) 10FNegri: R:wmcs::db::toolsdb: remove unnecessary config [puppet] - 10https://gerrit.wikimedia.org/r/1015580 (https://phabricator.wikimedia.org/T344717) [18:50:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 878.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [18:53:15] (03CR) 
10FNegri: [C:04-1] "Let's wait until tools-db-3 has fully caught up with the primary before merging this patch." [puppet] - 10https://gerrit.wikimedia.org/r/1015580 (https://phabricator.wikimedia.org/T344717) (owner: 10FNegri) [18:54:30] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 838.1ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [19:10:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 877.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [19:15:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 877.9ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [19:53:59] !log removing 1 file for legal compliance [19:54:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:49] !log removing 1 file for legal compliance [20:00:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:24:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 1.028s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - 
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [20:28:10] (03PS1) 10Dzahn: miscweb: include prometheus::apache_exporter for dashboard data [puppet] - 10https://gerrit.wikimedia.org/r/1015586 [20:29:02] (03PS2) 10Dzahn: miscweb: include prometheus::apache_exporter for dashboard data [puppet] - 10https://gerrit.wikimedia.org/r/1015586 [20:30:20] !log removing 4 files for legal compliance [20:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:30:49] (03CR) 10Dzahn: [C:03+2] miscweb: include prometheus::apache_exporter for dashboard data [puppet] - 10https://gerrit.wikimedia.org/r/1015586 (owner: 10Dzahn) [20:34:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 807.8ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [20:36:06] (03PS1) 10Dzahn: requesttracker: include prometheus::apache_exporter [puppet] - 10https://gerrit.wikimedia.org/r/1015587 [20:36:39] (03PS2) 10Dzahn: requesttracker: include prometheus::apache_exporter [puppet] - 10https://gerrit.wikimedia.org/r/1015587 [20:43:03] (03CR) 10Dzahn: [C:03+2] requesttracker: include prometheus::apache_exporter [puppet] - 10https://gerrit.wikimedia.org/r/1015587 (owner: 10Dzahn) [20:44:54] (03PS6) 10Dzahn: miscweb: remove profile::microsites::security [puppet] - 10https://gerrit.wikimedia.org/r/1015005 (https://phabricator.wikimedia.org/T350796) (owner: 10AOkoth) [20:49:01] (03PS1) 10Dzahn: prometheus: add config for scraping apache data on doc hosts [puppet] - 10https://gerrit.wikimedia.org/r/1015590 [20:51:35] (03PS1) 
10Dzahn: prometheus: add config for scraping apache data on miscweb and rt [puppet] - 10https://gerrit.wikimedia.org/r/1015592 [21:07:15] (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-parsoid (k8s) 818ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [21:12:15] (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-parsoid (k8s) 818ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [21:18:41] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [21:56:27] !log removing 4 files for legal compliance [21:56:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:03:27] (03Abandoned) 10Daniel Kinzler: Create functional values-beta.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/722649 (owner: 10Daniel Kinzler) [22:04:18] (03Abandoned) 10Daniel Kinzler: Create generic config extract script [deployment-charts] - 10https://gerrit.wikimedia.org/r/723306 (owner: 10Daniel Kinzler) [22:04:26] (03Abandoned) 10Daniel Kinzler: Generate a .env file for use by ratelimiter [deployment-charts] - 10https://gerrit.wikimedia.org/r/722956 (owner: 10Daniel Kinzler) [22:12:30] !log removing 1 file for legal compliance [22:12:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[22:23:45] !log amastilovic@deploy1002 Started deploy [airflow-dags/analytics@67eaa50]: (no justification provided) [22:24:14] !log amastilovic@deploy1002 Finished deploy [airflow-dags/analytics@67eaa50]: (no justification provided) (duration: 00m 29s) [22:32:16] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:37:35] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable