[00:03:37] (Nonwrite HTTP requests with primary DB connections alert) firing: - https://alerts.wikimedia.org/?q=alertname%3DNonwrite+HTTP+requests+with+primary+DB+connections+alert
[00:08:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[00:18:50] PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hardsync-published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:23:37] (Nonwrite HTTP requests with primary DB connections alert) resolved: - https://alerts.wikimedia.org/?q=alertname%3DNonwrite+HTTP+requests+with+primary+DB+connections+alert
[00:31:10] RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:38:43] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/940211
[00:38:49] (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/940211 (owner: TrainBranchBot)
[00:53:21] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/940211 (owner: TrainBranchBot)
[01:13:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[01:18:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[01:23:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[01:38:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[01:42:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[02:07:32] (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:18:16] PROBLEM - Check systemd state on gitlab1003 is CRITICAL: CRITICAL - degraded: The following units failed: sync-gitlab-group-with-ldap.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:19:16] PROBLEM - Check systemd state on gitlab2002 is CRITICAL: CRITICAL - degraded: The following units failed: sync-gitlab-group-with-ldap.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:30:14] RECOVERY - Check systemd state on gitlab1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:31:18] RECOVERY - Check systemd state on gitlab2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:32:32] (JobUnavailable) resolved: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:11:54] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:13:14] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50276 bytes in 0.079 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:20:02] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:21:24] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.272 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:42:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:46:48] (CR) Andrea Denisse: [C: +1] "LGTM, thank you!!" [puppet] - https://gerrit.wikimedia.org/r/937601 (https://phabricator.wikimedia.org/T234565) (owner: Cwhite)
[07:00:06] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230722T0700)
[08:50:16] (PHPFPMTooBusy) firing: Not enough idle php7.4-fpm.service workers for Mediawiki parsoid at eqiad #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?from=now-3h&orgId=1&to=now&var-cluster=parsoid&var-datasource=eqiad%20prometheus/ops&viewPanel=64 - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[08:55:16] (PHPFPMTooBusy) resolved: Not enough idle php7.4-fpm.service workers for Mediawiki parsoid at eqiad #page - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?from=now-3h&orgId=1&to=now&var-cluster=parsoid&var-datasource=eqiad%20prometheus/ops&viewPanel=64 - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[09:02:33] hey folks
[09:02:37] just seen the page
[09:03:36] the two extra nodes that we added yesterday didn't make a big difference
[09:06:27] nothing really out of the ordinary afaics from the graphs
[09:06:48] the graph is not 100% in line with the alert though
[09:06:59] morning
[09:07:43] this is the alert that was p.aging in the last couple of days too, isn't it?
[09:09:22] exactly yes
[09:09:24] https://grafana-rw.wikimedia.org/d/RIA1lzDZk/application-servers-red?forceLogin&from=now-2d&orgId=1&refresh=1m&to=now&var-cluster=parsoid&var-datasource=eqiad%20prometheus%2Fops&viewPanel=22
[09:09:31] dunno if there's anything Saturday-straightforward that could be done to stop it firing over the weekend?
[09:09:33] this stands out, I think it was called out in the task
[09:09:38] lemme find it
[09:11:48] https://phabricator.wikimedia.org/T342085
[09:14:51] I'm going to bump that task to High, since it's p.aging on a weekend :-/
[09:16:18] I don't think I want to try and shuffle host allocations further, especially since the p.age self-resolved, but I suspect that's going to result in more alerts over the weekend :-/
[09:17:28] yeah
[09:17:53] so I am reviewing the metric and alert rule for PHPFPMTooBusy in the alerts repo, it is not 100% clear to me why it fired
[09:18:07] the time windows is 2minutes, so very tight
[09:18:39] I think because some not-enough-workers problems represent a serious outage?
[09:18:42] (PS12) Winston Sung: SiteMatrix config: Add actual (non-deprecated) language code for deprecated language codes [mediawiki-config] - https://gerrit.wikimedia.org/r/884494 (https://phabricator.wikimedia.org/T172035)
[09:19:14] [at least per https://bit.ly/wmf-fpmsat ]
[09:20:42] yes yes that is understood, I tried to render the metric for the alarm and I don't see it running under 0.3
[09:21:06] the graph attached to the alert is not the same as what we have in the alert rule afaics
[09:21:12] sorry, I'll let you work on that and stop asking silly questions :)
[09:21:40] nono please it may be a saturday's pebcak :D
[09:21:53] my line of thinking is that maybe the 2min window is too tight
[09:25:47] if I'm reading the graph correctly ( https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?from=now-3h&orgId=1&to=now&var-cluster=parsoid&var-datasource=eqiad%20prometheus/ops&viewPanel=64 ) from the p.age we're skirting around the 0.3 window all the time ATM
[09:27:08] and the graph uses, afaics
[09:27:09] sum by (state) (phpfpm_statustext_processes{site=~"$site",service="php7.4-fpm.service",cluster="$cluster"})
[09:27:36] but the PHPFPMTooBusy alarm in the alerts repo uses another config
[09:27:42] sum by (cluster, service) (phpfpm_statustext_processes{cluster=~"(api_appserver|appserver|parsoid)", state="idle"})
[09:27:45] /
[09:27:47] sum by (cluster, service) (phpfpm_statustext_processes{cluster=~"(api_appserver|appserver|parsoid)"})
[09:27:50] <= 0.3
[09:27:54] hang on a minute - isn't that graph showing about 30% _used_ so about 70% idle?
[09:28:58] e.g. at 08:50 UTC it has 984 active and 1764 idle
[09:29:39] <-- confused
[09:30:15] if you render the above it shows more or less the same
[09:30:44] I am confused as well
[09:32:34] the panel is graphing "sum by (state) (phpfpm_statustext_processes{site=~"(eqiad|codfw)",service="php7.4-fpm.service",cluster="parsoid"})"
[09:33:23] [once logged in, select "explore"]
[09:33:51] (PS1) Jelto: vrts: enable blackbox check on active_host only [puppet] - https://gerrit.wikimedia.org/r/940467 (https://phabricator.wikimedia.org/T342366)
[09:34:29] Emperor: right, and selecting only say eqiad gets to a different view
[09:34:47] now it makes more sense, it was smoothed out between eqiad and codfw
[09:35:21] and around 8:50 it crossed the mark
[09:35:25] so all checks out
[09:35:32] maybe we could improve the alert's text
[09:36:29] in eqiad I think the threshold is about 392 idle processes?
[09:36:49] [because I think there are about 1300 total, and that's 0.3 of that]
[09:37:07] yes yes it checks out now
[09:37:34] but maybe the time window is too short, 2 minutes is good for us to react but it may take into account small variations
[09:37:45] I'd propose to bump it to 5m, at least for the moment
[09:38:14] that or push the threshold to 0.25 ?
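For readers following along, the ratio discussed above can be put back together as a single Prometheus alerting rule. The sketch below is a reconstruction, not the actual file from the alerts repo: only the PromQL expression, the 0.3 threshold and the 2-minute window come from the conversation, while the group name, labels and annotation wording are assumptions. It also shows why the dashboard and the alert disagreed at first glance: each site's Prometheus evaluates the rule against its own data, whereas the panel quoted at 09:32 summed eqiad and codfw together and so smoothed the eqiad dip out.

# Hypothetical reconstruction of the PHPFPMTooBusy rule (group name, labels and annotations assumed).
groups:
  - name: mediawiki
    rules:
      - alert: PHPFPMTooBusy
        expr: |
          sum by (cluster, service) (phpfpm_statustext_processes{cluster=~"(api_appserver|appserver|parsoid)", state="idle"})
            /
          sum by (cluster, service) (phpfpm_statustext_processes{cluster=~"(api_appserver|appserver|parsoid)"})
            <= 0.3
        # 2 minutes was the window in place when the alert fired at 08:50
        for: 2m
        labels:
          severity: page          # assumption, inferred from the "#page" tag in the alert text
        annotations:
          summary: "Not enough idle php7.4-fpm.service workers for Mediawiki {{ $labels.cluster }}"   # wording approximated from the alert
          runbook: "https://bit.ly/wmf-fpmsat"

With roughly 1300 parsoid workers in eqiad, the 0.3 cut-off works out to about 392 idle processes, matching the figure quoted at 09:36.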
[09:38:34] (CR) Jelto: [V: +1] "PCC SUCCESS (NOOP 1 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42646/console" [puppet] - https://gerrit.wikimedia.org/r/940467 (https://phabricator.wikimedia.org/T342366) (owner: Jelto)
[09:38:55] 0.23 would be about 300 idle, which would give us more headroom
[09:39:36] (I'm a bit torn between wanting to avoid people getting p.aged all weekend for this and overly-enthusiastically silencing a real problem if one turns up)
[09:41:53] (PS1) Elukey: team-sre: fix graph for the PHPFPMTooBusy alert [alerts] - https://gerrit.wikimedia.org/r/940469 (https://phabricator.wikimedia.org/T342085)
[09:42:11] Emperor: in the meantime this should fix the graph that we get --^
[09:42:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus/ops&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:43:23] ah we have the same problem in this graph sigh --^
[09:43:32] (CR) CI reject: [V: -1] team-sre: fix graph for the PHPFPMTooBusy alert [alerts] - https://gerrit.wikimedia.org/r/940469 (https://phabricator.wikimedia.org/T342085) (owner: Elukey)
[09:43:42] p95 is definitely a high
[09:43:50] the baseline of the lower one increased
[09:44:37] ah yes the tests
[09:45:00] elukey: there are some tests in mediawiki_test.yaml which fail now
[09:45:13] ah you found it, good :)
[09:46:20] yes yes fixing them :)
[09:47:46] (CR) MVernon: [C: -1] "This is a good change, but I think the URL needs tweaking a bit (see inline)." [alerts] - https://gerrit.wikimedia.org/r/940469 (https://phabricator.wikimedia.org/T342085) (owner: Elukey)
[09:48:17] elukey: I think your URL change wasn't quite right
[09:48:46] [and I think for non-p.aging alerts lets make a task to note the graphs need improving, but don't feel we have to fix them all on a Saturday]
[09:49:17] Emperor: why not?
[09:49:59] ah yes yes you are right, I was fixing it
[09:50:26] I think we should fix it, we spent a solid 10 minutes figuring out the issue
[09:51:15] FE (I'm not sure how many alerts we're talking about needing to fix)
[09:51:24] [sorry, FE = fair enough]
[09:53:22] (PS2) Elukey: team-sre: fix mediawiki graphs using the RED dashboard [alerts] - https://gerrit.wikimedia.org/r/940469 (https://phabricator.wikimedia.org/T342085)
[09:53:39] (CR) Elukey: team-sre: fix mediawiki graphs using the RED dashboard (1 comment) [alerts] - https://gerrit.wikimedia.org/r/940469 (https://phabricator.wikimedia.org/T342085) (owner: Elukey)
[09:54:45] ok better now :)
[09:55:44] and about the time window: https://grafana-rw.wikimedia.org/d/RIA1lzDZk/application-servers-red?from=1690015311474&orgId=1&to=1690016070094&var-cluster=parsoid&var-datasource=eqiad%20prometheus%2Fops&forceLogin&var-site=eqiad&var-method=POST&var-code=200&var-php_version=All&editPanel=64
[09:56:15] afaics it seems that we crossed the threshold for 3 mins, this is why I think that the 2 min window may catch false alerts
[09:56:35] especially like in this case that we are close to threshold
[09:56:40] it will surely fire again this weekend
[09:58:03] [just checking all the revised URLs in your CR]
[09:59:01] <3
[10:00:50] (CR) MVernon: [C: +1] "The revised URLs all seem good to me, thanks for fixing these!" [alerts] - https://gerrit.wikimedia.org/r/940469 (https://phabricator.wikimedia.org/T342085) (owner: Elukey)
[10:01:47] OK, let's push it to 5m and get the relevant people to review on Monday?
[10:02:05] filing the change
[10:02:41] (PS1) Elukey: team-sre: increase the time window for PHPFPMTooBusy [alerts] - https://gerrit.wikimedia.org/r/940471 (https://phabricator.wikimedia.org/T342085)
[10:03:04] (CR) Elukey: [C: +2] team-sre: fix mediawiki graphs using the RED dashboard [alerts] - https://gerrit.wikimedia.org/r/940469 (https://phabricator.wikimedia.org/T342085) (owner: Elukey)
[10:03:19] I think you'll need to fix the tests too
[10:04:43] OK, evidently not :)
[10:04:43] (Merged) jenkins-bot: team-sre: fix mediawiki graphs using the RED dashboard [alerts] - https://gerrit.wikimedia.org/r/940469 (https://phabricator.wikimedia.org/T342085) (owner: Elukey)
[10:05:40] (CR) MVernon: [C: +1] "I think this is a sensible approach for the weekend at least." [alerts] - https://gerrit.wikimedia.org/r/940471 (https://phabricator.wikimedia.org/T342085) (owner: Elukey)
[10:06:47] Emperor: thanks for the review, ok to merge for the weekend?
[10:07:06] I'll take the blame for service ops in case
[10:07:08] :D
[10:08:59] (CR) Elukey: [C: +2] team-sre: increase the time window for PHPFPMTooBusy [alerts] - https://gerrit.wikimedia.org/r/940471 (https://phabricator.wikimedia.org/T342085) (owner: Elukey)
[10:09:15] done, I'll update the task
[10:09:24] I think we can get back to the weekend folks
[10:09:27] thanks!
[10:09:28] o/
[10:09:46] have a good rest-of-weekend :)
[10:10:06] (Merged) jenkins-bot: team-sre: increase the time window for PHPFPMTooBusy [alerts] - https://gerrit.wikimedia.org/r/940471 (https://phabricator.wikimedia.org/T342085) (owner: Elukey)
[10:11:37] you too!
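The change merged above ( https://gerrit.wikimedia.org/r/940471 ) only widens the rule's hold time: with for: 5m instead of for: 2m, the idle-worker ratio has to stay at or below 0.3 for five consecutive minutes before anyone is paged, which should absorb the roughly three-minute excursions noted around 09:56. The conversation also mentions unit tests in mediawiki_test.yaml; the sketch below shows how a case for the wider window could look in promtool's rule-testing format. It is illustrative only — the file names, series values and label sets are assumptions, not the repository's actual tests.

# Hypothetical promtool unit test for the widened window (file names, values and labels assumed).
rule_files:
  - mediawiki.yaml            # assumption: the file holding the PHPFPMTooBusy rule
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # ~15% idle (200 of 1300) for the whole test, i.e. clearly below the 0.3 threshold
      - series: 'phpfpm_statustext_processes{cluster="parsoid", service="php7.4-fpm.service", state="idle"}'
        values: '200+0x10'
      - series: 'phpfpm_statustext_processes{cluster="parsoid", service="php7.4-fpm.service", state="active"}'
        values: '1100+0x10'
    alert_rule_test:
      # after 4 minutes the alert is still only pending, so nothing pages
      - eval_time: 4m
        alertname: PHPFPMTooBusy
        exp_alerts: []
      # after 6 minutes the condition has held for more than 5m, so the alert fires
      - eval_time: 6m
        alertname: PHPFPMTooBusy
        exp_alerts:
          - exp_labels:
              cluster: parsoid
              service: php7.4-fpm.service
              severity: page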
[13:42:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[13:49:16] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:01:20] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:07:32] (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:17:32] (JobUnavailable) resolved: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[17:42:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[20:01:05] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[20:31:05] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[21:21:05] (SwiftTooManyMediaUploads) resolved: Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[21:42:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[22:05:36] (GitLabCIPipelineErrors) firing: GitLab - High pipeline error rate - https://wikitech.wikimedia.org/wiki/GitLab/Runbook - https://grafana.wikimedia.org/d/Chb-gC07k/gitlab-ci-overview - https://alerts.wikimedia.org/?q=alertname%3DGitLabCIPipelineErrors
[22:10:36] (GitLabCIPipelineErrors) resolved: GitLab - High pipeline error rate - https://wikitech.wikimedia.org/wiki/GitLab/Runbook - https://grafana.wikimedia.org/d/Chb-gC07k/gitlab-ci-overview - https://alerts.wikimedia.org/?q=alertname%3DGitLabCIPipelineErrors
[22:31:47] (PS1) Hamish: Add botadmin group on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/940486 (https://phabricator.wikimedia.org/T342484)
[23:12:48] SRE, Wikimedia-Mailing-lists: Add custom footer linking to Privacy Policy in Postorious and Hyperkitty - https://phabricator.wikimedia.org/T340375 (Ladsgroup) I wonder if UCoC is a better option here. Most mailing lists are not covered by TCoC.