[10:31:11] Krinkle: I was trying to correlate yesterday's ResourceLoader 5xx alert with Logstash: https://grafana.wikimedia.org/d/000000066/resourceloader?orgId=1&from=1712862062090&to=1712881502090&viewPanel=56
[10:31:22] but mediawiki-errors has nothing: https://logstash.wikimedia.org/app/dashboards#/view/mediawiki-errors?_g=h@46e3fbe&_a=h@8141eda
[10:31:51] am I wrong in assuming that a varnish 503 means the backend timed out and there should be a corresponding error in the MediaWiki logs somewhere?
[10:32:36] (there was a small DDoS attack so it's no mystery what caused it, I'm just curious)
[10:58:59] AIUI a timeout due to queueing will not necessarily result in an error on the PHP side
[11:12:43] as in, fcgi or apache or something like that discards the request after a certain time because there's no available worker, and PHP does not get involved at all?
[14:06:36] tgr: I believe 503 is exclusively for when the request never reaches the appserver. Not exactly a lack of workers at the Apache level, but rather ATS proactively refusing to try (there is a configured limit on how many connections varnish/ATS will send to each backend).
[14:09:30] In practice, the only 5xx from PHP is 500, including for timeouts at that level.
[14:11:14] In theory timeouts can happen at higher layers, but that suggests a failure to enforce timeouts at lower layers. https://wikitech.wikimedia.org/wiki/HTTP_timeouts#App_server
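
A rough sketch of the distinction described above, assuming a toy proxy with a per-backend connection cap. This is not ATS or varnish code; `Backend`, `max_connections`, and `proxy_request` are made-up names for illustration. It only shows why a 503 produced at the proxy would leave nothing in the application's error log, while a failure inside the app (including its own timeout) would surface as a 500.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Backend:
    name: str
    max_connections: int  # hypothetical per-backend cap enforced by the proxy
    in_flight: int = 0    # connections currently open to this backend


def proxy_request(backend: Backend, handler: Callable[[], None]) -> Tuple[int, bool]:
    """Return (status, reached_backend) for one proxied request."""
    if backend.in_flight >= backend.max_connections:
        # Refused at the proxy: the app server never sees the request,
        # so there is nothing for it to log.
        return 503, False

    backend.in_flight += 1
    try:
        handler()  # stands in for the app server handling the request
        return 200, True
    except Exception:
        # Failure inside the app (including an app-level timeout):
        # reported as 500, and the app had a chance to log it.
        return 500, True
    finally:
        backend.in_flight -= 1


def app_timeout() -> None:
    raise TimeoutError("request exceeded the app-level time limit")


saturated = Backend("appserver-a", max_connections=1, in_flight=1)
idle = Backend("appserver-b", max_connections=1)

print(proxy_request(saturated, lambda: None))  # refused before reaching the app
print(proxy_request(idle, app_timeout))        # app-level timeout
```

Under these assumptions, only the second request would have a matching entry in an application error log, which would be consistent with the empty mediawiki-errors dashboard during the 503 spike.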