[00:55:38] Traffic, Release-Engineering-Team: Java fails to install on WMF Debian container - https://phabricator.wikimedia.org/T352350 (BCornwall)
[00:56:04] Traffic, Release-Engineering-Team: Java fails to install on WMF Debian container - https://phabricator.wikimedia.org/T352350 (BCornwall)
[00:56:39] Traffic, Release-Engineering-Team: Java fails to install on WMF Debian container - https://phabricator.wikimedia.org/T352350 (BCornwall)
[00:57:12] Traffic, Release-Engineering-Team: Java fails to install on WMF Debian container - https://phabricator.wikimedia.org/T352350 (BCornwall) p:Triage→High
[08:32:32] Traffic, Release-Engineering-Team: Java fails to install on WMF Debian container - https://phabricator.wikimedia.org/T352350 (hashar)
[08:33:37] Traffic, Release-Engineering-Team: Java fails to install on WMF Debian container - https://phabricator.wikimedia.org/T352350 (MoritzMuehlenhoff) We ran into this before with the Buster image, so probably we simply need to carry the previous fix forward: https://phabricator.wikimedia.org/T289694
[08:36:28] Traffic, Release-Engineering-Team: Java fails to install on WMF Debian container - https://phabricator.wikimedia.org/T352350 (hashar) The Debian package fails to install since OpenJDK 11. The root cause is our `dpkg` configuration does not install man pages, `/usr/share/man` is thus never created and the...
[08:54:24] Traffic, MW-on-K8s, SRE, serviceops, Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[09:48:42] (SystemdUnitFailed) firing: ifup@ens13.service Failed on ncredir3003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:53:42] (SystemdUnitFailed) resolved: ifup@ens13.service Failed on ncredir3003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:38:35] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Move MediaWiki jobs to mw-on-k8s - https://phabricator.wikimedia.org/T349796 (Clement_Goubert)
[12:24:46] (VarnishHighThreadCount) firing: (8) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:29:46] (VarnishHighThreadCount) firing: (8) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:34:46] (VarnishHighThreadCount) firing: (9) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:44:47] (VarnishHighThreadCount) firing: (11) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:54:47] (VarnishHighThreadCount) firing: (10) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[12:59:46] (VarnishHighThreadCount) firing: (12)
Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:04:46] (VarnishHighThreadCount) firing: (9) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:14:46] (VarnishHighThreadCount) firing: (6) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[13:19:46] (VarnishHighThreadCount) resolved: (5) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:45:14] Traffic, Abstract Wikipedia team, Beta-Cluster-Infrastructure, WikiLambda, Beta-Cluster-reproducible: HTTP 504 connection timeout error accessing MW API on Beta cluster - https://phabricator.wikimedia.org/T351930 (Vgutierrez) `api.php` is currently handled by deployment-mediawiki11 and that i...
[14:45:17] taavi: Nov 30 14:21:25 deployment-cache-text08 puppet-agent[965696]: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, Class[Profile::Cache::Haproxy]: parameter 'ocsp_proxy' expects a String value, got Undef (file: /etc/puppet/modules/role/manifests/cache/text.pp, line: 8,
[14:45:17] column: 5) on node deployment-cache-text08.deployment-prep.eqiad1.wikimedia.cloud
[14:45:38] taavi: IIRC you were refactoring some ocsp_proxy stuff a few days ago?
[14:48:57] Traffic, Abstract Wikipedia team, Beta-Cluster-Infrastructure, WikiLambda, Beta-Cluster-reproducible: HTTP 504 connection timeout error accessing MW API on Beta cluster - https://phabricator.wikimedia.org/T351930 (kostajh) The timing of the spike in errors aligns with {4344b2fb80727daa44eb461...
[14:49:18] vgutierrez: uh, yes. let me have a look
[14:49:44] haproxy profile still requires it: hieradata/cloud.yaml:http_proxy
[14:49:46] arg.. wrong paste
[14:49:54] String $ocsp_proxy = lookup('http_proxy'),
[14:51:21] that's kinda wrong considering that LE-only environments don't need to fetch OCSP responses
[14:52:00] yep and some environments like deployment-prep don't need a proxy in the first place. I'll send a patch
[14:52:27] deployment-prep doesn't need to do ocsp at all.. acme-chief takes care of that
[14:52:30] thanks
[14:56:12] vgutierrez: https://gerrit.wikimedia.org/r/c/operations/puppet/+/979115/
[14:57:10] the issue is that deployment-prep used to have `http_proxy: ''` in the project hiera, and I changed it to undef as part of that acme-chief cleanup. an empty string is a valid puppet `String`, but undef is not
[14:57:51] (acme-chief previously required an empty string to work in cloud vps, but does not anymore)
[14:59:01] Traffic, Abstract Wikipedia team, Beta-Cluster-Infrastructure, WikiLambda, Beta-Cluster-reproducible: HTTP 504 connection timeout error accessing MW API on Beta cluster - https://phabricator.wikimedia.org/T351930 (kostajh) p:Triage→Unbreak! Raising the priority as the beta cluster is...
[15:01:19] godog: BTW, could you remind me why we aren't using node_exporter to expose sysctl metrics?
[15:01:34] just to keep my soul from screaming while writing a prometheus exporter in bash
[15:56:23] vgutierrez: afaics that's >= bookworm only, though yes in principle we could, also assuming node-exporter is smart enough to do the label thing in your case
[15:59:50] ack
[16:00:06] nah..
I'll just use the prometheus python library to create a valid output instead of printing strings in bash
[16:00:38] makes more sense to me
[16:00:55] I think that's wise yeah, replacing the current bash implementation with python should be straightforward and more maintainable for sure
[16:43:27] Traffic, Abstract Wikipedia team, Beta-Cluster-Infrastructure, WikiLambda, Beta-Cluster-reproducible: HTTP 504 connection timeout error accessing MW API on Beta cluster - https://phabricator.wikimedia.org/T351930 (Lucas_Werkmeister_WMDE) >>! In T351930#9371481, @kostajh wrote: > The timing of...
[17:01:31] Traffic, serviceops: Java fails to install on WMF Debian container - https://phabricator.wikimedia.org/T352350 (BCornwall)
[17:22:42] Traffic, Abstract Wikipedia team, Beta-Cluster-Infrastructure, WikiLambda, Beta-Cluster-reproducible: HTTP 504 connection timeout error accessing MW API on Beta cluster - https://phabricator.wikimedia.org/T351930 (bd808)
[17:53:29] Traffic, Abstract Wikipedia team, Beta-Cluster-Infrastructure, WikiLambda, Beta-Cluster-reproducible: HTTP 504 connection timeout error accessing MW API on Beta cluster - https://phabricator.wikimedia.org/T351930 (kostajh) >>! In T351930#9372190, @Lucas_Werkmeister_WMDE wrote: >>>! In T351930...
[18:50:43] Traffic, Abstract Wikipedia team, Beta-Cluster-Infrastructure, WikiLambda, Beta-Cluster-reproducible: HTTP 504 connection timeout error accessing MW API on Beta cluster - https://phabricator.wikimedia.org/T351930 (AlexisJazz) I just realized my editor (which relies on the API) was working aga...
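(editor's note on the 16:00 exchange) A minimal sketch of what "use the prometheus python library instead of printing strings in bash" could look like, not the exporter that was actually written. It assumes the `prometheus_client` package is installed; the metric name `sysctl_value`, the `key` label, and both function names are hypothetical, and the "label thing" mentioned at 15:56 is not specified in the log, so the labelling here is only a guess at the general shape.

```python
from pathlib import Path


def read_sysctls(keys, root="/proc/sys"):
    """Return {key: float} for each dotted sysctl key readable under root.

    Keys that are missing, unreadable, multi-valued, or non-numeric are skipped.
    """
    values = {}
    for key in keys:
        # "net.core.somaxconn" -> <root>/net/core/somaxconn
        path = Path(root, *key.split("."))
        try:
            fields = path.read_text().split()
        except OSError:
            continue  # key not present or not readable on this host
        if len(fields) != 1:
            continue  # multi-valued sysctls are out of scope for this sketch
        try:
            values[key] = float(fields[0])
        except ValueError:
            continue  # non-numeric value
    return values


def render_metrics(values):
    """Render the values in Prometheus text format via prometheus_client."""
    # Imported here so read_sysctls() stays usable without the library.
    from prometheus_client import CollectorRegistry, Gauge, generate_latest

    registry = CollectorRegistry()
    gauge = Gauge(
        "sysctl_value",  # hypothetical metric name
        "Current value of a sysctl key",
        ["key"],  # hypothetical label
        registry=registry,
    )
    for key, value in values.items():
        gauge.labels(key=key).set(value)
    # generate_latest() returns bytes; the library handles label escaping.
    return generate_latest(registry).decode()
```

Calling `render_metrics(read_sysctls(["net.core.somaxconn"]))` returns the exposition text as a string. The gain over hand-printed strings is that the library takes care of `# HELP`/`# TYPE` lines and label escaping.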
[21:16:43] Traffic, Abstract Wikipedia team, Beta-Cluster-Infrastructure, WikiLambda, Beta-Cluster-reproducible: HTTP 504 connection timeout error accessing MW API on Beta cluster - https://phabricator.wikimedia.org/T351930 (bd808)
[22:13:46] Traffic, GitLab (Project Migration), Patch-For-Review: Migrate Traffic repositories from Gerrit to Gitlab - https://phabricator.wikimedia.org/T347623 (BCornwall)