[05:06:38] serviceops, DBA, User-fgiunchedi, cloud-services-team (Kanban): Roll restart haproxy to apply updated configuration - https://phabricator.wikimedia.org/T287574 (Marostegui) @fgiunchedi with the switches maintenance finished, we can proceed whenever you like with this. Just give me 24h heads up an...
[07:00:23] hello folks
[07:00:38] nothing urgent, I am going to write some thoughts for TLS certs in ML
[07:00:47] answer if/when you have a moment :)
[07:01:29] so I'd need to deploy some TLS certs on the ML cluster via helm, and I am looking into what's available for the other clusters
[07:02:05] I see that we have profile::kubernetes::deployment_server::helmfile and profile::kubernetes::deployment_server::global_config
[07:02:11] <_joe_> yes
[07:03:24] I was wondering if we could keep the same structure, overriding values in the private ml roles, to avoid rendering (for example) all the serviceops secrets for us
[07:03:34] basically rendering only what we need
[07:03:41] (for example, the puppet CA, etc.)
[07:04:02] otherwise I may need a separate profile for ML, rendering files with what we need on the deployment server
[07:04:13] (all the envoy configs etc., for example, are not needed)
[07:05:25] <_joe_> elukey: yes you can clearly
[07:05:44] <_joe_> elukey: for the puppet CA, can I suggest you add the wmf-certificates package to your images?
[07:06:22] _joe_: I'd need to use its b64 output in helm though
[07:06:39] this is why I was asking
[07:06:40] <_joe_> elukey: why?
[07:07:19] for the webhook, IIUC it needs to tell the k8s api what to trust when establishing a TLS connection to it
[07:07:37] <_joe_> who executes the webhook?
[07:07:44] <_joe_> one of your images, right?
[07:08:06] <_joe_> so if you add wmf-certificates, you'll have your puppet CA on the filesystem and in the default certs bundle
[07:08:21] <_joe_> but anyways, this can be a second-order improvement
[07:08:35] <_joe_> now what I don't understand from your discussion is
[07:08:46] <_joe_> are you planning on having your own deployment hosts?
[07:08:58] the webhook is exposed by a docker image (the service itself), but the flow is: k8s api -> webhook
[07:09:14] <_joe_> if not, just add your stuff to the current data structure, it gets translated to yaml files on deploy hosts
[07:09:20] so it is the k8s api itself that needs to trust a cert
[07:09:23] not a docker image
[07:09:25] IIUC
[07:09:33] <_joe_> oh ok
[07:10:11] if possible I'd avoid the dedicated deployment host :D
[07:10:38] but if there is a good reason we'll maintain it for sure
[07:17:52] <_joe_> no reason
[07:18:43] <_joe_> but to my point
[07:19:38] <_joe_> you just need to add stuff to the general structure, or add another separate hiera key to gather ml-specific secrets, and modify the profile::kubernetes::deployment_server class
[07:20:08] <_joe_> let me finish what I'm doing and we can work on it
[07:26:04] yes yes makes complete sense
[07:34:18] serviceops, DBA, User-fgiunchedi, cloud-services-team (Kanban): Roll restart haproxy to apply updated configuration - https://phabricator.wikimedia.org/T287574 (fgiunchedi) Thanks all for your help! Let's go for Tues next week (i.e. Aug 3rd). Easiest would be around 9 UTC, does that work on your...
[07:36:13] serviceops, DBA, User-fgiunchedi, cloud-services-team (Kanban): Roll restart haproxy to apply updated configuration - https://phabricator.wikimedia.org/T287574 (Marostegui) Works for me!
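The point elukey makes above is that the Kubernetes API server is the TLS client when it calls an admission webhook, so the CA that signed the webhook's certificate (here, presumably the puppet CA) has to be handed to the API server in the webhook configuration's `caBundle` field, base64-encoded. A minimal Python sketch of producing that value, assuming the CA is available as PEM text; the service name, namespace and path below are hypothetical placeholders, not the ML cluster's real values:

```python
import base64


def ca_bundle_b64(pem_text: str) -> str:
    """Base64-encode a PEM CA bundle for a webhook clientConfig.caBundle.

    base64.b64encode emits a single line with no embedded newlines,
    which is the form the caBundle field expects.
    """
    return base64.b64encode(pem_text.encode("ascii")).decode("ascii")


def webhook_client_config(name: str, namespace: str, pem_text: str,
                          path: str = "/") -> dict:
    """Build a clientConfig dict (admissionregistration.k8s.io/v1 shape)
    pointing the API server at an in-cluster Service and telling it which
    CA to trust when connecting to that Service over TLS."""
    return {
        "service": {"name": name, "namespace": namespace, "path": path},
        "caBundle": ca_bundle_b64(pem_text),
    }


if __name__ == "__main__":
    # Hypothetical PEM content; in practice this would be the puppet CA
    # rendered onto the deployment host.
    pem = "-----BEGIN CERTIFICATE-----\nMIIB\n-----END CERTIFICATE-----\n"
    print(webhook_client_config("example-webhook", "example-ns", pem))
```

In a helm chart the same encoding could be done at template time with Helm's `b64enc` function, provided the PEM text itself is passed in as a value. That is why the b64 output is needed in helm even if the images ship wmf-certificates.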
[08:16:08] serviceops, DBA, User-fgiunchedi, cloud-services-team (Kanban): Roll restart haproxy to apply updated configuration - https://phabricator.wikimedia.org/T287574 (Marostegui) Active hosts that would need puppet stopped and failed over before applying the change: dbproxy1014 m1 dbproxy1013 m2 dbpro...
[08:16:30] serviceops, DBA, User-fgiunchedi, cloud-services-team (Kanban): Roll restart haproxy to apply updated configuration - https://phabricator.wikimedia.org/T287574 (Marostegui)
[10:59:28] serviceops, observability, GitLab (Initialization), Patch-For-Review: Define monitoring for gitlab - https://phabricator.wikimedia.org/T275170 (Jelto) Basic Icinga alerts for the public https and SSH endpoints of GitLab are in place now: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host...
[13:56:50] serviceops, SRE, Patch-For-Review: bring 43 new mediawiki appserver in eqiad into production - https://phabricator.wikimedia.org/T279309 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` ['mw1439.eqiad.wmnet', 'mw1440.eqiad.wmnet', 'mw1445.eqiad....
[14:40:17] serviceops, SRE, Patch-For-Review: bring 43 new mediawiki appserver in eqiad into production - https://phabricator.wikimedia.org/T279309 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1439.eqiad.wmnet', 'mw1440.eqiad.wmnet', 'mw1445.eqiad.wmnet', 'mw1446.eqiad.wmnet'] ` and were **ALL**...
[15:17:40] serviceops, SRE, decommission-hardware, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[1295-1296].eqiad.wmn...
[15:25:01] serviceops, MW-on-K8s, Release-Engineering-Team, SRE: Ensure the code is deployed to mediawiki on k8s when it is deployed to production - https://phabricator.wikimedia.org/T287570 (Joe) p:Triage→High
[15:33:17] serviceops, SRE, decommission-hardware, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[1298-1299].eqiad.wmn...
[15:46:06] serviceops, SRE, decommission-hardware, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (Dzahn)
[15:54:30] serviceops, DBA, User-fgiunchedi, cloud-services-team (Kanban): Roll restart haproxy to apply updated configuration - https://phabricator.wikimedia.org/T287574 (Andrew) >>! In T287574#7248022, @fgiunchedi wrote: > Thanks all for your help! Let's go for Tues next week (i.e. Aug 3rd). Easiest woul...
[20:23:12] so... I never did https://gerrit.wikimedia.org/r/c/labs/private/+/693169 for shellbox
[20:23:25] is that required?
[20:51:56] legoktm: it's useful to keep the puppet compiler working as expected
[20:52:28] ack, let me add it in
[21:18:14] serviceops, Performance-Team, SRE: WARNING: opcache cache-hit ratio is below 99.99% on multiple eqiad appservers and parsoid servers - https://phabricator.wikimedia.org/T287792 (Legoktm)
[21:51:31] serviceops, SRE, Services, Wikibase-Quality-Constraints, and 4 others: Deploy Shellbox instance (shellbox-constraints) for Wikidata constraint regexes - https://phabricator.wikimedia.org/T285104 (Legoktm)
[22:48:27] serviceops, SRE, Services, Wikibase-Quality-Constraints, and 4 others: Deploy Shellbox instance (shellbox-constraints) for Wikidata constraint regexes - https://phabricator.wikimedia.org/T285104 (Legoktm) ` $ curl https://staging.svc.eqiad.wmnet:4010/healthz { "__": "Shellbox running", "p...
[22:48:33] Amir1: ^^