[07:56:14] <_joe_> MountVolume.SetUp failed for volume "tls-certs-volume" : configmap "mediawiki-pinkunicorn-tls-proxy-certs# Mcrouter configuration" not found
[07:56:28] <_joe_> I <3 go text/template
[08:19:11] serviceops, SRE, User-jbond: Update docker-reporter to only check images available in the respective repos - https://phabricator.wikimedia.org/T284539 (jbond)
[08:22:22] serviceops, SRE, User-jbond: Update docker-reporter to only check images available in the respective repos - https://phabricator.wikimedia.org/T284539 (jbond)
[08:22:28] serviceops, SRE, Patch-For-Review, User-jbond: docker-reporter-releng-images failed on deneb - https://phabricator.wikimedia.org/T251918 (jbond)
[09:23:15] _joe_: I just tried running tox in docker-report repo and setuptools_scm fails with errors parsing version from "upstream/0.0.11" tag
[09:23:27] have you seen that before?
[09:23:44] <_joe_> jayme: oh sigh, maybe, but ask volans
[09:24:02] <_joe_> somehow I allowed him to trick me into using setuptools_scm again
[09:24:12] lol
[09:24:18] you'll probably have just summoned him
[09:24:20] ah :)
[09:25:18] having a look
[09:25:29] as I'm on clinic duty this week I'm at your service :D
[09:29:08] jayme: I think you can set tag_regex, trying now
[09:36:42] jayme: yep, it works, sending patch
[09:37:04] volans: <3
[10:04:40] serviceops, SRE, User-jbond: Update docker-reporter to only check images available in the respective repos - https://phabricator.wikimedia.org/T284539 (jbond) p:Triage→Low
[10:22:42] serviceops, MediaWiki-Core-Snapshots, Wikimedia-General-or-Unknown: Reproducible HTTP 503 error trying to import from Telugu wikipedia to Telugu Wikibooks - https://phabricator.wikimedia.org/T283472 (Aklapper)
[10:34:06] <_joe_> jayme: if I run docker exec --user root /bin/bash to enter a container run by kubernetes in production, I'm unable to write to the filesystem
[10:34:17] <_joe_> I miss why that's the case
[10:34:26] <_joe_> I mean in production
[10:36:37] hm...which one?
[10:37:03] <_joe_> uhhh wait, brainfart
[10:37:04] <_joe_> sigh
[10:38:02] great :)
[10:39:24] <_joe_> yeah the problem is I can't run apt :P
[10:39:39] <_joe_> but that's because of capabilities that are dropped
[10:39:46] <_joe_> E: seteuid 100 failed - seteuid (1: Operation not permitted)
[11:13:57] serviceops, SRE, Patch-For-Review, User-jbond: docker-reporter-releng-images failed on deneb - https://phabricator.wikimedia.org/T251918 (hashar) +1 on the image that got deleted. Note that we might have some images that got moved from Jessie, Stretch, Buster but having kept their name. I don't...
[12:31:42] serviceops, Traffic, VPS-project-Codesearch, netops: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (ema)
[12:32:02] serviceops, Traffic, VPS-project-Codesearch, netops: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (ema) p:Triage→Low
[13:19:50] serviceops, SRE, Patch-For-Review, User-jbond: docker-reporter-releng-images failed on deneb - https://phabricator.wikimedia.org/T251918 (JMeybohm) Open→Resolved Thanks @jbond for looking after this. I'll bluntly close this task again now.
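Note on the 09:23 to 09:36 exchange: setuptools_scm most likely chokes on "upstream/0.0.11" because its default tag regex only allows an optional `name-` style prefix in front of the version, so an `upstream/` prefix leaves nothing it can parse. setuptools_scm accepts a custom `tag_regex`, which must contain a group named `version` (or a single group). The actual patch volans sent is not shown here; the following is a minimal sketch, assuming tags look like `upstream/X.Y.Z` or plain `X.Y.Z`, that checks a candidate regex with Python's `re` before wiring it in. `TAG_REGEX` and `version_from_tag` are hypothetical names.

```python
import re

# Hypothetical tag regex: an optional "upstream/" prefix, an optional "v",
# then the version itself. setuptools_scm requires the named group "version".
TAG_REGEX = r"^(?:upstream/)?v?(?P<version>\d+(?:\.\d+)*)$"


def version_from_tag(tag):
    """Return the version embedded in a git tag, or None if the tag doesn't match."""
    match = re.match(TAG_REGEX, tag)
    return match.group("version") if match else None


if __name__ == "__main__":
    for tag in ("upstream/0.0.11", "0.0.11", "v1.2", "debian/0.0.11-1"):
        print(tag, "->", version_from_tag(tag))
```

Once the regex behaves as expected it can be passed to setuptools_scm from setup.py, e.g. `setup(use_scm_version={"tag_regex": TAG_REGEX}, ...)`. Tags with other prefixes (such as the `debian/...` case above) are deliberately left unmatched.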
[13:21:39] <_joe_> @all I have a question I'm not sure how to answer properly
[13:22:30] <_joe_> so, I need for all of our php apps to be able to access the puppet CA, and php-curl doesn't seem to care too much of settings trying to change the CA file location
[13:23:52] <_joe_> the options are therefore: 1 - Add a prepend file in the source code that sets the CA file option globally for all php requests, and add the puppet CA in the kubernetes manifests for every deployment, or 2) add the puppet ca to our base php images and just add it to the default cert bundle
[13:24:15] <_joe_> I think 1 looks cleaner logically but 2 is exponentially more practical and will require less work on our part
[13:24:55] <_joe_> jayme/effie/jelto ^^ (I'd ping the americans but it seems way too early for them :P)
[13:25:53] 2 sounds better to me
[13:25:59] 3) open up the question again if we should add puppet ca to the base image
[13:26:20] that would solve way more problems
[13:26:46] eg mcrouter needs our puppet CA
[13:27:41] <_joe_> effie: does it? I think it doesn't atm
[13:27:57] the downside of this is we would have to rebuild each and every image on CA update (+ re-deploy)
[13:28:00] <_joe_> I think it might need its own CA atm
[13:28:16] _joe_: it will when we are done with TLS
[13:28:22] <_joe_> jayme: yeah I'm not a fan of adding an internal CA to a published image
[13:28:31] ditto ...
[13:28:42] because we will be using puppet certs
[13:28:52] well, we have our restricted repo
[13:28:53] we do have a bunch of other services that do include the puppet_ca already. Adding it to base would generalize that
[13:28:59] <_joe_> so there is another option ofc - I add the puppet image only when building the restricted final image
[13:30:44] "Add a prepend file in the source code that sets the CA file option globally for all php requests" <- pardon my ignorance, but what is that, exactly?
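Option 2 above amounts to appending the puppet CA to the system-wide bundle that php-curl already trusts by default (on Debian this is typically /etc/ssl/certs/ca-certificates.crt). The sketch below only illustrates that merge step on PEM text, skipping certificates that are already present; it is an assumption-laden illustration, not how the WMF images actually build their bundle, and `split_certs` / `merge_bundles` are hypothetical names.

```python
import re

# Matches one PEM-encoded certificate, including its BEGIN/END armour lines.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\r?\n.*?\r?\n-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_certs(pem_text):
    """Return the PEM certificate blocks found in pem_text, in order."""
    return PEM_CERT_RE.findall(pem_text)


def merge_bundles(system_bundle, extra_cas):
    """Append the certificates in extra_cas to system_bundle, skipping duplicates."""
    merged = split_certs(system_bundle)
    seen = {cert.strip() for cert in merged}
    for cert in split_certs(extra_cas):
        if cert.strip() not in seen:
            merged.append(cert)
            seen.add(cert.strip())
    return "\n".join(merged) + "\n"
```

In an image build one would read the existing bundle, call `merge_bundles` with the contents of the puppet CA file (whose location in the image is not given in the discussion) and write the result back. On Debian the more conventional route is to drop a `.crt` file into /usr/local/share/ca-certificates and run `update-ca-certificates`, which regenerates the bundle for you.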
[13:31:09] <_joe_> so, php allows you to add one, or several "prepend" files to the code used for each request
[13:31:16] <_joe_> we do so atm for mediawiki
[13:31:57] <_joe_> auto_prepend_file = /srv/mediawiki/wmf-config/PhpAutoPrepend.php
[13:32:36] async() uh and...we probably discussed last time (and I forgot): Could we just bind mount (read-only) the CA from the k8s nodes instead of having to use dedicated config maps everywhere?
[13:32:51] <_joe_> yes
[13:32:54] <_joe_> but ew
[13:33:05] :)
[13:33:28] would that mean it would "auto-update"?
[13:33:32] like via puppet?
[13:33:57] <_joe_> well yes, but the actual deployments wouldn't pick it up because most software is dumb
[13:34:08] <_joe_> but to be clear, we want to STOP using it
[13:34:17] but, but, but PHP is request based :D
[13:34:18] <_joe_> so assume this is some random CA
[13:34:32] okay,okay ... got your point
[13:34:33] <_joe_> php would pick it up, yes
[13:35:19] ideally, we would have a mutating webhook controller that would inject the CA into every container
[13:35:30] * jayme runs
[13:35:54] <_joe_> one day I'll be free to move again
[13:36:00] <_joe_> you'll get less cocky :P
[13:36:46] serviceops, SRE, Traffic, VPS-project-Codesearch, netops: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (BBlack) When we looked into this for the Bird-based anycast stuff, we found that the combination you w...
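The two ideas floated above (bind-mounting the CA read-only from the k8s nodes, and a mutating webhook that injects it into every container) combine naturally: the webhook answers each Pod admission request with a JSON Patch that adds a read-only hostPath volume and mounts it in each container. The following is a minimal sketch of just the patch-building part, assuming the `admission.k8s.io/v1` AdmissionReview format; the volume name and both file paths are hypothetical, and only `spec.containers` is handled (not init containers). As noted in the discussion, long-running software would still need a restart to see an updated CA, while PHP would pick it up per request.

```python
import base64
import json

CA_VOLUME_NAME = "puppet-ca"                              # hypothetical volume name
CA_HOST_PATH = "/etc/ssl/certs/Puppet_Internal_CA.pem"    # hypothetical path on the node
CA_MOUNT_PATH = "/etc/ssl/certs/Puppet_Internal_CA.pem"   # hypothetical path in containers


def ca_injection_patch(pod):
    """Build an RFC 6902 JSON Patch adding a read-only CA mount to every container."""
    spec = pod.get("spec", {})
    if any(v.get("name") == CA_VOLUME_NAME for v in spec.get("volumes") or []):
        return []  # already injected, keep the webhook idempotent
    volume = {"name": CA_VOLUME_NAME, "hostPath": {"path": CA_HOST_PATH, "type": "File"}}
    # "add" to ".../-" appends to an existing array; otherwise create the array.
    if spec.get("volumes"):
        ops = [{"op": "add", "path": "/spec/volumes/-", "value": volume}]
    else:
        ops = [{"op": "add", "path": "/spec/volumes", "value": [volume]}]
    mount = {"name": CA_VOLUME_NAME, "mountPath": CA_MOUNT_PATH, "readOnly": True}
    for i, container in enumerate(spec.get("containers", [])):
        base = f"/spec/containers/{i}/volumeMounts"
        if container.get("volumeMounts"):
            ops.append({"op": "add", "path": base + "/-", "value": mount})
        else:
            ops.append({"op": "add", "path": base, "value": [mount]})
    return ops


def admission_response(uid, pod):
    """Wrap the patch in an AdmissionReview response (the patch must be base64-encoded)."""
    patch = json.dumps(ca_injection_patch(pod)).encode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(patch).decode(),
        },
    }
```

Serving this over HTTPS and registering it with a MutatingWebhookConfiguration is left out; the point is that the per-chart configmap plumbing moves into one place, which is the "let the infrastructure handle it" argument made later in the log.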
[13:37:48] we'll see :) But there is something right about that idea as that we would not have to carry code for this around in each and every chart that needs access to the ca
[13:38:06] and let "the infrastructure" handle it instead
[13:38:23] <_joe_> so in most cases, it's just envoy that needs it
[13:38:50] <_joe_> in mediawiki, ideally, all outgoing requests go via envoy but I wanted to keep etcd outside of that realm
[13:38:51] _joe_: question, I am missing something
[13:39:21] _joe_: I count 9 services already including it (for whatever reason)
[13:39:54] no nvm
[13:40:11] serviceops, SRE, Traffic, VPS-project-Codesearch: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (ayounsi)
[13:40:25] <_joe_> jayme: ok, so most probably something's very wrong there :P
[13:40:30] <_joe_> which ones btw?
[13:40:58] (isn't it nice how all our names have the same width in fixed font IRC...j.elto should say something as well :D)
[13:41:27] api-gateway, changeprop, cxserver, event*, mobileapps, similar-users, termbox, wikifeeds
[13:41:50] ah, well...I do see now that we added it to common_templates. /ignore me
[13:42:27] <_joe_> jayme: so in some cases it's also inside the service and I'd argue that's wrong
[13:42:35] <_joe_> but I don't have time to do it now
[13:42:40] ok question, what would be wrong if our base image that includes the ca
[13:42:48] is in the restricted registry
[13:42:54] <_joe_> tes
[13:42:56] <_joe_> *yes
[13:42:59] one is to rebuild everything, understood
[13:43:05] <_joe_> because then you either keep all images restricted
[13:43:09] <_joe_> or what's the point?
[13:43:33] it does solve the problem once and for all
[13:43:54] <_joe_> no I mean
[13:44:02] <_joe_> we definitely want to keep publishing our images
[13:44:09] <_joe_> else we'd just have them all restricted
[13:45:26] we can have the public and the non public ones
[13:45:48] that adds overhead ofc
[13:46:40] it is more like a rhetorical/psychological issue anyways, right? Or do we fear anything from adding the CA to public images? Technically I mean?
[13:47:48] I have been thinking the same thing, are there any risks ?
[13:48:30] my POV is more that it would be nice to have the base images clean of WMF specific stuff. But otoh - if this keeps us from adding special handling here and there it might be worth it to give up on that
[13:48:33] <_joe_> no it's just dirty, and adds a dependency on ca-certificates everywhere (else there is no point), and I think it's a hack
[13:49:00] yeah...it kind of is a hack
[13:49:05] <_joe_> I'm more positive on adding it to "final" images or to "language-specific" ones
[13:49:54] ack to that...we could have a helper in docker-pkg that does so, right?
[13:50:35] in the language specific ones at least. for finals, that would need to be done in blubber
[13:51:27] <_joe_> jayme: I was thinking of creating a "wmf-ca-bundles" package
[13:52:52] <_joe_> I talked about it with john
[13:57:12] and?
[13:57:47] <_joe_> about creating a debian package with our ca bundles included
[13:58:08] yes, what's his take I mean
[13:58:32] <_joe_> do you think that if he disagreed I wouldn't have said so? :P
[13:59:20] I don't know if he was willing or you made him
[13:59:25] say yes
[14:02:21] _joe_: oh, well. Yes. That's even more easy
[14:02:37] <_joe_> #-sre's floor is soaked in john's blood
[14:02:44] <_joe_> other than that, he was willing
[14:03:22] I think I'd go with that rather than having it all over the place in helm/k8s. Easy to add to intermediate images as well as to final ones
[14:03:38] and on CA change, stuff needs to be deployed anyways ...
[14:03:48] <_joe_> yeah
[14:03:54] (so no change there I mean. Apart from that it needs a docker rebuild additionally)
[14:03:54] agree
[14:03:57] <_joe_> the package is easy to add to final images
[14:07:05] would be nice if that package would not depend on the upstream ca-certificates, but that's probably not easy to do
[14:10:58] <_joe_> jayme: yeah also pointless
[14:11:17] couple of MB less, I guess
[14:11:36] <_joe_> yeah not that relevant tbh
[14:12:06] <_joe_> we shaved off 1 mb by moving off bootstrap-vz and do debuerreotype for the base images, we're taking it back
[14:47:04] serviceops, Data-Persistence (Consultation), SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[14:56:18] serviceops, Data-Persistence (Consultation), SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[14:57:40] serviceops, MW-on-K8s, SRE, Patch-For-Review: Add the puppet CA to the MediaWiki deployment - https://phabricator.wikimedia.org/T284417 (Joe) After further consideration, we came to the conclusion that the best course of action is: * For now, hotpatch the chart to use the puppet ca * Create a "wm...
[15:16:30] serviceops, Beta-Cluster-Infrastructure, Editing-team, Release Pipeline, and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (hashar)
[15:20:45] serviceops, Release-Engineering-Team: Offer Loadtesting as a Service - https://phabricator.wikimedia.org/T230530 (hashar)
[15:21:44] serviceops, Release-Engineering-Team: Offer Loadtesting as a Service - https://phabricator.wikimedia.org/T230530 (hashar) I have marked it as a dupe of T67394 and copy pasted the description there.
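The plan settled on above is to ship the CA bundles as a Debian package ("wmf-ca-bundles", a name floated in the discussion that may not exist yet) and install it in language-specific and final images, possibly via a docker-pkg helper. The following is a minimal sketch of what such a helper could render, assuming a plain Dockerfile `RUN` layer; `ca_bundle_layer` is a hypothetical name and not an existing docker-pkg function.

```python
def ca_bundle_layer(packages=("wmf-ca-bundles",)):
    """Render a Dockerfile RUN instruction that installs the given CA packages.

    The default package name comes from the IRC discussion and is hypothetical.
    """
    if not packages:
        raise ValueError("at least one package is required")
    # Sort and de-duplicate so the rendered layer (and its cache key) is stable.
    pkg_list = " ".join(sorted(set(packages)))
    return (
        "RUN apt-get update \\\n"
        f"    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends {pkg_list} \\\n"
        "    && rm -rf /var/lib/apt/lists/*"
    )


if __name__ == "__main__":
    print(ca_bundle_layer())
```

Note that `--no-install-recommends` only skips Recommends, not Depends, so if the package depends on `ca-certificates` (as discussed at 14:07) apt will still pull that in. And, as pointed out in the log, a CA change then means rebuilding the images that include the package as well as redeploying them.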
[15:30:13] serviceops, SRE-swift-storage: Corrupted helm charts in Swift that are not properly showing up in chartmuseum - https://phabricator.wikimedia.org/T283147 (JMeybohm) Open→Resolved Looks fine so far, I'll resolve this.
[15:35:32] serviceops, Data-Persistence (Consultation), SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[16:12:07] serviceops, MW-on-K8s, Release-Engineering-Team, SRE: Ownership of the /tmp/mw-cache directories should be www-data in the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284581 (Joe)
[16:13:50] serviceops, MW-on-K8s, Release-Engineering-Team, SRE: Ownership of the /tmp/mw-cache directories should be www-data in the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284581 (Joe) I suspect we just need to remove the directories, as they only contain a cached configuration f...
[16:17:20] serviceops, SRE, Patch-For-Review, User-jbond: docker-reporter-releng-images failed on deneb - https://phabricator.wikimedia.org/T251918 (jbond) >>! In T251918#7142275, @JMeybohm wrote: > Thanks @jbond for looking after this. I'll bluntly close this task again now. thanks and can confirm the lat...
[16:50:41] serviceops, MW-on-K8s, Release-Engineering-Team, SRE: The restricted/mediawiki-multiversion image should include the production version of private/PrivateSettings.php - https://phabricator.wikimedia.org/T284582 (Joe)
[16:53:06] serviceops, Data-Persistence (Consultation), SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[16:53:56] serviceops, Data-Persistence (Consultation), SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul) Open→Resolved Complete
[18:23:55] serviceops, SRE, Patch-For-Review: Publish wikimedia-bullseye base docker image - https://phabricator.wikimedia.org/T281596 (Majavah)
[18:24:58] serviceops, MW-on-K8s, Release-Engineering-Team, SRE, Patch-For-Review: Ownership of the /tmp/mw-cache directories should be www-data in the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284581 (dancy) Open→Resolved a:dancy @Joe This should be fixed now. You...
[18:25:04] serviceops, MW-on-K8s, SRE: Create a mwdebug deployment for mediawiki on kubernetes - https://phabricator.wikimedia.org/T283056 (dancy)
[18:31:14] serviceops, decommission-hardware: decommission thumbor100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T273137 (wiki_willy) Hi, just checking to see if the decoms for these servers can proceed? Thanks in advance. ~Willy
[18:31:49] serviceops, decommission-hardware: decommission thumbor200[12].codfw.wmnet - https://phabricator.wikimedia.org/T273141 (wiki_willy) Hi, just pinging to see if we can get an ETA on when these decoms can move forward. Much appreciated. Thanks, Willy
[19:36:10] serviceops, MW-on-K8s, Release-Engineering-Team, SRE, Patch-For-Review: The restricted/mediawiki-multiversion image should include the production version of private/PrivateSettings.php - https://phabricator.wikimedia.org/T284582 (dancy) Open→Resolved a:dancy @Joe This should be fi...
[19:36:16] serviceops, MW-on-K8s, SRE: Create a mwdebug deployment for mediawiki on kubernetes - https://phabricator.wikimedia.org/T283056 (dancy)
[20:01:54] :~/deployment-charts$ rake scaffold
[20:01:55] rake aborted!
[20:01:55] LoadError: cannot load such file -- git
[20:01:55] /home/mutante/deployment-charts/.rake_modules/monkeypatch.rb:3:in `'
[20:01:57] /home/mutante/deployment-charts/Rakefile:13:in `'
[20:02:00] (See full trace by running task with --trace)
[20:03:01] ok, it requires 'git' and I dont have ruby-git. But that monkeypatch part requiring it must be new
[20:53:24] <_joe_> mutante: yeah relatively new indeed
[20:54:12] _joe_: ACK, no real issue then, just had to install ruby-git package on my laptop but did not need it for this before
[20:54:16] thanks