[07:44:24] good morning folks
[07:44:44] if nobody disagrees I'd reimage kubernetes1006 (kask vm) to bullseye
[08:07:00] <_joe_> elukey: go on
[08:07:43] already started :)
[08:29:01] reimage done
[08:29:52] the sanity checks are ok, going to uncordon and repool
[08:31:06] done, 2 vms to go :)
[08:46:42] serviceops, Infrastructure-Foundations, SRE: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet - https://phabricator.wikimedia.org/T304237 (fgiunchedi) I wholeheartedly agree with the points made here, I'll add that as part of this quarter's work on the `monitoring` section of `service::catalo...
[09:12:35] _joe_ if you are ok I'll also reimage kubernetes1015 (kask vm)
[09:13:20] I'm okay fwiw :)
[09:13:27] ah good morning!
[09:13:33] thanks, proceeding :)
[09:14:15] hey o/ - It did not seem like I needed to say something earlier. Sorry if I misread that
[09:15:11] yes yes I didn't write the sentence correctly, the "if nobody disagrees" is clearly flawed since I had no idea if people were still in bed or not
[09:16:49] (also completely unrelated, I am working on the istio-cni debian pkg, I am hitting an issue while running make build-linux but hopefully I'll have something working EOD)
[09:17:44] ack. I'm happy if it takes a couple more days :)
[09:18:16] ahahahah
[09:18:17] (not because I want you to suffer more than you already do ofc)
[09:18:28] (you are very kind)
[09:19:20] I am inclined to target 1.9.5 for istio-cni so I can attempt the upgrade path later on
[09:19:27] (in the next few weeks, if/when we upgrade)
[09:31:00] (may have spoken too soon)
[09:32:35] that sounds like a good idea
[09:33:25] the go.mod file has this dep https://proxy.golang.org/github.com/envoyproxy/go-control-plane/@v/v0.9.9-0.20210420150223-d760b7f6014b.mod
[09:33:29] that returns 410 now
[09:33:56] will try to patch it
[09:42:08] ouch... I bet they rewrote history for the actual v0.9.9 release, which was a day later than when the snapshot was taken
[09:44:14] yeah I have the same impression, or something similar
[09:44:34] in theory using 0.9.9-0 should work, I can use quilt to apply the patch to go.mod and that's it
[09:46:24] are you using the "flow" we use for a bunch of the other go packages (like helm)? gbp plus debian/repack?
[10:03:06] at the moment I am using gbp import to get the pristine-tar+upstream branches with the source release, and then I am trying to build with the istio Makefile (which in turn also builds go code)
[10:03:20] it fetches what's needed from gcs though
[10:07:25] 1015 uncordoned and running
[10:09:39] ah, so you're building from a release tar, not from a git checkout?
[10:13:18] jayme: I find it way easier, and the git history is cleaner (no need to pull all of upstream's commits, only a squashed one)
[10:13:31] but if there is a preference for the git tag I can change
[10:14:52] no objections. I was just curious. As long as it's clear what needs to be done (steps in wikitech or something) - fine with me
[10:15:24] yes yes definitely, I'll also ask Moritz if there is any preference
[10:15:52] I like the fact that you just run gbp import $path-to-tarball and everything is handled (git tags, import, commits, etc.)
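The go.mod entry discussed at 09:33 pins a Go module pseudo-version rather than a release tag. Per the Go Modules Reference, the `vX.Y.(Z+1)-0.yyyymmddhhmmss-abcdefabcdef` form names a commit made after release tag vX.Y.Z that sorts before vX.Y.(Z+1), identified by a UTC timestamp and a 12-character commit hash prefix. A minimal Python sketch that pulls those fields apart; the function and type names are hypothetical, and only this one pseudo-version form is handled:

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class PseudoVersion(NamedTuple):
    base: str            # most recent release tag before the commit, e.g. "v0.9.8"
    target: str          # release the commit sorts just before, e.g. "v0.9.9"
    timestamp: datetime  # commit time, always UTC in pseudo-versions
    revision: str        # 12-character commit hash prefix


# Hypothetical helper: matches only the vX.Y.(Z+1)-0.yyyymmddhhmmss-abcdefabcdef
# form, i.e. pseudo-versions whose base is a plain release tag vX.Y.Z.
_RELEASE_BASED = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"-0\.(?P<ts>\d{14})-(?P<rev>[0-9a-f]{12})$"
)


def parse_release_pseudo_version(version: str) -> PseudoVersion:
    m = _RELEASE_BASED.match(version)
    # In this form the patch number was bumped from the base tag, so it is >= 1.
    if m is None or int(m["patch"]) == 0:
        raise ValueError(f"unsupported pseudo-version: {version!r}")
    major, minor, patch = int(m["major"]), int(m["minor"]), int(m["patch"])
    ts = datetime.strptime(m["ts"], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return PseudoVersion(
        base=f"v{major}.{minor}.{patch - 1}",
        target=f"v{major}.{minor}.{patch}",
        timestamp=ts,
        revision=m["rev"],
    )


pv = parse_release_pseudo_version("v0.9.9-0.20210420150223-d760b7f6014b")
print(pv.base, pv.target, pv.timestamp.isoformat(), pv.revision)
# v0.9.8 v0.9.9 2021-04-20T15:02:23+00:00 d760b7f6014b
```

For the dependency above this gives commit `d760b7f6014b` from 2021-04-20 15:02:23 UTC, between v0.9.8 and v0.9.9. That commit is what a quilt patch to go.mod would have to move away from if it no longer resolves; the history-rewrite explanation in the log is a guess and this sketch does not confirm it.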
[10:16:12] +1
[10:16:13] I always end up with a mess if I pull upstream's git history
[10:17:32] indeed. I think this was part of the reason for the debian/repack script (which actually creates a tarball from an upstream tag). That, plus not having a proper upstream tarball release
[10:21:56] ah I think I've never really used repack then
[10:22:08] is there a repo that I can check?
[10:22:57] the helm/helm3 repos use it as well as chartmuseum (there may be more), see https://wikitech.wikimedia.org/wiki/Helm#Importing_a_new_version for example
[10:24:05] envoy, dragonfly :)
[10:30:09] ack thanks
[10:33:21] going to drain + reimage kubernetes1016, the last one
[11:04:45] hi folks, I'm seeking reviewers for https://gerrit.wikimedia.org/r/c/operations/puppet/+/767729 to mute the irc spam for the wikiversion check
[11:20:37] serviceops, Data-Catalog, Data-Engineering, SRE, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) I'm sorry to be a pain, but I'm under some pressure to implement this new service as soon as it's practicable, for which I really need help from #servi...
[11:24:25] aaand 1016 done, so all vms on bullseye!
[11:24:59] serviceops, Prod-Kubernetes, Kubernetes, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move kubernetes workers to bullseye and docker to overlayfs - https://phabricator.wikimedia.org/T300744 (elukey)
[11:25:09] jayme: going to start from kubernetes1007 tomorrow if you are ok
[11:34:56] serviceops, SRE, observability, Patch-For-Review: aggregate mismatched wikiversions alert - https://phabricator.wikimedia.org/T302832 (jbond) p:Triage→Medium
[11:39:35] serviceops, SRE, Znuny, Patch-For-Review: Move VTRS db passwords to a different hiera location - https://phabricator.wikimedia.org/T303272 (jbond) p:Triage→Medium
[11:41:26] serviceops, SRE, Traffic: upstream connect error or disconnect/reset before headers. reset reason: overflow - https://phabricator.wikimedia.org/T303305 (jbond) p:Triage→Medium
[11:43:01] serviceops, Data-Engineering, SRE, Traffic, Trust-and-Safety: Disable GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464 (jbond) p:Triage→Medium
[11:50:54] serviceops, Performance-Team, SRE, Traffic: Potential navtiming_responseStart regression as of 13 Mar 2022 - https://phabricator.wikimedia.org/T303782 (jbond) p:Triage→Medium
[11:57:17] serviceops, SRE, Wikimedia-Etherpad: Etherpads corrupted - https://phabricator.wikimedia.org/T304005 (jbond) p:Triage→Medium
[11:59:02] serviceops, SRE, Traffic: upstream connect error or disconnect/reset before headers. reset reason: overflow - https://phabricator.wikimedia.org/T303305 (Joe) Open→Resolved a:Joe This happened during an outage. That is the tls terminator of the application servers (envoy) circuit-breaking...
[12:45:11] serviceops, Data-Catalog, Data-Engineering, SRE, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (Ottomata) > I haven't created TLS certificates for datahub.wikimedia.org I don't believe you will need a cert for this, IIUC it should use the wikimedia.org wil...
[13:45:01] elukey: yeah, sure
[13:51:11] serviceops, Data-Catalog, Data-Engineering, SRE, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (JMeybohm) I'm trying to get back to this today/tomorrow. You don't need to create any TLS certificates and we can use Ingress for both, frontend and gms.
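The T303305 resolution attributes the "reset reason: overflow" errors to envoy, the TLS terminator in front of the application servers, circuit-breaking during an outage. Envoy's per-cluster circuit breakers (thresholds such as max_connections, max_pending_requests, max_requests and max_retries) make requests over the limit fail immediately instead of queueing. The Python toy model below shows only that fail-fast behaviour for a single concurrency cap; the class names are hypothetical, it is not Envoy's implementation, and the thresholds actually configured on the appservers are not in this log.

```python
import threading


class Overflow(Exception):
    """Raised when a request is rejected because the limit is already reached."""


class ConcurrencyBreaker:
    """Toy fail-fast limiter on in-flight requests (hypothetical, not Envoy code)."""

    def __init__(self, max_requests: int) -> None:
        if max_requests < 1:
            raise ValueError("max_requests must be >= 1")
        self._max = max_requests
        self._in_flight = 0
        self._lock = threading.Lock()

    @property
    def in_flight(self) -> int:
        return self._in_flight

    def acquire(self) -> None:
        # Reject immediately instead of queueing once the cap is reached.
        with self._lock:
            if self._in_flight >= self._max:
                raise Overflow("reset reason: overflow")
            self._in_flight += 1

    def release(self) -> None:
        with self._lock:
            if self._in_flight == 0:
                raise RuntimeError("release() without a matching acquire()")
            self._in_flight -= 1


breaker = ConcurrencyBreaker(max_requests=2)
breaker.acquire()
breaker.acquire()
try:
    breaker.acquire()  # third concurrent request is rejected
except Overflow as exc:
    print("rejected:", exc)
breaker.release()
breaker.acquire()  # capacity freed, accepted again
```

Failing fast keeps an overloaded backend from building an unbounded queue, at the cost of clients seeing errors like the one in the task title while the limit is hit.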
[13:56:16] serviceops, Prod-Kubernetes, Patch-For-Review: cert-manager created multiple CertificateRequest objects with the same certificate-revision - https://phabricator.wikimedia.org/T304092 (JMeybohm)
[13:57:46] serviceops, Prod-Kubernetes, Patch-For-Review: cert-manager created multiple CertificateRequest objects with the same certificate-revision - https://phabricator.wikimedia.org/T304092 (JMeybohm) p:Medium→Low Lowering the priority as alerts are in place now and I don't have high hopes regarding...
[14:02:12] serviceops, Performance-Team, SRE, Traffic: Potential navtiming_responseStart regression as of 13 Mar 2022 - https://phabricator.wikimedia.org/T303782 (Vgutierrez) latest round of HAProxy reimages were performed between March 7th and March 8th: ` * 4d58564f87 - site: Reimage cp1083 as cache::text...
[14:21:02] <_joe_> godog: I merged a change today that should've fixed the issue supposedly?
[14:21:14] <_joe_> (the wikiversions spam)
[14:30:38] _joe_: thank you, yeah though the per-host spam might still be there in similar cases, i.e. spam for each mw host
[14:56:07] serviceops, Prod-Kubernetes, Kubernetes, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move kubernetes workers to bullseye and docker to overlayfs - https://phabricator.wikimedia.org/T300744 (elukey)
[15:14:48] serviceops, Infrastructure-Foundations, SRE: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet - https://phabricator.wikimedia.org/T304237 (Volans)
[15:19:47] serviceops, Infrastructure-Foundations, SRE: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet - https://phabricator.wikimedia.org/T304237 (Volans)
[15:19:53] serviceops, Infrastructure-Foundations, SRE: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet - https://phabricator.wikimedia.org/T304237 (Volans)
[15:39:16] serviceops, Maps, Product-Infrastructure-Team-Backlog, User-jijiki: Connect kartotherian to tegola as a tile backend per cluster - https://phabricator.wikimedia.org/T298248 (Jgiannelos)
[15:39:34] serviceops, Maps, Product-Infrastructure-Team-Backlog, User-jijiki: Connect kartotherian to tegola as a tile backend per cluster - https://phabricator.wikimedia.org/T298248 (Jgiannelos) Open→Resolved a:Jgiannelos
[15:39:40] serviceops, Maps, Product-Infrastructure-Team-Backlog, Patch-For-Review, User-jijiki: Maps 2.0 roll-out plan - https://phabricator.wikimedia.org/T280767 (Jgiannelos)
[15:49:24] serviceops, Infrastructure-Foundations, SRE: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet - https://phabricator.wikimedia.org/T304237 (JMeybohm)
[15:51:32] serviceops, Maps, Product-Infrastructure-Team-Backlog, Patch-For-Review, User-jijiki: Maps 2.0 roll-out plan - https://phabricator.wikimedia.org/T280767 (Jgiannelos)
[15:52:43] serviceops, DC-Ops, SRE, ops-eqiad: Q2: (Need By: TBD) rack/setup/install kubestage100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T290894 (akosiaris)
[15:53:15] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: setup/install kubestage100[34] - https://phabricator.wikimedia.org/T293729 (akosiaris) Open→Resolved This has been done, resolving!
[15:57:52] serviceops: Test running php7.2 and php7.4 in parallel on the beta cluster - https://phabricator.wikimedia.org/T295578 (JMeybohm)
[17:16:54] serviceops, InternetArchiveBot: IABot 301 POST requests - https://phabricator.wikimedia.org/T274090 (Harej) Open→Resolved
[18:35:43] serviceops, Performance-Team, SRE, Traffic: Potential navtiming_responseStart regression as of 13 Mar 2022 - https://phabricator.wikimedia.org/T303782 (Krinkle) a:Krinkle
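The per-host spam godog mentions at 14:30 (one notification per mw host) is what T302832, "aggregate mismatched wikiversions alert", tracks. One way to picture the aggregation is to collapse per-host results into a single alert line. The sketch below is a hypothetical Python illustration: the input shape (hostname mapped to a wikiversions checksum), the host names and the message format are assumptions, not the actual check or the change merged that day.

```python
from collections import Counter
from typing import Dict, Optional


def aggregate_wikiversions(
    per_host: Dict[str, str], expected: Optional[str] = None
) -> Optional[str]:
    """Collapse per-host wikiversions results into a single alert line.

    per_host maps hostname -> wikiversions checksum reported by that host
    (hypothetical input shape). When expected is None, the most common
    checksum is used as the reference. Returns None if nothing mismatches.
    """
    if not per_host:
        return None
    if expected is None:
        expected = Counter(per_host.values()).most_common(1)[0][0]
    mismatched = sorted(host for host, value in per_host.items() if value != expected)
    if not mismatched:
        return None
    return (
        f"wikiversions mismatch on {len(mismatched)}/{len(per_host)} hosts: "
        + ", ".join(mismatched)
    )


# Hypothetical host names and checksums, for illustration only.
print(aggregate_wikiversions({"mw1001": "abc", "mw1002": "def", "mw1003": "abc"}))
# wikiversions mismatch on 1/3 hosts: mw1002
```

Emitting one line per check run instead of one per host is the behaviour the task title asks for; whether the real alert compares against a majority value or a known expected value is not stated in this log.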