[00:02:41] serviceops, MediaWiki-Cache, MediaWiki-General, Performance-Team, User-jijiki: Use monotonic clock instead of microtime() for perf measures in MW PHP - https://phabricator.wikimedia.org/T245464 (Krinkle) Ah, I forgot about that. Agreed yeah, it's just for CI and dev then.
[07:02:24] serviceops, MW-on-K8s, Kubernetes: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (JMeybohm) >>! In T284628#7337156, @dancy wrote: > @JMeybohm By the way, I think I managed to get the 'jenkins' k8s account auto-banned in the staging...
[08:58:39] jelto: o/ I came up with https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/719128, that is basically a ml-services dir in helmfile.d (to separate concerns)
[08:59:40] it is a proposal for the helm3 service deployment for ML, if you have time later on I'd really like to know your opinion
[08:59:57] there are surely a lot of things to improve
[09:05:25] elukey: I will take a look later today!
[09:05:33] <3
[09:27:45] serviceops, SRE Observability (FY2021/2022-Q1), User-fgiunchedi, User-jijiki: Handle unknown stats in rsyslog_exporter - https://phabricator.wikimedia.org/T210137 (fgiunchedi) Upstream PR https://github.com/aleroyer/rsyslog_exporter/pull/5
[09:50:37] serviceops, SRE Observability (FY2021/2022-Q1), User-fgiunchedi, User-jijiki: Handle unknown stats in rsyslog_exporter - https://phabricator.wikimedia.org/T210137 (fgiunchedi) Open→Resolved a:fgiunchedi Rollout is completed, please reopen if sth is amiss
[10:25:28] serviceops, GitLab, Release-Engineering-Team (Next), User-brennen: GitLab major version upgrade: 14.x - https://phabricator.wikimedia.org/T289802 (Jelto) > @Jelto I'll coordinate with you tomorrow, but if you want to go ahead with the upgrade on gitlab2001 before I'm online, feel free. I upgrade...
[13:34:38] serviceops, GitLab, Patch-For-Review, Release-Engineering-Team (Next), User-brennen: GitLab major version upgrade: 14.x - https://phabricator.wikimedia.org/T289802 (Jelto) I updated `gitlab-runner` to `14.2.0` and `gitlab-ce` to `14.1.5-ce.0` on `apt1001`.
[14:39:20] serviceops, MW-on-K8s, Kubernetes: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (dancy) >>! In T284628#7338653, @JMeybohm wrote: >>>! In T284628#7337156, @dancy wrote: >> @JMeybohm By the way, I think I managed to get the 'jenkins'...
[14:52:59] serviceops, GitLab, Release-Engineering-Team (Next), User-brennen: GitLab major version upgrade: 14.x - https://phabricator.wikimedia.org/T289802 (Jelto) I updated `gitlab-ce` to `14.2.3-ce.0` on `apt1001`
[15:15:23] serviceops, MW-on-K8s, Kubernetes: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (dancy) I logged into releases1002 fresh today and helm commands are working. Not sure what happened but I'll report more here if I find out. No acti...
[15:19:54] serviceops, GitLab, Release-Engineering-Team (Next), User-brennen: GitLab major version upgrade: 14.x - https://phabricator.wikimedia.org/T289802 (brennen) Open→Resolved a:brennen We're at 14.2.3 on both gitlab2001 and gitlab1001. Shared runners are at 14.2.0 and restarted without in...
[15:20:13] serviceops, MW-on-K8s, Kubernetes: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (Jelto) @dancy Thanks for finding this issue! I pretty sure its related to some RBAC refactoring I did in [715498](https://gerrit.wikimedia.org/r/c/op...
[15:22:15] serviceops, GitLab, Release-Engineering-Team (Doing), User-brennen: GitLab major version upgrade: 14.x - https://phabricator.wikimedia.org/T289802 (brennen)
[15:26:09] serviceops, MW-on-K8s, Kubernetes: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (dancy) @akosiaris @Jelto @JMeybohm Sorry for the false alarm. I figured out why things stopped working. I was working on releases1002.eqiad.wmet....
[15:32:03] serviceops, SRE, wikidiff2, Community-Tech (CommTech-Sprint-8), Platform Team Workboards (Platform Engineering Reliability): Deploy wikidiff2 1.12.0 - https://phabricator.wikimedia.org/T285857 (akosiaris) @ArielGlenn Thanks for taking over this. Let us know if you need any help!
[15:38:06] serviceops, SRE: Pods in evicted state for various namespaces in k8s main - https://phabricator.wikimedia.org/T290444 (akosiaris) For what is worth, evictions are not a bad thing per se in kubernetes. They can happen for a variety of reasons, notably: * `DiskPressure` -- Usable disk is running out on th...
[15:41:39] serviceops, SRE: Pods in evicted state for various namespaces in k8s main - https://phabricator.wikimedia.org/T290444 (akosiaris) Open→Resolved a:akosiaris Per the above the answer to `Is it normal that pods are in this state? If not, let's investigate and then add an alarm :)` is "Mostly...
[15:56:59] serviceops, SRE, Kubernetes, Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (Jelto) Some additional RBAC requirements: on `releases1002` and `releases2002` helm is used as well. So when migrating, we have to make sure that the [user](https://gerrit.wikimedia.org...
[16:18:18] serviceops, Prod-Kubernetes, Shellbox, Kubernetes, Patch-For-Review: Docker container logs (stdout, stderr) can grow quite large - https://phabricator.wikimedia.org/T289578 (akosiaris) >>! In T289578#7311945, @JMeybohm wrote: > Did you try the logrotate approach? Yes, I had at some testing i...
[16:25:13] serviceops, SRE, Wikifeeds, Patch-For-Review: wikifeeds in codfw seems failing health checks intermittently - https://phabricator.wikimedia.org/T290445 (akosiaris) To everyone involved, should we have an incident doc about this? Given the amount of people involved and the amount of time that went...
[16:25:24] serviceops, SRE, Wikifeeds, Patch-For-Review: wikifeeds in codfw seems failing health checks intermittently - https://phabricator.wikimedia.org/T290445 (akosiaris) p:Triage→Low
[18:13:26] serviceops, GitLab, Release-Engineering-Team (Yak Shaving 🐃🪒), User-brennen: GitLab major version upgrade: 14.x - https://phabricator.wikimedia.org/T289802 (thcipriani)
[19:54:19] serviceops, MW-on-K8s, SRE, Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (Krinkle)