[05:22:12] serviceops, MW-on-K8s, Traffic: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (jijiki)
[07:03:20] serviceops, SRE: Pods in evicted state for various namespaces in k8s main - https://phabricator.wikimedia.org/T290444 (elukey) Fine for me, what i had in mind was an alert if a namespace showed evicted pods for too much time (say days) since it seemed to some something that could be missed. Ok to close :)
[07:07:33] serviceops, SRE, Wikifeeds, Patch-For-Review: wikifeeds in codfw seems failing health checks intermittently - https://phabricator.wikimedia.org/T290445 (elukey) We didn't have time to follow up but it may be worth an incident doc. The envoy issue that we faced may be either something that could b...
[07:38:45] serviceops, MW-on-K8s, SRE, Patch-For-Review, Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (akosiaris) Graph with latency percentiles comparing baremetal against both the IPv6 etcd egress rule fixed version and the non fixed v...
[07:48:50] serviceops, SRE: Pods in evicted state for various namespaces in k8s main - https://phabricator.wikimedia.org/T290444 (akosiaris) >>! In T290444#7341138, @elukey wrote: > Fine for me, what i had in mind was an alert if a namespace showed evicted pods for too much time (say days) since it seemed to some s...
[07:49:54] akosiaris: thanks for the explanations --^
[09:23:14] serviceops, decommission-hardware: decommission mc1027.eqiad.wmnet - https://phabricator.wikimedia.org/T281618 (jijiki)
[09:23:57] serviceops, decommission-hardware, ops-eqiad: decommission mc1027.eqiad.wmnet - https://phabricator.wikimedia.org/T281618 (jijiki) a:Cmjohnson
[10:38:47] I think we may have a slight problem with CI
[10:39:04] I got
[10:39:06] "10:36:04 Unable to find image 'docker-registry.wikimedia.org/releng/helm-linter:0.2.17' locally
[10:39:06] 10:36:05 docker: Error response from daemon: manifest for docker-registry.wikimedia.org/releng/helm-linter:0.2.17 not found: manifest unknown: manifest unknown.
[10:39:07] "
[10:41:21] I pinged hashar about it
[10:48:51] it was my bad :)
[11:56:07] effie: someone broke it. Should be fine now
[12:08:15] serviceops, SRE, Wikifeeds, Patch-For-Review: wikifeeds in codfw seems failing health checks intermittently - https://phabricator.wikimedia.org/T290445 (akosiaris) >>! In T290445#7341141, @elukey wrote: > "but I still don't have a complete understanding of why this happened :)" I think that's th...
[12:28:24] jayme: :D
[12:56:13] serviceops, MW-on-K8s, SRE, Traffic: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (akosiaris) p:Triage→Medium
[13:30:41] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Add helmfile validation for the helmfile.d/admin part - https://phabricator.wikimedia.org/T266670 (JMeybohm) a:JMeybohm I also tried to improve the kubeyaml situation (removing the need of parsing the yaml in therakefile). But unfortu...
[13:51:55] jelto: thanks a lot! Will read and fix the code change!
[15:51:34] serviceops, Maps, Product-Infrastructure-Team-Backlog, User-jijiki: Maps 2.0 roll-out plan - https://phabricator.wikimedia.org/T280767 (MSantos) p:Triage→High
[15:53:38] serviceops, Maps, Product-Infrastructure-Team-Backlog, User-jijiki: Maps 2.0 roll-out plan - https://phabricator.wikimedia.org/T280767 (MSantos)
[16:24:01] serviceops, SRE, decommission-hardware, ops-eqiad: decommission mc1027.eqiad.wmnet - https://phabricator.wikimedia.org/T281618 (Cmjohnson)
[16:24:11] serviceops, SRE, decommission-hardware, ops-eqiad: decommission mc1027.eqiad.wmnet - https://phabricator.wikimedia.org/T281618 (Cmjohnson) removed from rack and updated netbox
[16:24:39] serviceops, SRE, decommission-hardware, ops-eqiad: decommission mc1027.eqiad.wmnet - https://phabricator.wikimedia.org/T281618 (Cmjohnson) Open→Resolved
[18:07:46] serviceops, MW-on-K8s, Datacenter-Switchover: Update switchdc cookbooks for mwdebug service - https://phabricator.wikimedia.org/T290676 (Legoktm) p:Triage→High
[18:09:27] serviceops, Datacenter-Switchover: Split switchdc.mediawiki step 08 into a 09 - https://phabricator.wikimedia.org/T290677 (Legoktm)
[18:21:59] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Add helmfile validation for the helmfile.d/admin part - https://phabricator.wikimedia.org/T266670 (JMeybohm) There are three new rake tasks now: - **admin_lint**: Runs helmfile lint, this is fast - **admin_validate**: Runs helmfile te...
[18:25:33] serviceops, Datacenter-Switchover, Patch-For-Review: Split switchdc.mediawiki step 08 into a 09 - https://phabricator.wikimedia.org/T290677 (Legoktm) Open→Resolved Docs updated too: https://wikitech.wikimedia.org/w/index.php?diff=1924829&oldid=1924647&title=Switch_Datacenter&type=revision
[18:56:29] serviceops, SRE, wikidiff2, Community-Tech (CommTech-Sprint-8), Platform Team Workboards (Platform Engineering Reliability): Deploy wikidiff2 1.12.0 - https://phabricator.wikimedia.org/T285857 (ldelench_wmf) Appreciate everyone's help with this! @ArielGlenn this came up at a CommTech retro th...
[20:31:13] serviceops, SRE, wikidiff2, Community-Tech (CommTech-Sprint-8), Platform Team Workboards (Platform Engineering Reliability): Deploy wikidiff2 1.12.0 - https://phabricator.wikimedia.org/T285857 (ArielGlenn) >>! In T285857#7343025, @ldelench_wmf wrote: > Appreciate everyone's help with this! @A...
[22:24:52] serviceops, MW-on-K8s, SRE, Patch-For-Review, User-jijiki: Create a mwdebug deployment for mediawiki on kubernetes - https://phabricator.wikimedia.org/T283056 (dpifke)
[23:44:12] serviceops, MW-on-K8s, Datacenter-Switchover: Update switchdc cookbooks for mwdebug service - https://phabricator.wikimedia.org/T290676 (Legoktm) Open→Resolved