[07:37:21] serviceops: Decommission mc2019-mc2037 - https://phabricator.wikimedia.org/T313733 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: `mc2031.codfw.wmnet` - mc2031.codfw.wmnet (**WARN**) - Downtimed host on Icinga/Alertmanager - Found physical host - //Managemen...
[09:17:12] serviceops: Decommission mc2019-mc2037 - https://phabricator.wikimedia.org/T313733 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: `mc2032.codfw.wmnet` - mc2032.codfw.wmnet (**WARN**) - Downtimed host on Icinga/Alertmanager - Found physical host - //Managemen...
[09:28:25] serviceops, Prod-Kubernetes, Kubernetes: Decide on new Pod and Sevice IP ranges for wikikube clusters - https://phabricator.wikimedia.org/T326617 (JMeybohm)
[09:30:05] serviceops, Prod-Kubernetes, Kubernetes: Decide on new Pod and Sevice IP ranges for wikikube clusters - https://phabricator.wikimedia.org/T326617 (JMeybohm)
[09:30:41] serviceops, Prod-Kubernetes, Kubernetes: Decide on new Pod and Sevice IPv4 ranges for wikikube clusters - https://phabricator.wikimedia.org/T326617 (JMeybohm)
[10:06:14] serviceops: Decommission mc2019-mc2037 - https://phabricator.wikimedia.org/T313733 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: `mc2033.codfw.wmnet` - mc2033.codfw.wmnet (**WARN**) - Downtimed host on Icinga/Alertmanager - Found physical host - //Managemen...
[10:11:42] !log repooled parse1002.eqiad.wmnet - T326119
[10:50:43] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[10:51:07] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update staging-codfw to k8s 1.23 - https://phabricator.wikimedia.org/T326340 (JMeybohm)
[12:18:57] serviceops: Decommission mc2019-mc2037 - https://phabricator.wikimedia.org/T313733 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: `mc2034.codfw.wmnet` - mc2034.codfw.wmnet (**WARN**) - Downtimed host on Icinga/Alertmanager - Found physical host - //Managemen...
[12:39:50] serviceops, SRE, User-Joe: etcd switchover/enhancements - https://phabricator.wikimedia.org/T159687 (LSobanski)
[14:03:57] serviceops: Decommission mc2019-mc2037 - https://phabricator.wikimedia.org/T313733 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: `mc2035.codfw.wmnet` - mc2035.codfw.wmnet (**WARN**) - Downtimed host on Icinga/Alertmanager - Found physical host - //Managemen...
[14:07:19] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update staging-codfw to k8s 1.23 - https://phabricator.wikimedia.org/T326340 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=eff8a645-166c-412e-8f27-b7169d6aa830) set by jayme@cumin1001 for 1 day, 0:00:00 on 6 host(s) an...
[14:28:31] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update staging-codfw to k8s 1.23 - https://phabricator.wikimedia.org/T326340 (ops-monitoring-bot) Cookbook cookbooks.sre.ganeti.reimage was started by jayme@cumin1001 for host kubestagetcd2001.codfw.wmnet with OS bullseye
[14:28:41] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update staging-codfw to k8s 1.23 - https://phabricator.wikimedia.org/T326340 (ops-monitoring-bot) Cookbook cookbooks.sre.ganeti.reimage started by jayme@cumin1001 for host kubestagetcd2001.codfw.wmnet with OS bullseye executed with errors...
[14:33:35] serviceops: Decommission mc2019-mc2037 - https://phabricator.wikimedia.org/T313733 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: `mc2036.codfw.wmnet` - mc2036.codfw.wmnet (**WARN**) - Downtimed host on Icinga/Alertmanager - Found physical host - //Managemen...
[14:50:35] serviceops, MW-on-K8s, SRE, observability: Logging options for apache httpd in k8s - https://phabricator.wikimedia.org/T265876 (Joe) We have now the logs in kafka, and thus should also be ingested in logstash, and create a dashboard. Once that's done, we should reduce also the retention time of...
[14:53:35] serviceops, SRE, Thumbor: Thumbor units failing / service general slowness - https://phabricator.wikimedia.org/T312722 (LSobanski)
[14:56:02] serviceops, MW-on-K8s, SRE, observability: Logging options for apache httpd in k8s - https://phabricator.wikimedia.org/T265876 (Ottomata) If we did {T291645} and {T276972}, these logs could be mirrored to Kafka jumbo and available in Hive and Turnilo too.
[14:56:44] elukey: btullis o/
[14:57:08] I'm in a meeting, but out in 5 minutes.
[14:59:33] yeehaw
[15:00:20] I'm out now. How can I help?
[15:01:27] serviceops: Decommission mc2019-mc2037 - https://phabricator.wikimedia.org/T313733 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jiji@cumin1001 for hosts: `mc2037.codfw.wmnet` - mc2037.codfw.wmnet (**WARN**) - Downtimed host on Icinga/Alertmanager - Found physical host - //Managemen...
[15:02:38] btullis: think we can do https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/876200
[15:02:40] ?
[15:03:47] I'm around fwiw if you need help ottomata
[15:03:47] I'm game if you are.
[15:04:02] i'm game, jayme is game. les go.
[15:04:13] i've never deployed somethign like this
[15:04:21] nobody ever has :-p
[15:04:29] so after merge, what? deployment server, cd helmfile.d/admin_ng
[15:04:36] i can do it with helmfile apply?
[15:04:42] or do I need to do helm install?
[15:04:45] no, def helmfile.
[15:04:45] serviceops, MW-on-K8s, SRE, observability: Logging options for apache httpd in k8s - https://phabricator.wikimedia.org/T265876 (Joe) >>! In T265876#8512693, @Ottomata wrote: > If we did {T291645} and {T276972}, these logs could be mirrored to Kafka jumbo and available in Hive and Turnilo too. Wh...
[15:05:00] there are two releases, the crds and then the operator
[15:05:50] ottomata: you need root (for admin credentials to the cluster)
[15:05:55] Needs a `sudo -i` and then `kube_env admin dse-k8s-eqiad` before `cd helmfile.d/admin_ng `
[15:05:59] k
[15:06:09] the kube_env part is actually not needed
[15:06:15] but sudo is
[15:06:32] then you can do "helmfile -e dse-k8s-eqiad -i apply"
[15:06:36] will it just be a helmfile apply from admin_ng? ah -d
[15:06:36] okay
[15:06:45] k i'll do diff first and see
[15:06:50] there will be quite a diff, prepare!
[15:06:52] heheh
[15:07:14] btullis: (and/or jay me) interested in doing together in screen share? okay if not
[15:07:32] Yep, more than happy.
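Editor's note: the deployment steps agreed on in the exchange above can be summarised in a small sketch. This is only an illustration under stated assumptions, not the team's actual procedure. The function name `admin_ng_deploy_plan` is hypothetical; `sudo -i`, `cd helmfile.d/admin_ng` and `helmfile -e dse-k8s-eqiad -i apply` are quoted from the chat, while the exact form of the `diff` step is an assumption based on "i'll do diff first". The sketch prints the commands and does not run them.

```python
import shlex


def admin_ng_deploy_plan(environment: str, interactive: bool = True) -> list[str]:
    """Return the shell commands discussed in the chat, in the order they would be run.

    Hypothetical helper for illustration only; nothing here is executed.
    """
    env = shlex.quote(environment)
    apply_flags = f"-e {env} -i" if interactive else f"-e {env}"
    return [
        # Root is needed for the admin credentials to the cluster.
        "sudo -i",
        # Relative path exactly as quoted in the chat.
        "cd helmfile.d/admin_ng",
        # Assumed form of the "diff first" step; review the large diff before applying.
        f"helmfile -e {env} diff",
        # Apply command as quoted in the chat (-i asks for confirmation).
        f"helmfile {apply_flags} apply",
    ]


if __name__ == "__main__":
    for command in admin_ng_deploy_plan("dse-k8s-eqiad"):
        print(command)
```

Per the chat, the `kube_env admin dse-k8s-eqiad` step was said to be unnecessary, so it is left out of the list.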
[15:07:49] i can join as well, waiting for reimage anyways
[15:08:18] just started a huddle with you in slack
[15:08:52] I have absolutely no clue what you're talking about :D
[15:09:11] jayme: you on slack?
[15:09:40] * jayme eyes to manager...
[15:09:43] sure
[15:09:46] go to your DMs, there should be a chat with you me ben.
[15:13:29] serviceops, Campaign-Tools, MW-on-K8s: Setup sendmail on k8s container - https://phabricator.wikimedia.org/T325131 (Joe) >>! In T325131#8490881, @Legoktm wrote: > One other note about using a different sendmail (or switching to $wgSMTP / PEAR) is whether the VERP return-path stuff still works, https:...
[15:56:30] serviceops, Campaign-Tools, MW-on-K8s: Setup sendmail on k8s container - https://phabricator.wikimedia.org/T325131 (Joe) Open→Resolved After more digging on our option to have fallbacks for email: Pear::mail expects the host provided to have MX records, and will cycle through them if needed,...
[16:15:27] serviceops, MW-on-K8s, SRE, observability: New mediawiki.httpd.accesslog topic on kafka-logging + logstash and dashboard - https://phabricator.wikimedia.org/T324439 (Clement_Goubert) Changed kafka topic retention time to 2 days instead of the default 7. ` cgoubert@kafka-logging1001:~$ kafka topic...
[19:07:02] serviceops, Infrastructure-Foundations, Scap, Patch-For-Review, Release-Engineering-Team (Deployment Autopilot 🛩️): Use scap to deploy itself to scap targets - https://phabricator.wikimedia.org/T303559 (dancy)
[20:51:38] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Enable TLS on memcached for cross-dc replication - https://phabricator.wikimedia.org/T271967 (jijiki)
[20:51:50] serviceops, Performance-Team, SRE, Patch-For-Review, User-jijiki: Enable "/*/mw-with-onhost-tier/" route for MediaWiki where safe - https://phabricator.wikimedia.org/T264604 (jijiki)
[20:51:59] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), Sustainability (Incident Followup): Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T244852 (jijiki)
[20:52:38] serviceops, SRE, Patch-For-Review, User-jijiki: Upgrade memcached to version 1.6.x - https://phabricator.wikimedia.org/T270315 (jijiki) Open→Resolved a:jijiki Bluntly closing this as we are moving to mediawiki to kubernetes
[20:53:21] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), Sustainability (Incident Followup): Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T244852 (jijiki) Open→Resolved
[21:19:08] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (jijiki)
[21:21:47] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (jijiki) p:Triage→Medium
[22:50:47] ookay btullis , jayme, trying this: not totally sure how to deploy to dse-k8s-eqiad: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/878210