[08:09:42] hello folks!
[08:09:54] I filed a patch for rec-api, to allow prometheus metrics
[08:09:55] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/983403
[08:10:31] if anybody has time for a quick check lemme know :) (today IIUC it is not freezed so I wanted to test the new docker image + metrics in staging etc..)
[08:14:02] serviceops, Infrastructure-Foundations, Puppet-Core, SRE, and 4 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[09:41:55] elukey: done
[09:42:08] <3
[10:51:16] serviceops, MW-on-K8s, Patch-For-Review: Handle sidecar containers in one-off Kubernetes jobs - https://phabricator.wikimedia.org/T348284 (JMeybohm) Bummer...the change you proposed would require us to deploy one sidecar-controller per namespace (probably this is the yak you are looking for :-)) - wh...
[12:04:33] worked nicely!
[12:04:46] I have updated https://grafana-rw.wikimedia.org/d/Y5wk80oGk/recommendation-api to show native prometheus metrics as well
[12:05:01] in theory we could deploy to prod and complete the work, lemme know what you think
[12:08:15] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/983694
[12:20:21] elukey: IIRC the only user of that thing is the android app ?
[12:20:31] or are you going to morph it into something more after all?
[13:37:52] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube control planes to hardware nodes - https://phabricator.wikimedia.org/T353464 (akosiaris) I am not so sure we actually do scratch that memory limit now. Looking at [kubemaster2001 last week](https://grafana.wikimedia.org/d/000000377/host-overvi...
[13:58:14] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube control planes to hardware nodes - https://phabricator.wikimedia.org/T353464 (JMeybohm) >>! In T353464#9412761, @akosiaris wrote: > I am not so sure we actually do scratch that memory limit now. Looking at [kubemaster2001 last week](https://gr...
[14:28:28] akosiaris: o/ yes correct, I don't know exactly what is the plan but IIUC the "new" python-based rec-api-ng (on lift wing) should take over (eventually)
[14:28:41] of course even the new codebase doesn't have any owner
[15:23:30] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube control planes to hardware nodes - https://phabricator.wikimedia.org/T353464 (bking) Forgive me for the drive-by comment, but would it be possible to create high IOPS tiers for Ganeti (RAID-0?) I'd recommend deploying in conjunction with non-D...
[15:31:40] rolling back rec-api, the prometheus metrics need some refinement
[16:00:14] serviceops, Dumps-Generation, Infrastructure-Foundations, SRE-tools, IPv6: Some Service Operations clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271142 (akosiaris) >>! In T271142#9382040, @Volans wrote: > Another datapoint for the mw*/parse* clusters, they will...
[16:15:20] serviceops, Data-Platform-SRE (2023/24 Q2 Milestone 1), Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (pfischer)
[16:48:15] serviceops, Dumps-Generation, Infrastructure-Foundations, SRE-tools, IPv6: Some Service Operations clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271142 (akosiaris) Regarding the mc* hosts, I 've been mulling over this one for some time now trying to figure out t...
[16:58:52] serviceops, MW-on-K8s, Patch-For-Review: Handle sidecar containers in one-off Kubernetes jobs - https://phabricator.wikimedia.org/T348284 (RLazarus) Oh, I misunderstood what you meant by "enable the controller on a per namespace level" [[ #9392506 | above ]]! I thought deploying one instance per name...
[17:03:59] brouberol elukey any concerns with running these commands against kafka-jumbo? https://phabricator.wikimedia.org/T351503#9381039 . If there are any commands I need to run to check LMK
[17:04:38] inflatador: fine to me!
[17:05:02] before starting you can check with `kafka topics --describe` how many partitions there are now
[17:05:28] I am not 100% sure how kafka mirror handles multiple partitions from consumer/producer
[17:09:51] elukey excellent, thanks...I'll ask David/Peter if they ran into issues w/mirroring on kafka-main since it's been set up for awhile
[17:13:19] serviceops, Phabricator, collaboration-services, Release-Engineering-Team (Bonus Level 🕹️): Deprecate git-ssh service on phabricator.wikimedia.org - https://phabricator.wikimedia.org/T296022 (Dzahn) @Aklapper Yea, weak "yes". The cloud one, sure, just a comment and not relevant anymore. The other...
[17:15:16] Inflatador: I’m afk. I defer to e.lukey’s opinion
[17:15:19] serviceops, Content-Transform-Team-WIP, Page Content Service, RESTBase Sunsetting: Update mobileapps k8s deployment chart for Cassandra credentials - https://phabricator.wikimedia.org/T350507 (hnowlan) Am I right in thinking that mobileapps itself has had no code changes to that allow it to commu...
[17:16:38] hmm, looks like there is no kafka-jumbo in CODFW...not sure if the mirroring is still a consideration since there is some mirroring between main/jumbo
[17:17:48] serviceops, Phabricator, collaboration-services, Release-Engineering-Team (Bonus Level 🕹️): Deprecate git-ssh service on phabricator.wikimedia.org - https://phabricator.wikimedia.org/T296022 (Dzahn) it never ends: ` data/fixed_settings.yaml:diffusion.ssh-user: 'vcs' files/phab_deploy_config_dep...
[17:33:11] inflatador: do those topics exist on kafka main? I think they do? we should probably alter in both places to keep consistent
[17:36:46] ottomata indeed they do...partition count has already been increased on main, so this change makes them consistent
[17:42:34] ah okay, great ty
[17:50:45] serviceops, envoy, observability, Patch-Needs-Improvement: Envoy should listen on ipv6 and ipv4 - https://phabricator.wikimedia.org/T255568 (Dzahn) per comments on https://gerrit.wikimedia.org/r/c/operations/puppet/+/983893/ "grepping thru the Puppet repo shows 11 instances of `profile::tlsproxy...
[18:17:03] serviceops, envoy, observability, Patch-Needs-Improvement: Envoy should listen on ipv6 and ipv4 - https://phabricator.wikimedia.org/T255568 (akosiaris) >>! In T255568#9413725, @Dzahn wrote: > per comments on https://gerrit.wikimedia.org/r/c/operations/puppet/+/983893/ "grepping thru the Puppet r...
[18:25:55] serviceops, Content-Transform-Team-WIP, Page Content Service, RESTBase Sunsetting: Update mobileapps k8s deployment chart for Cassandra credentials - https://phabricator.wikimedia.org/T350507 (Jgiannelos) Not yet, we are building a small npm package for that but haven't merged anything on PCS yet...
[20:30:38] serviceops, LandingCheck, MW-on-K8s, MediaWiki-extensions-WikimediaEvents, and 4 others: PHP Warning: geoip_country_code_by_name(): Required database not available at /usr/share/GeoIP/GeoIP.dat. - https://phabricator.wikimedia.org/T352156 (XenoRyet)
[23:43:44] serviceops, SRE, ops-codfw: Php-fpm failed to start on one host during Scap (ssh to mw2448.codfw.wmnet timed out) - https://phabricator.wikimedia.org/T353679 (taavi) I've set this host as `pooled=inactive` in conftool. Tagging #serviceops since this is their host and #ops-codfw directly given the his...
[23:45:40] serviceops, SRE, ops-codfw: mw2448.codfw.wmnet is down - https://phabricator.wikimedia.org/T353679 (taavi)
[23:52:20] serviceops, Phabricator, collaboration-services, Release-Engineering-Team (Bonus Level 🕹️): Deprecate git-ssh service on phabricator.wikimedia.org - https://phabricator.wikimedia.org/T296022 (Aklapper) @Dzahn: Eh, sorry if I opened a can of worms. Feel free to leave as-is then?