[07:56:43] <wikibugs>	 serviceops, SRE, observability, Patch-For-Review, and 2 others: Create an alert for high memcached bw usage - https://phabricator.wikimedia.org/T224454 (elukey) An optional (but in my opinion useful) alert could be related to a prolonged usage of the gutter pool, that is not something we wish for...
[09:00:49] <claime>	 Mornin'
[10:01:47] <wikibugs>	 serviceops, MW-on-K8s, Release-Engineering-Team (Priority Backlog 📥): Provide an mwdebug functionality on kubernetes - https://phabricator.wikimedia.org/T276994 (Clement_Goubert) I created a [[ https://logstash.wikimedia.org/app/dashboards#/view/c8fa7480-6a48-11ed-83a4-ab884db3ba3b | mw-debug (k8s) ]...
[10:13:53] <_joe_>	 claime: cool
[10:14:21] <claime>	 I'm trying to create the grafana dashes for the other deployments
[10:30:07] <godog>	 hi -- re: confd + mw + k8s I was wondering if you had time/bandwidth to think about https://phabricator.wikimedia.org/T322523 ?
[10:49:35] <jayme>	 elukey: new pause container works as expected with 1.23 btw
[10:52:03] <elukey>	 \o/
[11:50:53] <_joe_>	 hnowlan: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/856950 (diff at https://integration.wikimedia.org/ci/job/helm-lint/8405/console)
[11:51:05] <_joe_>	 to convert api-gateway to modules
[11:54:34] <wikibugs>	 serviceops, SRE, Traffic, Patch-For-Review: _etcd-client SRV record missing for conftool cluster - https://phabricator.wikimedia.org/T320397 (Vgutierrez) Open→Resolved a:Joe ` vgutierrez@lvs6001:~$ ./liberica etcd --config /home/vgutierrez/config.yaml  Using config file: /home/vgutier...
[11:56:49] <hnowlan>	 _joe_: nice, lgtm 
[11:59:32] <hnowlan>	 in case this is useful for anyone, I got sick of continuously forgetting to bump Chart.yaml when I change a chart so I wrote a pre-commit hook: https://gist.github.com/nosmo/306d50581f4069958206d772b6d49176 
[11:59:52] <hnowlan>	 there are probably edge cases, but It'll Do
[12:00:06] <claime>	 Noice.
[12:11:12] <_joe_>	 I kind-of did it in rake
[12:11:28] <_joe_>	 I was wondering if we wanted to add it as a pre-commit hook
[12:36:30] <akosiaris>	 yeah, it's been bugging me too. And it clearly annoys users as well, we should at least have it in CI
[12:56:24] <_joe_>	 ideally, it should ask
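For readers wondering what such a check involves, here is a minimal, hypothetical Python sketch. It is not the code in hnowlan's gist or _joe_'s rake task. It assumes each chart lives under `charts/<name>/` with a top-level `version:` line in `Chart.yaml`, and it flags charts whose files are staged while that version is unchanged between `HEAD` and the index. The names `chart_version`, `charts_needing_bump` and `main` are illustrative only.

```python
import re
import subprocess
import sys

# Assumption: each chart lives in charts/<name>/ and its Chart.yaml has a
# top-level "version:" line. Adjust CHART_DIR for a different layout.
CHART_DIR = re.compile(r"^charts/([^/]+)/")
VERSION_LINE = re.compile(r"^version:\s*['\"]?([^'\"\s]+)", re.MULTILINE)


def chart_version(chart_yaml_text):
    """Return the top-level chart version from Chart.yaml text, or None."""
    match = VERSION_LINE.search(chart_yaml_text or "")
    return match.group(1) if match else None


def charts_needing_bump(changed_files, old_chart_yaml, new_chart_yaml):
    """Return sorted names of charts with changed files but an unchanged version.

    old_chart_yaml / new_chart_yaml map chart name -> Chart.yaml text
    (e.g. HEAD vs. staged). A chart missing from old_chart_yaml is treated
    as new and is not flagged.
    """
    touched = {m.group(1) for f in changed_files if (m := CHART_DIR.match(f))}
    stale = []
    for chart in sorted(touched):
        old = old_chart_yaml.get(chart)
        if old is None:
            continue  # brand-new chart: nothing to compare against
        if chart_version(old) == chart_version(new_chart_yaml.get(chart, "")):
            stale.append(chart)
    return stale


def _git_show(spec):
    """Return file contents for a git object spec, or None if it does not exist."""
    result = subprocess.run(["git", "show", spec], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


def main():
    """Pre-commit entry point: compare HEAD with the index for touched charts."""
    changed = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    charts = {m.group(1) for f in changed if (m := CHART_DIR.match(f))}
    old, new = {}, {}
    for chart in charts:
        path = f"charts/{chart}/Chart.yaml"
        head = _git_show(f"HEAD:{path}")
        if head is not None:
            old[chart] = head
        new[chart] = _git_show(f":{path}") or ""  # ":<path>" is the staged copy
    stale = charts_needing_bump(changed, old, new)
    for chart in stale:
        print(f"charts/{chart}: files changed but Chart.yaml version not bumped",
              file=sys.stderr)
    return 1 if stale else 0
```

Saved as an executable `.git/hooks/pre-commit` script that ends with `sys.exit(main())`, this would block a commit when a bump is missing. A CI job could reuse `charts_needing_bump` with the file list and `Chart.yaml` contents taken from the change under review. It blocks rather than asks, so it stops short of the interactive behaviour _joe_ describes.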
[12:57:24] <wikibugs>	 serviceops, SRE, Thumbor, Thumbor Migration, and 2 others: Encoding issues when handling unicode characters in filenames - https://phabricator.wikimedia.org/T323114 (hnowlan) Open→Resolved
[12:57:26] <wikibugs>	 serviceops, SRE, Thumbor, Thumbor Migration, and 2 others: Migrate thumbor to Kubernetes - https://phabricator.wikimedia.org/T233196 (hnowlan)
[14:38:38] <elukey>	 hnowlan: o/
[14:39:26] <elukey>	 I am checking https://istio.io/latest/docs/tasks/policy-enforcement/rate-limit as possibility to add basic rate limits to our istio ingresses (if needed). What do we use for the api-gw?
[14:40:19] <elukey>	 (in my case the rate limit would be needed as basic protection for the internal endpoint of the ml-serve clusters, inference.discovery.wmnet, since multiple people will query it from Hadoop etc..)
[14:40:26] <elukey>	 (so bypassing the api-gw)
[14:50:07] <hnowlan>	 elukey: we use the Envoy ratelimit implementation mentioned in those docs (https://github.com/envoyproxy/ratelimit) 
[14:50:14] <wikibugs>	 serviceops, Maps: Re-import full planet data into eqiad and codfw - https://phabricator.wikimedia.org/T314472 (jijiki) Open→In progress
[14:50:15] <hnowlan>	 it runs as a sidecar in each apigw pod 
[14:51:44] <elukey>	 hnowlan: ah nice and does it use redis as well?
[14:52:05] <elukey>	 or is it a local rate limit for each api-gw pod?
[14:52:15] <wikibugs>	 serviceops, Maps: Re-import full planet data into eqiad and codfw - https://phabricator.wikimedia.org/T314472 (jijiki) Planet import in eqiad (on maps1009) started at 11:53 UTC
[14:54:13] <_joe_>	 elukey: yes it uses redis
[14:54:44] <elukey>	 ack thanks
[14:56:04] <elukey>	 I see the rdb nodes specified in the config
[14:57:02] <elukey>	 would it be ok to have an instance of Redis for other clusters as well on rdb nodes? (trying to understand if I need one on ML-specific nodes or if rdb could be used for this use case)
[14:57:37] <_joe_>	 elukey: not sure there's still room on the rdbs
[14:57:42] <_joe_>	 but in theory, yes
[14:57:47] <_joe_>	 you could use them
[14:57:57] <_joe_>	 clearly though, the moment you do so
[15:00:33] <elukey>	 (there is also nutcracker of course, sigh..)
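The thread above says api-gateway relies on the envoyproxy/ratelimit service running as a sidecar, with Redis holding the counters, which is what lets limits apply across pods rather than per pod. As a rough illustration of the counting idea only, here is a fixed-window limiter in Python, one common approach to this kind of limit. A plain dict stands in for Redis, and the class and its parameters (`FixedWindowLimiter`, `limit`, `window_s`, `clock`) are hypothetical; this does not claim to mirror the service's internals, key format, or configuration.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Fixed-window rate limiter sketch.

    Assumption: a dict stands in for Redis here. A shared store would INCR
    a per-window key and set an EXPIRE on it so old windows disappear.
    """

    def __init__(self, limit: int, window_s: int,
                 clock: Callable[[], float] = time.time):
        self.limit = limit          # max requests per descriptor per window
        self.window_s = window_s    # window length in seconds
        self.clock = clock          # injectable for testing
        self.counters: Dict[Tuple[str, int], int] = {}

    def allow(self, descriptor: str) -> bool:
        # Bucket the current time into the start of its window.
        window_start = int(self.clock()) // self.window_s * self.window_s
        key = (descriptor, window_start)
        # Equivalent of Redis INCR: count this request in the current window.
        self.counters[key] = self.counters.get(key, 0) + 1
        # Drop counters from earlier windows (what EXPIRE would do in Redis).
        for old_key in [k for k in self.counters if k[1] < window_start]:
            del self.counters[old_key]
        return self.counters[key] <= self.limit
```

Every caller of `allow()` increments the same key for a given descriptor and window, so moving the counters into a shared store such as Redis (INCR plus EXPIRE on the window key) is what makes the limit global across replicas. A per-pod dict, as here, would give each pod its own budget, which is the "local rate limit" alternative elukey asked about.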
[15:08:04] <_joe_>	 hnowlan: uh why do we have both cassandra-http-gateway and image-suggestion-api as charts?
[15:09:27] <hnowlan>	 _joe_: afaik image-suggestion-api is an older project that never got deployed 
[15:09:40] <_joe_>	 oh ok
[15:09:46] <_joe_>	 so we can maybe remove the chart, you mean?
[15:10:12] <_joe_>	 uh we also have a deployment
[15:10:39] <_joe_>	 but not a namespace in production
[15:11:47] <_joe_>	 ok, let's remove it then
[15:11:49] <_joe_>	 :)
[15:13:46] <hnowlan>	 let me confirm 
[15:13:59] <_joe_>	 I'll prepare a patch in the meantime 
[15:26:50] <akosiaris>	 https://groups.google.com/a/kubernetes.io/g/dev/c/sEVopPxKPDo/m/9ME3CzicBwA
[15:27:22] <akosiaris>	 interesting. etcd 3.4 and 3.5 can croak and corrupt data 
[15:28:00] <akosiaris>	 the "usually there is no data loss" part sounds particularly reassuring (not)
[15:34:05] <_joe_>	 akosiaris: the worst part is
[15:34:08] <_joe_>	 "This issue only affects etcd clusters where auth is enabled."
[15:34:15] <_joe_>	 that smells like code rot tbh
[15:34:58] <akosiaris>	 for the second issue
[15:35:07] <akosiaris>	 but the first issue isn't reassuring either
[15:35:15] <akosiaris>	 not that we ever do defragmentation
[15:35:24] <akosiaris>	 but imagine if we found ourselves wanting to
[15:36:04] <_joe_>	 we're still on 3.3 :P
[15:54:27] <wikibugs>	 serviceops, MW-on-K8s, SRE, Traffic, Release-Engineering-Team (Seen): Make mw-web and mw-api-ext available behind LVS - https://phabricator.wikimedia.org/T323621 (Clement_Goubert)
[16:00:47] <wikibugs>	 serviceops, MW-on-K8s, SRE, Traffic, Release-Engineering-Team (Seen): Make mw-web and mw-api-ext available behind LVS - https://phabricator.wikimedia.org/T323621 (Clement_Goubert) Open→In progress
[16:01:03] <wikibugs>	 serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786 (Clement_Goubert)
[19:02:22] <wikibugs>	 serviceops, DC-Ops, SRE, ops-eqiad: Decommission mw13[07-48] - https://phabricator.wikimedia.org/T306162 (wiki_willy) a:Jclark-ctr
[21:29:25] <wikibugs>	 serviceops, SRE, observability, Patch-For-Review, and 2 others: Create an alert for high memcached bw usage - https://phabricator.wikimedia.org/T224454 (jijiki) >>! In T224454#8411988, @elukey wrote: > An optional (but in my opinion useful) alert could be related to a prolonged usage of the gutte...
[21:35:54] <wikibugs>	 serviceops, SRE, good first task: Upgrade all deployment charts to use the latest version of common_templates - https://phabricator.wikimedia.org/T292390 (Aklapper)