[02:41:43] <wikibugs>	 10serviceops, 10SRE, 10Patch-For-Review: Delay spinner showing for graphs for 1s - https://phabricator.wikimedia.org/T256641 (10Seddon) a:05Seddon→03None
[06:55:52] <_joe_>	 so, given I'm working today
[06:56:37] <_joe_>	 jelto/jayme/mutante/effie: I'm going to go through how I looked at the logs the other day during the apcu outage to nail down problematic requests later in the morning
[06:56:46] <_joe_>	 let's say in a couple hours?
[06:57:41] <jelto>	 joe: I have a meeting the next 1.5hr, then I'm available and would be happy to join the session :)
[06:58:04] <jayme>	 depends on definition of couple for me :) I'll be in transit from ~10:30Z for 4h
[07:01:08] <elukey>	 hello folks, I have another istio-related question (please be patient)
[07:01:32] <elukey>	 after reading a ton of logs to find why the current set up is not working, I saw this on the kubeapi server
[07:01:35] <elukey>	 failed calling webhook "validation.istio.io": Post https://istiod.istio-system.svc:443/validate?timeout=30s: dial tcp 10.64.77.73:443: i/o timeout
[07:02:06] <elukey>	 so istiod (the only pod up, the control plane) tries at first to validate a wrong config via validation webhook, and fails (so all the rest is not created)
[07:02:35] <elukey>	 in theory the kube api should be able to call the istiod validation webhook when needed 
[07:02:38] <elukey>	 but this is not working
[07:02:58] <elukey>	 is there a special calico config that I should look up, or something else in your opinion?
[07:05:05] <wikibugs>	 10serviceops, 10SRE, 10Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (10Legoktm) a:03Legoktm I've tried to summarize a combination of what I did and the feedback here into https://wikitech.wikimedia.org/wiki/Switch_Dat...
[07:06:57] <jayme>	 elukey: the FQDN looks weird in first place but let's ignore as it is looked up. Is 10.64.77.73 the IP of your istiod pod?
[07:07:20] <wikibugs>	 10serviceops, 10MW-on-K8s, 10SRE, 10Shellbox, and 3 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (10Samwilson) The 1.36 release notes say that "Command::execute() now returns a Shellbox\Command\UnboxedResult instead of a MediaWiki\Shell\Result....
[07:07:56] <jayme>	 elukey: do you have proper ingress policies in place for the apiserver(s) to reach the pod?
[07:10:40] <elukey>	 jayme: for the IP it should be, there is also an svc called "istiod" with the following
[07:10:43] <elukey>	 Port:              https-webhook  443/TCP
[07:10:45] <elukey>	 TargetPort:        15017/TCP
[07:11:13] <elukey>	 about the ingress policies, probably not, I have never added them (this is what I was asking about)
[07:11:36] <elukey>	 ah now I remember that our GlobalNetworkPolicies are empty, Alex at the time said that we should have done it in a second step
[07:12:21] <elukey>	 ok so I guess I have to mess with that 
[07:13:10] <jayme>	 I'm not 100% what the calico default is when you've not defined anything
[07:13:20] <elukey>	 misery probably
[07:13:35] <jayme>	 I would have assumed "allow all" tbh
[07:13:42] <wikibugs>	 10serviceops, 10SRE, 10Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (10Joe) After talking off-phabricator with a few people, I think what we have seen is more of a failure of coordination between affected SRE teams than...
[07:14:57] <jayme>	 but take a look at the rules we have in main.yaml. You'll get an idea
[07:16:43] <jayme>	 maybe https://docs.projectcalico.org/security/app-layer-policy would be a good read as well. Don't know if that applies to you or is just about sidecar injection stuff
[07:18:47] <elukey>	 I'll try to read it, in theory at the moment we don't need the sidecar injection magic since we don't use it, but anything can be true at this point :D
[07:19:30] <elukey>	 I am pretty sure that if I unblock the kube api -> istiod webhook comms it should work, but there may be more to do
[07:19:44] <elukey>	 I'll try to also check that defaults the calico global net policies have
[07:20:13] <elukey>	 in issues like https://github.com/istio/istio/issues/19532 they say "update the firewall rules" that is of course very helpful :D
[07:22:34] <_joe_>	 elukey: so before going to blindly change stuff, I'd just go and do some blackbox debugging the old-fashioned way
[07:22:48] <_joe_>	 I can help with that if you want
[07:23:19] <_joe_>	 elukey: how do I select your cluster/namespace with kube_env?
[07:23:44] <_joe_>	 ml-serve-eqiad, I see
[07:25:01] <elukey>	 _joe_ I am not going to blindly change stuff, I am following what suggested by upstream and what logs are pointing to :)
[07:25:16] <_joe_>	 which seem quite unclear
[07:25:41] <_joe_>	 that's why I said "blindly", as in "we're not sure what's not working exactly
[07:26:09] <_joe_>	 sorry I wasn't suggesting you were acting recklessly :)
[07:27:16] <elukey>	 the thing that it is not working, IIUC, is that when istiod tries to validate a "Wrong" config as pre-check, it calls the kube-api that in turn has to call the istiod validation webhook, and this times out for some reason
[07:27:44] <elukey>	 I don't have a solid idea if it is Calico not allowing it or something else
[07:27:51] <_joe_>	 so calling istiod from the kube-api seems to be the issue
[07:31:17] <elukey>	 the IP mentioned in the errors (to follow up to what Janis asked, I was wrong) is
[07:31:20] <elukey>	 elukey@ml-serve-ctrl1001:~$ kubectl get svc -n istio-system
[07:31:23] <elukey>	 NAME     TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                                 AGE
[07:31:26] <elukey>	 istiod   ClusterIP   10.64.77.73   <none>        15010/TCP,15012/TCP,443/TCP,15014/TCP   40h
[07:31:50] <elukey>	 and it does
[07:31:51] <elukey>	 Port:              https-webhook  443/TCP
[07:31:51] <elukey>	 TargetPort:        15017/TCP
[07:32:10] <elukey>	 (selector is istio, so it should map to the istiod pod's 15017 in theory)
[07:33:07] <jayme>	 makes sense to be a service ip rather than a pod ip, sorry
[07:33:24] <elukey>	 nono sorry me, I should have better checked :)
[07:33:43] <elukey>	 I tried nsenter to check the port on the pod etc.. and it looks working
[07:35:53] <jayme>	 kubectl get ep -n istio-system gives you the correct endpoints as well, right?
[07:36:38] <elukey>	 ah nice TIL!
[07:36:46] <elukey>	 yes target is the istiod pod ip
[07:40:37] <jayme>	 I think it's a firewall issue
[07:41:15] <jayme>	 "Calico network policy is default deny." the docs say
[07:41:53] <_joe_>	 jayme: I don't think so
[07:42:00] <jayme>	 me neither
[07:42:09] <jayme>	 maybe different for k8s
[07:42:11] <_joe_>	 from ml-serve1004 I can reach the ip:port
[07:42:23] <_joe_>	 same from ml-serve1005
[07:42:30] <_joe_>	 sorry 1003
[07:42:52] <_joe_>	 because calico's running there
[07:43:01] <jayme>	 ah, makes sense
[07:43:13] <jayme>	 we're not running it at all on masters
[07:43:15] <elukey>	 I imagine that I am the first one testing a validation webhook 
[07:43:22] <_joe_>	 while I can't reach it from e.g. the ml-serve-ctrl1001 server, where calico isn't running
[07:43:24] <jayme>	 you are elukey
[07:43:25] <_joe_>	 elukey: yes
[07:43:45] <_joe_>	 elukey: I think it was even a deliberate choice of ours at the time, but alex would rememeber better
[07:44:03] * elukey take notes "Blame Alex"
[07:46:26] <_joe_>	 elukey: always blame alex and istio
[07:47:28] <apergos>	 what what about vo lans? 
[07:47:32] <apergos>	 *wait what
[07:48:06] <elukey>	 of course we don't forget about Riccardo
[07:48:09] <_joe_>	 apergos: we only blame volans when python is involved
[07:48:24] <apergos>	 oooohhh my bad, I was just blaming all the time tbh :-P
[07:48:24] <_joe_>	 this is golang all the way down
[07:48:43] <apergos>	 ohdear :-D
[07:48:44] <elukey>	 _joe_ I think that he got 3 pings for highlights (username, Riccardo, python)
[07:48:54] <_joe_>	 apergos: also you can assume blaming him is ok if you get frustrated at any linter
[07:49:00] * elukey would love to see Riccardo's IRC client
[07:49:05] <apergos>	 whew gtk :-D
[07:49:06] <_joe_>	 elukey: and we didn't put any cumin in the spice mix
[07:49:16] <jayme>	 so this situation is the same on staging clusters (where the master is unable to reach service/pod IPs)
[07:49:38] <jayme>	 confirmed it's not a policy issue then
[07:49:39] <elukey>	 ahahha yes we were derailing a bit the conversation :D
[07:50:47] <_joe_>	 elukey: we were establishing ground rules for blaming, it's an important detour
[07:50:58] <_joe_>	 jayme: yes, they need to run calico on the masters
[07:51:10] <elukey>	 I assume it is not as simple as including profile::calico::kubernetes
[07:51:10] <_joe_>	 now, how can they do so? aren't we running calico in-cluster now?
[07:51:25] <_joe_>	 I completely lost track tbh
[07:51:42] <jayme>	 yep. calico-node is running as daemonset
[07:52:51] <elukey>	 and for the masters to run it in theory they should bgp-peer with the routers as well
[07:52:59] <jayme>	 damn...I totally did not think about this at the time of building the new calico stuff
[07:55:02] <_joe_>	 jayme: well we can run it as a docker process maybe on the masters?
[07:55:37] <_joe_>	 jayme: do we need calico-node for being able to route to calico addresses though?
[07:55:46] <_joe_>	 I think it's only needed to set up local IP addresses
[07:56:11] <jayme>	 I'm not sure tbh
[07:56:48] <elukey>	 I am glad to deliver joy to your team folks
[07:57:03] <jayme>	 the docs do hide that fact pretty well (only talking about calico-node needs to run on every node)
[07:57:13] <elukey>	 I'll tell Chris to prep something to ship as gift :D
[07:57:34] <jayme>	 but the k8s manifests do state that they want the daemonset on the masters as well 
[07:57:58] <_joe_>	 but I don't think we run the kubelet on the master, do we?
[07:58:02] <jayme>	 nono
[07:58:16] <jayme>	 we don't but it seems they assume we do
[07:58:20] <jayme>	 which is weird
[07:58:32] <_joe_>	 so yeah, we might need to run calico-node on the masters as well somehow
[08:00:07] <elukey>	 maybe a silly question, but wouldn't it be sufficient to allow the masters to bgp-peer with routers + adding to them profile::calico::kubernetes?
[08:00:12] <elukey>	 or is there something more?
[08:03:09] * elukey sees Janis building a voodoo doll with "Luca" written on top
[08:06:15] <jayme>	 unfortunately all that stuff is in caloco-node I guess
[08:06:34] <_joe_>	 no
[08:06:48] <_joe_>	 calico-bird typha and the other thing are still on the servers I think?
[08:06:54] <_joe_>	 lookingf at the puppet class
[08:07:07] <_joe_>	 elukey: that was also my hope
[08:07:23] <_joe_>	 elukey: given your cluster is still experimental, you can try :)
[08:07:50] <jayme>	 no
[08:08:04] <jayme>	 bird is running inside the node container
[08:08:12] <jayme>	 whis is hostNetwork: true
[08:09:26] <jayme>	 so is felix
[08:09:39] <_joe_>	 jayme: ok so we're not installing profile::calico::kubernetes anymore? because it seems to install bird
[08:10:35] <jayme>	 where do you see that?
[08:11:05] <jayme>	 is installs calicoctl and calico-cni
[08:11:07] <jayme>	 *it
[08:11:45] <_joe_>	 jayme: right it's just ferm
[08:11:49] <_joe_>	 sigh.
[08:12:12] <_joe_>	 ok, so... do we have any chance to make it work as a simple docker container launched by systemd?
[08:13:31] <jayme>	 I'm not sure that is simpler than running kubelet on the masters with master taint on
[08:14:21] <jayme>	 running it manually via docker/systemd would mean another way of launching, potentially configuring it
[08:14:37] <jayme>	 + we would need docker on the masters (which we don't have currently)
[08:14:54] <jayme>	 in that case, I think we could also add kubelet
[08:19:19] <_joe_>	 yeah you're probably right
[08:19:32] <_joe_>	 probably the best solution
[08:20:08] <_joe_>	 we also need to add a toleration to all helm charts though in that case
[08:20:17] <_joe_>	 which we don't have rn
[08:20:22] <jayme>	 yeah
[08:20:39] <jayme>	 wait, no :)
[08:21:01] <jayme>	 we would just need to add something for calico to allow it to run on master nodes as well 
[08:22:50] <jayme>	 and that we already have in calico-node
[08:23:11] * _joe_ goes to look at the charts
[08:32:52] <elukey>	 I can open a task with a summary of everything it was discussed if it helps
[08:34:59] <jayme>	 that would be nice indeed
[08:36:54] <elukey>	 will do it in a bit :)
[08:37:35] <_joe_>	 thanks
[08:51:05] <wikibugs>	 10serviceops, 10SRE, 10decommission-hardware, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jelto@cumin1001 for hosts: `mw1261.eqiad.wmnet` - m...
[08:54:45] <effie>	 jelto: mw1261+ are canary hosts, have we decided which ones are going to replace them ?
[08:56:48] <_joe_>	 it's about time for the debugging session, but AIUI jayme isn't available
[08:57:05] <_joe_>	 also I didn't see mutante around today
[08:59:05] <jelto>	 joe: mutante was in a decom session with me this morning. How do join the debugging session? Meet? tmux?
[09:03:30] <jelto>	 effie: I talked with mutante that we have to deceide on how to replace the canarys. I will make sure that there is a ticket in phabricator for the new canarys and we don't forget about replacing them.
[09:05:41] <effie>	 cool, just remember to copy over the hieradata/hosts/<host>.yaml for each canary 
[09:05:48] <effie>	 thank you !
[09:06:12] <effie>	 _joe_:  I can't join as I am working on tegola's deployment 
[09:14:22] <jayme>	 I'm t
[09:14:55] <jayme>	 I'm still around like an hour, but I it looks like a bad slot anyways :)
[09:24:21] <_joe_>	 ok
[09:24:43] <_joe_>	 sorry I was reading a couple tasks
[09:25:09] <_joe_>	 ok let's do tomorrow at 9:00Z
[09:25:35] <_joe_>	 but at that point who's in is in, and that's going to be final 
[09:26:01] <jayme>	 sgtm
[09:30:25] <jayme>	 _joe_: thanks for looking ad dragonfly. Unfortunately I forgot to add a supernode role
[09:30:39] <_joe_>	 lol see?
[09:30:51] <_joe_>	 I was concentrated at reading the code and seeing if there was some error
[09:38:12] <jayme>	 done
[09:41:32] <wikibugs>	 10serviceops, 10Machine-Learning-Team, 10SRE: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10elukey)
[09:41:54] <wikibugs>	 10serviceops, 10Machine-Learning-Team, 10SRE: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10elukey)
[09:41:58] <elukey>	 ok tried to summarize all in --^
[09:42:44] <wikibugs>	 10serviceops, 10Machine-Learning-Team, 10SRE, 10Kubernetes: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10JMeybohm)
[09:43:15] <jayme>	 looking
[09:44:39] <wikibugs>	 10serviceops, 10Machine-Learning-Team, 10SRE, 10Kubernetes: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10JMeybohm)
[09:52:12] <wikibugs>	 10serviceops, 10Machine-Learning-Team, 10SRE, 10Kubernetes: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10JMeybohm) I don't like the idea of having another way of how calico-node is run (it's already complex enough). Because of that I'll sugg...
[09:58:33] <wikibugs>	 10serviceops, 10Machine-Learning-Team, 10SRE, 10Kubernetes: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10elukey) Definitely, it seems a good way to proceed. The only concern that I have is that our kube masters are lightweight VMs (1 virtual...
[10:00:17] <effie>	 does anyone know what "  Error: release main failed: timed out waiting for the condition" means?
[10:00:43] <effie>	 I run helm apply 
[10:00:53] <effie>	 and after wainting for ages, I got this error
[10:03:01] <effie>	 _joe_: ^
[10:03:33] <_joe_>	 effie: do you have very fat containers to deploy?
[10:03:51] <effie>	 nemo-yiannis:  do I have very fat containers to deploy?
[10:03:52] <_joe_>	 effie: it generically means that some operation lasted longer than 2 minutes
[10:04:08] <_joe_>	 one such case could be pulling from the registry - it happens with the mediawiki images
[10:04:16] <_joe_>	 effie: look at the kubernetes events
[10:04:24] <effie>	 ah right, let me look 
[10:05:22] <effie>	 ah ! 
[10:05:25] <effie>	 Error creating: pods "tegola-vector-tiles-main-69d844cf76-8llw2" is forbidden: minimum cpu usage per Container is 100m, but request is 1m
[10:05:36] <effie>	 ok let me fix that 
[10:05:42] <effie>	 thanx joe
[10:07:30] <nemo-yiannis>	 effie: I pulled it locally and its ~700Mb so not really a lightweight image.
[10:07:58] <effie>	 ok let me fix teh first issue, and we will see about its size
[10:15:00] <nemo-yiannis>	 Yeah, i just checked blubber. We have a step to build tegola and then we copy the whole intermediate image involving all the dependent packages that we don't necessarily need on prod.
[10:15:02] <nemo-yiannis>	 https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/tegola/+/refs/heads/wmf/v0.14.x/.pipeline/blubber.yaml#27
[10:17:08] <effie>	 so, we can make the image smaller I reckon ?
[10:18:52] <nemo-yiannis>	 yes, the vast majority of the files are go packages
[10:18:58] <wikibugs>	 10serviceops, 10Machine-Learning-Team, 10SRE, 10Kubernetes: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10JMeybohm) Yeah, maybe. Calico-node runs with a memory limit of 400Mi and CPU requests if 350m but the other components will also take up...
[10:23:28] <_joe_>	 oh indeed, that's also a security issue tbh
[10:24:04] <_joe_>	 nemo-yiannis: do you already know what you need to do to improve this, or do you need assistance?
[10:24:56] <_joe_>	 you just need to copy over /srv/service/cmd/tegola/tegola over I think
[10:25:01] <nemo-yiannis>	 Yeah I am working on it, i am testing the patch locally at the moment.
[10:33:46] <wikibugs>	 10serviceops, 10SRE, 10Sustainability: Jobrunner on Buster occasional timeout on codfw file upload - https://phabricator.wikimedia.org/T275752 (10fgiunchedi) Another data point, as expected post-switchover the high latency uploads from jobrunners moved from codfw to eqiad since codfw is now active.
[10:35:07] <wikibugs>	 10serviceops, 10SRE, 10Sustainability: Jobrunner on Buster occasional timeout on codfw file upload - https://phabricator.wikimedia.org/T275752 (10fgiunchedi) Also to avoid confusion I'd like to clarify that on the swift side I can't find anything obviously wrong though I don't have the bandwidth to investiga...
[10:59:34] <_joe_>	 ugh this is the mwdebug pod cpu usage *without any load besides readiness probes* https://grafana-rw.wikimedia.org/d/U7JT--knk/joe-k8s-mwdebug?viewPanel=28&orgId=1&refresh=1m
[10:59:43] <_joe_>	 I would say it's quite underresourced :P
[10:59:51] <_joe_>	 I'll dig deeper later
[11:05:58] <apergos>	 which are the mwdebug in codfw for deploy testing anyhow?
[11:07:10] <apergos>	 nm
[11:46:33] <wikibugs>	 10serviceops, 10SRE, 10decommission-hardware, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jelto@cumin1001 for hosts: `mw1262.eqiad.wmnet` - m...
[12:09:31] <wikibugs>	 10serviceops, 10SRE, 10decommission-hardware, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jelto@cumin1001 for hosts: `mw1263.eqiad.wmnet` - m...
[12:09:55] <volans>	 jelto: FYI the decommission cookbook can be run on multiple hosts at once (limit to 5 by default, to 20 with --force)
[12:10:32] <volans>	 this also makes the run of the dns cookbook part quicker because it runs it only once at the end 
[12:11:36] <volans>	 but of course be careful on which hosts you run it ;)
[12:11:52] <volans>	 *run it on
[12:12:52] <jelto>	 volans: thanks for the hint! As this is my first time running this cookbock I wanted to get a feeling what it is doing. But I will batch the last two remaining mw canary hosts together :)
[12:13:31] <volans>	 ack! feel free to ping me if you have any question about it :)
[12:13:56] <jelto>	 volans: thanks a lot I will do 
[12:30:06] <wikibugs>	 10serviceops, 10SRE, 10decommission-hardware, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jelto@cumin1001 for hosts: `mw[1264-1265].eqiad.wmn...
[12:49:57] <wikibugs>	 10serviceops, 10SRE, 10decommission-hardware, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jelto@cumin1001 for hosts: `mw1266.eqiad.wmnet` - m...
[13:29:10] <elukey>	 tried to come up with https://gerrit.wikimedia.org/r/702645
[13:29:47] <elukey>	 the pcc diff looks reasonable, then there will be the calico part in case (plus I imagine the router part to enable BGP peering)
[13:50:46] <wikibugs>	 10serviceops, 10SRE, 10decommission-hardware, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10Dzahn)
[13:54:41] <wikibugs>	 10serviceops, 10SRE, 10decommission-hardware, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10Dzahn) @Jclark-ctr @wiki_willy The 6 servers at the bottom of rack A5 (mw1261 through mw1266) have been decomed and...
[13:57:32] <wikibugs>	 10serviceops, 10SRE, 10decommission-hardware, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13  (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10Jelto)
[14:00:42] <wikibugs>	 10serviceops, 10SRE: bring 43 new mediawiki appserver in eqiad into production - https://phabricator.wikimedia.org/T279309 (10Jelto)
[17:13:43] <wikibugs>	 10serviceops, 10SRE, 10Patch-For-Review: Delay spinner showing for graphs for 1s - https://phabricator.wikimedia.org/T256641 (10herron) p:05Triage→03Medium
[17:18:34] <wikibugs>	 10serviceops, 10Machine-Learning-Team, 10SRE, 10Kubernetes, 10Patch-For-Review: Add the possibility to deploy calico on kubernetes master nodes - https://phabricator.wikimedia.org/T285927 (10herron) p:05Triage→03Medium
[19:18:33] <wikibugs>	 10serviceops, 10SRE, 10Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (10wkandek) Thanks everybody for the feedback on the communications for the DC switchover process. We will spend some time this quarter (Q1) in working...
[20:25:28] <wikibugs>	 10serviceops, 10Performance-Team, 10SRE, 10MW-1.36-notes, and 3 others: Enable "/*/mw-with-onhost-tier/" route for MediaWiki where safe - https://phabricator.wikimedia.org/T264604 (10Krinkle) So the problem appears to be bad interactions between WANCache's "pre-emptive regeneration" feature (as prompted by...
[22:56:09] <wikibugs>	 10serviceops, 10SRE, 10Services, 10Wikibase-Quality-Constraints, and 3 others: Deploy Shellbox instance (shellbox-constraints) for Wikidata constraint regexes - https://phabricator.wikimedia.org/T285104 (10Addshore) Any idea on a timeline for being able to get this ticket moving? It's blocking T176312 whic...