[00:40:14] serviceops, Patch-For-Review: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (Legoktm) Started https://wikitech.wikimedia.org/wiki/Thumbor/Runbook#Pooling_a_new_server
[08:07:30] serviceops, Patch-For-Review: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (MoritzMuehlenhoff) >>! In T285477#7505257, @Legoktm wrote: > It was pointed out during the serviceops meeting that we'll find out whether stretch's kernel support...
[09:15:11] serviceops, Prod-Kubernetes, SRE, Kubernetes: Helm chart dependencies no longer in requitements.yaml - https://phabricator.wikimedia.org/T295750 (JMeybohm)
[09:15:34] serviceops, Prod-Kubernetes, SRE, Kubernetes: Helm chart dependencies no longer in requirements.yaml - https://phabricator.wikimedia.org/T295750 (JMeybohm)
[10:26:15] serviceops, Wikidata-Query-Service, Discovery-Search (Current work): Additional capacity on the k8s Flink cluster for WCQS updater - https://phabricator.wikimedia.org/T280485 (akosiaris) Hi. Given the 13 core/15GB RAM requirement, I can verify that we have that capacity free lying around[1], so we ex...
[10:37:22] serviceops, Prod-Kubernetes, Shellbox, Kubernetes, Patch-For-Review: Docker container logs (stdout, stderr) can grow quite large - https://phabricator.wikimedia.org/T289578 (akosiaris) Open→Resolved a:akosiaris >>! In T289578#7504949, @Legoktm wrote: >>>! In T289578#7387767, @akos...
[10:52:24] akosiaris: o/ I just noticed https://gerrit.wikimedia.org/r/c/operations/puppet/+/719551, do I need to restart docker on the ml-nodes ?
[10:54:39] <_joe_> elukey: are your docker daemons running since september?
[10:55:02] elukey: that issue is not docker
[10:55:13] I mean, sure restart it, although I think I did
[10:55:27] but the issue is all the long running pods you might have
[10:55:38] those won't get the change until you restart them too
[10:55:56] ahhhh right
[10:56:47] _joe_ Active: active (running) since Thu 2021-08-26 10:04:38 UTC; 2 months 21 days ago - this is on ml-serve1001
[10:57:13] <_joe_> elukey: then yes I guess
[10:57:17] I am asking since I had an issue with network policies and knative, a port was blocked and one of the pods was hammering the webhook of requests
[10:57:33] and I got some log-file-too-big trouble, this is why I was asking
[11:02:32] ah yes, then you want to restart docker and kill that pod at the least
[11:02:48] seems like I did not restart docker on those nodes
[11:08:07] restarted the dockers, will proceed with the pods as well, thanks!
[14:59:09] serviceops, MW-on-K8s, SRE, Patch-For-Review, User-jijiki: Create a mwdebug deployment for mediawiki on kubernetes - https://phabricator.wikimedia.org/T283056 (Joe) Open→Resolved
[17:44:49] serviceops, Infrastructure-Foundations, SRE, SRE-tools, netops: Support services VIPs with not marked as VIP in Netbox - https://phabricator.wikimedia.org/T295793 (Volans)
[22:31:29] serviceops, Patch-For-Review: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (Legoktm) ` legoktm@cumin1001:~$ httpbb --hosts thumbor1005.eqiad.wmnet --http_port 8800 /srv/deployment/httpbb-tests/thumbor/test_thumbor.yaml Sending to thumbor1...
[23:30:00] serviceops, Patch-For-Review: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (Legoktm) thumbor1005 is now pooled at weight=5, I'll fully pool it tomorrow if there are no reports of issues (I dropped a note in `#wikimedia-commons`). I kept a...