[05:29:02] <_joe_> bd808: curl https://staging.svc.eqiad.wmnet:$(kubectl get service "${TILLER_NAMESPACE}-main-tls-service" -o jsonpath='{.spec.ports[0].nodePort}')
[05:30:28] serviceops, PyBal, SRE, Scap, and 3 others: High rate of errors and increased latency on uncached MediaWiki requests due to infrastructure outage - https://phabricator.wikimedia.org/T337497 (Joe) Open→Resolved a:Joe The problem that caused this outage has been fixed.
[10:16:35] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (jijiki)
[11:36:19] serviceops, MW-on-K8s: mw-on-k8s app container CPU throttling at low average load - https://phabricator.wikimedia.org/T342748 (Clement_Goubert)
[11:36:21] serviceops, MW-on-K8s, Prod-Kubernetes, Kubernetes: Allow more flexibility in ResourceQuota and LimitRanger config - https://phabricator.wikimedia.org/T343978 (Clement_Goubert) Open→Resolved Deployed, we'll be testing it with new deployments for T342748, I will reopen if we encounter any...
[14:13:33] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Reserve resources for system daemons on kubernetes nodes - https://phabricator.wikimedia.org/T277876 (Clement_Goubert) Just to clarify, the above patch will not lead to evictions (unless we have a worker with less than 300MB of free RAM,...
[14:33:19] serviceops, MW-on-K8s, Patch-For-Review: mw-on-k8s app container CPU throttling at low average load - https://phabricator.wikimedia.org/T342748 (Clement_Goubert)
[14:33:36] serviceops, MW-on-K8s, Prod-Kubernetes, Kubernetes: Allow more flexibility in ResourceQuota and LimitRanger config - https://phabricator.wikimedia.org/T343978 (Clement_Goubert) Resolved→Open Reverting ` 3m7s Warning FailedCreate replicaset/mw-web.eqiad.canary-77cf597c89...
[14:53:47] Thank you for that smoke test _joe_. Testing it out shows that the service name for shellbox deployments does not quite follow that pattern. The shellbox-* services all use "shellbox-main-tls-service" as the service name.
[14:54:15] <_joe_> bd808: ah damn right
[14:54:15] but I imagine that is an artifact of the helm chart sharing that happens for them
[14:54:25] <_joe_> it's the chart name, correctly so
[14:55:20] serviceops, MW-on-K8s, Prod-Kubernetes, Kubernetes: Allow more flexibility in ResourceQuota and LimitRanger config - https://phabricator.wikimedia.org/T343978 (JMeybohm) > minimum cpu usage per Pod is 100m. No request is specified, minimum memory usage per Pod is 100Mi. No request is specified...
[15:40:56] serviceops, Data Products, RESTbase Sunsetting, Code-Health-Objective, Patch-For-Review: Route to new AQS Knowledge Gaps endpoint - https://phabricator.wikimedia.org/T342213 (VirginiaPoundstone) p:Triage→Low
[16:06:10] serviceops, MW-on-K8s, Observability-Logging: Some apache access logs are invalid json - https://phabricator.wikimedia.org/T340935 (Joe) In progress→Resolved
[16:06:35] serviceops, MW-on-K8s, Observability-Logging: Some apache access logs are invalid json - https://phabricator.wikimedia.org/T340935 (Joe) There has been no further dropped message in the last hour. I'll call this a success.
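
Editor's sketch of the smoke test discussed at 05:29 and 14:53, adjusted for the point that the service is named after the chart rather than the namespace. The helper names `tls_service_name` and `smoke_url` are hypothetical, as is the `shellbox-example` namespace in the usage line; like the original one-liner, this assumes the kubectl context already points at the right namespace. It has not been run against the staging cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of any existing tooling.

# Print the TLS service name. The name is built from the chart name
# (second argument); when no chart name is given, fall back to the
# namespace, which matches the original "${TILLER_NAMESPACE}-main-tls-service".
tls_service_name() {
    local namespace="$1"
    local chart="${2:-$namespace}"
    printf '%s-main-tls-service\n' "$chart"
}

# Print the staging URL for the service's first nodePort.
# Requires kubectl with a context already set to the target namespace.
smoke_url() {
    local namespace="$1" chart="${2:-}" port
    port="$(kubectl get service "$(tls_service_name "$namespace" "$chart")" \
        -o jsonpath='{.spec.ports[0].nodePort}')" || return 1
    printf 'https://staging.svc.eqiad.wmnet:%s\n' "$port"
}
```

Possible usage for a shellbox-style deployment, where the namespace and chart name differ: `curl "$(smoke_url shellbox-example shellbox)"`. For deployments whose chart and namespace share a name, `curl "$(smoke_url "$TILLER_NAMESPACE")"` reduces to the original command.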