[09:24:55] hi folks, a quick question re: elasticsearch / opensearch, are you aware of any deployments in cloud vps where the cluster runs on a single instance? context is https://phabricator.wikimedia.org/T328674
[09:31:26] godog: no idea, the few I have on the radar (like the one in the tools project) are 3-node
[09:31:35] others may exist that I'm not aware of
[09:33:02] arturo: thank you! that's helpful
[09:33:33] I'm wondering if there's a way to audit the hiera settings across cloud vps to infer something from those, i.e. whether we're using the ::instances variable
[09:34:30] yes
[09:34:40] you could grep https://gerrit.wikimedia.org/g/cloud/instance-puppet (which is also indexed in https://codesearch-beta.wmcloud.org)
[09:34:48] that ^^^
[09:36:18] neato, thanks folks
[09:39:35] !log tools aborrero@tools-sgegrid-shadow:~$ sudo truncate -s 1G /var/log/syslog (was 17G, full root disk)
[09:39:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:41:27] arturo: has the grid failed over to the shadow?
[09:41:42] something is weird here, I'm investigating
[09:41:47] that might be
[09:42:10] pretty sure whatever happened is a result of the outage the other day
[09:43:31] aborrero@tools-sgegrid-shadow:~$ cat /var/lib/gridengine/default/common/act_qmaster
[09:43:31] tools-sgegrid-shadow.tools.eqiad1.wikimedia.cloud
[09:43:35] taavi: ^^^
[09:45:44] hm, and gridengine-shadow.service is running but gridengine-master.service is not
[09:45:56] let's fail back? it was surprisingly simple when I did that last time
[09:46:04] taavi: yeah
[09:46:11] iirc just stop the shadow service, wait a few moments, and then start the master service
[09:46:18] yeah
[09:46:31] that's what https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Grid#GridEngine_Master suggests
[09:47:07] taavi: you do it, or should I?
[09:47:16] go ahead
[09:47:19] ok
[09:48:52] !log tools grid engine was failed over to shadow server, manually put it back into normal https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Grid#GridEngine_Master
[09:48:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:15:33] !log admin purges osd daemons 48 and 40 from eqiad ceph cluster (T329709)
[10:15:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:15:37] T329709: [cookbooks.ceph] Add a cookbook to drain a ceph osd in a safe manner - https://phabricator.wikimedia.org/T329709
[10:32:49] !log toolsbeta aborrero@toolsbeta-test-k8s-control-4:~$ sudo -i kubectl apply -f /etc/kubernetes/psp/base-pod-security-policies.yaml
[10:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:35:34] !log tools aborrero@tools-k8s-control-1:~$ sudo -i kubectl apply -f /etc/kubernetes/psp/base-pod-security-policies.yaml
[10:35:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:58:17] !log paws moving to new cluster. Old one was restarting hub and couldn't find all of its nodes
[12:58:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[15:49:10] !log cloudinfra aborrero@cloud-cumin-03:~$ sudo keyholder arm (password in pw)
[15:49:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cloudinfra/SAL
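The hiera audit floated at 09:33-09:34 above could look roughly like the sketch below, run from a local checkout of the instance-puppet repo linked there (or done via codesearch-beta.wmcloud.org instead); the elasticsearch/opensearch key names in the pattern are assumptions and may need adjusting to the hiera keys actually in use.

    # rough sketch: list per-project hiera YAML lines that mention
    # elasticsearch/opensearch instance settings; run from the repo root
    grep -rniE '(elasticsearch|opensearch).*instances' --include='*.yaml' .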
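The failback sequence described at 09:46 above, rendered as commands. This is a minimal sketch using only the unit names and act_qmaster path that appear in the log; the wait time is a guess, and the linked Wikitech page remains the authoritative procedure.

    # on the shadow host (currently acting as qmaster)
    sudo systemctl stop gridengine-shadow.service
    sleep 30   # "wait a few moments"; the exact delay is a guess
    # on the intended master host
    sudo systemctl start gridengine-master.service
    # confirm which host now holds the qmaster role
    cat /var/lib/gridengine/default/common/act_qmaster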
[17:52:46] Hi love
[17:54:28] Libby B. Spinner call or text 530 921 0966
[17:55:12] Haven't seen the kids in 7 months and nobody is on my team.
[17:55:18] !kb Guest47
[17:55:21] !log admin Manually zapped /dev/sdc on cloudcephosd1002, probably a leftover drive since the beginning (or during the reimage the drives changed names, and this one had leftovers from the previous OS) (T329498)
[17:55:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:55:25] T329498: [ceph] Move cloudcephosd1001 (b7) and cloudcephosd1002 (b4) to rack e4 - https://phabricator.wikimedia.org/T329498
[17:55:53] taavi: thanks
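For context on the 17:55 zap entry above: the exact command used is not shown in the log, but with the standard ceph-volume tooling it would look something like the sketch below; treat the invocation and flags as an assumption.

    # on cloudcephosd1002: wipe leftover LVM metadata and partition data
    # from the stray drive (assumed invocation, not taken from the log)
    sudo ceph-volume lvm zap /dev/sdc --destroy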