[07:17:02] Machine-Learning-Team, DC-Ops, SRE, ops-eqiad: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (elukey) Resolved→Open Hi Chris! I noticed that we have two nodes on the same ROW, would it be possible to move one elsewhere? We are going to h...
[07:17:07] good morning!
[07:17:39] the ml-cache100[1-3] nodes have been racked, but two of them are in the same row (E), and I think it may be a risk (even if they are on different racks)
[09:28:38] Same row usually means the same set of three power phases, right?
[09:30:42] I am not super sure about that one, but a row is generally treated like an availability zone or similar.. for example, the TORs are set up in a leaf/spine cluster, and in theory one wrong setting/upgrade/etc.. could cause connectivity loss in all racks (extreme scenario, I know)
[09:33:18] Mh, I see.
[09:33:48] I should try and see if we have docs on the DC topologies, both net and power.
[09:34:14] for the network we have docs, for the power I am not sure
[09:34:40] I'm sure it's documented _somewhere_. Question is just if I can find it ;)
[09:35:10] there is https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Common_Datacenter_Specifications as a starter
[10:23:01] Machine-Learning-Team: Re-initialize the Kubernetes ML Serve clusters - https://phabricator.wikimedia.org/T304673 (elukey)
[10:23:49] I opened the task that we discussed yesterday about restarting the k8s clusters --^
[10:23:55] ah snap, I am missing the etcd wipe
[10:24:31] Machine-Learning-Team: Re-initialize the Kubernetes ML Serve clusters - https://phabricator.wikimedia.org/T304673 (10elukey)
[10:25:14] done :)
[10:53:58] Good call on the daemon shutdown. Just shooting them in the head (or having systemd do it) might leave wonky state behind
[11:05:40] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks): Re-evaluate ip pools for ml-serve-{eqiad,codfw} - https://phabricator.wikimedia.org/T302701 (elukey) @akosiaris @ayounsi if you have time could you please review what Tobias proposed above? If everything is in line with best practices we (ML) will...
[11:39:00] * elukey lunch
[13:46:31] Morning all!
[13:49:29] good morning o/
[13:54:13] Morning Aiko!
[14:39:50] Lift-Wing, Epic, Machine-Learning-Team (Active Tasks): Re-evaluate ip pools for ml-serve-{eqiad,codfw} - https://phabricator.wikimedia.org/T302701 (cmooney) Hey, I can't see any problem with the above. Avoiding fragmentation is worth doing so we have as sane a plan as possible, but I note the ex...
[14:47:11] o/
[15:19:01] root@apt1001:/srv/wikimedia# reprepro lsbycomponent istio-cni
[15:19:01] istio-cni | 1.9.5-1 | bullseye-wikimedia | component/istio195 | amd64
[15:19:03] \o/
[15:43:25] very neat
[15:45:01] just tried it, installed the binaries and killed the test pod with the sidera
[15:45:04] *sidecar
[15:45:05] all good
[15:45:48] I am going to start https://wikitech.wikimedia.org/wiki/Istio and add some info
[15:49:54] nice!
[15:49:56] yesssss
[16:11:33] first version created: https://wikitech.wikimedia.org/wiki/Istio
[16:11:47] I will add how to build the Debian packages and a few other details over the next days
[16:12:07] Thanks elukey
[16:58:04] Machine-Learning-Team, DC-Ops, SRE, ops-eqiad: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (Cmjohnson) @elukey I moved ml-cache1002 to row/rack C4.
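For reference, the re-init discussed at 10:23–10:53 (T304673) boils down to a sequence like the sketch below: stop the daemons cleanly first, then wipe etcd, then re-bootstrap. This is a rough sketch, not the actual SRE cookbook; the unit names, hosts, and the etcdctl invocation are assumptions.

# Rough sketch of the re-init sequence for T304673; unit names and the
# etcdctl invocation are assumptions, not the actual SRE cookbook.
# On every node, stop the daemons cleanly so nothing is killed mid-write:
sudo systemctl stop kubelet.service kube-proxy.service
sudo systemctl stop docker.service
# On the etcd hosts backing the cluster, delete all keys (etcd v3 API;
# an empty key with --prefix matches the whole keyspace):
ETCDCTL_API=3 etcdctl del "" --prefix
# Then re-bootstrap the control plane and rejoin the workers.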
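The 15:19 paste shows istio-cni 1.9.5-1 already imported into bullseye-wikimedia. For context, an import plus verification with reprepro looks roughly like the sketch below; the .deb path is hypothetical, and the component name is taken from the output above.

# Import the package into the given component, then list it to verify;
# the .deb path here is hypothetical.
cd /srv/wikimedia
reprepro -C component/istio195 includedeb bullseye-wikimedia /tmp/istio-cni_1.9.5-1_amd64.deb
reprepro lsbycomponent istio-cni
# On a client host, after apt-get update, confirm the candidate version:
apt-cache policy istio-cni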
[17:14:24] Machine-Learning-Team, DC-Ops, SRE, ops-eqiad: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host ml-cache1002.eqiad.wmnet with OS bullseye
[17:42:49] Machine-Learning-Team, DC-Ops, SRE, ops-eqiad: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host ml-cache1002.eqiad.wmnet with OS bullseye executed wit...
[17:48:31] heading out for the weekend. \o Have a great rest-of-Friday and a splendid weekend, everyone
[17:49:04] you too! Going afk in a bit too o/
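The two ops-monitoring-bot messages above correspond to a sre.hosts.reimage cookbook run from cumin1001. The invocation is roughly the sketch below; the exact flags are an assumption, so check the cookbook's --help before running it.

# Roughly how the 17:14 reimage above would be kicked off from cumin1001;
# flags are an assumption.
sudo cookbook sre.hosts.reimage --os bullseye -t T299435 ml-cache1002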