[06:57:48] I am going to start switching over s8 master in codfw
[06:59:40] ok
[07:35:21] done
[07:50:29] dumping es5 took 27h 34m 😭
[07:56:40] yow
[08:39:39] Emperor: re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/817209 what do you think re: fs usage?
[08:41:02] godog: you'll see I've now added all the codfw new nodes; I need to see how bust the eqiad ones are
[08:41:17] AIUI adding all the new nodes should address the capacity question?
[08:44:13] godog: (or at least, I was under the impression the new nodes not yet in production were there to replace the capacity from the nodes now to be drained)
[08:45:59] Emperor: yeah I saw the new codfw capacity so that's great; not sure about eqiad though, but maybe good enough
[08:48:55] so the capacity to replace these hosts was ms-be20[62-65] e.g. for codfw, which is fully in production already, though by now I think we've outgrown that, and the new capacity is part of the usual/regular expansion cycle
[08:49:37] either way, what do you reckon it will take to get the two missing eqiad hosts on their feet?
[08:49:38] AFAICS it's adding 4 24x7T nodes [ms-be10[68-71]] to replace 11 12x4T nodes, which I think is adding 672T to replace 528T
[08:49:58] godog: dunno yet, that's almost the next thing on my TODO list :)
[08:50:13] :)
[08:51:13] the codfw nodes were easy (just partitions in the wrong place), whereas neither ms-be1070 nor ms-be1071 has enough drives (per df), which is more concerning.
[08:52:04] ok, I have some time this morning to help and take a look if that'd be helpful
[08:53:06] ms-be1070's iDRAC thinks it has the right number of virtual drives
[08:53:31] and they're in /proc/partitions too
[08:53:59] but not fstab
[08:55:22] godog: if you did have a bit of time to look at ms-be1070, that'd be helpful - I'm going to assume it's the same issue as ms-be1071; the drives are there, puppet runs OK, but it's not using drives beyond sdn
[08:55:30] hah! I think I know what's up
[08:55:47] I'll send a review
[08:55:49] [I'm wondering if there's some host-match thing that doesn't know about xx7x nodes]
[08:56:14] that's exactly right
[08:56:24] :)
[08:56:33] I look forward to your CR then :)
[08:58:44] https://gerrit.wikimedia.org/r/c/operations/puppet/+/817724
[09:03:37] thanks; once that's merged I'll make sure puppet is OK on 1070 and 1071 then update my CR
[09:04:27] sweet! I realized the new hosts will need to be added to the firewall too; left a comment in your CR
[09:07:03] Thanks, I'll sort that too
[09:52:16] (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (db2173:9104) - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[09:52:23] ^ me
[10:12:16] (PrometheusMysqldExporterFailed) resolved: Prometheus-mysqld-exporter failed (db2173:9104) - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[11:17:44] PROBLEM - Check unit status of swift_ring_manager on ms-fe1009 is CRITICAL: CRITICAL: Status of the systemd unit swift_ring_manager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:26:35] ValueError: Couldn't determine zone for IP 10.64.131.2
[11:27:37] I've got a meeting imminently, maybe godog knows about zones? that node is in eqiad rack E2...
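
For context on the ValueError above: swift_ring_manager maps each device's IP onto a swift zone via a hand-maintained subnet table, so an IP from a new subnet (here, rack E2) has no match. A minimal sketch of that kind of lookup, assuming a hand-kept table - the subnets and zone numbers below are illustrative only, and the real logic lives in swift_ring_manager.py, linked below:

    import ipaddress

    # Hypothetical subnet -> swift zone table; only subnets listed here can
    # be resolved, so hosts in a new row/rack fail until the table grows.
    EQIAD_ZONES = {
        ipaddress.ip_network("10.64.0.0/22"): 1,   # row A (illustrative)
        ipaddress.ip_network("10.64.16.0/22"): 6,  # row B (illustrative)
        ipaddress.ip_network("10.64.32.0/22"): 3,  # row C (illustrative)
        ipaddress.ip_network("10.64.48.0/22"): 5,  # row D (illustrative)
    }

    def find_ip_zone(ip: str) -> int:
        """Return the swift zone for an IP, or raise if the subnet is unknown."""
        addr = ipaddress.ip_address(ip)
        for subnet, zone in EQIAD_ZONES.items():
            if addr in subnet:
                return zone
        # This is the error seen above for 10.64.131.2 (rack E2, a new subnet).
        raise ValueError(f"Couldn't determine zone for IP {ip}")
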
[11:46:52] * volans thinks about using netbox+hiera to improve the above :D
[11:52:32] hah, mmhh yeah the new rack would explain it
[11:52:38] Emperor: I'll take a look
[11:56:02] yeah config data from puppet/netbox might help, up to a certain extent only though, since we have to map subnets to zones anyways
[12:01:05] godog: swift_ring_manager knows about the subnets that I knew about at the time - https://gitlab.wikimedia.org/mvernon/swift-ring/-/blob/main/swift_ring_manager.py#L661 but for new subnets it'll need telling which zone they should belong to
[12:02:02] why do we need to use IPs?
[12:02:27] we have a forward_dns() so clearly the fqdn is available
[12:02:39] also that code could use wmflib functionalities instead ;)
[12:02:48] godog: the question really is which zone 10.64.131.0/24 should be in
[12:03:00] Emperor: ack, still in meeting? if not I'd gladly pass the buck back as I have to go run an errand
[12:03:15] Emperor: a new one IMHO, like 7
[12:03:20] godog: just out of meeting, but IHNI which zone these new nodes _should_ be in
[12:03:32] godog: ah, OK. Do I need to do some magic to the rings to make a new zone?
[12:03:34] well, all subnets of row E in zone 7
[12:03:59] volans: the rings themselves are by-IP
[12:04:10] no, a new zone doesn't require anything specific in the ring IIRC
[12:04:15] godog: FYI row E/F have rack redundancy, not row redundancy, at the network layer
[12:05:26] volans: ah good point
[12:05:28] godog: shall we catch up after your errand? I think this is a "we need to decide the answer today", not "OMG must fix now", and I'd rather we got the right answer :)
[12:05:45] Emperor: ok SGTM!
[12:06:10] bbiab
[12:06:20] volans: so you can see we currently have 4 zones in eqiad, which I think are 1 per row A-D
[12:06:50] yes, and I think those were because of the row-redundancy logic there, but ofc it could also be a coincidence
[12:07:19] volans: it's my understanding that zones should roughly match failure domains, so yes, that's embedding an assumption that each of those rows is a separate failure domain
[12:07:34] exactly, in the new rows that's a per-rack one
[12:07:57] so you might end up with zones E1, E2, ....
[12:08:20] does that redundancy extend to power &c too?
[12:08:38] just network AFAIK
[12:08:56] err, by which I mean - are there any ways in which E1 and E2 have more correlated failure modes than e.g. E1 and D1?
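
A sketch of volans's wmflib suggestion at 12:02:39, assuming wmflib's documented Dns helper and its resolve_ipv4() method (per https://doc.wikimedia.org/wmflib/master/api/wmflib.dns.html); the hostname is illustrative and find_ip_zone() is the sketch shown earlier:

    from wmflib.dns import Dns  # assumed API: resolve_ipv4() returns a list of address strings

    dns = Dns()  # assumed to default to the system resolver
    # Replace a hand-rolled forward_dns(host): resolve the fqdn to its IPv4
    # address, then map that IP onto a zone as before.
    ip = dns.resolve_ipv4("ms-be1070.eqiad.wmnet")[0]
    zone = find_ip_zone(ip)
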
[12:08:57] D1: Initial commit - https://phabricator.wikimedia.org/D1
[12:09:17] lol stashbot
[12:09:24] (trying to work out if we should have zone-per-row or zone-per-rack)
[12:10:15] I guess it depends on what you want to protect yourself from; probably better to check with dcops and netops to have all the details right
[12:10:20] and make an informed decision
[12:12:14] it feels more like a swift question - I don't know whether having more, smaller zones is going to cause us pain in future, which might mean we still want all of E in one zone
[12:12:52] yeah, I don't have the specific context to help there, but wanted to point out the difference at the network layer
[12:13:18] [dns] for the dns bit I was mentioning before, the *_dns() functions could be replaced by https://doc.wikimedia.org/wmflib/master/api/wmflib.dns.html if you want :)
[12:14:13] [also not sure why eqiad zones are 1,6,3,5 rather than 1,2,3,4]
[12:16:51] my question about IPs was because I saw that zone = find_ip_zone(ip)
[12:16:58] and that ip comes from ip = forward_dns(host)
[12:17:35] so I guess you get the hostname there and then convert it to an IP; at that point we could map hostnames to zones directly
[12:18:47] I'm not sure how that helps, since at least currently zones are de facto based on the underlying network
[12:18:49] ?
[12:19:11] hostname doesn't tell us rack location AFAIAA
[12:19:23] no, but it's in hiera from netbox
[12:19:26] so easy to use
[12:20:21] is the subnet a requirement from swift?
[12:20:32] or was it chosen just because subnet == row == failure domain
[12:20:44] you'd have to ask godog that ;-)
[12:20:59] I think from a swift POV you can assign devices to zones however you like
[12:21:23] (but you have to use IP addresses in the rings, not hostnames)
[12:21:45] because my impression is that all these subnets were maybe the easy way back then to map a host to a failure domain
[12:22:08] but we have alternatives now, depending on what's the actual underlying requirement
[12:22:26] quite possibly - given we're looking up the IP anyway, using the subnet to identify which failure domain to use has the KISS merit
[12:22:51] [unrelated] the ms-fe1009 alert didn't ripple in here too
[12:23:08] the issue here is we're wanting to add hosts to the ring that are outside our existing zones as currently set up
[12:23:40] volans: I have an icinga-wm ping at 11:11 here
[12:24:31] there was a recovery at 12:10 and a critical again at 12:15 in -operations
[12:26:00] (UTCs to avoid confusion)
[12:27:14] wonder why icinga-wm didn't mention that here :-/
[12:37:19] marostegui: sorry about the date confusion, I misread June 28 as July 28
[12:37:28] Ah haha!
[12:37:29] will update to the 4th
[12:37:33] :)
[13:19:39] back and caught up on scrollback, so yes the current zone allocation maps failure domains, which so far has meant "row"; good question re: power vs network on failure domains, I don't know
[13:20:36] Emperor: the eqiad zone numbering is a historical artifact unfortunately, no reason other than that
[13:21:54] godog: we currently have 1 node in each of E1 E2 F1 F2; I'm inclined to think that we probably don't want each in a single-node zone? Which would push me towards putting E into zone 7 and F into zone 8 maybe?
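
As a concrete illustration of godog's point at 12:04:10 that a new zone needs nothing ring-specific: in Swift a zone exists as soon as the first device references it, and devices are keyed by IP, not hostname, as discussed above. A sketch using Swift's RingBuilder, which is what swift-ring-builder drives under the hood; the device IDs, IPs, port, device names, and weights are illustrative, and the exact add_dev() keys are an assumption from the upstream API:

    from swift.common.ring import RingBuilder  # from the swift package

    builder = RingBuilder(part_power=10, replicas=3, min_part_hours=1)
    # Nothing special is needed to create zone 7: it springs into being
    # when the first device is added with zone=7.
    for i, (zone, ip) in enumerate([(1, "10.64.0.10"),
                                    (3, "10.64.32.10"),
                                    (7, "10.64.131.2")]):
        builder.add_dev({"id": i, "region": 1, "zone": zone, "ip": ip,
                         "port": 6000, "device": "sdn1", "weight": 4000.0})
    builder.rebalance()
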
[13:21:55] (thinking out loud) on one hand IMHO it is simpler to keep the current row == zone mapping across a cluster, and generally it's been fine to spread hardware across rows regardless of rack
[13:22:20] Emperor: agreed, I think that'd be best
[13:22:57] there's the open question in my mind of whether row E and F are their own failure domains or each rack is now
[13:23:06] but not blocking for this decision I'd say
[13:28:10] volans: to answer your question re: subnet, yes that's used to map hostnames to rows via subnets essentially
[13:29:14] and yes in swift rings you'd need IPs
[13:31:48] godog: care to 👀 https://gitlab.wikimedia.org/mvernon/swift-ring/-/merge_requests/4 please?
[13:32:33] for sure, looking
[13:34:02] Emperor: LGTM, I don't think I can approve via gitlab as I can't write to the repo but left a +1
[13:34:16] thanks
[14:18:07] RECOVERY - Check unit status of swift_ring_manager on ms-fe1009 is OK: OK: Status of the systemd unit swift_ring_manager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:31:16] (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (db1108:13351) - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[15:32:16] heads up, I am fixing a few prometheus exporter mysql rules by editing zarcillo
[18:31:31] (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (db1108:13351) - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[22:31:31] (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (db1108:13351) - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed