[07:01:53] hi folks
[07:02:06] checking the ml-cache alerts, new nodes so nothing really important
[07:05:00] I can ssh to them but there seems to be an issue with puppet reaching the puppetmaster
[07:05:19] and all three in row E/F
[07:10:06] topranks: --^ :(
[07:40:07] elukey: hey thanks for the heads up
[07:40:27] yeah that is unfortunate, we've been working with Juniper on it but there is no fix yet
[07:40:50] working assumption is some code bug, although we've not found any pattern to confirm why some work and some do not
[07:41:06] anyway for those I've shut/unshut the ports for the 3 ml-cache nodes, and ARP is now ok
[07:41:13] so they should be reachable again by v4
[07:41:24] the reason ssh worked is that v6 ND is not affected by the bug
[07:41:47] sorry for the hassle
[07:43:27] topranks: nope all good! Out of curiosity, you disabled/enabled the network ports on the switches, right?
[07:43:36] I did yes
[07:43:58] Although I expect an "ip link set dev XXX down" followed by the reverse would also fix it
[07:44:11] I should have tested that theory out, will do it next time if it happens
[07:44:52] what I can say is that any that have been "fixed" this way have not suffered any re-occurrence
[07:57:46] elukey: yes to test I tried with one of the other hosts
[07:58:00] shutting/unshutting the interface from the host side also works
[07:59:24] ack!
[12:42:34] hi all there is a task to possibly add some projects under operations/software to code search. I'm not too familiar with code search so I'm not sure what does and doesn't make sense.
i have tagged a few project owners but perhaps someone more familiar with codesearch could also comment on what may or may not make sense
[12:43:04] _joe_: akosiaris: moritzm: perhaps ^^
[12:43:15] and the task number T303434
[12:43:16] T303434: Add operations/software/purged to Codesearch - https://phabricator.wikimedia.org/T303434
[12:46:47] <_joe_> jbond: ack thanks
[12:48:38] yeah, I think adding our internal components (anything not uploaded to Debian where codesearch.debian.net does the same for us) makes sense in general. e.g. think of a case where one wants to remove/change something in wmflib and wants to assess the impact on existing tooling
[12:56:46] ack sgtm I'll create a CR to add stuff
[13:01:34] <_joe_> we would probably want to add a couple more repos indeed
[13:01:42] <_joe_> let me comment on the task
[13:02:01] thanks
[13:02:23] oops, did not read here before adding a patch :)
[13:02:35] jayme: no worries I'll base mine on yours
[13:02:38] at least I don't have +2 rights there...
[13:04:25] <_joe_> I would suggest if possible to classify all these repos under a "sre" category in codesearch
[13:05:16] _joe_: are you suggesting renaming "Wikimedia Operations" to sre or creating a new sre section
[13:31:37] For those familiar with the eqord and eqdfw networking locations we have - I'm looking at implementing the renaming away from calling our DCs "clusters" but notice that we list networking-only locations here as well. It was my impression that these effectively function as routers only - do those technically reside in a server rack? I'm trying to keep things simple, but not inaccurate :) calling them a data center is probably wrong from
[13:31:37] our POV but correct from the POV of what it resides in. However I'd like to define data center as a physical location in which we have server racks. - So: Is that true for eqord?
[13:34:02] I think the counter-question is, what is a server? XD
[13:35:19] if you have access, this may help, Krinkle?
https://netbox.wikimedia.org/dcim/sites/
[13:37:07] nyes.
[13:37:20] so racks yes, but the item(s) are not considered a "server".
[13:37:28] okay, I'll go with yes it's a DC and yes it has server racks.
[13:37:59] whether they contain web servers or milkshakes, is a question for someone else to worry about.
[13:38:22] thanks jynus :)
[13:40:03] this may be a controversial opinion, but I wouldn't worry too much about unifying names, as much as making sure those are not confusing- cluster is certainly an ambiguous one, site probably not
[13:41:36] I remember having multiple discussions with marostegui if we should call things hosts, servers, boxes or something else :-D
[13:47:52] see -ops thread from last week :)
[13:51:37] Krinkle: not sure I get your question about the network POPs
[13:52:38] Krinkle: routers are still rack-mounted
[13:54:01] cdanis: ack, I realize that now based on netbox. I never looked up how big a core router is and whether that would make sense to be in a rack or not.
[13:54:12] makes sense I suppose :)
[13:54:28] Krinkle: it varies by location but https://www.juniper.net/us/en/products/routers/mx-series/mx204-universal-routing-platform.html is the smallest we have iirc
[13:54:43] and yeah, it's not just size, it's also that it's a very convenient/standard form factor
[13:55:15] for sure, but if it's just one object somewhere, I wouldn't have been surprised if it was just a home router sized object stuffed in a wall closet somewhere.
[13:55:31] data centers charge extra for special arrangements like that ;)
[13:55:36] which is more or less how I've heard eqord described in the past
[13:56:19] which yeah, I figured wasn't meant literally but I also didn't think it'd be a standard rack mount unit.
[13:59:13] my bikeshedding on this would be: "datacenter" is accurate at all the locations, although it describes the facility we lease space from, not our deployment within it, and could confusingly make someone think we actually own our own datacenter or something.
[13:59:22] "site" is kinda reasonable for all of them too
[13:59:41] yeah, when some entities talk about a new datacenter, they meant they literally bought land and poured concrete
[14:00:41] differentiating them, the key thing is Core vs Edge vs Network in some way. They're like a concentric venn diagram.
[14:01:11] The network[-only] sites just have routers. The Edge sites have routers + CDN edge stuff. The Core sites have all that plus a lot more.
[14:01:31] you could s/Edge/CDN/ or something too, not sure about the label really
[14:01:48] some operators might call what we call network-only sites "edge pops" too, it can all be confusing
[14:42:02] I think Cloudflare uses the term colo
[14:42:44] but that's usually more about the physical housing/hosting isn't it
[14:42:50] from collocation, which is more accurate and less ambiguous than "site"
[14:43:27] I think POP as well, as it's not used for anything else
[14:47:40] Krinkle | which is more or less how I've heard eqord described in the past
[14:47:59] heh, that may have been me, I've sometimes said that it's not a lot bigger than people's home utility closets
[14:48:16] but that's just because it's only a part of a (full) rack, but indeed, it's still in a normal rack
[14:48:20] (shared with other customers, usually)
[14:48:59] using "colo" as metonymy for "our space in a datacenter facility" is pretty common across the industry for better or worse
[14:49:09] yes
[14:49:47] I've even heard "colo" used for "our wholly-owned datacenter we literally poured the concrete for" :D
[14:50:14] yes, which is odd isn't it ;)
[14:53:36] and btw, we do occasionally have a few small devices in our data centers which are not really rack mountable, which then
tend to end up on a rack mounted shelf or similar
[14:53:56] they are a pain to deal with and generally hated by on-site people ;)
[14:55:12] at home I have two (non-fullsize) racks as well, and I sometimes must resist the temptation to grossly overpay for buying proper rack mountable equipment over something else that in other respects is perfectly fine ;)
[14:55:23] (like my HPE microserver gen10+)
[15:42:52] Of course, they're also multiple U high... :D
[15:44:40] But proper rackmounted stuff often looks a lot better too
[16:18:25] BTW, do you plan to do a public blog announcement about drmrs when things are fully set up?
[16:18:51] (just saw Maryana reference it in her mail)
[16:26:54] +1, I think we should. That's precisely the kind of stuff many folks want to read in blogposts :-P
[16:28:34] ^am I seeing a volunteer to write it? :-D
[16:29:59] * arturo looks over the shoulder
[16:39:16] There was some discussion on this previously, myself and Arzhel were talking about it but we've not yet had time to look
[16:44:06] yeah, I could guess- better finishing all the work first :-D
[16:48:11] yeah in quarterly and/or outcome terms, this Q was "make the site fully available", and we're stretching to ensure it can be an esams failover in the short term as well.
[16:48:47] next Q is the part where we fully integrate it in the geodns map config so it's being fully utilized for latency improvements in the region.
[16:49:41] there's no big moment at the end of the latter. Probably the data will be put together fairly early on, and then countries' mappings to drmrs will be phased in over weeks gradually.
[16:50:04] so somewhere around this upcoming Q boundary might be a good moment for a blog announcement showing where we're at and what's coming gradually-next