[00:05:10] curiously... implemented rechecking for the old version in the index and ran it over the errors from the 3rd; 99%+ of those events have fixed themselves, eqiad, cloudelastic and the db all agree on the latest rev_id
[10:00:24] dcausse: I'll be 2' late
[10:00:30] np
[10:35:58] errand + lunch
[13:47:33] o/
[14:40:08] migrating cloudelastic1009 to private IP, holler if you notice anything amiss
[15:23:14] inflatador: while I have your attention, did the "elastic ban" cookbook get run for the elastic hosts in codfw B4 that we're moving to the new switch today?
[15:23:27] T355860
[15:23:28] T355860: Migrate servers in codfw rack B4 from asw-b4-codfw to lsw1-b4-codfw - https://phabricator.wikimedia.org/T355860
[15:23:56] wdqs2016 is also in that rack, I believe it needs to be depooled?
[15:25:14] topranks my apologies, starting now
[15:25:25] no probs thanks
[15:30:36] cookbook has a bug... looks like I have to ban manually ;(
[15:36:14] topranks OK, we are good for the switch maintenance
[15:37:47] inflatador: thanks, appreciate that
[15:37:52] always something isn't there :(
[16:01:19] \o
[16:04:51] o/
[16:15:08] inflatador: we're done in rack B4 and those hosts are responding to ping ok
[16:22:13] topranks ACK, just unbanned... will ban hosts for tomorrow's maintenance as well
[16:22:27] super, thanks!
[16:58:10] workout/lunch, back in ~90
[18:02:00] sorry, been back. Will take lunch at :30
[18:19:02] ebernhardson ryankemper heads up, we are getting several "node is not indexing" alerts... investigating
[18:19:03] https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[18:19:15] hmm
[18:20:39] I think they just updated MW
[18:22:06] inflatador: ebernhardson: just a result of https://phabricator.wikimedia.org/T355860#9517953 perhaps?
[18:22:24] first shows up in the graph around 17:38 or so,
[18:22:47] that's about an hour after that message, but i suppose it would have taken time to drain the writes. If the nodes are banned and those are the ones complaining, then makes sense
[18:23:04] err, drain the existing indexes after a ban
[18:23:13] ebernhardson ryankemper I've only checked 2 hosts, but their datadirs are empty. Blood pressure rising...
[18:23:23] cluster status is green across the board though
[18:24:23] ah, I guess if banned they would move their data off
[18:24:28] let me double check settings
[18:25:17] i didn't check exhaustively, but the host list seems to match up
[18:25:33] ebernhardson confirmed, those are the hosts I banned for tomorrow... forgot to downtime
[18:25:40] Sorry for freaking everyone out
[18:26:57] ryankemper ^^ thanks, you were right on this
[18:27:08] cool
[18:27:21] aaand, downtimed
[18:29:59] did remind me that we're overdue for retiring hosts from https://phabricator.wikimedia.org/T198169 , will get a ticket started when I'm back from lunch
[19:03:36] dinner
[19:04:12] back
[19:26:40] hmm, apparently a ~5MB html table makes HtmlFormatter choke. Not sure which to be more disappointed with :P
[19:30:51] curiously, passing the first 4MB returns something, but the full thing returns null from preg_replace :S
[19:34:33] hmm, PREG_BACKTRACK_LIMIT_ERROR. Wonder what we are supposed to do with that
[19:39:20] got hit by a similar error at some point... I could optimize the regex but that might not be possible for you, the problem I had was also that I ignored regex failures which caused a variety of other problems, might not be a problem in the codebase you're looking into tho
[19:40:26] by optimizing I think I simply ran multiple simpler regexes instead of a big complex one IIRC
[19:41:05] this is probably the same :) The regex is `/^.*?<body>|<\/body>.*$/s` and it ignores the null result from preg_replace, which then bails elsewhere
[19:45:37] * ebernhardson is mildly surprised this code has a DOMElement, and it renders it to text and then regexes out the body element
[19:45:37] might be faster to look for the first occurrence and replace manually, avoiding the non-greedy `.*?` perhaps
[19:45:56] yea i'll try a couple things
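A minimal PHP sketch of the two ideas from the exchange above: checking preg_last_error() when preg_replace() returns null (instead of ignoring it), and extracting the body with plain string search rather than a non-greedy regex. This is not the actual HtmlFormatter/CirrusSearch code; the function names and the assumption of a literal `<body>` tag are illustrative.

```php
<?php
// Illustrative sketch only, not the actual HtmlFormatter/CirrusSearch code.
// Assumes the rendered HTML wraps its content in a literal <body>...</body>.

function extractBodyWithRegex( string $html ): ?string {
	// Non-greedy regex in the style discussed above. On multi-MB inputs the
	// `.*?` can exceed pcre.backtrack_limit, making preg_replace() return null.
	$stripped = preg_replace( '/^.*?<body>|<\/body>.*$/s', '', $html );
	if ( $stripped === null ) {
		if ( preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR ) {
			// Don't silently pass the null along; fall back to string search.
			return extractBodyWithStringSearch( $html );
		}
		return null;
	}
	return $stripped;
}

function extractBodyWithStringSearch( string $html ): ?string {
	// "Look for the first occurrence and replace manually": no PCRE
	// backtracking, cost stays linear in the size of the document.
	$start = strpos( $html, '<body>' );
	$end = strrpos( $html, '</body>' );
	if ( $start === false || $end === false || $end < $start ) {
		return null;
	}
	$start += strlen( '<body>' );
	return substr( $html, $start, $end - $start );
}
```

The string-search version also makes the failure mode explicit (null only when there is no body element) instead of depending on PCRE limits that vary with input size.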
[19:58:55] * ebernhardson is kinda distracted from the initial purpose, i guess this isn't an SUP-specific error... but seems worthwhile to fix things if they aren't too crazy
[20:18:08] I guess running puppet doesn't always reload ferm... TIL
[20:20:32] anyway, cloudelastic1009 is back in the cluster... So 1/3 of the way there
[20:43:12] * ebernhardson wonders when pids in the millions will seem natural... it's been years and they still look odd :P
[20:47:39] yeah, same with super high UIDs/GIDs
[21:11:24] OK, created T356803 and T356806 for the SUP procedures/updated docs. I plan on doing the docs update myself unless anyone objects... but probably won't be until next week
[21:11:25] T356803: Develop recovery/reindex procedures for new Search Update Pipeline - https://phabricator.wikimedia.org/T356803
[21:11:25] T356806: Document review/refresh for https://wikitech.wikimedia.org/wiki/Search - https://phabricator.wikimedia.org/T356806
[21:35:46] what a curious page. WikiPage::exists is true, but getParserOutputForIndexing returns null
[21:36:02] and visiting the page in a browser gets a "does not exist"
[21:39:03] huh, and a lastrevid of 0
[22:51:53] oh cool, looks like they are working on a way to do mwscript stuff from k8s, T341553
[22:51:54] T341553: Allow running one-off scripts manually - https://phabricator.wikimedia.org/T341553
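Going back to the "elastic ban" thread earlier in the day (15:30, 18:19–18:27): a hedged sketch, kept in PHP for consistency with the block above, of roughly what a manual ban amounts to at the cluster level. An allocation-exclude setting makes Elasticsearch relocate shards off the named nodes, which is why banned hosts end up with empty data dirs and trip the "node is not indexing" alert while the cluster stays green. The hostname pattern and endpoint are placeholders, and the real cookbook may use a different exclude attribute (_ip/_host); this is not its implementation.

```php
<?php
// Hedged sketch only, not the actual ban cookbook. The hostname pattern and
// endpoint below are placeholders.
$payload = json_encode( [
	'transient' => [
		// Excluding nodes by name tells Elasticsearch to move their shards
		// elsewhere, so "banned" hosts drain and stop indexing.
		'cluster.routing.allocation.exclude._name' => 'elastic2100*,elastic2101*',
	],
] );

$ch = curl_init( 'http://localhost:9200/_cluster/settings' );
curl_setopt_array( $ch, [
	CURLOPT_CUSTOMREQUEST  => 'PUT',
	CURLOPT_POSTFIELDS     => $payload,
	CURLOPT_HTTPHEADER     => [ 'Content-Type: application/json' ],
	CURLOPT_RETURNTRANSFER => true,
] );
echo curl_exec( $ch ), "\n";
curl_close( $ch );
```

Unbanning is the reverse: set the exclude value back to null and the shards relocate onto the hosts again.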