[10:00:33] Lunch
[10:21:52] lunch
[12:30:16] greetings
[12:41:33] o/
[12:43:17] \o
[15:02:36] o/
[15:02:46] ebernhardson: can we put undeploying APIFeatureUsage on pause?
[15:02:58] gehel: sure, why?
[15:03:30] There are enough people raising issues about how that's going to have a negative impact. More info soon!
[15:29:19] ebernhardson: do we sample search satisfaction events on enwiki from the browser?
[15:30:27] dcausse: should be unsampled
[15:32:03] hm... I can see https://intake-analytics.wikimedia.org/v1/events?hasty=true used on frwiki from my browser but not on enwiki
[15:32:33] these are filtered by ublock origin btw
[15:34:06] yea, i always have to open an incognito window to see the events flowing through. Also i'm not seeing all the events i would expect: i get a "searchResultPage" event from Special:Search, but nothing from autocomplete
[15:34:34] yes, same, even for frwiki actually
[15:34:53] not seeing click or visitPage events either
[15:35:00] oh, actually they seem buffered up
[15:35:38] in my network inspector tab it had nothing, nothing, then it sent 6 requests to events
[15:36:05] oh, I thought they were filtered, but I see them too
[15:39:33] autocomplete events do seem like something is wrong: i'm getting fulltext events, and it looks like some autocomplete events from the search bar on Special:Search, but nothing from the skin autocomplete
[15:39:39] (this is the old skin on enwiki)
[15:39:51] oh nevermind, those do come in. I wonder how to turn the lag off
[16:04:52] dinner
[16:06:26] heh, Special:Preferences / Editor still says "Temporarily disable the visual editor while it is in beta"
[16:09:47] :)
[16:10:25] errand
[16:38:18] meh, elastic's complaint about date formats in apifeatureusage is mostly complaining about nothing. It says we are using `y`, which is year-of-era, which returns 2022. Instead we should use `u`, which is plain year, which also returns 2022 (the two only differ for dates before the common era)
[16:39:01] (these aren't the annoying part to resolve in apifeatureusage though; the more difficult part is validating the logstash pipeline)
[16:42:43] sorry, forgot to notify for workout, but I'm back
[17:09:15] well, I thought Minecraft, using java, would be a good test case for the inspiration project, but it wants to spin everything up in its (ephemeral) working directory instead of in the stateful volume
[17:11:05] ebernhardson: existing index mappings have "_all": false, I wonder if they'll migrate properly
[17:11:51] dcausse: hmm, not sure. I suppose i can give that a quick local test, should be easy enough. I just put up the patch to drop that from the template, but ofc that's only new indices
[17:12:18] worst case we drop older indices I suppose
[17:12:52] should be reindexable; will have to double check how ApiFeatureUsage references them, but we can probably throw a .v2 at the end of the name or some such
[17:13:05] and then adjust curator to properly delete them later
[17:18:03] LMK if you need me to merge that puppet patch
[17:18:44] inflatador: yea go ahead, it should be safe. I'll have to double check later, but i'm pretty sure the template will be auto-imported to the proper clusters without changing anything else (might happen on next curator run or some such)
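The "did the template change land" check described here is easy to script; a minimal sketch, assuming placeholder cluster endpoints and a hypothetical `apifeatureusage*` template name pattern (not the production values):

```python
# Sketch: fetch the apifeatureusage template from each cluster and look for
# the legacy `_all` setting. Endpoints and template pattern are placeholders.
import requests

CLUSTERS = {
    "eqiad": "http://localhost:9200",  # placeholder, not the production URL
    "codfw": "http://localhost:9201",  # placeholder, not the production URL
}

def mentions_all(mappings: dict) -> bool:
    # `_all` sits at the top level of a typeless (7.x) mapping, or nested
    # under the type name in a typed (6.x) mapping.
    if "_all" in mappings:
        return True
    return any(isinstance(v, dict) and "_all" in v for v in mappings.values())

for cluster, base in CLUSTERS.items():
    templates = requests.get(f"{base}/_template/apifeatureusage*").json()
    for name, template in templates.items():
        flag = mentions_all(template.get("mappings", {}))
        print(f"{cluster}/{name}: _all present: {flag}")
```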
[17:21:01] ebernhardson grabbing a quick lunch, will merge when I get back
[17:21:08] kk
[17:21:24] dinner
[17:54:23] Created an apifeatureusage-2022.07.17 index using the old template in 6.8.23 and then reloaded the same elasticsearch data directory into 7.10.2; looks to load as expected (and still reports {"_all": {"enabled": false}} as part of the mapping)
[17:55:41] It even still returns the old format with "api-feature-usage-sanitized" as the type name if i provide include_type_name=true
[17:57:00] but surprisingly the bulk ingest in 6.8 doesn't accept an index request that only specifies the index name without a type. Not sure how to handle that when we are running 6 in one cluster and 7 in the other... will have to ponder
[17:57:17] (maybe not surprising and i'm just forgetful :P)
[18:07:17] i suppose it's possible something in logstash will need a restart to read the new config, might have to poke the o11y team
[18:07:20] back
[18:07:38] but could also deploy the template change and check tomorrow if the template gets updated before bothering them
[18:10:56] ebernhardson merged now, LMK if you need me to revert https://gerrit.wikimedia.org/r/c/operations/puppet/+/815327
[18:11:49] inflatador: doubt we will need a revert; worst case we will need to follow up with something else. We have to follow up anyways if we are going to keep apifeatureusage, this change only silences some deprecation warnings
[18:12:10] ACK
[19:06:55] lunch
[20:02:21] back
[20:03:38] looks like the apifeatureusage template update made it to eqiad and codfw, no more mention of _all from the /_template endpoint
[21:40:12] ( ^_^)o自自o(^_^ ) CHEERS!
[22:02:22] ebernhardson ryankemper just noticed we are in red in CODFW, checking it now
[22:03:10] inflatador: hmm, might be related to reindexing and restarts. Reindexing creates new indices, and restarts might have interrupted something at an inopportune time. Whatever the red index is, check if there is a matching green index with an earlier timestamp in the name
[22:03:26] cebwiki_content_1658250407 is the index, let me see
[22:04:07] the thing is when we reindex, we reindex into an index with 0 replicas. Once reindexing is complete we increase the replica count as appropriate; once that is happy we flip the alias
[22:05:20] looks like we have a cebwiki_content_1627611685
[22:05:59] inflatador: yup, looks like one of the nodes hosting a shard was restarted. No big deal, the reindexing will go on to the next wiki and i'll pull a report about which ones failed when reindexing is done. Go ahead and delete the red index
[22:06:30] ACK will do
[22:07:25] there is some question of whether cirrus should delete those itself... but we always felt odd just deleting things automatically, even if it seems safe. But i don't know that it's ever helped us to have those old indexes lie around until someone manually cleans them up
[22:08:24] Deleted cebwiki_content_1658250407 and it looks like we are back to green
[22:08:30] * ebernhardson is constantly reminded that 'just' is almost always a useless word, but i can't seem to leave them out :P
[22:09:52] I can't complain too much... hard for me to type a message that doesn't start with "yeah"
[22:09:59] :)
[22:10:31] * ryankemper finishes up catching up on backlog just in time to see that there's nothing for me to do here :P
[22:10:39] {◕ ◡ ◕}
[22:11:10] ryankemper actually, now that you mention it ;) ... can you keep an eye on the codfw reimages? I've got a tmux window up on cumin1001 if you don't mind
[22:11:22] inflatador: sure thing! pulling up tmux now
[22:12:35] would be neat if we could somehow have an 'orange' status: would be red, but only on indices that have no active aliases, meaning they aren't so bad
[22:13:02] not so bad == cirrus only talks to indices through aliases, except for very specific maintenance actions around creating new indices
[22:13:37] Yeah, we could hack a little script together to check that
[22:13:59] return the response from the cluster health api, unless it's red, in which case return either orange or red based on scanning the index aliases
[22:14:10] if we keep running into this one, probably a good idea. Nothing like seeing red to end your day ;)
[22:15:00] Might even be helpful to generalize it to a `cluster status` type command where it lists red/orange/yellow/green but also active shard recoveries, etc
[22:15:07] kinda like a point-in-time view of one of our tmux dashboards
[22:17:37] yea, something to ponder; probably wouldn't be too hard to put something together. There is probably a way to get a list of all non-green indices so we don't have to poll a few thousand indices
[22:17:55] i suppose just parse /_cat/indices?format=json at worst
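The little script proposed in this exchange might look something like the following — a rough sketch of the idea, not a finished tool, with localhost standing in for the real cluster endpoint:

```python
# Sketch of the 'orange' status idea: report cluster health as-is unless it
# is red, and soften red to "orange" when none of the red indices have an
# active alias (cirrus only queries indices through aliases).
import requests

BASE = "http://localhost:9200"  # placeholder endpoint

def cluster_status() -> str:
    health = requests.get(f"{BASE}/_cluster/health").json()["status"]
    if health != "red":
        return health
    # `health=red` limits the listing to the problem indices, so we don't
    # have to poll a few thousand indices to find them.
    red = requests.get(
        f"{BASE}/_cat/indices", params={"format": "json", "health": "red"}
    ).json()
    for row in red:
        aliases = requests.get(f"{BASE}/{row['index']}/_alias").json()
        if aliases.get(row["index"], {}).get("aliases"):
            return "red"  # a red index is actually being served via an alias
    return "orange"  # red indices exist, but nothing queries them directly

print(cluster_status())
```

(The `health=red` parameter on /_cat/indices also answers the question above about listing only the non-green indices without polling everything.)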
[22:18:28] speaking of point-in-time, been meaning to try this out: https://github.com/sachaos/viddy
[22:18:54] works like the watch command, but can time-travel... so we could see when the cluster went red
[22:19:00] (not that there aren't already lots of ways to do that)
[22:21:39] * ebernhardson apparently needs to set up some sort of local logstash images to figure out what logstash+elasticsearch is going to do when one cluster is on 6.x and the other is 7.x
[22:24:23] Need to run to pharmacy, back in 30
[23:19:00] heading out
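As a postscript to the mixed-version question left open at 17:57 and 22:21, a minimal probe might look like the sketch below. Endpoints are placeholder assumptions for the 6.8 and 7.10 clusters, and the index name is made up; the script only observes each cluster's response to a typeless bulk action rather than asserting what it should be:

```python
# Probe: does each cluster accept a bulk index action that names only the
# index, with no _type? Endpoints and index name are placeholders.
import json
import requests

CLUSTERS = {
    "es6": "http://localhost:9200",  # placeholder for the 6.8 cluster
    "es7": "http://localhost:9201",  # placeholder for the 7.10 cluster
}

# One typeless bulk action; the bulk API wants newline-delimited JSON with
# a trailing newline.
payload = (
    json.dumps({"index": {"_index": "apifeatureusage-probe"}}) + "\n"
    + json.dumps({"feature": "test", "@timestamp": "2022-07-19T00:00:00Z"}) + "\n"
)

for name, base in CLUSTERS.items():
    resp = requests.post(
        f"{base}/_bulk",
        data=payload,
        headers={"Content-Type": "application/x-ndjson"},
    )
    # Per the discussion above, 6.x is expected to reject the missing type
    # while 7.x accepts it; print whatever actually comes back.
    print(f"{name}: HTTP {resp.status_code}: {json.dumps(resp.json())[:200]}")
```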