[08:32:10] marostegui: btw, I realized the text table in core, for really old revisions (around a couple hundred thousand rows), is actually storing the whole content of pages in the core tables instead of holding a pointer to ES. I'm moving them to ES, I have to hard-code cluster27, so if you're depooling es5, let me know
[08:35:29] Amir1: cool, for now I am not planning anything on es
[08:36:11] awesome. Is it okay if I continue with db maint too? like schema changes, reboots, etc.
[08:41:41] as long as you don't touch sanitarium masters and below, yes
[08:42:20] sure, awesome. Thanks
[08:54:06] * Emperor deploys the kitten test on the reimaged ms-fe1009
[09:55:44] https://www.irccloud.com/pastebin/QQNd48Cv/
[09:55:47] le sigh
[09:56:39] that's great
[10:05:17] Nice, the gerrit "revert" UI gives you a text box that looks like it's wrapping text for you, but actually stores each paragraph as a really long line, so then the CI yells at you
[10:10:10] enwiki text clean up will be fun
[10:42:52] Amir1: I am not sure if you broke my replicas-index run
[10:42:56] By running it twice
[10:43:03] Check my comment on the task
[10:43:10] nah, I was running it on clouddb1018
[10:43:16] then something is bad
[10:43:27] so many things about that script are bad
[10:43:43] can you take care of that task? but please only run it on clouddb1021
[10:44:12] To be honest, I would like to see involvement from WMCS on that
[10:44:13] sure, I sorta wanted to run it on clouddb1021 but I was worried I'd step on your toes
[10:44:16] Because they own that
[10:44:35] I have shut down s4 on clouddb1021 for now to reclone
[10:44:50] yeah, I will drag them to at least code review
[10:45:00] thanks
[10:45:05] I'll take a look
[13:03:27] marostegui: I found the problem, the patch works, I'll get WMCS to review and merge it, but clouddb1021 is already getting the indexes
[13:03:38] (the script is running)
[13:04:02] and yes, I know you can read your emails, just wanted to be explicit here :D
[15:51:31] Hey, just wanted to check in and see if there was anything else you were hoping to get help with from WMCS with wiki replicas.
[16:17:41] balloons: Actually yes https://phabricator.wikimedia.org/T337734#8891899
[16:18:24] balloons: And some more communication on wikitech-l is probably required
[16:31:54] marostegui: balloons the patch is merged now (Arturo reviewed it), I ran it again on clouddb1021 and it works fine. Shall I switch to running it in s1?
[16:32:02] (or other rebuilt sections)
[16:32:25] Amir1: it is needed everywhere as far as I know
[16:32:32] But again, I have no context at all
[16:32:52] yeah it's needed. Just checking if you have objections to running it
[16:33:00] because you said, let's stick to clouddb1021 for now
[16:33:10] but I cleaned clouddb1021 fully
[16:33:17] s4 is still pending there, as it is down
[16:33:21] It will be up tomorrow
[16:33:33] So if clouddb1021 is fully done, I suggest you move to the other hosts/sections then
[16:33:47] I am going to bring clouddb1015:s4 up in a second
[16:33:47] okay, cool.
[16:33:55] yeah for s4 I'll redo that tomorrow
[16:34:06] yeah, let's leave all s4 hosts aside for now
[16:34:28] I think I will have them ready by 10pm or so tonight
[16:35:01] Amir1, thank you.
[16:35:39] awesome. No rush on my side, running this on other sections is going to take a while anyway
[17:30:45] Does anyone know if the replicas return utf-8 fields in byte format? I'm seeing `b'string'` such as `b'Ary29'` or `b'\xeb\xa9\x94\xec\x9d\xb4'` when doing searches in Superset output to CSV
[17:31:23] in general, data on wikis is stored in binary format
[17:31:43] if you need utf-8 you need to use a mw api
[17:36:03] Ah, thank you!
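
On the byte strings above: wiki replica text columns are VARBINARY, so clients that do not decode them display Python-style byte literals; `b'\xeb\xa9\x94\xec\x9d\xb4'` is simply the UTF-8 encoding of 메이. If client-side decoding is not an option, the conversion can also be done in SQL. A minimal sketch, assuming the standard MediaWiki actor table; the table and column names are illustrative, not taken from the conversation:

    -- Sketch: replica columns are VARBINARY, so convert explicitly to get
    -- utf-8 text out of the query instead of raw bytes.
    SELECT actor_id,
           CONVERT(actor_name USING utf8mb4) AS actor_name_utf8
    FROM actor
    WHERE actor_name = 'Ary29'   -- string literals compare fine against VARBINARY
    LIMIT 10;

CONVERT(... USING utf8mb4) is plain MariaDB/MySQL syntax, so it works the same from Superset, Quarry, or the CLI.
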
[20:18:21] Amir1: any idea why https://replag.toolforge.org/ is reporting such a weird number for s4 lag?
[20:18:28] It is reporting: 9223372036854775807
[20:18:33] While the heartbeat view has a normal value
[20:18:40] root@clouddb1019.eqiad.wmnet[heartbeat_p]> select * from heartbeat;
[20:18:40] +-------+----------------------------+------------+
[20:18:40] | shard | last_updated | lag |
[20:18:40] +-------+----------------------------+------------+
[20:18:40] | s4 | 2023-05-31T14:58:08.000820 | 19198.9992 |
[20:18:41] +-------+----------------------------+------------+
[20:18:54] hmm, it might be cache
[20:18:58] ah maybe
[20:19:08] I was checking why https://guc.toolforge.org/?by=date&user=Doc+Taxon wasn't working
[20:19:11] But it might be cached too
[20:19:38] one thing, I depooled half of the wikireplicas, let me check if it's among them
[20:20:11] yeah, but I'd guess the lb would send it to the other one
[20:20:28] But again... this might be the logic of all this but we don't know
[20:20:36] yeah, FWIW, clouddb1019 is the pooled one in s4
[20:20:56] I would expect the tool to work even if it is lagged
[20:21:20] I think the error is because of me
[20:21:31] give me a min to try something
[20:22:08] yeah it is ok, don't worry
[20:22:23] there might be like 100 tools erroring still due to all the ongoing stuff
[20:22:26] but it is what it is
[20:22:58] so I manually changed the haproxy config on dbproxy1019: since both hosts of s4 were down, it didn't let me reload the config, so I removed s4 from it
[20:23:30] now I reloaded haproxy
[20:23:35] I actually checked quarry earlier today and it worked for s4
[20:23:35] let's see if it can connect now
[20:23:37] so it is just that
[20:23:45] Yeah, that tool seems to be working now
[20:23:51] (Or not erroring)
[20:23:51] yay
[20:24:09] sorry, I was trying to find a way to depool wikireplicas
[20:24:15] It is ok
[20:24:20] <3
[20:24:22] Don't worry
[20:24:36] Are you running it in parallel?
[20:24:40] Like on all the depooled ones?
[20:24:41] FWIW, the alters are going through
[20:24:47] on two of them
[20:24:58] but good idea, let me start on a third at least
[20:25:02] https://replag.toolforge.org/ also went back to normal
[20:25:33] Anyways, I am going to bed, it is 10:30pm
[20:26:18] go go
[20:26:26] Thanks :***
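
A note on the lag value above: 9223372036854775807 is 2^63 - 1, the maximum signed 64-bit integer, i.e. a sentinel/overflow value rather than a measured lag. A minimal sketch of checking for it against the heartbeat_p view from the paste earlier; the sentinel comparison is illustrative and not necessarily what replag.toolforge.org actually does:

    -- Sketch: read replication lag from heartbeat_p and flag the
    -- max-BIGINT sentinel seen on replag (illustrative check only).
    SELECT shard,
           lag,
           lag >= 9223372036854775807 AS looks_like_sentinel   -- 2^63 - 1
    FROM heartbeat_p.heartbeat
    WHERE shard = 's4';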