[18:06:33] !log admin pooling cloudweb1003 and 1004 for wikitech, horizon, striker
[18:06:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:17:43] just noticed wikitech told me it's in readonly mode when I tried to make an edit. but that log line up there sounds exactly like it's the explanation
[18:18:26] sorry to be that guy who shows up exactly during maintenance to ask if the service is down, heh
[18:23:03] *hisss*
[18:23:16] readonly? I don't think that's expected atm
[18:23:42] (also not seeing it here fwiw)
[18:24:05] it said "The wiki is currently in read-only mode." when I clicked "Add topic" on a user talk page to leave a message
[18:24:20] and now it works again
[18:25:19] ignore it as a fluke
[18:26:16] !log admin depooling cloudweb1003 and 1004 for wikitech, horizon, striker -- pending db grant changes
[18:26:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:26:38] mutante, TheresNoTime, it was sort of an experimental deployment, now rolled back. Expect more breakage in the coming hours!
[18:26:59] andrewbogott: gotcha! ACK
[18:27:13] sounds exciting :D
[18:28:37] Like everything, this is 95% automated and I have to re-learn about the remaining 5% by breaking things
[18:29:49] saw business in -traffic just now. all the best
[19:43:12] Hi! On Toolforge, I'm trying to get djvu working in CropTool again after having moved to kubernetes, but having a hard time. There's no compilers in the php7.4 image, so I was recommended to try the ruby27 image. I was able to compile things there, but not to get the binaries to run in the php7.4 image due to missing shared libraries. It's probably possible to create fully standalone binaries that don't depend on shared libraries, but it's a bit beyond my knowledge (I tried adding a --disable-shared flag). It would be very helpful if compilers could be added to the php7.4 image, what's the process for requesting that?
[19:49:29] danmichaelo_: I can't predict how likely it is, but opening a ticket in phab + tagging it with toolforge and wmcs would be the first step.
[19:49:49] We're gradually approaching a more flexible image system but I don't think that's going to help you today :(
[19:50:15] danmichaelo_: what is the thing that needs to be compiled in this case -- is it jpegtran?
[19:50:44] djvulibre, imagemagick and ghostscript
[19:51:18] I could probably do without imagemagick if I used built-in php methods, but it requires a small rewrite I was hoping I could avoid
[19:51:38] ghostscript for extracting pages from pdfs
[20:05:02] Thanks for the tags, @andrewbogott
[20:06:22] danmichaelo_: honestly your short term best bet is just to go back to running the webservice from grid engine.
[20:08:08] the progressive creep of "please add X to image Y" will lead us to the "everything from the grid" docker image that we have been actively rejecting for 6 years
[20:08:41] the fix is coming in the form of a buildpack based custom image system, but that is not going to be ready in the short term
[20:11:15] Understand that. It would still be nice if build-essential was available, so you could at least compile things yourself while waiting for the future to arrive
[20:12:15] But thanks for the heads up, not very fun having to go back to webservice, but guess that's still the best solution now
[20:16:51] yeah, build essential would help but wouldn't that mean that it takes like 90 minutes for any new pod to start up?
[20:17:08] Maybe I'm misunderstanding and there's a caching solution here...
[20:21:08] I assumed the build results would be stored somewhere in the tool’s home directory on NFS?
[20:21:19] AFAIK we’re not in the no-NFS future yet either
[20:22:05] Yes, keeping the build outputs in the tool home was my plan
[20:27:46] !log testlabs are you there log? It's me, Andrew
[20:27:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL
[20:32:19] bd808: there are a few examples in -operations of stashbot failing today. I broke wikitech for a minute or two earlier but I can't think of why it would still be unreliable now...
[20:32:34] not an emergency but something to keep an eye on (presuming you still feel some ownership for that bot)
[20:33:33] andrewbogott: stashbot doesn't respond to logmsgbot any more. Krinkle sent a patch to turn that off in an attempt to spam less in -operations
[20:36:01] in the backscroll ebernhardson got an actual error message
[20:37:22] right it's not the confirmations, it's the bot saying it failed to edit that's concerning (not crazy concerning, but just a potential problem). at 13:20 and again at 13:32 (pacific time)
[20:41:32] I am about to go get a haircut, but I encourage everyone to ignore this issue unless it's still happening tomorrow :)
[20:41:40] ebernhardson: ack. we had r/o wikitech around 18:20 UTC, but that would be like 11:20 PDT right?
[20:42:07] * bd808 would like to outlaw non-UTC times in talking about things computers do
[20:42:27] i set phab to UTC, but i don't know if i can survive with irc in UTC :P
[20:42:46] my irc is UTC, but I understand :)
[20:42:50] it's 20:42 UTC now, so the errors would be 20:20 and 20:32
[20:44:51] why oh why do I not have timestamps in stashbot's error log :/ bad config striking there
[20:45:47] The last error was 'The wiki is currently in read-only mode.'. Not sure why we have seen another blip of that.
[20:50:35] grafana does show s6 (dblists claim that's where labswiki lives) with random small bits of lag, but nothing particularly significant
[20:51:06] maybe the bot could retry? Dunno if queueing for a few seconds is reasonable with however that works
[20:55:04] retries are technically possible, but complexity stashbot has avoided for many years. If we started having consistent issues I would be happy to try and address them, but this feels more like something lingering after the attempt to switch wikitech physical hosts earlier today.
[20:55:20] yea that seems fine for now, this is the only time i've noticed stashbot complaining
[20:56:08] *nod* thanks for pointing it out. the active monitoring solution is all y'all using the bot. :)
[20:59:24] @bd808 you available for a dm for a few min?
[21:00:46] PotsdamLamb: I would rather chat in public unless you have a security vulnerability to discuss. It can be a bit scary to ask questions in public, but when we do others can help and learn as well.
[21:01:52] pip3 install -e .[mwparserfromhell,mwoauth,mysql]
[21:01:52] Obtaining file:///mnt/nfs/labstore-secondary-tools-project/pdlbotarchiver
[21:01:52] ERROR: file:///mnt/nfs/labstore-secondary-tools-project/pdlbotarchiver does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.
[21:02:51] also this is a 404 https://doc.wikimedia.org/pywikibot/master/utilities/index.html linked from https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid
[21:05:09] https://doc.wikimedia.org/pywikibot/master/utilities/scripts.html is maybe the right link for that 404
[21:07:05] PotsdamLamb: that pip3 error is from the `.` you used in the command. if your current working directory was $HOME of your tool that seems like a reasonable error. That looks like the sort of command that would be run from the root of a git clone of pywikibot?
[21:10:49] ok give me a few please got interrupted by a phone call
[21:11:19] no worries. I give and get asynchronous tech support here :)
[21:15:18] andrewbogott: I just got a "database locked for maintenance" error from a manual edit on wikitech. Could the pybal pool still have a new cloudweb host in it?
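Editor's sketch, not part of the original log: the Pacific-to-UTC arithmetic discussed around 20:37 to 20:42 can be checked mechanically. This is a minimal Python example that assumes PDT is in effect (UTC-7); the helper name `pdt_to_utc` and the default date are placeholders, since the log does not state the calendar date.

```python
from datetime import datetime, timedelta, timezone

# Pacific Daylight Time is a fixed UTC-7 offset.
PDT = timezone(timedelta(hours=-7), "PDT")


def pdt_to_utc(hour, minute, date=(2022, 7, 21)):
    """Convert a PDT wall-clock time to UTC.

    The default date is a placeholder; only the time of day matters here.
    """
    local = datetime(*date, hour, minute, tzinfo=PDT)
    return local.astimezone(timezone.utc)


# The two error times quoted in Pacific time.
for hh, mm in [(13, 20), (13, 32)]:
    print(pdt_to_utc(hh, mm).strftime("%H:%M UTC"))
# prints:
# 20:20 UTC
# 20:32 UTC
```

The same helper confirms the other direction mentioned in the chat: 11:20 PDT corresponds to 18:20 UTC.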
[21:15:44] I hit save again and it went through, so it's not all access for sure
[21:16:10] * bd808 remembers that a.ndrewbogott is afk for a bit
[21:16:30] @bd808 so that . is actually part of the command on the wiki page "(pwb) $ pip3 install -e .[mwparserfromhell,mwoauth,mysql] # adjust extra dependencies as needed for your tool"
[21:17:02] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Pywikibot#Setup_a_Python_virtual_environment_for_library_dependencies -------- 3rd one down
[21:17:12] PotsdamLamb: see the `cd $HOME/pywikibot` line right above that one? Did you do that too?
[21:18:32] let me start that section over lol
[21:18:45] how do you put that code in like that?
[21:19:14] this bit of the doc could be more clear :) it starts with the `git clone ...` in the prior section
[21:19:31] yeah it could it is going now
[21:20:34] PotsdamLamb: > how do you put that code in like that? -- I'm not sure what this was in reference to
[21:22:01] cd $HOME/pywikibot
[21:22:10] that part up there?
[21:24:53] ah. I have a habit of surrounding things like that in backticks (`). Sounds like your client is processing that as markdown and turning it into ... or something similar when displayed
[21:27:17] bd808: I confirmed that the new hosts were depooled but I also just now switched their weights to 0 just in case that's something
[21:28:09] I'm also not sure how this would result in 'locked for maintenance'... unless when adding grants for the new hosts we somehow switched the grants for one of the old hosts to r/o?
[21:31:36] andrewbogott: thanks. I guess I assumed the grant issue that made you roll back at 18:26Z was in play. But if the issue there was no access to the db at all I would hope the error message would be different.
[21:33:26] yeah, it was no access at all.
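Editor's sketch, not part of the original log: the pip3 error in the pywikibot thread says the directory given to `pip3 install -e .` contained neither `setup.py` nor `pyproject.toml`. A small Python check along these lines can show whether the current directory is the clone root before running the install. The function name `is_python_project` is hypothetical, and the check only looks for the two file names quoted in that error message.

```python
from pathlib import Path

# File names taken from the pip error message quoted in the log.
PROJECT_MARKERS = ("setup.py", "pyproject.toml")


def is_python_project(directory):
    """Return True if the directory contains one of the marker files."""
    root = Path(directory)
    return any((root / name).is_file() for name in PROJECT_MARKERS)


if __name__ == "__main__":
    here = Path.cwd()
    if is_python_project(here):
        print(f"{here} looks installable with `pip3 install -e .`")
    else:
        print(f"{here} has no setup.py or pyproject.toml; cd into the clone first")
```

Run from the tool's `$HOME` this would report the missing files; run after `cd $HOME/pywikibot` it should pass, assuming the clone lives there as the wiki page describes.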
[21:34:03] So this read-only thing might be unrelated, or some kind of rando side-effect :/
[21:42:07] big link:
[21:42:10] https://www.irccloud.com/pastebin/dEhG0Fck/
[21:42:25] bd808: ^ shows the read/only thing as being due to slow replication
[21:42:36] '[{reqId}] {exception_url} Wikimedia\Rdbms\DBReadOnlyError: Database is read-only: The database is read-only until replication lag decreases. '
[21:42:51] Why would that be just wikitech though?
[21:43:28] andrewbogott: should be all of s6, so also frwiki jawiki and ruwiki
[21:43:50] agree it's odd :(
[21:44:11] Top domain names: wikitech, test, commons, en, jobrunner.discovery
[21:44:14] https://orchestrator.wikimedia.org/web/cluster/alias/s6 shows db1098.eqiad.wmnet as the sad node
[21:47:20] I dropped that link into -data-persistence with a note. Going to assume this is out of my hands until they tell me otherwise.
[21:48:00] looks like it's active maintenance -- https://sal.toolforge.org/log/fJqtIoIB8Fs0LHO5cvk2
[21:48:21] related to T312863
[21:48:21] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[22:03:53] @bd808 gotcha
[23:03:46] !log tools.lexeme-forms deployed 38141487d1 (l10n updates)
[23:03:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
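Editor's sketch, not part of the original log: the retry idea floated at 20:51 (wait a few seconds and try the edit again when the wiki reports read-only mode) could look roughly like this in Python. This is not stashbot's code, which by the discussion above deliberately has no retries. `ReadOnlyError`, `save_with_retry`, and the delay values are all hypothetical names and numbers chosen for illustration.

```python
import time


class ReadOnlyError(Exception):
    """Hypothetical stand-in for a 'wiki is in read-only mode' failure."""


def save_with_retry(save, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call save(); on ReadOnlyError, wait and retry with doubling delays.

    Re-raises the last ReadOnlyError once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return save()
        except ReadOnlyError:
            if attempt == attempts - 1:
                raise
            # Delays of base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_save():
        # Fails once, then succeeds, like the manual edit that went
        # through on the second click of "save".
        state["calls"] += 1
        if state["calls"] == 1:
            raise ReadOnlyError("The wiki is currently in read-only mode.")
        return "saved"

    print(save_with_retry(flaky_save, base_delay=0.1))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Whether a short retry window is worth the added complexity for brief replication-lag blips is a judgment call the log leaves open.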