[10:11:06] ALPS
[10:11:10] SPSS
[14:43:45] hi, i have a weird problem. i have a tool with a sqlite database, which has grown to 700 MB. code using this database from kubernetes cron jobs works perfectly, but i can't read it from an ssh connection using the `sqlite3` executable - any query returns "Error: disk I/O error". any clues why that would happen? is it just not a good idea to have a database of this size on toolforge?
[14:43:52] to reproduce: sqlite3 /data/project/dtcheck/public_html/database.sqlite "select * from meta"
[14:43:56] (yes it's public_html and world-readable, that's ok)
[14:44:35] MatmaRex: when you run that command, are you the dtcheck tool user or yourself?
[14:44:48] i am dtcheck
[14:45:05] ok. are you running it on the bastion directly or inside some Kubernetes container?
[14:45:21] on the bastion, i think
[14:45:35] tools.dtcheck@tools-sgebastion-10
[14:45:58] yeah. that's one of the bastions. if you have not done some `webservice shell` magic then you are still on a bastion
[14:47:22] MatmaRex: hmmm... I cannot recreate from the dev.toolforge.org bastion. Let me try from login.toolforge.org where it is failing for you.
[14:47:59] strace gave me these details: fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = -1 EIO (Input/output error) but unfortunately this tells me nothing, i searched for various parts of this error but did not learn much
[14:48:09] well, except that it has to do with file locks
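The fcntl call in that strace is the byte-range lock sqlite takes at offset 1073741824 (0x40000000); when NFS locking on the host is broken, the request fails with EIO and sqlite surfaces it as "disk I/O error". A minimal sketch to test the same lock directly, independent of sqlite - the path is the one from the reproduction above, and `fcntl.lockf` issues the same F_SETLK request:

```python
#!/usr/bin/env python3
# Sketch: attempt the byte-range lock from the strace output above, to check
# whether POSIX file locking works on the NFS mount at all.
import fcntl
import sys

# Path from the reproduction above; any file on the same NFS mount would do.
PATH = "/data/project/dtcheck/public_html/database.sqlite"
LOCK_OFFSET = 0x40000000  # 1073741824, the l_start seen in strace

try:
    with open(PATH, "rb") as f:
        # Shared (read) lock of length 1 at the same offset sqlite uses.
        fcntl.lockf(f, fcntl.LOCK_SH, 1, LOCK_OFFSET)
        fcntl.lockf(f, fcntl.LOCK_UN, 1, LOCK_OFFSET)
    print("byte-range locking works on this host")
except OSError as e:
    # EIO here is what sqlite reports as "disk I/O error".
    print(f"lock failed: {e}", file=sys.stderr)
    sys.exit(1)
```

If this fails on login.toolforge.org but succeeds on dev.toolforge.org, the fault is the host's NFS lock state rather than the database file - consistent with the bastion reboot fixing it later in the log.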
[14:50:05] MatmaRex: the command works as expected when I ssh to dev.toolforge.org, but it fails from login.toolforge.org. I will file a bug report first and then if I have time try to see if NFS is obviously broken somehow on the login.toolforge.org node (tools-sgebastion-10). I think I have seen some similar error report recently in Phabricator...
[14:50:48] huh. thanks
[14:52:52] in general I think our advice is to not run sqlite and other databases on NFS since you're bound to get bad performance on it
[14:55:20] T336510
[14:55:20] T336510: "Error: disk I/O error" from sqlite3 access attempt made from tools-sgebastion-10 - https://phabricator.wikimedia.org/T336510
[14:55:48] T336145 is the other report I remembered seeing
[14:55:49] T336145: disk I/O error - https://phabricator.wikimedia.org/T336145
[14:56:07] taavi: surely one of these years, NFS will get good
[14:56:09] bd808: thanks
[14:56:19] (it runs fast enough for me, when it runs)
[14:56:49] ToolsDB is the "better" solution to replace sqlite
[14:59:41] I hear there's an even fancier database solution now
[15:00:50] Anyways, if I wanted to do outreach to Wikimedia Hackathon attendees, as a non-attendee, what is my best option for that?
[15:01:23] @harej: phab board, telegram/irc channel, wiki page
[15:03:52] bd808: hi, we could use some help with debugging a gerrit issue if anyone from wmcs is around
[15:04:13] so we changed the IP of gerrit and it has been added in cloudgw.yaml a while ago
[15:04:20] but it's like we are still firewalled off
[15:04:30] is it possible the cloudgw thing needs a reload
[15:04:47] arturo: are you around to help mutante? ^
[15:05:09] oh man. bank holiday for him today
[15:05:09] mutante: I can access gerrit from WMCS just fine. please clarify where you are seeing issues
[15:05:51] taavi: it's about accessing port 443 from cloud VPS machines
[15:06:04] the IP behind gerrit.wikimedia.org changed from .137 to .151
[15:06:21] the relevant change to allow it was https://gerrit.wikimedia.org/r/c/operations/puppet/+/909795
[15:06:31] oh right, I'm testing from an instance with a public ip
[15:06:35] currently both beta and integration can't talk to it
[15:06:39] yeah, I can see the issue
[15:06:41] but we need to fix that
[15:06:44] ok!
[15:07:12] irb-1120.cloudsw1-c8-eqiad.eqiad1.wikimediacloud.org is the last one to answer
[15:07:58] !log tools.masto-collab Updated from 4339b82 to 16b8d84
[15:08:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.masto-collab/SAL
[15:08:12] mutante: so the problem is that the cloudgw ::dmz_cidr list defined in Puppet has to match what's defined in operations/homer/public.git:definitions/static.net
[15:09:02] for gerrit, dmz_cidr has 154.137 and 154.151, but the homer repo only has 154.137 (and 153.107 for gerrit-replica)
[15:09:27] so now that gerrit.wm.o points to 154.151, things do not work
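Because the two lists live in separate repositories (Puppet for cloudgw, operations/homer/public for the switch routes), they can drift apart exactly as described here: the new .151 address was added on the cloudgw side but never to static.net. A minimal sketch of a cross-check, assuming local checkouts of both repos - the file paths are hypothetical and the flat IPv4 scan is a simplification of the real formats, not the actual tooling:

```python
#!/usr/bin/env python3
# Sketch: flag addresses listed in the cloudgw dmz_cidr data but absent from
# the homer static route definitions. Both paths are hypothetical checkout
# locations, and the regex scan is a simplification of the real file formats.
import re
import sys

CLOUDGW_YAML = "puppet/cloudgw.yaml"                  # hypothetical path
STATIC_NET = "homer-public/definitions/static.net"    # hypothetical path

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def addrs(path: str) -> set[str]:
    """Collect every IPv4 address mentioned in the file."""
    with open(path) as f:
        return set(IPV4.findall(f.read()))

missing = addrs(CLOUDGW_YAML) - addrs(STATIC_NET)
if missing:
    print("in dmz_cidr but missing from static.net:", ", ".join(sorted(missing)))
    sys.exit(1)
print("dmz_cidr and static.net agree")
```

Run before the DNS switch, a check like this would have flagged that the .151 address had no static route yet.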
[15:10:20] sounds like we need a homer deploy ...hmmm
[15:10:35] is it expected that only editing cloudgw.yaml is not enough?
[15:10:42] yes, you would need to update the definitions and then deploy it
[15:10:48] yes, this is intended behaviour
[15:11:09] so the WMCS switch would miss a static route to production?
[15:12:51] topranks: you around to help with ^
[15:13:08] (it's a holiday in many places today :/)
[15:13:52] dcaro: mutante is working with sukhbir on homer stuff i believe
[15:13:59] talking to sukhbir about homer deploy
[15:14:04] thanks taavi
[15:14:06] 👍
[15:19:21] we are about to deploy https://gerrit.wikimedia.org/r/c/operations/homer/public/+/919151/1/definitions/static.net
[15:19:24] mutante: cool. lmk if you have any patches that need reviews
[15:19:26] taavi: ^ lgtm
[15:19:29] there ^
[15:20:49] +1'd
[15:21:07] thanks!
[15:28:13] seems to work now
[15:28:23] :)
[15:28:30] indeed!
[15:28:39] yay
[15:28:45] beta jobs are being re-enabled now
[15:30:15] thanks!
[15:31:52] !log tools Sent `wall` for reboot of tools-sgebastion-10 circa 15:40Z
[15:31:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:36:02] !log tools.masto-collab Updated from 16b8d84 to 0aff6ff
[15:36:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.masto-collab/SAL
[15:36:10] dcaro: just out of a meeting now
[15:36:39] topranks: np, it's already fixed, sorry for the noise :)
[15:36:41] I'm not 100% sure what the ask is based on the scrollback
[15:36:43] ah ok, no worries
[15:37:39] topranks: fixed by sukhe :) thanks so much
[15:48:29] !log tools Rebooted tools-sgebastion-10 for T336510
[15:48:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:48:32] T336510: "Error: disk I/O error" from sqlite3 access attempt made from tools-sgebastion-10 - https://phabricator.wikimedia.org/T336510
[15:51:56] MatmaRex: rebooting the bastion fixed the flock problem. I would agree with t.aavi that moving to a data store that is not sqlite on NFS is recommended. Sqlite file locking and NFS are not friends.
[15:54:05] thanks
[19:37:11] more specifically, if I wanted people to know that I have toys for them to play with (a wikidata query service with a higher timeout limit and a new source metadata-themed wikibase), what do you think would be most effective? if this were a different kind of hackathon I would buy a sponsorship slot (re @wmtelegram_bot: @harej: phab board, telegram/irc channel, wiki page)
[20:02:55] @harej: Write that up on a wiki page somewhere and drop a link in the irc/telegram chat? Maybe also look on https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Connect for someone to act as your agent and spread the word at the event?
[20:32:37] wikidata mailing list?
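On the earlier sqlite thread: for a tool following bd808's advice to move off sqlite-on-NFS, ToolsDB speaks the MySQL protocol and reuses the credentials Toolforge already generates for each tool. A minimal sketch, assuming pymysql is available, the standard tools.db.svc.wikimedia.cloud service name, and a database created beforehand (the `<user>__dtcheck` name is hypothetical):

```python
#!/usr/bin/env python3
# Sketch: run the same query as the sqlite reproduction, but against ToolsDB.
# Assumes pymysql is installed and the tool has already created a database
# named <credential user>__dtcheck (the suffix here is hypothetical).
import configparser
import os

import pymysql

CNF = os.path.expanduser("~/replica.my.cnf")  # credentials Toolforge provides

cfg = configparser.ConfigParser()
cfg.read(CNF)
user = cfg["client"]["user"].strip("'\"")  # values may be quoted in the file

conn = pymysql.connect(
    host="tools.db.svc.wikimedia.cloud",  # ToolsDB service address
    read_default_file=CNF,                # picks up user and password
    database=f"{user}__dtcheck",
)
with conn.cursor() as cur:
    cur.execute("SELECT * FROM meta")
    for row in cur.fetchall():
        print(row)
conn.close()
```

Since locking then happens inside the database server rather than over NFS byte-range locks, the EIO failure mode from the start of the log no longer applies.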