[10:11:06] ALPS
[10:11:10] SPSS
[14:43:45] hi, i have a weird problem. i have a tool with a sqlite database, which has grown to 700 MB. code using this database from kubernetes cron jobs works perfectly, but i can't read it from an ssh connection using the `sqlite3` executable - any query returns "Error: disk I/O error". any clues why that would happen? is it just not a good idea to have a database of this size on toolforge?
[14:43:52] to reproduce: sqlite3 /data/project/dtcheck/public_html/database.sqlite "select * from meta"
[14:43:56] (yes it's public_html and world-readable, that's ok)
[14:44:35] MatmaRex: when you run that command, are you the dtcheck tool user or yourself?
[14:44:48] i am dtcheck
[14:45:05] ok. are you running it on the bastion directly or inside some Kubernetes container?
[14:45:21] on the bastion, i think
[14:45:35] tools.dtcheck@tools-sgebastion-10
[14:45:58] yeah. that's one of the bastions. if you have not done some `webservice shell` magic then you are still on a bastion
[14:47:22] MatmaRex: hmmm... I cannot recreate from the dev.toolforge.org bastion. Let me try from login.toolforge.org where it is failing for you.
[14:47:59] strace gave me these details: fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = -1 EIO (Input/output error) but unfortunately this tells me nothing, i searched for various parts of this error but did not learn much
[14:48:09] well, except that it has to do with file locks
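The fcntl call in that strace is the byte-range lock sqlite takes at offset 1073741824 (0x40000000); when NFS locking on the host is broken, the request fails with EIO and sqlite surfaces it as "disk I/O error". A minimal sketch to test the same lock directly, independent of sqlite - the path is the one from the reproduction above, and `fcntl.lockf` issues the same F_SETLK request:

```python
#!/usr/bin/env python3
# Sketch: attempt the byte-range lock from the strace output above, to check
# whether POSIX file locking works on the NFS mount at all.
import fcntl
import sys

# Path from the reproduction above; any file on the same NFS mount would do.
PATH = "/data/project/dtcheck/public_html/database.sqlite"
LOCK_OFFSET = 0x40000000  # 1073741824, the l_start seen in strace

try:
    with open(PATH, "rb") as f:
        # Shared (read) lock of length 1 at the same offset sqlite uses.
        fcntl.lockf(f, fcntl.LOCK_SH, 1, LOCK_OFFSET)
        fcntl.lockf(f, fcntl.LOCK_UN, 1, LOCK_OFFSET)
    print("byte-range locking works on this host")
except OSError as e:
    # EIO here is what sqlite reports as "disk I/O error".
    print(f"lock failed: {e}", file=sys.stderr)
    sys.exit(1)
```

If this fails on login.toolforge.org but succeeds on dev.toolforge.org, the fault is the host's NFS lock state rather than the database file - consistent with the bastion reboot fixing it later in the log.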
[14:50:05] MatmaRex: the command works as expected when I ssh to dev.toolforge.org, but it fails from login.toolforge.org. I will file a bug report first and then if I have time try to see if NFS is obviously broken somehow on the login.toolforge.org node (tools-sgebastion-10). I think I have seen some similar error report recently in Phabricator...
[14:50:48] huh. thanks
[14:52:52] in general I think our advice is to not run sqlite and other databases on NFS since you're bound to get bad performance on it
[14:55:20] T336510
[14:55:20] T336510: "Error: disk I/O error" from sqlite3 access attempt made from tools-sgebastion-10 - https://phabricator.wikimedia.org/T336510
[14:55:48] T336145 is the other report I remembered seeing
[14:55:49] T336145: disk I/O error - https://phabricator.wikimedia.org/T336145
[14:56:07] taavi: surely one of these years, NFS will get good
[14:56:09] bd808: thanks
[14:56:19] (it runs fast enough for me, when it runs)
[14:56:49] ToolsDB is the "better" solution to replace sqlite
[14:59:41] I hear there's an even fancier database solution now
[15:00:50] Anyways, if I wanted to do outreach to Wikimedia Hackathon attendees, as a non-attendee, what is my best option for that?
[15:01:23] @harej: phab board, telegram/irc channel, wiki page
[15:03:52] bd808: hi, we could use some help with debugging a gerrit issue if anyone from wmcs is around
[15:04:13] so we changed the IP of gerrit and it has been added in cloudgw.yaml a while ago
[15:04:20] but it's like we are still firewalled off
[15:04:30] is it possible the cloudgw thing needs a reload
[15:04:47] arturo: are you around to help mutante? ^
[15:05:09] oh man. bank holiday for him today
[15:05:09] mutante: I can access gerrit from WMCS just fine. please clarify where you are seeing issues
[15:05:51] taavi: it's about accessing port 443 from cloud VPS machines
[15:06:04] the IP behind gerrit.wikimedia.org changed from .137 to .151
[15:06:21] the relevant change to allow it was https://gerrit.wikimedia.org/r/c/operations/puppet/+/909795
[15:06:31] oh right, I'm testing from an instance with a public ip
[15:06:35] currently both beta and integration can't talk to it
[15:06:39] yeah, I can see the issue
[15:06:41] but we need to fix that
[15:06:44] ok!
[15:07:12] irb-1120.cloudsw1-c8-eqiad.eqiad1.wikimediacloud.org is the last one to answer
[15:07:58] !log tools.masto-collab Updated from 4339b82 to 16b8d84
[15:08:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.masto-collab/SAL
[15:08:12] mutante: so the problem is that the cloudgw ::dmz_cidr list defined in Puppet has to match what's defined in operations/homer/public.git:definitions/static.net
[15:09:02] for gerrit, dmz_cidr has 154.137 and 154.151, but the homer repo only has 154.137 (and 153.107 for gerrit-replica)
[15:09:27] so now that gerrit.wm.o points to 154.151, things do not work
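Because the two lists live in separate repositories (Puppet for cloudgw, operations/homer/public for the switch routes), they can drift apart exactly as described here: the new .151 address was added on the cloudgw side but never to static.net. A minimal sketch of a cross-check, assuming local checkouts of both repos - the file paths are hypothetical and the flat IPv4 scan is a simplification of the real formats, not the actual tooling:

```python
#!/usr/bin/env python3
# Sketch: flag addresses listed in the cloudgw dmz_cidr data but absent from
# the homer static route definitions. Both paths are hypothetical checkout
# locations, and the regex scan is a simplification of the real file formats.
import re
import sys

CLOUDGW_YAML = "puppet/cloudgw.yaml"                  # hypothetical path
STATIC_NET = "homer-public/definitions/static.net"    # hypothetical path

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def addrs(path: str) -> set[str]:
    """Collect every IPv4 address mentioned in the file."""
    with open(path) as f:
        return set(IPV4.findall(f.read()))

missing = addrs(CLOUDGW_YAML) - addrs(STATIC_NET)
if missing:
    print("in dmz_cidr but missing from static.net:", ", ".join(sorted(missing)))
    sys.exit(1)
print("dmz_cidr and static.net agree")
```

Run before the DNS switch, a check like this would have flagged that the .151 address had no static route yet.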
[15:10:20] sounds like we need a homer deploy ...hmmm
[15:10:35] is it expected that only editing cloudgw.yaml is not enough?
[15:10:42] yes, you would need to update the definitions and then deploy it
[15:10:48] yes, this is intended behaviour
[15:11:09] so the WMCS switch would miss a static route to production?
[15:12:51] topranks: you around to help with ^
[15:13:08] (it's a holiday in many places today :/)
[15:13:52] dcaro: mutante is working with sukhbir on homer stuff i believe
[15:13:59] talking to sukhbir about homer deploy
[15:14:04] thanks taavi
[15:14:06] 👍
[15:19:21] we are about to deploy https://gerrit.wikimedia.org/r/c/operations/homer/public/+/919151/1/definitions/static.net
[15:19:24] mutante: cool. lmk if you have any patches that need reviews
[15:19:26] taavi: ^ lgtm
[15:19:29] there ^
[15:20:49] +1'd
[15:21:07] thanks!
[15:28:13] seems to work now
[15:28:23] :)
[15:28:30] indeed!
[15:28:39] yay
[15:28:45] beta jobs are being re-enabled now
[15:30:15] thanks!
[15:31:52] !log tools Sent `wall` for reboot of tools-sgebastion-10 circa 15:40Z
[15:31:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:36:02] !log tools.masto-collab Updated from 16b8d84 to 0aff6ff
[15:36:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.masto-collab/SAL
[15:36:10] dcaro: just out of a meeting now
[15:36:39] topranks: np, it's already fixed, sorry for the noise :)
[15:36:41] I'm not 100% sure what the ask is based on the scrollback
[15:36:43] ah ok, no worries
[15:37:39] topranks: fixed by sukhe :) thanks so much
[15:48:29] !log tools Rebooted tools-sgebastion-10 for T336510
[15:48:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:48:32] T336510: "Error: disk I/O error" from sqlite3 access attempt made from tools-sgebastion-10 - https://phabricator.wikimedia.org/T336510
[15:51:56] MatmaRex: rebooting the bastion fixed the flock problem. I would agree with t.aavi that moving to a data store that is not sqlite on NFS is recommended. Sqlite file locking and NFS are not friends.
[15:54:05] thanks
[19:37:11] more specifically, if I wanted people to know that I have toys for them to play with (a wikidata query service with a higher timeout limit and a new source metadata-themed wikibase), what do you think would be most effective? if this were a different kind of hackathon I would buy a sponsorship slot (re @wmtelegram_bot: @harej: phab board, telegram/irc channel, wiki page)
[20:02:55] @harej: Write that up on a wiki page somewhere and drop a link in the irc/telegram chat? Maybe also look on https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Connect for someone to act as your agent and spread the word at the event?
[20:32:37] wikidata mailing list?
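On the earlier sqlite thread: for a tool following bd808's advice to move off sqlite-on-NFS, ToolsDB speaks the MySQL protocol and reuses the credentials Toolforge already generates for each tool. A minimal sketch, assuming pymysql is available, the standard tools.db.svc.wikimedia.cloud service name, and a database created beforehand (the `<user>__dtcheck` name is hypothetical):

```python
#!/usr/bin/env python3
# Sketch: run the same query as the sqlite reproduction, but against ToolsDB.
# Assumes pymysql is installed and the tool has already created a database
# named <credential user>__dtcheck (the suffix here is hypothetical).
import configparser
import os

import pymysql

CNF = os.path.expanduser("~/replica.my.cnf")  # credentials Toolforge provides

cfg = configparser.ConfigParser()
cfg.read(CNF)
user = cfg["client"]["user"].strip("'\"")  # values may be quoted in the file

conn = pymysql.connect(
    host="tools.db.svc.wikimedia.cloud",  # ToolsDB service address
    read_default_file=CNF,                # picks up user and password
    database=f"{user}__dtcheck",
)
with conn.cursor() as cur:
    cur.execute("SELECT * FROM meta")
    for row in cur.fetchall():
        print(row)
conn.close()
```

Since locking then happens inside the database server rather than over NFS byte-range locks, the EIO failure mode from the start of the log no longer applies.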