[00:00:43] !log tools.wikibugs Updated channels.yaml to: 1909d61ffd9df086a536257524a66f9666687f78 phorge: Replace `fab` library with local client
[00:00:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[00:06:59] heh. I didn't realize that the wb-test bot in tools.wikibugs-test would log here. I probably should make that a config thing.
[04:49:53] Is there an upload quota for object storage on Cloud Services?
[08:54:49] harej: there's a space limit, but afaik there's no traffic limit
[08:55:45] !log lucaswerkmeister@tools-sgebastion-10 tools.bridgebot Double IRC messages to other bridges
[08:55:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[09:42:03] !log tools reboot tools-sgeexec-10-20, -21, -23, sgeweblight-10-32 due to stuck nfs procs
[09:42:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:46:36] !log taavi@tools-sgebastion-11 tools.wikibugs toolforge jobs restart irc
[09:46:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[10:34:20] !log tools rebuilding all docker images for https://gerrit.wikimedia.org/r/c/operations/docker-images/toollabs-images/+/1005952 (T293552) + normal package updates
[10:34:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:34:26] T293552: Remove Python/webservice-runner from toolforge web containers - https://phabricator.wikimedia.org/T293552
[15:21:41] !log melos@tools-sgebastion-10 tools.stewardbots Restarted SULWatcher disconnected
[15:21:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[15:24:47] !log admin renewing puppet ca cert for cloudinfra-internal puppetmaster
[15:24:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:36:25] !log admin renewing puppet ca cert for cloud-puppetmaster-03
[15:36:32] Logged the message at
https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:13:14] heyo, do you know if the public Superset instance is down? `SELECT * FROM recentchanges LIMIT 10;` in SQL Lab returns nothing/seems stuck
[17:17:23] Using MariaDB directly, this same statement is returning the results in 0.007s, with Superset, if it doesn't fail, the same results are returned after 4 mins
[17:18:46] I can run _some_ queries on superset but I don't have references to say if they're slower than usual. Rook might know more.
[17:19:23] So weird... Opening public charts or other requests works fine...
[17:19:32] But typing my own is buggy
[17:20:05] 03/06/2024 18:15:14 - Duration: 00:03:36.68 - Rows: 10 - SELECT * FROM recentchanges LIMIT 10;
[17:20:48] Wait wait wait, are my own requests stalled? Meaning, there is a queue?
[17:23:11] I can run "SELECT * FROM recentchanges LIMIT 10" on db s1, schema enwiki_p
[17:23:47] Yea, I can too, but it seems that my queries were stalled because a big one was still executing (I opened two tabs)
[17:24:08] though I don't know if that's the case, it sounds reasonable to me :)
[17:24:11] I'm not sure how superset handles concurrency, but I wouldn't be surprised if it has some queueing system
[17:24:56] Dunno if it is reasonable: I could open 30 charts at the same time and it doesn't seem there is a queue for them
[17:29:22] If superset doesn't have any parallel work in progress limits itself, the user accounts it uses to talk to the databases do. Nothing scales infinitely horizontally.
[17:30:08] Yea I know, but I am alone on the service...
[17:30:36] (just checked the history)
[17:30:51] I don't remember this type of limitation with Quarry.
[17:32:22] Alright, I planned to create a dashboard for frwiki admins, I hope it doesn't break when showing them
[17:32:32] Quarry very much has a parallel process cap.
All of quarry is built around a work queue with a fixed number of job runners
[17:34:02] Should I report this difference between the two services on Phab? Because if I understand correctly, Quarry is going to be replaced by Superset
[17:34:23] Or is it a planned feature difference?
[17:37:51] I think it's useful to have a task in phab, if anything for future reference if other people encounter similar limitations
[17:38:25] I would probably describe the issue you're seeing, rather than comparing with Quarry, but up to you
[17:38:38] Yea yea, XY problem I know!
[17:38:42] :)
[17:46:50] !log admin running "wmcs-dnsleaks --delete" to clean up 2 leaked records (tools-sgeweblight-10-32)
[17:46:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:57:50] https://phabricator.wikimedia.org/T359431
[18:01:58] Also, are these fake? https://github.com/toolforge/superset-deploy/blob/main/ansible/vars/codfw1dev-secrets.yaml
[18:07:24] I'm back. To answer some of the above. Quarry doesn't have to replace superset, but "replace" won't mean quarry features are going into superset. Superset is limited in what kind of customization it supports. When I last checked, about six months ago, the assumption was that you added patches right into an install, which in our case means rebuilding all the superset containers with our patches, and maintaining that forever. That is not
[18:07:24] the plan.
[18:08:34] meh, I guess we can't have everything..
[18:10:12] Quarry itself is set up to go to kubernetes. At which point it could run forever-ish as we could keep redeploying the same image. I haven't done this because it would require changing the NFS, which has a chance of removing the history. Though if it gets deleted by SRE in June, then there is no history, so it makes the transition zero risk.
The real reason that I haven't touched it, is that I get pushback about keeping it running from my
[18:10:13] management chain who would like quarry to be removed, as it is old and weird, and periodically causes problems with its history retention (aka forever)
[18:10:35] codfw1dev is a test environment, the secrets for it can be exposed
[18:12:29] I think Superset is generally a good replacement for Quarry, except for some points that mean redefining some workflows for some users, but the concurrency limitation could be a significant problem for user experience
[18:13:43] I don't think that the community is generally abusing open services, also Superset is hidden behind SUL, so it's easy to find the baddies
[18:14:38] But then again, did you even have problems with Quarry usage?
[18:15:36] I'm not convinced that superset is the overall way to go. One doesn't have to look further than https://phabricator.wikimedia.org/T169452 to see that Quarry is quite loved, regardless of its weirdness. Superset has some kind of backwards ways of doing things, like assuming you're installing on VMs. At least at the moment they have helm, but it seems to get less attention
[18:16:31] Not going to say the opposite, but I also know that some people just don't like changes
[18:16:51] Took me months to find the energy to look at the "new" build service
[18:17:19] Still not a great friend of Superset, but you understand how it works with a bit of practice.
[18:17:46] In particular superset (and other data analytics tools) do not keep the results around. This seems very intuitive to me. Quarry does, and people really like that feature. In my mind it is stale data, but a lot of people really like that they have it. Basically over the last decade we've instilled bad habits, but now they are here. It might be less work to just have quarry live on k8s and be done with it.
Maybe start trimming the outputs
[18:17:46] at 90 days or the like as they periodically have info that needs removed
[18:17:51] The big pro of Quarry is results caching
[18:17:58] Ah, oops, you said it
[18:18:06] And I didn't say pro :p
[18:19:08] So, it means that Superset datasets are really not static?
[18:19:26] Cause I tried multiple times to save one, and Superset kept running them again and again
[18:19:35] yes, superset goes and pulls new data all the time. it only keeps it around for as long as your session is open
[18:20:24] indeed, comparing is not cool, but it could be irritating in the long term to keep re-running requests when you don't need new data
[18:21:42] like this one takes five minutes https://superset.wmcloud.org/sqllab?savedQueryId=81 - I can put up with it because I only need it once, but I can easily find "backlogs" requests on Quarry
[18:22:22] We're getting side tracked. I want the community to be satisfied. I'm not seeing that satisfaction with superset. And from a systems perspective I'm not thrilled about superset either. The main issue appears to be data caching. Which won't be replicated in superset, or metabase, or probably any other data analytics tool. So that leaves Quarry as a real option
[18:23:08] Do you mean that Superset was only for testing?
[18:23:26] Or can the instance be kept?
[18:24:08] No, superset is what the foundation is supporting at this point. Quarry is "officially" community support only. And it's running on buster and should vanish in June if nothing is done.
[18:24:40] Superset can probably stay around, some people like it. The only problem is moving the state. Some day there will be an upgrade where the state doesn't move well and we have a clean version running
[18:25:27] What do you mean by "state"? The existing data like saved queries?
[18:25:31] Quarry saves state as plain flat files, which we can be fairly confident we can move around.
Also all the code is controlled by the community so we don't have to change it
[18:26:16] Yes saved queries, dashboards, things like that. I pull the DB out on an upgrade and put it into the new install in a blue/green deploy thing. If the new install doesn't play nice with old db, well
[18:26:23] aaaaaaaaaaaa
[18:26:39] So a dashboard could disappear?
[18:26:45] Always possible
[18:26:48] oh god
[18:26:50] kill me
[18:26:57] It's data, it can always disappear
[18:27:00] Same on Quarry
[18:27:25] Aren't they using something like migrations between releases behind Superset?
[18:27:49] Imagine Wikipedia getting reset at each new mw version
[18:28:08] They might be, I haven't figured out their methods of doing so on superset. Usually there are notes in the changelog if things need run to update
[18:28:55] Ultimately we're a small handful of people maintaining a cloud. It remains amazing to me that it works to the extent it does. SRE at large is a much larger group, maintaining a much less diverse set of products, thus higher confidence
[18:29:34] superset is not a wiki Lofhi. it's not even a platform for building apps. it's a database interface with some visualization helpers
[18:31:17] I hear you loud and clear, but if saved queries could be wiped between releases and couldn't be migrated from one release to the next, it looks like a tool to avoid because it's not stable enough for the moment
[18:31:50] All in all, saved queries are just strings of SQL statements, I could hear that charts could break...
[18:32:03] A detail of perspective could be that quarry is outright slated to be removed in the summer
[18:32:11] But for the time being, it doesn't inspire much confidence.
[18:33:13] The context is: if I build a dashboard for frwiki admins, and frwiki patrollers to help them, I really don't want to rebuild the dashboards at each release from scratch
[18:33:28] bd808: do we have a document outlining WMCS data reliability? I didn't know that we offered any, though maybe we do?
[18:34:15] Lofhi: I don't blame you, I wouldn't want to either. Though state is hard to keep in place, and it takes a lot of engineers to do so with confidence, we don't have that. Nothing in WMCS has that
[18:34:42] !log bd808@tools-sgebastion-11 tools.wikibugs Restart phorge task to pick up changes for T359145 and T127506
[18:34:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[18:35:00] I don't understand, is it linked to Superset itself, or your team's resources?
[18:35:11] I feel like we're not loud enough in pointing out that any data placed on a WMCS service could vanish at any point. Perhaps because we've magically avoided most significant data loss over time? (Again, I'm always amazed with what we do offer)
[18:35:41] Lofhi: sorry I don't understand your question, could you rephrase?
[18:36:11] I mean, using WMCS to host data is way more reliable than hosting it myself on a VPS until I end up under a car
[18:36:31] Rook: I guess the official stance on reliability is https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use#9.2_Disclaimers which pretty much says "we try, but shit happens"
[18:36:51] thanks bd808 in line with what I thought
[18:37:18] "Though state is hard to keep in place, and it takes a lot of engineers to do so with confidence, we don't have that": I don't understand if the problem is linked to Superset only, or it's more about a lack of people to maintain the WMCS
[18:37:23] Lofhi: which VPS were you hosting on that was less reliable than WMCS?
[18:38:25] Ah! All of WMCS, superset is run on WMCS, so it is subject to the same "we try, but shit happens" level of reliability
[18:40:05] You're looking at this the wrong way round: even if data is lost on WMCS, it is simpler to share a project with other wikimedians with the recent new tools (gitlab, admin tools, etc.).
This centralization is far more positive than having volunteer developers host everything on their own, without access to source code or a runtime environment.
[18:40:06] Data redundancy, or even simple backup, is more of a problem for the tool maintainer, not so much for WMCS as it stands.
[18:41:28] So I back up the superset DB. But I'm backing it up to a VM on WMCS. I believe I'm still mostly in the same boat, just with less worry that I'll have an issue if superset itself goes bonkers
[18:44:09] PAWS and Superset and others are fully defined in code. If PAWS was erased, I would have it all back shortly thereafter, though all the home directories would be empty. Same with superset regarding charts, and dashboards and the like. Though with PAWS, it's just a bunch of files, upgrading paws isn't going to interact with those differently. If the superset group decides they could make the old db not work with the new version
[18:44:33] With superset I don't control that element of things. Seems unlikely, but it could happen
[18:47:05] For the current point, I think you're overthinking. Data lost on WMCS would be awful, but if volunteers complain about it when it happens extremely rarely (if ever), excuse me but they're a bunch of jerks since the service is free and generally good. As for the question of Superset backups and redundancy, I imagine there's no better solution, because
[18:47:05] redundancy is not one of your missions. I might need to ask the board for more resources for WMCS!
[18:48:08] However, it seems that data loss is more likely to occur with Superset than with any other service: perhaps I've misunderstood? That's why I asked if it is linked to the software itself.
[18:48:09] That last part about more resources is always welcome! :p
[18:50:09] "If the superset group decides they could make the old db not work with the new version", alright, just a risk, that could be mitigated upstream by notifying Superset users, that seems acceptable to me.
[18:53:04] So superset itself is one large obtuse project that we run. It's not my favorite because the systems side of things is poorly defined. Making it hard to trust how well it would be running at any given time. I've mostly fixed (but haven't deployed) that in quarry, which is much simpler than superset, and we control all the parts to it. I think you're looking for an assurance that I cannot offer. Our deploy of superset relies on just me,
[18:53:04] and thus my ability to overcome whatever silliness it may send in our direction. And it has offered a fair amount of silliness, alluded to above in the systems side of things being poorly defined on it. It's more a tool that seems to require hacking on, and thus, in my perspective, lacks maturity
[18:53:34] Do I trust myself enough to say that we should rely on superset (or quarry, or paws) to have data remain there? No, I do not trust myself that much
[18:54:45] Nah, I was requesting nothing, just wanted to have feedback and expertise from people running it, that's all, for the rest I'm good!
[18:55:04] The wikitech page about Superset is like 10 lines, so I come to the source...
[18:55:23] Sounds reasonable
[18:55:28] (at?)
[18:55:42] At what?
[18:56:32] Was checking my not-native-ugly-english
[18:56:56] Oh, forgot
[18:57:00] !log bd808@tools-sgebastion-11 tools.wikibugs Restarting gerrit job; last event logged at 2024-03-06T17:56:59Z (T359096)
[18:57:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[18:57:52] The limit of 60s for charts is the default setting? Is it destined to change? to be bigger?
[18:58:39] Open up a ticket and we can have a look
[18:59:38] Once again, I didn't necessarily want to increase it, I just wanted to know what type of requests I should limit myself to!
[19:00:08] Sky's the limit, so long as you're ok with being told 'no' to things that aren't likely
[19:00:38] 👌
[19:49:42] sleepy, byebye, thanks for help
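
Editor's note on the queueing discussed between 17:20 and 17:32: Quarry is described above as a work queue with a fixed number of job runners, and the stalled `LIMIT 10` query appeared to be waiting behind a bigger one. The sketch below is a toy Python model of that behavior, not how Superset or Quarry actually schedule work. The function name, the runner counts, and the durations are assumptions chosen for illustration (216 s roughly matches the 00:03:36 duration quoted in the log, 0.007 s the direct MariaDB timing).

```python
import heapq


def simulate_fixed_runners(durations, runners):
    """Return (start, finish) times for jobs served FIFO by `runners` workers.

    Hypothetical toy model: every job is submitted at t=0, in list order.
    """
    if runners < 1:
        raise ValueError("need at least one runner")
    # Min-heap holding the time at which each runner next becomes free.
    free_at = [0.0] * runners
    heapq.heapify(free_at)
    schedule = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available runner
        finish = start + duration
        heapq.heappush(free_at, finish)
        schedule.append((start, finish))
    return schedule


if __name__ == "__main__":
    # One long query submitted just before a fast one.
    for runners in (1, 2):
        print(f"runners={runners}")
        for start, finish in simulate_fixed_runners([216.0, 0.007], runners):
            print(f"  start={start:.3f}s finish={finish:.3f}s")
```

In this model, with a single runner the fast query cannot start until the long one finishes at 216 s; with two runners it starts immediately. That is consistent with the "opened two tabs" explanation in the log, though the log does not confirm which limit was actually hit.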