[00:02:04] nova-api.log was filling the logs with "nova-api.log:XXX lineno: 104, opcode: 120" that's a new one for me
[04:22:36] looks like cloudcontrol1006 filled up again. I'm about to board a flight so can't help with that
[08:14:16] good morning! Just checked, the host seems ok (maybe someone got to it?)
[08:14:29] https://www.irccloud.com/pastebin/HVcYH3g5/
[08:34:48] morning :) nice to have you back dcaro
[08:43:35] ᕕ(⌐■_■)ᕗ yay!
[08:45:31] oh, a little fire happening on cloudcontrol1007, looking
[08:45:56] disk space
[08:46:46] same issue:
[08:46:48] https://www.irccloud.com/pastebin/DXrTjTJO/
[08:46:54] I'll open a task
[08:48:50] T352635
[08:48:51] T352635: [openstack] cloudcontrols getting out of space due to nova-api.log message 'XXX lineno: 104, opcode: 120' - https://phabricator.wikimedia.org/T352635
[10:08:39] dcaro: what would be the best way to test changes to maintain-harbor locally?
[10:14:24] blancadesal: /me was refreshing the memory xd
[10:15:50] if you want a 'full test', you can set up lima-kilo with the build service and harbor, create a couple of builds for a couple of the tools (just starting a build should work), and then manually run the script against the local harbor
[10:17:11] then you can play with the local harbor, for example removing all the images from a tool project should make maintain-harbor remove the project (empty project), or changing the retention policy should make it reset it to the original one
[10:17:55] there are --config and --debug flags to help with it
[10:19:32] since we moved the repo creation to the builds-api, the script is quite self-contained (does not need ldap/nfs/etc., only harbor)
[10:20:29] by 'manually' do you mean through toolforge jobs, or not even?
[10:24:26] not even, locally from your laptop
[10:24:47] (or from the vagrant VM, whichever has access to the harbor instance you are playing with)
[10:36:40] * dcaro lunch
[10:46:49] dcaro: thanks! took me a bit to figure out I need to invoke the script with `python -m src.maintain_harbor`
[11:09:23] I think that refactor was done after I went out
[12:31:13] dcaro: I'll be at collab in a few minutes
[13:50:57] dcaro: thanks for clearing cloudcontrol1007, I did the same with 1006 but was half asleep and didn't follow up.
[13:51:01] I can already tell this is going to be a fun one
[13:51:07] (also welcome back!)
[14:21:58] I'm installing security updates for PostgreSQL, for cloudbackup2001/2002, can I simply proceed or does it need some kind of coordination?
[14:22:52] moritzm: you can go ahead and do it. It might interrupt backup jobs but they'll just re-run tomorrow.
[14:23:33] ack, thanks. going ahead then
[15:37:46] * andrewtavis-wmde Hey all :) @balloons suggested that I pop in to ask a question about storing data in a way that it's accessible to PAWS instances. WMDE is doing a student outreach program where they'll be checking public data sources to derive Wikidata mismatches. The general idea is that we want to put the data in place for the project where they would ideally be able to query it into PAWS notebooks as chunks.
[15:39:31] Rook, do you think object storage is a good place to put the datasets?
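A minimal sketch of the manual maintain-harbor run described above ([10:15:50]–[10:46:49]), assuming it is executed from a maintain-harbor checkout that can reach the local lima-kilo Harbor; the config filename and the exact argument taken by --config are assumptions:

    # Minimal sketch: wrap the manual run described above. Assumes a maintain-harbor
    # checkout that can reach the lima-kilo Harbor; the config filename is a
    # hypothetical placeholder.
    import subprocess

    subprocess.run(
        [
            "python", "-m", "src.maintain_harbor",  # entry point mentioned above
            "--config", "local-harbor.yaml",         # hypothetical local config
            "--debug",
        ],
        check=True,
    )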
[15:40:57] I think there was also some concern over memory limits; but this seems like a good use case for PAWS
[15:41:25] if the format is non-relational it should work well (if the idea is not to "query" the data)
[15:41:57] otherwise trove might be a better place (it should be accessible from paws)
[15:42:13] * andrewtavis-wmde An alternative that we have is using analytics.wikimedia.org/published to save pre-chunked subsets of the data. This would also work, but isn't the nicest solution from an end user/infrastructure standpoint.
[15:42:48] andrewtavis-wmde: "able to query it into PAWS notebooks as chunks" I don't think I understand what this means. Could you elaborate?
[15:43:10] * andrewtavis-wmde I think the memory limitations of PAWS are ok. We know that we can't load in 2 GB+, but they can write their workflows and then cycle through the data to generate the mismatches.
[15:43:59] * andrewtavis-wmde I'm just assuming that the datasets will in some cases be past what PAWS could directly load in, so if we do want to keep using PAWS we'd need to subset and work in stages.
[15:44:55] * andrewtavis-wmde If say there's interest in OpenStreetMap data or something of that magnitude, I'm not sure if we're getting a 6-7 GB compressed file loaded into PAWS.
[15:45:50] * andrewtavis-wmde Sorry if it's still not clear :)
[15:45:52] Is this something Superset would help with? I believe we can load other things into Superset and then the analytics tools are unlocked, and there's some collaboration possible as well
[15:46:10] Is the suspected problem that a large data file will be pulled down from somewhere, and then a notebook will start working on it?
[15:47:10] * andrewtavis-wmde Yes we'd like the data to be brought down/in and then they'll be doing traditional Python analytics to derive the mismatches. Also using Python tooling for bringing in Wikidata data.
[15:47:27] How big is the data?
[15:48:00] * andrewtavis-wmde Superset could be an option, but the stated goal from a learning standpoint is Python/SQL (SPARQL).
[15:48:29] * andrewtavis-wmde We haven't figured it out yet as the community still needs to provide feedback on what they'd find interesting.
[15:48:52] * andrewtavis-wmde Asking now as exploration, with the OSM data maybe being an upper bound.
[15:49:26] * andrewtavis-wmde But also assuming that we'll be over "This data can be a .csv that you can load directly into PAWS".
[15:49:59] OSM data is in the hundreds of gigabytes?
[15:50:42] * andrewtavis-wmde 6-7 GB was just a quote that I'm getting from a team member. Sorry :)
[15:51:19] PAWS has a 24-hour cleanup that will start removing the largest files in a user's space until it gets down to 5 GB. This helps prevent the NFS from filling up, working under the assumption that 1 core can't work on much more than that.
[15:51:44] * andrewtavis-wmde This is great to know!
[15:54:14] * andrewtavis-wmde I'm reading into Trove right now. Would there be other suggestions?
[15:57:53] * andrewtavis-wmde Ah and object storage was the original suggestion. I guess a few more details on that would be appreciated :) A quick search implies we're talking about Swift?
[15:58:17] I think I still don't quite understand what is desired. Though if you're looking to pull down a file and feed it to a script, I would do just that in PAWS.
[15:59:00] object storage is not yet connected to PAWS, and if it were, the intent would be to replace NFS. Currently fussing with limitations in our setup with tofu using it as a state repository
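A minimal sketch of the "subset and work in stages" approach discussed above, assuming a hypothetical CSV published somewhere a PAWS notebook can reach; the URL, column names, and mismatch check are placeholders, not the project's actual pipeline:

    # Minimal sketch: process a dataset larger than memory in chunks from a PAWS
    # notebook. URL, column names, and the mismatch check are hypothetical.
    import pandas as pd

    SOURCE_URL = "https://example.org/published/big-dataset.csv"  # hypothetical dataset location

    candidates = []
    for chunk in pd.read_csv(SOURCE_URL, chunksize=100_000):  # only one chunk in memory at a time
        # Replace this filter with the real mismatch-detection logic against Wikidata.
        candidates.append(chunk[chunk["external_value"] != chunk["wikidata_value"]])

    mismatches = pd.concat(candidates, ignore_index=True)
    print(len(mismatches), "candidate mismatches")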
[16:02:20] * andrewtavis-wmde Sorry to come with only hypotheticals :( If a dataset from an external source, like say data.gov or another source that the community would like to look into, is larger than what PAWS can directly bring in, then the idea is that we put it in a place where it's then easier for PAWS to digest.
[16:03:25] * andrewtavis-wmde In say a situation where a request to get the data from the original source can't be completed with PAWS.
[16:03:53] Ok, I think I see. So such a place doesn't exist. Either it is absorbed from the filesystem local to a paws instance, or it is coming from the network. As such you could either pull down something from data.gov (or anywhere) and work on it locally, or you could have the script interact with the data straight from data.gov (and not store it locally on the PAWS nfs)
[16:05:42] So the reason for PAWS is because you want the experience to be using python to crawl and process data, right? But if the expectation is that the data is coming from a SQL database, d.caro's suggestion of using trove I think makes sense. You can load as big a db as needed into trove, and folks in PAWS can connect to it, query, and process data
[16:06:23] yes, object storage is via swift.. And trove is Database-as-a-service. I'd suggest mysql
[16:07:42] * andrewtavis-wmde Yes you're correct on PAWS: the big thing is that we want the notebook experience, and it allows us to bring in Python packages to access Wikidata or allow them to work with dumps with Python.
[16:11:33] * andrewtavis-wmde Appreciate the feedback on all of this! Sorry again that there are no explicit file sizes on my end as of yet. We're still in the planning, but are trying to get ahead on the infrastructure part so it's not too rushed at the start of the program in January.
[16:21:18] Rook: do you happen to know if it's easy or hard to inject a metadata value into a VM when starting it with tf/ot?
[16:22:09] The context is: I need a metadata switch to tell a VM if it's going to be puppetized or not (T352635). The easiest thing for me is to make things puppetized by default and add a special flag for unpuppetized. But it's kind of arbitrary which I make the default.
[16:22:10] T352635: [openstack] cloudcontrols getting out of space due to nova-api.log message 'XXX lineno: 104, opcode: 120' - https://phabricator.wikimedia.org/T352635
[16:22:26] Whoah that is not the task I meant to link. T326818
[16:22:27] T326818: Support 'unmanaged' projects in cloud-vps - https://phabricator.wikimedia.org/T326818
[16:23:13] andrewbogott: I have not tried this on openstack. iirc it worked fine on other providers. I did try with magnum (rather than a vm), and had no luck
[16:23:23] ok
[16:24:02] Hm, this makes me wonder what's happening with magnum and trove VMs now. Maybe trying to run puppet and failing?
[16:24:03] * andrewbogott checks
[16:25:27] Rook: would you assume those would mostly want puppet or mostly not want puppet? I think I lack imagination about future applications.
[16:25:50] * andrewbogott anticipates Rook saying that no VMs should have puppet ever
[16:26:46] * Rook is concerned with the realization that andrewbogott can read minds or see the future
[16:28:08] Welp, I guess I should think about how to make unpuppetized the default and add a metadata flag for 'puppet me' rather than the reverse. But I reserve the right to do whichever turns out to be easier
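Going back to the Trove/MySQL route suggested above ([16:05:42]–[16:06:23]), a minimal sketch of how a PAWS notebook might stream query results in batches; the hostname, credentials, database, and table are all hypothetical:

    # Minimal sketch: query a (hypothetical) Trove MySQL database from PAWS in
    # batches. Hostname, credentials, schema, and table names are placeholders.
    import pymysql

    conn = pymysql.connect(
        host="mismatch-db.example.wmcloud.org",   # hypothetical Trove instance
        user="student",
        password="not-a-real-password",
        database="mismatches",
        cursorclass=pymysql.cursors.SSCursor,     # server-side cursor: rows are streamed
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT item_id, external_value, wikidata_value FROM source_data")
            while True:
                rows = cur.fetchmany(10_000)      # work through the data in manageable batches
                if not rows:
                    break
                # ... compare each batch against Wikidata here ...
    finally:
        conn.close()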
[16:29:25] 👍
[16:29:41] * andrewbogott wonders what happens if the vendor and the user both set a metadata key to different values
[16:30:00] Why would they be using the same key?
[16:30:53] Just wondering if I can have the vendordata contain a default of 'puppetme=True' and allow users to override that in userdata.
[16:31:48] These are very minor distinctions. It could also just be a second key 'buttheusersaysdontpuppetme'
[16:32:39] I would expect that if it is set by default and then changed afterwards, it wouldn't revert.
[16:34:26] yeah, likely
[17:37:32] phabricator notifs reminded me of T208197 today. That's another counter-example to point to when someone says "nothing in WMCS is production software".
[17:37:33] T208197: ContentTranslation relies on recommendation-api running on Cloud VPS - https://phabricator.wikimedia.org/T208197
[18:29:40] * bd808 heads towards an airport
[18:33:30] see you later today!
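Returning to the puppetization-flag discussion above ([16:21:58] onward), a minimal sketch of how a first-boot script inside a VM could read its own instance metadata from the standard OpenStack metadata service and decide whether to run puppet; the 'puppetme' key and the chosen default are placeholders, not the actual implementation:

    # Minimal sketch: read the instance's metadata from the OpenStack metadata
    # service and decide whether to puppetize. The "puppetme" key and its default
    # are hypothetical placeholders.
    import json
    import urllib.request

    METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"

    with urllib.request.urlopen(METADATA_URL, timeout=5) as resp:
        meta = json.load(resp).get("meta", {})    # server metadata key/value pairs

    # Default to puppetized unless the flag explicitly opts out (or invert, per the
    # "which default" question above).
    run_puppet = meta.get("puppetme", "true").lower() != "false"
    print("run puppet:", run_puppet)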