[07:34:01] toolforge is now serving web traffic over ipv6
[07:44:13] \o/
[08:10:52] hmm, got failing probe for the toolforge api on v4
[08:11:08] I can curl though
[08:11:18] [#wikimedia-cloud-admin-feed] ^ just deleted instances, will clear soon, sorry
[08:11:39] ack
[08:11:44] the remove instance cookbook tries to silence alerts for that host but that does not work for blackbox probes for some reason
[08:12:11] there's no labels with the instance name
[08:12:17] just the external fqdn I think
[08:12:46] I guess it just blindly uses `instance=hostname` for the silence
[08:12:47] huh
[08:12:53] in that case, let me have a look
[08:13:13] I started adding `service=ceph,mgr,...` labels to some alerts, so we could
[08:13:29] filter alerts not specific to that instance but to derived services too
[08:15:01] anyway, the alert seems to have cleared for now
[08:15:17] yep :)
[08:15:51] i think the issue might be that the instance= label has a port number appended on the blackbox metrics that's not there for the rest
[08:36:08] quick review https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/32
[08:36:54] (just checked the email) the instance for the api had `instance = api.svc.toolforge.org:443`, nothing related to the proxies :/
[08:56:50] hmm... how does svc.beta.toolforge.org show up in the list of domains for webproxies in toolsbeta? (toolsbeta.org does not show up), is it a config somewhere?
[08:57:35] probably https://gerrit.wikimedia.org/g/cloud/instance-puppet/+/e7cc9efe6897d7ebedbe8c46ce9190d869700e5f/project-proxy/proxy.yaml#43
[08:58:13] https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Web_proxy#Enable_per-project_subdomain_delegation
[08:59:57] thanks, I think we might not have followed that process for `toolsbeta.org` (or missed some steps I guess)
[09:00:02] I'll add a task to clean up
[11:38:05] * dcaro lunch
[12:16:49] well I found where all the toolforge NFS space went: https://phabricator.wikimedia.org/T395020
[13:10:51] dhinus: chuckonwu I have left a few tasks flagged with 'good first task' that you can pick up when you are done with the current one, can you give them a review and let me know if they are clear/easy to understand?
[13:15:38] dcaro: sure, is that on the toolforge phab board?
[13:16:24] yep
[13:16:54] https://phabricator.wikimedia.org/project/view/7905/
[13:18:32] found them, they look good thanks!
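For context on the silence problem discussed above (08:11–08:15): the remove-instance cookbook's silence reportedly misses blackbox probe alerts because their `instance` label carries a port suffix (e.g. `api.svc.toolforge.org:443`). A minimal sketch of a silence covering both label shapes, posted to the Alertmanager v2 API, could look like the following; the Alertmanager URL, duration, and `createdBy` value are assumptions, not the cookbook's actual behaviour.

```python
# Sketch only: silence alerts for a removed instance, matching the "instance"
# label both with and without a ":port" suffix (blackbox probes append one).
# The Alertmanager endpoint and createdBy value are illustrative guesses.
import datetime
import re

import requests

ALERTMANAGER = "http://alertmanager.example.org:9093"  # assumed endpoint
instance = "api.svc.toolforge.org"                     # host whose alerts to silence

now = datetime.datetime.now(datetime.timezone.utc)
silence = {
    "matchers": [
        {
            "name": "instance",
            # Matches both "api.svc.toolforge.org" and "api.svc.toolforge.org:443".
            "value": rf"{re.escape(instance)}(:\d+)?",
            "isRegex": True,
        }
    ],
    "startsAt": now.isoformat(),
    "endsAt": (now + datetime.timedelta(hours=2)).isoformat(),
    "createdBy": "remove-instance-cookbook",  # placeholder author
    "comment": "instance deleted; silencing leftover probe alerts",
}

resp = requests.post(f"{ALERTMANAGER}/api/v2/silences", json=silence, timeout=10)
resp.raise_for_status()
print("silence id:", resp.json()["silenceID"])
```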
the ones we identified with arturo were: T384251, T394276, T349775
[13:18:33] T384251: [jobs-cli] If the pod exists and it has no logs, read the message status from it and output that - https://phabricator.wikimedia.org/T384251
[13:18:33] T394276: [components-api] Add basic prometheus metrics - https://phabricator.wikimedia.org/T394276
[13:18:33] T349775: [toolforge,jobs] "toolforge jobs logs" fails when job has not started yet - https://phabricator.wikimedia.org/T349775
[13:18:40] the prometheus one might be a bit too big
[13:19:29] I think that Raymond_Ndibe might have started it
[13:21:03] the other two look ok, I'd start with the components-api ones though, as they would help with the hypothesis work ( so more reportable™ :) )
[13:21:20] dcaro: good point
[13:22:00] maybe starting from T394994 and T394990
[13:22:00] T394994: [components-cli] make `toolforge components deployment show` show the latest deployment if no id passed - https://phabricator.wikimedia.org/T394994
[13:22:01] T394990: [components-api] add `GET` endpoint `/v1/tool//deployments/latest` - https://phabricator.wikimedia.org/T394990
[13:22:27] sounds good yes, note that the cli depends on the api one
[13:47:09] dcaro, can I assign T394333 to you to double check racking balance? That's for the first order of jumbo-sized osds.
[13:47:09] T394333: Q4:rack/setup/install cloudcephosd10[48-51] & relocate cloudcephosd1039 - https://phabricator.wikimedia.org/T394333
[13:47:36] andrewbogott: ack, I'll try to give it a look, when is it needed?
[13:48:26] hm, you're about to leave for two weeks aren't you?
[13:49:28] It's possible but unlikely that the hardware will show up before you're back.
[13:49:46] If you don't have time I can make an attempt, I just know you have a plan already :)
[13:50:23] I'll let you know then if I have time to get to it :)
[13:51:01] thx
[14:30:25] taavi: do you have thoughts or a task about the recent increase in DNS leaks? If not I'll open a task.
[14:30:45] no!
[14:34:18] andrewbogott: https://phabricator.wikimedia.org/T395020#10848314 asks if our NFS mounts support `atime`. do you happen to know that already?
[14:37:11] I responded on the task
[14:37:23] they don't.
[14:37:55] thanks!
[14:39:50] * andrewbogott makes T395037
[14:40:13] um... T395037
[14:40:14] T395037: new, frequent DNS record leaks in wmcs - https://phabricator.wikimedia.org/T395037
[15:37:41] Is there any HTTPS frontend for the Cloud VPS S3 storage stuff? I'm wondering if a project that uses S3 storage needs to build its own web ui to look at stored things or if buckets can be marked as public and then just browsed/deep linked into.
[15:39:50] bd808: if you mark a bucket as public you can access individual files directly via https, but there isn't a graphical browser like mod_index
[15:40:06] (you may or may not have an xml index listing all the files, don't remember the exact details)
[15:40:38] ack. Thanks taavi. The possible use case here is Zuul job logs and I think that direct URL access is what it would need.
[15:41:46] yeah, as long as you know the bucket and file name you can construct the URL manually
[15:45:16] I guess we could expand openstack browser to consume swift apis? But that seems like a lot of scope creep.
[15:45:40] listing buckets maybe, but otherwise i don't think that's in scope for that tool
[15:47:50] I think I agree...
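On the `atime` question at 14:34: one quick client-side check is to look at the mount options of the NFS mounts, though mount options alone don't tell the whole story for NFS. A rough sketch, assuming a standard `/proc/mounts` layout:

```python
# Rough sketch: report atime-related mount options for NFS mounts on this host
# by parsing /proc/mounts. Purely illustrative; run on the NFS client in question.
ATIME_FLAGS = {"noatime", "relatime", "strictatime", "nodiratime"}

with open("/proc/mounts") as mounts:
    for line in mounts:
        device, mountpoint, fstype, options = line.split()[:4]
        if fstype.startswith("nfs"):
            flags = set(options.split(",")) & ATIME_FLAGS
            print(f"{mountpoint}: {', '.join(sorted(flags)) or 'no explicit atime option'}")
```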
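On the public-bucket question at 15:37–15:45: a sketch of what "mark a bucket public and deep-link into it" could look like through the S3 API, using boto3. The endpoint URL, bucket name, and object key are hypothetical, and credentials are assumed to come from the usual boto3 config/environment.

```python
# Sketch, not a confirmed workflow: make a bucket publicly readable and build
# direct HTTPS (path-style) URLs to its objects via the S3 API.
import boto3

ENDPOINT = "https://object.example.wmcloud.org"  # assumed radosgw/S3 endpoint
bucket = "zuul-job-logs"                         # hypothetical bucket name

s3 = boto3.client("s3", endpoint_url=ENDPOINT)

# Allow anonymous reads on the whole bucket.
s3.put_bucket_acl(Bucket=bucket, ACL="public-read")

# Once public, knowing the bucket and key is enough to construct a URL.
key = "123456/console.log"  # hypothetical object key
print(f"{ENDPOINT}/{bucket}/{key}")
```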
[15:48:01] it would be easy enough to expose objects but then that has me thinking about scrapers :(
[15:50:05] Something like https://github.com/rufuspollock/s3-bucket-listing might be a better thing than adding file viewing to openstack-browser.
[18:14:02] Just got an alert about tools nfs
[18:14:32] ...and a recovery
[18:15:36] I think tools-static might have gotten borked, restarting nginx
[19:12:45] andrewbogott: yep, there's a few workers now with processes stuck on nfs :/, can you handle it? I'm on an uncomfortable platform (not laptop)
[19:13:04] yep, I was waiting for them to show up
[19:19:42] I'm rebooting 8 stuck workers -- need to step out but will take care of any new/remaining workers when I'm back
[20:35:00] andrewbogott: thanks!
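On the stuck workers at 19:12: processes hung on a dead NFS mount typically sit in uninterruptible sleep ("D" state). A small illustrative sketch for spotting them on a worker, assuming a standard Linux `/proc` layout:

```python
# Illustrative sketch: list processes in uninterruptible sleep ("D" state),
# the usual symptom of a hung NFS mount, by reading /proc/<pid>/stat.
import os

for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/stat") as statfile:
            stat = statfile.read()
    except OSError:
        continue  # process exited while we were iterating
    # Format is "pid (comm) state ..."; comm may contain spaces, so split on the last ")".
    comm = stat[stat.index("(") + 1:stat.rindex(")")]
    state = stat[stat.rindex(")") + 2]
    if state == "D":
        print(f"pid {pid} ({comm}) is in uninterruptible sleep")
```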