[06:35:32] !log library-upgrader libup@upgrader-06:/srv/libraryupgrader$ pip install -U wikimediaci-utils hotfix
[06:35:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Library-upgrader/SAL
[09:26:23] !log admin-monitoring deleting instances leaked by T327980
[09:26:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin-monitoring/SAL
[09:26:27] T327980: eqiad1 VMs can no longer contact the nova metadata service - https://phabricator.wikimedia.org/T327980
[14:20:25] taavi: any issues with k8s? Seems pods get stuck at initializing state?
[14:20:37] which tool?
[14:20:55] taavi: https://k8s-status.toolforge.org/namespaces/tool-mabot/pods/test-l6ts7/
[14:21:50] hmmmm
[14:21:52] how did you create it?
[14:22:23] toolforge-jobs run test --command "./test.sh" --image python3.9
[14:23:49] I'll delete the job for now if that's okay
[14:26:51] sure. I can't see what's wrong with it, so curious to see if it happens again
[14:27:05] retrying
[14:28:05] after 3 times, completed now
[14:28:46] very weird. it complained something about volumes (lost the exact error somehow already), but nothing seems to be wrong with them
[14:29:57] something related to current work in -operations?
[14:30:22] do you see anything related there? I don't
[14:30:43] I don't know for sure :) Just joined IRC
[14:31:02] Seems fixed for now though. Thanks taavi :)
[15:05:03] !log tools drain and reboot tools-k8s-worker-74 which seems to have some issues with nfs
[15:05:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:01:07] When using Horizon UI, if I get an internal server error (HTTP 500), the details don't specify what I should do with the error ID. ("Internal Server Error. Please keep this ID to help us figure out what went wrong: (db1aa5e4-583c-4d77-8069-470297e7cfe6). (HTTP 500)" The issue resolved itself with a retry of loading the page, but I'm curious: if I were to continue seeing server errors in Horizon, is there somewhere I could
[17:01:07] look to check the status of the service? https://www.wikimediastatus.net/ seems to broad, https://wikitech.wikimedia.org/wiki/Incident_status seems to be for more serious issues.
[17:02:14] tburm: we don't have anything like that today
[17:25:04] Is this the right place to discuss with owners of labweb?
[17:32:52] brett: this or -cloud-admin depending on what you need, yes
[17:34:35] Cool! I'm looking at https://phabricator.wikimedia.org/T236065 which is looking to remove unused plain HTTP services from LVS. labweb seems ripe for removal based on imvsadm reporting no traffic
[17:35:05] So I wanted to talk to some people familiar with it to confirm whether it's a good/bad idea to remove it
[17:35:30] *ipvsadm -
[17:36:45] yeah, that should be fine. I might even have sent a set of patches for that at some point, give me just a sec to find it
[17:37:53] yeah, starting from https://gerrit.wikimedia.org/r/c/operations/puppet/+/831173. the entire thing needs s/labweb/cloudweb/ and apparently I too realized that the plaintext variant is unused and could be removed
[17:38:47] oooo
[17:40:25] taavi: Were there any concerns that kept you from adding reviewers?
[17:40:52] no, I think I just forgot that
[17:56:07] taavi: Thanks for reviving that and adding a reviewer!
[18:48:31] tburm: the status site is mostly for public services that face the general public. Incidents will cover internal services too so if this Horizon issue turns out to be due to an outage, it might get an entry on Wikitech afterwards (not sheikh).
[18:48:42] not during*. Sorry
[18:49:06] The maintainer of two of the open-source projects I use (jinja2 and select2) accepts (ok, solicits) sponsorships. WMF gets great value from the many bits of open source we use. Is there any mechanism whereby I can ask/direct/beg/whatever WMF to sign on as a sponsor? $100 or even $1000 would be pocket change to the WMF, but would go a long way towards supporting projects we depend on.
[18:49:42] people have asked for that for projects the WMF uses in production.
[18:49:54] and what has the answer been?
[18:50:33] tburm: for internal services like cloud, analytics, performance tooling etc, we do have good monitoring. That would Eg be on alerts.wikimedia.org (staff only) or socially by checking the relevant IRC channel. Eg #wikimedia-cloud might know at the time whether there is a known issue with Horizon.
[18:54:24] roy649: mostly that nobody budgets for this and that it is unknown how to do so :/
[18:54:42] Hmm. Sadness.
[18:57:20] and given that the replacement for technical grants has been stuck for a year since it was supposed to be announced, I don't see that changing anytime soon
[19:00:40] Extreme sadness.
[19:00:49] technical grants are a totally different thing than the WMF donating money to an arbitrary FOSS upstream
[19:01:59] enwiki doesn't even want us to pay our own devs, so finding a way to pay others is hard ;)
[19:02:44] Explain "enwiki doesn't even want us to pay our own devs"
[19:02:51] Andreas
[19:02:57] ^
[19:03:10] Who is Andreas?
[19:03:39] The person who wrote the RFC that caused a drastic change in December's banner fundraising
[19:03:54] Oh, that.
[19:04:11] What was a bit of a shit-show.
[19:04:13] the one who things WMF should offshore all its devs since it's cheaper
[19:05:27] s/What/That/
[19:27:29] WMF *does* do some off shoring. It hires plenty of staff in places with lower COL than America
[20:34:31] !log admin shutting down mariadb on cloudbackup2001-dev, testing the waters for T328079
[20:34:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[20:34:35] T328079: decommission clouddb2001-dev.codfw.wmnet - https://phabricator.wikimedia.org/T328079