[05:00:56] addshore: that's just a warning, right?
[05:01:18] we have a fix ready, will be included in the next webservice release
[07:36:39] majavah: well, I was running the webservice command inside a webservice container, seemingly it doesn't like that :D
[07:36:57] uhh
[07:36:59] which container?
[07:47:59] ah yeah I was wondering about that too… I assume it's not possible to manage kubernetes jobs from a kubernetes job itself, right?
[07:49:20] in my case, I have a job which can sometimes stall in unclear ways (without failing), so I've needed a periodic job to observe the first job's output and restart it if it is stalled
[07:50:12] it's a bit ugly >_<
[07:54:56] pintoch: you definitely can do that, the credentials needed are included in your container
[07:55:19] * majavah needs to run and can't explain further
[08:18:31] I am not sure this is the right place to ask this. Is there a way in https://quarry.wmcloud.org/ to query for users blocked in more than one project? I failed to do this for more than one project at a time
[08:27:41] matanya: if nobody follows up here, I suggest you try the email list cloud@l.w.o
[08:27:53] Thanks arturo
[08:28:03] np
[08:31:54] matanya: not really, you would need to check all wikis, and quarry lets you query one wiki at a time
[08:32:38] Unless I use programmatic access, I presume?
[08:33:21] if you're doing something like that, you should just query the replicas directly instead of going via quarry
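
For illustration, a minimal Python sketch of the "query the replicas directly" approach suggested above. It assumes a Toolforge account with pymysql installed and credentials in ~/replica.my.cnf; the three-wiki list is a hypothetical stand-in for iterating the full wiki set, and it uses the ipblocks table/column names as they existed at the time of this log.

```python
# Rough sketch: count how many wikis have an active block for each account,
# then print accounts blocked on more than one. All wiki names here are a
# hypothetical subset; real code would read the full database list.
import collections
import os

import pymysql

WIKIS = ["enwiki", "dewiki", "hewiki"]  # hypothetical subset of all wikis

blocked_on = collections.Counter()
for wiki in WIKIS:
    conn = pymysql.connect(
        host=f"{wiki}.analytics.db.svc.wikimedia.cloud",
        database=f"{wiki}_p",
        read_default_file=os.path.expanduser("~/replica.my.cnf"),
    )
    try:
        with conn.cursor() as cur:
            # ipb_user != 0 restricts this to registered accounts,
            # leaving out plain IP blocks.
            cur.execute(
                "SELECT DISTINCT ipb_address FROM ipblocks WHERE ipb_user != 0"
            )
            for (name,) in cur.fetchall():
                blocked_on[name] += 1
    finally:
        conn.close()

for name, count in blocked_on.most_common():
    if count > 1:
        print(name.decode("utf-8"), count)
```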
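
And a sketch of the watchdog pattern from the earlier exchange: majavah's point is that the service account credentials Kubernetes mounts into every pod allow talking to the API from inside the cluster, so a periodic job can restart a stalled sibling job. Everything here (namespace, job name, output path, threshold) is hypothetical, whether a given tool's permissions allow exactly this is not verified, and the toolforge-jobs command is the normally supported interface.

```python
# Minimal watchdog sketch: if the watched job's output file has not changed
# for a while, delete the job so it can be resubmitted. Assumes the
# "kubernetes" Python package and in-pod service account credentials.
import time
from pathlib import Path

from kubernetes import client, config

NAMESPACE = "tool-mytool"                      # hypothetical tool namespace
JOB_NAME = "long-running-job"                  # hypothetical job to watch
OUTPUT = Path("/data/project/mytool/out.log")  # hypothetical output file
STALL_SECONDS = 30 * 60                        # hypothetical stall threshold

def main() -> None:
    # Picks up the token/CA that Kubernetes mounts into every pod.
    config.load_incluster_config()
    batch = client.BatchV1Api()

    # Treat the job as stalled if its output has not been touched lately.
    if time.time() - OUTPUT.stat().st_mtime > STALL_SECONDS:
        # "Foreground" makes Kubernetes clean up the job's pods as well;
        # resubmitting the job afterwards is left out of this sketch.
        batch.delete_namespaced_job(
            name=JOB_NAME,
            namespace=NAMESPACE,
            propagation_policy="Foreground",
        )

if __name__ == "__main__":
    main()
```

Run on a schedule, this plays the observer role pintoch describes.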
[10:30:35] !log tools.lingua-libre-bot [git] Updated to fe05237 : added support for shywiktionary
[10:30:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lingua-libre-bot/SAL
[10:36:33] !log tools add toolforge-jobs-framework-cli v5 to aptly buster-tools/toolsbeta
[10:36:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:39:27] !log toolsbeta deploying jobs-framework-api 16fbf51 (T286135)
[10:39:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:39:30] T286135: Toolforge jobs framework: email maintainers on job failure - https://phabricator.wikimedia.org/T286135
[10:44:58] !log toolsbeta deploying jobs-framework-emailer 51032af (T286135)
[10:45:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:45:01] T286135: Toolforge jobs framework: email maintainers on job failure - https://phabricator.wikimedia.org/T286135
[11:42:33] !log toolsbeta deploying jobs-framework-emailer 3045601 (T286135)
[11:42:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:42:37] T286135: Toolforge jobs framework: email maintainers on job failure - https://phabricator.wikimedia.org/T286135
[15:45:10] !log toolsbeta disable podpreset admission plugin in toolsbeta T279106
[15:45:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:45:15] T279106: Establish replacement for PodPresets in Toolforge Kubernetes - https://phabricator.wikimedia.org/T279106
[15:45:48] !help, can I get a sysadmin to help identify high traffic to archive.org from either Toolforge or Cloud VPS, with the exception of Cyberbot/IABot?
[15:46:28] Someone here, or maybe a collection of someones, is hitting the API on the Wayback Machine too hard.
[15:46:50] we are currently in a meeting, Cyberpower678
[15:47:00] please open a phab task meanwhile
[15:47:06] arturo: oh ok. When should I come back?
[15:47:11] Sure
[15:52:13] https://phabricator.wikimedia.org/T290983
[17:37:12] I guess asking for the IP wasn't that useful, it's the external NAT IP
[17:49:56] the call is coming from inside the house!
[19:12:05] !log tools.lexeme-forms deployed 902156ddb8 (Croatian item ID fix)
[19:12:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[20:21:26] bd808: Libera Chat has changed their policies so we can "double bridge" via Matrix, so I'd like to test getting a channel bridged as Telegram<-->Matrix<-->IRC. On the IRC/Matrix side, we'd see Telegram users as individual accounts/nicks ("puppets"), but on Telegram it would still appear as a bot (same as wm-bb)
[20:23:29] legoktm: should we work out how that all happens in a test channel somewhere, or would you like to use this channel or #wikimedia-hackathon as the experiment source?
[20:23:36] is Matrix stable enough to be the bridge between IRC and telegram these days? Or am I misunderstanding something? (re @wmtelegram_bot: bd808: Libera Chat has changed their policies so we can "double bridge" via Matrix, so I'd like to test getting a channel bridged as Telegram<-->Matrix<-->IRC. On the IRC/Matrix side, we'd see Telegram users as individual accounts/nicks ("puppets"), but on Telegram it would still
[20:24:18] *Matrix bridges stable enough
[20:26:47] bd808: a test channel or one that's low traffic-ish would be good to experiment on, though I suspect if there are issues they'll only show up when there are like 100+ people being bridged
[20:27:37] @chicocvenancio: my anecdotal feeling is that the Matrix bridge has outages/issues about as often as IRC has netsplits.
[20:28:51] good! My experience was worse several years ago. (re @wmtelegram_bot: @chicocvenancio: my anecdotal feeling is that the Matrix bridge has outages/issues about as often as IRC has netsplits.)
[20:29:08] the biggest win in switching to bridging like this is that on Matrix/IRC the messages come from individual nicks, so replying/tab-complete all work, and we can kick people from here without needing to wait for a Telegram admin
[20:30:32] Also there will be little things that are nicer, like the Matrix<-->IRC bridge turning long multi-line messages into pastebin-type links
[20:31:02] I wonder how it deals with edits
[20:31:38] legoktm: I'm up for trying things, but I don't know anything about matrix really. I know matrix can handle the "talk to irc" part itself, but does it also natively do the telegram connection or does that require running something like https://github.com/mautrix/telegram?
[20:34:39] yep, I was thinking we can use https://t2bot.io/telegram/ which is a hosted instance of that project for experimentation, and then we could self-host it in Toolforge
[20:35:36] @chicocvenancio: it tries to calculate the diff and outputs an s/tyop/typo/ style message, which I think is super IRC geeky :)
[20:36:19] https://github.com/matrix-org/matrix-appservice-irc/pull/1465 has some more examples
[20:36:42] that's awesome! Better than the very verbose solution matterbridge uses (re @wmtelegram_bot: @chicocvenancio: it tries to calculate the diff and outputs an s/tyop/typo/ style message, which I think is super IRC geeky :))
[20:36:53] that's awesome! Better than the very verbose solution matterbridge uses. eg (re @wmtelegram_bot: @chicocvenancio: it tries to calculate the diff and outputs an s/tyop/typo/ style message, which I think is super IRC geeky :))
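
As a toy illustration of the s/old/new/ edit rendering described above (the real implementation lives in matrix-appservice-irc; see the linked PR, which is more involved), the word-level diff idea can be shown with Python's difflib:

```python
# Render the first changed word span of an edited message in sed style.
import difflib

def edit_summary(before: str, after: str) -> str:
    old, new = before.split(), after.split()
    matcher = difflib.SequenceMatcher(None, old, new)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            return "s/%s/%s/" % (" ".join(old[i1:i2]), " ".join(new[j1:j2]))
    return after  # nothing sensible to diff; fall back to the full text

print(edit_summary("it calculates the diff and ouputs",
                   "it calculates the diff and outputs"))
# -> s/ouputs/outputs/
```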
[20:38:55] bd808: as for actual next steps, depending on which IRC channel we want to try it with, I could create the Matrix room, bridge it to the IRC channel, then follow the t2bot steps jointly with you doing the telegram side
[20:39:30] legoktm: would we also need to run https://github.com/hifi/heisenbridge for the matrix<->irc bit, or would the matrix side be a portal room like #libera_#wikimedia_cloud?
[20:41:27] no, we would use the existing Libera/Matrix bridge. We'll have a room like #wikimedia-cloud:matrix.org, which is "plumbed" (using the Matrix.org bridge) to Libera's #wikimedia-cloud (I keep meaning to write a guide/glossary to Matrix jargon)
[20:42:17] ah. Yeah I just read a tiny bit about portal vs plumbed at https://matrix.org/bridges/
[20:43:17] anyway cool stuff. maybe a reasonable next step is working out a short design doc on what and why and then giving it a shot.
[20:43:35] !log tools.quickcategories deployed 6d375c7dcc (link to Commons edit groups)
[20:43:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
[20:43:40] sure
[20:44:14] I've got all the superuser rights to play with it in this channel. I'm sure we could find an admin for the Telegram side of #wikimedia-hackathon to help there too.
[20:45:06] I'll write something up tonight then :)
[20:52:30] !log tools.lexeme-forms deployed c36ae4154a (l10n updates)
[20:52:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[21:02:46] hi cloud team! I have 2 questions about monitoring for cloudvps instances and toolforge tools
[21:02:48] 1. The wikitech docs say there are metrics for every node in the toolforge cluster https://grafana-labs.wikimedia.org/d/000000059/cloud-vps-project-board?orgId=1&from=now-5m&to=now&var-project=image-suggestion-api&var-server=All where could I view those?
[21:04:05] nikkinikk: https://grafana-labs.wikimedia.org/d/000000059/cloud-vps-project-board?orgId=1&from=now-5m&to=now&var-project=tools&var-server=All -- but I kind of doubt that you will learn much there. What are you actually trying to figure out?
[21:04:58] 2. the metrics under this project board https://grafana-labs.wikimedia.org/d/000000059/cloud-vps-project-board these are all automatically created from each individual cloud vps project, ya?
[21:05:36] bd808: I'm just trying to map out what is possible and what's not possible for projects depending on where they are hosted!
[21:06:25] *what is possible and what's not possible, monitoring-wise
[21:08:27] That grafana board you found is really the only user-visible metrics data. We do not have any solid shared monitoring/alerting system for Cloud VPS/Toolforge hosted things. Mostly we have that basic infrastructure data collection for the VM instances.
[21:10:00] There is another prometheus-based data collection system that is enabled for some projects. It also has some basic alerting capability, but right now it is not scaled to handle all Cloud VPS projects and really only alerts WMCS roots about things like puppet failures and suddenly missing VM instances.
[21:11:16] T266050 is about the prometheus system
[21:11:16] T266050: Build Prometheus service for use by all Cloud VPS projects and their instances - https://phabricator.wikimedia.org/T266050
[21:13:37] The dashboards you have found already are powered by a data feed that we would like to get rid of (Diamond) T264920, but that is blocked by building out the prometheus replacement
[21:13:37] T264920: Grafana "cloud-vps-project-board" needs to be migrated from Graphite to Prometheus - https://phabricator.wikimedia.org/T264920
[21:16:26] bd808: ok cool cool, thanks for that thorough answer. So we're not in a place to handle individual cloud vps project monitoring beyond the defaults. just out of curiosity, is it TECHNICALLY possible for those individual cloud vps instances to give metrics to prometheus, given that they are already being scraped, right? (not saying they should, just curious)
[21:16:53] T194333 has some now-stale rough estimates of work for more magnificent logging/metrics/monitoring support for projects. That list has been kicking around for about 6 years now in various forms, so holding one's breath for implementation is not recommended. :)
[21:16:53] T194333: [Epic] Provide logging/metrics/monitoring SaaS for Cloud VPS tenants - https://phabricator.wikimedia.org/T194333
[21:19:16] nikkinikk: yes, technically possible for Cloud VPS projects that have been connected to the https://openstack-browser.toolforge.org/project/metricsinfra prometheus service. Most projects are not connected there yet. The data in the grafana dashboard is actually from a graphite service.
[21:22:17] nikkinikk: T284993 might be a good task to watch if you are interested in Prometheus for one of your projects.
[21:22:17] T284993: Enable self-service Prometheus configuration management for project administrators - https://phabricator.wikimedia.org/T284993
[21:27:55] I think bstorm was telling me about metricsinfra, how it is being used for alerting. is that the primary reason why a project would be connected to the metricsinfra project? or does it offer more?
[21:28:35] Currently, we just have alerting in there... and worse, it only goes to WMCS admins
[21:28:55] However, it does that through short-term prometheus
[21:29:37] We have lovely ambitions to expand that to make it much more useful
[21:29:56] Unfortunately, it's not huge right now
[21:30:49] ok cool cool, that's all the info I think I wanted for now but I may come back :D
[21:31:22] metricsinfra is going to replace the current Diamond/graphite data collection, eventually. It started as a replacement for an older alerting system that was only used by 4-5 projects. The next big lift is figuring out how to scale from watching ~100 instances to all ~850.
[22:32:34] !log tools.quickcategories deployed 8ddb2092b2 (Python 3.9, i.e. new venv)
[22:32:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
[22:43:21] !log tools.quickcategories deployed e8a95d4c04 (background runner speedup)
[22:43:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
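
As a footnote to the monitoring thread above: the instance-side half of nikkinikk's "technically possible" question can be sketched with the prometheus_client library. The metric names, port, and loop are invented for illustration, and a project would still need to be connected to a Prometheus scrape configuration (e.g. metricsinfra's) for anything to actually collect these.

```python
# Minimal sketch: expose custom application metrics over HTTP in the
# Prometheus text format, so a Prometheus server can scrape them.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

REQUESTS = Counter("mytool_requests_total", "Requests handled by mytool")
QUEUE_DEPTH = Gauge("mytool_queue_depth", "Items waiting in mytool's queue")

if __name__ == "__main__":
    # Serves the metrics at http://<instance>:8000/metrics (port invented).
    start_http_server(8000)
    while True:
        REQUESTS.inc()                          # stand-in for real work
        QUEUE_DEPTH.set(random.randint(0, 10))  # stand-in for a real depth
        time.sleep(5)
```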