[10:53:33] !log clouddb-services restarting mariadb in clouddb1001 to apply minor version upgrade (T328273)
[10:53:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[10:53:36] T328273: apt error in clouddb1001 - https://phabricator.wikimedia.org/T328273
[11:02:03] !log clouddb-services 'SET GLOBAL read_only = 0;' after restarting mariadb in clouddb1001 (T328273)
[11:02:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[11:02:06] T328273: apt error in clouddb1001 - https://phabricator.wikimedia.org/T328273
[11:53:26] !log paws updated ingress-nginx to allow larger file (more than 800K) uploads T328168
[11:53:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[11:53:29] T328168: Receiving 413 error codes when trying to save notebook or upload certain files - https://phabricator.wikimedia.org/T328168
[13:16:15] I set up a toolforge cron job to run last night.
[13:16:22] Job name: Job type: Status:
[13:16:23] ----------- -------------------- ----------------------------------------
[13:16:23] dykbot-cron schedule: 58 * * * * Last schedule time: 2023-01-30T03:58:00Z
[13:16:44] it never got scheduled.
[13:16:48] any idea why?
[13:17:27] or maybe it got scheduled but never ran
[13:17:28] which tool?
[13:17:41] tools.dyk-tools
[13:18:43] I set it up with:
[13:18:45] run dykbot-cron --command /data/project/dyk-tools/www/python/src/dyk_tools/bot/dykbot.bash --image tf-python39 --schedule "58 * * * *"
[13:19:08] you seem to have open `webservice shell` sessions which are consuming the quota and blocking the cron pod from starting
[13:19:41] 2 of the 3 pods are over a month old, so I wonder if those got somehow left running when they were supposed to be closed
[13:20:01] I can delete them if you're ok with that
[13:20:19] I have one open shell right now that I'm using to monitor progress.
[13:20:36] And there's also a webservice running, so that consumes a pod I guess
[13:21:00] how do I get a list of them all?
[13:21:07] yeah I see both of those, and in addition two extra shell pods I guess are a result of some bug in webservice?
[13:21:09] `kubectl get pod`
[13:22:06] that just gives me pages and pages of golang stack traces
[13:22:31] runtime: failed to create new OS thread (have 45 already; errno=11)
[13:22:31] runtime: may need to increase max user processes (ulimit -u)
[13:22:31] fatal error: newosproc
[13:22:53] OK, let me close my monitoring window.
[13:23:42] I see these two:
[13:23:43] shell-1671408475 1/1 Running 0 42d
[13:23:43] shell-1671561668 1/1 Running 0 40d
[13:24:03] I have no clue what those are.
[13:24:12] the third one:
[13:24:13] dyk-tools-85697b8f47-2r7tt 1/1 Running 0 9h
[13:24:18] I assume is my webservice
[13:24:48] yes, correct
[13:24:49] how do I get rid of those two old ones?
[13:25:10] I deleted them for you, but you can also use `kubectl delete pod <pod-name>`
[13:27:26] It's sub-optimal that there's no user-visible error report when a cron job fails to start.
[13:29:31] I agree. I was already poking at jobs error messages last week; apparently I missed this failure mode
[13:32:16] Interesting; it looks like as soon as you killed the hung pods, the pending cron job started to run.
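(Editor's note: the troubleshooting above boils down to listing the pods that consume the tool's Kubernetes quota, deleting the stale `webservice shell` pods, and then letting the scheduled job start. Below is a minimal sketch of that workflow from a Toolforge bastion, assuming the `dyk-tools` tool account; the pod names are the ones pasted in the conversation, not stable identifiers.)

```bash
# Become the tool account on a Toolforge bastion.
become dyk-tools

# List all pods owned by the tool; stale shell-* pods show up here
# alongside the webservice pod and any job pods.
kubectl get pods

# Check how much of the namespace quota is in use.
kubectl describe resourcequota

# Delete the leftover interactive shell pods so the cron pod can be scheduled.
# (Names below are the ones from the session above; substitute your own.)
kubectl delete pod shell-1671408475 shell-1671561668

# Look for recent scheduling failures recorded as Kubernetes events.
kubectl get events --sort-by=.lastTimestamp | tail -n 20
```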
[13:51:53] !log paws Set 1 to = 1 for nfs mounts and variables 09b036eb9dab09df39e9064cad4d18bc0db1b763 T326675
[13:51:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[13:51:56] T326675: Set 1 to = 1 for nfs names in PAWS - https://phabricator.wikimedia.org/T326675
[15:52:54] hello! whenever a new table is created on a wiki, we need to manually re-run the maintain-views.py script, right?
[15:53:11] (assuming the table is listed in that script)
[15:53:43] i.e. it won't magically show up on its own eventually? this is for https://phabricator.wikimedia.org/T326387 and https://phabricator.wikimedia.org/T328224
[16:08:50] musikanimal: I think that's correct, although lately the data-engineering folks have been keeping on top of that
[16:09:41] okay, I'll ping them once both of those tasks are resolved. Thanks!
[19:47:32] How long does a dump mirror have to be dead before we remove it from the list?
[21:34:36] !log admin merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/884922 and upgrading rabbitmq nodes for T328155
[21:34:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[21:34:41] T328155: nova-api service flaps - https://phabricator.wikimedia.org/T328155
[22:15:46] @harej: that sounds like a great question for the dumps mailing list or maybe just a task in https://phabricator.wikimedia.org/project/view/1519/ about the specific mirror that you think is dead.
[22:16:17] T260223 is the only dumps rsync problem I know about right now
[22:16:17] T260223: Kiwix rsyncs not completing and stacking up on Clouddumps1001,2 - https://phabricator.wikimedia.org/T260223
[22:16:52] I'm referring to the two here listed as stalled or defunct: https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
[22:24:56] @harej: *nod* I guess I would personally defer to apergos about when to remove things from that list. It looks like Count Count added the indications in https://meta.wikimedia.org/w/index.php?title=Mirroring_Wikimedia_project_XML_dumps&diff=prev&oldid=20335022. I don't know if there were any other tasks to look into what's wrong.
[22:30:39] I just want to make sure I'm not missing something here
[22:31:16] I want to set up a public demo of an unmerged wikidiff2 topic branch
[22:32:36] I gather this is impossible (or very hacky/difficult) in toolforge
[22:34:31] my plan was to compile a special wikidiff2.so and then run php -d extension=./wikidiff2.so as a job
[22:35:05] TimStarling: I take it https://patchdemo.wmflabs.org/ wouldn't work?
[22:35:11] however there are no PHP dev packages in the available containers
[22:37:50] can you locally compile the wikidiff2.so in a Debian container and scp it to toolforge?
[22:40:16] wm-bb: maybe, but I had that down as hacky/difficult
[22:40:31] ok, fair
[22:41:47] kindrobot: doesn't look like it
[22:41:53] You might have better luck with a Cloud VPS instance.
[22:41:54] TimStarling: I don't know that we ever had anyone come asking for the ability to compile PHP in Toolforge, so it's not totally surprising that we don't have a container set up for that.
[22:42:39] I think that kindrobot is correct that the fastest fix today would be a VPS instance instead of Toolforge.
[22:42:54] or self-host
[22:43:02] * bd808 looks to see what project he could offer up as tribute....
[22:43:59] Could you do it locally with docker-compose?
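(Editor's note: the plan discussed above — build a custom wikidiff2.so outside Toolforge, then load it with `php -d extension=...` — follows the standard phpize workflow for PHP extensions. A rough sketch, assuming a Debian container whose PHP version matches the target runtime and a checked-out wikidiff2 topic branch; package names and paths are illustrative, not a record of what was actually run.)

```bash
# Inside a Debian container matching the target PHP version (illustrative).
apt-get update && apt-get install -y build-essential php-dev git

# wikidiff2 is a phpize-built extension; clone it and switch to the
# unmerged change under test (branch name elided in the conversation).
git clone https://gerrit.wikimedia.org/r/mediawiki/php/wikidiff2
cd wikidiff2
# git checkout <topic-branch>

phpize
./configure
make

# Verify the module loads; the built .so lands in modules/ and can then be
# copied to wherever it will be run with: php -d extension=./wikidiff2.so ...
php -d extension=modules/wikidiff2.so -m | grep wikidiff2
```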
[22:47:03] > public demo
[22:47:19] TimStarling: I would be happy to promote you to admin in the testlabs project if you want a place to set up an instance for a while. Or we could even set up a whole project for wikidiff2 testing.
[22:48:24] IIRC there used to be a wikidiff2 project when WMDE was working on it, (re)creating a project seems pretty reasonable.
[22:48:32] there's 180 projects so I figured projects were pretty cheap, but can a project be deleted when I'm done with this in a month?
[22:49:03] yeah, I think a.ndrewbogott has the cleanup all scripted
[22:49:19] (https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2019_Purge#In_use_wikidiff2-wmde-dev was the old project)
[22:50:59] You would be able to delete all the instances yourself too, and a project with no instances is really cheap (like a couple of database rows cheap)
[22:52:05] if I want a project do I need to wait for the WMCS weekly meeting?
[22:52:16] that's what https://phabricator.wikimedia.org/project/view/2875/ says
[22:52:48] TimStarling: if you take up bd808's offer you'll have access to an existing project
[22:52:56] we have a webperf instance where you can create an instance right away and attach a wmcloud.org subdomain webproxy to it
[22:53:03] webperf project*
[22:53:26] https://openstack-browser.toolforge.org/project/webperf
[22:54:51] TimStarling: I know we changed to do quota requests with only 2 admins' approval. I honestly don't remember if we decided that was ok for new projects too or not. andrewbogott, do you remember?
[22:55:31] (all the waiting stuff is really just about trying to make sure we don't give away more compute + storage than we actually have.)
[22:56:21] yeah, same thing for a new project. And I'm here now so we can approve immediately if there's a ticket.
[22:57:13] ok, so would you prefer to have a project for this, or just an instance in webperf like Krinkle says?
[22:58:00] either is perfectly fine
[22:58:23] ok, I'll make an instance under webperf
[22:58:37] that's easy! lmk if you need a quota bump to make room.
[22:58:55] and by lmk I mean, open a ticket and then lmk :)
[22:59:00] TimStarling: under security group, pick 'web' and then attach a webproxy under "Network"
[22:59:12] (the former when creating the instance is easiest)
[23:00:53] We fixed things quite a while ago I think so that it is possible to add new security groups to existing instances.
[23:01:15] s/we/upstream/ probably
[23:01:27] yeah, should work and yeah, it was upstream
[23:08:52] instance is created, couldn't see any webproxy options in horizon though
[23:09:07] it's under the 'dns' tab group
[23:09:15] which isn't quite right but was as right as I could get it
[23:10:14] ok
[23:11:59] thanks everyone
[23:15:19] andrewbogott: btw, feel free to drop the 'cvn' quota back (floating IPs 4->2). ref T306066 now closed
[23:15:20] T306066: Cloud VPS "cvn" project Stretch deprecation - https://phabricator.wikimedia.org/T306066
[23:15:56] awesome, you deleted those archived VMs?
[23:31:57] it's available in upstream
[23:41:05] seems to work: https://wikidiff2-demo.wmcloud.org/demo.php
[23:46:10] TimStarling: excellent! Is there anything I can help with before I go cook dinner?
[23:46:37] nope, thanks andrewbogott
[23:47:15] great!
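(Editor's note: once the webperf instance exists with the 'web' security group and a wmcloud.org webproxy attached, serving a single-file demo like the demo.php linked above can be as simple as PHP's built-in web server with the custom extension loaded. A hedged sketch only; the port, paths, and the use of `php -S` are assumptions, not a record of how wikidiff2-demo.wmcloud.org was actually set up.)

```bash
# On the Cloud VPS instance: install a PHP runtime and build tools,
# then build wikidiff2 as in the earlier sketch.
sudo apt-get update && sudo apt-get install -y php-cli php-dev build-essential

# From a directory containing demo.php and the freshly built extension,
# serve it with the built-in web server. Port 8080 is an assumption; it
# must match the webproxy's backend port and be allowed by the 'web'
# security group.
php -S 0.0.0.0:8080 -d extension=./modules/wikidiff2.so
```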