[10:00:06] !log toolsbeta refreshed jobs-api deployment
[10:00:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:34:12] !log tools refreshed jobs-api deployment
[10:34:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:40:43] !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[10:40:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:41:25] !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[10:41:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:42:45] !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[10:42:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:44:51] !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[10:44:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:45:07] sorry for the spam, this first one will take a few tries
[10:45:17] 👍
[10:45:44] you can comment out the !dolog line in the script in the meantime
[10:47:46] !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[10:47:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:58:53] !log admin rebooting cloudcephosd1016 (T285858)
[10:58:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:58:59] T285858: Install the new ceph osd machines cloudcephosd10(1[6-9]|20) using cookbooks - https://phabricator.wikimedia.org/T285858
[11:04:07] !log tools added toolforge-jobs-framework-cli_1_all.deb to aptly buster-tools,buster-toolsbeta
[11:04:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:13:31] !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[11:13:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:16:17] !log admin Added new OSD node cloudcephosd1016.eqiad.wmnet (T285858) - cookbook ran by dcaro@vulcanus
[11:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:04:41] !log paws deploy ingress-nginx 0.46 via the helm chart to paws T264221
[12:04:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[12:04:47] T264221: Upgrade the nginx ingress controller in Toolforge (and likely PAWS) - https://phabricator.wikimedia.org/T264221
[12:43:29] anyone in here willing to participate in T285944?
[12:43:30] T285944: Toolforge: beta phase for the new jobs framework - https://phabricator.wikimedia.org/T285944
[12:44:11] i.e. we're running a beta phase for the new Toolforge workflow for running jobs in Kubernetes rather than on the grid
[12:45:38] cc legoktm Krenair joakino
[12:46:51] cc Krinkle
[14:15:32] the rebalancing of the first OSD just finished, that took ~3h
[14:15:37] adding the rest :)
[14:16:58] !log admin Adding new OSDs ['cloudcephosd1017.eqiad.wmnet', 'cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[14:17:02] !log admin Adding OSD cloudcephosd1017.eqiad.wmnet... (1/3) (T285858) - cookbook ran by dcaro@vulcanus
[14:17:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:17:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:18:07] !log admin Rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus
[14:18:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:20:33] !log admin Adding new OSDs ['cloudcephosd1017.eqiad.wmnet', 'cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[14:20:36] !log admin Adding OSD cloudcephosd1017.eqiad.wmnet... (1/3) (T285858) - cookbook ran by dcaro@vulcanus
[14:20:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:20:39] second try xd
[14:20:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:21:05] !log admin Rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus
[14:21:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:24:09] !log admin Finished rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus
[14:24:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:25:52] !log admin Added OSD cloudcephosd1017.eqiad.wmnet... (1/3) (T285858) - cookbook ran by dcaro@vulcanus
[14:25:52] !log admin Adding OSD cloudcephosd1019.eqiad.wmnet... (2/3) (T285858) - cookbook ran by dcaro@vulcanus
[14:25:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:47:43] !log tools rebased labs/private.git
[15:47:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:57:01] arturo: I'd like to do the beta.
[15:57:55] JJMC89: excellent, read the Phab task T285944 and let me know if you have any doubts. Bug reports, feature requests, etc. should be filed as subtasks
[15:57:56] T285944: Toolforge: beta phase for the new jobs framework - https://phabricator.wikimedia.org/T285944
[16:03:06] I'll look at moving some jobs over to test after work.
[16:05:54] 👍
[16:09:07] arturo: thx, I'll take a look.
[16:09:46] Krinkle: 👍
[16:18:01] !log admin downtimed cloudstore1008 and cloudstore1009 to fail over T224747
[16:18:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:18:07] T224747: Move maps and scratch on cloudstore1008/9 to a DRBD failover similar to labstore1004/5 - https://phabricator.wikimedia.org/T224747
[16:27:10] !log admin failed over cloudstore1009 to cloudstore1008 T224747
[16:27:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:27:17] T224747: Move maps and scratch on cloudstore1008/9 to a DRBD failover similar to labstore1004/5 - https://phabricator.wikimedia.org/T224747
[16:46:53] !log maps rebooted entire project of VMs and things appear mounted T224747
[16:46:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL
[16:46:59] T224747: Move maps and scratch on cloudstore1008/9 to a DRBD failover similar to labstore1004/5 - https://phabricator.wikimedia.org/T224747
[16:47:22] !log tools remounted scratch everywhere...but mostly tools T224747
[16:47:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:56:15] arturo: yep!! I'll take a look tonight
[16:56:58] legoktm: thanks!
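For context, the jobs-framework beta discussed above is driven by a `toolforge-jobs` command-line tool (the `toolforge-jobs-framework-cli` package logged earlier). A minimal sketch of the kind of invocation a beta tester might try; the job name, script path, and image name below are hypothetical placeholders, not taken from this log:

```shell
# Sketch only: "myjob", "./mytask.sh", and the image name are placeholders.
# Define and run a job on Kubernetes instead of the grid:
toolforge-jobs run myjob --command ./mytask.sh --image tf-buster-std
# Inspect the tool's jobs, then clean up:
toolforge-jobs list
toolforge-jobs delete myjob
```

These commands are run from the tool account on a Toolforge bastion; see the task T285944 for the authoritative beta instructions.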
[17:01:52] !log toolsbeta updating jobs-framework-api
[17:01:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[17:03:50] !log tools rebooting tools-k8s-worker-[31,33,35,44,49,51,57-58,70].tools.eqiad1.wikimedia.cloud
[17:03:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:11:38] Hello, world! My Kubernetes pod is not starting. kubectl reports the pod status as "ContainerCreating". What happened?
[17:12:34] Iluvatar: hi! try `kubectl describe pod NAME`, and if that does not help, which tool is it?
[17:14:09] Oops, everything is fine. It started successfully after ~10 minutes in the "ContainerCreating" status. Thanks!
[17:15:29] Iluvatar: side note, if you're using raw Kubernetes manifests, we have a new tool available for beta testing that provides a much simpler interface for interacting with Kubernetes: T285944
[17:15:30] T285944: Toolforge: beta phase for the new jobs framework - https://phabricator.wikimedia.org/T285944
[18:04:13] how does Toolforge maintain its database replicas? like, how does it copy the live database but strip out certain tables/columns? (this isn't a native feature in MySQL, right?)
[18:06:09] proc: https://wikitech.wikimedia.org/wiki/Labsdb_redaction
[18:07:02] interesting. was it difficult to set up?
[18:07:31] I wasn't involved, but I think so
[18:08:04] the main problem is that MediaWiki has various ways of marking things as deleted or hidden, and getting that consistent with what MW exposes in the UI took a while
[18:08:44] also, the replicas just give users a view and not access to the underlying table, so you can't run EXPLAIN, etc., and there are various workarounds for that
[18:08:52] https://wikitech.wikimedia.org/wiki/MariaDB/Sanitarium_and_Labsdbs has a bit of info on the tech bits used in the "Sanitarium" layer, which gets rid of all the things that are simple to remove.
[18:09:45] And then the "view layer" does a lot of the more complicated hiding based on MediaWiki's often complex rules about deleted and oversighted revisions
[18:09:45] I feel like https://phabricator.wikimedia.org/T215445 is a good example of things being difficult
[18:10:56] the wikireplicas are much, much better than the Toolserver's replication, which used a tool called "trainwreck". if replication broke for too long, it would take months to get a new dump imported and replication restarted
[18:14:14] jynus did lots and lots of work to make the wiki replicas work well starting back in the 2015-2016 era, and marostegui has done a lot of heroic things to keep them working as the scale got out of hand. The Foundation's DBAs really care about the replicas and put a lot of work into keeping them running.
[21:57:09] bstorm: re: scratch, maybe it's a good time to delete `T183758-user-db-archive`? Asked at https://phabricator.wikimedia.org/T183758#7098705, but no one said anything :/
[21:57:10] T183758: Create backups of user tables from decommissioned database servers - https://phabricator.wikimedia.org/T183758
[21:57:52] heh
[21:59:15] scratch space is hardly == archival space
[21:59:25] for that matter
[22:00:23] i think it was given to users as "restore your dbs if you want", but probably not meant to live there permanently
[22:03:24] bd808: You were active on that task. This won't give back much space on scratch, but it will help. Since the task talks about labsdb1001 and labstore1003, I really think I should delete that.
[22:03:35] * bd808 looks
[22:07:20] The video2commons stuff is the big cleanup needed. I just don't know which files to delete, and it really only grew so much because of my rsyncing, unfortunately
[22:08:36] !log tools releasing webservice 0.75
[22:08:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:08:57] so... yeah. these are dumps of "public" user tables that were on the wiki replica servers from two generations ago. I think the whole task was honestly "what if we decide we need this data in 100 years" wild speculation anyway. I suppose someone could ask the dumps project (not Wikimedia dumps, the VPS project) if they would like to save the tarball to the Internet Archive.
[22:10:43] Fair. Would you like to volunteer to do that? I don't have to blow them away right away. Honestly, I need more like a couple hundred GB anyway.
[22:12:39] yeah, I can ping on the task and see if hydriz or nemo have any desire to permanently save whatever is there.
[22:13:29] Thanks :)
[22:13:33] That can't hurt.
[22:13:49] nemo is already one of the people I'm asking for cleanups
[22:14:47] bstorm: the /data/scratch/video2commons/ssu folder is used to publish uploads for server-side uploads (tasks like this one: https://phabricator.wikimedia.org/T285682). Things that are months old are probably already processed.
[22:15:23] I'm sure that's most of the stuff to clean up
[22:15:28] and that would fix it.
[22:16:50] * urbanecm recalls some discussions about retention in the ssu folder, but can't find them in Phab
[22:17:44] There is stuff from April and March in there
[22:18:14] I know for a fact that there is more in there now than there was on the source before I rsync'd, because they were already cleaned up on the source before my syncs finished
[22:18:14] yeah
[22:18:46] I wanted to process some more SSU requests... but webproxy.codfw.wmnet doesn't let me connect to Toolforge
[22:20:07] `$ wget https://video2commons.toolforge.org/static/ssu/Le_grand_voyage.webm.txt` just hangs :/
[22:21:28] ...and that's because the tool just never responds
[22:21:39] hrm. Is the tool broken?
[22:21:45] I can restart it
[22:22:01] It's entirely likely that when I intentionally broke scratch earlier, it broke the tool
[22:22:31] trying to load https://video2commons.toolforge.org in my browser just doesn't do anything (the connection hangs)
[22:22:38] if you could restart it, that'd be great bstorm
[22:22:45] Ok :)
[22:23:09] That's more in my comfort zone than deleting other people's data
[22:23:12] 😁
[22:23:21] hehe
[22:23:28] i'm not trying to make you delete it :)
[22:23:49] !log tools.jouncebot Deploying f560830 (Update `help` message)
[22:23:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.jouncebot/SAL
[22:24:25] !log tools.video2commons restarting webservice as it appears to be hung (by deleting the pod)
[22:24:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.video2commons/SAL
[22:24:51] urbanecm: it looks better now
[22:24:58] yup, my wget now downloads something
[22:24:59] thanks :)
[23:37:38] !log tools.lexeme-forms unlink ~/services.template # new version of webservice doesn’t like the symlink :(
[23:37:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[23:37:56] !log tools.lexeme-forms deployed ac8779515d (l10n updates)
[23:37:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
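Toward the end of the log, the hung video2commons webservice was recovered by deleting its Kubernetes pod so the deployment recreated it. A minimal sketch of the two usual recovery paths for a Kubernetes-backed Toolforge webservice; the tool and pod names below are hypothetical placeholders:

```shell
# Sketch only; run from the tool account, e.g. after `become TOOLNAME`.
# Option 1: restart through the webservice wrapper.
webservice restart

# Option 2: if the wrapper route is not enough, delete the pod directly
# and let Kubernetes recreate it (the pod name here is a placeholder;
# find the real one with `kubectl get pods`).
kubectl get pods
kubectl delete pod mytool-5d9f7c6b8-x2k4q
```

Option 2 is what was done here: deleting the pod forces a fresh container, which cleared the stale scratch mount state that had hung the service.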