[10:00:06] <arturo>	 !log toolsbeta refreshed jobs-api deployment
[10:00:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:34:12] <arturo>	 !log tools refreshed jobs-api deployment
[10:34:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:40:43] <wm-bot>	 !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[10:40:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:41:25] <wm-bot>	 !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[10:41:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:42:45] <wm-bot>	 !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[10:42:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:44:51] <wm-bot>	 !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[10:44:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:45:07] <dcaro>	 sorry for the spam, this first one will take a few tries
[10:45:17] <arturo>	 👍
[10:45:44] <arturo>	 you can comment the !dolog line in the script meanwhile
[10:47:46] <wm-bot>	 !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[10:47:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:58:53] <dcaro>	 !log admin rebooting cloudcephosd1016 (T285858)
[10:58:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:58:59] <stashbot>	 T285858: Install the new ceph osd machines cloudcephosd10(1[6-9]|20) using cookbooks - https://phabricator.wikimedia.org/T285858
[11:04:07] <arturo>	 !log tools added toolforge-jobs-framework-cli_1_all.deb to aptly buster-tools,buster-toolsbeta
[11:04:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:13:31] <wm-bot>	 !log admin Adding new OSD cloudcephosd1016.eqiad.wmnet to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[11:13:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:16:17] <wm-bot>	 !log admin Added new OSD node cloudcephosd1016.eqiad.wmnet (T285858) - cookbook ran by dcaro@vulcanus
[11:16:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:04:41] <majavah>	 !log paws deploy ingress-nginx 0.46 via the helm chart to paws T264221
[12:04:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[12:04:47] <stashbot>	 T264221: Upgrade the nginx ingress controller in Toolforge (and likely PAWS) - https://phabricator.wikimedia.org/T264221
[12:43:29] <arturo>	 anyone in here willing to participate in T285944 ?
[12:43:30] <stashbot>	 T285944: Toolforge: beta phase for the new jobs framework - https://phabricator.wikimedia.org/T285944
[12:44:11] <arturo>	 i.e: we're running a beta phase for the new toolforge workflow for runnings jobs in kubernetes rather than in the grid
[12:45:38] <arturo>	 cc legoktm Krenair joakino 
[12:46:51] <arturo>	 cc Krinkle 
[14:15:32] <dcaro>	 the rebalancing of the first OSD finished now, that is ~3h
[14:15:37] <dcaro>	 adding the rest :)
[14:16:58] <wm-bot>	 !log admin Adding new OSDs ['cloudcephosd1017.eqiad.wmnet', 'cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[14:17:02] <wm-bot>	 !log admin   Adding OSD cloudcephosd1017.eqiad.wmnet... (1/3) (T285858) - cookbook ran by dcaro@vulcanus
[14:17:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:17:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:18:07] <wm-bot>	 !log admin Rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus
[14:18:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:20:33] <wm-bot>	 !log admin Adding new OSDs ['cloudcephosd1017.eqiad.wmnet', 'cloudcephosd1019.eqiad.wmnet', 'cloudcephosd1020.eqiad.wmnet'] to the cluster (T285858) - cookbook ran by dcaro@vulcanus
[14:20:36] <wm-bot>	 !log admin   Adding OSD cloudcephosd1017.eqiad.wmnet... (1/3) (T285858) - cookbook ran by dcaro@vulcanus
[14:20:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:20:39] <dcaro>	 second try xd
[14:20:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:21:05] <wm-bot>	 !log admin Rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus
[14:21:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:24:09] <wm-bot>	 !log admin Finished rebooting node cloudcephosd1017.eqiad.wmnet - cookbook ran by dcaro@vulcanus
[14:24:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:25:52] <wm-bot>	 !log admin   Added OSD cloudcephosd1017.eqiad.wmnet... (1/3) (T285858) - cookbook ran by dcaro@vulcanus
[14:25:52] <wm-bot>	 !log admin   Adding OSD cloudcephosd1019.eqiad.wmnet... (2/3) (T285858) - cookbook ran by dcaro@vulcanus
[14:25:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:26:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:47:43] <arturo>	 !log tools rebased labs/private.git
[15:47:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:57:01] <JJMC89>	 arturo: I'd like to do the beta.
[15:57:55] <arturo>	 JJMC89: excellent, read the phab task T285944 and let me know if you have any doubts. Bug reports, feature requests etc should be subtasks
[15:57:56] <stashbot>	 T285944: Toolforge: beta phase for the new jobs framework - https://phabricator.wikimedia.org/T285944
[16:03:06] <JJMC89>	 I'll lok at moving some jobs over to test after work.
[16:05:54] <arturo>	 👍
[16:09:07] <Krinkle>	 arturo: thx, I'll take a look.
[16:09:46] <arturo>	 Krinkle: 👍
[16:18:01] <bstorm>	 !log admin downtimed cloudstore1008 and cloudstore1009 to fail over T224747
[16:18:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:18:07] <stashbot>	 T224747: Move maps and scratch on cloudstore1008/9 to a DRBD failover similar to labstore1004/5 - https://phabricator.wikimedia.org/T224747
[16:27:10] <bstorm>	 !log admin failed over cloudstore1009 to cloudstore1008 T224747
[16:27:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:27:17] <stashbot>	 T224747: Move maps and scratch on cloudstore1008/9 to a DRBD failover similar to labstore1004/5 - https://phabricator.wikimedia.org/T224747
[16:46:53] <bstorm>	 !log maps rebooted entire project of VMs and things appear mounted T224747
[16:46:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL
[16:46:59] <stashbot>	 T224747: Move maps and scratch on cloudstore1008/9 to a DRBD failover similar to labstore1004/5 - https://phabricator.wikimedia.org/T224747
[16:47:22] <bstorm>	 !log tools remounted scratch everywhere...but mostly tools T224747
[16:47:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:56:15] <legoktm>	 arturo: yep!! I'll take a look tonight
[16:56:58] <arturo>	 legoktm: thanks!
[17:01:52] <majavah>	 !log toolsbeta updating jobs-framework-api
[17:01:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[17:03:50] <andrewbogott>	 !log tools rebooting tools-k8s-worker-[31,33,35,44,49,51,57-58,70].tools.eqiad1.wikimedia.cloud
[17:03:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:11:38] <Iluvatar>	 Hello, world! My pod of kubernetes does not starting. kubectl pod status: "ContainerCreating". What happened?
[17:12:34] <majavah>	 Iluvatar: hi! try `kubectl describe pod NAME`, and if that does not help, which tool?
[17:14:09] <Iluvatar>	 Oops, everything is fine. Started successfully after ~10 minutes with status “containerCreating”. Thanks!
[17:14:18] <wm-bb>	 <utx> https://t.me/wb9876有兴趣戳蓝色字体联系
[17:14:19] <wm-bb>	 <utx> ✈️✈️；@wb9876
[17:15:29] <majavah>	 Iluvatar: side note, if you're using raw kubernetes manifests, we have a new tool available for beta testing to provide a much simpler interface to interact with kubernetes: T285944
[17:15:30] <stashbot>	 T285944: Toolforge: beta phase for the new jobs framework - https://phabricator.wikimedia.org/T285944
[18:04:13] <proc>	 how does toolforge maintain its database replicas? like, how does it copy the live database but strip out certain tables/columns? (this isn't a native feature in mysql right?)
[18:04:24] <wm-bb>	 <utx> https://t.me/wb9876有兴趣戳蓝色字体联系
[18:04:25] <wm-bb>	 <utx> ✈️✈️；@wb9876
[18:06:09] <legoktm>	 proc: https://wikitech.wikimedia.org/wiki/Labsdb_redaction
[18:07:02] <proc>	 interesting. was it difficult to set up?
[18:07:31] <legoktm>	 I wasn't involved but I think so
[18:08:04] <legoktm>	 the main problem is that MediaWiki has various ways of marking things as deleted or hidden and getting that consistent with what MW exposes in the UI took a while
[18:08:44] <legoktm>	 also the replicas just give users a view and not access to the underlying table, so you can't run EXPLAIN, etc. and there are various workarounds to that
[18:08:52] <bd808>	 https://wikitech.wikimedia.org/wiki/MariaDB/Sanitarium_and_Labsdbs has a bit of info on the tech bits used in the "Sanitarium" layer which gets rid of all the things that are simple to remove.
[18:09:45] <bd808>	 And then we "view layer" does a lot of the more complicated hiding based on MediaWiki's often complex rules about deleted and oversighted revisions
[18:09:45] <legoktm>	 I feel like https://phabricator.wikimedia.org/T215445 is a good example of things being difficult
[18:10:56] <legoktm>	 the wikireplicas are much much better than the Toolserver's replication, which was using a tool called "trainwreck". if replication broke for too long it would take months to get a new dump imported and replication restarted
[18:14:14] <bd808>	 jynus did lots and lots of work to make the wiki replicas work well starting back in the 2015-2016 era. and marostegui has done a lot of heroic things to keep them working as the scale got out of hand. The Foundation's DBAs really care about the replicas and put a lot of work into keeping them running.
[21:57:09] <urbanecm>	 bstorm: ad scratch, maybe good time to delete `T183758-user-db-archive`? Asked at https://phabricator.wikimedia.org/T183758#7098705, but no one said anything :/
[21:57:10] <stashbot>	 T183758: Create backups of user tables from decommissioned database servers - https://phabricator.wikimedia.org/T183758
[21:57:52] <bstorm>	 heh
[21:59:15] <bstorm>	 scratch space is hardly == archival space
[21:59:25] <bstorm>	 for that matter
[22:00:23] <urbanecm>	 i think it was given to users "restore your dbs if you want", but probably not meant to live there permanently
[22:03:24] <bstorm>	 bd808: You were active on that task. This won't give back much space on scratch, but it will help. Since the task talks about labsdb1001 and labstore1003, I really think I should delete that.
[22:03:35] * bd808 looks
[22:07:20] <bstorm>	 The video2commons stuff is the big cleanup needed. I just don't know which files to delete, and it really only grew so much because of my rsyncing, unfortunately
[22:08:36] <bstorm>	 !log tools releasing webservice 0.75
[22:08:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:08:57] <bd808>	 so... yeah. these are dumps of "public" user tables that were on the 2 generations ago wiki replica servers. I think the whole task was honestly a "what if we decide we need this data in 100 years" wild speculation anyway. I suppose someone could ask the dumps project (not wikimedia dumps, the vps project) if they would like to save the tarball to InternetArchive.
[22:10:43] <bstorm>	 Fair. Would you like to volunteer to do that? I don't have to blow them away right away. Honestly I need more like a couple hundred GB anyway.
[22:12:39] <bd808>	 yeah, I can ping on the task and see if hydriz or nemo have any desire to permanently save whatever is there.
[22:13:29] <bstorm>	 Thanks :)
[22:13:33] <bstorm>	 That can't hurt.
[22:13:49] <bstorm>	 nemo is already one person I'm asking for cleanups
[22:14:47] <urbanecm>	 bstorm: the /data/scratch/video2commons/ssu folder is used to publish uploads for server side uploads (tasks like this one https://phabricator.wikimedia.org/T285682). Things that are months old are probably already processed.
[22:15:23] <bstorm>	 I'm sure that's most of the stuff to clean up
[22:15:28] <bstorm>	 and that would fix it.
[22:16:50] * urbanecm recalls some discussions about retention in ssu folder, but i can't find it in phab
[22:17:44] <bstorm>	 There is stuff from April and March in there
[22:18:14] <bstorm>	 I know for a fact that there is more in there now than there was on the source before I rsync'd because they were cleaned up already on the source before my syncs finished
[22:18:14] <urbanecm>	 yeah
[22:18:46] <urbanecm>	 I wanted to process some more SSU requests... but webproxy.codfw.wmnet doesn't let me to connect to toolforge
[22:20:07] <urbanecm>	 `$ wget https://video2commons.toolforge.org/static/ssu/Le_grand_voyage.webm.txt` just hangs :/
[22:21:28] <urbanecm>	 ...and that's because the tool just never responds
[22:21:39] <bstorm>	 hrm. Is the tool broken?
[22:21:45] <bstorm>	 I can restart it
[22:22:01] <bstorm>	 It's entirely likely that when I intentionally broke scratch earlier it broke the tool
[22:22:31] <urbanecm>	 trying to load https://video2commons.toolforge.org in my browser just doesn't do anything (connection hangs)
[22:22:38] <urbanecm>	 if you could restart it, that'd be great bstorm 
[22:22:45] <bstorm>	 Ok :)
[22:23:09] <bstorm>	 That's more in my comfort zone than deleting other people's data
[22:23:12] <bstorm>	 😁
[22:23:21] <urbanecm>	 hehe
[22:23:28] <urbanecm>	 i'm not trying to make you to delete it :)
[22:23:49] <wm-bot>	 !log tools.jouncebot <bd808> Deploying f560830 (Update `help` message)
[22:23:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.jouncebot/SAL
[22:24:25] <bstorm>	 !log tools.video2commons restarting webservice as it appears to be hung (by deleting the pod)
[22:24:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.video2commons/SAL
[22:24:51] <bstorm>	 urbanecm: it looks better now
[22:24:58] <urbanecm>	 yup, my wget now downloads something
[22:24:59] <urbanecm>	 thanks :)
[23:37:38] <wm-bot>	 !log tools.lexeme-forms <lucaswerkmeister> unlink ~/services.template # new version of webservice doesn’t like the symlink :(
[23:37:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[23:37:56] <wm-bot>	 !log tools.lexeme-forms <lucaswerkmeister> deployed ac8779515d (l10n updates)
[23:37:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL