[13:22:32] !log wmcs Draining node cloudvirt1012.eqiad.wmnet... - cookbook ran by michael@mouse
[13:22:33] wm-bot: Unknown project "wmcs"
[13:33:26] !log puppet-diffs quota bump to 200G (+120G) T297594
[13:33:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet-diffs/SAL
[13:33:29] T297594: Request increased quota for puppet-diffs Cloud VPS project - https://phabricator.wikimedia.org/T297594
[15:15:37] !log puppet-diffs pcc-worker1002 up and running, rebuilding compiler1001 (T297356)
[15:15:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet-diffs/SAL
[15:15:40] T297356: [pcc] Release the latest version - https://phabricator.wikimedia.org/T297356
[16:43:50] !log admin Setting cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
[16:43:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:44:39] !log admin Set cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
[16:44:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:47:47] !log admin Draining 'cloudvirt1012.eqiad.wmnet'. - cookbook ran by michael@mouse
[16:47:47] !log admin Setting cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
[16:47:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:47:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:50:43] !log admin Set cloudvirt 'cloudvirt1012.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
[16:50:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:13:39] !log admin Drained 'cloudvirt1012.eqiad.wmnet'. - cookbook ran by michael@mouse
[17:13:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:30:02] !log admin Draining 'cloudvirt1013.eqiad.wmnet'. - cookbook ran by michael@mouse
[17:30:02] !log admin Setting cloudvirt 'cloudvirt1013.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
[17:30:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:30:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:30:52] !log admin Set cloudvirt 'cloudvirt1013.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
[17:30:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:44:05] !log admin Drained 'cloudvirt1013.eqiad.wmnet'. - cookbook ran by michael@mouse
[17:44:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:46:17] We have an instance in WMCS, wcqs-beta-01 in the wikidata-query project, that transitioned itself to the powered-down state, and when i attempt to start the instance (as projectadmin) it says I don't have appropriate rights. Could someone with wmcs admin attempt to start the instance?
[17:46:52] ebernhardson: can you open a task? I can try to have a look
[17:47:14] dcaro: https://phabricator.wikimedia.org/T297454
[17:47:54] the VM is in error state
[17:48:02] https://www.irccloud.com/pastebin/mj32UtlH/
[17:48:22] "no space left on device"?
[17:48:37] and that shuts down the instance and prevents it from starting
[17:48:39] ?
[17:49:34] :-/
[17:49:38] !log admin Draining 'cloudvirt1014.eqiad.wmnet'. - cookbook ran by michael@mouse
[17:49:38] !log admin Setting cloudvirt 'cloudvirt1014.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
[17:49:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:49:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:49:42] i suppose it could have filled its root partition somehow, can look but only once it's on :)
[17:50:26] !log admin Set cloudvirt 'cloudvirt1014.eqiad.wmnet' maintenance. - cookbook ran by michael@mouse
[17:50:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:52:02] ebernhardson: did you do any special operation on 2021-12-10T19:19:35Z? or any other folks?
[17:52:40] it's the actual cloudvirt that has filled up its disk
[17:52:41] taavi@cloudvirt-wdqs1001 ~ $ df -h /var/lib/nova/instances/
[17:52:41] Filesystem Size Used Avail Use% Mounted on
[17:52:51] /dev/mapper/tank-data 3.3T 3.3T 20K 100% /var/lib/nova/instances
[17:53:01] oh! that's even odder
[17:53:09] arturo: nothing that i know of
[17:53:14] oh right, that makes a lot of sense,
[17:53:24] i'm pretty sure when i looked in the last week or two, the instance inside was only using ~25% of the disk
[17:53:29] fun :(
[17:53:48] majavah: would you like to update the ticket?
[17:54:57] back from a meeting
[17:54:59] I added a comment with that
[17:56:42] arturo: actually, 12-10T19:19 was when i first noticed things wrong and started poking it in horizon
[17:56:56] (that comes about ~30 minutes before i posted in the ticket, seems right)
[17:57:21] the instance itself stopped responding a day earlier
[17:58:45] I guess my next questions are "why did nova schedule that instance there in the first place?" and "why was this not caught by any of our monitoring systems?"
[17:59:03] majavah: that is dedicated hardware we installed in WMCS that only runs 1 VM
[17:59:06] I'm guessing the first one is explained by the special hypervisors
[17:59:42] basically, it needed TB of disk, 128G memory and a few cores and we were told regular wmcs can't give that :)
[18:00:27] it's using 3.2 TB: 3.2T /var/lib/nova/instances/87acbb5a-ddac-457a-9fab-19b4f8af7916/disk
[18:01:48] Horizon says it has 3400G disk, but that drive only has 3.2T
[18:02:13] there's also one of our hypervisor canary VMs running, taking 17G of space
[18:02:22] that does not account for 20G of root disk, + 20G of canary instance
[18:02:41] yeah :-(
[18:03:23] so probably the quota calculation was a bit too tight
[18:03:31] hmmm
[18:04:07] worst case, the instance is recreatable but will take a few days to load data. Would rather not if possible though :)
[18:04:45] I think that the only way of getting this through might be messing with the disk, and maybe shrinking it (possible data loss!)
[18:05:15] if the disk inside is full, it must have like 2TB of logs or some other spam, the actual data is < 1TB
[18:05:23] probably fine
[18:05:25] I can try to create a bit of space so the VM might start up
[18:06:07] ebernhardson: do you need it today? (it's the end of my shift)
[18:06:25] dcaro: no it'll be fine, we communicate that this is a beta service and not always available
[18:07:12] then I'll try to have a look at it tomorrow morning (unless anyone beats me to it xd)
[18:07:44] kk, thanks!
[18:08:04] !log admin Drained 'cloudvirt1014.eqiad.wmnet'. - cookbook ran by michael@mouse
[18:08:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:09:07] Added a note
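
The "quota calculation was a bit too tight" point above can be checked with some quick arithmetic. This is a minimal sketch, not what anyone in the channel ran. It assumes the "G" in Horizon and the "T" in `df -h` are both binary units (GiB and TiB), and that the 3400G disk, the 20G root disk and the 20G canary instance are separate allocations on the same filesystem. The function name `headroom_gib` is hypothetical, and the inputs are the rounded figures quoted in the chat, so the result is only approximate.

```python
GIB_PER_TIB = 1024


def headroom_gib(fs_size_tib, allocations_gib):
    """Return the GiB left over if every allocation grows to its full size.

    A negative result means the filesystem is overcommitted.
    """
    return fs_size_tib * GIB_PER_TIB - sum(allocations_gib)


# Rounded figures quoted in the chat (assumed to be binary units).
fs_size_tib = 3.3              # "Size" column of df -h for /var/lib/nova/instances
allocations = [3400, 20, 20]   # disk per Horizon, root disk, canary instance

free = headroom_gib(fs_size_tib, allocations)
status = "overcommitted" if free < 0 else "fits"
print(f"headroom: {free:.0f} GiB ({status})")
```

Under these assumptions the allocations add up to more than the filesystem can hold, which matches the `df` output showing the volume 100% full once the instance's disk image grew towards its nominal size.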