[01:09:50] <icinga-wm>	 PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 4.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:11:14] <icinga-wm>	 RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[13:48:20] <moritzm>	 urandom: something's broken with restbase2012, I noticed that debdeploy completely hung when rolling out updates to Buster hosts
[13:48:25] <moritzm>	 repro case on the host itself:
[13:48:36] <moritzm>	 sudo debdeploy-restarts --libname libXpm
[13:49:09] <moritzm>	 I noticed there's a kernel oops logged in syslog, which is likely the culprit for the I/O hang
[13:50:04] <moritzm>	 it seems the host is an ongoing decom for https://phabricator.wikimedia.org/T328490, not sure if the decom is completed to the point that it can be removed? otherwise we might need a reboot/powercycle to bring it fully back
[13:55:32] <marostegui>	 we are in a meeting at the moment moritzm
[13:58:24] <urandom>	 moritzm: oh, it's being decommissioned...I wonder if that's why
[13:58:36] <urandom>	 maybe puppet isn't running to completion?
[13:58:47] <urandom>	 I'll have a look
[13:59:08] <urandom>	 https://phabricator.wikimedia.org/T328490
[14:00:15] <urandom>	 the cassandra instances have already been decommissioned, and the units prevented from starting. not sure why that would cause debdeploy to hang tho...
[14:01:25] <urandom>	 gah... irccloud
[14:08:58] <moritzm>	 in a meeting myself now. it's hanging on the system level somehow, i/o is stalled
[14:09:30] <moritzm>	 if cassandra is decommed, can we go ahead and take out the host entirely?
[14:09:37] <urandom>	 moritzm: yes
[15:14:41] <urandom>	 moritzm: https://phabricator.wikimedia.org/T349526
[15:33:51] <moritzm>	 ack, thanks