[01:09:50] PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 4.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:11:14] RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[13:48:20] urandom: something's broken with restbase2012, I noticed that debdeploy completely hung when rolling out updates to Buster hosts
[13:48:25] repro case on the host itself:
[13:48:36] sudo debdeploy-restarts --libname libXpm
[13:49:09] I noticed there's a kernel oops logged in syslog, which is likely the culprit for the I/O hang
[13:50:04] it seems the host is in an ongoing decom for https://phabricator.wikimedia.org/T328490, not sure if the decom is completed to the point that it can be removed? otherwise we might need a reboot/powercycle to bring it fully back
[13:55:32] we are in a meeting at the moment moritzm
[13:58:24] moritzm: oh, it's being decommissioned...I wonder if that's why
[13:58:36] maybe puppet isn't running to completion?
[13:58:47] I'll have a look
[13:59:08] https://phabricator.wikimedia.org/T328490
[14:00:15] the cassandra instances have already been decommissioned, and the units prevented from starting. not sure why that would cause debdeploy to hang tho...
[14:01:25] gah... irccloud
[14:08:58] in a meeting myself now. it's hanging on the system level somehow, I/O is stalled
[14:09:30] if cassandra is decommed, can we go ahead and take out the host entirely?
[14:09:37] moritzm: yes
[15:14:41] moritzm: https://phabricator.wikimedia.org/T349526
[15:33:51] ack, thanks