[07:09:41] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 2 others: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui) db1183 is now up and replicating from db1107
[09:33:52] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 2 others: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (hnowlan) No objection for sockpuppet, thanks!
[09:39:22] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 2 others: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui) Thank you all for the fast replies!
[10:29:09] SRE-tools, DBA, Infrastructure-Foundations, Recommendation-API, and 3 others: Failover m2 master (db1107) to a different host to upgrade its kernel - https://phabricator.wikimedia.org/T287852 (Marostegui)
[14:03:44] SRE-tools, Infrastructure-Foundations, Orchestrator: Add database host removal from Orchestrator to sre.hosts.decommission cookbook - https://phabricator.wikimedia.org/T287954 (LSobanski)
[14:19:06] SRE-tools, Infrastructure-Foundations, Orchestrator: Add database host removal from Orchestrator to sre.hosts.decommission cookbook - https://phabricator.wikimedia.org/T287954 (MoritzMuehlenhoff) How about we create a mechanism similar to the logout.d scripts, but for decom? Let's say we create a new...
[14:30:56] SRE-tools, Infrastructure-Foundations, Orchestrator: Add database host removal from Orchestrator to sre.hosts.decommission cookbook - https://phabricator.wikimedia.org/T287954 (Volans) @MoritzMuehlenhoff 's proposal is certainly a neat option but I have a couple of worries, namely: - it might be hard...
[14:40:41] SRE-tools, Infrastructure-Foundations, Orchestrator: Add database host removal from Orchestrator to sre.hosts.decommission cookbook - https://phabricator.wikimedia.org/T287954 (Kormat) >>! In T287954#7255698, @Volans wrote: > - some "decom" actions should be performed from a central host instead of t...
[15:08:09] hey, I found a weird issue, I was told maybe you would be interested and tell me if worth researching more or not
[15:08:22] *interesting
[15:08:59] while checking I was formatting the right partition on some backup hosts, I noticed some software RAID being desynced
[15:09:40] meaning it was in read only mode
[15:10:12] shows as "resync=PENDING" on mdstat
[15:10:16] *shown
[15:11:19] that looks like a bad state to be in (if I understood it correctly, mirroring was not really enabled at that time) but got no alerts
[15:11:47] I forced read write and the sync resumed
[15:12:04] this happened on 5 out of 8 new backup servers we recently provisioned
[15:12:51] my worry is if this is as bad as it seems, and if it could be happening on other hosts?
[15:21:50] Puppet, Infrastructure-Foundations, SRE Observability, User-jbond: puppetdb Investigate the expected bahaviour of the edges table - https://phabricator.wikimedia.org/T287673 (lmata)
[20:53:03] netops, Infrastructure-Foundations, SRE: ripe-atlas-codfw is down - https://phabricator.wikimedia.org/T267714 (Papaul)
[20:55:34] netops, DC-Ops, Infrastructure-Foundations, SRE: (Need By: TBD) rack/setup/install atlas-codfw.wikimedia.org - https://phabricator.wikimedia.org/T273114 (Papaul)