[08:17:41] morning folks, could I get a +1 (or otherwise) on https://gerrit.wikimedia.org/r/c/operations/puppet/+/719223 please? puppet changes for decommissioning pc1008
[08:19:50] that looks good to me but I would wait for some of the dbas to confirm
[08:26:55] kormat / marostegui : ^-- ? :)
[08:28:17] checking
[08:28:37] I did pc1007 yesterday
[08:28:50] yeah I saw :)
[08:28:51] (and there may be more similar MRs coming up later today :) )
[08:29:31] +1 ed!
[08:29:49] TY
[08:39:36] I'm going to remove the "Homer" section from https://wikitech.wikimedia.org/wiki/MariaDB/Decommissioning_a_DB_Host because the sre.host.decommission cookbook does that now. OK?
[08:42:48] Emperor: yes, thanks!
[09:10:17] marostegui / kormat would you care to +1 https://gerrit.wikimedia.org/r/c/operations/puppet/+/719228 please? decommission pc1009
[09:12:22] checking
[09:13:15] today I am become Death destroyer of old parsercache systems :)
[09:14:59] I for one welcome our new parsercache decommissioning overlords
[09:39:17] 🎉
[09:52:55] marostegui / kormat +1 https://gerrit.wikimedia.org/r/c/operations/puppet/+/719237 ? pc1010 delenda est
[09:56:27] will check in a bit
[09:57:38] (would it be easier if I batched up a bunch of CRs for the hosts I'm going through, rather than doing them one at a time?)
[09:58:03] Emperor: i'm fine with marostegui reviewing them all ;)
[09:58:13] (+1'd)
[09:58:39] Emperor: personally i'd do it like you are currently, less chance i'd get something wrong
[09:59:31] fwiw the decom cookbook accepts multiple hosts in the form of a cumin query
[09:59:48] it also saves time as it does run the dns step only once and the homer step once per switch
[10:42:33] I've put in CRs for the last couple on my decom list
[10:43:34] (and taken ownership of the phabricator items)
[12:15:27] Emperor: one cleanup task is to remove the old pc hosts from mediawiki-config
[12:15:37] which comes with an exciting new procedure for you
[12:15:41] oh my god...
[12:15:49] The fun of wmf-config
[12:16:23] We need to keep in mind that Emperor uses emacs so maybe he will enjoy deploying wmf-config
[12:16:49] Emperor: you'll need to clone https://gerrit.wikimedia.org/r/admin/repos/operations/mediawiki-config
[12:16:52] marostegui: haha
[12:17:18] in there, open `wmf-config/ProductionServices.php`
[12:17:27] you'll find all the old pc hosts in 2 sets of block comments
[12:17:34] those comments can just be nuked
[12:18:58] Emperor: i _think_ i've +1'd all your decom CRs. let me know if i somehow missed one.
[12:20:49] kormat: I think https://gerrit.wikimedia.org/r/c/operations/puppet/+/719243 lost your +1 when I had to de-merge-conflict it...
[12:21:43] done
[12:21:48] ta
[12:22:46] (I dunno if the final one will similarly lose it because gerrit hates me)
[12:24:48] if I put >1 Bug: line in a CR will gerrit DTRT?
[12:24:58] [Do The Right Thing]
[12:25:54] yep
[12:42:24] marostegui: yesterday I started looking at db queries common requests (simple API, load.php, etc.) make
[12:42:35] and oh you'd be surprised
[12:43:17] I am sure that not in a good way
[12:43:50] Yeah :(
[13:51:37] there is an alert about failed parsercache jobs for pc1010 and pc2010
[13:52:21] could be outdated zarcillo or it pending to update prometheus config
[13:57:22] jynus: hurm. thanks for pointing it out. i'll have a look.
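A minimal sketch of the zarcillo check implied at this point in the log, assuming the `instances` table with `name` and `group` columns that come up in the discussion below; the rest of the schema is an assumption, not something the conversation confirms:

```sql
-- Sketch only: see how the parsercache hosts are grouped in zarcillo.
-- Assumes an `instances` table with `name` and `group` columns (as in the
-- UPDATE statement discussed later); `group` must be backtick-quoted
-- because it is a reserved word in MySQL/MariaDB.
SELECT name, `group`
FROM instances
WHERE name LIKE 'pc%'
ORDER BY name;
```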
[13:58:23] I've not yet taken pc2010 out of zarcillo (going to get there shortly)
[13:58:34] ah, there you have it
[13:58:37] no worries then
[13:58:52] pc1010 I took out this morning, though
[13:59:40] Emperor: yeah, that seems sus
[14:00:49] this is alert I am referring to, could be outdated: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=alert1001&service=Prometheus+jobs+reduced+availability
[14:01:32] the grafana link from the alert shows the issue
[14:02:54] the `generate-mysqld-exporter-config` timer seems to be running fine on prometheus1003
[14:05:10] ohh
[14:05:22] the primaries haven't been updated for pc in zarcillo for eqiad
[14:05:25] that won't help, at least
[14:05:26] ...and my command history has pc1010 in, so I'm pretty sure I didn't typo it
[14:15:07] huuh
[14:15:25] ?
[14:15:34] so, when the script runs, the old data in `mysql-parsercache_eqiad.yaml` is untouched
[14:15:43] the new pc hosts only show up in `mysql-core_eqiad.yaml`
[14:17:02] Emperor: i'm 99% sure this isn't anything you did wrong, fwiw
[14:17:43] Just did: !log No more db maintenance on eqiad T288594
[14:17:43] T288594: Pre DC switchover codfw -> eqiad DB work - https://phabricator.wikimedia.org/T288594
[14:18:09] marostegui: 🎉
[14:20:46] bingo, i see the issue.
[14:21:13] when we inserted the new pc hosts into zarcillo, we didn't set the group type correctly
[14:21:16] so it defaulted to 'core'
[14:21:36] ah, that was a common issue we had in the past
[14:23:56] having a sanity loss issue here. why is this sql wrong?
[14:24:01] `update instances set group='parsercache' where name like 'pc%';`
[14:26:21] `group`
[14:26:27] as it is a reserved word :-)
[14:26:36] are you kidding me
[14:26:40] that's the column name
[14:26:49] probably not the best selection of a name
[14:27:07] kormat: I am surprised you are surprised at this point with how mysql works!
[14:27:13] ok, surrounding group with backticks works
[14:27:19] but i'm still pissed off at mysql.
[14:27:28] marostegui: 🥀
[14:28:37] probably shouldn't be able to make column names with reserved words
[14:30:55] oh god. what have i done
[14:31:01] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=^\(db\|es\|pc\)[12]&style=detail&servicestatustypes=29
[14:32:22] kormat, that is unrelated
[14:32:33] it is a new thing, not related to prometheus mysql, see ops-
[14:32:37] dont worry :-)
[14:32:39] jynus: 😓
[14:40:04] well, the good news is that our alert is now fixed
[14:41:00] I think the Prometheus Puppet work is happening in https://phabricator.wikimedia.org/T283585
[14:41:49] sobanski: the important thing is that it's happening in not-our-fault. ;)
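For reference, a minimal illustration of the reserved-word trap hit above, using the same `instances` table from the log: `group` is a reserved keyword in MySQL/MariaDB, so the bare identifier fails to parse, while the backtick-quoted form (the fix mentioned at 14:27:13) works:

```sql
-- Fails with a syntax error: GROUP is a reserved word, so the bare
-- column name cannot be used here.
UPDATE instances SET group = 'parsercache' WHERE name LIKE 'pc%';

-- Works: backtick-quoting the identifier tells the parser it is a
-- column name rather than the keyword.
UPDATE instances SET `group` = 'parsercache' WHERE name LIKE 'pc%';
```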