[08:22:01] for T374215 I'm trying to reclone db1246 from another host, but I'm getting cumin query errors with syntax that previously worked. I might not be fully awake yet, but I'd be happy to have a sanity check on my command → sudo cookbook sre.mysql.clone --source db1233.eqiad.wmnet --target db1246.eqiad.wmnet --primary db1222.eqiad.wmnet (note: I also tested on cumin2002, tested another cookbook (upgrade), and a preview version of this one (test-cookbook -c 1071155))
[08:25:57] The error message is cumin.backends.InvalidQueryError: Unexpected boolean operator 'and not' with hosts ''
[08:26:48] but this query worked previously, did cumin change recently?
[08:27:53] https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/mysql/clone.py#L50 for me it could stem from that query, which may not be fully filled in before being passed to cumin
[08:28:09] but that raises the question of why another cookbook had the same issue
[08:29:13] The exception is from line 57, not line 50
[08:30:19] so it's only on the target host, good catch
[08:30:37] yep, and that's it
[08:30:39] will dig, thanks Emperor for being the active rubber ducky
[08:30:47] db1246 is not in A:db-all
[08:30:59] sudo cumin 'P{db1246.eqiad.wmnet} and A:db-all' -> zero matches
[08:32:04] which is why it explodes on line 57 and not for the similar query on line 50
[08:32:22] because db1233 _is_ in A:db-all
[08:32:32] that makes total sense, the next question is now why db1246 got out of db-all :D
[08:32:43] that I'm not going to attempt to answer :)
[08:33:00] haha thanks for helping me that far anyway :)
[08:33:18] it's a shame the error report is so user-hostile, though, especially given that the cookbook obviously intends to produce sensible error messages.
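A toy model of the failure mode discussed above (this is illustrative, not cumin's real internals): if a boolean query is folded left to right and the accumulated host set goes empty partway through, the error surfaces at the *next* operator ('and not') instead of at the term that actually emptied the set (the A:db-all alias the target host isn't in yet). The `fold` helper and `A:some-exclusion` term are hypothetical names for this sketch.

```python
class InvalidQueryError(Exception):
    """Stand-in for cumin.backends.InvalidQueryError."""


def fold(first, rest, host_sets):
    """Fold 'term (op term)*' left to right over pre-resolved host sets."""
    hosts = set(host_sets[first])
    for op, term in rest:
        if not hosts:
            # The operator is reported, not the term that emptied the set:
            raise InvalidQueryError(
                f"Unexpected boolean operator {op!r} with hosts ''")
        if op == "and":
            hosts &= host_sets[term]
        elif op == "and not":
            hosts -= host_sets[term]
        else:
            raise ValueError(f"unknown operator {op!r}")
    return hosts


host_sets = {
    "P{db1233.eqiad.wmnet}": {"db1233.eqiad.wmnet"},
    "P{db1246.eqiad.wmnet}": {"db1246.eqiad.wmnet"},
    "A:db-all": {"db1233.eqiad.wmnet"},   # db1246 missing: still in insetup
    "A:some-exclusion": set(),            # hypothetical extra term
}

query_tail = [("and", "A:db-all"), ("and not", "A:some-exclusion")]

# Source host (clone.py line 50 analogue): non-empty all the way through.
print(fold("P{db1233.eqiad.wmnet}", query_tail, host_sets))

# Target host (line 57 analogue): 'P{db1246...} and A:db-all' is already
# empty, so the fold chokes on 'and not' with the user-hostile message.
try:
    fold("P{db1246.eqiad.wmnet}", query_tail, host_sets)
except InvalidQueryError as e:
    print(e)  # Unexpected boolean operator 'and not' with hosts ''
```

Under this model, a quick `sudo cumin 'P{host} and A:db-all'` (as done above) is the right sanity check: it isolates the sub-expression that went empty.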
[08:35:38] db1246 is in the insetup role and the Cumin alias matches on the mariadb role(s)
[08:36:25] ah indeed, I'll go back to sleep after this, thanks moritzm
[08:43:26] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1097943 ← just for a sanity check
[08:56:00] arnaudb: that's not a revert of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1093372, so forgive the probably-dumb question, but: do you need to change preseed.yaml too?
[08:56:14] Oh, you did, I just can't read this morning, ignore me
[08:56:28] you & me both x)
[13:13:36] Amir1: any joy finding a Commons admin for T380738 please?
[13:13:37] T380738: Schuur - Nieuwerbrug - 20164513 - RCE.jpg inconsistent, needs new upload - https://phabricator.wikimedia.org/T380738
[13:14:04] I asked in the Commons telegram channel yesterday, no response yet. One thing, I wonder if anyone can just reupload it
[13:14:20] anyone can reupload most images
[13:18:15] I asked in the IRC channel too, let's see
[13:20:12] Emperor: It actually doesn't need an admin, but with a reupload, I get this:
[13:20:14] > The file "mwstore://local-multiwrite/local-public/b/bf/Schuur_-_Nieuwerbrug_-_20164513_-_RCE.jpg" is in an inconsistent state within the internal storage backends
[13:20:35] It is; the images in the two backends are different.
[13:20:57] mw refuses to accept the reupload. I think we probably need to directly change the file in swift
[13:21:09] I can fix the thumbnails (action=purge will clean it)
[13:21:16] jynus has historically been "never ever do that"
[13:21:30] then there is probably a good reason for it :(
[13:22:05] AIUI his view is it's better to re-upload than e.g. restore from a backup, so there's a record of the intervention in MW rather than just changing things silently.
[13:22:41] But if we can't do that here, maybe I should just do a swift upload...
[13:22:48] what if we remove the file from both places
[13:23:57] YM swift delete from both containers?
[13:24:52] yeah, not sure how mw would react though
[13:24:55] I can do that (obv), will that then let you do a new upload?
[13:25:02] yeah
[13:25:42] OK, shall I do so? I can at least !log it against the ticket, and we have the image we want in phab already
[13:27:58] that would be: swift delete wikipedia-commons-local-public.bf b/bf/Schuur_-_Nieuwerbrug_-_20164513_-_RCE.jpg
[13:28:13] (and I think I'll wait for an explicit +1 before going any further :) )
[13:28:32] LGTM
[13:29:32] Emperor: let me know once it's done in both DCs
[13:30:23] Amir1: {{done}}
[13:34:10] Emperor: it should be fixed now
[13:35:37] Amir1: yeah, that looks good, thanks. OOI, did you need the intermediate image because it wouldn't let you re-upload something that looked like what MW thought was meant to be in swift?
[13:36:01] yeah, it had the same sha1, so it didn't let me reupload
[13:37:03] (if the file is corrupt, the sha1 would be different; that's the point of hashes. But it was stored during the upload in the image table separately)
[13:39:59] 👍
[23:37:10] PROBLEM - MariaDB sustained replica lag on s8 on db2195 is CRITICAL: 50.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2195&var-port=9104
[23:40:10] RECOVERY - MariaDB sustained replica lag on s8 on db2195 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2195&var-port=9104
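For reference, a sketch of the two hashes in play in the exchange above (illustrative, based on how MediaWiki lays out uploads; verify against the deployed config before relying on it): the md5 of the file's DB key shards the storage path (the b/bf/ in the swift delete command, and the .bf container suffix), while the sha1 of the file *contents*, recorded in the image table at upload time, drives duplicate detection, which is why the byte-identical reupload was rejected even though one backend's copy had diverged.

```python
import hashlib

name = "Schuur_-_Nieuwerbrug_-_20164513_-_RCE.jpg"

# Placement: md5 of the name gives the x/xy hash path and container shard
# (sketch of MediaWiki's hashed upload directory scheme).
md5 = hashlib.md5(name.encode()).hexdigest()
shard = f"{md5[0]}/{md5[:2]}"
container = f"wikipedia-commons-local-public.{md5[:2]}"
print(f"swift delete {container} {shard}/{name}")

# Deduplication: MW compares the sha1 of a new upload against the sha1
# stored in the image table. Same bytes -> same sha1 -> reupload refused.
uploaded = b"...original jpeg bytes..."          # placeholder content
recorded_sha1 = hashlib.sha1(uploaded).hexdigest()
assert hashlib.sha1(uploaded).hexdigest() == recorded_sha1        # duplicate
assert hashlib.sha1(uploaded + b"x").hexdigest() != recorded_sha1  # diverged copy hashes differently
```

This also explains why deleting from both containers first worked: with no stored object and the row gone, the fresh upload records a new sha1 instead of tripping the duplicate check against the stale one.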