[08:22:01] for T374215 I'm trying to reclone db1246 from another host, but I'm getting cumin query errors with syntax that previously worked. I might not be fully awake yet, but I'd be happy to have a sanity check on my command → sudo cookbook sre.mysql.clone --source db1233.eqiad.wmnet --target db1246.eqiad.wmnet --primary db1222.eqiad.wmnet (note: I also tested on cumin2002, tested another cookbook (upgrade), and a preview version of this one (test-cookbook -c 1071155))
[08:25:57] The error message is cumin.backends.InvalidQueryError: Unexpected boolean operator 'and not' with hosts ''
[08:26:48] but this query worked previously, did cumin change recently?
[08:27:53] https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/mysql/clone.py#L50 for me it could stem from that query, which may not be fully filled in before being passed to cumin
[08:28:09] but that raises the question of why another cookbook had the same issue
[08:29:13] The exception is from line 57, not line 50
[08:30:19] so it's only on the target host, good catch
[08:30:37] yep, and that's it
[08:30:39] will dig, thanks Emperor for being the active rubber ducky
[08:30:47] db1246 is not in A:db-all
[08:30:59] sudo cumin 'P{db1246.eqiad.wmnet} and A:db-all' -> zero matches
[08:32:04] which is why it explodes on line 57 and not for the similar query on line 50
[08:32:22] because db1233 _is_ in A:db-all
[08:32:32] that makes total sense, the next question is now why db1246 got out of db-all :D
[08:32:43] that I'm not going to attempt to answer :)
[08:33:00] haha thanks for helping me that far anyway :)
[08:33:18] it's a shame the error report is so user-hostile, though, especially given that the cookbook obviously intends to produce sensible error messages.
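A toy model of the failure mode discussed above (this is illustrative, not cumin's real internals): if a boolean query is folded left to right and the accumulated host set goes empty partway through, the error surfaces at the *next* operator ('and not') instead of at the term that actually emptied the set (the A:db-all alias the target host isn't in yet). The `fold` helper and `A:some-exclusion` term are hypothetical names for this sketch.

```python
class InvalidQueryError(Exception):
    """Stand-in for cumin.backends.InvalidQueryError."""


def fold(first, rest, host_sets):
    """Fold 'term (op term)*' left to right over pre-resolved host sets."""
    hosts = set(host_sets[first])
    for op, term in rest:
        if not hosts:
            # The operator is reported, not the term that emptied the set:
            raise InvalidQueryError(
                f"Unexpected boolean operator {op!r} with hosts ''")
        if op == "and":
            hosts &= host_sets[term]
        elif op == "and not":
            hosts -= host_sets[term]
        else:
            raise ValueError(f"unknown operator {op!r}")
    return hosts


host_sets = {
    "P{db1233.eqiad.wmnet}": {"db1233.eqiad.wmnet"},
    "P{db1246.eqiad.wmnet}": {"db1246.eqiad.wmnet"},
    "A:db-all": {"db1233.eqiad.wmnet"},   # db1246 missing: still in insetup
    "A:some-exclusion": set(),            # hypothetical extra term
}

query_tail = [("and", "A:db-all"), ("and not", "A:some-exclusion")]

# Source host (clone.py line 50 analogue): non-empty all the way through.
print(fold("P{db1233.eqiad.wmnet}", query_tail, host_sets))

# Target host (line 57 analogue): 'P{db1246...} and A:db-all' is already
# empty, so the fold chokes on 'and not' with the user-hostile message.
try:
    fold("P{db1246.eqiad.wmnet}", query_tail, host_sets)
except InvalidQueryError as e:
    print(e)  # Unexpected boolean operator 'and not' with hosts ''
```

Under this model, a quick `sudo cumin 'P{host} and A:db-all'` (as done above) is the right sanity check: it isolates the sub-expression that went empty.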
[08:35:38] db1246 is in the insetup role and the Cumin alias matches on the mariadb role(s)
[08:36:25] ah indeed, I'll go back to sleep after this, thanks moritzm
[08:43:26] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1097943 ← just for a sanity check
[08:56:00] arnaudb: that's not a revert of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1093372, so forgive the probably-dumb question, but: do you need to change preseed.yaml too?
[08:56:14] Oh, you did, I just can't read this morning, ignore me
[08:56:28] you & me both x)
[13:13:36] Amir1: any joy finding a Commons admin for T380738 please?
[13:13:37] T380738: Schuur - Nieuwerbrug - 20164513 - RCE.jpg inconsistent, needs new upload - https://phabricator.wikimedia.org/T380738
[13:14:04] I asked in the Commons telegram channel yesterday, no response yet. One thing, I wonder if anyone can just reupload it
[13:14:20] anyone can reupload most images
[13:18:15] I asked in the IRC channel too, let's see
[13:20:12] Emperor: It actually doesn't need an admin, but with a reupload, I get this:
[13:20:14] > The file "mwstore://local-multiwrite/local-public/b/bf/Schuur_-_Nieuwerbrug_-_20164513_-_RCE.jpg" is in an inconsistent state within the internal storage backends
[13:20:35] It is; the images in the two backends are different.
[13:20:57] mw refuses to accept the reupload. I think we probably need to directly change the file in swift
[13:21:09] I can fix the thumbnails (action=purge will clean it)
[13:21:16] jynus has historically been "never ever do that"
[13:21:30] then there is probably a good reason for it :(
[13:22:05] AIUI his view is it's better to re-upload than e.g. restore from a backup, so there's a record of the intervention in MW rather than just changing things silently.
[13:22:41] But if we can't do that here, maybe I should just do a swift upload...
[13:22:48] what if we remove the file from both places
[13:23:57] YM swift delete from both containers?
[13:24:52] yeah, not sure how mw would react though
[13:24:55] I can do that (obv), will that then let you do a new upload?
[13:25:02] yeah
[13:25:42] OK, shall I do so? I can at least !log it against the ticket, and we have the image we want in phab already
[13:27:58] that would be: swift delete wikipedia-commons-local-public.bf b/bf/Schuur_-_Nieuwerbrug_-_20164513_-_RCE.jpg
[13:28:13] (and I think I'll wait for an explicit +1 before going any further :) )
[13:28:32] LGTM
[13:29:32] Emperor: let me know once it's done in both DCs
[13:30:23] Amir1: {{done}}
[13:34:10] Emperor: it should be fixed now
[13:35:37] Amir1: yeah, that looks good, thanks. OOI, did you need the intermediate image because it wouldn't let you re-upload something that looked like what MW thought was meant to be in swift?
[13:36:01] yeah, it had the same sha1, so it didn't let me reupload
[13:37:03] (if the file is corrupt, the sha1 would be different; that's the point of hashes. But it was stored during the upload in the image table separately)
[13:39:59] 👍
[23:37:10] PROBLEM - MariaDB sustained replica lag on s8 on db2195 is CRITICAL: 50.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2195&var-port=9104
[23:40:10] RECOVERY - MariaDB sustained replica lag on s8 on db2195 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2195&var-port=9104
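For reference, a sketch of the two hashes in play in the exchange above (illustrative, based on how MediaWiki lays out uploads; verify against the deployed config before relying on it): the md5 of the file's DB key shards the storage path (the b/bf/ in the swift delete command, and the .bf container suffix), while the sha1 of the file *contents*, recorded in the image table at upload time, drives duplicate detection, which is why the byte-identical reupload was rejected even though one backend's copy had diverged.

```python
import hashlib

name = "Schuur_-_Nieuwerbrug_-_20164513_-_RCE.jpg"

# Placement: md5 of the name gives the x/xy hash path and container shard
# (sketch of MediaWiki's hashed upload directory scheme).
md5 = hashlib.md5(name.encode()).hexdigest()
shard = f"{md5[0]}/{md5[:2]}"
container = f"wikipedia-commons-local-public.{md5[:2]}"
print(f"swift delete {container} {shard}/{name}")

# Deduplication: MW compares the sha1 of a new upload against the sha1
# stored in the image table. Same bytes -> same sha1 -> reupload refused.
uploaded = b"...original jpeg bytes..."          # placeholder content
recorded_sha1 = hashlib.sha1(uploaded).hexdigest()
assert hashlib.sha1(uploaded).hexdigest() == recorded_sha1        # duplicate
assert hashlib.sha1(uploaded + b"x").hexdigest() != recorded_sha1  # diverged copy hashes differently
```

This also explains why deleting from both containers first worked: with no stored object and the row gone, the fresh upload records a new sha1 instead of tripping the duplicate check against the stale one.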