[05:17:26] marostegui: I realized what was the reason behind the revision view being broken in etwiki (for data engineering team) and it's not about the rev_timestamp alter table :D
[05:17:51] it was the dropping of revision_actor_temp, as the revision view used to include it
[05:18:25] so that needs recreation on all the replicas
[05:18:32] clouddb1021 is fully done
[05:18:36] as I did it yesterday
[05:18:43] I did that everywhere before
[05:18:44] but the other replicas need checking
[05:18:56] I just didn't do it with clouddb1021
[05:19:09] ah good
[05:19:12] (because it wasn't mentioned in the docs)
[05:19:25] https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Wiki_Replicas
[05:20:22] Amir1: I guess cause clouddb1021 isn't a public replica but a replica just for analytics
[05:20:27] The team, I mean
[05:20:43] yeah, I didn't know it needed it
[05:23:03] it is actually a good tool to measure changes
[05:23:04] https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=28&orgId=1&var-server=clouddb1021&var-datasource=thanos&var-cluster=wmcs&from=now-90d&to=now
[05:23:26] around half a TB (assuming compressed) gone
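For context on what "recreation" means here, below is a minimal sketch of how a wiki-replica view can break once a base table it references is dropped, and what recreating it amounts to. The database names, column list, and view definition are illustrative only, not the real definitions; the actual views are generated by maintain_views as documented on the Wikitech page linked above.

```sql
-- Illustrative stand-in for the etwiki_p.revision view, which (before the
-- schema change) joined revision_actor_temp.
CREATE OR REPLACE VIEW etwiki_p.revision AS
  SELECT r.rev_id, r.rev_page, r.rev_timestamp   -- real views expose a sanitized column list
  FROM etwiki.revision r
  JOIN etwiki.revision_actor_temp rat ON rat.revactor_rev = r.rev_id;

-- The schema change drops the base table...
DROP TABLE etwiki.revision_actor_temp;

-- ...so any SELECT against the view now fails, because its definition still
-- references the dropped table.
SELECT * FROM etwiki_p.revision LIMIT 1;

-- Redefining the view without the dropped table (what a maintain_views run
-- does) makes it usable again.
CREATE OR REPLACE VIEW etwiki_p.revision AS
  SELECT r.rev_id, r.rev_page, r.rev_timestamp
  FROM etwiki.revision r;
```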
[09:50:04] I want to abandon https://gerrit.wikimedia.org/r/c/operations/puppet/+/455769, votes against?
[09:50:25] I am fine with abandoning it
[09:50:40] if it was something that was wanted, probably research should start from 0
[09:50:52] and the reasons at the time probably won't apply
[09:51:06] plus it would need more complex changes
[09:54:22] (I am trying to make my patch backlog fit in a single screen, getting rid of non-useful reviews)
[09:56:33] marostegui: btw, ICYMI, Dan and I are working to basically migrate really slow queries that take hours (I remember that one that took 9) to update special pages, to hadoop, and feed that back to mediawiki: T309738
[09:56:34] T309738: Move Mediawiki QueryPages computation to Hadoop - https://phabricator.wikimedia.org/T309738
[09:56:44] Yeah, I read that today :)
[09:57:02] in case you hate the idea, this is the time to mention it :P
[09:57:07] No, I love it XD
[09:57:21] Awesome :D
[10:43:32] I just had a thought regarding unknown performance issues - is disabling of learning cycles still done? Could some of that impact newer host performance?
[11:50:21] jynus: which new host?
[11:50:51] well, not new new, but newer, the one that had issues last time
[11:51:00] db1128 maybe?
[11:51:10] db1128 had a memory crash
[11:51:13] ah
[11:51:15] then other
[11:51:31] You mean the one that had a few issues and we didn't know if it was hw or 10.6?
[11:51:35] yeah
[11:51:59] Yeah, I checked if a learning cycle was going on
[11:52:01] but it wasn't
[11:52:09] and it never happened according to the controller's log
[11:55:12] :-(
[11:55:40] :)
[11:55:43] XD
[13:58:48] Amir1: the run failed, because page_restrictions was removed from the page table, was this listed anywhere?
[13:59:04] and are there any other breaking changes?
[13:59:34] milimetric: that's part of https://phabricator.wikimedia.org/T60674
[13:59:43] I can recreate the views there if needed
[14:00:29] marostegui: no need, we're not using page_compat, just the full page view
[14:00:44] so I just need to adjust our query.
[14:00:55] ideally if you know of any other breaking changes, that would be useful
[14:01:12] milimetric: ah so you can still select from the page table?
[14:01:27] milimetric: not sure, I think that's the one that's currently on-going
[14:01:31] I can select, but our current query uses page_restrictions, just have to fix
[14:01:38] ok, I'll just run all the other queries to verify
[14:04:12] milimetric: we have two page restrictions
[14:04:18] The field and the table
[14:05:58] yep, dropping the field in the page table is the one that broke our query
[14:06:09] 😔😔
[14:06:25] yeah, expected if there's a full view there
[14:06:30] milimetric: I can recreate the view now
[14:06:38] It has been empty for years
[14:06:39] at least on clouddb1021
[14:06:53] Yeah, but it doesn't matter, if there's a view on the full table, it will break anyways
[14:07:07] marostegui: I think it's recreated, `desc page` on etwiki_p shows me the schema without the column
[14:07:14] ah cool
[14:07:24] is that the one I did yesterday?
[14:07:26] Maybe
[14:07:26] (I think maintain_views has been run unrelatedly since then)
[14:07:29] yeah
[14:07:31] yeah
[14:07:34] :)
[14:07:44] I don't remember which ones I ran it for, revision was one of them for sure, maybe page was the other
[14:08:10] oh, I didn't actually know you could filter to specific tables
[14:08:16] yeah :)
[14:08:26] I'll verify all our queries against etwiki_p right now, is there any other shard that has partial deploys of stuff like this?
[14:08:50] milimetric: I could run it for all the shards just in case too
[14:09:04] marostegui: that's probably useful, yeah
[14:09:18] ok, let me do that
[14:09:21] for the page table, then?
[14:09:25] I don't think it matters either way to me, but it might be a good idea
[14:09:51] because for me, I'm just going to `select null as page_restrictions` so it should work even if the view is incorrect maybe?
[14:09:58] oh... but this is rolling out as we speak to other shards?
[14:10:06] and when it does, it would break the views anyway?
[14:10:11] Amir1: ^ that question is for you :)
[14:10:15] milimetric: yes
[14:10:19] yeah...
[14:10:21] that's a problem
[14:10:45] The views need to be recreated once the schema change has been applied on clouddb*
[14:10:50] yep
[14:11:19] so let's hold off on maintain_views and get an estimate from Amir1 on when he thinks the page_restrictions column drop will be applied to all shards
[14:11:38] oki
[14:14:15] milimetric: just finished a general run for all the shards for the page table, it doesn't hurt
[14:14:37] thx!
[14:23:56] ok, I finished running all other queries. Looks like Joseph found this bug this morning and has already deployed the fix :)
[14:24:11] so I'll start another run, and I'll let you know if it runs into other problems
[14:28:25] Amir1: if possible, please hold off on dropping page_restrictions until the sqoop job finishes (should be sometime Sunday). If not, let me know
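A hypothetical sketch of the query-side workaround milimetric describes above (`select null as page_restrictions`): keep the column in the output so downstream consumers see the same schema, but stop reading the dropped page.page_restrictions field. The column list is illustrative, not the actual sqoop query.

```sql
-- Illustrative stand-in for the select against the page view:
SELECT
  page_id,
  page_namespace,
  page_title,
  page_is_redirect,
  NULL AS page_restrictions   -- field dropped from the page table (T60674); emit NULL to keep the output schema stable
FROM page;
```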
[15:20:22] milimetric: let me see, I think it's mostly done already
[15:21:49] https://drift-tracker.toolforge.org/report/core/ I don't see anything in sanitarium masters
[15:22:26] the "Extra field page.page_restrictions in production" section shows a bunch of results... so it's not done there, right? And are you planning on doing those?
[15:24:38] yeah but they should not affect analytics (not being sanitarium master)
[15:24:50] nor dbstore
[15:25:35] but I won't do anything for now
[16:34:01] ok. And to be clear: I don't think this is something that we can or should handle this way. The proposals we have around building a shared data platform are the real thing we need to be discussing
[16:34:17] as in, this is not anyone's fault here, we're all doing our best in a bad situation
[16:35:45] milimetric: yeah, and we also had a large queue of schema changes, thankfully it is getting to a better state now, so hopefully it won't happen so often again
[16:36:58] cool, but it's ok even if we have more hiccups... we know the real solution we're trying to drive towards
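For reference, the kind of check behind the drift-tracker report linked earlier ("Extra field page.page_restrictions in production") can be approximated by hand with an information_schema query along these lines, assuming direct SQL access to the host being checked; it simply reports whether the column still exists there.

```sql
-- Ad-hoc check: does page.page_restrictions still exist on this host?
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE table_name = 'page'
  AND column_name = 'page_restrictions';
```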