[09:54:57] I was taking a look at the crashed mediawiki_job_update_special_pages_s5.service and, is there something special about Mostlinkedtemplates on cebwiki?
[09:57:14] Most other wikis take under a second even when they have 5000 rows in the response, but on cebwiki the request takes almost 14 minutes
[10:05:21] claime: oh that's normal. cebwiki is a botpedia with an extremely large templatelinks table (~100GB?)
[10:05:29] I'm happy it's not taking hours
[10:05:37] Amir1: lol ok then
[10:05:42] I hope to eventually migrate that script to hadoop
[10:05:58] T309738
[10:05:59] T309738: Move Mediawiki QueryPages computation to Hadoop - https://phabricator.wikimedia.org/T309738
[10:06:06] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/983961
[10:06:08] Eventually
[10:06:10] I relaunched it manually because it had crashed and hadn't run two consecutive times for s5
[10:06:41] it's fine. If it crashes again, let's disable that script or make it monthly for cebwiki
[10:07:57] Hmm, that will be a little involved since it's per-section. I saw on wikimedia-tech that someone was complaining about Special:BrokenRedirects not being updated on srwiki
[10:08:30] oh no, mw has config for it
[10:08:33] don't worry
[10:08:44] Which happened because the script crashed on dewiki on the 10th and 13th
[10:09:02] what is the crash error?
[10:09:30] https://phabricator.wikimedia.org/P62412
[10:10:17] that's a different issue, let me see how I can fix that
[10:11:06] Yeaaaah it crashed again on dewiki x) I thought it would be related to the issue that was fixed on Monday
[10:11:46] no, it's because FR got migrated to SQB without proper testing. I fixed one case in it already, can you create a ticket?
[10:11:55] Sure
[10:15:17] thanks
[10:15:45] https://phabricator.wikimedia.org/T364974
[10:24:34] I've updated the task because I just checked other sections and only s1 and s4 actually ran correctly
[10:24:54] and s8
[10:41:09] claime: those don't have FlaggedRevs installed (at least not like that)
[10:42:05] Amir1: that explains it
[12:01:24] FYI: Last snapshot for s7 at eqiad (db1171) taken on 2024-05-15 10:57:54 is 871 GiB, but the previous one was 1058 GiB, a change of -17.7 %
[13:26:51] a single clouddb host (clouddb1015) is alerting with 95% memory usage, grafana shows it has kept growing very slowly over the past 30 days
[13:27:55] I was thinking of restarting the service with "systemctl restart mariadb"
[13:28:24] have you seen this happening in the past on other hosts and what did you do?
[13:33:19] just noticed that the section that uses most of the memory (mariadb@s4, with 72% memory) is also at 100% CPU
[13:35:20] red herring: cpu usage was temporary and is now down to 10%, memory is still at 72%
[13:35:56] jynus: that's pagelinks \o/
[13:36:19] https://phabricator.wikimedia.org/T352010
[13:36:25] (bottom of the task desc)
[14:45:41] Amir1: had time only now to read your email about the db conns circuit breaking, really nice work!
[14:46:14] I hope it works 🤞
[15:01:52] afk for a bit
[15:56:51] PROBLEM - MariaDB sustained replica lag on s1 on db1234 is CRITICAL: 4.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1234&var-port=9104
[15:57:51] RECOVERY - MariaDB sustained replica lag on s1 on db1234 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1234&var-port=9104
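
Context for the 10:06:41-10:08:33 exchange ("make it monthly for cebwiki" / "mw has config for it"): MediaWiki core has a setting, $wgDisableQueryPageUpdate, listing query pages that updateSpecialPages.php should skip. A minimal sketch of how such a per-wiki override could look is below; the cebwiki conditional is only an assumption for illustration, not the actual wmf-config change discussed in the log.

    <?php
    // Sketch only: skip refreshing Special:MostLinkedTemplates on cebwiki.
    // $wgDisableQueryPageUpdate is a real core setting (an array of special page
    // names); keying it off $wgDBname like this is an assumed, simplified layout.
    if ( $wgDBname === 'cebwiki' ) {
        $wgDisableQueryPageUpdate[] = 'Mostlinkedtemplates';
    }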
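Context for the 10:11:46 message: "FR got migrated to SQB" refers to the FlaggedRevs extension being converted to MediaWiki core's SelectQueryBuilder. The sketch below shows the general shape of that kind of conversion, where a mistranslated condition or join can change query behaviour and crash a maintenance script; table and field names are illustrative, not the actual FlaggedRevs code that broke.

    <?php
    // Old style: positional arguments to Database::selectRow(), where option and
    // join arrays are easy to carry over incorrectly during a migration.
    $row = $dbr->selectRow(
        'flaggedpages',
        [ 'fp_page_id', 'fp_stable' ],
        [ 'fp_page_id' => $pageId ],
        __METHOD__
    );

    // New style: the same lookup expressed with SelectQueryBuilder's chainable,
    // named methods (newSelectQueryBuilder() on the database handle).
    $row = $dbr->newSelectQueryBuilder()
        ->select( [ 'fp_page_id', 'fp_stable' ] )
        ->from( 'flaggedpages' )
        ->where( [ 'fp_page_id' => $pageId ] )
        ->caller( __METHOD__ )
        ->fetchRow();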