[09:54:57] I was taking a look at the crashed mediawiki_job_update_special_pages_s5.service and, is there something special about Mostlinkedtemplates on cebwiki?
[09:57:14] Most other wikis take under a second even when they have 5000 rows in the response, but on cebwiki the request takes almost 14 minutes
[10:05:21] claime: oh that's normal. cebwiki is a botpedia with an extremely large templatelinks table (~100GB?)
[10:05:29] I'm happy it's not taking hours
[10:05:37] Amir1: lol ok then
[10:05:42] I hope to eventually migrate that script to hadoop
[10:05:58] T309738
[10:05:59] T309738: Move Mediawiki QueryPages computation to Hadoop - https://phabricator.wikimedia.org/T309738
[10:06:06] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/983961
[10:06:08] Eventually
[10:06:10] I relaunched it manually because it had crashed and hadn't run two consecutive times for s5
[10:06:41] it's fine. If it crashes again, let's disable that script or make it monthly for cebwiki
[10:07:57] Hmm, that will be a little involved since it's per-section. I saw on wikimedia-tech that someone was complaining about Special:BrokenRedirects not being updated on srwiki
[10:08:30] oh no, mw has config for it
[10:08:33] don't worry
[10:08:44] Which happened because the script crashed on dewiki on the 10th and 13th
[10:09:02] what is the crash error?
[10:09:30] https://phabricator.wikimedia.org/P62412
[10:10:17] that's a different issue, let me see how I can fix that
[10:11:06] Yeaaaah it crashed again on dewiki x) I thought it would be related to the issue that was fixed on Monday
[10:11:46] no, it's because FR got migrated to SQB without proper testing. I fixed one case in it already, can you create a ticket?
[10:11:55] Sure
[10:15:17] thanks
[10:15:45] https://phabricator.wikimedia.org/T364974
[10:24:34] I've updated the task because I just checked other sections and only s1 and s4 actually ran correctly
[10:24:54] and s8
[10:41:09] claime: those don't have FlaggedRevs installed (at least not like that)
[10:42:05] Amir1: that explains it
[12:01:24] FYI: Last snapshot for s7 at eqiad (db1171) taken on 2024-05-15 10:57:54 is 871 GiB, but the previous one was 1058 GiB, a change of -17.7 %
[13:26:51] a single clouddb host (clouddb1015) is alerting with 95% memory usage, grafana shows it has kept growing very slowly over the past 30 days
[13:27:55] I was thinking of restarting the service with "systemctl restart mariadb"
[13:28:24] have you seen this happening in the past on other hosts and what did you do?
[13:33:19] just noticed that the section that uses most of the memory (mariadb@s4, with 72% memory) is also at 100% CPU
[13:35:20] red herring: cpu usage was temporary and is now down to 10%, memory is still at 72%
[13:35:56] jynus: that's pagelinks \o/
[13:36:19] https://phabricator.wikimedia.org/T352010
[13:36:25] (bottom of the task desc)
[14:45:41] Amir1: had time only now to read your email about the db conns circuit breaking, really nice work!
[14:46:14] I hope it works 🤞
[15:01:52] afk for a bit
[15:56:51] PROBLEM - MariaDB sustained replica lag on s1 on db1234 is CRITICAL: 4.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1234&var-port=9104
[15:57:51] RECOVERY - MariaDB sustained replica lag on s1 on db1234 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1234&var-port=9104
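
Context for the 10:06:41-10:08:33 exchange ("make it monthly for cebwiki" / "mw has config for it"): MediaWiki core has a setting, $wgDisableQueryPageUpdate, listing query pages that updateSpecialPages.php should skip. A minimal sketch of how such a per-wiki override could look is below; the cebwiki conditional is only an assumption for illustration, not the actual wmf-config change discussed in the log.

    <?php
    // Sketch only: skip refreshing Special:MostLinkedTemplates on cebwiki.
    // $wgDisableQueryPageUpdate is a real core setting (an array of special page
    // names); keying it off $wgDBname like this is an assumed, simplified layout.
    if ( $wgDBname === 'cebwiki' ) {
        $wgDisableQueryPageUpdate[] = 'Mostlinkedtemplates';
    }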
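Context for the 10:11:46 message: "FR got migrated to SQB" refers to the FlaggedRevs extension being converted to MediaWiki core's SelectQueryBuilder. The sketch below shows the general shape of that kind of conversion, where a mistranslated condition or join can change query behaviour and crash a maintenance script; table and field names are illustrative, not the actual FlaggedRevs code that broke.

    <?php
    // Old style: positional arguments to Database::selectRow(), where option and
    // join arrays are easy to carry over incorrectly during a migration.
    $row = $dbr->selectRow(
        'flaggedpages',
        [ 'fp_page_id', 'fp_stable' ],
        [ 'fp_page_id' => $pageId ],
        __METHOD__
    );

    // New style: the same lookup expressed with SelectQueryBuilder's chainable,
    // named methods (newSelectQueryBuilder() on the database handle).
    $row = $dbr->newSelectQueryBuilder()
        ->select( [ 'fp_page_id', 'fp_stable' ] )
        ->from( 'flaggedpages' )
        ->where( [ 'fp_page_id' => $pageId ] )
        ->caller( __METHOD__ )
        ->fetchRow();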