[07:58:32] greetings
[08:16:09] I've opened T418829 to track fourohfour overload btw
[08:16:09] T418829: fourohfour general unavailability / overload - https://phabricator.wikimedia.org/T418829
[08:16:27] please feel free to edit the description with more info etc.
[08:32:24] mmhh grafana.wmcloud.org == 503, taking a look at that first
[08:35:17] Mar 03 06:27:02 metricsinfra-grafana-2 grafana[3867793]: Error: ✗ migration failed (id = ensure rule_group column is case sensitive in returned results): Error 1273 (HY000): Unknown collation: 'utf8mb4_0900_as_cs'
[08:35:21] joy
[08:39:08] which version of mariadb do we have?
[08:41:48] so grafana is using a trove db in this case, not sure what version
[08:42:34] opening a task in the meantime
[08:44:43] T418831
[08:44:43] T418831: grafana.wmcloud.org unavailable - failed db migration - https://phabricator.wikimedia.org/T418831
[08:46:51] upstream issue is https://github.com/grafana/grafana/issues/118836
[08:46:52] commented there
[08:47:01] lol great timing
[08:48:30] I'll downgrade grafana and hold it for now
[08:52:56] but when was it upgraded? by whom?
[08:53:52] https://phabricator.wikimedia.org/T416608
[08:54:36] and unattended-upgrades did the actual upgrade on metricsinfra
[08:54:56] right :/
[08:59:24] godog: thanks!
[09:00:07] yw dcaro, that adrenaline kick worked better than coffee
[09:00:19] lol
[09:17:35] morning
[09:26:53] just added a new redis dashboard to help debug issues (we were missing it)
[09:26:53] https://grafana-rw.wmcloud.org/d/e008bc3f-81a2-40f9-baf2-a33fd8dec7ec/redis-dashboard-for-prometheus-redis-exporter-1-x?orgId=1&from=now-24h&to=now&timezone=browser&var-namespace=&var-instance=tools-redis-5&var-query0=&var-prometheus=P8433460076D33992&editIndex=0
[09:27:11] fyi, very useful one-liner to replace datasources:
[09:27:12] dcaro@acme$ cat redis.grafana | jq '.panels[] |= (if .datasource.uid == "P8433460076D33992" then .datasource.uid = "${prometheus}" else . end)' | wl-copy
[09:30:20] very cool, thank you dcaro
[09:59:48] I'm playing around with setting a local memory cache per-process to avoid hitting redis so often on fourohfour
[10:02:14] ack
[10:02:26] just saw the prod issues
[10:52:11] fourohfour seems stable now :)
[10:52:25] (🤞)
[10:53:01] neat, thank you dcaro
[11:06:40] this is the current fix (applied in prod, where I still have some extra debug logging for a bit) https://gitlab.wikimedia.org/toolforge-repos/fourohfour/-/merge_requests/16
[11:42:57] neat
[15:58:15] Raymond_Ndibe: I'm deploying https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/259, can you get ready to test in toolsbeta?
[15:58:30] (regarding the v1->v2 migration for jobs)
[16:11:51] hmmm.... the component versions shown as deployed in toolsbeta look weird
[16:13:51] Raymond_Ndibe: ping?
[16:21:12] quick review request (for fourohfour) https://gitlab.wikimedia.org/toolforge-repos/fourohfour/-/merge_requests/16 if anyone has a moment (it's running in prod already)
[16:25:46] Raymond_Ndibe: I've deployed it to toolsbeta and I'm deploying the jobs-api change on tools; I'll start the tests in toolsbeta in parallel
[16:40:09] Raymond_Ndibe: tested in toolsbeta, will start migrating tools in chunks
[17:40:49] okok, done :)
[17:44:52] dcaro: nice! all went smoothly?
[17:45:03] yep :), had to do some manually, but minor issues
[18:16:23] * dcaro off
[18:17:11] cya tomorrow!
[18:18:32] Raymond_Ndibe: feel free to send the MR cleaning up the extra code we added to jobs-api for the migration if you don't find any more v1 jobs in a few hours (if you are still around)
[18:26:23] T418897
[18:26:24] T418897: toolforge-deploy tests failure: Your local changes to the following files would be overwritten by checkout: components/jobs-api/2025_04_migration_of_all_jobs_to_version_2 - https://phabricator.wikimedia.org/T418897
[18:31:32] taavi: is that in tools or lima-kilo?
[18:31:44] that is toolsbeta
[18:31:56] ... which probably explains why I couldn't find the checkout on tools. :D
[18:32:02] :P
[18:32:47] sigh
[18:33:16] yep
[18:34:15] * dhinus off
[20:13:10] Oh yes, feel free to discard those changes, forgot to git reset after testing
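For anyone without jq handy, the datasource substitution from the 09:27 one-liner can be sketched in Python. This is a minimal, hypothetical equivalent, not something from the log. It assumes the exported dashboard JSON has a top-level `panels` list whose entries may carry an object-style `datasource` with a `uid`. Like the jq filter, it only touches top-level panels, so panels nested inside collapsed rows and per-query `targets` datasources are left alone. The UID `P8433460076D33992` and the `${prometheus}` template variable come from the log; the function name is illustrative.

```python
import json
from typing import Any, Dict

OLD_UID = "P8433460076D33992"  # datasource UID hard-coded in the exported dashboard (from the log)
NEW_UID = "${prometheus}"      # Grafana template variable to reference instead (from the log)


def templatize_datasources(
    dashboard: Dict[str, Any], old_uid: str = OLD_UID, new_uid: str = NEW_UID
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of `dashboard` where every top-level
    panel whose datasource UID equals `old_uid` points at `new_uid` instead.
    Mirrors the jq filter `.panels[] |= (if .datasource.uid == ... end)`."""
    result = json.loads(json.dumps(dashboard))  # deep copy; the input is plain JSON data
    for panel in result.get("panels", []):
        ds = panel.get("datasource")
        # Only rewrite object-style datasources ({"type": ..., "uid": ...}) with a matching UID;
        # panels without a datasource, or with a different one, are left unchanged.
        if isinstance(ds, dict) and ds.get("uid") == old_uid:
            ds["uid"] = new_uid
    return result
```

Usage would be along the lines of `json.dumps(templatize_datasources(json.load(open("redis.grafana"))), indent=2)`, pasting the result back into Grafana's JSON model editor. Returning a copy rather than mutating in place keeps the original export around for diffing.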
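The per-process memory cache idea mentioned at 09:59 can be sketched as a small TTL cache that sits in front of Redis. This is not the code from fourohfour MR 16, which the log doesn't reproduce. It is a minimal illustration that assumes a hypothetical `fetch(key)` callable doing the real Redis lookup (for example a redis-py client's `get`), with an illustrative default TTL. Because each worker process keeps its own copy, values can be up to `ttl` seconds stale. As written, the cache is also unbounded, so a real deployment keyed on arbitrary request paths would want a size cap.

```python
import threading
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Hypothetical in-process cache: serves a value from memory for `ttl`
    seconds before calling `fetch` (e.g. a Redis lookup) again."""

    def __init__(
        self,
        fetch: Callable[[Hashable], V],
        ttl: float = 30.0,  # illustrative default, not taken from fourohfour
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()  # guards _entries across threads in one process
        self._entries: Dict[Hashable, Tuple[float, V]] = {}  # key -> (expires_at, value)

    def get(self, key: Hashable) -> V:
        now = self._clock()
        with self._lock:
            hit = self._entries.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: no Redis round-trip
        # Miss or expired: go to the backing store outside the lock so a slow
        # lookup doesn't block other threads' cache hits.
        value = self._fetch(key)
        with self._lock:
            self._entries[key] = (now + self._ttl, value)
        return value
```

Injecting `clock` makes expiry testable without sleeping. Fetching outside the lock means two threads can occasionally both miss and both hit Redis for the same key, which is an acceptable trade-off when the goal is only to cut most of the load.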