[02:18:32] PROBLEM - Check unit status of swift_ring_manager on thanos-fe1001 is CRITICAL: CRITICAL: Status of the systemd unit swift_ring_manager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:13:52] RECOVERY - Check unit status of swift_ring_manager on thanos-fe1001 is OK: OK: Status of the systemd unit swift_ring_manager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:14:10] I am handling all the private data reports alerts
[05:54:59] All of them fixed now, confirming now with a check_private_data manual run
[06:01:02] Any idea why dbstore1005:3350 instance is down?
[06:08:34] https://phabricator.wikimedia.org/T321464
[06:24:13] Amir1: https://phabricator.wikimedia.org/T321464#8337713 can I start it?
[06:25:14] marostegui: oh yeah, that happened before as well. Forgot to fix it
[06:25:27] thanks
[06:27:30] no problem, I will get it back
[06:52:31] marostegui: when you have time, let me know where did I miss in the private data sanitation
[06:52:37] *mess
[06:53:04] haha sure, I can explain in our 1:1 today
[06:53:38] SGTM
[06:53:45] oh I see there was a master failure
[06:53:46] nice
[06:53:48] and also don't worry about ParserCache, I'm on it :)
[06:54:03] marostegui: yup, all the fun stuff happens when you're on vacation
[06:54:18] \o/
[07:39:58] good morning, I am triaging the weekend's worth of MediaWiki exceptions. There are database ones that recur often and I am always wondering what to do with them: `Transaction spent {time}s in writes, exceeding the 3s limit`
[07:40:29] hashar: there's not much we can do with that one... from a DBA point of view unfortunately
[07:41:51] there are not too many thankfully :)
[07:42:14] Not sure how useful they are on a daily basis though
[07:42:21] I have another kind which is specific to the ContentTranslation extension: Error 1213: Deadlock found when trying to get lock; try restarting transaction Function: ContentTranslation\Store\TranslationCorporaStore::save
[07:42:29] which I guess I am going to file for them to investigate ;)
[07:42:39] yeah, that sounds like a good idea
[07:45:31] thank you marostegui !
[07:58:46] Any objections to starting mysql on db1202? https://phabricator.wikimedia.org/T320786
[07:59:52] see comment
[08:00:55] Ah I see
[08:01:01] I will leave it up to Amir1 then!
[08:01:06] Thanks :)
[08:01:27] 3 people is too many cooks, so getting out of your way :-D
[08:01:36] welcome, BTW
[08:01:39] * marostegui out too
[08:01:51] hashar: the deadlock already has a ticket
[08:01:58] jynus: thanks :)
[08:02:01] T256229
[08:02:02] T256229: ContentTranslation\TranslationStorageManager::saveQuery: Deadlock found when trying to get lock; try restarting transaction - https://phabricator.wikimedia.org/T256229
[08:02:06] AHHH
[08:02:11] I will mark the dupe so
[08:02:21] marostegui: no issue for me
[08:02:37] Amir1: But see jaime's comment, you still want to practice the recovery?
[08:03:33] nah. I'm going to try it with a test host. This has been lagging for too long
[08:03:38] ok!
[08:04:04] I am going to upgrade+reboot+start mysql then
[08:04:20] hashar: the "took too long" is sorta common, just slow write queries, we should avoid them but a healthy dose of it is tolerable
[08:06:45] and I imagine your team has some additional tooling to detect slow queries besides the MediaWiki log errors, doesn't it?
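
(For context on the Error 1213 discussed above: the "try restarting transaction" hint means the failed transaction can simply be re-run by the client, since the server rolled it back after picking it as the deadlock victim. Below is a minimal retry sketch, assuming a pymysql connection; the table, columns, host and credentials are illustrative placeholders, not ContentTranslation's actual code.)

```python
# Hypothetical sketch: retry a write transaction when MariaDB reports a deadlock
# (error 1213), as the "try restarting transaction" hint in the log suggests.
# Connection details, table and column names are placeholders.
import time
import pymysql

ER_LOCK_DEADLOCK = 1213  # MySQL/MariaDB error code for "Deadlock found"

def save_with_retry(conn, query, params, max_attempts=3):
    """Run a write inside a transaction, retrying if it loses a deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            with conn.cursor() as cur:
                cur.execute(query, params)
            conn.commit()
            return
        except pymysql.MySQLError as err:
            conn.rollback()
            if err.args[0] != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise  # not a deadlock, or out of retries: surface the error
            time.sleep(0.1 * attempt)  # brief backoff before restarting the transaction

conn = pymysql.connect(host="db.example.org", user="app", password="secret",
                       database="appdb", autocommit=False)
save_with_retry(conn, "UPDATE translations SET content=%s WHERE id=%s",
                ("new text", 42))
```
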
[08:11:17] hashar: we used to have, I killed it with my bare hands
[08:14:32] (MysqlReplicationLag) firing: MySQL instance db1202:9104 has too large replication lag (6d 22h 23m 29s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1202&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[08:16:48] Amir1: 8-]
[12:14:55] (MysqlReplicationLag) firing: MySQL instance db1202:9104 has too large replication lag (5d 8h 12m 36s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1202&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[16:14:55] (MysqlReplicationLag) firing: MySQL instance db1202:9104 has too large replication lag (3d 16h 7m 37s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1202&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[20:14:55] (MysqlReplicationLag) firing: MySQL instance db1202:9104 has too large replication lag (1d 20h 16m 19s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1202&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
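
(The decreasing lag figures in the repeated MysqlReplicationLag alerts show db1202 catching up after mysql was restarted. The alert itself is driven by exporter metrics, but the same lag value can be read directly on a replica; a minimal sketch, assuming a pymysql connection with replication monitoring privileges and a placeholder hostname.)

```python
# Hypothetical sketch: read replication lag directly from a MariaDB replica.
# Host and credentials are placeholders; the real alert uses exporter metrics.
import pymysql

conn = pymysql.connect(host="db1202.example.org", user="monitor", password="secret",
                       cursorclass=pymysql.cursors.DictCursor)
with conn.cursor() as cur:
    cur.execute("SHOW SLAVE STATUS")
    status = cur.fetchone()

if status is None:
    print("not configured as a replica")
else:
    lag = status["Seconds_Behind_Master"]  # None means replication is not running
    print(f"replication lag: {lag} seconds" if lag is not None else "replication stopped")
```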