[02:18:32] PROBLEM - Check unit status of swift_ring_manager on thanos-fe1001 is CRITICAL: CRITICAL: Status of the systemd unit swift_ring_manager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:13:52] RECOVERY - Check unit status of swift_ring_manager on thanos-fe1001 is OK: OK: Status of the systemd unit swift_ring_manager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:14:10] I am handling all the private data reports alerts
[05:54:59] All of them fixed now, confirming now with a check_private_data manual run
[06:01:02] Any idea why dbstore1005:3350 instance is down?
[06:08:34] https://phabricator.wikimedia.org/T321464
[06:24:13] Amir1: https://phabricator.wikimedia.org/T321464#8337713 can I start it?
[06:25:14] marostegui: oh yeah, that happened before as well. Forgot to fix it
[06:25:27] thanks
[06:27:30] no problem, I will get it back
[06:52:31] marostegui: when you have time, let me know where did I miss in the private data sanitation
[06:52:37] *mess
[06:53:04] haha sure, I can explain in our 1:1 today
[06:53:38] SGTM
[06:53:45] oh I see there was a master failure
[06:53:46] nice
[06:53:48] and also don't worry about ParserCache, I'm on it :)
[06:54:03] marostegui: yup, all the fun stuff happens when you're on vacation
[06:54:18] \o/
[07:39:58] good morning, I am triaging the weekend's worth of MediaWiki exceptions. There are database ones that recur often and I am always wondering what to do with them: `Transaction spent {time}s in writes, exceeding the 3s limit`
[07:40:29] hashar: there's not much we can do with that one... from a DBA point of view unfortunately
[07:41:51] there are not too many thankfully :)
[07:42:14] Not sure how useful they are on a daily basis though
[07:42:21] I have another kind which is specific to the ContentTranslation extension: Error 1213: Deadlock found when trying to get lock; try restarting transaction Function: ContentTranslation\Store\TranslationCorporaStore::save
[07:42:29] which I guess I am going to file for them to investigate ;)
[07:42:39] yeah, that sounds like a good idea
[07:45:31] thank you marostegui !
[07:58:46] Any objections to starting mysql on db1202? https://phabricator.wikimedia.org/T320786
[07:59:52] see comment
[08:00:55] Ah I see
[08:01:01] I will leave it up to Amir1 then!
[08:01:06] Thanks :)
[08:01:27] 3 people is too many cooks, so getting out of your way :-D
[08:01:36] welcome, BTW
[08:01:39] * marostegui out too
[08:01:51] hashar: the deadlock already has a ticket
[08:01:58] jynus: thanks :)
[08:02:01] T256229
[08:02:02] T256229: ContentTranslation\TranslationStorageManager::saveQuery: Deadlock found when trying to get lock; try restarting transaction - https://phabricator.wikimedia.org/T256229
[08:02:06] AHHH
[08:02:11] I will mark the dupe so
[08:02:21] marostegui: no issue for me
[08:02:37] Amir1: But see jaime's comment, you still want to practice the recovery?
[08:03:33] nah. I'm going to try it with a test host. This has been lagging for too long
[08:03:38] ok!
[08:04:04] I am going to upgrade+reboot+start mysql then
[08:04:20] hashar: the "took too long" is sorta common, just slow write queries, we should avoid them but a healthy dose of it is tolerable
[08:06:45] and I imagine your team has some additional tooling to detect slow queries besides the MediaWiki log errors, doesn't it?
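
(For context on the Error 1213 discussed above: the "try restarting transaction" hint means the failed transaction can simply be re-run by the client, since the server rolled it back after picking it as the deadlock victim. Below is a minimal retry sketch, assuming a pymysql connection; the table, columns, host and credentials are illustrative placeholders, not ContentTranslation's actual code.)

```python
# Hypothetical sketch: retry a write transaction when MariaDB reports a deadlock
# (error 1213), as the "try restarting transaction" hint in the log suggests.
# Connection details, table and column names are placeholders.
import time
import pymysql

ER_LOCK_DEADLOCK = 1213  # MySQL/MariaDB error code for "Deadlock found"

def save_with_retry(conn, query, params, max_attempts=3):
    """Run a write inside a transaction, retrying if it loses a deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            with conn.cursor() as cur:
                cur.execute(query, params)
            conn.commit()
            return
        except pymysql.MySQLError as err:
            conn.rollback()
            if err.args[0] != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise  # not a deadlock, or out of retries: surface the error
            time.sleep(0.1 * attempt)  # brief backoff before restarting the transaction

conn = pymysql.connect(host="db.example.org", user="app", password="secret",
                       database="appdb", autocommit=False)
save_with_retry(conn, "UPDATE translations SET content=%s WHERE id=%s",
                ("new text", 42))
```
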
[08:11:17] hashar: we used to have, I killed it with my bare hands
[08:14:32] (MysqlReplicationLag) firing: MySQL instance db1202:9104 has too large replication lag (6d 22h 23m 29s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1202&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[08:16:48] Amir1: 8-]
[12:14:55] (MysqlReplicationLag) firing: MySQL instance db1202:9104 has too large replication lag (5d 8h 12m 36s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1202&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[16:14:55] (MysqlReplicationLag) firing: MySQL instance db1202:9104 has too large replication lag (3d 16h 7m 37s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1202&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[20:14:55] (MysqlReplicationLag) firing: MySQL instance db1202:9104 has too large replication lag (1d 20h 16m 19s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1202&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
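
(The decreasing lag figures in the repeated MysqlReplicationLag alerts show db1202 catching up after mysql was restarted. The alert itself is driven by exporter metrics, but the same lag value can be read directly on a replica; a minimal sketch, assuming a pymysql connection with replication monitoring privileges and a placeholder hostname.)

```python
# Hypothetical sketch: read replication lag directly from a MariaDB replica.
# Host and credentials are placeholders; the real alert uses exporter metrics.
import pymysql

conn = pymysql.connect(host="db1202.example.org", user="monitor", password="secret",
                       cursorclass=pymysql.cursors.DictCursor)
with conn.cursor() as cur:
    cur.execute("SHOW SLAVE STATUS")
    status = cur.fetchone()

if status is None:
    print("not configured as a replica")
else:
    lag = status["Seconds_Behind_Master"]  # None means replication is not running
    print(f"replication lag: {lag} seconds" if lag is not None else "replication stopped")
```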