[09:58:52] Project-Admins: Create project tag for Teyora - https://phabricator.wikimedia.org/T298365 (Ed6767)
[14:16:36] Phabricator: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Bugreporter)
[14:16:50] Phabricator: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Bugreporter) p:Triage→Unbreak!
[14:19:12] Phabricator: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Zabe) Where did you encounter this error?
[14:20:44] zabe: can see on graphs
[14:21:31] Amir1: you around
[14:22:11] Depends
[14:22:16] Let me look
[14:22:54] Amir1: a quick look of https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1107&var-port=9104 shows QPS doubled
[14:23:08] that should be phab's DB if I can read
[14:23:12] Phabricator: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Bugreporter)
[14:23:43] Phabricator: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Bugreporter) This only happens occasionally though I reproduced it at least three times.
[14:24:03] Phabricator, DBA: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (RhinosF1) I've let DBAs know. It is being looked at.
[14:24:24] RhinosF1: thx
[14:24:42] zabe: np
[14:27:29] Phabricator, DBA: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Ladsgroup) Given the grafana data, I do see an increase in writes and read queries: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-job=All&var-server=db1107&var-port=9104&from=1640785533725&to...
[14:30:22] Amir1: is it a monitor and hope it doesn't spike again then?
[14:32:31] Amir1: there's a jump in requests to match https://grafana.wikimedia.org/d/000000587/phabricator?viewPanel=8&orgId=1&from=now-3h&to=now
[14:33:24] yeah, I'll comment
[14:34:34] Phabricator, DBA: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Ladsgroup) I looked at binlogs of db1107 and it's all phabricator inserts so I don't think we can do much here.
[14:35:36] Amir1: from user traffic or background process? If it's traffic then it might be worth looking for scrapers or something if it comes back
[14:37:35] it has to be traffic because https://www.irccloud.com/irc/libera.chat/channel/wikimedia-releng#:~:text=https%3A//grafana.wikimedia.org/d/000000587/phabricator%3FviewPanel%3D8%26orgId%3D1%26from%3Dnow%2D3h%26to%3Dnow
[14:39:07] Phabricator, DBA: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (RhinosF1) There's an increase in traffic (https://www.irccloud.com/irc/libera.chat/channel/wikimedia-releng#:~:text=https%3A//grafana.wikimedia.org/d/000000587/phabricator%3FviewPanel%3D8%26orgId%3D1%26fro...
[14:41:27] Phabricator, DBA: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Ladsgroup) p:Unbreak!→High It seems it was a scraper or something that have pushed some load on the infra and went away, given that phabricator is not a critical service (vs. Wikipedia) I don't see...
[14:41:39] Amir1: the traffic graph still looks high so it could be a time bomb
[14:41:48] no it's not
[16:12:46] Seeing 'MySQL server has gone away' again.
[16:13:49] and https://grafana.wikimedia.org/d/000000587/phabricator?viewPanel=8&orgId=1&from=now-3h&to=now is heavily going up again
[16:18:31] https://w.wiki/4caA shows a single user agent making a large spike of requests
[16:25:46] zabe, taavi: I pinged DBAs, they're looking in #wikimedia-data-persistence
[16:36:24] Phabricator, DBA: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Marostegui) I am not sure if this is entirely related, but I have found quite a few runs of the following queries:{P18265} They are slow and they've been run a lots of times in the last 3 hours. I have no...
[16:37:04] Phabricator, DBA: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Marostegui) I can inspect writes too, but I haven't found a super obvious time range to inspect the binlogs from/to
[16:43:10] Phabricator, DBA: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Marostegui) So the slowest query takes almost 3 minutes to run: ` +------+-------------+-----------+--------+----------------------+-------------------+---------+-------------------------------------------...
[17:43:09] Phabricator, DBA, Vuln-DoS: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Bugreporter)
[17:46:11] Phabricator, DBA, Vuln-DoS, Wikimedia-Incident: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (RhinosF1)
[19:46:55] Phabricator, DBA, User-Ladsgroup, Vuln-DoS, Wikimedia-Incident: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Ladsgroup) Open→Resolved a:Ladsgroup I added the crawler's IP to the abuse net (in private repo) and it should be blocked now. The load...
[20:17:21] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:18:29] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:27:01] Phabricator, DBA, User-Ladsgroup, Vuln-DoS, Wikimedia-Incident: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (Peachey88)
[22:10:14] Phabricator, DBA, User-Ladsgroup, Vuln-DoS, Wikimedia-Incident: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369 (RhinosF1) Personally, I think an IR and some follow up is needed. I'm happy to help with writing the IR. We've had a partial service outage lastin...
[22:23:57] Continuous-Integration-Infrastructure, Release-Engineering-Team, Zuul: CI is doing nada (Gearman) - https://phabricator.wikimedia.org/T298177 (hashar) The reason is the so many patches cause a lot of merge requests (roughly 850 based on the {nav Gearman job queue} graph). That takes a bit of time to...