[01:09:59] PROBLEM - MariaDB sustained replica lag on m1 on db1117 is CRITICAL: 20.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:11:35] RECOVERY - MariaDB sustained replica lag on m1 on db1117 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[07:48:48] good morning, FYI it seems that db1210's NIC (eno8303) has negotiated a port speed of 10Mb/s
[07:48:56] (we got an email to noc@)
[07:48:57] volans: yeah, it is known
[07:49:11] good :)
[07:49:14] https://phabricator.wikimedia.org/T334446
[07:49:34] In fact, the alert is new and was created to catch things like that one earlier
[09:12:55] zabe: feel free to increase the speed of the script in s1
[09:51:04] The good old times https://gerrit.wikimedia.org/r/c/operations/dns/+/907808
[10:01:50] good morning, no switch maintenance today, right?
[10:02:02] jynus: indeed not
[10:02:09] good
[10:15:51] jynus: I would need to stop db2185 for like 1h (or less), it is the db_inventory backup source for codfw
[10:16:11] yeah, no issue
[10:16:18] great, thanks!
[10:17:31] the only ones I would like to keep up are es1025 and es1022, as last week we had the issue with es4 and we have the backups there pending
[10:17:49] until tomorrow
[10:17:55] yeah, not planning to touch external store at all this week :)
[10:24:57] once binlog backups are in place, we could maybe reduce full es backups to every month or every 3 months!
[10:25:27] that's going to be really good
[10:31:23] host back up
[12:31:16] I am going to merge this https://gerrit.wikimedia.org/r/c/operations/dns/+/907873/ - I don't expect any issues as nothing seems to be using it, but just in case, heads up!
[12:53:32] jynus: Let me fix the db1215 thing. It was cloned from the codfw host, so it probably only has codfw grants
[12:53:41] ah!
[12:53:45] I was having a look
[12:54:06] indeed, that was what actually happened
[12:54:42] jynus: can you retry now?
[12:54:45] Just added them
[12:54:54] this was to prevent cross-dc backups and for general hardening (dc independence, you know I am big on that 0:-))
[12:55:06] Yeah, I will remove the codfw grants from there now
[12:55:13] no worries
[12:56:01] clean now
[12:56:56] it finished now - you can check the file differences between the last backup and the new one at: http:///dbbackups/jobs/?search=dump+db_inventory
[12:57:10] I can paste them elsewhere if needed
[12:57:26] or, if you prefer, on the m1 dbbackups db
[12:57:37] jynus: going to a meeting now, but does it look good to you?
[12:57:44] yeah, 95.0 KB done
[12:57:54] same as before
[12:58:10] I need to get used to using the dashboard though
[12:58:15] So I will check myself too, to practice :)
[12:58:17] and same number of files
[12:58:35] excellent
[12:58:35] it would probably help if it had more reporting features
[12:58:36] thanks
[12:58:45] like comparing backup A and backup B
[12:58:53] I remember the ssh tunnel command was on wikitech, right?
[12:59:01] I will look for it later
[12:59:03] Meeting!
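(Editor's note: the "compare backup A and backup B" reporting feature wished for above doesn't exist yet; below is a minimal sketch of what such a comparison could look like, assuming two local logical-dump directories. The paths and directory layout are hypothetical for illustration - the real dbbackups metadata lives in the dbbackups database on m1 and may be structured differently.)

```python
#!/usr/bin/env python3
# Hedged sketch: diff the file listings of two dump directories, reporting
# files that appear in only one backup and files whose size changed.
# The on-disk layout is an assumption, not the actual dbbackups schema.
import sys
from pathlib import Path


def listing(dump_dir: Path) -> dict[str, int]:
    """Map each file's path (relative to the dump root) to its size in bytes."""
    return {
        str(p.relative_to(dump_dir)): p.stat().st_size
        for p in dump_dir.rglob("*")
        if p.is_file()
    }


def compare(old: Path, new: Path) -> None:
    a, b = listing(old), listing(new)
    for name in sorted(a.keys() - b.keys()):
        print(f"only in old backup: {name}")
    for name in sorted(b.keys() - a.keys()):
        print(f"only in new backup: {name}")
    for name in sorted(a.keys() & b.keys()):
        if a[name] != b[name]:
            print(f"size changed: {name}: {a[name]} -> {b[name]} bytes")


if __name__ == "__main__":
    compare(Path(sys.argv[1]), Path(sys.argv[2]))
```

Invoked as e.g. `python3 compare_dumps.py dump.db_inventory.OLD dump.db_inventory.NEW` (hypothetical paths), it would confirm the "same number of files, same sizes" check done by hand in the conversation above.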
[12:59:05] Thanks for your help
[13:01:00] I will add it here: https://wikitech.wikimedia.org/wiki/MariaDB/Backups#Monitoring_and_metadata_gathering
[13:26:17] thanks
[13:57:04] sadly, I am having issues uploading a screenshot of the dashboard
[14:00:03] yeah, file uploading is broken on wikitech
[14:01:35] but it looks like the frontend, not storage
[14:02:50] and I cannot fix it because https://wikitech.wikimedia.org/wiki/File:Dbbackups_dashboard_status.png shows no file or thumb, but if I try to overwrite it, it says it is a duplicate
[14:12:04] I asked on the train ticket whether it seems worth reporting
[14:36:53] it doesn't seem to be train-related, as it only happens on wikitech, not on testwiki
[14:44:16] hey, just a swift-related heads-up (cc Emperor) - thumbor-k8s is going to be receiving prod traffic without interruption from now on. We've hopefully resolved all of the issues that manifested in the swift graphs, so this won't require action from you, but I figured it was important to mention
[14:52:20] thanks for letting me know :)
[16:58:30] Emperor: sorry for the delay in seeing T327253; for some reason I didn't see it the first time you tagged me last week (probably because of multiple emergencies going on at the time)
[16:58:31] T327253: >=27k objects listed in swift containers but not extant - https://phabricator.wikimedia.org/T327253
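(Editor's note: T327253 is about objects that appear in a swift container listing but no longer exist as objects. A minimal sketch of how one might detect such entries with python-swiftclient follows: HEAD each name from the listing and report 404s. The auth endpoint, credentials, and container name are placeholders, not production values, and a real run over 27k+ objects would want batching, concurrency, and rate limiting.)

```python
#!/usr/bin/env python3
# Hedged sketch for T327253: find entries that appear in a swift container
# listing but return 404 on a direct HEAD ("listed but not extant").
# Auth details and the container name below are placeholders.
from swiftclient.client import Connection
from swiftclient.exceptions import ClientException

conn = Connection(
    authurl="https://swift.example.org/auth/v1.0",  # placeholder auth endpoint
    user="account:user",                            # placeholder credentials
    key="secret",
)

container = "example-local-thumb"  # placeholder container name

# get_container returns at most one page of results; follow the listing
# with markers until it is exhausted.
marker = ""
missing = 0
while True:
    _headers, objects = conn.get_container(container, marker=marker)
    if not objects:
        break
    for obj in objects:
        try:
            conn.head_object(container, obj["name"])
        except ClientException as e:
            if e.http_status == 404:
                missing += 1
                print(f"listed but not extant: {obj['name']}")
            else:
                raise
    marker = objects[-1]["name"]

print(f"total missing: {missing}")
```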