[01:08:21] PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 3.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:09:45] RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[09:53:17] jynus: did you read this part? https://phabricator.wikimedia.org/T347318#9196458
[09:53:25] there's a cpu error there, not sure if you maybe missed it?
[09:53:41] no, that is not a cpu error
[09:53:44] as you said there was no recent error
[09:53:58] it is the normal log when cpu lose power for reboot
[09:54:08] ah really?
[09:54:19] Oh wait, it said reset
[09:54:26] I read error
[09:54:27] XD
[09:54:32] I guess I am used to CPU errors
[09:54:33] "System is performing a CPU reset because of system power off, power on or a warm reset like CTRL-ALT-DEL."
[09:54:45] ^full text
[09:55:04] so it is the same line for a normal stop
[09:55:04] Yeah, I just read the comment yesterday from dcops and I automatically read cpu error
[09:55:13] Like the ones we had when a mainboard change is needed
[09:55:21] no issue, I scanned for errors on all the logs and saw no errors
[09:55:25] yeah
[09:55:41] the only thing I could recommend is to restart it several times and see if they surface
[09:55:48] :-(
[09:56:04] well it is not even booting up, so let's see if they find something onsite
[09:57:21] oh, I didn't know that :-(
[09:57:55] saw the power redundancy things, but maybe that was normal pdu / rack maintenance
[09:58:38] so maybe on power up, if possible, it will be clear
[09:58:54] or maybe it just got fried
[10:48:12] hi! I need to deploy some wiki replica grant updates (https://gerrit.wikimedia.org/r/961067 and https://gerrit.wikimedia.org/r/961068), how do I do that?
[14:23:07] wmfbackups, wmfdb and wmfmariadbpy are now available on GitLab
[14:23:09] https://gitlab.wikimedia.org/repos/sre/wmfbackups, https://gitlab.wikimedia.org/repos/sre/wmfdb, https://gitlab.wikimedia.org/repos/sre/wmfmariadbpy
[15:39:05] taavi, i think the intent is to grant permissions for the correct source ip address of the `labsdbuser`, but also make it so that the user can't do so from the old IP addresses. that should probably be done by marostegui, i think. so, i don't think https://gerrit.wikimedia.org/r/961067 should actually be merged (that is just specifying intent), but https://gerrit.wikimedia.org/r/961068) would be appropriate to merge.
[15:41:33] marostegui does that sound about right? andrewbogott i think you're well aware with other cutover of IPs and such in cloud (as you're all chatting there, too), but ^ for visibility
[15:43:24] taavi: i said 'labsdbuser', i meant 'labsadmin' (the user, not the role). technically speaking the labsadmin user could have probably granted itself these permissions prior, but it seems better to grant super level perms from a DBA here, obviously with an audit trail of the ticket. one thing that ought to happen is the new grants should be planted, then the tooling should be double checked to ensure that things are
[15:43:39] actually working, THEN when they're working the old grants could be removed
[15:55:05] (I'm in meetings and this is a bit too much for me to follow on the side)
[15:57:55] taavi: actually, let me retract non-merge of https://gerrit.wikimedia.org/r/961067 - that would make sense, i just realized that these are stacked patches...
[16:06:18] I would prefer if all this is done via tasks rather than IRC
[16:08:27] marostegui, the first patch didn't reference the task, but i saw the second one does. it's here at https://phabricator.wikimedia.org/T347381 - i just now added you and Andrew to the patches and the ticket
[16:14:50] ok
[16:14:55] thanks
[16:15:08] I'll take a look when I can