[14:10:01] PROBLEM - MariaDB sustained replica lag on s2 on db2138 is CRITICAL: 13.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2138&var-port=13312
[14:10:45] PROBLEM - MariaDB sustained replica lag on s2 on db2125 is CRITICAL: 8.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2125&var-port=9104
[14:11:21] Emperor: making sure we're not waiting on one another, do you have time/bandwidth to take over T288458 ? thanks!
[14:11:22] T288458: Put ms-be20[62-65] in service - https://phabricator.wikimedia.org/T288458
[14:11:25] PROBLEM - MariaDB sustained replica lag on s2 on db2095 is CRITICAL: 12.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2095&var-port=13312
[14:12:05] RECOVERY - MariaDB sustained replica lag on s2 on db2138 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2138&var-port=13312
[14:12:49] RECOVERY - MariaDB sustained replica lag on s2 on db2125 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2125&var-port=9104
[14:13:31] RECOVERY - MariaDB sustained replica lag on s2 on db2095 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2095&var-port=13312
[14:15:22] oh, that's me, sorry folks
[14:15:31] i should put in downtimes
[14:15:39] tut tut
[14:24:32] finally, i know how elukey feels all the time
[14:29:21] lulz, you're welcome
[15:21:00] godog Emperor I just got a bunch of emails from a Pontoon Thanos host. Expected?
[15:22:11] sobanski: not sure, can you forward them over ?
[15:22:51] Forwarded.
[15:22:59] marostegui: what have you done to poor s8 in codfw?
[15:23:25] reimaged the master
[15:23:32] you monster
[15:23:36] checking its tables now, will be finished tomorrow
[15:23:48] sobanski: thank you, not expected no, I'll fix it
[15:27:23] godog: thanks :)
[15:34:11] godog: apropos T288458, is codfw currently pooled? the sre.discovery.service-route cookbook doesn't work for status (I think someone was working on a fix, but I've lost the CR)
[15:34:11] T288458: Put ms-be20[62-65] in service - https://phabricator.wikimedia.org/T288458
[15:35:07] ah https://gerrit.wikimedia.org/r/730692
[15:36:08] Emperor: yeah codfw is pooled atm, I was looking at https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?orgId=1&from=now-3h&to=now-1m&var-DC=codfw&var-prometheus=codfw%20prometheus%2Fops&refresh=1m
[15:37:13] Is the plan to gradually trickle more weight onto these nodes, then, or to depool codfw swift instead?
[15:38:35] yeah less impactful to depool codfw and then bump the weight and repool once rebalances have finished
[15:39:47] eqiad depooled last week was putting strain on the codfw/eqiad bandwidth when one of the links failed; the link has been fixed, and codfw depooled iirc pulls in less bandwidth anyway
[15:47:59] OK, so I can't use the cookbook to check what's pooled, is there a rune for confctl to check? confctl --tags --action get all wants dc, cluster, and service (and I'm not sure what should be in "cluster" here)
[15:49:03] [I can click my way to https://config-master.wikimedia.org/pybal/eqiad/swift-https et al, but that's less ideal]
[15:50:49] $ confctl --object-type discovery select 'dnsdisc=swift' get
[15:50:50] {"codfw": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=swift"}
[15:50:53] {"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=swift"}
[15:51:31] Emperor: ^
[15:52:37] 'dnsdisc=swift.*' will also give you the -ro and -rw ones
[15:53:08] this is for the discovery part of it
[15:53:14] then if you want to check single hosts
[15:54:24] confctl select 'cluster=swift' get
[15:54:27] Ah, so codfw is already depooled for swift-rw
[15:54:50] and you can add to the selection dc= or service=, comma separated
[15:55:01] to fine-tune the selection
[15:55:28] but not for discovery? `confctl --object-type discovery select 'dc=codfw,dnsdisc=swift.*' get` returns me the eqiad ones too
[15:55:55] there the dc is used as key, so you have to use name=codfw
[15:56:11] 'dnsdisc=swift.*,name=codfw'
[15:56:24] ah, thanks. Is this all documented somewhere?
[15:56:58] probably in https://wikitech.wikimedia.org/wiki/Conftool I don't guarantee its up-to-dateness though ;)
[15:57:01] And so to depool codfw-swift, I'd do `confctl --object-type discover select 'name=codfw,dnsdisc=swift.*' depool` ?
[15:57:02] blame j.o..e :-P
[15:57:25] what's the issue with the cookbook? we should fix that IMHO
[15:57:48] Emperor: not exactly
[15:57:48] see
[15:57:49] https://wikitech.wikimedia.org/wiki/DNS/Discovery#How_to_manage_a_DNS_Discovery_service
[15:58:28] volans: see https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/730692 (TL;DR - it doesn't cope with CNAMEs, which the swift discovery records use AIUI)
[15:58:55] ah yes that one, I even commented :D
[15:58:56] sorry
[15:59:09] NP, thanks for your help
[15:59:40] I think, then: `confctl --object-type discovery select 'name=codfw,dnsdisc=swift.*' set/pooled=false` (but tomorrow)
[16:00:41] yes, if you want to depool them all
[16:03:31] godog: I presume we want to depool them...
[16:09:11] godog: (and that it's expected that codfw wasn't pooled for swift-rw)?
[16:10:15] Emperor: in the meeting now, but "yes" to both
[16:11:34] Emperor: in the meeting too, can elaborate later but look at the active_active key in:
[16:11:37] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/common/service.yaml#2365
[16:12:27] that is basically https://wikitech.wikimedia.org/wiki/DNS/Discovery#Active/passive_services
[16:24:10] Thanks :)
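For reference, a minimal consolidated sketch of the confctl workflow discussed above, assembled from the commands quoted in the conversation; the swift dnsdisc pattern and selectors are taken as shown there and not independently re-verified:

  # Check the pooled state of the swift discovery records ('.*' also matches the -ro/-rw records):
  $ confctl --object-type discovery select 'dnsdisc=swift.*' get

  # Restrict to one datacentre; for discovery objects the DC is the object name, so use name= rather than dc=:
  $ confctl --object-type discovery select 'name=codfw,dnsdisc=swift.*' get

  # Check the individual backend hosts instead (dc= or service= can be added, comma-separated):
  $ confctl select 'cluster=swift' get

  # Depool the codfw side of the swift discovery records (set/pooled=false rather than depool, per the DNS/Discovery docs linked above):
  $ confctl --object-type discovery select 'name=codfw,dnsdisc=swift.*' set/pooled=false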