[10:03:02] !incidents
[10:03:02] 4046 (UNACKED) [3x] Primary outbound port utilisation over 80% (paged) global noc ()
[10:03:03] 4045 (RESOLVED) EtcdReplicationDown etcd sre (conf2005:8000 etcdmirror codfw)
[10:03:03] 4044 (RESOLVED) ATSBackendErrorsHigh cache_text sre (rest-gateway.discovery.wmnet esams)
[10:03:03] 4043 (RESOLVED) ProbeDown sre (10.2.1.17 ip4 restbase-https:7443 probes/service http_restbase-https_ip4 codfw)
[10:03:07] !ack 4046
[10:03:08] 4046 (ACKED) [3x] Primary outbound port utilisation over 80% (paged) global noc ()
[10:07:57] !incidents
[10:07:57] 4046 (ACKED) [3x] Primary outbound port utilisation over 80% (paged) global noc ()
[10:07:57] 4047 (ACKED) Primary inbound port utilisation over 80% (paged) global noc (cr3-eqsin.wikimedia.org)
[10:07:57] 4045 (RESOLVED) EtcdReplicationDown etcd sre (conf2005:8000 etcdmirror codfw)
[10:07:57] 4044 (RESOLVED) ATSBackendErrorsHigh cache_text sre (rest-gateway.discovery.wmnet esams)
[10:07:58] 4043 (RESOLVED) ProbeDown sre (10.2.1.17 ip4 restbase-https:7443 probes/service http_restbase-https_ip4 codfw)
[10:15:20] !incidents
[10:15:20] 4046 (ACKED) [3x] Primary outbound port utilisation over 80% (paged) global noc ()
[10:15:21] 4047 (ACKED) Primary inbound port utilisation over 80% (paged) global noc (cr3-eqsin.wikimedia.org)
[10:15:21] 4045 (RESOLVED) EtcdReplicationDown etcd sre (conf2005:8000 etcdmirror codfw)
[10:15:21] 4044 (RESOLVED) ATSBackendErrorsHigh cache_text sre (rest-gateway.discovery.wmnet esams)
[10:15:21] 4043 (RESOLVED) ProbeDown sre (10.2.1.17 ip4 restbase-https:7443 probes/service http_restbase-https_ip4 codfw)
[10:41:56] !incidents
[10:41:56] 4046 (RESOLVED) [3x] Primary outbound port utilisation over 80% (paged) global noc ()
[10:41:56] 4047 (RESOLVED) Primary inbound port utilisation over 80% (paged) global noc (cr3-eqsin.wikimedia.org)
[10:41:56] 4045 (RESOLVED) EtcdReplicationDown etcd sre (conf2005:8000 etcdmirror codfw)
[10:41:56] 4044 (RESOLVED) ATSBackendErrorsHigh cache_text sre (rest-gateway.discovery.wmnet esams)
[10:41:57] 4043 (RESOLVED) ProbeDown sre (10.2.1.17 ip4 restbase-https:7443 probes/service http_restbase-https_ip4 codfw)
[11:48:48] <_joe_> marostegui, volans heads up I'm installing the new etcd-mirror to conf2005; then I'll have to restart the service. There is a very slim chance it could page and recover
[11:49:17] ok
[11:49:20] thanks for the heads up
[11:51:38] <_joe_> done
[12:24:13] thx
[12:24:30] * volans was at lunch with my phone, laptop in the other room :D
[19:28:12] not gonna mess with it on Friday, but I just stumbled on this ticket and an old commit that never went out:
[19:28:25] https://phabricator.wikimedia.org/T337446 -> https://gerrit.wikimedia.org/r/c/operations/puppet/+/924508/1/hieradata/common/service.yaml
[19:28:46] (we disabled pybal healthchecking of a wikireplicas because of the stuff in that ticket. looks resolved now, but never turned health back on)
[19:29:09] just saying it out loud here so maybe someone remembers if I don't :)
[19:34:00] I think better shout it on -cloud-admin
[19:34:21] or on the ticket, it may be forgotten otherwise
[19:35:51] yeah probably :)