[00:18:30] (SystemdUnitFailed) firing: (3) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:18:30] (SystemdUnitFailed) firing: (3) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:23:30] (SystemdUnitFailed) firing: (3) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:03:30] (SystemdUnitFailed) firing: (4) httpbb_hourly_appserver.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:03:30] (SystemdUnitFailed) firing: (4) httpbb_hourly_appserver.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:36:18] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Add Dell switches support to Homer/Cookbooks - https://phabricator.wikimedia.org/T320638 (ayounsi) Quick status update regarding Homer. With those 3 patches: * Initial OpenConfig/SONiC support to wmf-netbox - https://gerrit.wikimedia.org/...
[09:08:32] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Add Dell switches support to Homer/Cookbooks - https://phabricator.wikimedia.org/T320638 (cmooney) Amazing work! Looks great. >>! In T320638#9082582, @ayounsi wrote: > * The ordering can be problematic (`# TODO needs to happen after the...
[09:11:18] fyi back from vacation today, catching up on emails; ping if there is anything that needs pushing to the top of the list
[09:11:53] welcome back jbond
[09:13:06] chers
[09:13:10] cheers
[09:14:35] welcome back
[09:27:44] SRE-tools, Cloud-VPS, Infrastructure-Foundations, Goal, cloud-services-team (FY2023/2024-Q1): cloudcumin: decide sudoers rules for users without global root - https://phabricator.wikimedia.org/T325067 (fnegri) p:Triage→Medium
[09:38:34] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: Spicerack: add distributed locking support - https://phabricator.wikimedia.org/T341973 (jbond) >>! In T341973#9049479, @bking wrote: > Swift > - CON: [[ https://platform.swiftstack.com/docs/introduction/openstack_swift.html#mass...
[09:58:18] jbond: welcome back :)
[09:58:34] I think sukhe probably caught up with you about the ns2 stuff for the upcoming knams move?
[09:58:39] https://phabricator.wikimedia.org/T343942
[09:59:08] your input on that would be welcome if you get a moment
[10:03:31] (SystemdUnitFailed) firing: (3) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:11:10] topranks, sukhe, fyi the drmrs-eqiad telxius transport is back online, so all clear for tomorrow's depool
[10:11:40] XioNoX: this is the good news we need, thank you :)
[10:11:46] absolutely :)
[10:31:34] topranks: yes he did, I just sent a response
[10:31:45] thanks!
[10:31:48] np
[10:34:08] topranks: sukhe: one thing that's probably also worth confirming is that updates to glue records can be performed as quickly as updates to the ns set. They probably are, but there is something in the recesses of my memory that makes me think glue record updates can be a bit more of a pain and take more time to process
[10:34:27] jbond: ok
[10:34:36] just to speed this up and knowing this is not your call
[10:34:40] which one would you pick?
[10:34:43] just one :)
[10:34:54] out of the options above that is
[10:35:10] I don't think anyone is saying it's their call or not, just trying to see if we can have some consensus
[10:35:20] and quickly, if possible, which then rules out 1 anyway
[10:35:54] as mentioned on the task I think I'd go for option 2 as that's what we would probably do in a real incident
[10:36:02] ok thanks
[10:36:24] I'd rule out option 1 altogether; it seems like too many moving parts and the timelines are too tight
[10:44:27] yeah option 1 does not seem workable to me
[10:48:01] jbond: What's a "real incident"? Site fully down with no heads-up?
[10:51:59] XioNoX: yes
[10:52:33] or losing both core routers in ams
[10:52:34] I guess we don't have any procedure if that would happen
[10:53:14] in an outage, option 2 is the one that is fully under our control (no need to wait for 3rd parties)
[10:53:39] but could have side effects if the site were to come back up while being advertised from a different DC
[10:53:51] Initially I expect we'd just depool the site and let queries to ns2 fail / retry via ns0/ns1
[10:54:04] yeah, we would have bigger things to deal with
[10:54:36] then it would depend on the expected length of the outage, etc
[10:54:44] I thought that at least for the dns side of things the goal is that all ns servers should be able to answer from all sites, but yes anything else on that prefix would likely have issues
[10:55:13] but I could be misremembering
[10:55:27] it's a nice thought experiment, but not sure it's related here, as we have a (tiny) bit of time
[10:56:26] longer term anycast seems like the best bet though
[10:56:37] keep dns in the main 2 core DCs + anycast from the pops
[10:56:50] yeah definitely
[10:57:04] this incident has convinced us of that
[10:57:25] ahh ok I thought https://wikitech.wikimedia.org/wiki/Anycast_authoritative_DNS was already a bit further on
[10:57:50] well, it works and is a thing in that sense
[10:58:01] having an authdns in amsterdam is because it was the 3rd site up, but people in asia are suffering from not having dns there
[10:58:02] just not "official" and IIRC bblack had some concerns with it
[10:58:07] yeah it's fairly well on from what I can see anyway
[10:58:13] if that's never been tested in any shape or form I'd probably go for option 3. Although it would be great to test option 2 with this, I don't think it's fair to add that to the risk register for this CR
[10:59:00] and I don't see any real risk with option 3, assuming whichever server we pick can handle the additional load
[10:59:01] we are using the anycast network for other public facing, non-critical stuff (WDNS) but there isn't any reason why it shouldn't work
[10:59:16] experimental in that sense? probably yes, since we have never actually put that IP anywhere, other than it existing
[10:59:35] WDNS - Wikidough, but that's hardly a good measure of a critical service yet that gets significant traffic
[11:00:36] ultimately I think it's fairly safe to use it
[11:00:38] given that 1 is completely out, 2 is uncertain, I vote for 3
[11:00:48] the IP is routing ok in terms of the anycast setup - we've tested that
[11:01:02] yes wikidough has its own prefixes, so it doesn't alleviate the concerns raised other than to confirm anycast works
[11:01:03] otherwise it's hitting our existing dns hosts, which we know are good and able to handle the load
[11:01:35] yeah load is not an issue and the anycast one will be even more spread out than ns0/1, which just go to eqiad/codfw
[11:02:45] ok
[11:02:48] 3 it is then? I vote for it
[11:03:07] +1
[11:03:11] feel free to put it on me if things go south :P
[11:03:11] +1
[11:03:22] "I accept full responsibility for the decision"
[11:03:30] sukhe: we will load balance anycast's fault on all of us
[11:03:33] hahaha
[11:03:39] ECMP indeed
[11:04:49] topranks: aye nay abstain? :)
[11:05:12] option 3 for me yep
[11:05:24] cool :)
[11:05:30] you can blame me btw :)
[11:05:46] going to send an email to markmonitor asking for ns2 to be updated to 198.35.27.27
[11:05:55] ok
[11:05:59] aka nsa.wikimedia.org. should we let the NSA know as well?
[11:06:02] kidding, they already do :>
[11:06:07] hahah
[11:06:10] hahaha beat me to the punchline :P
[11:06:26] we need to prep a patch to change the A record for ns2.wikimedia.org in our own zone too
[11:06:37] yep
[11:06:40] on it
[11:06:49] and then I will let rob send that email anyway, since he has to authorize the change (I can't)
[11:08:55] cool thanks
[11:09:42] sukhe: can we add more authorized people to the list? And maybe make sure former employees are not on it anymore?
[11:09:58] I'd vote to add sukhe
[11:10:05] XioNoX: it's robh and bblack currently IIRC
[11:10:18] :]
[11:10:21] would be nice to have a 3rd
[11:10:22] we can look into that
[11:10:23] yeah
[11:11:28] Being in a different country (with different holidays) helps too
[11:15:53] +1 to add more and +1 to add sukhe
[11:16:38] XioNoX: just looking at the VM setup, we have no ping offload VM in drmrs I think?
[11:17:06] we do currently in esams, wondering what the approach should be
[11:17:11] topranks: nah, and probably not worth re-creating them in new-esams
[11:17:37] ok, we can always do it after the main parts of the move anyway
[11:17:50] yeah exactly, let's not add this snowflake to the mix for now
[11:18:13] there are 2 netflow VMs that need to get created
[11:18:30] as well as a prometheus host
[11:18:41] (vm)
[11:22:20] topranks: only 1 netflow
[11:22:42] topranks: we will take care of those remotely
[11:23:04] ok cool... yeah was looking at drmrs there are 2, but 1 probably fine
[11:32:07] we should think about the timing of the change too
[11:32:36] given that 198.35.27.27 is already serving, I think we can merge the update in our zone file as soon as we let markmonitor update it
[11:33:11] then that way we can cover both ns2 pointing to 91.198.174.239 and 198.35.27.27
[11:33:14] any thoughts on that?
[11:33:30] (SystemdUnitFailed) firing: (4) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:33:46] and then depending on whether the TTLs expired (or not), requests can come to either the anycasted IP or the current ns2 one
[11:36:50] sukhe: yeah I'd thought about that, I don't think there is any constraint on the timing
[11:37:21] as you say we can serve clients on both IPs simultaneously right now, so I think we can make the change and let caches time out naturally
[11:38:06] if a given resolver gets an inconsistent IP back from the .org auth servers versus us it shouldn't matter, it will try one or other of them and both should work
[11:39:16] btw I have provisionally allocated 185.15.59.231/32 in Netbox for the new ns2 IP we'll serve from dns3003/3004 post-migration
[11:40:41] cool thanks :)
[11:40:52] I will merge our change once we get a confirmation from markmonitor
[11:40:58] just in case™
[11:41:04] https://gerrit.wikimedia.org/r/q/topic:T343942
[12:06:59] Mail, Data-Platform-SRE, Infrastructure-Foundations: kerberos manage_principals.py emails go to spam - https://phabricator.wikimedia.org/T318155 (BTullis) >>! In T318155#9007691, @MoritzMuehlenhoff wrote: > Maybe a very quick fix with immediate impact is to simply move away from using the local host...
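During the overlap window discussed above, a resolver may hold either the old ns2 address (91.198.174.239) or the anycast one (198.35.27.27), so both must answer authoritatively. A minimal way to spot-check that is to send the same query to each address and inspect the reply header. This is a sketch using only the Python standard library; `build_query` and `answers_authoritatively` are hypothetical helpers, not an existing Wikimedia tool, and they only check the header flags (QR, AA, RCODE), not the answer records themselves.

```python
import random
import socket
import struct


def build_query(name: str, txid: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (RFC 1035 wire format) for `name`.

    qtype 1 asks for an A record. The RD (recursion desired) flag is left
    clear because the query goes straight to an authoritative server.
    """
    # Header: ID, flags (all zero = standard query), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS 1 (IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def answers_authoritatively(server_ip: str, name: str, timeout: float = 2.0) -> bool:
    """Send one UDP query to server_ip:53 and check the reply header.

    Returns True only if the reply echoes our ID, is a response (QR set),
    carries the Authoritative Answer (AA) bit and has RCODE 0 (NOERROR).
    Raises OSError (socket timeouts included) if nothing comes back.
    """
    txid = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, txid), (server_ip, 53))
        reply, _addr = sock.recvfrom(512)  # plain DNS over UDP is capped at 512 bytes
    if len(reply) < 12:
        return False
    rid, flags = struct.unpack("!HH", reply[:4])
    is_response = bool(flags & 0x8000)
    authoritative = bool(flags & 0x0400)
    rcode = flags & 0x000F
    return rid == txid and is_response and authoritative and rcode == 0
```

For example, calling `answers_authoritatively(ip, "wikimedia.org")` for each of the two ns2 addresses should return True from both while the registrar glue and the zone's own A record disagree; it needs outbound UDP/53 reachability from wherever it runs.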
[12:12:11] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (jbond) >>! In T320390#9068521, @Jelto wrote: > @jbond @SLyngshede-WMF do you have a idea how to change the name GitLab uses with O...
[12:17:10] netops, Infrastructure-Foundations, SRE: Announce new public IPv6 prefix from Amsterdam for knams migration - https://phabricator.wikimedia.org/T343216 (cmooney) Range is being accepted by Arelion according to their looking glass: ` Router: adm-b6 / Amsterdam (Iron Mountain, Haarlem) Command: show bg...
[12:26:29] netops, Infrastructure-Foundations, SRE: Announce new public IPv6 prefix from Amsterdam for knams migration - https://phabricator.wikimedia.org/T343216 (cmooney) HE also accepting and path I'm taking from home connection: ` core1.ams7.he.net> show ipv6 bgp routes detail 2a02:ec80:300::/48 Number...
[12:28:20] topranks:
[12:28:21] https://gerrit.wikimedia.org/r/c/operations/puppet/+/944875/1/hieradata/common/lvs/interfaces.yaml
[12:28:28] guessing no update is required to this?
[12:28:32] asking because of the changes in https://netbox.wikimedia.org/ipam/prefixes/739/ip-addresses/
[12:28:35] https://netbox.wikimedia.org/ipam/prefixes/739/ip-addresses/
[12:29:13] sukhe: no changes required to that patch no
[12:29:24] the public vlan ranges hosts sit on are staying the same
[12:29:33] I just had to change the range used for VIPs themselves
[12:29:39] ok thanks :)
[12:29:54] just sanity-checking stuff in my own head now that ns2 is out of the way temporarily
[12:30:47] netops, Infrastructure-Foundations, SRE: Announce new public IPv6 prefix from Amsterdam for knams migration - https://phabricator.wikimedia.org/T343216 (cmooney) Reachable from VPS in the UK although not sure exactly how it's coming in to us: ` root@uk:~# mtr -z -b -w -c 10 2a02:ec80:300:ffff::187 St...
[12:33:30] (SystemdUnitFailed) firing: (4) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:34:35] netops, Infrastructure-Foundations, SRE: Announce new public IPv6 prefix from Amsterdam for knams migration - https://phabricator.wikimedia.org/T343216 (cmooney) Also accepted by Liberty Global. They also see a transit route via Tele2 (AS1257) so getting picked up there, as well as from Deutsche Tel...
[12:34:54] netops, Infrastructure-Foundations, SRE, Patch-For-Review: New IP and Vlan allocations for esams knams move - https://phabricator.wikimedia.org/T343214 (cmooney)
[12:35:21] netops, Infrastructure-Foundations, SRE: Announce new public IPv6 prefix from Amsterdam for knams migration - https://phabricator.wikimedia.org/T343216 (cmooney) Open→Resolved
[12:49:14] jbond, moritzm, btullis, there are a few of your hosts alerting in https://netbox.wikimedia.org/extras/reports/results/4882559/
[12:49:31] trying to have a clean slate for next week's work
[12:51:54] XioNoX: the puppetserver/db ones are missing because they are in a different puppetdb. The best way to fix that would be https://gerrit.wikimedia.org/r/c/operations/puppet/+/940384 but I still need to check where we are on that one (still catching up on emails)
[12:52:22] XioNoX: Thanks, I'm mid-reimage of an-worker1092 - I think it should probably sort itself out when the cookbook finishes, but I'll check later.
[12:52:57] ok, cool, thanks!
[12:57:37] netops, Infrastructure-Foundations, SRE, Patch-For-Review, User-jbond: Investigate the potential benefits of BGPalerter - https://phabricator.wikimedia.org/T230600 (ayounsi) Open→Resolved a:jbond All done! Assigned to jbond as he did most of the work!
[13:27:02] CAS-SSO, Infrastructure-Foundations, SRE, collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (brennen) > But users may not want to have their full name (cn?) in GitLab displayed (for example Rando McRandomface Jr). So the ui...
[13:34:20] sukhe: sorry, got a bit more work for you: https://phabricator.wikimedia.org/T278823#9083690 (/cc topranks)
[13:35:18] I think first we need to figure out if we want to work with a whitelist (like the current status quo) or a blacklist (easier to maintain)
[13:36:14] we don't tend to get the amplification-based volumetric attacks too often
[13:36:32] but I'd still be fairly wary of having a looser policy here, for instance for ntp
[13:36:34] yeah we used to
[13:36:54] yeah and nothing to say we may not in future
[13:36:59] of course
[13:37:31] XioNoX: looking shortly!
[13:37:31] if we do they'll likely target the lvs vips, text-lb etc, so best if cloudflare does not allow udp (ntp/dns) to those
[13:37:54] which I guess suggests maintaining the IP-based whitelist of what is allowed to get those protocols?
[13:38:23] topranks: yeah, that's my suggestion, to move to a blacklist instead of the current whitelist
[13:38:54] ok, so blacklist the udp protocols from say the text-lb vip etc?
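The allow-list versus block-list choice raised here can be made concrete with a small model of which inbound UDP DNS/NTP destinations each policy lets through. This is a minimal sketch under stated assumptions: IPv4 only, 185.15.59.224/27 stands in for the VIP range, the two nameserver /32s are taken from addresses quoted elsewhere in this log, and the function names are hypothetical rather than part of any firewall API.

```python
import ipaddress
from typing import Iterable

# IPv4 only, matching the v4-only filtering mentioned in the log.
Net = ipaddress.IPv4Network


def _covered(dst: str, nets: Iterable[Net]) -> bool:
    """True if dst falls inside any of the given networks."""
    addr = ipaddress.ip_address(dst)
    return any(addr in net for net in nets)


def whitelist_permits(dst: str, allowed: Iterable[Net]) -> bool:
    """Allow-list model: inbound UDP DNS/NTP passes only to listed destinations."""
    return _covered(dst, allowed)


def blacklist_permits(dst: str, blocked: Iterable[Net]) -> bool:
    """Block-list model: inbound UDP DNS/NTP passes unless the destination is listed."""
    return not _covered(dst, blocked)


if __name__ == "__main__":
    # Hypothetical lists: two nameserver /32s versus the assumed VIP /27.
    allowed = [ipaddress.ip_network(c) for c in ("185.15.59.231/32", "208.80.154.238/32")]
    blocked = [ipaddress.ip_network("185.15.59.224/27")]
    for dst in ("208.80.154.238", "185.15.59.231", "185.15.59.226", "192.0.2.10"):
        print(dst, "whitelist:", whitelist_permits(dst, allowed),
              "blacklist:", blacklist_permits(dst, blocked))
```

Running it shows the trade-off: 192.0.2.10, a documentation address standing in for a host nobody remembered to list, is protected only by the allow-list, while a plain block on the /27 would also catch 185.15.59.231, a nameserver address inside that range.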
[13:39:13] we blacklist DNS/UDP for all the VIP ranges, like .224/27
[13:39:38] er, the NS are in them so that doesn't work
[13:39:41] I just allocated the new IP for ns2.wikimedia.org from that
[13:39:43] haha yeah
[13:39:56] or whitelist the NS because they don't change
[13:40:01] blacklist the VIPs
[13:40:11] I guess both approaches work
[13:40:39] what I want to avoid (to be checked with traffic) is having to update individual servers' IPs
[13:40:48] because we forgot and we will forget again
[13:40:53] question is if it's easier to maintain a whitelist of IPs that we want to allow DNS/NTP to
[13:41:03] or maintain a blacklist of IPs we want to block DNS/NTP to
[13:41:15] but on the other hand the current setup (whitelist) protects a wider "area"
[13:41:41] definitely the blacklist is easier but protects less
[13:41:41] yeah overall it's slightly better, but lvs IPs are almost certainly the ones that would be targets
[13:41:58] yep
[13:42:01] Can we "whitelist" the NS IP, then have a blacklist for the remainder of the /27?
[13:42:06] XioNoX: still have to go through the ticket but if it's a matter of updating IPs, we have been doing that in some recent reimages as well and just adding it to the process
[13:42:12] yeah that's what I had in mind
[13:42:13] but I will read carefully what the ticket is about and comment
[13:43:03] ok well I'm happy with either approach, having the /27 does sound like it might be easier to keep on top of the blacklist
[13:44:07] that also allows us to get rid of the cloud specific rules
[13:49:46] XioNoX: some anycast related patches, because the setup is still broken and I want to fix it before next week
[13:50:01] sukhe: sounds good
[13:50:02] just wanted to let you know why you are seeing them now, don't mean to complicate things
[13:50:09] but we need to fix them, sadly
[13:50:26] this should be the last one, I don't see any other errors
[13:56:41] XioNoX: the NTP/DNS rules can be compressed into one? ip.proto eq "udp" and udp.dstport eq 53 && ip.proto eq "ntp" and udp.dstport eq 123, on the same IPs
[13:58:28] topranks, sukhe: I just sent a proposal https://phabricator.wikimedia.org/T278823#9083785
[14:04:28] overall LGTM, I wonder if the first rule isn't just a bit too permissive though
[14:04:31] (replied on task)
[14:10:29] topranks: yeah replied too, we can remove it
[14:10:44] XioNoX:
[14:10:45] it would defeat the whole thing
[14:10:46] ip.dst in {185.15.59.231 208.80.154.238 208.80.153.231 198.35.27.27} and ip.proto eq "udp" and udp.dstport eq 53
[14:11:06] should also include the DNS hosts themselves?
[14:11:40] sukhe: no, as they're not being blocked in the block rule below
[14:11:59] 198.35.27.27 neither, but I put it there for the sake of completeness
[14:12:03] queries from the internet should only be sent towards the nsX IPs, not the various hosts' actual public IPs
[14:14:22] right, but I was thinking about the anycasted IP here, because now we will hit all hosts
[14:14:29] but then the VIP comes into the picture
[14:14:49] note that the rules are also only applied to inbound traffic
[14:15:08] yes I assumed that
[14:15:09] and even if they hit various servers, it's the destination VIP that matters
[14:15:25] right, which is covered above, so OK
[14:15:36] thinking if we are missing anything else
[14:16:30] I took the border-in firewall filter as an example to double-check things
[14:16:41] we could block more things but better to keep it simple
[14:16:55] https://github.com/wikimedia/operations-homer-public/blob/master/policies/cr-border-in.yaml you mean?
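The question at 13:56:41 about folding the NTP and DNS rules into one can be sketched as follows. As typed, joining the two clauses with `&&` would require a packet to go to UDP port 53 and port 123 at once, so it could never match; NTP also runs over UDP, so both clauses need `ip.proto eq "udp"`, and the two ports can share one rule through a set. The sketch assumes ordered rules where the first match wins, assumes the rule language accepts `in { ... }` sets on `udp.dstport` and CIDRs inside the `ip.dst` set (the quoted allow rule already uses an `ip.dst` set), and takes 185.15.59.224/27 as the VIP range purely as an illustration inferred from the ".224/27" mention. `Rule`, `evaluate` and `expression` are hypothetical helpers, not part of any Cloudflare or Homer API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One inbound filter rule (hypothetical model, IPv4 only)."""
    action: str        # "allow" or "block"
    networks: tuple    # destination networks as ipaddress.IPv4Network
    ports: frozenset   # UDP destination ports the rule covers

    def matches(self, dst: str, proto: str, dport: int) -> bool:
        addr = ipaddress.ip_address(dst)
        return (proto == "udp" and dport in self.ports
                and any(addr in net for net in self.networks))

    def expression(self) -> str:
        """Render the rule in the style of the expression quoted in the log."""
        dsts = " ".join(
            str(net.network_address) if net.prefixlen == 32 else str(net)
            for net in self.networks
        )
        ports = " ".join(str(p) for p in sorted(self.ports))
        return f'ip.dst in {{{dsts}}} and ip.proto eq "udp" and udp.dstport in {{{ports}}}'


def _nets(*cidrs: str) -> tuple:
    return tuple(ipaddress.ip_network(c) for c in cidrs)


RULES = [
    # Allow DNS to the nameserver addresses listed at 14:10:46 ...
    Rule("allow", _nets("185.15.59.231/32", "208.80.154.238/32",
                        "208.80.153.231/32", "198.35.27.27/32"), frozenset({53})),
    # ... then block DNS and NTP to the rest of the (assumed) VIP /27.
    Rule("block", _nets("185.15.59.224/27"), frozenset({53, 123})),
]


def evaluate(rules, dst: str, proto: str, dport: int, default: str = "allow") -> str:
    """First matching rule wins; anything unmatched gets `default`."""
    for rule in rules:
        if rule.matches(dst, proto, dport):
            return rule.action
    return default
```

A single set-based block rule covers both protocols on the same addresses, and keeping the allow rule limited to port 53 means a nameserver address inside the /27 still refuses inbound NTP.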
[14:16:56] it's the volumetric stuff that we want to catch outside of our network
[14:17:02] yep
[14:17:04] looking
[14:17:27] https://github.com/wikimedia/operations-homer-public/blob/master/policies/cr-border-in.yaml#L91-L110
[14:18:00] CF is also v4 only (or we only use it for v6)
[14:18:06] er, for v4
[14:19:23] are you telling it's for v4 or asking :)
[14:20:02] telling
[14:20:11] ok
[14:22:17] I need a second pair of eyes on https://gerrit.wikimedia.org/r/c/operations/puppet/+/947810 please :)
[14:22:26] we will have to merge it when we update the ns2 records
[14:27:13] ty /\
[14:29:19] that's probably the very last step
[14:29:36] yep!
[14:29:36] maybe after the DC is unreachable
[14:29:49] will check once more to be absolutely sure
[14:30:16] yeah we will know when dns3001/2 stop receiving traffic and see an increase elsewhere
[14:30:40] well they still might from the anycasted IP but yeah, should spread around
[14:31:11] sukhe: when we depool esams, we will also stop advertising the anycast ranges from there
[14:31:25] including the doh range
[14:31:34] so it won't see any prod traffic
[14:55:56] XioNoX: dns depool though first right? in which case it doesn't affect this
[14:56:02] full site depool yeah
[14:56:38] but esams is still online till Sunday though?
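The esams depool sequence discussed here (monitoring downtime, then DNS depool, then withdrawing the anycast advertisements, as spelled out in the next messages) can be written down as an ordered checklist that refuses to run steps out of order and reports what traffic should still reach the site. This is a toy model: the traffic categories are a simplification introduced for illustration, and `traffic_after` is a hypothetical helper, not a cookbook.

```python
# Toy model (hypothetical categories, not a real tool) of the esams depool
# order: each step lists the traffic it is expected to remove from the site.
STEPS = (
    ("monitoring downtime", frozenset()),                            # silences alerts only
    ("dns depool", frozenset({"dns-steered user traffic"})),         # users stop resolving to esams
    ("anycast depool", frozenset({"anycast ranges (incl. DoH)"})),   # advertisements withdrawn
)
ALL_TRAFFIC = frozenset(
    {"dns-steered user traffic", "anycast ranges (incl. DoH)", "background noise"}
)


def traffic_after(steps_done: list[str]) -> set[str]:
    """Return the traffic still expected at the site after `steps_done`.

    Raises ValueError if the steps are not a prefix of the agreed order.
    """
    expected = [name for name, _ in STEPS[: len(steps_done)]]
    if steps_done != expected:
        raise ValueError(f"out of order: got {steps_done!r}, expected {expected!r}")
    remaining = set(ALL_TRAFFIC)
    for _name, removed in STEPS[: len(steps_done)]:
        remaining -= removed
    return remaining
```

After all three steps only background noise (internet scans and the like) is left, which matches the expectation stated in the log.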
[14:58:54] sukhe: yeah
[14:59:30] 1/ monitoring downtime, 2/ dns depool, 3/ anycast depool
[15:04:18] right
[15:04:40] so doh range won't see prod traffic in 3)
[15:07:03] correct
[15:07:21] esams shouldn't have any prod traffic after 3
[15:07:30] only background noise, internet scans, etc
[15:07:33] right :)
[15:07:38] ssh to bast
[16:34:28] (SystemdUnitFailed) firing: (3) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:47:44] Mail, Data-Platform-SRE, Infrastructure-Foundations: kerberos manage_principals.py emails go to spam - https://phabricator.wikimedia.org/T318155 (BTullis) Open→Resolved a:BTullis I haven't yet tested this, but I'll be on the lookout for any improvements the next time we have to create a...
[19:19:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:29:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:33:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:34:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:38:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:39:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:43:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:44:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:48:33] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:49:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:53:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:54:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:59:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:04:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:09:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:18:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:19:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:24:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:28:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:34:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:38:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:39:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:48:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:58:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:59:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:13:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:14:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:28:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:29:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:38:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:39:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:54:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:58:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:08:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:13:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:24:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:29:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:34:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:38:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:54:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:59:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:08:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:09:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:14:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:18:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:19:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:23:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:24:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:29:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:33:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:39:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:43:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:44:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:48:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:53:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:54:29] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:58:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:59:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed