[00:02:43] (DiskSpace) resolved: Disk space idp2002:9100:/ 5.262% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[00:26:13] (DiskSpace) firing: Disk space idp2002:9100:/ 5.992% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[01:26:13] (DiskSpace) resolved: Disk space idp2002:9100:/ 5.943% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[03:48:47] (SystemdUnitFailed) firing: wmf_auto_restart_uwsgi-puppetdb-microservice.service Failed on puppetdb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:26:13] (DiskSpace) firing: Disk space idp2002:9100:/ 5.997% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[06:27:53] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ayounsi) Indeed, looks about right :) For Puppet, if we can change the Hiera merge strategy to `hash` for `profile::bird::adve...
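For readers unfamiliar with the Hiera `hash` merge strategy raised at 06:27:53: by default a Hiera lookup uses `first`, which returns only the highest-priority value found. `hash` instead combines the top-level keys found at every hierarchy level; on a key collision the highest-priority level wins, and nested hashes are not merged recursively (that is what `deep` does). The Python sketch below models that behaviour under those assumptions. It is not Puppet's implementation, and the key names and addresses are hypothetical, since the real Hiera key is truncated in the log.

```python
from typing import Any


def hiera_hash_merge(levels: list[dict[str, Any]]) -> dict[str, Any]:
    """Model of Hiera's `hash` merge: shallow-merge the hashes found at
    each hierarchy level.

    `levels` is ordered highest priority first, the order in which Hiera
    walks its hierarchy. On a top-level key collision the highest-priority
    value replaces the others wholesale; nested hashes are not merged.
    """
    merged: dict[str, Any] = {}
    # Apply the lowest-priority level first so higher levels overwrite it.
    for level in reversed(levels):
        merged.update(level)
    return merged


# Hypothetical per-level data (illustrative names and documentation IPs):
# a host-level entry adds a unicast VIP next to a role-level anycast prefix.
host_level = {"ns0-vip": {"prefix": "192.0.2.10/32"}}
role_level = {"anycast-dns": {"prefix": "198.51.100.1/32"}}
common_level = {"anycast-dns": {"prefix": "203.0.113.1/32", "check": "dns"}}

merged = hiera_hash_merge([host_level, role_level, common_level])
print(merged)
# With the default `first` strategy only the host-level hash would be returned.
```

Running it prints both the host-level and the role-level entries. The `check` value from the common level is dropped, because a shallow `hash` merge replaces the whole `anycast-dns` value rather than merging inside it.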
[06:41:13] (DiskSpace) resolved: Disk space idp2002:9100:/ 5.817% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[07:12:18] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (dcaro)
[07:23:13] (DiskSpace) firing: Disk space idp2002:9100:/ 5.976% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[07:27:29] netops, Infrastructure-Foundations, SRE: Remove static routes for anycast prefixes - https://phabricator.wikimedia.org/T347494 (ayounsi) Open→Resolved All done.
[07:48:47] (SystemdUnitFailed) firing: wmf_auto_restart_uwsgi-puppetdb-microservice.service Failed on puppetdb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:53:47] (SystemdUnitFailed) resolved: wmf_auto_restart_uwsgi-puppetdb-microservice.service Failed on puppetdb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:58:13] (DiskSpace) resolved: Disk space idp2002:9100:/ 5.683% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:12:29] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (jbond) Proposal looks good to me, minor nit would be to rename `ACAST_PS_ADVERTISE` to remove references to anycast to avoid con...
[10:15:29] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (cmooney) > Otherwise, it should be fairly straightforward: we add the VIP the same way we do for the anycast IPs, making sure to...
[10:40:08] netops, Infrastructure-Foundations, SRE: Firewall filter blocking traceroute in underlay QFX5120 EVPN - https://phabricator.wikimedia.org/T348120 (cmooney) p:Triage→Low
[11:00:44] netops, Infrastructure-Foundations, SRE, ops-codfw: Server moves in codfw to support switch numbering scheme - https://phabricator.wikimedia.org/T348125 (cmooney) p:Triage→Medium
[11:08:25] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw row A-B server moves - port-block constraint / numbering - https://phabricator.wikimedia.org/T348125 (cmooney)
[11:25:58] netops, Infrastructure-Foundations, SRE: Create automation to move servers in Netbox from old to new switch - https://phabricator.wikimedia.org/T348129 (cmooney) p:Triage→Medium
[11:26:59] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw row A-B migration - non-standard device moves - https://phabricator.wikimedia.org/T348128 (cmooney)
[11:38:07] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ayounsi) `ACAST_PS_ADVERTISE` is hardcoded in [[ https://github.com/unixsurfer/anycast_healthchecker | anycast_healthchecker ]]...
[12:11:25] netops, Infrastructure-Foundations, SRE: Firewall filter blocking traceroute in underlay QFX5120 EVPN - https://phabricator.wikimedia.org/T348120 (ayounsi) Nice rabbit hole! I found this: https://www.reddit.com/r/Juniper/comments/g12qxh/the_right_way_to_allow_traceroute_in_re_filter/ So it's possible...
[12:47:41] netops, Infrastructure-Foundations, SRE: cr2-esams:FPC0 Parity error - https://phabricator.wikimedia.org/T318783 (cmooney) Open→Resolved I am going to close this task, the FPC issue was addressed through card replacement (although we decom'd router in the meantime). Despite my best efforts i...
[13:08:33] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (cmooney) p:Triage→Low
[13:08:57] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (cmooney)
[13:34:10] netops, Cloud-VPS, Infrastructure-Foundations, SRE, Patch-For-Review: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (cmooney)
[13:40:48] netops, Infrastructure-Foundations, SRE: Create automation to move servers in Netbox from old to new switch - https://phabricator.wikimedia.org/T348129 (Papaul) @cmooney this should be a complication if we did have a mixed of 1G and 10G servers within the same rack which is not the case. In all exist...
[14:00:39] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ssingh) Thanks everyone for the discussion and feedback above! So it seems like two main points have come up above: 1. We can c...
[14:01:01] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (jbond) >ACAST_PS_ADVERTISE is hardcoded in anycast_healthchecker (the tool we use to monitor services). in that case agree its t...
[14:03:12] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ssingh) >>! In T348041#9224990, @jbond wrote: >>ACAST_PS_ADVERTISE is hardcoded in anycast_healthchecker (the tool we use to mon...
[14:36:46] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ayounsi) Ah right! My bad. Unrelated and maybe a scope creep, but we could also start by advertising a unicast v6 IP to validat...
[14:59:19] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw row A-B server moves - port-block constraint / numbering - https://phabricator.wikimedia.org/T348125 (cmooney) Open→Resolved @papaul answered in T348129#9224878, seems like we're in a good place given previous rack assignment as '1...
[14:59:27] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (cmooney)
[15:01:41] netops, Infrastructure-Foundations, SRE: Create automation to move servers in Netbox from old to new switch - https://phabricator.wikimedia.org/T348129 (cmooney) >>! In T348129#9224878, @Papaul wrote: > @cmooney this should be a complication if we did have a mixed of 1G and 10G servers within the sam...
[15:12:46] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (cmooney) >>! In T348041#9222035, @ssingh wrote: > We can and probably should have a backup static routes for each of `ns[01]` bu...
[15:23:13] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (cmooney) >>! In T348041#9222035, @ssingh wrote: > We can and probably should have a backup static routes for each of `ns[01]` bu...
[15:25:09] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ayounsi) Oops, I missed some of the comments. * I'm in favor of ditching the statics * Changing the Hiera merge strategy seems...
[15:38:09] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate atlas-codfw from asw-a1-codfw to lsw1-a1-codfw - https://phabricator.wikimedia.org/T348159 (cmooney) p:Triage→Medium
[15:38:29] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate atlas-codfw from asw-a1-codfw to lsw1-a1-codfw - https://phabricator.wikimedia.org/T348159 (cmooney)
[15:38:41] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw row A-B migration - non-standard device moves - https://phabricator.wikimedia.org/T348128 (cmooney)
[15:39:37] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ssingh) >>! In T348041#9225321, @cmooney wrote: >>>! In T348041#9222035, @ssingh wrote: >> We can and probably should have a bac...
[15:41:18] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ssingh) >>! In T348041#9225405, @ayounsi wrote: > Oops, I missed some of the comments. > > * I'm in favor of ditching the stati...
[15:42:34] jbond: sorry for putting you on the spot, but if you have a preference above, please help finalize (not make!) the decision re: the Puppet change in unicasting bird
[15:43:46] sukhe: looking
[15:45:49] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (jbond) >>! In T348041#9225478, @ssingh wrote: >>>! In T348041#9225405, @ayounsi wrote: >> * Changing the Hiera merge strategy s...
[15:45:49] sukhe: sent comment, but +1 to merge strategy
[15:46:02] that seals it, thank you!
[15:46:44] no probs
[15:46:54] netops, Infrastructure-Foundations, SRE, Traffic: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ssingh) For posterity: - no static routes - merge strategy Arzhel mentioned above - I am going to rename `ACAST_PS_ADVERTISE`...
[15:49:00] for anyone else: if there are strong objections to renaming ACAST_PS_ADVERTISE, please let me know
[15:49:12] otherwise I am going to go ahead with that and document it in a separate subtask
[15:50:30] sukhe: i have a minor preference to rename but will leave it to you
[15:51:03] ok thanks
[15:59:20] netops, Infrastructure-Foundations, SRE: Create automation to move servers in Netbox from old to new switch - https://phabricator.wikimedia.org/T348129 (Papaul) I am thinking about something to consider when going servers refresh or new servers
[16:09:41] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate mr1-codfw from asw-a1-codfw to lsw1-a1-codfw - https://phabricator.wikimedia.org/T348164 (cmooney) p:Triage→Medium
[16:11:12] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw row A-B migration - non-standard device moves - https://phabricator.wikimedia.org/T348128 (cmooney)
[16:11:19] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate mr1-codfw from asw-a1-codfw to lsw1-a1-codfw - https://phabricator.wikimedia.org/T348164 (cmooney)
[16:22:53] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate atlas-codfw from asw-a1-codfw to lsw1-a1-codfw - https://phabricator.wikimedia.org/T348159 (ayounsi) Yeah, that's perfect. We can revisit the day it dies and needs to be migrated to a VM.
[16:36:59] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate atlas-codfw from asw-a1-codfw to lsw1-a1-codfw - https://phabricator.wikimedia.org/T348159 (cmooney)
[17:24:58] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: Remove static routes for ns[01] and replace their announcements with bird - https://phabricator.wikimedia.org/T348041 (ssingh)
[18:00:58] netops, Infrastructure-Foundations, SRE, Traffic, ops-codfw: Migrate lvs2011 and lvs2012 to new top-of-rack switches - https://phabricator.wikimedia.org/T348178 (cmooney) p:Triage→Medium
[18:01:47] netops, Infrastructure-Foundations, SRE, Traffic, ops-codfw: Migrate lvs2011 and lvs2012 to new top-of-rack switches - https://phabricator.wikimedia.org/T348178 (cmooney)
[18:01:55] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw row A-B migration - non-standard device moves - https://phabricator.wikimedia.org/T348128 (cmooney)
[20:24:19] hello all; signed up for a wiki dev account... signed in and got "Bitu Error". Asked in Slack and they sent me to a page referencing this IRC channel (haven't used IRC in a looooong time).
[20:41:08] ....crickets. kk. sending email.
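The T348140 entries above change the cloud-instance-transport VLAN subnets from /30 to /29. The log does not state the motivation, but the address arithmetic behind the change is easy to check: a /30 holds 4 addresses (2 usable once the network and broadcast addresses are excluded), while a /29 holds 8 (6 usable). A quick sketch with Python's standard `ipaddress` module; the prefix used is a documentation range standing in for the real transport subnets, which the log does not list.

```python
import ipaddress


def usable_hosts(prefix: str) -> int:
    """Count assignable host addresses in an IPv4 prefix.

    `hosts()` excludes the network and broadcast addresses for prefixes
    shorter than /31, which is what matters for /30 vs /29.
    """
    return len(list(ipaddress.ip_network(prefix).hosts()))


# 192.0.2.0/24 is the TEST-NET-1 documentation range (RFC 5737), used here
# as a stand-in; it is not one of the actual transport subnets.
for prefix in ("192.0.2.0/30", "192.0.2.0/29"):
    total = ipaddress.ip_network(prefix).num_addresses
    print(f"{prefix}: {total} addresses, {usable_hosts(prefix)} usable")
```

This prints `192.0.2.0/30: 4 addresses, 2 usable` followed by `192.0.2.0/29: 8 addresses, 6 usable`.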