[06:39:22] Traffic, SRE, serviceops, Platform Team Initiatives (API Gateway): Handle edge cache invalidation for the api gateway - https://phabricator.wikimedia.org/T324200 (Joe) p:Triage→High
[07:04:39] Traffic, SRE, serviceops, Platform Team Initiatives (API Gateway): Handle edge cache invalidation for the api gateway - https://phabricator.wikimedia.org/T324200 (Joe)
[07:09:18] netops, DC-Ops, Infrastructure-Foundations, SRE: Access port speed <= 100Mbps False positives - https://phabricator.wikimedia.org/T336511 (ayounsi) The issue is that the check is run from the switch side, and for the switch the port is up `Physical interface: ge-3/0/22, Enabled, Physical link is...
[07:10:15] Traffic, SRE, serviceops, Platform Team Initiatives (API Gateway): Handle edge cache invalidation for the api gateway - https://phabricator.wikimedia.org/T324200 (Joe) My idea for implementing this is as follows: - Create a benthos container - Add a release containing a `Deployment` with N replic...
[10:05:06] netops, DC-Ops, Infrastructure-Foundations, SRE: Access port speed <= 100Mbps False positives - https://phabricator.wikimedia.org/T336511 (jbond) WARNING: wild speculation > Is it possible that the server turns its interfaces off when the server is off? I guess if it has wake on lan, or some ty...
[10:38:48] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (cmooney) One question did arise to me, I'll mention it here but not sure we need to focus on it, at least initially. Shou...
[10:39:25] Traffic, WMF-Legal, Patch-For-Review, Performance-Team (Radar), Privacy: Add no-transform to Cache-Control header - https://phabricator.wikimedia.org/T218618 (dr0ptp4kt) @BCornwall thanks for the prompt - no strong opinion here. I'm looping @SCherukuwada and @ovasileva in case they have any t...
[11:08:53] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (Volans) I've seen that option and decided that was not relevant for new host's ztp, but lmk if we need it too. The general...
[11:11:01] netops, Infrastructure-Foundations, SRE, cloud-services-team: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (aborrero)
[11:11:35] netops, Infrastructure-Foundations, SRE, cloud-services-team: cloudgw: review security policy for edge network - https://phabricator.wikimedia.org/T336368 (aborrero) Open→Resolved Fixed! thanks
[11:17:06] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (cmooney) >>! In T336485#8847055, @Volans wrote: > The general usage for that seems to me more for a "reimage" concept of u...
[11:21:45] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (ayounsi) It would be useful during the initial provisioning to have the device running the Junos version we want on day 1....
[12:03:44] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (cmooney) ######Routing issue I hit an issue with the new spines in that the overlay loopback address was not reachable when they...
[12:26:08] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (Volans) Is it ok to start testing without it? Based on how we want the workflow to go we would need a change in Spicerack...
[12:42:24] Traffic, SRE, serviceops, Platform Team Initiatives (API Gateway): Handle edge cache invalidation for the api gateway - https://phabricator.wikimedia.org/T324200 (fgiunchedi) >>! In T324200#8846715, @Joe wrote: > My idea for implementing this is as follows: > - Create a benthos container > - Add...
[13:20:56] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (ayounsi) sgtm as it's an additional feature and to prevent scope creep but might be worth looking at implementing it soone...
[14:02:44] netops, Infrastructure-Foundations, SRE, observability: Investigate Junos Prometheus exporter - https://phabricator.wikimedia.org/T333210 (ayounsi) An alternative (or complement) here would be to go the gNMI way, probably through gNMIc https://github.com/openconfig/gnmic https://www.youtube.com/w...
[14:10:15] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (Papaul) I do agree that we can also have the Junos image for upgrade during the process. Our first goal here was to have...
[14:28:31] Traffic, SRE, ops-codfw, Patch-For-Review: Q4:rack/decom codfw unified decommission task - https://phabricator.wikimedia.org/T335777 (ssingh)
[14:35:59] Traffic, SRE, ops-codfw, Patch-For-Review: Q4:rack/decom codfw unified decommission task - https://phabricator.wikimedia.org/T335777 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin2002 for hosts: `dns2001.wikimedia.wmnet` - dns2001.wikimedia.wmnet (**FAIL**) - Down...
[15:03:07] Traffic, SRE, ops-codfw, Patch-For-Review: Q4:rack/decom codfw unified decommission task - https://phabricator.wikimedia.org/T335777 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin2002 for hosts: `dns2001.wikimedia.org` - dns2001.wikimedia.org (**FAIL**) - //Unable...
[15:08:13] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[15:09:11] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed w...
[15:09:35] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (Volans) Do we want to hardcode that in the dhcp settings? Or better to pass it dynamically to the cookbook? Based on that...
[15:15:29] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (Volans)
[15:21:04] Traffic, MediaWiki-Parser, SRE: Varnish 503 errors on page with large number of flag icons. - https://phabricator.wikimedia.org/T267804 (Dzahn)
[15:21:49] netops, Infrastructure-Foundations, SRE: Servers exposing incorrect LLDP info - https://phabricator.wikimedia.org/T250367 (Dzahn)
[15:38:37] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudswift1001.eqiad.wmnet with OS...
[15:52:35] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: cloudservices[2004/2005]-dev & cloudweb2002-dev: connect them to cloudsw so they can have cloud-private vlan - https://phabricator.wikimedia.org/T336587 (aborrero)
[15:55:06] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[15:55:32] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: cloudservices[2004/2005]-dev & cloudweb2002-dev: connect them to cloudsw so they can have cloud-private vlan - https://phabricator.wikimedia.org/T336587 (cmooney)
[15:55:39] netops, Infrastructure-Foundations, SRE, cloud-services-team: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (cmooney)
[15:55:53] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed w...
[15:56:03] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[15:56:23] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: cloudservices[2004/2005]-dev & cloudweb2002-dev: connect them to cloudsw so they can have cloud-private vlan - https://phabricator.wikimedia.org/T336587 (aborrero) p:Triage→Medium
[16:00:24] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed w...
[16:02:38] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: cloudservices[2004/2005]-dev & cloudweb2002-dev: connect them to cloudsw so they can have cloud-private vlan - https://phabricator.wikimedia.org/T336587 (aborrero) a:Papaul
[16:04:01] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[16:06:35] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero) Current idea that has gained some momentum as part of {T297596} and {T324992}: * hook the cloudser...
[16:08:23] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed w...
[16:08:39] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye
[16:13:43] Traffic, Infrastructure-Foundations, SRE, ops-codfw: Reimaging lvs2012 fails as the host is unreachable from cumin2002 - https://phabricator.wikimedia.org/T336428 (Volans) >>! In T336428#8847879, @gerritbot wrote: > Change 919358 **merged** by BBlack: > %%%[operations/puppet@production] insetup::...
[16:14:45] netops, Infrastructure-Foundations, SRE, SRE-tools, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (cmooney) >>! In T336485#8847630, @Volans wrote: > Do we want to hardcode that in the dhcp settings? Or better to pass it d...
[16:19:42] Traffic, Infrastructure-Foundations, SRE, ops-codfw: Reimaging lvs2012 fails as the host is unreachable from cumin2002 - https://phabricator.wikimedia.org/T336428 (cmooney) >>! In T336428#8844635, @ssingh wrote: > I think the more probable cause is the switch issue @cmooney fixed above and while...
[16:23:45] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (Andrew) This plan sounds OK to me. We could also move the recursors onto VMs, at which point they'd need to be able to a...
[16:34:58] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudswift1001.eqiad.wmnet with OS bul...
[16:48:38] Traffic, DC-Ops, SRE, ops-codfw, Patch-For-Review: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye completed:...
[17:04:58] Traffic, Infrastructure-Foundations, SRE: Reimaging cookbook should force a Puppet run on the Icinga host - https://phabricator.wikimedia.org/T336593 (ssingh)
[17:05:28] Traffic, Infrastructure-Foundations, SRE: Reimaging cookbook should force a Puppet run on the Icinga host - https://phabricator.wikimedia.org/T336593 (ssingh) p:Triage→Low
[17:10:51] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudswift1001.eqiad.wmnet with OS...
[17:31:59] Traffic: varnish-frontend-fetcherr: Assert error in vslc_vtx_next, 100% CPU usage - https://phabricator.wikimedia.org/T253093 (ssingh) Open→Resolved We have bumped the size of the shared memory log and rolled out the changes to all sites by restarting varnish-frontend. Marking this as closed.
[18:08:18] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudswift1001.eqiad.wmnet with OS...
[18:08:28] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudswift1001.eqiad.wmnet with OS bul...
[18:13:14] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudswift1001.eqiad.wmnet with OS...
[18:13:20] Traffic, Infrastructure-Foundations, SRE: Reimaging cookbook should force a Puppet run on the Icinga host - https://phabricator.wikimedia.org/T336593 (Volans) The reimage cookbook calls the downtime one with the `--force-puppet` flag, see https://gerrit.wikimedia.org/r/plugins/gitiles/operations/cookb...
[18:13:30] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudswift1001.eqiad.wmnet with OS bul...
[18:58:17] Traffic, Infrastructure-Foundations, SRE, ops-codfw: Reimaging lvs2012 fails as the host is unreachable from cumin2002 - https://phabricator.wikimedia.org/T336428 (Southparkfan) >>! In T336428#8844099, @cmooney wrote: > Ok I think I see what the issue is. Looking at the [[ https://www.kernel.org...
[19:22:15] Traffic, Infrastructure-Foundations, SRE: Reimaging cookbook should force a Puppet run on the Icinga host - https://phabricator.wikimedia.org/T336593 (ssingh) >>! In T336593#8848181, @Volans wrote: > The reimage cookbook calls the downtime one with the `--force-puppet` flag, see https://gerrit.wikimed...
[19:22:40] Traffic, Infrastructure-Foundations, SRE: Reimaging cookbook not forcing a Puppet agent run on lvs2011, lvs2012 - https://phabricator.wikimedia.org/T336593 (ssingh)
[19:42:41] Traffic, netops, DBA, Data-Platform-SRE, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (colewhite)
[21:06:01] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudswift1001.eqiad.wmnet with OS...
[21:06:10] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudswift1001.eqiad.wmnet with OS bus...
[22:18:44] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudswift1001.eqiad.wmnet with OS b...
[22:32:28] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudswift1001.eqiad.wmnet with OS bulls...
[22:35:32] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudswift1001.eqiad.wmnet with OS b...
[22:54:44] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (Papaul) a:Papaul→Jhancock.wm @Jhancock.wm was trying to install the OS on cloudswift1001 and the server was not getting DHCP a...
[22:59:57] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudswift1001.eqiad.wmnet with OS bulls...