[04:37:37] netops, Infrastructure-Foundations, serviceops: TCP retransmissions in eqiad and codfw - https://phabricator.wikimedia.org/T291385 (Marostegui) p:Triage→Medium
[06:51:44] netops, Infrastructure-Foundations, SRE, procurement: Move AMS-IX port to 802.1q tagged and get "private vlan" added - https://phabricator.wikimedia.org/T291407 (ayounsi) a:wiki_willy→ayounsi AMS-IX NOC emailed to schedule the change, with vlan 380 for IX and 381 for NaWas.
[07:08:56] netops, Infrastructure-Foundations, serviceops: TCP retransmissions in eqiad and codfw - https://phabricator.wikimedia.org/T291385 (jijiki) >>! In T291385#7365671, @cmooney wrote: We have been living with this for quite a long time, we can wait a little longer :) > Should we de-pool those two boxes...
[13:19:21] jobo: now that we have an "in progress" status in phab, should we stop using the "in progress" dashboard column? Is there a way to mass-migrate tasks?
[13:23:56] The "in progress" dashboard column has a trigger that updates the status to "in progress", so keep using it as you did until now. I'll update all statuses.
[13:24:22] Puppet, Infrastructure-Foundations: error while resolving custom fact "lldp_neighbors" on ms-be105[1-9], ms-be205[1-6] and relforge100[3-4] - https://phabricator.wikimedia.org/T290984 (joanna_borun) Open→In progress
[13:26:45] jbond: quick one for you, not sure if related to all the changes we did to facter recently
[13:26:59] it seems that for at least some ganeti hosts (I didn't check them all yet)
[13:27:12] facter reports the wrong ip6
[13:27:18] that ends up in the known hosts file on all hosts
[13:27:28] that makes cumin print a warning
[13:27:36] and in moritzm's case makes the reboot cookbook fail
[13:28:03] as an example, on ganeti2025 we got this one as ip6:
[13:28:04] inet6 2620:0:860:104:3673:5aff:fefb:3214/64 scope global mngtmpaddr dynamic
[13:28:07] valid_lft 2591995sec preferred_lft 604795sec
[13:29:16] confirmed for almost all ganeti, cross-checking with OS version
[13:30:33] volans: i think that will be related to a change i made recently. just finishing an email, will check after that. out of curiosity, are both ipaddress6 and networking.ip6 wrong or just the latter?
[13:31:09] checking
[13:31:16] both
[13:31:54] just ganeti3* and ganeti4* are ok, all the others have the issue
[13:32:16] netops, Infrastructure-Foundations, serviceops: TCP retransmissions in eqiad and codfw - https://phabricator.wikimedia.org/T291385 (joanna_borun) Open→In progress
[13:32:18] Seems the "private" NIC has both the Netbox-allocated IP on it, and also an EUI-64 IP?
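The address reported for ganeti2025 has `ff:fe` in the middle of its interface ID (`5aff:fefb`), which is the signature of a modified EUI-64 identifier that SLAAC builds from the NIC's MAC. Below is a minimal sketch using only the standard library. It flags such addresses, recovers the MAC they came from, and picks a global address from `ip -6 addr`-style lines that is not SLAAC-flagged. The helper names are hypothetical and are not part of facter or the Puppet patch discussed here. The selection rule assumes that the Netbox-assigned address is configured statically and therefore lacks the `dynamic`/`mngtmpaddr` flags.

```python
import ipaddress
import re
from typing import Iterable, Optional

# Matches `ip -6 addr` lines such as
# "inet6 2620:0:860:104:3673:5aff:fefb:3214/64 scope global mngtmpaddr dynamic"
INET6_RE = re.compile(r"inet6\s+([0-9a-fA-F:]+)/\d+\s+scope\s+global(.*)")


def is_eui64(addr: str) -> bool:
    """True if the interface ID has ff:fe in bytes 3-4, as EUI-64 IDs do."""
    iid = ipaddress.IPv6Address(addr).packed[8:]
    return iid[3] == 0xFF and iid[4] == 0xFE


def eui64_to_mac(addr: str) -> str:
    """Recover the MAC address an EUI-64 interface ID was derived from."""
    if not is_eui64(addr):
        raise ValueError(f"{addr} does not look like an EUI-64 address")
    iid = ipaddress.IPv6Address(addr).packed[8:]
    # Drop the inserted ff:fe and flip the universal/local bit back.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)


def pick_static_global(lines: Iterable[str]) -> Optional[str]:
    """Return the first global IPv6 address without SLAAC-style flags.

    Assumption (hypothetical policy): the Netbox-assigned address is static,
    so iproute2 prints it without `dynamic` (finite lifetime) or `mngtmpaddr`.
    """
    for line in lines:
        match = INET6_RE.search(line)
        if not match:
            continue
        flags = match.group(2).split()
        if "dynamic" in flags or "mngtmpaddr" in flags:
            continue
        return match.group(1)
    return None


print(eui64_to_mac("2620:0:860:104:3673:5aff:fefb:3214"))  # 34:73:5a:fb:32:14
```

Filtering on the flags, instead of taking the first global address, is what keeps the SLAAC address out of the result. Disabling autoconfiguration on the interface, as suggested later in the channel, would remove the address altogether.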
[13:33:06] netops, Infrastructure-Foundations, SRE, procurement: Move AMS-IX port to 802.1q tagged and get "private vlan" added - https://phabricator.wikimedia.org/T291407 (joanna_borun) Open→In progress
[13:33:27] Mail, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade MXes to Bullseye - https://phabricator.wikimedia.org/T286911 (joanna_borun) Open→In progress
[13:36:48] Puppet, Cloud-VPS, Infrastructure-Foundations, User-jbond, cloud-services-team (Kanban): Audit puppet usage in cloud hosts - https://phabricator.wikimedia.org/T289658 (joanna_borun) Open→In progress
[13:36:55] Puppet, Cloud Services Proposals, Cloud-VPS, Infrastructure-Foundations, and 3 others: Easing pain points caused by divergence between cloudservices and production puppet usecases - https://phabricator.wikimedia.org/T285539 (joanna_borun)
[13:37:16] Puppet, Cloud-VPS, Infrastructure-Foundations, User-jbond, cloud-services-team (Kanban): Normalise hiera default values - https://phabricator.wikimedia.org/T289665 (joanna_borun) Open→In progress
[13:37:24] Puppet, Cloud Services Proposals, Cloud-VPS, Infrastructure-Foundations, and 3 others: Easing pain points caused by divergence between cloudservices and production puppet usecases - https://phabricator.wikimedia.org/T285539 (joanna_borun)
[13:38:22] Puppet, Cloud-VPS, Infrastructure-Foundations, User-jbond, cloud-services-team (Kanban): Gather a list of puppet modules shared between cloud and production - https://phabricator.wikimedia.org/T289666 (joanna_borun) Open→In progress
[13:38:28] Puppet, Cloud Services Proposals, Cloud-VPS, Infrastructure-Foundations, and 3 others: Easing pain points caused by divergence between cloudservices and production puppet usecases - https://phabricator.wikimedia.org/T285539 (joanna_borun)
[13:56:50] netops, Infrastructure-Foundations, SRE, procurement: Move AMS-IX port to 802.1q tagged and get "private vlan" added - https://phabricator.wikimedia.org/T291407 (ayounsi) Updated Netbox, resulting diff: `lang=diff [edit interfaces ae2] + flexible-vlan-tagging; + encapsulation flexible-ethern...
[14:02:58] volans: topranks: can you take a look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/722621
[14:07:56] * volans looking
[14:08:23] jbond: LGTM (far from an expert on the syntax but looks sane).
[14:08:42] But I can't help thinking the issue is the presence of the EUI-64 address itself? Isn't it better to prevent the system from autoconfiguring that? As a test I did a ping out from there, and it defaults to using that IP for comms as opposed to the one we assign from Netbox.
[14:10:05] topranks: i agree and can't quite remember why that's not the case, but the main task is https://phabricator.wikimedia.org/T102099
[14:12:04] Ok thanks! I'll have a read and subscribe. At a glance, the approach you've taken seems best to resolve the issue vol.ans noticed though :)
[14:12:21] ack thanks
[14:14:13] * volans acts like he knows what mngtmpaddr does :-p
[14:14:29] * volans looked it up
[14:14:36] * volans is now an expert! :-P
[14:17:04] lol :P
[14:17:55] volans: is this breaking anything right now? there are a few places where ipaddress6 is used and i just want to double-check this won't be a change in behaviour for them
[14:19:32] jbond: take your time
[14:19:42] it's making cumin print a warning when ssh-ing
[14:19:42] volans: cool thanks
[14:19:47] so in the case of the reboot cookbook
[14:19:48] ack
[14:19:51] it made the uptime() call fail
[14:20:03] because it tried to parse the warning as an uptime
[14:36:19] netops, Infrastructure-Foundations, SRE, procurement: Move AMS-IX port to 802.1q tagged and get "private vlan" added - https://phabricator.wikimedia.org/T291407 (ayounsi) In progress→Resolved All done.
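The reboot cookbook broke because cumin's SSH warning ended up in the text that uptime() tried to parse. Below is a minimal sketch of a more defensive parser. It assumes the remote command is `cat /proc/uptime`, whose output is two numbers: seconds since boot and aggregate idle seconds. `parse_uptime` is a hypothetical helper, not Spicerack's actual implementation.

```python
import re

# /proc/uptime holds two numbers: seconds since boot and aggregate idle seconds.
UPTIME_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+\d+(?:\.\d+)?\s*$")


def parse_uptime(output: str) -> float:
    """Return seconds since boot, skipping non-matching lines such as SSH warnings."""
    for line in output.splitlines():
        match = UPTIME_RE.match(line)
        if match:
            return float(match.group(1))
    raise ValueError(f"no /proc/uptime line found in output: {output!r}")


# Illustrative input: a stray warning line followed by the real data.
raw = "Warning: <ssh host key warning>\n123456.78 987654.32\n"
print(parse_uptime(raw))  # 123456.78
```

Matching the expected shape of the data, instead of parsing the first line, stops stray warnings from being read as an uptime. Fixing the known hosts entries, which is what the merged patch does, removes the warning at its source.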
[14:43:01] topranks, XioNoX: if either of you, when you have time, could comment on https://phabricator.wikimedia.org/T289536 that would be great. TL;DR: is it ok to keep the wikimedia-dns.org zonefile manual, or should it be managed via netbox?
[15:59:55] SRE-tools, Infrastructure-Foundations: Cookbooks: convert wmf-auto-reimage scripts to Cookbooks - https://phabricator.wikimedia.org/T205885 (joanna_borun) Open→In progress
[16:00:01] SRE-tools, Infrastructure-Foundations, SRE, Goal: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (joanna_borun)
[16:00:06] Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578 (joanna_borun) Open→In progress
[16:00:18] netops, Infrastructure-Foundations, SRE: Create an alert for output discards on network devices - https://phabricator.wikimedia.org/T284593 (joanna_borun) Open→In progress
[16:00:35] SRE-tools, homer, netbox, netops, and 3 others: Investigate Capirca - https://phabricator.wikimedia.org/T273865 (joanna_borun) Open→In progress
[16:01:59] netops, Infrastructure-Foundations, SRE: ripe-atlas-codfw is down - https://phabricator.wikimedia.org/T267714 (joanna_borun) Open→In progress
[16:02:15] netbox, Infrastructure-Foundations: Add git-local-changes check for netbox-extras - https://phabricator.wikimedia.org/T250288 (joanna_borun) Open→In progress
[16:02:29] SRE-tools, netbox, Infrastructure-Foundations: Netbox support for svc allocation - https://phabricator.wikimedia.org/T263429 (joanna_borun) Open→In progress
[16:03:23] netbox, Infrastructure-Foundations, IPv6, User-jbond: Some clusters do not have DNS for IPv6 addresses (TRACKING TASK) - https://phabricator.wikimedia.org/T253173 (joanna_borun) Open→In progress
[16:03:38] netbox, Infrastructure-Foundations: Netbox CSV dumps can't be compared - https://phabricator.wikimedia.org/T262671 (joanna_borun) Open→In progress
[16:03:50] netbox, Infrastructure-Foundations: Netbox missing hourly dumps - https://phabricator.wikimedia.org/T262674 (joanna_borun) Open→In progress
[16:04:05] netbox, Infrastructure-Foundations: Manage DHCP from Netbox - https://phabricator.wikimedia.org/T269855 (joanna_borun) Open→In progress
[16:04:23] netbox, Infrastructure-Foundations: Netbox: import from PuppetDB script creates VIP also if exists - https://phabricator.wikimedia.org/T278936 (joanna_borun) Open→In progress
[16:04:50] CAS-SSO, Infrastructure-Foundations, SRE, observability, User-jbond: Icinga Monitoring for CAS - https://phabricator.wikimedia.org/T233935 (joanna_borun) Open→In progress
[16:04:53] CAS-SSO, Infrastructure-Foundations, SRE, Security-Team, User-jbond: Further steps for CAS/web SSO - https://phabricator.wikimedia.org/T233921 (joanna_borun)
[16:05:07] SRE-tools, Infrastructure-Foundations: Spicerack: split wmf-auto-reimage-lib into Spicerack modules - https://phabricator.wikimedia.org/T205884 (joanna_borun) Open→In progress
[16:05:12] SRE-tools, Infrastructure-Foundations, SRE, Goal: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (joanna_borun)
[16:39:19] topranks: volans: got sidetracked, but the ipaddress6 fix is now merged. the hosts files should be good in ~30-60 minutes
[16:43:15] jbond: thanks a lot! great
[16:52:08] jbond: great thanks, I can double-check with a ganeti reboot
[16:52:17] (which exposed the issue)
[16:53:59] thanks
[16:54:36] we can just check the known hosts file content :D
[16:55:16] yep
[16:55:18] -ganeti2026.codfw.wmnet,ganeti2026,10.192.48.61,2620:0:860:104:3673:5aff:fefb:36ac ecdsa-sha2
[16:55:25] +ganeti2026.codfw.wmnet,ganeti2026,10.192.48.61,2620:0:860:104:10:192:48:61 ecdsa-sha2
[16:55:56] it will take 1h from the merge to get fixed for the cookbooks, fwiw
[17:06:13] ack, need to reboot one tomorrow anyway, it's not really any extra effort :-)
[17:06:21] :D
[17:14:45] Thanks for the heads-up j.bond, nice bit of background for me there :)
[19:38:25] Puppet, Infrastructure-Foundations, GitLab (Infrastructure), Patch-For-Review, and 3 others: Puppetise gitlab-ansible playbook - https://phabricator.wikimedia.org/T283076 (brennen)
[19:43:13] CAS-SSO, Infrastructure-Foundations, GitLab (Auth & Access), Patch-For-Review, and 2 others: Open gitlab.wikimedia.org to all users with Wikimedia developer accounts - https://phabricator.wikimedia.org/T288162 (brennen)
[20:06:52] Mail, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade MXes to Bullseye - https://phabricator.wikimedia.org/T286911 (herron) Re: the above patch -- DKIM metrics dropped off on mx2001 beginning yesterday. Our Exim metrics are generated via mtail parsing of the Exim log, and Exim 4.90 in...
[23:41:23] CAS-SSO, Infrastructure-Foundations, Observability-Metrics, SRE, and 3 others: Sign-in links from Grafana dashboards don't work when not signed into SSO - https://phabricator.wikimedia.org/T269272 (RLazarus) Open→Resolved a:jbond Nope, I don't still get the loop described in the origi...
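The known hosts diff from 16:55 shows what the corrected IPv6 address looks like. The IPv4 address 10.192.48.61 reappears as the interface ID `10:192:48:61` under the 2620:0:860:104::/64 prefix, with each decimal octet written verbatim as a hex group. Below is a minimal sketch that reproduces that mapping and cross-checks a known_hosts host field against it. The function names are hypothetical. The mapping rule is inferred from this diff, not taken from the Puppet code.

```python
import ipaddress


def mapped_v6(prefix: str, v4: str) -> ipaddress.IPv6Address:
    """Build prefix::a:b:c:d from IPv4 a.b.c.d, reusing decimal digits as hex."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    iid = 0
    for octet in str(ipaddress.IPv4Address(v4)).split("."):
        # "192" becomes the hex group 0x192, as in the diff above.
        iid = (iid << 16) | int(octet, 16)
    return net.network_address + iid


def check_host_field(field: str, prefix: str) -> bool:
    """True if the IPv6 entry of a known_hosts host field is the mapped one."""
    v4 = v6 = None
    for entry in field.split(","):
        try:
            ip = ipaddress.ip_address(entry)
        except ValueError:
            continue  # a hostname, not an address
        if ip.version == 4:
            v4 = ip
        else:
            v6 = ip
    if v4 is None or v6 is None:
        return False
    return v6 == mapped_v6(prefix, str(v4))


print(mapped_v6("2620:0:860:104::/64", "10.192.48.61"))  # 2620:0:860:104:10:192:48:61
```

Running `check_host_field` on the first field of each known_hosts line would flag entries that still carry the EUI-64 address, such as the removed ganeti2026 line.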