[07:05:01] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host netflow3002.esams.wmnet with OS bookworm
[08:10:57] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host netflow3002.esams.wmnet with OS bookworm completed: - netflow3002 (**WARN**) -...
[08:19:52] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host netflow1002.eqiad.wmnet with OS bookworm
[08:26:56] cheers moritzm
[08:38:37] CAS-SSO, Infrastructure-Foundations, SRE, serviceops-collab, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (jbond) > After fixing the redirect_uri I'm able to login successfully to the admin interface (https://gitlab.wikimedia.org/admin) using...
[08:53:47] Puppet, Infrastructure-Foundations, SRE, Tracking-Neverending: Puppet: tracking catalogs that changes at every run - https://phabricator.wikimedia.org/T191388 (jbond)
[08:53:51] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review: Puppet: tlsproxy localssl default_server make a Notify at each run - https://phabricator.wikimedia.org/T191393 (jbond) Open→Resolved a:jbond This class has now been removed
[09:03:24] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host netflow1002.eqiad.wmnet with OS bookworm completed: - netflow1002 (**PASS**) -...
[09:07:51] netops, Data-Engineering, Infrastructure-Foundations, SRE, Patch-For-Review: Netflow/pmacct: use forwardingStatus - https://phabricator.wikimedia.org/T331707 (ayounsi) `name=IPv4,lang=json { "event_type": "purge", "tag2": 1, "as_src": 48551, "as_dst": 0, "comms": "", "as_path": ""...
[09:08:47] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by aborrero@cumin2002 for hosts: `cloudservices2004-dev` - cloudservic...
[09:10:11] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (MoritzMuehlenhoff) p:Triage→Medium
[09:12:22] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero)
[09:27:16] moritzm: I looked again at the code and it's unfortunately not possible to pass multiple IPs in https://gerrit.wikimedia.org/r/c/operations/puppet/+/921390/comments/ea253f74_bdb6f374
[09:28:06] and good job with the bookworm upgrade, very smooth!
[09:30:23] ah, makes sense. +1d
[09:37:27] moritzm: do you think I need a review from o11y for https://gerrit.wikimedia.org/r/c/operations/puppet/+/921394/ ?
[09:41:46] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero)
[09:44:40] it seems fine to merge, it's just some scraping from six hosts after all. If you'd enable this for a large new cluster I'd wait for their review to loop them in, but seems fine here to just go ahead
[09:53:27] sounds good, thx!
[10:07:01] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: `netflow2002.codfw.wmnet` - netflow2002.codfw.wmnet (**PASS**) - Downtimed host...
[10:07:48] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (MoritzMuehlenhoff)
[10:08:46] SRE-tools, Infrastructure-Foundations, Patch-For-Review: Upgrade Fastnetmon to 1.2.4 - https://phabricator.wikimedia.org/T330884 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff All netflow hosts are migrated to Bookworm and thus FNM 1.2.4
[10:14:31] Mail, Puppet, Infrastructure-Foundations, SRE: Cleanup debconf handling in mailman puppet setup - https://phabricator.wikimedia.org/T144933 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff This can be closed, the current debconf integration appears to be working fine.
[10:18:50] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero) a:aborrero→Papaul hey @Papaul or @Jhancock.wm would you please do the following: * disconnect server eno1 from asw...
[10:23:52] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (cmooney) @aborrero yep exactly, when this is done let me know I'll change the netbox side.
[10:46:21] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero)
[11:56:27] just an FYI all, got another big storm so I may flap
[13:22:38] I created a quick Grafana dashboard with some potentially useful metrics - https://grafana.wikimedia.org/d/jjn9MC_Vk/fastnetmon
[13:23:08] nothing private so far, that might change the day we have actual DDoS data exposed in prometheus
[13:24:11] nice!
[13:32:43] Puppet, Infrastructure-Foundations, SRE, Tracking-Neverending: Puppet: tracking catalogs that changes at every run - https://phabricator.wikimedia.org/T191388 (jbond)
[13:33:22] Puppet, Maps, SRE, SRE-swift-storage: Puppet: tlsproxy localssl default_server make a Notify at each run - https://phabricator.wikimedia.org/T191393 (jbond) Resolved→Open This is still in use by swift and maps
[13:33:35] Puppet, Maps, SRE, SRE-swift-storage: Puppet: tlsproxy localssl default_server make a Notify at each run - https://phabricator.wikimedia.org/T191393 (MatthewVernon) [the class has been put back, because it's still in use]
[13:51:36] Puppet, Infrastructure-Foundations, SRE, Tracking-Neverending: Puppet: tracking catalogs that changes at every run - https://phabricator.wikimedia.org/T191388 (jbond)
[13:52:04] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review, User-jbond: replace all puppet crons with systemd timers - https://phabricator.wikimedia.org/T273673 (BTullis)
[13:52:17] Puppet, Maps, SRE, SRE-swift-storage: Puppet: tlsproxy localssl default_server make a Notify at each run - https://phabricator.wikimedia.org/T191393 (jbond) Open→Resolved This is currently only used by maps and swift and @MatthewVernon has confirmed we don't see this issue on those machines
[13:52:25] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 4 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (Papaul) @aborrero can we move this to ge-0/0/11 and not ge-0/0/36?
[14:01:34] XioNoX: whoa cool, fastnetmon prometheus metrics?
[14:01:47] cdanis: only health related though
[14:01:53] that's still nice :)
[14:01:56] yep
[14:02:08] I pinged the author to know if they had more on their roadmap :)
[14:03:10] 👍
[14:03:44] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 4 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (Papaul) @cmooney is it possible when moving servers from asw to cloudsw try to connect that server on the interface that matches th...
[14:10:08] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 4 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (aborrero) >>! In T338778#8927696, @Papaul wrote: > @aborrero can we move this to ge-0/0/11 and not ge-0/0/36? yes, I'm fine with wh...
[14:53:11] netbox, netops, Infrastructure-Foundations, SRE, Patch-For-Review: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (cmooney) Open→Resolved a:cmooney Above patch implements the logic from the "Re-image cookbook changes" in...
[15:03:34] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 4 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (cmooney) @Papaul why do we need to change? Easiest thing here is just move the cable from eno2 to eno1 on the server side, then rem...
[15:12:12] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 3 others: cloudservices2004-dev: reimage into new network setup - https://phabricator.wikimedia.org/T338778 (cmooney) >>! In T338778#8927735, @Papaul wrote: > @cmooney is it possible when moving servers from asw to cloudsw try to connect th...
[15:12:34] puppet-compiler, Infrastructure-Foundations: Puppet compiler fails due to unset fact wmflib.is_container - https://phabricator.wikimedia.org/T338961 (hashar)
[15:13:16] jbond: jhathaway: the new fact `wmflib.is_container` is not available to the Puppet compiler which thus fails to compile the production catalogue. I filed it as https://phabricator.wikimedia.org/T338961
[15:14:09] hashar: thanks
[15:14:27] I'll see what I need to do to fix that
[15:14:46] maybe it is all about refreshing the facts on the pcc instances
[15:15:23] * jhathaway looks for refresh button
[15:15:28] https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Manually_update_production or most probably the Puppet Compiler should set it
[15:15:52] (are you using Puppet to provision OCI images? :D )
[15:16:05] hashar: there are worse sins to commit :)
[15:16:35] hashar: https://phabricator.wikimedia.org/T337970, if you care to follow along
[15:17:12] I choke at the first link [Pontoon] and feel like that would cause me to spend my whole evening following more links :]
[15:17:33] tis the wiki way
[15:18:08] the executive summary at https://wikitech.wikimedia.org/wiki/SRE/business_case/Disposable_Development_Environment looks good enough for me :] That sounds exciting
[15:24:26] jhathaway: hashar: `ssh deploy1002.eqiad.wmnet sudo facter -p wmflib ` works so i think it must just be the fact request
[15:24:32] did you find the right knob
[15:24:43] https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Manually_update_production
[15:25:01] thanks jbond that is what I am trying now
[15:25:09] great :0
[15:26:01] :)
[15:26:26] jbond: is there a way to test, other than re-running the pcc job?
[15:26:55] jhathaway: you should be able to query the facts end point on puppetdb
[15:26:59] i think
[15:27:08] nod, thanks
[15:27:10] the facts processing is one of the more ... fragile parts of pcc
[15:27:59] jhathaway: oh worth mentioning the instructions just start the job
[15:28:06] i.e. it may still be running
[15:29:20] jhathaway: you can also check that the file has been unpacked to /mnt/nfs/labstore-secondary-project/yaml/production/yaml/facts/deploy1002.eqiad.wmnet.yaml
[15:29:40] ah, very nice
[15:29:54] you might want to wait for Puppet to have run everywhere since the change adding `wmflib.is_container` is recent (35 minutes ago iirc)
[15:30:17] hashar: good point, I was wondering the same thing
[15:31:02] or after update crawl the facts files on disk and confirm they all have the new is_container :]
[15:36:55] netops, Infrastructure-Foundations, SRE: Peering: prefer primary IXP for direcly connected networks - https://phabricator.wikimedia.org/T338201 (ayounsi) Open→Resolved a:ayounsi Tested in eqsin, traffic is now balanced more equally between all 3 IXPs. Same for ulsfo.
[15:41:47] hashar: deploy1002.eqiad.wmnet now passes, but the other two hosts failed, because they didn't have the fact yet, I'll rerun the export in a bit, once I am sure every host has resolved the fact, sorry about this
[15:43:14] jhathaway: ah that is great :)
[15:44:08] I don't know how the facts from WMCS instances get collected though
[15:44:16] some dark magic is happening on that front
[15:44:39] it's basically the same process as in production, but from different puppetmaster hosts
[15:44:41] anyway, I guess you can summarize the action on https://phabricator.wikimedia.org/T338961 and mark it solved :)
[15:45:04] taavi: yeah looks like it, thanks
[15:45:06] hashar: will do
[16:24:00] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (taavi) The current implementation on codfw1dev seems to have forgotten that the recursors need outbound access to the pu...
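Editor's note: the suggestion at [15:31:02] to crawl the facts files on disk and confirm they all carry the new fact could be sketched roughly as below. This is only an illustrative sketch, not the tooling used in the channel. The function name `hosts_missing_fact` is hypothetical, the layout of one `<hostname>.yaml` file per host in a single directory is an assumption based on the path quoted at [15:29:20], and the check is a plain substring search that assumes the fact name appears literally in the file text (it deliberately does not parse the YAML).

```python
from pathlib import Path


def hosts_missing_fact(facts_dir: Path, fact_name: str = "is_container") -> list[str]:
    """Return the facts file names under facts_dir that never mention fact_name.

    Assumption: one "<hostname>.yaml" file per host, and the fact name shows
    up literally in the file. This is a text search, not a YAML parse.
    """
    missing = []
    for facts_file in sorted(facts_dir.glob("*.yaml")):
        text = facts_file.read_text(encoding="utf-8", errors="replace")
        if fact_name not in text:
            missing.append(facts_file.name)
    return missing


if __name__ == "__main__":
    import sys

    # The directory to scan is passed on the command line; defaults to ".".
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name in hosts_missing_fact(target):
        print(name)
```

An empty output would suggest every exported facts file already mentions the fact; any file names printed would be hosts worth re-checking before rerunning the export.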
[16:32:44] netops, Infrastructure-Foundations, SRE, Epic: [tracking] Don't keep on the public vlans hosts that don't require it - https://phabricator.wikimedia.org/T317177 (aborrero)
[16:32:52] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero) Stalled→Open
[16:42:59] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (cmooney) >>! In T307357#8928378, @taavi wrote: > The current implementation on codfw1dev seems to have forgotten that th...
[16:50:10] puppet-compiler, Infrastructure-Foundations: Puppet compiler fails due to unset fact wmflib.is_container - https://phabricator.wikimedia.org/T338961 (jhathaway) I manually updated the facts following these instructions, https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Manually_update_production.
[16:56:13] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (cmooney) @aborrero I discussed the idea of a [[ https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/Enhanc...