[00:57:11] Puppet, Infrastructure-Foundations, Instrument-ClientError, Observability-Logging, patch-welcome: Prevent Firefox and Chrome extensions from being able to trigger alerts - https://phabricator.wikimedia.org/T330680 (Jdlrobson) That should be fine. FWIW it seems very few errors would slip throu...
[08:17:25] netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 3.2 - https://phabricator.wikimedia.org/T296452 (ayounsi) I opened {T330883} to track this specific issue.
[09:14:02] netbox, Infrastructure-Foundations: Netbox in codfw slowness issue - https://phabricator.wikimedia.org/T330883 (Volans) We also have to solve some puppet-related issues: things that are driven by which host is primary and are currently in hiera.
[09:37:12] netops, Infrastructure-Foundations, SRE: cr2-esams:FPC0 Parity error - https://phabricator.wikimedia.org/T318783 (ayounsi) Resolved→Open The issue is back: > 2023-01-30 12:36:42 UTC Minor FPC 0 Minor Errors. We need to follow up with JTAC for a replacement.
[09:40:33] jbond: Quick question about the apt repo. We just pushed a package upgrade to apt1001 and didn't see it reflected on the clients. I see that apt.wikimedia.org is currently pointing at apt2001. Should we just wait for it to right itself, or should we undo something?
[09:41:46] btullis: I guess you could force an rsync run from apt1001 (see also T330843)
[09:41:47] T330843: reprepro uploads should trigger rsync apt job - https://phabricator.wikimedia.org/T330843
[09:41:50] cc jbond
[09:42:19] btullis: did you see the big motd when logging into apt1001?
[09:42:23] ;)
[09:42:58] fyi your package will be removed from apt1001, you will need to upload it to 2001
[09:43:19] jbond, thanks.
There were four of us watching and none of us saw it :-)
[09:43:21] volans: that won't work as the ferm rules etc. won't be in place
[09:43:58] btullis: tbh the message is too big and also easily missable on my screen; it's on my list to make it a bit more obvious
[09:48:32] Packaging, Infrastructure-Foundations: reprepro uploads should trigger rsync apt job - https://phabricator.wikimedia.org/T330843 (jbond)
[09:48:42] Packaging, Infrastructure-Foundations: reprepro uploads should trigger rsync apt job - https://phabricator.wikimedia.org/T330843 (jbond) p: Triage→Medium
[09:49:13] All good, thanks. The problem is mainly with the brain, I think. Maybe something like adding an alias from reprepro to cowsay when it's the standby server would help :-)
[09:49:25] my brain
[13:02:54] SRE-tools, Infrastructure-Foundations, SRE, serviceops, Datacenter-Switchover: Support locking cookbooks run except for switchover related cookbooks - https://phabricator.wikimedia.org/T330997 (Volans)
[14:07:15] Packaging, Puppet, Infrastructure-Foundations, Patch-For-Review: apt: improve apt failover orchestration - https://phabricator.wikimedia.org/T330849 (jbond) >>! In T330849#8656648, @Volans wrote: > We should find a standard setup for those use cases, I can see Netbox having exactly the same issu...
[14:13:37] jbond, XioNoX: is it normal that on cumin1001 we get Exec[git_pull_operations/homer/public] executed successfully (corrective) at every run?
[14:13:51] volans: i was just looking at that lol
[14:14:12] volans: I'm sure you know the answer to your question :)
[14:14:25] https://phabricator.wikimedia.org/P44918
[14:14:33] looks like someone has been doing some local hacking
[14:15:12] here we go
[14:15:38] we can restore the file currently in git or commit the local changes
[14:15:59] jbond: save the diff and restore the file, I guess that's safest
[14:16:08] do you have the diff handy?
[14:16:54] XioNoX: i added it to the paste above
[14:17:16] but also cd /srv/homer/public && sudo git diff
[14:17:33] once you have looked at the diff, confirm and I'll restore the correct version on cumin
[14:18:14] topranks: ^
[14:18:24] same for the idm hosts (cc moritzm, slyngs), at every run: Idm::Deployment/Exec[collect static assets] executed successfully (corrective)
[14:18:46] jbond: you can revert and I'll try to run homer to check it
[14:18:58] volans: idm should be acked and removed once the alert re-checks
[14:19:31] ack, then I'll wait for the recheck :D
[14:19:45] XioNoX: done, running puppet now
[14:20:20] em...
[14:20:37] volans: it's updated now. an-test-worker is still there so I'll look at why that is
[14:20:45] ack
[14:20:51] not sure I can explain that, definitely not in the habit of changing anything on the repo on cumin directly
[14:20:56] cumin looks good now
[14:21:16] godog: volans:
[14:21:18] the diff does look like something I had changed a while back alright
[14:21:20] sorry ignore
[14:21:32] * jbond notes it's been some time since i did that lol
[14:21:54] same for netmon1003: Prometheus::Blackbox_exporter/Exec[assemble blackbox.yml] executed successfully (corrective)
[14:22:09] all good for homer
[14:30:02] fyi, the sum of the two eqiad/codfw links reached 7Gbps; that means that if one of them fails the other is going to run hot. Not problematic at this point, but something to keep an eye on
[14:30:09] https://librenms.wikimedia.org/bill/bill_id=24/
[14:33:17] XioNoX: do we have some top-talkers?
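The cleanup done on cumin1001 above ("save the diff and restore the file", so puppet's git_pull exec stops reporting corrective runs) is a standard git pattern. A sketch in a throwaway repo; on cumin the repo in question was /srv/homer/public:

```shell
#!/bin/sh
# Demo of "save the local diff, then restore the committed file",
# run in a scratch repo rather than the real puppet-managed checkout.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.org
git config user.name demo
echo "original" > config.txt
git add config.txt
git commit -qm init
echo "local hack" >> config.txt            # the kind of drift puppet kept correcting
git diff > /tmp/saved-local-changes.patch  # keep the diff for later review
git checkout -- config.txt                 # restore the committed version
git status --porcelain                     # prints nothing: tree is clean again
```

After this, a periodic `git pull`-style exec sees a clean working tree and stops firing, while the saved patch can still be reviewed or turned into a proper commit later.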
[14:33:30] I guess the cross-dc traffic patterns changed quite a bit yesterday
[14:33:38] volans: no visibility there, but it's related to the switchover
[14:33:42] yeah
[14:34:06] the kind of thing the switchover is for :)
[14:34:35] I wonder what traffic, given that eqiad is depooled from almost everything multi-dc
[14:34:55] so I wonder if it's the analytics pipelines
[14:35:23] volans: analytics + esams
[14:35:31] + wmcs
[14:38:16] in case it becomes hot, could we re-route wmcs to pass via the internet?
[14:38:34] volans: we could re-route esams through GTT
[14:38:49] so direct to codfw, but more expensive
[14:39:16] ack
[14:39:18] WMCS via the internet is technically possible but quite complex/messy
[14:39:41] drmrs goes to codfw?
[14:40:14] yep
[14:40:23] just takes the shortest path latency-wise
[14:40:54] then we could just depool esams and see if we can survive :D
[14:41:39] for GTT we have a 1Gbps commit, and are hitting this with only drmrs; if we depool esams, all of Europe would use drmrs's GTT
[14:41:48] so we would go over commit
[14:41:54] so same result
[14:42:20] I know, was joking :D
[14:42:47] :)
[15:20:19] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (aborrero) This ticket had little activity in the last month. Did something happen offline that wasn't r...
[15:25:44] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q3): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (cmooney) @aborrero I've been getting the cloudsw configured in the background, which is nearly done. M...
[15:46:21] topranks: in case you missed it, yes they have an API and an SDK but i have not looked at it much.
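The "run hot" concern above can be sanity-checked with simple arithmetic: if one of the two eqiad/codfw links fails, the surviving link carries the full combined load. The per-link capacity below is an assumption (the log never states the circuit size), so adjust it to the real figure:

```shell
#!/bin/sh
# Failover headroom check for a two-link pair carrying 7 Gbps combined.
# link_capacity_gbps is an assumed value, not taken from the conversation.
combined_gbps=7
link_capacity_gbps=10
util=$(awk -v t="$combined_gbps" -v c="$link_capacity_gbps" \
    'BEGIN { printf "%d", t * 100 / c }')
echo "single-link utilization after failover: ${util}%"   # 70%
```

At an assumed 10 Gbps per link, losing one link pushes the survivor to 70% utilization immediately, before any retry or failover traffic is added, which is why it is worth watching well before the combined load reaches capacity.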
i just used https://github.com/ewilken/starlink-exporter
[15:46:48] jbond: cool yeah thanks, certainly interesting stuff
[15:47:30] I was just wondering about the throughput measurements really; doing constant speedtests is nice but can knock you over an aggregate bandwidth limit pretty easily on something like that
[15:47:50] that exporter looks pretty nice
[15:50:42] yes they are pretty good on that side. even the phone app has constant latency and outage measurements. as to the throughput measurements, this is just what i'm using, not max
[15:51:06] ah gotcha, yeah that makes more sense
[15:54:39] jhathaway: as a board gamer you may appreciate this. after about 3 years of waiting this arrived the other week https://pasteboard.co/H7Nv6zJYQw8R.jpg (not played yet but gloomhaven is one of the best games i have played)
[15:55:47] super cool, I have heard great things about gloomhaven, but I have never picked it up
[15:57:00] it's expensive but well worth it. i'm a fan of legacy games but most of them are too short. gloomhaven has i think 90 levels/quests with loads of side quests so you get bang for your buck
[15:59:08] good to know, I'll have to consider it as a present for one of my kids
[15:59:30] +1 would recommend
[16:15:25] and closing the loop on the graphs, this is the one i look at most often https://pasteboard.co/oJlVQ1rssuJo.png
[16:15:47] * jbond notes the fridge line is high as we have a workman here plugged into that socket
[16:16:46] is your battery bank @110V?
[16:17:19] i think they are 48
[16:17:24] we have three
[16:17:57] the array voltage was puzzling me a bit
[16:18:41] jbond: next summit session on your setup?
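For reference, wiring the starlink-exporter linked above into Prometheus is a plain scrape job. This fragment is a guess at a typical setup: the job name, target address, and listen port are all placeholders, so check the exporter's README for its actual default port:

```yaml
# Hypothetical Prometheus scrape config for starlink-exporter.
# 192.0.2.10:9817 is a placeholder target, not a documented default.
scrape_configs:
  - job_name: starlink
    scrape_interval: 30s
    static_configs:
      - targets: ['192.0.2.10:9817']   # host:port where the exporter listens
```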
[16:19:34] XioNoX: yes, i'll put some bunk beds in the hangar :P
[16:21:01] volans: tbh i have not looked in depth at that or the array amperage to know what that is actually showing me
[16:21:59] ack
[16:22:04] but it is always at about 115-120V and 5-6A
[16:23:11] ack
[17:07:39] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (cmooney) @papaul in terms of the cables we will need to begin as follows. I'm assuming here we go with [[ https://www.fs.com/de-en/products/71644.html?attribute=675&id=...
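Closing the loop on the array readings quoted above: 115-120 V at 5-6 A is just P = V * I, i.e. somewhere in the 575-720 W range. A one-line check using only the figures from the conversation:

```shell
#!/bin/sh
# Rough power implied by the array readings (115-120 V at 5-6 A).
low=$((115 * 5))    # watts at the low end of both ranges
high=$((120 * 6))   # watts at the high end of both ranges
echo "array power roughly ${low}-${high} W"   # roughly 575-720 W
```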