[07:46:27] can I get a "just in case" review of https://gerrit.wikimedia.org/r/c/operations/dns/+/830085 for the upcoming maintenance?
[07:48:17] {{done}}
[07:49:48] thx!
[07:53:03] FYI, ulsfo routers upgrade starting in ~10min
[09:09:31] hi there! we have a constant stream of DB errors for user wikiuser202206 since 6am UTC this morning. I don't really know how critical it is but it didn't look too healthy so I wanted to mention it just in case: https://logstash.wikimedia.org/goto/06e4660ffae3824e45c8e0966743032f
[09:10:04] jnuche: thanks, looking
[09:10:46] Amir1: That's probably a grants issue on the new x1 eqiad master when coming from codfw
[09:11:10] Fixing it now
[09:11:22] ah, interesting
[09:11:42] thanks. We should have it in puppet
[09:12:30] Fixed, let's see if the errors go away
[09:12:56] db1103 (old master) was ok
[09:35:53] Router upgrade is hitting a bug (the embedded linecard is not booting up), so ulsfo will stay depooled for longer than expected
[09:37:12] The error has stopped
[09:37:20] jnuche: ^ thanks for letting us know!
[09:37:46] XioNoX: following on from yesterday, what uplink does wdqs1004 have?
[09:37:47] marostegui: np!
[09:38:45] addshore: 1G as well
[09:39:08] XioNoX: thanks, next time I might just transfer it straight from wdqs1004 -> google cloud rather than via a stat machine
[09:39:42] addshore: wdqs1004 is a production machine so no :)
[09:39:52] bah *re-checks his notes*
[09:40:06] I'm confusing stat1004 and wdqs1009!
[09:40:17] XioNoX: does wdqs1009 also have 1G?
[09:41:19] dcausse: looking at the timings i can get it out of the cluster about as fast as moving it within the cluster.
so next time I try this I'll get puppet disabled, turn the service off, copy it myself, and when it's done get puppet turned back on
[09:41:40] I don't really see myself trying to do this more than around once a month right now
[09:44:13] addshore: ok then we might want to have a cookbook to avoid manual steps
[09:44:50] sounds good to me, the steps should be relatively easy
[09:45:13] will chat about it a little with you in the coming months / write down the steps etc from my side
[09:45:24] but next job is to check if this file works for folks on the outside / is useful
[09:47:04] sure
[09:47:52] addshore: yep 1G too
[09:48:06] if you have netbox access you can look it up: https://netbox.wikimedia.org/dcim/devices/986/interfaces/
[09:48:58] or on the host itself:
[09:49:00] wdqs1009:~$ sudo ethtool eno1 | grep Speed
[09:49:00] Speed: 1000Mb/s
[09:49:25] if you have sudo there
[09:50:06] *checks*
[09:50:14] oooh yes, I see netbox, lovely!
[11:38:45] ulsfo routers upgrade is done, there are a few wrinkles to iron out
[11:38:55] but overall the site is ready for traffic again
[11:39:01] I'll let it sit a bit then repool
[12:19:19] FYI, going to disable puppet in codfw and the edges for 10-15m starting in 5m
[12:30:11] ack
[13:06:43] Hiya, we are trying to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/821695 on an-test-client1001.eqiad.wmnet, and getting errors. btullis merged this 4 hours ago. The error says it is from file: /etc/puppet/modules/profile/manifests/hadoop/spark3.pp, line: 148. On puppetmaster1001 I can see this file is indeed incorrect and not the line from the merged commit. AFAICT puppet-merge has been run.
[13:06:43] /var/lib/git/operations/puppet is up to date.
[13:09:02] so i guess my q is: why is /etc/puppet showing an old file?
[13:33:20] Is it this line? https://gerrit.wikimedia.org/r/c/operations/puppet/+/821695/7/modules/profile/manifests/hadoop/spark3.pp#152 Should it be `content` instead of `source`?
[13:48:41] ....?
btullis i fetched production and i see it as content => ????
[13:51:30] Oh. What about here? Is this out of date? https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/hadoop/spark3.pp#L152
[13:52:24] i guess it is.... must be my local copy is somehow wrong...
[13:53:46] ottomata: same md5 for puppetmaster1001:/etc/puppet/modules/profile/manifests/hadoop/spark3.pp and local modules/profile/manifests/hadoop/spark3.pp btw
[13:53:58] yeah......
[13:54:06] okay, then maybe my fetch was wrong...
[13:54:08] okay
[13:55:44] indeed, new fetch and rebase show it's wrong...okay
[13:56:06] New patch here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/830170
[13:57:07] you beat me!
[13:57:56] merging btullis
[13:58:20] Great, thanks.
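A rough check of the 09:41 timing remark: if every hop on the path is a 1G uplink (as confirmed in the log for wdqs1004 and wdqs1009), the uplink speed is plausibly what bounds the copy, so leaving the cluster costs about the same as moving within it. A back-of-the-envelope sketch; the 1 TB file size and the `efficiency` factor are hypothetical, and disk throughput is ignored:

```python
def transfer_seconds(size_bytes: int, link_mbps: float = 1000.0, efficiency: float = 1.0) -> float:
    """Estimate wall-clock seconds to push size_bytes over a link of link_mbps.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead and contention; 1.0 means a perfectly saturated link.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and 0 < efficiency <= 1")
    bits = size_bytes * 8
    return bits / (link_mbps * 1_000_000 * efficiency)


if __name__ == "__main__":
    # Hypothetical 1 TB (10**12 bytes) file over a 1G uplink.
    secs = transfer_seconds(10**12)
    print(f"{secs:.0f} s (~{secs / 3600:.1f} h)")  # 8000 s (~2.2 h)
```

Halving `efficiency` doubles the estimate, which is a quick way to see how sensitive the "about as fast" comparison is to real-world overhead on either path.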
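The manual procedure from 09:41 (disable puppet, stop the service, copy the file, re-enable puppet) is what the 09:44 cookbook suggestion would automate. The sketch below only illustrates the ordering as a dry-run-by-default Python script; it is not a real cookbook and uses no cookbook framework. It assumes the stock `puppet agent --disable`/`--enable` flags, `systemctl` and `rsync`; the service name, paths, destination host and reason string are hypothetical placeholders:

```python
import shlex
import subprocess
from typing import List, Sequence

# Hypothetical values for illustration only; none of these come from the log.
SERVICE = "example-query-service"
SOURCE_PATH = "/srv/example/datafile"
DEST = "transfer-target.example.org:/srv/incoming/"
REASON = "copying data file out of the cluster"


def run(cmd: Sequence[str], dry_run: bool, log: List[str]) -> None:
    """Record the command and, unless dry_run, execute it (raising on failure)."""
    log.append(shlex.join(cmd))
    if not dry_run:
        subprocess.run(list(cmd), check=True)


def copy_with_puppet_disabled(dry_run: bool = True) -> List[str]:
    """Disable puppet, stop the service, copy the file, then re-enable puppet.

    Returns the commands in the order they were (or would be) run.
    """
    log: List[str] = []
    run(["sudo", "puppet", "agent", "--disable", REASON], dry_run, log)
    try:
        run(["sudo", "systemctl", "stop", SERVICE], dry_run, log)
        run(["rsync", "--partial", "--progress", SOURCE_PATH, DEST], dry_run, log)
    finally:
        # The plan in the log ends by turning puppet back on; restarting the
        # service is not spelled out there, so it is left out here too.
        run(["sudo", "puppet", "agent", "--enable"], dry_run, log)
    return log


if __name__ == "__main__":
    for line in copy_with_puppet_disabled(dry_run=True):
        print(line)
```

Putting the re-enable step in a `finally` block means a failed stop or copy does not leave puppet disabled on the host, one easy-to-miss failure mode of doing this by hand.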
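If the 09:49 speed check ends up in a script, the `Speed:` line from `ethtool` is easy to parse. A minimal sketch assuming the `Speed: 1000Mb/s` format shown in the log; when ethtool prints something non-numeric (for example `Unknown!` on a down link) the parser returns None. The interface name `eno1` is taken from the log; other hosts may name it differently:

```python
import re
import subprocess
from typing import Optional

# Matches lines like "\tSpeed: 1000Mb/s" from `ethtool <iface>` output.
_SPEED_RE = re.compile(r"^\s*Speed:\s*(\d+)\s*Mb/s\s*$", re.MULTILINE)


def parse_ethtool_speed(output: str) -> Optional[int]:
    """Return the link speed in Mb/s, or None if no numeric speed is reported."""
    match = _SPEED_RE.search(output)
    return int(match.group(1)) if match else None


def link_speed_mbps(iface: str = "eno1") -> Optional[int]:
    """Run ethtool on iface (needs sudo, as noted in the log) and parse its speed."""
    result = subprocess.run(
        ["sudo", "ethtool", iface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_speed(result.stdout)


if __name__ == "__main__":
    sample = "Settings for eno1:\n\tSpeed: 1000Mb/s\n\tDuplex: Full\n"
    print(parse_ethtool_speed(sample))  # 1000
```

On Linux the same figure is usually also readable without sudo from `/sys/class/net/<iface>/speed`, which may be simpler for hosts where you lack sudo.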
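The 13:53 check compared MD5 sums of spark3.pp on puppetmaster1001 and in the local checkout; matching sums are what pointed at the local fetch rather than the puppetmaster. The same comparison in Python, reading files in chunks; the file names in the demo are throwaway placeholders:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def md5_of(path: PathLike, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(a: PathLike, b: PathLike) -> bool:
    """True if both files have identical MD5 digests."""
    return md5_of(a) == md5_of(b)


if __name__ == "__main__":
    # Two throwaway files standing in for the master's copy and the local checkout.
    with tempfile.TemporaryDirectory() as tmp:
        master_copy = Path(tmp, "spark3.pp.master")
        local_copy = Path(tmp, "spark3.pp.local")
        master_copy.write_text("same bytes\n")
        local_copy.write_text("same bytes\n")
        print(same_content(master_copy, local_copy))  # True
```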