[00:01:01] serviceops, Patch-For-Review: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by legoktm@cumin1001 for host thumbor2005.codfw.wmnet with OS stretch
[00:26:52] serviceops, Patch-For-Review: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by legoktm@cumin1001 for host thumbor2005.codfw.wmnet with OS stretch completed: - thumbor2005 (*...
[00:28:20] serviceops, Patch-For-Review: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by legoktm@cumin1001 for host thumbor2006.codfw.wmnet with OS stretch
[00:54:09] serviceops, Patch-For-Review: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by legoktm@cumin1001 for host thumbor2006.codfw.wmnet with OS stretch completed: - thumbor2006 (*...
[02:08:49] serviceops, Patch-For-Review: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (Legoktm) thumbor200[56] are ready, passing httpbb, just depooled until tomorrow morning.
[07:57:36] hello folks, qq - is kartotherian meant to be depooled in codfw?
[07:57:45] I tried to check SAL and phab but didn't find much
[07:58:00] in case the answer is "yes" I'll update the python script that checks it
[08:15:35] <_joe_> hnowlan: ^^
[08:15:42] <_joe_> elukey: it's not on you
[08:16:21] <_joe_> als
[08:16:34] <_joe_> *also: please reports of things not working should go to #sre
[08:16:41] <_joe_> so that more eyeballs are on them
[08:17:06] <_joe_> tbh I don't like that script
[08:18:46] ah yes I wanted to know if people knew in here about the service, that's it
[09:04:15] <_joe_> jelto / jayme I am working today to deploy the apple search service https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/736273 to k8s
[09:04:21] <_joe_> should I go the helm3 way directly?
[09:05:35] <_joe_> I see the user is already there
[09:05:46] <_joe_> apple-search and apple-search-deploy
[09:06:56] they all are. For staging you'll auto use helm3, for prod you'd still be using helm2
[09:07:33] no special treatment required AIUI
[09:11:09] <_joe_> I just need to update the helmfile.yaml I guess
[09:14:17] _joe_: ah, yes. That needs to comply with what we currently have in helmfile.d/services/_example_/
[09:16:22] <_joe_> yeah the deployment patch was created just before that was changed
[09:19:08] joe: you could go ahead and use helm3 for a fresh deploy, but there is no helm3-only example/template currently. I prepared a possible cleanup of the helmfile here: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/737034 but it's still in review.
So I would recommend just use the current default example with helm2 and we migrate it soon
[09:19:38] <_joe_> cool
[09:19:46] <_joe_> btw
[09:20:57] <_joe_> we need to clarify, also with an email to ops@, that "helmfile status" has a changed output in staging
[09:21:42] I'll check that
[09:23:24] <_joe_> it doesn't show the running pods or other release info that was coming from helm status in helm 2
[09:33:52] serviceops, GitLab, Security-Team, Release-Engineering-Team (Radar), SecTeam-Processed: Setup GitLab Runner in trusted environment - https://phabricator.wikimedia.org/T295481 (MoritzMuehlenhoff) >>! In T295481#7511899, @Dzahn wrote: > @Jelto I [[ https://wikitech.wikimedia.org/wiki/Ganeti#Ver...
[11:27:33] serviceops, Prod-Kubernetes, Kubernetes: setup/install kubestage100[34] - https://phabricator.wikimedia.org/T293729 (JMeybohm)
[13:01:15] serviceops, SRE, SRE-Access-Requests, Parsoid (Tracking), Patch-For-Review: Access to scandium.eqiad.wmnet & testreduce1001.eqiad.wmnet - https://phabricator.wikimedia.org/T295900 (Jelto)
[13:12:31] serviceops, SRE, SRE-Access-Requests, Parsoid (Tracking): Access to scandium.eqiad.wmnet & testreduce1001.eqiad.wmnet - https://phabricator.wikimedia.org/T295900 (Jelto) Open→Resolved a:Jelto I added @ihurbain to `parsoid-test-admins`. All six mentioned above should have access now b...
[13:29:56] serviceops, SRE, SRE-Access-Requests, Parsoid (Tracking): Access to scandium.eqiad.wmnet & testreduce1001.eqiad.wmnet - https://phabricator.wikimedia.org/T295900 (ihurbain) All good; took a little while for the whole auth to propagate (I guess), but I now have access. Thanks!
[14:12:39] serviceops, SRE, SRE-Access-Requests, Parsoid (Tracking): Access to scandium.eqiad.wmnet & testreduce1001.eqiad.wmnet - https://phabricator.wikimedia.org/T295900 (ssastry) Thanks @dzahn and @jelto ...
It looks like isabelle cannot setup a ssh tunnel via `ssh -L 8003:localhost:8003 testreduce1001...
[14:23:48] serviceops, GitLab, Security-Team, Release-Engineering-Team (Radar), SecTeam-Processed: Setup GitLab Runner in trusted environment - https://phabricator.wikimedia.org/T295481 (akosiaris) >>! In T295481#7512587, @MoritzMuehlenhoff wrote: >>>! In T295481#7511899, @Dzahn wrote: >> @Jelto I [[ ht...
[15:20:29] jbond: I have updated the puppet compiler job so it is now using 4 threads :] thx!
[15:20:45] great thanks
[15:24:06] jayme: hi! I am going to build the new releng/helm-linter image ( https://gerrit.wikimedia.org/r/c/integration/config/+/739544 ) would you like to test it once built or should I just update the jenkins job immediately?
[15:24:54] hashar: cool, thanks! Please go ahead and update the job as well. I did test a local build ... should be fine
[15:24:56] * jayme runs
[15:25:56] serviceops, SRE, SRE-Access-Requests, Parsoid (Tracking): Access to scandium.eqiad.wmnet & testreduce1001.eqiad.wmnet - https://phabricator.wikimedia.org/T295900 (Jelto) >>! In T295900#7513678, @ssastry wrote: > Thanks @dzahn and @jelto ... It looks like isabelle cannot setup a ssh tunnel via `s...
[15:27:29] jayme: awesome :)
[15:38:19] built, updating job
[15:39:09] done
[15:39:14] jayme: thank you for the patches / test etc
[15:40:27] hashar: thanks for integrating :)
[15:48:48] ...issues of course. :( I'll check
[15:58:49] :-\
[16:02:22] jayme: some repos missing apparently
[16:07:15] hashar: yeah...more like I missed something needing them in the Rakefile
[16:09:01] I am still around
[16:39:28] hashar: nothing for you to do, thanks. It's my code that needs fixing
[16:45:56] serviceops, SRE, SRE-Access-Requests, Parsoid (Tracking), Patch-For-Review: Access to scandium.eqiad.wmnet & testreduce1001.eqiad.wmnet - https://phabricator.wikimedia.org/T295900 (ssastry) >>! In T295900#7513863, @Jelto wrote: >>>!
In T295900#7513678, @ssastry wrote: >> Thanks @dzahn and @je...
[17:24:43] serviceops, Security-Team, GitLab (CI & Job Runners), Release-Engineering-Team (Radar), SecTeam-Processed: Setup GitLab Runner in trusted environment - https://phabricator.wikimedia.org/T295481 (brennen)
[17:45:38] serviceops, SRE, SRE-Access-Requests, Parsoid (Tracking), Patch-For-Review: Access to scandium.eqiad.wmnet & testreduce1001.eqiad.wmnet - https://phabricator.wikimedia.org/T295900 (Dzahn) bumped this for early approval because of the high prio
[17:51:43] serviceops, SRE, ops-codfw: decom mw2280 (was: mw2280 unresponsive to powercycle and hardreset) - https://phabricator.wikimedia.org/T290708 (Dzahn)
[17:51:51] serviceops, SRE, ops-codfw: decom mw2280 (was: mw2280 unresponsive to powercycle and hardreset) - https://phabricator.wikimedia.org/T290708 (Dzahn) p:Low→Medium
[17:52:14] serviceops, SRE, ops-codfw: decom mw2280 (was: mw2280 unresponsive to powercycle and hardreset) - https://phabricator.wikimedia.org/T290708 (Dzahn) a:Papaul→None
[17:52:44] serviceops, SRE, ops-codfw: decom mw2280 (was: mw2280 unresponsive to powercycle and hardreset) - https://phabricator.wikimedia.org/T290708 (Dzahn) correct me if i'm wrong @wiki_willy
[20:01:15] serviceops, decommission-hardware: decommission thumbor100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T273137 (Legoktm) Stalled→Invalid I'm closing this since we already have individual tasks for this decom: {T285480} and {T285479}.
[20:14:44] serviceops, Phabricator, Release-Engineering-Team: Deprecate git-ssh service on phabricator.wikimedia.org - https://phabricator.wikimedia.org/T296022 (mmodell)
[21:03:22] serviceops, SRE, ops-codfw: decom mw2280 (was: mw2280 unresponsive to powercycle and hardreset) - https://phabricator.wikimedia.org/T290708 (wiki_willy) Hi @Dzahn - there's an email thread with Alex, Lukasz, Faidon, and Mark around whether or not it's worth the extra cycles needed to replace mw2280....
[21:50:12] serviceops, decommission-hardware, Patch-For-Review: decommission thumbor1004.eqiad.wmnet - https://phabricator.wikimedia.org/T285480 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by legoktm@cumin1001 for hosts: `thumbor1004.eqiad.wmnet` - thumbor1004.eqiad.wmnet (**PASS**) - Downt...
[21:53:12] serviceops, decommission-hardware, ops-eqiad: decommission thumbor1004.eqiad.wmnet - https://phabricator.wikimedia.org/T285480 (Legoktm)
[22:06:51] serviceops, decommission-hardware, Patch-For-Review: decommission thumbor1003.eqiad.wmnet - https://phabricator.wikimedia.org/T285479 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by legoktm@cumin1001 for hosts: `thumbor1003.eqiad.wmnet` - thumbor1003.eqiad.wmnet (**PASS**) - Downt...
[22:07:57] serviceops, decommission-hardware, ops-eqiad: decommission thumbor1003.eqiad.wmnet - https://phabricator.wikimedia.org/T285479 (Legoktm)
[22:09:49] serviceops, Patch-For-Review: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (Legoktm)
[22:10:44] serviceops, Patch-For-Review: Productionise thumbor1005, thumbor1006, thumbor2005 and thumbor2006 - https://phabricator.wikimedia.org/T285477 (Legoktm) thumbor100[56] are now fully in service and thumbor100[34] are shut down and awaiting decom by DC ops. I'm holding off on pooling the codfw servers just be...
[23:02:37] serviceops, SRE, ops-codfw: decom mw2280 (was: mw2280 unresponsive to powercycle and hardreset) - https://phabricator.wikimedia.org/T290708 (Dzahn) Thanks Willy! Alright, will wait for that. Regardless of the outcome we would remove the existing broken one, I think.