[07:08:41] (CR) Hashar: [C:+2] Zuul: Drop archived extensions from dependency list [integration/config] - https://gerrit.wikimedia.org/r/1054039 (owner: Pppery)
[07:10:21] (Merged) jenkins-bot: Zuul: Drop archived extensions from dependency list [integration/config] - https://gerrit.wikimedia.org/r/1054039 (owner: Pppery)
[07:10:57] (CR) Ilias Sarantopoulos: [C:+1] inference-services: update nsfw-model src path [integration/config] - https://gerrit.wikimedia.org/r/1054085 (https://phabricator.wikimedia.org/T369344) (owner: Kevin Bazira)
[07:20:54] (CR) Hashar: [C:+2] inference-services: update nsfw-model src path [integration/config] - https://gerrit.wikimedia.org/r/1054085 (https://phabricator.wikimedia.org/T369344) (owner: Kevin Bazira)
[07:22:37] (Merged) jenkins-bot: inference-services: update nsfw-model src path [integration/config] - https://gerrit.wikimedia.org/r/1054085 (https://phabricator.wikimedia.org/T369344) (owner: Kevin Bazira)
[07:32:10] GitLab (Administration, Settings & Policy), Release-Engineering-Team, collaboration-services, Java-Scala-Standardization, Data-Platform-SRE (2024.07.08 - 2024.07.28): Create a global Maven package registry in Gitlab - https://phabricator.wikimedia.org/T367322#9980214 (LSobanski)
[07:32:31] GitLab (Administration, Settings & Policy), Release-Engineering-Team, collaboration-services, Java-Scala-Standardization, Data-Platform-SRE (2024.07.08 - 2024.07.28): Create a global Maven package registry in Gitlab - https://phabricator.wikimedia.org/T367322#9980220 (LSobanski)
[08:42:45] (update) jnuche: Ask for confirmation instead of aborting if Train task status is no good [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/379 (https://phabricator.wikimedia.org/T369831) (owner: aklapper)
[08:45:02] (approved) jnuche: Ask for confirmation instead of aborting if Train task status is no good [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/379 (https://phabricator.wikimedia.org/T369831) (owner: aklapper)
[08:45:52] Gerrit, Upstream: Gerrit allows merging the same patch twice - https://phabricator.wikimedia.org/T355080#9980352 (hashar) Open→Declined In the ideal world we might add an option for Gerrit to detect empty commits and reject them, then I don't think it is prone to happen that often and I am in...
[08:50:28] Continuous-Integration-Infrastructure, Jenkins, Release-Engineering-Team, collaboration-services: releases1003 file system over 90% full - https://phabricator.wikimedia.org/T368239#9980371 (hashar) The immediate issue has been fixed by cleaning up old files and left over temporary files. The Jen...
[08:56:33] Continuous-Integration-Infrastructure, Jenkins, Upstream: Jenkins collapsible console section plugin no more shows table of content - https://phabricator.wikimedia.org/T362048#9980386 (hashar) Stalled→Resolved I have released 1.9.0 of the plugin and apparently updated the CI Jenkins at th...
[09:03:34] Fresh, ARM support, Upstream: ECONNREFUSED error when running Selenium tests on M1 Mac - https://phabricator.wikimedia.org/T308889#9980430 (hashar) >>! In T308889#9971593, @Mhurd wrote: > Oh, I have "Rosetta for x86_64/amd64 emulation on Apple Silicon" enabled > {F56340060} When inside the container...
[09:37:05] !log gerrit: added `ncmonitor` to the Service Users group | T366637
[09:37:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:37:07] T366637: Create a Gerrit bot account for ncmonitor - https://phabricator.wikimedia.org/T366637
[09:41:02] Gerrit, Release-Engineering-Team: Create a Gerrit bot account for ncmonitor - https://phabricator.wikimedia.org/T366637#9980540 (hashar) Invalid→Resolved a:hashar In Gerrit, I have added ncmonitor to the [[ https://gerrit.wikimedia.org/r/admin/groups/d39fe9cefd40ca1a07e372c0d7bd7e72ce2e4a...
[09:42:27] Gerrit, Infrastructure-Foundations, Puppet CI: Avoid running PCC compiler on CRs without a Hosts footer - https://phabricator.wikimedia.org/T369270#9980544 (hashar)
[10:34:14] what are the arabic speakers *doing* on translatewiki.net T370031
[10:34:19] T370031: MediaWiki core test failure: LanguageArTest::testSprintfDate: Failed asserting that two strings are equal. - https://phabricator.wikimedia.org/T370031
[10:55:59] (PS1) Robert Vogel: Change template for FlexiSkin and Workflows [integration/config] - https://gerrit.wikimedia.org/r/1054305
[11:02:41] (PS1) Daniel Kinzler: Introduce extension_survey.py [tools/code-utils] - https://gerrit.wikimedia.org/r/1054308
[11:03:00] (CR) CI reject: [V:-1] Introduce extension_survey.py [tools/code-utils] - https://gerrit.wikimedia.org/r/1054308 (owner: Daniel Kinzler)
[11:03:46] (PS2) Daniel Kinzler: Introduce extension_survey.py [tools/code-utils] - https://gerrit.wikimedia.org/r/1054308
[11:11:41] (PS3) Daniel Kinzler: Introduce extension_survey.py [tools/code-utils] - https://gerrit.wikimedia.org/r/1054308
[12:33:22] (PS2) Hashar: zuul: FlexiSkin and Workflows are BlueSpice extensions [integration/config] - https://gerrit.wikimedia.org/r/1054305 (owner: Robert Vogel)
[12:33:25] (CR) Hashar: [C:+2] zuul: FlexiSkin and Workflows are BlueSpice extensions [integration/config] - https://gerrit.wikimedia.org/r/1054305 (owner: Robert Vogel)
[12:33:47] (CR) Hashar: [C:+2] "I have slightly reworded the commit message :)" [integration/config] - https://gerrit.wikimedia.org/r/1054305 (owner: Robert Vogel)
[12:34:56] (Merged) jenkins-bot: zuul: FlexiSkin and Workflows are BlueSpice extensions [integration/config] - https://gerrit.wikimedia.org/r/1054305 (owner: Robert Vogel)
[12:40:55] (CR) Hashar: [C:+2] Zuul: [mediawiki/extensions/UserVerification] Add new extension [integration/config] - https://gerrit.wikimedia.org/r/1053949 (owner: Zoranzoki21)
[12:42:42] (Merged) jenkins-bot: Zuul: [mediawiki/extensions/UserVerification] Add new extension [integration/config] - https://gerrit.wikimedia.org/r/1053949 (owner: Zoranzoki21)
[12:48:20] (CR) Hashar: "The depends-on Ic026aba4fcb59d601d8219cef85e97ecd488fd51 modifies `Dockerfile` however CI does not use that." [integration/config] - https://gerrit.wikimedia.org/r/1053016 (https://phabricator.wikimedia.org/T369795) (owner: Itamar Givon)
[13:37:13] Does anyone know where the keyholder passphrase is for deployment-cumin? I've looked in production pws, and on the puppetserver, so far no luck
[14:27:04] Continuous-Integration-Infrastructure, Release-Engineering-Team: Rebuild integration-cumin to get rid of Debian Buster - https://phabricator.wikimedia.org/T360784#9981467 (Andrew) Hello, @hashar! The deadline for this rebuild is today :) Likely you can just replace this VM with an identically-puppetized...
[14:35:50] Gerrit, Infrastructure-Foundations, Puppet CI: Avoid running PCC compiler on CRs without a Hosts footer - https://phabricator.wikimedia.org/T369270#9981506 (joanna_borun) p:Triage→Low
[14:48:10] Gerrit, Infrastructure-Foundations, Puppet CI: Avoid running PCC compiler on CRs without a Hosts footer - https://phabricator.wikimedia.org/T369270#9981578 (Volans) As long as we check that with `*` or `A:all` or similar we can have the same behavior this seems a sane idea. We could also advertise m...
[16:02:49] Continuous-Integration-Infrastructure, Jenkins, Release-Engineering-Team, collaboration-services: releases1003 file system over 90% full - https://phabricator.wikimedia.org/T368239#9982209 (Dzahn) Cool! I would say that gets us back to "not super urgent but when we do the next distro version upgr...
[16:04:14] andrewbogott: it was in deployment-puppetserver-1:/srv/git/labs/private/files/ssh/tin/cumin_rsa.passphrase. `keyholder status` is happy now on deployment-cumin.
[16:04:39] I was looking at it :D
[16:04:54] thanks! let's see if that works on the new one...
[16:05:04] andrewbogott: I will do the Buster instances from the `integration` project tomorrow
[16:05:18] great, thank you
[16:05:20] it is too late for me to tackle them right now
[16:05:35] Release-Engineering-Team, Infrastructure-Foundations, Cloud-VPS (Debian Buster Deprecation): Cloud VPS "integration" project Buster deprecation - https://phabricator.wikimedia.org/T367534#9982228 (hashar) a:hashar I will do the last two sets (cumin and pkgbuilder) tomorrow, it is a bit too late f...
[16:05:48] they should be straightforward
[16:05:58] thanks for reminding me about it andrewbogott !
[16:06:59] Project beta-scap-sync-world build #163699: FAILURE in 1 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163699/
[16:07:41] Continuous-Integration-Infrastructure, Release-Engineering-Team: Rebuild integration-cumin to get rid of Debian Buster - https://phabricator.wikimedia.org/T360784#9982236 (hashar) a:hashar I will rebuild them tomorrow as well as integration-pkgbuilder instances ( T360786 ). That should be straightfo...
[16:07:44] Continuous-Integration-Infrastructure, Release-Engineering-Team: Rebuild integration-agent-pkgbuilder-1001 and integration-agent-pkgbuilder-1002 to get rid of Debian Buster - https://phabricator.wikimedia.org/T360786#9982244 (hashar) a:hashar
[16:08:14] Continuous-Integration-Infrastructure, Release-Engineering-Team: Rebuild integration-agent-pkgbuilder-1001 and integration-agent-pkgbuilder-1002 to get rid of Debian Buster - https://phabricator.wikimedia.org/T360786#9982242 (hashar) I will rebuild them tomorrow as well as integration-cumin ( T360784 )....
[16:10:16] Phabricator: Explore displaying additional info in comment field for open Outreach related tasks - https://phabricator.wikimedia.org/T367500#9982256 (debt) @Aklapper this certainly sounds interesting, but I'm not sure how it would be activated. Automatically if we use a certain tag such as 'outreachy'? I'm c...
[16:12:31] andrewbogott, hashar: I added a section at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Help#Secrets to try and document this hiding place for passwords.
[16:13:01] nice, thank you
[16:16:55] Yippee, build fixed!
[16:16:55] Project beta-scap-sync-world build #163700: FIXED in 2 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163700/
[16:37:18] Project beta-scap-sync-world build #163702: FAILURE in 1 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163702/
[16:47:11] Yippee, build fixed!
[16:47:12] Project beta-scap-sync-world build #163703: FIXED in 2 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163703/
[17:09:40] hello releng! No hurry, but just wanted to give a heads-up that I tagged y'all on T368033 (CD/gitlab/airflow discussion)
[17:09:40] T368033: Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033
[17:48:25] Release-Engineering-Team, wikimedia.biterg.io: Create account for Levi Ferreira on wikimedia.biterg.io - https://phabricator.wikimedia.org/T370089 (thcipriani) NEW
[17:48:44] Release-Engineering-Team (Priority Backlog 📥), wikimedia.biterg.io: Create account for Levi Ferreira on wikimedia.biterg.io - https://phabricator.wikimedia.org/T370089#9982740 (thcipriani)
[19:22:44] Project beta-update-databases-eqiad build #77474: FAILURE in 2 min 43 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77474/
[19:25:34] Project beta-scap-sync-world build #163719: FAILURE in 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163719/
[19:34:44] Project beta-scap-sync-world build #163720: STILL FAILING in 22 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163720/
[19:45:01] Project beta-scap-sync-world build #163721: STILL FAILING in 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163721/
[19:54:25] Release-Engineering-Team, Data-Engineering (Q1 2024 July 1st - September 30th), Spike: [Developer Experience] [SPIKE] Investigate process to automate deployment of folders and artifacts to HDFS - https://phabricator.wikimedia.org/T360968#9983226 (lbowmaker)
[19:55:02] Project beta-scap-sync-world build #163722: STILL FAILING in 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163722/
[20:04:54] Project beta-scap-sync-world build #163723: STILL FAILING in 30 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163723/
[20:15:07] Project beta-scap-sync-world build #163724: STILL FAILING in 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163724/
[20:20:05] Project beta-update-databases-eqiad build #77475: STILL FAILING in 4.3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77475/
[20:25:02] Project beta-scap-sync-world build #163725: STILL FAILING in 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163725/
[20:34:48] Project beta-scap-sync-world build #163726: STILL FAILING in 23 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163726/
[20:45:00] Project beta-scap-sync-world build #163727: STILL FAILING in 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163727/
[20:55:03] Project beta-scap-sync-world build #163728: STILL FAILING in 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163728/
[21:04:57] Project beta-scap-sync-world build #163729: STILL FAILING in 31 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163729/
[21:13:19] andrewbogott: did you end up turning off that deployment-prep etcd node the other day?
[21:13:21] "20:25:02 Warning: EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached in /srv/mediawiki-staging/php-master/includes/config/EtcdConfig.php on line 204"
[21:13:44] That's the current beta-scap-sync-world failure
[21:15:01] bd808: I didn't shut it down but I tried to cluster it with the new node today and everything broke
[21:15:06] Project beta-scap-sync-world build #163730: STILL FAILING in 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163730/
[21:15:52] And then I thought I would just build a fresh host and restore a snapshot and that's when I learned that our docs for creating a new etcd cluster are wrong
[21:15:58] `_etcd._tcp.svc.deployment-prep.eqiad1.wikimedia.cloud` is the host it is configured to talk to via wmf-config/LabsServices.php
[21:16:15] andrewbogott: :( fun times
[21:16:53] etcd config seems very stateful, puppet isn't really offering to restore anything to a working state
[21:19:33] the state of the 02 node is: it won't start because it can't talk to the new node. but you can't remove a node if it isn't running.
[21:20:00] I'm not clear on how that has ever worked for anyone, probably I'm supposed to observe a 'throw it away and start over' practice instead
[21:20:05] Project beta-update-databases-eqiad build #77476: STILL FAILING in 4.2 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77476/
[21:20:20] but since no one knows what's stored in there I don't know what I'll lose if I throw it away
[21:21:52] The IN SRV record at `_etcd._tcp.svc.deployment-prep.eqiad1.wikimedia.cloud.` returns "1 1 2379 deployment-etcd02.deployment-prep.eqiad1.wikimedia.cloud." so if there wasn't anything on deployment-etcd02 when you queried it my hunch would be that there needs to be an etcd for the client to connect to, but it can return an empty dataset.
[21:23:06] hmmm
[21:23:31] does it do anything now?
[21:23:41] I pointed it to 03 which is also broken but in a different way
[21:23:59] (seems to return 404 for most everything)
[21:24:34] * bd808 pokes the jenkins job
[21:25:05] Project beta-scap-sync-world build #163731: STILL FAILING in 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163731/
[21:25:35] "Warning: EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached in /srv/mediawiki-staging/php-master/includes/config/EtcdConfig.php on line 204" -- same error message
[21:25:46] * bd808 will look at that php code
[21:25:48] hmmm I wonder if it's using a different endpoint
[21:26:12] * andrewbogott moves the other two
[21:26:26] I think it is trying to talk to https://deployment-etcd03.deployment-prep.eqiad1.wikimedia.cloud:2379
[21:29:33] The node seems to not be set up to do tls? Using `https` fails cert validation, adding `-k` then gives a bad cert failure. Plain http seems to work
[21:30:02] I'm pretty sure we only ever use tls for etcd
[21:30:27] if deployment-prep isn't expecting tls then... likely none of the modern puppetization is going to work...
[21:30:51] andrewbogott: no other way around. php wants to talk https, the etcd endpoint seems not to support it
[21:31:37] `curl -v https://deployment-etcd03.deployment-prep.eqiad1.wikimedia.cloud:2379` from inside the project goes boom
[21:33:11] The main thing I don't understand is the profile::etcd::v3::discovery setting
[21:33:49] in prod it's things like role/common/etcd/v3/kubernetes.yaml:profile::etcd::v3::discovery: "dns:k8s3.%{::site}.wmnet"
[21:34:01] but on 02 it was profile::etcd::v3::discovery: '%{::hostname}=https://%{::fqdn}:2380'
[21:34:03] wildly different
[21:35:00] Project beta-scap-sync-world build #163732: STILL FAILING in 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163732/
[21:41:01] andrewbogott: that looks broadly like a difference between a multi-node cluster with dns involvement and a single node to me, but I don't actually know anything about etcd config in practice.
[21:45:07] Project beta-scap-sync-world build #163733: STILL FAILING in 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163733/
[21:49:26] Can we change that warning to just link to https://phabricator.wikimedia.org/T215217 ?
[21:55:00] Project beta-scap-sync-world build #163734: STILL FAILING in 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163734/
[21:55:35] andrewbogott: which warning? The Jenkins job failure?
[21:56:03] the STILL FAILING but I'm just complaining
[21:56:45] We can just pause the job if you expect it to be broken for a while. This is impactful to folks like QTE who do use this environment for testing despite the lack of formal organizational support for its maintenance.
[21:58:31] best to pause it for now
[21:58:57] I know that this affects people, I just never know how much we should prioritize it as individuals or teams when the org as a whole doesn't
[21:59:04] !log disable beta-scap-sync-world until etcd maintenance concludes
[21:59:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:59:08] thanks
[21:59:42] I will work on this more tonight but will probably not skip my dinner invitation
[22:00:34] can we roll etcd02 back to a working state by removing the clustering attempt?
[22:00:57] or are you determined to create a cluster?
[22:01:41] let me quote myself:
[22:01:42] "the state of the 02 node is: it won't start because it can't talk to the new node. but you can't remove a node if it isn't running."
[22:01:56] but please have a go if you'd like to!
[22:02:09] I was only trying to create a cluster in order to migrate state over from 02 to 03
[22:03:36] sorry. I'm not trying to be up in your biz. I was just wondering if there was a way to relieve your stress.
[22:05:25] no worries! A rollback would be great, I just ran out of ideas once I got into that chicken/egg state
[22:05:39] All the googles say 'just do etcd remove node' if things are broken
[22:05:56] which makes me wonder if 02 is running an ancient version that behaves differently
[22:06:02] All I know about rtcd is to yell for help from folks who don't work here anymore... :/
[22:06:08] *etcd
[22:06:13] Phabricator (phabricator-next), Release-Engineering-Team: Deploy Phabricator/Phorge 2024-07-15 - https://phabricator.wikimedia.org/T370109 (brennen) NEW
[22:09:36] it is a pretty ancient version: 3.2.26 (Jan 2019). Latest upstream is 3.4.33.
[22:20:04] Project beta-update-databases-eqiad build #77477: STILL FAILING in 3.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77477/
[22:20:45] bd808: can you try again and/or tell me if it's failing in the same way now?
[22:21:14] !log disable beta-update-databases-eqiad until etcd maintenance concludes
[22:21:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:21:32] andrewbogott: yeah, let me try to do the needful manually...
[22:21:36] thx
[22:21:45] I retried everything with bullseye instead of bookworm, it seems somewhat happier
[22:23:48] * bd808 just enables the job and runs it
[22:24:32] ah, code update is running right now so the sync job is blocked until that finishes
[22:24:51] * bd808 tails the log
[22:25:02] Project beta-scap-sync-world build #163735: STILL FAILING in 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163735/
[22:25:07] aw
[22:25:52] Looks like the same "Warning: EtcdConfig failed to fetch data: (curl error: 28) Timeout" failure.
[22:26:02] hm... it should be hitting -05 now
[22:26:10] although there could be a firewall piece I missed...
[22:27:02] there could be dns cache in the way too
[22:27:10] let me check that...
[22:28:08] `dig SRV _etcd._tcp.svc.deployment-prep.eqiad1.wikimedia.cloud.` on deploy04 is still saying etcd03. So that's a problem.
[22:28:13] I'm updating a (probably meaningless) puppet setting throughout the cluster too...
[22:29:41] I get 03 no matter where I dig
[22:29:46] even though designate says 05
[22:30:27] "It's always DNS"
[22:30:42] it wasn't dns before but it /might/ be dns now
[22:30:48] can you tell where it's cached?
[22:31:32] Can I? (so far I cannot)
[22:33:32] If I target the ns{0,1}.openstack.eqiad1.wikimediacloud.org servers directly I get the etcd05 response
[22:34:09] ok, so easiest is to just wait 45 minutes
[22:34:18] which suits me since I need to go to dinner anyway
[22:34:34] either everything will recover by the time I'm back, or I'll dive back in.
[22:34:36] ns-recursor.openstack.eqiad1.wikimediacloud.org has the etcd03 response cached.
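Editor's note: the check described above (asking the authoritative servers and the recursor for the same SRV record, then comparing targets) can be sketched offline. This is only an illustration, not anything that was run in the channel. `parse_srv` and `stale_resolvers` are hypothetical helper names, and the record data for etcd05 and etcd03 is assumed to have the same "1 1 2379 <target>." shape as the etcd02 answer quoted earlier in the log.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(rdata: str) -> SrvRecord:
    """Parse SRV record data of the form 'priority weight port target'."""
    priority, weight, port, target = rdata.split()
    # Drop the trailing root dot so targets compare cleanly.
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


def stale_resolvers(answers: dict[str, str], authoritative: str) -> list[str]:
    """Return the resolvers whose SRV target differs from the authoritative one."""
    expected = parse_srv(answers[authoritative]).target
    return sorted(
        name
        for name, rdata in answers.items()
        if parse_srv(rdata).target != expected
    )


# Hypothetical answers, hand-written to mirror the situation in the log;
# they are not captured `dig` output.
answers = {
    "ns0 (authoritative)": "1 1 2379 deployment-etcd05.deployment-prep.eqiad1.wikimedia.cloud.",
    "ns-recursor": "1 1 2379 deployment-etcd03.deployment-prep.eqiad1.wikimedia.cloud.",
}
print(stale_resolvers(answers, "ns0 (authoritative)"))
```

Any resolver listed by `stale_resolvers` is still serving an old target, which matches the conclusion reached here that waiting for the recursor's cache to expire is the simplest fix.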
[22:34:37] Project beta-scap-sync-world build #163736: STILL FAILING in 18 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163736/
[22:36:04] thanks for help pushing through this! I may or may not reappear in ~90.
[22:36:32] btw, in theory the new etcd server has the same state as the old one (the dump restore seems to have worked) but if pieces are missing I will not be shocked.
[22:36:43] It really only looked like there's ~1 record stored there.
[22:36:57] that honestly sounds right to me
[22:44:59] Project beta-scap-sync-world build #163737: STILL FAILING in 33 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163737/
[22:55:00] Project beta-scap-sync-world build #163738: STILL FAILING in 33 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163738/
[23:01:53] * bd808 toggles the beta-scap-sync-world job off again
[23:07:39] Continuous-Integration-Infrastructure, Release-Engineering-Team: Rebuild integration-cumin to get rid of Debian Buster - https://phabricator.wikimedia.org/T360784#9983654 (thcipriani) >>! In T360784#9981467, @Andrew wrote: > Hello, @hashar! The deadline for this rebuild is today :) > > Likely you can ju...
[23:48:00] OK, dns seems to have refreshed... bd808, should we try it again?
[23:48:33] yeah, let me push some buttons
[23:48:59] Continuous-Integration-Infrastructure, Release-Engineering-Team: Rebuild integration-cumin to get rid of Debian Buster - https://phabricator.wikimedia.org/T360784#9983719 (Andrew) >>! In T360784#9983654, @thcipriani wrote: >>>! In T360784#9981467, @Andrew wrote: >> Hello, @hashar! The deadline for this r...
[23:49:07] it's running...
[23:49:20] Project beta-scap-sync-world build #163739: STILL FAILING in 31 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163739/
[23:49:39] irc knows before the web ui, fancy
[23:49:58]
[23:50:28] so that looks like etcd is maybe up, but doesn't have the key it's looking for
[23:50:35] andrewbogott: the error is different though, so maybe progress. Now it is "Warning: EtcdConfig failed to fetch data: HTTP 404 () in /srv/mediawiki-staging/php-master/includes/config/EtcdConfig.php on line 204"
[23:50:40] yeah
[23:53:02] ok, so now how/where do we look to figure out what the etcd query is? I guess code diving is in order.
[23:53:30] I would guess it's using conftool?
[23:53:36] `curl -kv https://deployment-etcd05.deployment-prep.eqiad1.wikimedia.cloud:2379/` does also return 404
[23:54:31] not curl -kv https://deployment-etcd05.deployment-prep.eqiad1.wikimedia.cloud:2379/health though
[23:54:59] Project beta-scap-sync-world build #163740: STILL FAILING in 31 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163740/
[23:55:23] andrewbogott: I think this is probably a https://wikitech.wikimedia.org/wiki/Dbctl lookup. I think that finding the db primary and replicas for a slice is the use case for etcd lookups in MediaWiki right now
[23:56:20] yeah, most likely
[23:58:10] where do we run conftool for deployment-prep? /me has already tried the obvious places
[23:58:49] ¯\_(ツ)_/¯ maybe thcipriani has knowledge?
[23:59:12] I just ran 'which conftool' on every deployment-prep host and got 68/68 failures