[07:45:01] serviceops, SRE, ops-eqiad: Kubernetes1018's eth negotiated speed is 10MB/s - https://phabricator.wikimedia.org/T296369 (wiki_willy) a:Cmjohnson
[08:01:48] serviceops, Dumps-Generation: Test php7.4 for dumps generation - https://phabricator.wikimedia.org/T295580 (ArielGlenn) p:Triage→Medium
[08:02:46] serviceops, Dumps-Generation: Test php7.4 for dumps generation - https://phabricator.wikimedia.org/T295580 (ArielGlenn) Doing some testing of SQL/XML dumps in deployment-prep today, with php7.2 and 7.4 both installed. I don't expect any issues given that all my local testing is with 7.4, but better safe...
[12:07:15] serviceops, SRE, Wikimedia-Site-requests, Patch-For-Review, and 2 others: Split search.wikimedia.org out of ops/mediawiki-config into separate service - https://phabricator.wikimedia.org/T289224 (Majavah) Open→Resolved
[12:41:09] Friends, I'm preparing some patches for kartotherian and have questions about how the deployment repo is built.
[12:42:14] I think this is how the kartotherian-deploy repo is built, please correct me if I'm wrong: docker run --rm -it -v $(pwd):/srv/app -w /srv/app node:12 npm install
[12:43:03] However, I'm confused about where @wikimedia/kartotherian is coming from. npmjs.com says that it comes from https://github.com/kartotherian/kartotherian , but that repo hasn't budged in 3 years and is missing the latest tags.
[12:45:47] AIUI, the correct repo is https://github.com/wikimedia/mediawiki-services-kartotherian but if npmjs.com is pointing to the wrong place, how does that work?
[12:47:21] Aah--maybe the repo is set correctly, but npmjs.com is reporting the bad metadata from the repo's package.json?
[12:47:41] Still doesn't explain why I can't find the latest tags in *any* repo however.
[12:56:44] I also see that lerna is required for the `npm install --production` so I believe I need a custom docker image.
[13:20:31] serviceops, Security-Team, GitLab (CI & Job Runners), Patch-For-Review, and 2 others: Setup GitLab Runner in trusted environment - https://phabricator.wikimedia.org/T295481 (Jelto) Open→In progress p:Triage→High
[13:47:04] No luck following the instructions which ask me to run the node script to launch docker and another node inside it: `./server.js build --deploy-repo` complains that config.yaml is missing.
[13:50:08] serviceops, Wikidata-Query-Service, Discovery-Search (Current work): Additional capacity on the k8s Flink cluster for WCQS updater - https://phabricator.wikimedia.org/T280485 (Gehel) Open→Resolved
[15:05:00] awight: fwiw, parsing & infrastructure from product are phasing out slowly kartotherian (the sister part of tilerator is already being replaced by tegola tile server, an off the shelf component). I would not be surprised if docs are out of date, and I am not sure it's worth it to even put effort to make them better (but the team should be the
[15:05:01] canonical point to answer that). But to answer your question somewhat, kartotherian is deployed from https://gerrit.wikimedia.org/g/maps/kartotherian/deploy/+/refs/heads/imposm (note that imposm branch is the currently deployed one, not the master one).
[15:20:27] serviceops: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (elukey)
[15:29:52] serviceops, SRE, ops-eqiad: Kubernetes1018's eth negotiated speed is 10MB/s - https://phabricator.wikimedia.org/T296369 (Cmjohnson) Open→Resolved replaced the cable. Good to go now cmjohnson@kubernetes1018:~$ sudo ethtool eno1 | grep Speed Speed: 1000Mb/s
[15:31:39] _joe_: can https://gerrit.wikimedia.org/r/c/operations/puppet/+/738194 be either merged or removed from deployment-prep cherrypicks?
[15:43:56] majavah: jo_e is out today, he's back tomorrow
[15:44:42] ack, definitely no hurry
[15:44:44] akosiaris: Thanks for the breadcrumbs!
[15:45:53] yw
[15:46:04] I fully endorse whatever plans exist to phase out kartotherian, and we're only making small changes. Makes sense that deploying this repo is a mystical art, it certainly gave me trouble just setting up a development build.
[16:01:57] Looks like the meeting VC code is broken again.
[16:02:14] Did anyone manage to join or is there another, working one?
[16:03:08] sobanski: try going to meet.google.com directly to get in
[16:03:11] that's what I had to do
[16:03:19] Ah
[16:16:21] hello folks, anybody available for a wmf-ca-certificates review? https://gerrit.wikimedia.org/r/c/operations/debs/wmf-certificates/+/742485
[17:19:23] I think eventgate-main should be switched to active/active now, it was only pooled in one DC because of the old WDQS updater, which is gone now
[17:20:37] I'll file a task for that
[17:38:18] I might have messed up a service-checker based helmfile test (for the termbox service, staging cluster) :(
[17:38:25] can someone help me fix it?
[17:49:47] serviceops, Prod-Kubernetes, Kubernetes: Run helm test after deploy - https://phabricator.wikimedia.org/T276949 (Lucas_Werkmeister_WMDE) Just gonna leave a note here that the command in the task description is seemingly outdated, and you shouldn’t try to run it manually (like I did): `counterexample...
[18:00:05] (termbox issue discussion happening in -operations now ftr)
[18:25:17] but I’ll quickly summarize the termbox status quo here, to avoid disturbing -operations during the outage (which I assume is unrelated)
[18:25:35] deployment-charts change I32c6d6be7e is rolled out to staging cluster, but not eqiad or codfw
[18:26:00] `helmfile -e staging -l name=staging test --cleanup` failed, to be investigated / cleaned up
[18:26:29] the *test* release in the staging env/cluster seems to be fine, so no huge reason for concern
[18:55:54] jelto: shall we look into the termbox issues tomorrow? (when the outage is hopefully over…)
[18:58:02] Lucas_WMDE: as you mentioned tests run against the release named "test". So you have to specify this release when running tests: helmfile -e staging --selector name=test test
[18:58:59] then the tests are successful. I can cleanup the pods from the failed test. I'd say deploy to eqiad and codfw should happen after the incident/tomorrow
[18:59:03] okay, so PEBCAK ^^
[18:59:06] ack, thank you
[19:00:11] should I wait for you with the eqiad/codfw deployments tomorrow or do it when it works for me?
[19:00:14] (I’m in CET timezone)
[19:04:40] Lucas_WMDE: I'm also in CET, just do it tomorrow and ping here if something doesn't work :)
[19:04:52] ok :) thanks again!
[21:12:36] serviceops, SRE, foundation.wikimedia.org, User-Urbanecm: Investigate and restore foundationwiki 302 httpbb test - https://phabricator.wikimedia.org/T296687 (RLazarus) p:Triage→Medium
[21:13:45] serviceops, SRE, foundation.wikimedia.org, User-Urbanecm_WMF (GovWiki): Investigate and restore foundationwiki 302 httpbb test - https://phabricator.wikimedia.org/T296687 (Urbanecm_WMF) a:Urbanecm→Urbanecm_WMF Reassigning with my contractor hat :).
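
Note on the 18:58:02 exchange: the point there is that the selector has to name the release the tests run against ("test"), not the environment. A minimal sketch of that invocation follows, wrapped in a helper so the release name is always passed explicitly. The function name `run_release_tests` and the `DRY_RUN` switch are hypothetical, made up for illustration; only the helmfile arguments (`-e`, `--selector name=...`, `test`) come from the log.

```shell
# Hypothetical helper (not part of helmfile or the deployment tooling):
# run `helmfile test` against one explicitly named release.
# Set DRY_RUN=1 to print the command instead of executing it.
run_release_tests() {
    env_name="$1"   # helmfile environment, e.g. staging
    release="$2"    # release name to select, e.g. test

    # Build the same command quoted in the log at 18:58:02.
    set -- helmfile -e "$env_name" --selector "name=$release" test

    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

With `DRY_RUN=1 run_release_tests staging test` this should only print the command from jelto's message, which makes it easy to check the selector before running anything against a cluster.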