[07:45:01] <wikibugs>	 serviceops, SRE, ops-eqiad: Kubernetes1018's eth negotiated speed is 10MB/s - https://phabricator.wikimedia.org/T296369 (wiki_willy) a:Cmjohnson
[08:01:48] <wikibugs>	 serviceops, Dumps-Generation: Test php7.4 for dumps generation - https://phabricator.wikimedia.org/T295580 (ArielGlenn) p:Triage→Medium
[08:02:46] <wikibugs>	 serviceops, Dumps-Generation: Test php7.4 for dumps generation - https://phabricator.wikimedia.org/T295580 (ArielGlenn) Doing some testing of SQL/XML dumps in deployment-prep today, with php7.2 and 7.4 both installed. I don't expect any issues given that all my local testing is with 7.4, but better safe...
[12:07:15] <wikibugs>	 serviceops, SRE, Wikimedia-Site-requests, Patch-For-Review, and 2 others: Split search.wikimedia.org out of ops/mediawiki-config into separate service - https://phabricator.wikimedia.org/T289224 (Majavah) Open→Resolved
[12:41:09] <awight>	 Friends, I'm preparing some patches for kartotherian and have questions about how the deployment repo is built.
[12:42:14] <awight>	 I think this is how the kartotherian-deploy repo is built, please correct me if I'm wrong: docker run --rm -it -v $(pwd):/srv/app -w /srv/app node:12 npm install
[12:43:03] <awight>	 However, I'm confused about where @wikimedia/kartotherian is coming from.  npmjs.com says that it comes from https://github.com/kartotherian/kartotherian , but that repo hasn't budged in 3 years and is missing the latest tags.
[12:45:47] <awight>	 AIUI, the correct repo is https://github.com/wikimedia/mediawiki-services-kartotherian but if npmjs.com is pointing to the wrong place, how does that work?
[12:47:21] <awight>	 Aah--maybe the repo is set correctly, but npmjs.com is reporting the bad metadata from the repo's package.json?
[12:47:41] <awight>	 Still doesn't explain why I can't find the latest tags in *any* repo however.
[12:56:44] <awight>	 I also see that lerna is required for the `npm install --production` so I believe I need a custom docker image.
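A minimal sketch related to awight's guess above that npmjs.com simply displays whatever `repository` value the published package.json carries. It only shows how that field can be read from a package.json document; the sample JSON in the usage lines is hypothetical and is not the real kartotherian metadata. On a machine with npm installed, `npm view @wikimedia/kartotherian repository.url` should report the same field as published, though that was not run in this conversation.

```python
import json
from typing import Optional


def repository_url(package_json_text: str) -> Optional[str]:
    """Return the repository URL declared in a package.json document.

    The "repository" field may be a plain string or an object with a
    "url" key; anything else is treated as "not declared".
    """
    data = json.loads(package_json_text)
    repo = data.get("repository")
    if isinstance(repo, str):
        return repo
    if isinstance(repo, dict):
        return repo.get("url")
    return None


# Hypothetical sample, for illustration only.
sample = '{"name": "example", "repository": {"type": "git", "url": "https://example.org/repo.git"}}'
print(repository_url(sample))
```

Comparing this value against the repository the code is actually developed in would show whether the registry listing is merely stale metadata.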
[13:20:31] <wikibugs>	 serviceops, Security-Team, GitLab (CI & Job Runners), Patch-For-Review, and 2 others: Setup GitLab Runner in trusted environment - https://phabricator.wikimedia.org/T295481 (Jelto) Open→In progress p:Triage→High
[13:47:04] <awight>	 No luck following the instructions which ask me to run the node script to launch docker and another node inside it: `./server.js build --deploy-repo` complains that config.yaml is missing.
[13:50:08] <wikibugs>	 serviceops, Wikidata-Query-Service, Discovery-Search (Current work): Additional capacity on the k8s Flink cluster for WCQS updater - https://phabricator.wikimedia.org/T280485 (Gehel) Open→Resolved
[15:05:00] <akosiaris>	 awight: fwiw, parsing & infrastructure from product are slowly phasing out kartotherian (the sister part, tilerator, is already being replaced by the tegola tile server, an off-the-shelf component). I would not be surprised if docs are out of date, and I am not sure it's even worth putting effort into making them better (but the team should be the
[15:05:01] <akosiaris>	 canonical point to answer that). But to answer your question somewhat, kartotherian is deployed from https://gerrit.wikimedia.org/g/maps/kartotherian/deploy/+/refs/heads/imposm (note that the imposm branch is the currently deployed one, not the master one).
[15:20:27] <wikibugs>	 serviceops: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (elukey)
[15:29:52] <wikibugs>	 serviceops, SRE, ops-eqiad: Kubernetes1018's eth negotiated speed is 10MB/s - https://phabricator.wikimedia.org/T296369 (Cmjohnson) Open→Resolved replaced the cable. Good to go now  cmjohnson@kubernetes1018:~$ sudo ethtool eno1 | grep Speed  Speed: 1000Mb/s
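The `ethtool eno1 | grep Speed` check quoted in the task update above could be automated roughly as follows. This is a sketch only: the function names and the 1000 Mb/s default threshold are assumptions made for illustration, not part of any existing tooling mentioned in the channel.

```python
import re
from typing import Optional


def parse_speed_mbps(ethtool_output: str) -> Optional[int]:
    """Return the negotiated link speed in Mb/s, or None if no numeric speed is reported."""
    match = re.search(r"^\s*Speed:\s*(\d+)\s*Mb/s", ethtool_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def is_degraded(ethtool_output: str, expected_mbps: int = 1000) -> bool:
    """Treat a missing or lower-than-expected speed as a degraded link."""
    speed = parse_speed_mbps(ethtool_output)
    return speed is None or speed < expected_mbps


# The line reported after the cable was replaced.
print(parse_speed_mbps("Speed: 1000Mb/s"))
```

A missing numeric speed is counted as degraded on purpose, so that a link reporting no speed at all is flagged rather than silently passed.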
[15:31:39] <majavah>	 _joe_: can https://gerrit.wikimedia.org/r/c/operations/puppet/+/738194 be either merged or removed from deployment-prep cherrypicks?
[15:43:56] <jelto>	 majavah: jo_e is out today, he's back tomorrow
[15:44:42] <majavah>	 ack, definitely no hurry
[15:44:44] <awight>	 akosiaris: Thanks for the breadcrumbs!
[15:45:53] <akosiaris>	 yw
[15:46:04] <awight>	 I fully endorse whatever plans exist to phase out kartotherian, and we're only making small changes.  Makes sense that deploying this repo is a mystical art, it certainly gave me trouble just setting up a development build.
[16:01:57] <sobanski>	 Looks like the meeting VC code is broken again.
[16:02:14] <sobanski>	 Did anyone manage to join or is there another, working one?
[16:03:08] <legoktm>	 sobanski: try going to meet.google.com directly to get in
[16:03:11] <legoktm>	 that's what I had to do
[16:03:19] <sobanski>	 Ah
[16:16:21] <elukey>	 hello folks, anybody available for a wmf-ca-certificates review? https://gerrit.wikimedia.org/r/c/operations/debs/wmf-certificates/+/742485
[17:19:23] <legoktm>	 I think eventgate-main should be switched to active/active now, it was only pooled in one DC because of the old WDQS updater, which is gone now
[17:20:37] <legoktm>	 I'll file a task for that
[17:38:18] <Lucas_WMDE>	 I might have messed up a service-checker based helmfile test (for the termbox service, staging cluster) :(
[17:38:25] <Lucas_WMDE>	 can someone help me fix it?
[17:49:47] <wikibugs>	 serviceops, Prod-Kubernetes, Kubernetes: Run helm test after deploy - https://phabricator.wikimedia.org/T276949 (Lucas_Werkmeister_WMDE) Just gonna leave a note here that the command in the task description is seemingly outdated, and you shouldn’t try to run it manually (like I did):  `counterexample...
[18:00:05] <Lucas_WMDE>	 (termbox issue discussion happening in -operations now ftr)
[18:25:17] <Lucas_WMDE>	 but I’ll quickly summarize the termbox status quo here, to avoid disturbing -operations during the outage (which I assume is unrelated)
[18:25:35] <Lucas_WMDE>	 deployment-charts change I32c6d6be7e is rolled out to staging cluster, but not eqiad or codfw
[18:26:00] <Lucas_WMDE>	 `helmfile -e staging -l name=staging test --cleanup` failed, to be investigated / cleaned up
[18:26:29] <Lucas_WMDE>	 the *test* release in the staging env/cluster seems to be fine, so no huge reason for concern
[18:55:54] <Lucas_WMDE>	 jelto: shall we look into the termbox issues tomorrow? (when the outage is hopefully over…)
[18:58:02] <jelto>	 Lucas_WMDE: as you mentioned tests run against the release named "test". So you have to specify this release when running tests: helmfile -e staging --selector name=test test
[18:58:59] <jelto>	 then the tests are successful. I can clean up the pods from the failed test. I'd say deploy to eqiad and codfw should happen after the incident/tomorrow
[18:59:03] <Lucas_WMDE>	 okay, so PEBCAK ^^
[18:59:06] <Lucas_WMDE>	 ack, thank you
[19:00:11] <Lucas_WMDE>	 should I wait for you with the eqiad/codfw deployments tomorrow or do it when it works for me?
[19:00:14] <Lucas_WMDE>	 (I’m in CET timezone)
[19:04:40] <jelto>	 Lucas_WMDE: I'm also in CET, just do it tomorrow and ping here if something doesn't work :)
[19:04:52] <Lucas_WMDE>	 ok :) thanks again!
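The helmfile exchange above comes down to the selector having to name the release ("test"), not the environment ("staging"). A small sketch that keeps the two apart when assembling the command; the helper name is hypothetical, and only the flags quoted in the conversation are used. It builds the argument list and prints it without running helmfile.

```python
from typing import List


def helmfile_test_command(environment: str, release: str) -> List[str]:
    """Build the `helmfile ... test` invocation for one release in one environment."""
    return ["helmfile", "-e", environment, "--selector", f"name={release}", "test"]


# Environment "staging", release "test", as in jelto's corrected command.
print(" ".join(helmfile_test_command("staging", "test")))
```

Passing the environment and the release as separate arguments makes it harder to repeat the `name=staging` mix-up.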
[21:12:36] <wikibugs>	 serviceops, SRE, foundation.wikimedia.org, User-Urbanecm: Investigate and restore foundationwiki 302 httpbb test - https://phabricator.wikimedia.org/T296687 (RLazarus) p:Triage→Medium
[21:13:45] <wikibugs>	 serviceops, SRE, foundation.wikimedia.org, User-Urbanecm_WMF (GovWiki): Investigate and restore foundationwiki 302 httpbb test - https://phabricator.wikimedia.org/T296687 (Urbanecm_WMF) a:Urbanecm→Urbanecm_WMF Reassigning with my contractor hat :).