[01:14:41] serviceops, MW-on-K8s: IPInfo MediaWiki extension depends on presence of maxmind db in the container/host - https://phabricator.wikimedia.org/T288375 (wkandek) Regarding 1 - a maxmind db change requires a new MediaWiki image and deployment. I believe the current frequency of MediaWiki deployments (multip...
[01:19:04] serviceops, Anti-Harassment, IP Info, SRE: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (wkandek)
[01:19:06] serviceops, MW-on-K8s: IPInfo MediaWiki extension depends on presence of maxmind db in the container/host - https://phabricator.wikimedia.org/T288375 (wkandek)
[01:29:30] serviceops, Anti-Harassment, IP Info, SRE: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (wkandek) In T288375 we are discussing how this extension would have access to the maxmind databases when we migrate MediaWiki to Kubern...
[01:34:47] serviceops, Anti-Harassment, IP Info, SRE: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (Huji) My understanding is that the changes in the data are minimal from one version to the next; it is not like the ownership of hundre...
[02:07:20] jayme: re https://wikitech.wikimedia.org/w/index.php?title=Template:Kubernetes_nav&curid=448135&diff=1922630&oldid=1922561 - can we move those to doc.wikimedia.org? what does it take to build+publish the docs?
[06:47:33] serviceops, SRE, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (jijiki)
[06:50:08] serviceops, SRE, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['mc2019.codfw.wmnet', 'mc1037.eqiad.wmnet'...
[07:37:16] serviceops, SRE, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc1037.eqiad.wmnet', 'mc2019.codfw.wmnet', 'mc1038.eqiad.wmnet'] ` and were **ALL** succ...
[08:07:16] serviceops, SRE, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (jijiki)
[08:53:29] jayme: Should I use docker-registry.wikimedia.org or docker-registry.discovery.wmnet in values.yaml? I heard I should use the discovery name but in the repo it seems all other services use the .wikimedia.org name
[08:58:57] serviceops, SRE-swift-storage, envoy, Patch-For-Review: Envoy and swift HEAD with 204 response turns into 503 - https://phabricator.wikimedia.org/T288815 (fgiunchedi) I've deployed the fix from Swift upstream and it is working (i.e. Swift DTRT and Envoy's happy). @RLazarus I believe we're okay to...
[09:01:34] aha, so the default is .wikimedia.org but then it's overridden in helmfile.d/services to use the discovery name. I see now, thanks to Jelto
[09:23:35] serviceops, User-jijiki: Productionise mc10[37-54].eqiad.wmnet - https://phabricator.wikimedia.org/T278225 (jijiki)
[09:54:54] legoktm: I more or less did this https://github.com/kubernetes/website/tree/release-1.16#running-the-website-locally-using-docker to build the docs.
Idk if it's worth spending more time on it as we should try hard to move to a supported k8s version anyways
[09:55:49] mutante: all charts usually use wikimedia.org as default for the images (so you can install them in minikube etc.) but that value gets overridden in helmfile.d (or at least it should) to discovery.wmnet
[09:56:16] jayme: ACK, thanks for confirming that. I am doing just that now
[09:56:58] and adding image name and version in the override as well, trying to fix "invalid image name"
[09:59:52] the image name should stay the same (in chart vs. helmfile.d), but for the version/tag you'll probably want "latest" in the chart and override that with something solid in hekmfile.d
[09:59:59] eheh *helmfile.d
[10:01:29] alright
[12:11:16] serviceops, SRE: Run httpbb periodically - https://phabricator.wikimedia.org/T289202 (Dzahn) > What individual hosts should we test? "All tests on each appserver" is probably more work than we need to do. We probably don't want to pick a random host every time (the behavior should be consistent, but if i...
[12:33:18] jayme, effie: I decided to post a message on flink mailing list regarding my issue - do you see any risk of me posting logs from the deployment? I know that swift client for some reason posts the actual swift password, but I already scrubbed that
[12:53:51] zpapierski: if you don't see anything sinister
[12:54:06] it is ok, we do know what is in those logs
[12:54:29] so we will trust your judgement and power of redacting
[14:19:33] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['mc2021.codfw.wmnet'...
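The chart-default vs. helmfile.d override described in the 08:53 to 10:01 exchange can be sketched as follows. This is a minimal, hypothetical model rather than the real chart: it assumes the helmfile.d values are layered on top of the chart's values.yaml with a recursive map merge (later values win, nested maps are merged key by key, roughly how Helm combines values files), and the key names `docker.registry`, `main_app.image`, `main_app.version`, the image name `example-service`, and the tag `v1.2.3` are placeholders chosen for illustration.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` layered on top of `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the one in `base`. Neither input is mutated.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical chart defaults (values.yaml): the public registry, so the chart
# also installs in minikube, plus a floating "latest" tag.
chart_values = {
    "docker": {"registry": "docker-registry.wikimedia.org"},
    "main_app": {"image": "example-service", "version": "latest"},
}

# Hypothetical helmfile.d override: only the registry and the tag change;
# the image name is deliberately NOT repeated here.
helmfile_values = {
    "docker": {"registry": "docker-registry.discovery.wmnet"},
    "main_app": {"version": "v1.2.3"},
}


def image_ref(values: dict) -> str:
    """Assemble registry/image:tag from the (assumed) value layout."""
    app = values["main_app"]
    return f'{values["docker"]["registry"]}/{app["image"]}:{app["version"]}'


effective = deep_merge(chart_values, helmfile_values)
print(image_ref(chart_values))  # docker-registry.wikimedia.org/example-service:latest
print(image_ref(effective))     # docker-registry.discovery.wmnet/example-service:v1.2.3
```

Because the override file only carries the registry and the tag, the image name has a single source of truth in the chart, which is the split jayme recommends.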
[14:20:32] serviceops, Patch-For-Review, User-jijiki: Productionise mc10[37-54].eqiad.wmnet - https://phabricator.wikimedia.org/T278225 (jijiki)
[14:53:20] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2021.codfw.wmnet'] ` and were **ALL** successful.
[15:05:12] serviceops, SRE, Patch-For-Review, Performance-Team (Radar), User-jijiki: Reduce number of shards in redis_sessions cluster - https://phabricator.wikimedia.org/T280582 (jijiki)
[15:17:55] serviceops, SRE-swift-storage, envoy, Patch-For-Review: Envoy and swift HEAD with 204 response turns into 503 - https://phabricator.wikimedia.org/T288815 (RLazarus) Open→Resolved Sounds good to me! That means the $runtime field is unused anywhere, but I think it's a useful knob to have, s...
[15:56:06] serviceops, Patch-For-Review, User-jijiki: Productionise mc10[37-54].eqiad.wmnet - https://phabricator.wikimedia.org/T278225 (jijiki) All hosts are now in the redis_session cluster, and part in mcrouter's configuration. After we merge https://gerrit.wikimedia.org/r/714032, this will be complete.
[16:29:53] serviceops, Performance-Team: Rewrite mw-warmup.js in Python - https://phabricator.wikimedia.org/T288867 (RLazarus) Documenting this for posterity -- we agreed back in T269179 that when rewriting it in Python, it would be a good idea to move it directly into Spicerack's mediawiki module (or somewhere lik...
[17:18:34] bd808: just wanted to check in, how is the toolhub deployment going? did you need any other help?
[17:48:15] serviceops, User-jijiki: Productionise thumbor1005 and thumbor1006 - https://phabricator.wikimedia.org/T285477 (RLazarus) @Arnoldokoth is interested in tagging along for this.
[17:55:15] serviceops, SRE: Run httpbb periodically - https://phabricator.wikimedia.org/T289202 (RLazarus) Sounds reasonable! I'll probably hardcode a canary host at first, then we can look at choosing one automatically.
[18:11:55] serviceops, User-jijiki: Productionise thumbor1005 and thumbor1006 - https://phabricator.wikimedia.org/T285477 (Arnoldokoth) Yeah, I would like to observe the process.
[18:13:15] legoktm: jeena merged my patches for the helm chart, but I haven't tried to do any next step yet. I need to try a benchmarking exercise to pick an initial size for the container. But I also realized that there are non-code things (licensing!) that need to be worked out before launch, which took some of my sense of urgency away until that is settled.
[18:14:50] I'm hoping to do more next week. I would like to get into the staging cluster so I can see what is missing from the chart (egress, proxy config, mcrouter? etc)
[18:16:51] https://wikitech.wikimedia.org/wiki/User:Alexandros_Kosiaris/Benchmarking_kubernetes_apps has some guidance about benchmarking
[18:17:52] and ok :) you should be able to deploy to staging (and prod clusters) all by yourself, feel free to ping me if you need any help
[19:18:16] jeena clued me into the existence of https://github.com/thesocialdev/mediawiki-services-profiler too which is based on ak.osiaris' essay
[21:41:53] serviceops, Shellbox: Benchmark Shellbox - https://phabricator.wikimedia.org/T286384 (Legoktm) Open→Resolved a:Legoktm I think we're mostly set with the current configuration. Benchmarking and real usage shows that for the most part Shellbox is sitting idle, but the current capacity is adequ...
[21:43:57] serviceops, MW-on-K8s, SRE, Shellbox, and 3 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (Legoktm)