[00:23:25] Toolforge question: is it possible to set up and search an ElasticSearch index on a Toolforge tool? Not a huge one, about 75K records, each record quite big though (full texts).
[00:27:40] if the records aren’t sensitive, you might be able to use https://wikitech.wikimedia.org/wiki/Help:Toolforge/Elasticsearch
[00:40:42] Oh! Yes, perfect! Thanks. The wiki search autocomplete did not offer this for "ElasticSearch", but I should have known better than to trust it. I guess it doesn't autocomplete from the Help namespace?
[00:58:10] @abartov: the Help namespace does not autocomplete, but we tend to work around that with redirects and hatnotes in the case of things like ElasticSearch and Puppet that are used in both prod and WMCS spaces.
[00:58:36] https://wikitech.wikimedia.org/wiki/Elasticsearch redirects to https://wikitech.wikimedia.org/wiki/Search which has a hatnote about https://wikitech.wikimedia.org/wiki/Help:Toolforge/Elasticsearch
[01:07:29] Touché! When I saw "Elasticsearch" redirects to "Search" I figured it would be about MediaWiki's own search and did not even look. Another unfortunate assumption on my part! 😅 (re @wmtelegram_bot: https://wikitech.wikimedia.org/wiki/Elasticsearch redirects to https://wikitech.wikimedia.org/wiki/Search which has a ha...)
[01:07:39] Thanks for the clues. I've now filed this: https://phabricator.wikimedia.org/T411445
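As an aside on the thread above: once a tool has access per Help:Toolforge/Elasticsearch, a full-text search over an index of that size is a single HTTP call. The sketch below is illustrative only; the host variable, index name, and credential variables are hypothetical placeholders, and the real connection details come from the linked help page, not from this conversation.

```
# Illustrative sketch, not from the conversation: ES_HOST, ES_USER, ES_PASSWORD
# and the index name "mytool-fulltexts" are placeholders; see
# https://wikitech.wikimedia.org/wiki/Help:Toolforge/Elasticsearch for real values.
curl -s -u "${ES_USER}:${ES_PASSWORD}" \
  -H 'Content-Type: application/json' \
  "https://${ES_HOST}/mytool-fulltexts/_search" \
  -d '{"query": {"match": {"text": "phrase to find"}}, "size": 10}'
```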
[09:33:37] taavi: how did you solve the healthcheck issue BTW?
[09:34:39] oh.. the port was hardcoded on the proxyfetch URL config parameter :)
[09:36:48] I think the envoy behavior needs to be debugged
[09:36:59] Host: FQDN:port should work
[09:58:17] re: cloudweb outage from yesterday, did I get it right that in that case, with cloudweb1003 down but not depooled, pybal kept sending traffic to it due to the depool threshold?
[10:01:08] yes, that's right
[10:02:13] ok thank you, what's the recommended threshold in the two-host case where one goes down without depool?
[10:02:36] I am also thinking we may have other services in the same situation
[10:02:41] it was a weird scenario
[10:03:02] cloudweb1004 was unable to make the healthchecks happy and cloudweb1003 got reimaged anyways
[10:03:36] ah! I missed the first part actually where cloudweb1004 wasn't healthy
[10:03:38] ideally cloudweb1004's situation should be fixed before reimaging the other realserver
[10:05:04] indeed, if that was the case then pybal would have done the right thing and sent all traffic to 1004 (?)
[10:34:36] yes
[10:35:35] godog: current depool threshold of .5 is perfectly fine,
[10:36:11] from pybal's PoV, cloudweb1004 went down first and never came back. cloudweb1003 went down so it kept it pooled anyways
[10:36:44] * godog nods
[10:37:05] FWIW I've opened T411470 to make cloudweb services paging (wmcs)
[10:37:07] T411470: Page on cloudweb/horizon down - https://phabricator.wikimedia.org/T411470
[10:48:02] hi, seems like the heroku/clojure buildpack no longer works on Toolforge, I get a "[techdoc-dashboard-788699dbdf-xvsss] bash: line 1: java: command not found" when starting a newly built image. Is there a way to run an older image to get my tool online while I figure out what to do?
[10:52:38] kbch: I think you can get a list from `toolforge build list`, and then pass a specific image to `webservice start` with `--buildservice-image`
[11:03:08] lucaswerkmeister: thank you, that's the command I needed, but it seems this image is also not working (it worked earlier until I stopped it to update the tool). This time I can't access service logs either - I get a 400 client error when querying for logs followed by "No logs found!".
[11:27:52] kbch: "kubectl describe pod/techdoc-dashboard-74449c7465-zdxc7" shows "Failed to pull image"
[11:28:20] it's possible the older image was cleaned up automatically
[11:28:41] dhinus: I tried to specify the image in two different ways, by build_id and by destination_image - not sure which is correct
[11:30:47] is it possible that it was cleaned up if it was running up until an hour ago?
[11:30:50] "toolforge webservice --help" shows you an example image
[11:31:21] which I think matches the "destination_image" in "toolforge build list"
[11:31:52] looks like it. I'll try again
[11:31:58] the one that k8s is trying to use has "tools-harbor.wmcloud.org" repeated twice
[11:39:58] tried it again, the outcome seems the same. Is it possible "tools-harbor.wmcloud.org" is added automatically and I should specify --buildservice-image without it?
[11:43:56] it shouldn't happen, but it could be a bug... please try omitting it and let's see what happens!
[11:48:37] tried it without the leading "tools-harbor.wmcloud.org", starting with the "/tool-..." but it doesn't seem to work either
[11:51:06] it should not start with a leading slash either
[11:52:26] yeah, this time it failed with an invalid image name, it tried a double slash
[11:52:33] so if you remove the slash too, maybe it will work?
[11:52:53] 'couldn't parse image name "tools-harbor.wmcloud.org//tool-techdoc-dashboard....'
[11:53:13] running it without the slash now
[11:55:27] still "failed to pull", but the image name looks correct now. let me double check
[11:57:11] ok it looks like there is only one version in the repository, the latest one that was built
[11:57:20] the older ones have probably been cleaned up automatically
[12:00:25] that makes sense, it was fairly old - in that case I need to figure out why the heroku/clojure buildpack is suddenly not working as expected. Thanks for your help!
[12:03:24] I'm trying to debug the new image, the build logs look fine, they do recognize it as a java/clojure app
[12:06:32] I also downloaded the image and inspected it as described in the docs, everything seems to work
[12:11:44] the java bin seems to be in /app/.jdk/bin/java
[12:12:38] I would try changing the Procfile to use "/app/.jdk/bin/java -cp target/dashboard-standalone.jar techdoc.dashboard.core"
[12:13:17] I'm not sure why it's not in the path, something that we need to investigate... but hopefully using the full path will fix your issue
[12:14:38] can you please open a phab task about it?
[12:18:47] Will do! And I'll try changing the Procfile as you described. Thank you!
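To make the suggestion at [12:12:38] concrete, the resulting Procfile would presumably be the single line below, assuming the standard `web:` process type; the jar path and main class are the ones quoted in the conversation and are specific to that tool.

```
web: /app/.jdk/bin/java -cp target/dashboard-standalone.jar techdoc.dashboard.core
```

After rebuilding the image and restarting the webservice, `kubectl describe pod/<pod-name>` (as used at [11:27:52]) is a quick way to confirm that the new image was pulled and the container actually started.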
[14:50:16] !status VM reboots in progress https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/2RV6C3TKSILUX6BGZZY4MFLLIJ6IEVDE/
[14:50:16] Too long status
[14:50:26] !status VM reboots in progress
[16:57:02] !status ok
[17:08:46] Hello, I am trying to set up a web proxy for a VPS but I get an error. I waited a bit (I created the proxy yesterday). I can port forward using ssh and I see the webpage on my localhost, but the web proxy is still returning an error. I am exposing a web page using docker (compose) on port 8050 on 0.0.0.0 on the VPS host. I have configured the web proxy to point to port 8050 on the correct instance using http. Can somebody help me sort things out?
[17:12:35] CristianCantoro: do you have a security group configured to allow the proxied traffic? (see https://wikitech.wikimedia.org/wiki/Help:Using_a_web_proxy_to_reach_Cloud_VPS_servers_from_the_internet#Creating_a_web_proxy)
[17:12:54] if that is correct, which proxy specifically is the one with the issue?
[17:12:54] taavi: I didn't check, thanks for the pointer
[17:20:18] that was the issue, thank you taavi!
[17:21:23] I had set up a security group to allow ports 80 and 443, but now I am using docker and the page is on port 8050, so I needed a new rule
[17:28:48] taavi: I wonder if anyone would read it if we put a "Did you open the upstream port in your security groups?" prompt on the error page?
[17:29:44] even better I suppose would be Horizon telling you "lol, nope" if the port wasn't already open.
[17:31:16] bd808: hmm. it seems like no-one reads https://wikitech.wikimedia.org/wiki/Help:Using_a_web_proxy_to_reach_Cloud_VPS_servers_from_the_internet#Troubleshooting which we link to from the error message
[17:31:22] horizon telling you that would be a very good idea
[17:33:28] T411531
[17:33:29] T411531: Prevent creating web proxies on ports with no matching security group rule to permit the traffic - https://phabricator.wikimedia.org/T411531
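For anyone hitting the same proxy error: the missing piece in the exchange above was an ingress rule for the backend port (8050 here). This is usually done in Horizon's security group panel; the OpenStack CLI equivalent is sketched below, where the group name and source range are placeholders and the recommended source for proxied traffic is described on the help page linked at [17:12:35].

```
# Sketch only: "default" and 172.16.0.0/21 are placeholder values; check the
# "Using a web proxy" help page linked above for the source range to allow.
openstack security group rule create \
  --ingress --protocol tcp --dst-port 8050 --remote-ip 172.16.0.0/21 \
  default
```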
[20:00:09] !log paws rebooting paws-127b-rpchztfjt2jb-master-0.paws to aid with host draining
[20:00:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[20:06:00] !log tools rebooting tools-harbordb1 to aid with host draining
[20:06:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:22:43] !log tools stop/starting harbordb1 to fix presumed mtu mismatch
[20:22:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:50:10] !log paws stop/starting paws-127b-rpchztfjt2jb-node-1 to fix mtu mismatch
[20:50:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[20:51:26] !log zuul stop/starting zuul-k8s-v128-qnv6fefra4wd-node-0 to fix mtu mismatch
[20:51:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Zuul/SAL
[23:13:04] !log bd808@tools-bastion-14 tools.mwdemo Update git checkout 71f5813a and restart everything to upgrade to php 8.4 runtime.
[23:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.mwdemo/SAL
[23:31:33] TIL mwdemo
[23:31:42] (surely I must’ve seen it before and forgotten about it, and no doubt it’ll happen again ^^)
[23:32:07] at least some of the prose in https://mwdemo.toolforge.org/wiki/Project:Mwdemo still feels familiar :)
[23:41:26] @lucaswerkmeister: hopefully I told you about it since I kind of ripped off the starting point from your notwikilambda setup :)
[23:43:31] yeah :D
[23:43:32] My next-gen idea is to turn it into a push-to-deploy container with a GitLab CI task to update the MediaWiki bits once a week. I'm kind of waiting for the component system to support webservice before I dig into that.
[23:47:07] “MediaWiki is configured to use the CACHE_DB type, since no other type seems suitable.” – actually, couldn’t this (now) use https://wikitech.wikimedia.org/wiki/Tool:Containers#Redis_container ?
[23:47:24] I guess that’s also half-blocked on component webservice support
[23:47:50] (might be doable now but not in the nice and elegant way that we’d want to promote long-term)
[23:47:59] yeah, a local redis would be possible these days. Just not in early 2023 :)
[23:48:22] yeah, it’s cute how the webarchive’d setup from 2021 uses manual k8s continuous jobs too :3
[23:49:49] weird how software tends to improve as time progresses and people put work into it :P
[23:50:20] it is easy to lose track of that when we keep staring at the most recently broken parts :)