[08:42:27] morning!
[08:48:17] Morning, Aiko!
[09:24:23] Machine-Learning-Team: Prepare docker image for hosting the logo-detection model-server on LiftWing - https://phabricator.wikimedia.org/T362598#9763238 (kevinbazira) Open→Resolved
[10:24:28] * klausman lunch
[11:39:14] hello folks
[11:43:58] I just depooled inference-codfw as prep-step for the upgrade
[11:54:21] I've just rolled out the coredns change to force ipv4 resolutions (needed for the new stuff to work, due to an istio bug)
[11:55:44] rolling out the istio changes
[11:57:14] and now I am going to proceed with revscoring damaging
[11:58:07] ack, thank you!
[12:10:48] httpbb works for all the damaging pods in codfw, proceeding with the others
[12:19:25] article-description works
[12:19:43] side note - once deployed, the sidecar takes a while before adjusting with the new cofnig
[12:19:46] *config
[12:20:15] 5-10 Minutes? Or much longer
[12:20:17] ?
[12:20:25] some requests, then it works
[12:20:32] timeouts etc..
[12:20:46] for article-descr, it took ~30 sec to get the first response
[12:20:50] You think it may be DNS caching (or lack thereof)?
[12:20:52] then normalized to the usual
[12:21:31] nothing really changed on the DNS caching front, except that we force A records.. I think it may be envoy getting the config upgrades from istiod
[12:21:52] ah, ack.
[12:22:11] Wonder if this is a one-off or something we'll se with re-deploys/restarts from now on
[12:23:01] nono I think it is a one-off
[12:23:23] oh, good
[12:23:37] IIRC it happened in the pat
[12:26:40] same happened with readability, I think that our version of istio may have some extra latency when pushing the updates to the sidecar
[12:43:00] all deployed, now I am trying to make httpb to work :)
[12:43:03] *httpbb
[12:51:04] what's the issue with httpbb?
[12:51:38] the timeouts that I was talking above :)
[12:51:43] Oh, right :)
[12:51:53] My brain has too many tabs open
[12:51:54] hopefully they should be fixed now
[12:57:45] ok mostly done! I am fixing the wikidata headers that I missed
[12:57:51] like https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1026558
[12:58:25] before we used to hit the MW API with "wikidata.wikipedia.org", that doesn't exist, but the httpd's vhost config + probably the API format made it work
[12:58:52] (we explicitly fix the host header also for wikibooks etc..)
[13:02:36] Ah, sneaky
[13:03:03] now if we don't fix it we'll get the https:// Location header redirect :(
[13:03:56] Sending to inference.svc.codfw.wmnet...
[13:03:57] PASS: 113 requests sent to inference.svc.codfw.wmnet. All assertions passed.
[13:04:01] finally :)
[13:07:26] the only one that still not works seems to be article-descriptions, afaics we don't have anything in httpbb
[13:07:50] Machine-Learning-Team, Structured-Data-Backlog: Pass image objects to the logo detection service - https://phabricator.wikimedia.org/T363506#9763776 (mfossati) >>! In T363506#9757394, @isarantopoulos wrote: > We would need the upload wizard to send a resized image (224x224) instead of the whole file. Is...
[13:07:58] now it works, ok :)
[13:12:51] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1026559 to add the test :)
[13:58:48] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 18th round of wikis (en.wp and de.wp) - https://phabricator.wikimedia.org/T308144#9763975 (Trizek-WMF)
[14:26:41] ok folks from my point of view codfw is ready to be pooled again
[14:26:51] anybody that wants to double-check/spot-check if anything is weird?
[14:39:14] looks good to me
[14:39:30] thanks!
[14:39:33] will do it in a bit
[14:51:29] Morning all
[15:00:34] o/
[15:03:29] codfw repooled!
[15:05:37] I am checking the following
[15:05:38] https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&from=now-30m&to=now&var-cluster=codfw%20prometheus%2Fk8s-mlserve&var-namespace=All&var-backend=mw-api-int-ro.discovery.wmnet&var-response_code=All&var-quantile=0.5&var-quantile=0.95&var-quantile=0.99
[15:06:32] so far latency and response codes are ok
[15:24:18] please ping me if you see/spot anything weird, so far I don't see anything
[16:15:30] YES
[16:15:35] Thanks elukey
[16:15:35] folks I am going afk for the evening, will check later just in case
[16:15:37] <3
[16:15:42] have a good rest of the day!
[16:15:45] Night elukey! Thanks for this!
[16:15:48] I also left a note on slack