[04:38:16] Machine-Learning-Team, Epic: Fix feature to view old search results - https://phabricator.wikimedia.org/T329345 (kevinbazira)
[04:38:28] Machine-Learning-Team: Fix feature to view old search results - https://phabricator.wikimedia.org/T329345 (kevinbazira)
[07:36:25] hello folks!
[07:36:31] going to upgrade the staging cluster :)
[07:40:28] klausman: o/ I am going to start earlier since there are a lot of reimages, I created a tmux session on cumin1001 called "T327767"
[08:00:32] (of course the cookbook fails in one step, not caught in dryrun.. alertmanager refuses to downtime)
[09:01:37] ottomata: for T328899 we'll let the Research/Search teams decide, no strong opinions.. for https://phabricator.wikimedia.org/T328576 we can probably use page_change's rc stream and start testing them, what do you think?
[09:01:53] if they'll be out of RC state in a reasonable time it should be ok :)
[09:29:29] isaranto: o/
[09:29:38] I see that wiki-gpt is broken, known?
[09:29:45] (returns 500 every time after a long wait)
[09:40:13] ottomata: mmm I checked eqiad.rc0.mediawiki.page_change and rc1 on kafka-main1001 via kafkacat, there is no event flowing.. Is it expected?
[09:41:18] elukey: hey can u check again?
[09:41:50] I was trying to fix it as I broke the "search url" functionality yesterday
[09:42:12] I'm just here trying to break the stuff that kevin is building
[09:42:14] <3
[09:43:07] mmm better, but I tried with "Who won the world war 2 ?" and it doesn't work
[09:43:17] "I am sorry, but I am unable to answer this question. I can only answer questions that are answered inside the content of Wikipedia"
[09:43:41] that is IIRC the first example that Chris showed to us
[09:45:22] ottomata: also I'd need to figure out if the page_change schema's attributes can map to the revision-score ones
[09:47:42] elukey: yeah I'm working on this today. seems like the restrictions I have added make it really hesitant now
[09:48:31] elukey: \o
[09:48:48] elukey: sorry for the late call-in. Is the session on cumin1001 still running?
[09:49:36] klausman: o/ no problem the cookbook is broken, need to work with Riccardo on the alertmanager stuff
[09:50:04] Alright! If you want review/help, just lmk
[09:51:37] thanks!
[10:03:25] ottomata: so I tried to map page_change to revision score, and the only thing that it is missing (afaics) is the revision timestamp, that seems not present on page_change. We could try to relax revision-score's required attributes in theory
[10:10:03] (PS1) Elukey: WIP - events: support multiple source events [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/888190 (https://phabricator.wikimedia.org/T328576)
[10:11:37] (CR) Elukey: "The only missing field in page_change that is required in revision-score's schema is the rev-id-timestamp. We'll see what to do (relax the" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/888190 (https://phabricator.wikimedia.org/T328576) (owner: Elukey)
[10:12:20] folks I created --^ to improve our code for event handling
[10:12:37] I think that we'll likely never support more than a few event sources (famous last words)
[10:12:50] but in theory page_change and revision-create should suffice for the moment
[10:15:23] elukey: o/ in the page_change schema, there is a "wiki_id" field which is described as "The wiki ID, which is usually the same as the MediaWiki database name. E.g. enwiki, metawiki, etc."
[10:15:56] I think we can use it instead of using meta.domain, because it seems equivalent to the "database" field in revision-create.
[10:16:52] aiko: ah nice didn't see it, we can use it! Not sure what the difference is, should be the same in theory
[10:18:05] (we can support both on the Change-prop's rule side)
[10:20:17] elukey: yeah I'm not sure what the difference is either, meta.domain is described as "Domain the event or entity pertains to"
[11:12:02] elukey: shall I try to add a new stream like you did here? https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/886918
[11:17:05] isaranto: you can definitely read docs etc.. but I'd wait a bit for code reviews, since we don't know 100% the source input stream yet
[11:17:42] we'd also need https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/888190 if we switch to page-change
[11:17:57] ok. yeah I followed the conversation about page_change vs revision_create
[11:18:03] ack super
[11:24:51] * isaranto afk lunch
[11:31:50] * elukey lunch as well!
[12:09:44] * klausman heading for noms as well
[12:38:24] Machine-Learning-Team, Data-Engineering-Planning, Event-Platform Value Stream: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (EChetty)
[12:39:46] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 9 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (EChetty)
[12:41:02] Machine-Learning-Team, ORES, Analytics-Radar, Analytics-Wikistats, Data-Engineering-Planning: Discuss Wikistats integration for ORES - https://phabricator.wikimedia.org/T184479 (EChetty)
[12:44:11] Machine-Learning-Team, ORES, Analytics-Radar, Analytics-Wikistats, and 2 others: Discuss Wikistats integration for ORES - https://phabricator.wikimedia.org/T184479 (EChetty)
[12:46:19] Machine-Learning-Team, DBA, Data-Engineering-Planning, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (EChetty)
[14:34:00] I managed to improve the search results by adding as much information to the prompt as it allows https://gitlab.wikimedia.org/toolforge-repos/wiki-gpt/-/merge_requests/12
[14:34:28] there is the token limit that Chris mentioned and that is the reason that we get 500 errors many times
[14:34:58] however I did not put a limit on the input but only in the content. if one asks a really long question it will break
[14:35:10] w8 actually I can just limit the question as well
[14:53:17] great wikigpt is getting a 504 Gateway Time-out error
[14:54:27] for some reason I don't know
[15:22:14] ok I have broken my head around this and figured out it is probably a generic toolforge problem with webservices
[15:22:51] need any help?
[15:23:08] I tried with 2 other tools and if I restart a webservice (while it was working fine) then we get 504 errors
[15:23:21] like here https://isaranto-test-ci.toolforge.org/
[15:23:51] I even replicated our tool to see if it would fix the issue https://wiki-gpt-ml.toolforge.org/
[15:24:17] elukey: do u know who we can contact about toolforge webservices?
[15:24:36] probably the cloud team (#wikimedia-cloud)
[15:24:45] thanks!
[15:27:34] reached out over there 🤞
[15:47:13] Machine-Learning-Team, Add-Link, Growth-Team (Current Sprint), User-notice: Deploy "add a link" to 6th round of wikis - https://phabricator.wikimedia.org/T304550 (Sgs) a:Sgs
[15:56:19] Machine-Learning-Team, Patch-For-Review: Upgrade the ml-staging-codfw cluster to k8s 1.23 - https://phabricator.wikimedia.org/T327767 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=bc62f62a-f265-488f-a231-65cedb81bed3) set by elukey@cumin1001 for 3 days, 0:00:00 on 5 host(s) and their...
[15:56:58] klausman: o/ the staging cluster is half broken, the reimage cookbook for vm didn't work
[15:57:02] sigh
[15:57:10] I added downtimes etc.. for 3 days
[15:57:15] what broke?
[15:57:42] the PXE boot doesn't work, not sure if it is a weird DHCP config or similar
[15:57:52] ack.
[15:57:52] we'll need to investigate with Simon and Moritz probably
[16:00:35] * elukey afk for a bit!
[16:05:48] it works now
[16:07:34] anyway now we also have a backup url https://wiki-gpt-ml.toolforge.org/
[16:33:35] logging off folks have a nice weekend o/
[16:35:27] Machine-Learning-Team, Epic: WikiGPT Experiment - https://phabricator.wikimedia.org/T328494 (isarantopoulos) Regarding the 500 errors referred on the ticket description, some of them come because of failures in the OpenAI API calls and some of them have been [[ https://gitlab.wikimedia.org/toolforge-repo...
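
Note on the [14:34:28] to [14:35:10] messages: the idea discussed there is to cap the question as well as the content so the combined prompt stays under the model's token limit. A minimal sketch of that idea follows, under loud assumptions: truncate_words, build_prompt, the prompt layout and both default limits are hypothetical and do not come from the wiki-gpt repository, and whitespace-separated words are only a crude stand-in for real model tokens.

```python
def truncate_words(text: str, max_words: int) -> str:
    """Keep at most max_words whitespace-separated words of text."""
    if max_words <= 0:
        return ""
    return " ".join(text.split()[:max_words])


def build_prompt(question: str, content: str,
                 max_question_words: int = 50,
                 max_total_words: int = 1500) -> str:
    """Build a prompt whose question and content share one word budget.

    Hypothetical helper: the limits and layout are illustrative only.
    """
    # Cap the question first, so a very long question cannot use up
    # the whole budget on its own.
    question = truncate_words(question, max_question_words)
    # Whatever budget remains is spent on the retrieved content.
    remaining = max_total_words - len(question.split())
    content = truncate_words(content, remaining)
    return f"Content:\n{content}\n\nQuestion: {question}"
```

With this shape the question is trimmed before the content, so the content shrinks to fit whatever room the question leaves. A real deployment would presumably count tokens with the model's own tokenizer rather than words.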