[01:54:05] Machine-Learning-Team, Epic: WikiGPT Experiment - https://phabricator.wikimedia.org/T328494 (kevinbazira)
[07:02:43] Machine-Learning-Team, DBA, Data-Engineering, Data-Persistence, and 9 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[07:57:52] Machine-Learning-Team, Epic: WikiGPT Experiment - https://phabricator.wikimedia.org/T328494 (isarantopoulos)
[08:10:11] morning folks!
[08:11:05] from chilly athens (feels like -6)
[08:19:38] o/
[08:19:40] wow :D
[08:19:58] it is chilly in here too, but not that cold :D There was some forecast about snow but I don't see much
[09:55:17] isaranto: I forgot one thing - in the pre-commit step of inference-services, what is the component that fixes the code when it runs?
[09:55:20] is it ruff?
[09:55:29] ruff and black
[09:55:37] black is a formatter
[09:56:01] while ruff is just a linter and by passing --fix it can fix some stuff as well
[09:58:26] ahhh yes black of course
[09:58:29] thanks I got confused
[09:59:02] I figured out a way to update a toolforge project with a webhook from gitlab
[10:01:16] wow nice
[10:01:54] I'll open a merge request later and probably add it in wikitech (it only has documentation for php apps)
[10:24:21] there are containerd security updates. they are already live on the wikikube cluster w/o any issues, shall I roll them out to ml-serve right away or do you want to have a look at the staging ml hosts first?
[10:24:31] running pods are not impacted by the update
[10:26:22] moritzm: +1 with the upgrade
[10:28:18] k, I'll take care of that in ~10m
[10:58:45] Machine-Learning-Team: Create repository for WikiGPT - https://phabricator.wikimedia.org/T329028 (isarantopoulos) Created a repo that has the following: - has a CI pipeline for Merge requests that runs precommit command for python, css and javascript files - has a webhook that syncs code to the toolforg...
[11:30:14] I updated the page on wikitech to include python + gitlab information https://wikitech.wikimedia.org/wiki/Help:Toolforge/Auto-update_a_tool_from_GitHub/GitLab#Python_tool_hosted_in_$HOME/www/python/src
[11:32:07] Machine-Learning-Team: Create repository for WikiGPT - https://phabricator.wikimedia.org/T329028 (isarantopoulos) Added some documentation on Wikitech about how to set up a python webhook for a toolforge app in Github/GitLab https://wikitech.wikimedia.org/wiki/Help:Toolforge/Auto-update_a_tool_from_GitHub/Gi...
[11:33:18] I have updated the ores dashboard with prometheus datasources, no more mix eqiad/codfw
[11:33:37] and also added two new panels (per wiki/model traffic, with and without precache)
[11:34:00] 🚀
[11:34:05] in eqiad ORES seems to get ~30/40 rps
[11:34:29] most of them are wikidata's
[11:35:10] but the numbers are weird now that I see
[11:35:18] will need to recheck after lunch
[11:40:55] * elukey lunch!
[12:59:41] * isaranto l8 lunch
[13:41:53] Machine-Learning-Team, Data-Engineering, Event-Platform Value Stream: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (elukey) Thanks a lot for the details! We'll let the Research and Search team decide what's best, but we (as ML team) have discussed it a...
[13:47:59] elukey: what can I do to encourage you to use page_change and its model for new mw page related change streams?
[13:47:59] :)
[13:48:16] and/or why is revision-create preferred?
[13:49:18] ottomata: o/ mainly time, we'd need to create streams asap and we have had a ton of blockers so far :(
[13:54:58] elukey: what are the blockers for page_change ones?
[13:56:14] ottomata: that there is no stream atm except the rc ones, so we have to wait until all the flink workflows are up and tested before starting ours
[14:26:30] https://grafana-rw.wikimedia.org/d/HIRrxQ6mk/ores - I added new latency graphs, now we know p(50,75,95) values for each combination of ores-instance/wiki/model
[14:26:42] still not yet its final version, but a ton more metrics
[14:26:56] I need to figure out the exact difference between the various types of requests
[14:27:21] datasources_extracted, scores_processed, scores_request, precache_request
[14:28:36] in the dashboard I found that the union of scores_request|precache_request seems to be the sum of all requests
[14:29:09] but we also have scores_processed..
[14:35:15] going off for an hour, will be back after the errand!
[14:54:00] wow a lot of graphs! nice work!
[15:10:47] Machine-Learning-Team, Data-Engineering, Event-Platform Value Stream: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (Isaac) Not to further muddy the water, but I'm just realizing that this model could also be triggered with just page-links changes as th...
[15:59:44] elukey: no you don't have to wait for any flink workflows
[15:59:51] the page_change is up now
[16:00:04] if you need page_content_change, then yes you'd have to wait
[16:00:37] page_change stream isn't deployed to all wikis yet, and still is 'release candidate', but the schema is finalized
[16:01:00] all wikis and out of 'rc' mode is just a mw-config change away
[16:25:57] ottomata: I am not saying that we don't want to use it, but it seemed something more distant in the future from what you wrote. Could you give me a timeline to have an equivalent stream like revision-create? Same wiki supported etc.. out of RC?
[16:26:19] more than happy to use it or suggest to use it if feasible
[16:43:26] Machine-Learning-Team, ORES, Wikimedia-production-error: PHP Notice: Trying to access array offset on value of type null - https://phabricator.wikimedia.org/T329304 (thcipriani)
[16:50:48] Machine-Learning-Team: Review ORES traffic to better understand Lift Wing's requirements - https://phabricator.wikimedia.org/T325763 (elukey) Refactored the https://grafana.wikimedia.org/d/HIRrxQ6mk/ores dashboard using the new per-model metrics that the exporter returns. The main question mark is what is t...
[16:59:08] ok so https://github.com/wikimedia/ores/blob/05f6df2890a78fa50a4403148b1e6b003253fd95/tests/metrics_collectors/tests/test_logger.py#L25 may give us some clue
[16:59:35] since you can request multiple scores at once, "scores_request" should be the time taken by all of them
[16:59:38] (plural)
[17:00:02] "score_request" (singular) should be how much it took for a single score
[17:03:59] ok so now it may make sense that "scores_request" + "precache_request" is the total traffic
[17:04:48] elukey: by end of quarter
[17:05:17] i corrected myself in that ticket. mediawiki.page_change (with everything revision create has, and more) will be production by end of quarter.
[17:05:40] it's just the page_content_change stream, which is the same thing but with raw wiki content in the stream, that will not be 'production' ready by then.
[17:05:51] but you don't need page_content_change, you need just page_change!
[17:09:01] Machine-Learning-Team: Review ORES traffic to better understand Lift Wing's requirements - https://phabricator.wikimedia.org/T325763 (elukey) From [[ https://github.com/wikimedia/ores/blob/05f6df2890a78fa50a4403148b1e6b003253fd95/tests/metrics_collectors/tests/test_logger.py#L24-L34 | this ]] test in ORES I...
[17:10:09] ottomata: sure yes, but it is still 2 months away, we can wait, but it is not ready to use :)
[17:10:19] even if we have rcX etc..
[17:10:40] I mean we can start using the rcX ones, and then switch and say that from that moment onward it will be production ready
[17:11:04] I'll talk with my team and see what comes out
[17:16:45] --
[17:17:04] also added score hits/misses panels to the ORES dashboard, we should have a complete view now
[17:17:10] going afk for today folks!
[17:17:15] have a good rest of the day :)
[17:41:24] elukey: k, i'd highly encourage you to use it. There is nothing stopping us from making it non rc today, we just haven't done it because we haven't been pressed. :)
[17:42:03] i can help with your score schema if you like, it should be pretty easy with the way I made the fragments.
[17:42:31] if you like the way the score field is modeled in revision-score, we can mostly just bring that in
[17:42:34] or we can do something different
[18:52:20] Machine-Learning-Team, Data-Engineering, Event-Platform Value Stream: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (Ottomata) Interesting. I think the output score data model could still be an entity change based model, but the input wouldn't be revis...
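
Note on the ORES metrics thread (14:26 to 17:03): the working reading in the log is that "scores_request" plus "precache_request" gives the total request traffic. A minimal Python sketch of that arithmetic follows. The rate values and the helper name total_traffic are hypothetical, made up for illustration, and are not real ORES data or part of any ORES or Grafana API. scores_processed is left out of the sum only because the log leaves its exact meaning open.

```python
# Hypothetical per-metric request rates in requests per second.
# These numbers are invented for illustration; they are not real ORES data.
rates = {
    "scores_request": 25.0,
    "precache_request": 10.0,
    "scores_processed": 40.0,  # meaning still unclear in the log, so not summed
}


def total_traffic(rates):
    """Sum the two request-level metrics, following the 17:03:59 reading."""
    return rates.get("scores_request", 0.0) + rates.get("precache_request", 0.0)


total = total_traffic(rates)
print(f"total request rate: {total} rps")
```

If the reading turns out to be wrong (for example if precache requests are already counted inside scores_request), only the body of total_traffic would need to change.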