[01:22:01] [1/2] 6206b8a5eb65d4346639f185
[01:22:01] [2/2] https://cdn.discordapp.com/attachments/1006789349498699827/1512627591431852102/image.png?ex=6a24c7b9&is=6a237639&hm=f42d939f867826066f21a158748326b9735052f427684f86b9916e0a6123bf8d&
[01:22:17] Got it when tryna access my preferences on https://fivenightsatfreddys.miraheze.org/wiki/Special:Preferences
[01:27:10] hmm it's about skins
[01:27:56] Strange and mysterious
[01:30:21] ...and the cache isn't updating
[01:31:34] eh just manually reset it from shell, works now
[01:32:09] using a skin that doesn't exist skill issue
[01:32:50] Might've been something with disabling Cosmos
[01:33:02] yeah it was because you set the skin to cosmos
[01:33:08] I reset the default to vector 2022
[01:33:27] Tyyy
[01:34:17] <.thehandyman.> [1/2] Tyty
[01:34:17] <.thehandyman.> [2/2] I was staring at the additional settings wondering why it was acting up, because it was just showing vector as default
[21:17:25] FYI that CreateWiki AI relies on an endpoint that is deprecated at the end of august, should probably be moved to responses as I’m not sure how long they plan on leaving chat completions around for
[21:18:57] It would also allow bumping it up to a cheaper model like gpt-5-mini which also has the ability to think
[21:21:29] Plus they are killing off gpt 4o in october along with pretty much every other non gpt 5 model
[22:38:10] I wouldn't say a clanker has the ability to think
[22:38:40] it can think more than gpt 4o
[23:04:24] Thanks for the mention, this is already on DTech and DDTech's visibility, hopefully we'll be at a good point to gracefully transition with a testing period
[23:20:04] Think as in reasoning?
[23:20:19] yeah
[23:30:44] gpt-oss-120b worked well enough when I tested it
[23:31:07] Not as competent as 4O but if you split it into multiple prompts it worked well enough
[23:35:38] iirc gpt-oss-120b can reason
[23:35:44] I don't have a GPU big enough to test it on
[23:36:05] although I'm kind of tempted by the offering groq have
[23:36:31] It's cheap enough that you can test hundreds of wiki creation requests with the $1 credit on OpenRouter
[23:36:58] groq do inference on it for like $0.15/0.60/MTok
[23:37:10] groq is super cool as well because iirc they bill you at the end of each month
[23:37:46] but it sort of defeats the point then of it being oss if you pay someone to do the inference
[23:38:39] The hope was that we downgrade to gpt-oss-20b on our own servers, but I think that likely won't happen since my own tests resulted in significant quality degradations.
[23:39:07] https://cdn.discordapp.com/attachments/1006789349498699827/1512964082402070548/image.png?ex=6a26011a&is=6a24af9a&hm=8dc2284f74c39797da2cefaf9fc8bd28c31604557efa3fcbffb912350a46611b&
[23:39:10] does it fit on the gpus
[23:39:16] tbh
[23:39:23] 20b does.
[23:39:25] you could just say fuck it and buy a mac mini
[23:39:35] or it might need to be a mac studio but
[23:39:42] that would run 120b
[23:39:46] The main advantage is the lack of vendor lockin since you can always switch providers if you don't like the current one.
[23:40:25] Although deepseek v4 flash is only 2x the price per million tokens and is known to cache very well.
[23:40:40] are wiki requests frequent enough to need prompt caching though
[23:40:45] the only time I can think is on rereviews
[23:40:55] Good point. It probably does only when testing.
[23:41:22] I have about 40-50 test requests that are sometimes run in a big batch to catch prompt regressions.
[23:41:43] realistically I think you could possibly run 2 mac minis with exo (although I think that would be very experimental fuckery) and get 96GB ram
[23:42:10] would probably be better and similar cost to just run a studio, but it's probably an investment that wouldn't pay itself off
[23:42:53] Buying mac minis to run LLMs sounds like a bad idea since, even with the terrible cost-effectiveness of OpenAI's 4O, we are still only paying $30 per month.
[23:43:20] I think realistically just switching model to a cheaper one would already be an improvement
[23:43:36] I think a current gen mini model would probably give the same performance at 4o
[23:43:48] btw
[23:43:52] have you considered flex processing
[23:44:03] would be possible with the newer models
[23:45:06] there's no guarantee it actually happens within 10m but realistically given most batch jobs I submit run within like an hour they have enough capacity spare that flex would probably just be an easy 50% discount
[23:46:12] Just checked lmarena and yeah 5.4-mini is slightly above 4o in text performance.
[23:46:30] I assume that's from reasoning
[23:46:54] That is not my decision to make.
[23:47:00] although gpt-5 is also similar in price to 4o and I would assume has significantly better performance
[23:47:05] was a more general suggestion
[23:47:44] the one thing is it possibly wouldn't play nice with jobqueue and would need to have another job queued called like RequestWikiAIPoll job to poll it
[23:48:00] otherwise it keeps a runner busy just for polling openai lmao
[23:49:24] that reminds me I need to go setup mediamoderation
[23:56:00] Hmm tbh that sounds like lots of work but not much real gain
[23:56:25] 50% discount is not a bad gain
[23:58:07] You could just make it a maintenance script and run it once per hour or something.
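
Editor's note: the polling idea discussed above (a separate poll job that checks once and re-queues itself, so a runner is not held while waiting on a slow response) could look roughly like the sketch below. Everything in it is hypothetical: FakeProvider, PollJob and run_queue are illustration-only names, the queue is a plain in-memory deque, and nothing here reflects the real CreateWiki code, the MediaWiki job queue, or any OpenAI API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FakeProvider:
    """Hypothetical stand-in for a slow, deferred AI request (not a real API)."""
    ready_after: int  # number of status checks before a result exists
    checks: int = 0

    def check(self, request_id: str):
        """Return the result if it is ready, else None. Never blocks."""
        self.checks += 1
        if self.checks >= self.ready_after:
            return f"decision for {request_id}"
        return None


@dataclass
class PollJob:
    """Hypothetical poll job: one status check per run."""
    request_id: str
    attempts: int = 0


def run_queue(queue, provider, max_attempts=5):
    """Run jobs one at a time; an unfinished poll is re-queued, not waited on."""
    results = {}
    while queue:
        job = queue.popleft()
        outcome = provider.check(job.request_id)
        if outcome is not None:
            results[job.request_id] = outcome
        elif job.attempts + 1 < max_attempts:
            # A real queue would re-add the job with a delay; here it simply
            # goes to the back so other jobs could run in between.
            queue.append(PollJob(job.request_id, job.attempts + 1))
        else:
            results[job.request_id] = None  # gave up; leave for manual review
    return results


if __name__ == "__main__":
    jobs = deque([PollJob("req-1")])
    print(run_queue(jobs, FakeProvider(ready_after=3)))
```

The hourly maintenance script suggested in the last message would be the same loop driven by a scheduler instead of by re-queued jobs: each run checks every pending request once and exits.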