[09:57:16] lunch
[13:23:18] the wave so nice, I did it twice!
[13:27:58] o/
[13:32:27] dcausse somehow I didn't attach a Meet to the invite...I'm in https://meet.google.com/xfs-pkcw-mbu
[13:32:57] inflatador: sorry missed the invite, is this now?
[13:33:43] dcausse actually my fault, forgot to add you to mtg
[13:33:58] ok, no worries :)
[13:34:00] I'm rotating the Swift key a la https://phabricator.wikimedia.org/T345765
[13:34:07] gimme a sec and I'll join
[14:52:42] \o
[15:02:28] Is there anything we can do about memory usage of the cirrus-streaming-updater CI jobs? They constantly get killed while running
[16:14:15] or maybe there is some way to request a high memory runner? unsure
[16:17:15] we could ask jelto I guess?
[16:18:24] https://phabricator.wikimedia.org/T345000
[16:18:49] tags: [memory-optimized]
[16:19:28] curious name, i think of memory optimized as less memory, but ok :)
[16:19:41] :)
[16:29:19] I wonder if this means we still only get 4G to run tests with the memory optimized instance: https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/blob/main/production/prod.tfvars#L30
[16:31:44] you think it's not enough?
[16:32:01] not sure to understand what's the default tho
[16:32:38] i suppose i don't really know whats enough. I know if i start the test suite on my laptop while chrome and intellij are sucking up all the memory my laptop cries :P
[16:33:12] lol
[16:34:10] we can see how it goes, i noticed we have a `cat /proc/meminfo` at the beginning of the job and i've noticed it often says it's running on an 8gb host when it fails, and a 16gb host when it passes
[16:35:33] seems to be failing on tests now CirrusDocFetcherTest.faultyResponses:124
[16:35:35] :/
[16:37:35] cpu throttled now perhaps?
[16:38:01] hmm, indeed that didn't get very far
[16:39:08] I...dont know :S 100ms should be an eternity to read from wiremock fixtures
[16:39:55] yes... but perhaps the first call has to load plenty of classes and if it's throttled that might not be enough?
[16:40:03] hmm, maybe
[16:40:46] we can raise the request timeout during the tests to see but seems like this gitlab runner profile is not well suited for us :/
[16:41:33] 1500 should be 1.5cpu tho which seems more than enough
[16:42:14] will try. I'm not sure what the general runner's cpu limit is but it seems it's probably similar
[16:42:27] or potentially less, since the cpu optimized also has the same limit
[16:42:48] (assuming that prod.tfvars is the right location :)
[16:44:29] I'll ping jelto tomorrow if you don't find something that works
[16:44:31] dinner
[16:44:34] kk, thanks
[16:44:49] oh cool, I didn't know we were using terraform anywhere
[16:45:04] i imagine it's because thats what gitlab uses and we didn't want to reimplement :)
[16:46:31] You're right about the tfvars stuff though
[17:04:26] seems to have passed with the extended timeout
[17:20:34] but a later run failed the increased timeout (to 250ms) :(
[17:46:41] i dunno...i increased the timeout to 1s (from 100ms) and it still fails :S
[18:06:53] damn, is there a way to just throw more HW at the problem? I guess if we changed the cloud OS flavor we'd have to talk to releng?
[18:09:04] i really have no clue what could be taking a full second there, it's a bit mysterious
[18:17:53] suppose i'm a bit late for puppet deploy window, but with the zk hosts created now this should be ready: https://gerrit.wikimedia.org/r/c/operations/puppet/+/954126
[18:24:39] ebernhardson :eyes
[18:29:18] few mins late to pairing
[18:34:04] Hmm getting an `Unable to sign in` error when I try to log into my google acc during the okta 2fa stage...weird
[18:36:41] gehel: inflatador: can you link me the meeting link? think I'll need to join w personal email until I figure this out
[20:08:33] ebernhardson I merged your patch...will work on getting the test rdf-streaming-updater chart next
[20:16:00] thanks!
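The 16:34 observation (failing jobs report an ~8gb host in the `cat /proc/meminfo` output, passing jobs a ~16gb host) could be checked mechanically. Below is a minimal Python sketch, assuming the job's `/proc/meminfo` dump is available as text; `parse_meminfo`, `mem_total_gib`, `host_class` and `cgroup_memory_limit` are hypothetical helpers, and the 12 GiB cut-off between "small" and "large" hosts is an arbitrary illustration. One caveat worth keeping in mind: inside a container, `/proc/meminfo` generally reports the node's memory rather than the pod's limit, so if the runner does impose a 4G limit (as wondered at 16:29), it would show up in the cgroup files rather than in `MemTotal`.

```python
import os
import re
from typing import Dict, Optional

# Matches /proc/meminfo lines such as "MemTotal:  16393936 kB",
# "Active(anon):  5 kB" or "HugePages_Total:  0" (no unit).
_LINE = re.compile(r"^(\w+(?:\(\w+\))?):\s+(\d+)(?:\s+kB)?$")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return /proc/meminfo fields as ints (kB for fields the kernel reports in kB)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def mem_total_gib(text: str) -> float:
    """MemTotal converted from kB to GiB."""
    return parse_meminfo(text)["MemTotal"] / (1024 * 1024)


def host_class(text: str, threshold_gib: float = 12.0) -> str:
    """Hypothetical label: 'large' for ~16G hosts, 'small' for ~8G hosts."""
    return "large" if mem_total_gib(text) >= threshold_gib else "small"


def cgroup_memory_limit(root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Container memory limit in bytes, or None if unlimited or not found.

    Tries cgroup v2 (memory.max) first, then cgroup v1
    (memory/memory.limit_in_bytes). On v1, "unlimited" is a very large
    number rather than "max", and is returned as-is.
    """
    for rel in ("memory.max", os.path.join("memory", "memory.limit_in_bytes")):
        try:
            with open(os.path.join(root, rel)) as f:
                raw = f.read().strip()
        except OSError:
            continue  # file absent: try the next cgroup layout
        return None if raw == "max" else int(raw)
    return None


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = f.read()
    print(f"MemTotal: {mem_total_gib(info):.1f} GiB ({host_class(info)} host)")
    print(f"cgroup memory limit (bytes): {cgroup_memory_limit()}")
```

Printing both numbers at the start of the job would show whether a failing run hit the node size or the pod limit.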
[20:17:12] inflatador: to use it you need two more patches, this one implements the functionality in base.networkpolicy template: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/955032
[20:17:33] And this one updates the version dependencies of flink-app to use the updated template: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/955033
[20:41:39] ebernhardson 2 steps ahead per usual ;)
[20:59:51] taking a break, back in ~30
[21:00:35] * ebernhardson puzzles over how ci could fail half a dozen times in the same way, but then on the revert patch it still runs on the memory-optimized hosts somehow but passes CI :P
[21:32:22] sorry, been back
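The 21:00 puzzle and the earlier 8gb-fails / 16gb-passes hunch could be tested by tallying job outcomes per host size. A minimal Python sketch, assuming each job's `MemTotal` (in kB) and pass/fail status have already been scraped from the CI logs; the `(mem_total_kb, passed)` input format and the function names `bucket_gib`, `tally` and `failure_rate` are hypothetical, and no real job data is shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bucket_gib(mem_total_kb: int) -> int:
    """Round a MemTotal value (kB) to the nearest whole GiB, e.g. 8148608 kB -> 8."""
    return round(mem_total_kb / (1024 * 1024))


def tally(jobs: Iterable[Tuple[int, bool]]) -> Dict[int, Dict[str, int]]:
    """Count passes and failures per host-memory bucket.

    `jobs` is a hypothetical iterable of (mem_total_kb, passed) pairs.
    Returns {gib_bucket: {"pass": n, "fail": m}}, sorted by bucket.
    """
    counts: Dict[int, Dict[str, int]] = defaultdict(lambda: {"pass": 0, "fail": 0})
    for mem_kb, passed in jobs:
        counts[bucket_gib(mem_kb)]["pass" if passed else "fail"] += 1
    return dict(sorted(counts.items()))


def failure_rate(bucket: Dict[str, int]) -> float:
    """Fraction of failed jobs in one bucket (0.0 for an empty bucket)."""
    total = bucket["pass"] + bucket["fail"]
    return bucket["fail"] / total if total else 0.0
```

A lopsided split between the 8 and 16 GiB buckets would support the memory theory; a roughly even split would point back at the CPU-throttling/timeout theory from 16:37.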