[00:00:51] unrelated to meh code above, here is a silly idea to get source field size from elasticsearch: https://phabricator.wikimedia.org/P32760
[00:01:07] dcausse: ^^ and running it against viwiki: https://phabricator.wikimedia.org/P32761
[00:02:26] it's not really attempting to be accurate, but should be close enough™ for our purposes
[07:08:49] ebernhardson: nice!
[07:09:12] going to file a ticket to track these numbers
[13:06:34] greetings
[13:09:13] o/
[14:40:32] \o
[15:01:49] * ebernhardson now realizes the big a + b + c + d + ... condition at the end of the script probably could have been valueLength(params._source) since SourceLookup implements Map
[15:10:53] o/
[15:11:04] :)
[15:37:02] ebernhardson: if you'd like to take care of the skeleton project for the update pipeline, https://gitlab.wikimedia.org/repos/data-engineering/mediawiki-stream-enrichment/-/merge_requests/13 should have a pom with the flink libs we'd like to use (except perhaps com.softwaremill.sttp.client3 & io.circe)
[15:37:40] dcausse: sure, can do that
[15:43:46] dcausse: should we set it up with sub-projects like the rdf repo?
[15:44:36] then we would have, for example, separate consumer/producer/common projects. Not sure if that turned out to be valuable or not
[15:44:50] ebernhardson: yes I think so, if we keep the current WIP design we would need 3 submodules: something like common, preparation and ingestion
[15:44:54] yes
[15:45:08] kk
[15:50:38] dcausse: i've noticed the ticket is missing the hardest question: what do we call it? :P probably search/
[15:51:46] cirrus-streaming-updater ?
[15:52:02] +1 :)
[15:52:28] lol, good enough i suppose. At least it's not a fully transparent name that says nothing
[15:53:43] yes, it does not seem ambiguous and is (hopefully) self-explanatory
[15:54:24] it's not as fun as mjolnir tho :P
[15:54:52] i still think ragnarok would have amusing implications :P
[15:56:02] lol
[15:59:20] dcausse: maybe for tomorrow, but i'm not sure what to respond to https://www.mediawiki.org/w/index.php?title=Topic:X1i7qh0iaqog7br0&topic_showPostId=x1nfn8ab2bkdqglr&fromnotif=1#flow-post-x1nfn8ab2bkdqglr
[16:00:47] ah yes, saw this one... and not sure if the solution we have will make them happy, adding a new query builder is quite involved
[16:01:02] I can add a comment with a few pointers
[16:02:55] workout, back in ~40
[16:35:43] back
[17:07:03] ebernhardson is the username for thanos-swift just 'wdqs_flink'? I'm trying to clean up that container and not able to auth
[17:07:59] nm, got it
[17:08:03] inflatador: is your script able to keep the "T314835" folder? if not I might need to restart the pipeline using hdfs
[17:08:04] T314835: wdqs space usage on thanos-swift - https://phabricator.wikimedia.org/T314835
[17:08:22] (it's actually wdqs:flink)
[17:09:11] dcausse looks like 'rdf-streaming-updater-codfw-T314835' is a separate container from 'rdf-streaming-updater-codfw', do you want to keep the separate container or is there a folder within 'rdf-streaming-updater-codfw' that needs to stay?
[17:09:55] it's the T314835 folder within rdf-streaming-updater-codfw that needs to stay; I was not able to use the rdf-streaming-updater-codfw-T314835 container sadly
[17:10:28] but if it's easier to blow everything up we might try to use another store for the job during the cleanup
[17:11:11] dcausse I **think** I can do that, swiftly does support some regex logic. Let me do a few tests and get back to y'all
[17:11:20] this is what I'm using BTW https://github.com/gholt/swiftly/
[17:11:33] thanks!
[17:16:28] dcausse can you give me the full path of an object in rdf-streaming-updater-codfw that you **don't** want deleted?
[17:18:48] inflatador: some of them https://phabricator.wikimedia.org/P32855
[17:19:08] note that the job is constantly writing things there
[17:22:49] dcausse OK, I think I can do this using fordo: http://gholt.github.io/swiftly/2.06/#swiftly-help-fordo . Going to HEAD stuff first to make sure it's working, then will start cleaning out the objects that start with "commons"
[17:23:09] nice!
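A rough sketch of the selective cleanup described above, using python-swiftclient rather than the swiftly CLI inflatador is running. The auth URL and key are placeholders; the container name, the wdqs:flink user, and the T314835/ prefix to preserve come from the discussion:

    # Sketch only: delete everything in the container except the T314835/
    # "folder" the running flink job still needs. Assumes python-swiftclient;
    # the auth URL and key below are placeholders.
    from swiftclient.client import Connection

    CONTAINER = "rdf-streaming-updater-codfw"
    KEEP_PREFIX = "T314835/"

    conn = Connection(
        authurl="https://thanos-swift.example.org/auth/v1.0",  # placeholder
        user="wdqs:flink",
        key="REDACTED",
    )

    # full_listing=True pages through the whole container listing
    _headers, objects = conn.get_container(CONTAINER, full_listing=True)
    for obj in objects:
        if obj["name"].startswith(KEEP_PREFIX):
            continue  # leave the in-use folder alone
        conn.delete_object(CONTAINER, obj["name"])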
[17:35:37] * ebernhardson is surprised to find that `./mvnw -pl mw-oauth-proxy package` doesn't run the same set of tests that `./mvnw -pl mw-oauth-proxy verify` runs. I suppose i expected package to do a full verify before packaging
[17:35:48] in particular, packaging didn't complain about false vs Boolean.FALSE
[17:37:13] I think package does only the unit tests? verify will run checkstyle/spotbugs and other things
[17:39:28] dinner
[17:40:06] ahh, ok i suppose i can see that.
[17:42:44] delete is going at ~200 objects per minute. There are ~500,000 objects overall, some of which we're not deleting. We can probably increase concurrency but not until we get the OK from data persistence
[17:47:56] lunch
[18:32:38] ryankemper: we're in https://meet.google.com/eki-rafx-cxi
[21:35:55] * ebernhardson finds it tedious that apparently the wayback machine is the best way to read nginx docs for old versions ...
[21:45:24] they didn't help either :P Trying to understand why i can set `proxy_pass_request_headers off` (and then `proxy_pass_request_body off` for good measure) and yet tcpdump shows nginx still passes the Content-Length header along
[21:45:35] should probably give in and explicitly set that header ...
[21:48:54] unrelated but mildly amusing, nginx says 'GET /blah HTTP/1.0' and jetty responds with 'HTTP/1.1 200 OK'. Wasn't aware it was valid to have a 1.0 request with a 1.1 response
[22:32:03] til: RFC 2616 requires that HTTP servers always begin their responses with the highest HTTP version that they claim to support. Therefore, this Connector will always return HTTP/1.1 at the beginning of its responses.
[23:36:05] heading out
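On the Content-Length question above: `proxy_pass_request_headers off` only stops the client's request headers from being forwarded, and with `proxy_pass_request_body off` nginx still emits a Content-Length of its own for the upstream request (which matches the tcpdump observation). Clearing the header via an empty `proxy_set_header` value is the usual workaround; a minimal sketch, with made-up location and upstream names:

    location /health {
        proxy_pass http://jetty_backend;    # placeholder upstream
        proxy_pass_request_headers off;     # don't forward the client's headers
        proxy_pass_request_body off;        # don't forward the request body
        # an empty value removes the header from the upstream request entirely
        proxy_set_header Content-Length "";
    }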