[07:20:11] ryankemper: (for later) looking at https://integration.wikimedia.org/ci/blue/organizations/jenkins/wikidata-query-rdf-maven-release-docker/detail/wikidata-query-rdf-maven-release-docker/58/pipeline/#log-17 (line 17 and around), there is a failed data transfer. I wonder if this is the retrieval of the cache (not super clear from the logs), which could explain the long build time.
[07:20:21] maybe something to check with rel-eng
[08:08:01] going to re-enable the mjolnir dag, I left it disabled last week to catch up
[08:17:48] dcausse: can this be abandoned - https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/670242 ?
[08:18:35] zpapierski: no, it might be useful for the data-transfer cookbook
[08:18:55] ok
[08:19:56] the old updater used the triplestore itself to store offsets, but I'm not a fan of this solution; using files is not great either, but at least it makes it independent from the store
[09:01:32] hmm, I think the wikiId in cindy that runs the test isn't cirrustest
[09:01:46] dcausse: what's the wikiId for a test runner on cindy?
[09:03:16] zpapierski: it depends on the wiki you target from the test
[09:03:22] there are multiple wikis
[09:03:26] ah, ok
[09:03:53] but most tests [should] use cirrustestwiki
[09:05:32] mine as well - so the id is cirrustestwiki, not cirrustest?
[09:06:24] yes
[09:06:30] ok, thx
[09:07:05] it's the same as the --wiki option passed to maintenance scripts
[09:07:30] ah, right
[09:20:52] how's the streaming updater stuff going? =]
[09:21:29] ironing out the last wrinkles, hopefully
[09:21:37] addshore: we're deploying it to k8s at the moment (still resolving small issues)
[09:21:51] envoy/certificates and the like
[09:22:03] sounds like fun :D
[09:22:19] otherwise it's been running for a while from yarn and populating https://query-preview.wikidata.org
[09:22:38] I was just chatting over in #wikimedia-analytics with joal, and I'm excited about what other things this kind of updater system could enable us to do
[09:22:54] yes, saw the discussion
[09:23:33] I'd love to unify some parts of it (e.g. reordering events) and make the "fetch content" part tunable
[09:23:43] and I just noticed I got disconnected from that channel :(
[09:24:04] so that one could just concentrate on fetching some data out of the mw-api
[09:26:17] zpapierski: https://phabricator.wikimedia.org/P16834
[09:26:44] thx!
[09:26:50] all sounds exciting :) I'll leave y'all to it!
[09:43:11] tempted to switch to java 11 for the flink image
[09:43:26] should work
[09:44:02] rdf compiles in full on jdk11, and if there's no code from blazegraph, it should be jdk 11 compliant, AFAIR
[09:44:37] it doesn't change a thing for the code itself, but the binary might benefit from the switch
[09:47:36] yes, and scala 2.11.12 (which we seem to use) claims to support 11
[09:53:39] change the runtime? Or the compiler?
[09:54:23] if we have different targets for flink vs blazegraph/wdqs, we should probably split the project
[09:56:40] I think we inherit source==target==1.8 from the disco-parent
[09:56:59] is it bad to deploy a target == 1.8 on java 11?
[10:00:00] for context: I need a package "wmf-certificates" that's only available on buster, but only an openjdk-11 image is available on buster so far
[10:00:35] so I'm trying the lazy solution here: migrate the flink image to openjdk11
[10:00:44] without changing anything else
[10:06:11] there are a few potential issues (none are blocking)
[10:06:48] I don't think that deploying older binaries on 11 would fail - after all, many libraries are compiled with older jdks
[10:06:56] if we compile on JDK8 with target=1.8 and deploy on JDK11, we're not running the same JDK for test and for production. That's a potential issue.
[10:07:19] s/compile/build/
[10:07:40] if we build on JDK11 with target=1.8 and deploy on JDK11, I see no issues
[10:07:45] We can build on 11, with an 8 target for blazegraph
[10:08:16] having different targets for different modules is easy
[10:08:30] having different JDKs for the build of different modules is hard
[10:08:43] You don't need to, I think
[10:08:50] Different levels should be enough
[10:08:58] target = 11 requires java 11 to build
[10:09:04] Yep
[10:09:28] you don't _need_ to, but if you use a different JDK for your build and your production, then you're not testing the same thing
[10:10:22] Right, that may be a potential issue
[10:11:21] not the end of the world, but one of the potential issues is using methods / classes from JDK11 on the blazegraph side and having those not available in production
[10:11:24] I'm leaning toward keeping target=1.8 for now and making an exception for the flink app (deploying binaries to 11)
[10:11:52] testing on 8 and running on 11 is less of an issue, backward compatibility is usually very good
[10:11:54] But this would still require you to use jdk11
[10:12:07] to build or run?
[10:12:26] to run, yes
[10:12:31] to build and to run the flink part. blazegraph would still be running on 8
[10:12:49] Oh, sorry, I misread.
[10:13:00] Me too, I think
[10:13:12] so compile everything with target=1.8 and JDK8
[10:13:14] So, build on 8, run on 11
[10:13:14] to build, unless I'm mistaken, we still use java8 all over the place
[10:13:20] deploy flink on 11, and blazegraph on 8
[10:13:26] yes
[10:13:30] that's unlikely to cause any issue
[10:13:31] Should be fine anyway
[10:13:47] so: https://gerrit.wikimedia.org/r/c/wikidata/query/flink-rdf-streaming-updater/+/705633
[10:13:57] we do have a java 11 image ready in jenkins, so if we want to move the build to 11 that's easy (not that we should)
[10:15:11] it is surprising that wmf-certificates is not available for stretch
[10:15:25] it's quite new
[10:15:25] and we should also have a buster + jdk8 image
[10:15:47] do we have production on buster+jdk8?
[10:15:55] honestly, I think we should push our blazegraph stuff to a walled garden of jdk8 and push everything else to at least jdk11
[10:15:58] break
[10:16:02] probably cassandra (if it has migrated already)
[10:16:18] wdqs is running buster + 11
[10:16:23] but not on docker
[10:16:29] really?
[10:16:37] you mean buster+8
[10:17:05] sorry, yes, buster+8
[10:17:34] that version matrix already has too many dimensions for me
[10:17:43] true :)
[10:18:28] lunch
[10:19:05] meh... Error: release 6u6um6ds failed: roles.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ci:tiller" cannot create resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "ci"
[10:19:07] (and at some point, we'll migrate everything to Java 16!)
[10:19:50] that seems far away :)
[10:19:58] 17
[10:20:26] it's not out yet! but targeting an LTS would make sense
[10:21:44] yes
[10:23:53] lunch
[13:41:09] dcausse: can I take 7.45 min of your time to discuss 4 tickets from the streaming updater epic?
[14:05:15] zpapierski: sure
[14:05:34] https://meet.google.com/sxm-afao-gef
[14:09:21] gehel: can we close this https://phabricator.wikimedia.org/T280579 ?
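The plan agreed on above (build everything with JDK 8 and target=1.8, run the Flink app on JDK 11 and Blazegraph on JDK 8) hinges on which bytecode level a build actually emits. A minimal Java sketch for checking that, assuming you point it at compiled `.class` files: it reads the class-file header, where major version 52 corresponds to Java 8 and 55 to Java 11, and prints the runtime's `java.specification.version` for comparison. The class name `ClassTarget` is hypothetical and not part of the rdf repository.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports the bytecode level of compiled classes.
public class ClassTarget {
    /** Returns the class-file major version (52 = Java 8, 55 = Java 11). */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        // Every class file starts with the magic number 0xCAFEBABE.
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();        // minor_version, ignored here
        return data.readUnsignedShort(); // major_version
    }

    /** Maps a major version (Java 1.2 and later) to its Java release number. */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            try (InputStream in = Files.newInputStream(Paths.get(arg))) {
                int major = majorVersion(in);
                System.out.printf("%s: major %d (Java %d)%n", arg, major, javaRelease(major));
            }
        }
        // "1.8" on a Java 8 runtime, "11" on a Java 11 runtime.
        System.out.println("running on Java " + System.getProperty("java.specification.version"));
    }
}
```

For example, `java ClassTarget target/classes/Foo.class` (a hypothetical path) on a target=1.8 build should report major 52 whether it runs on JDK 8 or 11. A JDK 11 runtime loads such classes, while a JDK 8 runtime rejects major-55 classes with `UnsupportedClassVersionError`, which is why keeping target=1.8 is the conservative choice.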
[14:14:27] dcausse: I also assumed we can close this one - https://phabricator.wikimedia.org/T247058
[14:15:04] zpapierski: I think so
[14:19:10] mpham: I took care of the remaining open tickets on the epic - some were added to "out of scope", some I closed and others I'm addressing separately
[14:19:34] if I didn't touch some, it's because they are already being worked on, or were already addressed yesterday
[15:01:44] the code coverage metric for CirrusSearch from Sonar is a bit unhelpful
[15:02:28] code coverage is only for "real" unit tests (the ones that extend CirrusTestCase)
[15:02:40] zpapierski: for T280579, let me check with service ops. It is part of the process requirement on their side to get a new service deployed
[15:02:41] T280579: New service request: WDQS Flink based Streaming Updater - https://phabricator.wikimedia.org/T280579
[15:03:08] gehel: thanks!
[15:03:25] zpapierski: about coverage, that's by design, we want to promote unit tests over integration tests
[15:03:26] dcausse: yeah, I noticed, it bugs me every time, though
[15:03:46] gehel, ryankemper: we are about to commence T286069, reconfig of switch buffers on eqiad row D.
[15:03:46] T286069: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069
[15:03:47] it would be cool if it included what cindy tests
[15:03:50] Just a heads up.
[15:03:53] and IMHO, branch / line coverage is meaningful for unit tests, but integration tests should be measured by functional coverage
[15:03:59] topranks: thanks
[15:04:02] topranks: thanks!
[15:04:27] Updates in #wikimedia-sre, hopefully it'll be uneventful :)
[15:18:20] Hi #wikimedia-search, our planned hadoop maintenance will take place right after the row D maintenance is done, and there are 2 running mjolnir jobs on the hadoop cluster we'd like to drain. Would that be an issue?
[15:19:36] razzi: looking
[15:22:09] (apparently the row D maintenance is done btw, looks like we didn't really lose connectivity at all)
[15:22:36] razzi: feel free to kill the 2 running mjolnir jobs when you start
[16:51:03] giving up for today, still getting "failed on exception: javax.net.ssl.SSLHandshakeException: PKIX path building failed:" even with the wmf-certificates package installed
[16:51:29] have to run, will update the standup notes later tonight
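`PKIX path building failed` means the JVM could not chain the server's certificate to a CA in the truststore it actually loaded. That is not necessarily the OS certificate bundle a package like wmf-certificates installs into; whether that package also updates the JVM truststore is an assumption to verify, not something the log confirms. A minimal Java sketch for listing which CA subjects the JVM's default trust managers accept, optionally filtered by a substring. The class name `TrustStoreCheck` and the default filter `Wikimedia` are hypothetical.

```java
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical diagnostic: which CAs does this JVM actually trust?
public class TrustStoreCheck {
    /** Subjects of all CAs accepted by the JVM's default trust managers. */
    static List<String> defaultTrustedSubjects() throws NoSuchAlgorithmException, KeyStoreException {
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        // A null KeyStore means "use the default truststore" (javax.net.ssl.trustStore if set,
        // otherwise the JDK's jssecacerts/cacerts).
        tmf.init((KeyStore) null);
        List<String> subjects = new ArrayList<>();
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                for (X509Certificate ca : ((X509TrustManager) tm).getAcceptedIssuers()) {
                    subjects.add(ca.getSubjectX500Principal().getName());
                }
            }
        }
        return subjects;
    }

    /** Keeps only the subjects containing the given (case-sensitive) substring. */
    static List<String> matching(List<String> subjects, String needle) {
        List<String> result = new ArrayList<>();
        for (String subject : subjects) {
            if (subject.contains(needle)) {
                result.add(subject);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default filter; pass the expected CA name as the first argument.
        String needle = args.length > 0 ? args[0] : "Wikimedia";
        System.out.println("javax.net.ssl.trustStore = "
                + System.getProperty("javax.net.ssl.trustStore", "<unset, JDK default>"));
        List<String> subjects = defaultTrustedSubjects();
        System.out.println(subjects.size() + " trusted CAs in the default JVM truststore");
        for (String subject : matching(subjects, needle)) {
            System.out.println("match: " + subject);
        }
    }
}
```

Running it inside the Flink image (e.g. `java TrustStoreCheck Wikimedia`) would show whether the expected internal CA is visible to Java at all. If it is not, one option is pointing the JVM at a truststore that contains it via the standard `javax.net.ssl.trustStore` system property.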