[06:42:32] Welcome back!
[06:52:36] o/
[07:05:52] o/
[07:44:55] dcausse: what's the state of this - https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/719456 ?
[07:47:32] zpapierski: was WIP because still based on the 1.14 RC, now that it's out I think we can unblock T289836 and move forward with it
[07:47:32] T289836: Upgrade to latest flink (1.14) - https://phabricator.wikimedia.org/T289836
[07:53:57] ok, let's remember that during planning
[07:59:41] dcausse, ebernhardson: just in case you missed it: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/VKTQ45XYLJBXLCGQWHE3B4JZQUM6KKBK/
[08:01:20] huh, that's a nice mailing list archiver we have there, missed the switch
[08:02:15] gehel: I'll take a look, will try to respond today
[08:02:34] dcausse: thanks! No emergency!
[08:10:27] o/
[08:10:30] hello folks!
[08:11:45] whenever you have time https://gerrit.wikimedia.org/r/c/operations/puppet/+/732611 - it is for the nginx tls proxy used by elastic nodes, I think that there is a little improvement to make to improve TLS cert reload when a new one is issued via acme-chief
[08:12:01] (see related task)
[08:22:37] (not urgent, there was a problem last week with cloudelastic nodes, we can review/follow-up any time)
[08:22:50] elukey: I'll have a look
[08:24:52] ack thanks :)
[08:30:23] elukey: I'm not 100% sure about those dependencies, but in principle, this patch looks good!
[08:41:35] gehel: good point, I just updated the cr
[08:41:54] the majority of the nodes are Search ones, the rest is ms-fe (I'll ask people to comment as well)
[08:42:32] it does not look risky, so feel free to move forward whenever you want!
[08:42:38] ack!
[08:42:42] ryankemper: fyi ^^^
[08:42:45] thanks for the brain bounce
[08:42:56] (it is not urgent so I can also wait for Ryan)
[08:43:48] It's a 1 line change with low impact, don't feel blocked on us !
[08:44:22] ejoseph: are you around ? Time for a chat ?
[08:44:37] * gehel is still making coffee, but will be back in 5'
[08:48:58] ejoseph: ping me when you're around!
[08:51:38] dcausse, gehel: in light of the streaming updater and our focus to move away from blazegraph, we can probably abandon this? https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/556188
[08:52:04] (sorry, I'm having a Marie Kondo kind of day :)
[08:52:25] yep, let's drop it
[08:52:46] yes and I think I dropped the code it's based on
[08:54:12] gehel: https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/691919 - to be abandoned or merged?
[08:54:48] it's not working as expected as-is, but we should get it working at some point
[08:55:17] leave it for now, we'll see if I have time to dig a bit more at some point
[08:55:22] ok
[09:08:30] ebernhardson: you want more discussion on this - https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/716076/ ?
[09:09:40] dcausse: any objection against me +2ing this - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/720848 ?
[09:10:51] zpapierski: please go ahead
[09:16:03] dcausse: cleaning up got me thinking about the old updater code - I know we shouldn't probably remove it from the repo, but I was thinking we could at least sideline it somehow? I'm not super sure what our policy on that code should be
[09:19:06] zpapierski: if it does not get in the way of future work we should probably just keep it as-is
[09:30:08] @gehel
[09:30:18] Good morning
[09:30:22] ejoseph: good morning!
[09:30:55] I had a problem with my internet this morning
[09:31:06] It just started working
[09:32:16] ok, want to jump in a Meet to catch me up on what happened last week?
[09:32:58] https://meet.google.com/zit-iuuw-jwv
[09:33:15] Yh sure
[09:33:44] ejoseph: https://meet.google.com/zit-iuuw-jwv
[09:41:29] ryankemper: I've set up some time with Emmanuel and you to set up his SSH access today. Can you work with him to ensure he has access to at least our WMCS projects?
[09:58:16] early lunch (I have SRE interview soon, please keep all your fingers crossed)
[10:10:04] it will be super hard to work with all fingers crossed, but I'll try
[10:16:05] lunch
[10:39:44] break
[11:30:11] zpapierski I need some task i can work on for cirrus search
[12:08:37] ejoseph: Zbyszko is probably still at lunch, but maybe dcausse can help you
[12:08:56] looking at the backlog
[12:09:03] in the meantime: have you sent the expense for the laptop?
[12:20:27] ejoseph: T285574 should be a good first bug
[12:20:27] T285574: apihelp-cirrus-config-dump-param-prop needs creating - https://phabricator.wikimedia.org/T285574
[12:23:57] ejoseph: you'll probably need help to understand what needs to be done. Ping dcausse when ready!
[12:24:06] Ok
[12:24:12] I am ready
[12:24:20] Scaling
[12:24:33] dcausse: are you available
[12:24:37] ejoseph: sure
[12:25:11] ejoseph: https://meet.google.com/pvf-irhp-jse
[12:55:34] ejoseph: sorry, I was out, but I see that issue has been resolved
[13:00:01] errand, back in 30'
[13:13:13] ejoseph: I'm back in https://meet.google.com/pvf-irhp-jse if you're still available
[13:14:11] dcausse: give me 10 minutes
[13:14:16] sure
[13:32:29] dcausse: same link?
[13:32:40] ejoseph: yes I should be there
[13:32:51] https://meet.google.com/pvf-irhp-jse
[15:01:06] triaging meeting: https://meet.google.com/qho-jyqp-qos
[15:01:19] ryankemper, Trey314159, ejoseph ^
[16:12:05] my logging problems...probably not strictly limited to the repo, but in trying to load the mw-oauth-proxy war into a generic jetty server and getting a 500 logs only say `SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".` and then `SLF4J: Defaulting to no-operation (NOP) logger implementation`.
[16:13:41] So i run jetty with `--add-to-start=slf4j-simple-impl` and it fails with `LinkageError: loader constraint violation: when resolving method "org.slf4j.impl.StaticLoggerBinder.getLoggerFactory()Lorg/slf4j/ILoggerFactory;" the class loader (instance of org/eclipse/jetty/webapp/WebAppClassLoader) of the current class, org/slf4j/LoggerFactory, and the class loader (instance of
[16:13:43] sun/misc/Launcher$AppClassLoader) for the method's defining class, org/slf4j/impl/StaticLoggerBinder, have different Class objects for the type org/slf4j/ILoggerFactory used in the signature`
[16:14:56] The args show this loads slf4j 1.7.32, we reference 1.7.25 in pom.xml. But recompiling against 1.7.32 made no difference
[16:21:29] commenting out the *-over-slf4j entries and adding commons-logging to mw-oauth-proxy pom.xml allows it to "work" and give me appropriate error logs, but i have to undo that before submitting
[16:52:58] sure you've got 1.7.32 after rebuild?
[16:55:51] also, do we even have slf4j-simple on the classpath?
[16:57:09] hmm, i can do another build with 1.7.32 but pretty sure. slf4j-simple is inside the docker container for jetty, can see on command line in ps that it gets added
[16:57:38] * ebernhardson notes that the docs make it seem like jetty 10+ would be much better, if we weren't stuck on java 8 :P
[16:59:55] I'm not sure it's needed, but I'm surprised there's no slf4j-api in mw-oauth-proxy
[17:00:14] i can add that, i just added whatever gehel mentioned in the review :)
[17:00:27] * ebernhardson has no clue how java logging works, this is all way more complicated than it needs to be :P
[17:01:18] that's because there are 5 competing standards
[17:01:59] perhaps it doesn't need the api because we don't use it directly, it should be used by the http client via commons-logging api
[17:02:11] (which is also why i don't understand how this logging breaks jetty telling me about 500's....)
[17:03:07] really, all i want in the end is the 500 message when oauth calls mediawiki for identity info and fails :P
[17:03:38] ah, right, no actual logging from service
[17:32:14] dinner
[17:32:29] * ebernhardson will probably ignore the logging problem for now, and just switch between the two logging configs depending on which logs i want :P
[17:59:46] * ebernhardson ignores the spellcheck claiming you can't say 'an html'
[19:01:44] ebernhardson: https://meet.google.com/stp-swkd-iho
[21:18:26] ryankemper: how goes everything today? Wondering if we could try and push the mw-oauth-proxy secret's stuff
[21:35:43] ebernhardson: looking at https://gerrit.wikimedia.org/r/c/operations/puppet/+/732801/4/modules/query_service/templates/blazegraph-default.erb#26, is `oauth_access_token_secret` the one I need to provision?
[21:36:23] ryankemper: yup. I imagine it will come in like the oauth_consumer_secret does in wcqs.pp, but i left the value as tbd
[21:36:47] ah yeah I see
[21:37:00] https://gerrit.wikimedia.org/r/c/operations/puppet/+/732801/4/modules/profile/manifests/query_service/wcqs.pp#62 names it `oauth_access_secret`
[21:37:19] but then https://gerrit.wikimedia.org/r/c/operations/puppet/+/732801/4/modules/query_service/templates/blazegraph-default.erb#26 is trying to index into `oauth_access_token_secret`, should those be the same name?
[21:37:33] hmm, indeed those are supposed to be the same. I've been indecisive on naming and didn't notice this was mixed up...hmm
[21:37:46] i suppose i like oauth_access_token_secret better
[21:39:05] looks like it also made it into the OAuthSettings type as oauth_access_token_secret, it's just the one place where it's misspelled
[21:39:37] cool `oauth_access_token_secret` it is
[21:45:44] Okay I added a random value for `profile::query_service::oauth_access_token_secret` in `hieradata/role/common/wcqs/public.yaml` of the `/srv/private` repo
[21:49:00] ryankemper: ok cool, new patch up with typos fixed and referencing that.
Running pcc now
[21:49:57] ebernhardson: that's funny i uploaded presumably a very similar patch right at the same time
[21:50:32] :) it complains that the secret isn't in the fake-secret repo
[21:52:52] oh weird I thought the fake-secret entry was only needed if it wasn't using the hieradata approach
[21:54:30] ebernhardson: do you know if there's an easy way to make it take your patch (PS5) instead of PS6?
[21:56:59] ryankemper: hmm, probably a new ps7 with the same content
[21:57:04] sec i can send that
[21:58:13] sent
[22:09:30] ebernhardson: https://gerrit.wikimedia.org/r/c/labs/private/+/734418 should unbreak pcc, although not sure if I should actually have added it here instead: https://github.com/wikimedia/labs-private/blob/e4febb186bc07c1d86976bd0c3d2c2a923780e5e/hieradata/common.yaml#L54
[22:12:06] ryankemper: hmm, i forget which one is the important one sec
[22:12:41] ryankemper: iirc common.yaml is only referenced from labs hiera, never prod, so i think you have the right place
[22:14:02] cool
[22:15:25] merged
[22:18:11] pcc looks reasonable to me
[22:18:29] Yup looks good
[22:19:54] Alright merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/732801
[22:20:25] so i suppose, after puppet runs I'll do the scap deploy
[22:20:31] against wcqs env
[22:21:21] or i guess you can, you've probably done it a few times :)
[22:23:51] ebernhardson: yeah I can do it, were we still trying to run the dry-run command (`scap deploy --dry-run --no-log-message -v --environment wcqs`) or are we past that?
forgot where we left off
[22:24:35] ryankemper: can't hurt to dry-run first, but iirc that was able to work last time
[22:24:57] okay that lines up with what I remember
[22:24:58] got it
[22:25:20] what i wanted most from dry-run was verification it only tried wcqs hosts :)
[22:29:37] wise :P
[22:29:51] ebernhardson: `scap deploy -v --environment wcqs 'Deploy 0.3.90 to WCQS'` worked
[22:30:57] i suppose too much to hope it works out the box, getting a 500 when making requests :) Looking for where logs would end up
[22:35:34] :P
[22:38:42] it looks like the access token secret isn't coming up, looking why
[22:39:08] it seems to have been written out appropriately...hmm
[22:42:22] ryankemper: looks like /srv/deployment/wdqs/wdqs on the deployment host needs a pull, it doesn't have the runBlazegraph.sh updates
[22:42:45] ah right, I never updated the git repo
[22:42:55] logs end up only in journalctl afaict, i suppose it's out of scope but would be nice if it could make it to logstash
[22:49:43] wcqs blazegraph just went
[22:50:48] Spookreeeno: in which way?
[22:51:30] ebernhardson: PROBLEM - Check systemd state on wcqs1001 is CRITICAL: CRITICAL - degraded: The following units failed: wcqs-blazegraph.service
[22:51:37] see -operations
[22:51:53] Spookreeeno: thx, looking
[22:52:17] ahh, that would imply the deploy running right now isn't working :S We should silence the wcqs alerts though, nothing breaking there should matter yet
[22:52:41] I'll downtime wcqs*
[22:52:49] ryankemper: thanks
[22:55:29] okay downtimed them until 1 hour before a week from now
[22:55:47] sounds reasonable
[22:56:30] ryankemper: maybe !log or ack the alerts that have gone off too
[22:57:39] Spookreeeno: good idea, done
[22:57:52] ty :)