[00:08:03] see ya tomorrow!
[05:36:26] Hi all! I'll soon start running UpdateWeightedTags.php to upload custom per-country tags, per https://phabricator.wikimedia.org/T301030#7734236
[05:36:50] Feel free to kill the process if you suspect it's causing problems, it's easy to resume.
[08:05:20] dcausse: due to my health issues, I completely missed the streaming updater outage - anything I can help with?
[08:09:32] zpapierski: np! let's have a discussion with gehel this morning and I'll explain the situation
[08:09:50] sure
[08:10:03] I'll be out 10-11:30, though
[08:10:50] ok
[08:14:23] zpapierski: in the meantime here is I think the cause of our problem: https://phabricator.wikimedia.org/T302396#7733594
[08:14:43] unable to load a savepoint from k8s@codfw
[08:14:57] but it's working fine in yarn@eqiad
[08:15:21] huh, weird - networking issue?
[08:15:48] I don't know, the app works fine for wcqs
[08:16:31] I meant between k8s cluster and swift - apparently analytics cluster <-> swift works fine
[08:17:44] difference is that yarn@eqiad is using swift@eqiad
[08:17:58] but k8s@codfw is using swift@codfw
[08:18:38] ah, I thought you meant the connection to codfw's swift works from yarn@eqiad
[08:18:54] I don't think I can force that
[08:21:56] in the comment you wrote that you've managed to restore from a previous checkpoint from swift@codfw
[08:22:47] yes, unclear why it worked, I suspect something related to file size
[08:23:10] huh, interesting - have you tried the previous ones?
[08:23:19] and how are the sizes in eqiad?
[08:23:41] I did not try to savepoint/restore in eqiad
[08:23:54] as I deployed codfw first
[08:31:57] dcausse, zpapierski: I'm around if you want to discuss
[08:32:43] sure
[08:33:08] gehel, zpapierski https://meet.google.com/pec-xcwa-oan
[08:58:34] errand, be back in ~1.5h
[09:48:51] errand + lunch
[10:53:16] yarn runbook (wip) https://docs.google.com/document/d/16QEFnpttSr9CxPIHpqavMxeuoFcVy3zzqECnrls9dFk/edit?usp=sharing comments welcome
[10:53:20] lunch
[10:57:48] godog: I think you're the right person to ask - we use thanos via standard Swift interface, but we are experiencing issues with Swift we can't really pinpoint yet. OTOH Swift plugin was abandoned some time ago (and we have a custom version of it anyway). We'd like to switch to S3 interface - is it available for thanos, or more specifically - can we switch to it?
[11:03:12] zpapierski: hi, yes you can switch to S3 for thanos-swift
[11:03:43] fantastic! how can we do that? I'm not super familiar with S3 and I have no idea if our credentials transfer
[11:03:57] also, hi (I'm super rude, sorry)
[13:07:08] godog: ^^ (forgot to mention you before)
[13:07:15] lunch
[13:12:42] zpapierski: essentially use https://thanos-swift.discovery.wmnet as the s3 endpoint (and I believe region would be 'us-east-1' if required/wanted by the client) and yes the credentials to be used are the same
[13:12:49] HTH!
[13:24:58] errand
[13:59:25] good morrow
[14:21:23] gehel saw this workshop in email, think I may attend tomorrow if that is OK. https://meta.wikimedia.org/wiki/Small_wiki_toolkits/Workshops#Intro_to_Pywikibot_framework_and_installation
[14:22:33] Wow, big list of attendees! Sounds like an interesting workshop!
[14:24:19] yeah, would be cool to interact with our users and figure out what they do
[14:24:48] o/
[14:28:14] godog: thanks, I'm going to experiment with it now
[14:31:31] zpapierski: SGTM!
let me know how it goes
[14:47:26] zpapierski: we are starting the blazegraph replacement meeting early, and in a different room: https://meet.google.com/rvc-ykqc-oru
[14:47:31] join us if you can
[16:01:10] inflatador, ejoseph: retrospective: https://meet.google.com/ssh-zegc-cyw
[16:01:25] i am trying to join the meeting
[16:01:40] ejoseph: good luck!
[16:03:00] ejoseph: the retro document is https://docs.google.com/document/d/1Hpnu7FRCffAeOhuIxC_O1y5DtAuyMdBbZ3Kgi6tYHPU/edit#
[16:03:20] you can add stuff to the "what has happened since last retro" section
[16:57:59] going offline early, see you tomorrow
[17:33:21] how about that puppet deploy window? :)
[17:35:04] i have i guess 5 puppet patches open, not sure entirely if they can all be shipped. a few at least :)
[18:21:16] ebernhardson: ah sorry, been dealing with some issues that came up during decom of eqiad elastic hosts
[18:21:35] ebernhardson: want to hop in now and see what we can get deployed?
[18:22:02] meet.google.com/iqe-wcuz-mpn
[18:22:19] ryankemper: sure, sec
[18:36:16] ryankemper: inflatador: i think perhaps one of you were fighting with puppet/certgen in relation to elasticsearch and i had suggested perhaps moving to the pki infrastructure. however due to the puppet paths you used this would have perhaps been a bit tricky, this managed to plant a passive bug in my head and as such i took a crack at adding cfssl support to tlsproxy::localssl and
[18:36:22] elasticsearch::cirrus. it's something i have only been working on ...
[18:36:24] https://gerrit.wikimedia.org/r/c/operations/puppet/+/762535
[18:36:26] ... in the background and i'm not sure it's completely polished. however i'm going on vacation for a few days so i thought i would ping you to get it on your radar
[18:38:27] in theory that PS is a noop (and you are welcome to merge in my absence if you feel comfortable) but i think it adds the necessary params to test the use of pki in cloud at the very least
[18:39:49] jbond: thanks, much appreciated!
yeah inflatador has been leading the charge on that side of things so the cfssl support should come in handy! will take a look at the patch
[18:39:57] jbond: also, enjoy your vacation :)
[18:40:12] jbond once again we're in your debt, good sir! Enjoy your vacation ;P
[18:40:49] wait until you have reviewed the ps before thanking me ;) :P but np and will do :)
[18:47:09] :P
[19:17:24] ryankemper: Do you have any wish lists for desired CLIs or APIs for DevOps and scripting capabilities for the WDQS backend?
[19:20:28] andreaw: great question, let me get back to you on that
[19:22:53] ryankemper Feel free to start a google doc, or email me or something easier to search than IRC :-)
[19:23:19] andreaw: +1, will do :)
[19:27:55] gehel : ryankemper and myself are still merging puppet code with ebernhardson in https://meet.google.com/iqe-wcuz-mpn , may be late to pairing
[19:29:57] ack
[19:59:33] inflatador, ryankemper: should we cancel today's pairing session?
[20:04:19] gehel: yeah, let's cancel
[20:06:30] gehel yeah, probably just as well
[20:06:37] ok
[20:07:11] we can reschedule for tomorrow, anytime before the unmeeting. I'll let you send me an invite if you think that makes sense
[20:16:16] lunch
[20:57:59] back
[21:10:40] back
[21:39:57] write isolation config change applied, seeing expected changes to jobs. Things look plausible, LinksUpdate backlog is climbing a bit but still under 1s so only watching
[22:33:02] in terms of job runner activity, total busy processes in eqiad increased by ~200 processes (up from 1k across eqiad). It still doesn't seem like it will be sufficient to turn on the saneitizer, but i suspect we should wait for the old ElasticaWrite queues to drain before doing anything more
[23:12:17] ryankemper: turns out, i still messed up one part of that bigger patch which results in not providing the oauth war to blazegraph. I fixed up the live instances so they are running with the war now, but if they restart they won't work right.
https://gerrit.wikimedia.org/r/c/operations/puppet/+/765652
[23:14:42] ebernhardson: ack, taking a look
[23:17:10] it mostly amounts to forgetting during refactoring, i started to handle that case and didn't finish. and perhaps that i would have expected referencing a variable that doesn't exist in a template would have triggered a failure instead of imputing an undef
[23:25:20] ebernhardson: merging
[23:32:09] ryankemper: thanks
[23:42:08] ebernhardson: I restarted `wcqs1001`, but not any of the other ones (already ran puppet on them though). Any logs I should be looking at on `wcqs1001`?
[23:44:19] ryankemper: if it has OAUTH_RUN in /etc/default/wcqs-blazegraph it should be good
[23:45:12] ebernhardson: It does. This space looks kind of weird though `OAUTH_RUN=" mw-oauth-proxy-*.war"` (if that just gets funneled into a shell command then it should work just the same I imagine due to shell splitting)
[23:45:26] ryankemper: otherwise, it's annoying to test oauth. I wonder if we could have some way like the mwdebug hosts where a special header routes things to a specific server
[23:45:35] ah yeah it already looked like that previously so space is def fine
[23:45:44] ryankemper: hmm, indeed that space shouldn't be there. It won't break anything, just ugly :)
[23:46:16] ebernhardson: that would be nice to have. I was just thinking about how annoying it is testing if the service is working (and we mentioned it a bit during the puppet deploys earlier too)
[23:46:41] we've pondered ways around oauth, but not how to actually make sure oauth works on a specific host. We probably should
[23:46:47] hmm
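
Editor's note on the OAUTH_RUN check discussed above: a minimal Python sketch of what "has OAUTH_RUN" and "the leading space is harmless" could look like in practice. The helper names (`read_default_var`, `as_shell_words`) are hypothetical, and the log does not show how the wcqs-blazegraph launcher consumes the variable. The second helper assumes it is expanded unquoted in a shell command, which is only what ryankemper guessed at.

```python
import shlex


def read_default_var(text, name):
    """Return the value assigned to `name` in shell-style KEY=value text, or None."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            # shlex.split strips the surrounding quotes but keeps inner whitespace.
            parts = shlex.split(value)
            return parts[0] if parts else ""
    return None


def as_shell_words(value):
    """Approximate unquoted shell expansion: whitespace splitting only, no globbing."""
    return value.split()


# Sample content copied from the line quoted in the log.
sample = 'OAUTH_RUN=" mw-oauth-proxy-*.war"\n'
value = read_default_var(sample, "OAUTH_RUN")
print(repr(value))            # the stored value keeps its leading space
print(as_shell_words(value))  # after word splitting the space is gone
```

On a host, the same function could be fed the contents of /etc/default/wcqs-blazegraph. A `None` result would correspond to the missing-war case the puppet patch fixes.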