[07:51:47] AntiComposite: I can confirm, update latency has increased since Friday
[07:53:11] that's since we re-enabled the saneitizer :/
[07:55:32] either we increase parallelism of the ElasticaWrite job or we re-tune the saneitizer profiles
[08:04:37] or we re-implement back-pressure over prometheus metrics
[08:06:47] gehel: if you're around I think we might want to revert https://gerrit.wikimedia.org/r/c/operations/puppet/+/752724 while we settle on a solution for the saneitizer
[08:06:59] impact https://grafana-rw.wikimedia.org/d/000000484/kafka-consumer-lag?orgId=1&from=now-7d&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=main-eqiad&var-topic=eqiad.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite&var-consumer_group=All
[08:07:23] it's ~9 hours of lag for search updates
[08:13:25] dcausse: I'm on it
[08:13:32] thanks!
[08:14:22] dcausse: https://gerrit.wikimedia.org/r/c/operations/puppet/+/758317
[08:29:31] yay, I'm back
[08:29:48] finally, some rest from the physical labour
[08:30:10] zpapierski: welcome back!
[08:30:42] gehel: thx - mind syncing me up on meet?
[08:30:51] in 5' ?
[08:31:02] sure
[08:31:52] welcome back!
[08:33:55] thx!
[08:35:40] zpapierski: meet.google.com/aia-caib-cuy
[08:35:45] dcausse: feel free to join as well
[08:35:49] ejoseph: same
[08:49:22] andreaw: Zbyszko is back, you should probably talk to him at some point (zpapierski on IRC)
[08:55:59] ejoseph: I will be 5' late
[10:55:47] lunch
[10:56:33] Lunch
[11:58:10] dcausse: let's continue when you are back
[12:42:44] lunch and relocation
[13:22:20] ejoseph: I'm around
[14:52:44] inflatador: I think that T300310 can be moved to "needs reporting", can you confirm?
[14:52:45] T300310: wdqs1010 puppet failure due to lack of `journal` variable - https://phabricator.wikimedia.org/T300310
[15:17:39] we somehow broke the query log on wcqs-beta :(
[15:18:18] if I'd stuck to recording it each week I might've noticed, but since nobody used that, I decided to do that on demand
[15:18:26] (since it's rolling over anyway)
[15:18:33] that was perhaps a mistake :(
[15:20:50] Greetings, Searcharinos
[15:20:57] gehel will check it out
[15:24:22] o/
[15:35:12] AFK, back in ~15
[15:55:17] aaand back
[15:58:45] o/
[15:58:56] \o
[16:01:43] ejoseph, ryankemper: triaging meeting - https://meet.google.com/qho-jyqp-qos
[17:03:15] inflatador: you can probably already move T299797 to "in progress" (since it is) or keep it in "incoming" if you want to go through the estimation process (which is a good way to get feedback from the team on what needs to be done, what parts you might not have identified, etc...)
[17:03:16] T299797: Deploy new elastic cluster nodes on deployment-prep - https://phabricator.wikimedia.org/T299797
[17:03:47] Yes, Phabricator is non-intuitive, and our process is convoluted. We can talk more about it in our next 1-on-1 if you want
[17:03:55] And dinner time!
[17:26:27] dcausse is your Mac Intel or M1 based?
[17:26:27] M1: MediaWiki Userpage - https://phabricator.wikimedia.org/M1
[17:58:03] hmm, how would you mock file opening in java? I could add a one-arg and no-arg constructor, with one-arg taking a 'file opener' function, but seems awkward
[17:58:35] (also the file-opener function is tedious because FileInputStream::new throws, which i guess requires a FunctionalInterface?)
[17:59:17] the class in question is a config class that takes no args and reads out of System properties
[18:13:40] inflatador: I use Linux, it's for helping Emmanuel to set up PhpStorm/MediaWiki with unit tests
[18:16:33] ebernhardson: I usually see constructors taking an InputStream but perhaps it's for allowing to re-open it?
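Note on the Java question at 17:58: the "file opener" idea can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the project: the class name `BannedUsernamesConfig`, the interface `FileOpener`, and the property name `example.bannedUsernamesPath` are all hypothetical. A custom functional interface is used because `java.util.function.Function` cannot declare the checked exception that `FileInputStream::new` throws.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical config class: reads a file path from a system property
// and turns the file contents into a set of banned usernames.
class BannedUsernamesConfig {

    /** Like Function<String, InputStream>, but allowed to throw IOException. */
    @FunctionalInterface
    interface FileOpener {
        InputStream open(String path) throws IOException;
    }

    // Assumed property name, for illustration only.
    static final String BANNED_USERNAMES_PATH = "example.bannedUsernamesPath";

    private final FileOpener opener;

    /** Production constructor: opens real files. */
    BannedUsernamesConfig() {
        this(FileInputStream::new);
    }

    /** Test constructor: the opener can return any in-memory stream. */
    BannedUsernamesConfig(FileOpener opener) {
        this.opener = opener;
    }

    /** One username per line; blank lines are skipped. Empty set if the property is unset. */
    Set<String> bannedUsernames() {
        String path = System.getProperty(BANNED_USERNAMES_PATH);
        if (path == null) {
            return Collections.emptySet();
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(opener.open(path), StandardCharsets.UTF_8))) {
            return reader.lines()
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .collect(Collectors.toSet());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a unit test the one-arg constructor could be given a lambda such as `path -> new ByteArrayInputStream(...)`, so the property-to-value logic stays inside the config class while the file system is never touched.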
[18:17:55] dcausse I misunderstood you at the mtg, sorry
[18:18:29] ...but it does sound like ejoseph is the one who might need help. Still working on it
[18:18:31] dcausse: it doesn't really need to reopen, the thing is all the code that knows how to go `system property -> value we need` is inside that class, so it seems improper to have something else figure out one case and pre-provide the InputStream
[18:21:59] ebernhardson: I think I'll need to read the code, not sure I understand what you're trying to do :)
[18:22:01] errand
[18:24:35] i guess the underlying problem is that in OAuthProxyConfig everything else returns plain strings, but i was trying to avoid having a string interface for everything so this transforms the path to banned usernames into a Set
[18:25:58] well, no that works fine...really it's just injecting external state (from the file system) where previously this was isolated to reading properties
[18:48:33] huh, turns out asking a laptop to do multiple things fails the wikidata/query/rdf build: 10:46:30.384 [main] WARN o.w.q.r.b.t.SystemOverloadFilter - Request throttled because of system load higher than 8.0.
[18:54:33] lunch, back in ~45
[19:26:16] and back
[19:28:36] gehel or anyone else, for the recurring phab ticket creation, do you think using a DAG in airflow would be appropriate?
[19:30:10] inflatador: I think that keeping monitoring to a monitoring system makes more sense
[19:30:50] technically, we could run anything we want in Airflow, but it does not feel like this is the expected place for something like that
[19:32:28] * ebernhardson has wondered if there should be an airflow-like thing in prod
[19:33:15] Monitoring is kind of a separate piece, to me the important thing is getting that phab task created on a schedule without hacky/single person interactions like calendar reminders
[19:35:44] SREs have suggested using icinga integrations, so maybe I should just listen to gehel ;)
[19:42:08] i always feel awkward about icinga, but that is the way people seem to suggest. I feel like icinga is a "ZOMG problems!! do something now!!" kind of system. And we don't want that at all :)
[19:43:40] i suspect though that other people know better how to use icinga and it has ways
[19:55:01] patch up for reading banned usernames from file, i imagine a few -1's and hoping someone can say the right way to thread the state through... https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/757993
[19:55:18] lunch
[20:01:38] ebernhardson: well, and there's a first -1 from Jenkins :)
[20:01:57] ebernhardson: since you're @lunch, I suppose we can reschedule our 1-on-1 :)
[20:05:30] gehel: doh i forgot, still here :)
[20:05:44] still here too :)
[20:30:58] ok, second try for lunch :)
[21:01:59] ryankemper: We have the update lag SLO for WDQS now, but I was wondering what more, if anything, you think might be done for https://phabricator.wikimedia.org/T258754. cc inflatador
[21:06:25] good night all!
[21:06:37] back
[22:00:33] an amusing historical artifact of how it's hard to deploy simple things: https://www.youtube.com/watch?v=3t6L-FlfeaI
[22:02:36] (from a conversation earlier, but i thought others might find it amusing)
[22:02:43] mpham: I think a good finishing step on the ticket would be to document our wdqs SLO on https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service
[22:02:46] I can look into adding a section
[22:02:58] and then I'll put a blurb about the SLO we settled on in our next SRE meeting
[22:03:10] So once the former (the docu) is done I'd feel comfortable moving it to needs reporting
[22:03:28] mpham: oh and we should check if we've sent out a communication to the community about that SLO as well, do you recall if we've done that?
[22:03:57] ebernhardson: heh, I remember seeing a HN thread about that video once, where some googlers were saying that vid made a big stir and led to them changing some stuff internally
[22:04:25] the typical "everyone has subconsciously or consciously known this was an issue but finally a sufficiently viral meme generated enough momentum to actually improve things" scenario :P
[22:04:33] ryankemper: that'd be great if you want to add and bring up in the next SRE meeting. On my end, I know I've pointed to it a few times in WDQS updates
[22:04:37] ryankemper: yea, i brought it up when we were talking about how hard it is to deploy things here too, thankfully it's not this bad :)
[22:04:37] to the community
[22:04:58] mpham: great, I'll assign the ticket to me and stick it in needs reporting when I add the documentation section
[22:05:09] (I won't block the closing of the ticket on waiting for the next SRE meeting, will just take a TODO for that)
[22:07:15] cool, thanks!
[23:40:15] I'm out for the day, ryankemper will be checking on the deploy
[23:41:12] (wdqs deploy, to be specific)