[01:31:11] PROBLEM - Check unit status of monitor_refine_event on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [07:33:35] Good morning team - After Naé last week, Lino is to stay home this beginning of week (some other kid has covid in his class, we need to have him tested) - I will be on-and-off again today :( [07:35:25] :( Sorry to hear. Hope he will be healthy [07:47:27] 10Analytics, 10Data-Engineering: Implement digest-only mediawiki_history_reduced dataset in spark - https://phabricator.wikimedia.org/T181703 (10JAllemandou) The reason behind this task is still relevant, but not clear through the parent tasks: we wish/need to define/agree on/implement a technical solution al... [08:46:21] hi teammm! [08:54:31] hola Marcel [08:54:39] good morning mforns :) [08:54:48] hello hello :] [08:57:40] mforns: I have finally took the time to watch the Mandalorian's seasons (before Boba Fett) [08:57:54] aha! [08:57:59] and I see that there is a dedicated Lego series [08:58:14] :D [08:58:43] by lego you mean: construction game, tv show, or videogame? [08:59:05] the first :D [09:00:38] yes, I have one :] [09:04:22] aaahhhhh [09:04:26] Grogu? [09:24:55] (03CR) 10Mforns: [C: 03+1] "LGTM, left a comment :]" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/747561 (https://phabricator.wikimedia.org/T263277) (owner: 10Joal) [09:41:24] hi all! [09:45:59] we have this Grogu squish mellow stuffed pillow thing, it's bigger than Atlas, so funny [09:52:24] ahahhaah [09:52:27] hello :) [10:05:15] Morning all. [10:20:19] hey btullis :] [10:20:28] heya milimetric :] [10:20:47] Hiya mforns. [10:20:53] hi mforns! I'm gonna try to wake up early to catch you these days [10:21:12] oh, sorry for that :D [10:21:40] 0h... 5am?
[10:27:00] 10Analytics, 10Data-Engineering: Transform EventLoggingToDruid job to read schemas to ingest from an allowlist and process them all - https://phabricator.wikimedia.org/T202312 (10mforns) [10:49:06] 10Analytics-Clusters, 10Data-Engineering, 10Data-Engineering-Kanban, 10Cassandra, and 2 others: Investigate high levels of garbage collection on new AQS nodes - https://phabricator.wikimedia.org/T298516 (10BTullis) I've been looking at the dump captured on Friday, but I confess that I'm not really certain... [10:53:56] elukey: Should I go ahead and upgrade hive on an-coord1002, do you think? Then fail over to this with DNS once we're happy with it? [10:56:27] btullis: o/ is oozie already upgraded in test? Or do you want to do it later? [10:56:38] just to make sure that we test it as well with Hive [10:57:03] if it doesn't work for some reason we'll need to rollback prod [10:57:23] (I don't foresee this but I didn't see the last classpath error coming as well :D) [10:57:53] Good point. Oozie is not already upgraded in test. I haven't built any upgraded packages for it yet. [10:58:29] I left some notes in the task, it needs to be done after the hive build to use the same maven cache [10:58:50] otherwise it will fetch deps from maven central, that have the old log4j etc.. [10:59:08] (and there are client packages too etc..) [10:59:18] once we verify that we are good to go [10:59:44] I see it now, thanks. I will build those packages and test. [11:01:38] elukey: Any chance I get your opinion on this please, if you have time? https://phabricator.wikimedia.org/T297734 [11:03:17] Can't work out why the `%u` doesn't seem to be doing the right thing in parquet-logging. 
Am hoping to test upgraded hive packages to see if it goes away, but otherwise I might have to change the whole logging configuration to get it to use something other than `java.util.logging.FileHandler` [11:07:20] 10Analytics, 10Data-Engineering: Transform EventLoggingToDruid job to read schemas to ingest from an allowlist and process them all - https://phabricator.wikimedia.org/T202312 (10mforns) p:05Medium→03Low @odimitrijevic No, I don't think this has been implemented. It would be cool to have it be managed by a... [11:09:46] btullis: very weird [11:10:21] Yeah. [11:12:21] I think we can move those logs to the console setting something like SEVERE/WARN [11:12:39] and never think about them anymore :D [11:12:45] they are not really useful [11:12:59] (maybe from WARN onward it could help, but INFO surely not) [11:14:47] OK, thanks. That's a bit like this change then? https://gerrit.wikimedia.org/r/c/operations/puppet/cdh/+/471928/ [11:15:22] yes yes +1 from my side [11:15:59] I'd defer the final vote to the Master Jedi Joseph but it seems a nice workaround, the tmp logs are not really useful from what I can see [11:17:02] OK, well maybe I'll still test it with the new Hive packages first, but it's good to know that we've got a way forward if the upgraded packages don't fix it. [11:20:43] there is maybe a fix if we dig into endless jira reports etc.., but in this case it seems a big waste of time given the value of those logs (my 2c) [11:21:22] 👍 Thanks. [13:24:33] Hey all, quick question about events on vagrant [13:24:55] I get "Loading schema at /analytics/mediawiki/mediasearch_interaction/1.3.0" [13:25:08] Followed by the error: "schema should be object or boolean" [13:47:16] Hi Seddon - Sorry I have no idea, I'm no vagrant user :S Maybe ottomata when he comes online? [13:47:25] Ooh, sorry seddon. This is a new one to me me. I've never tried that yet.
[13:55:47] hi Seddon maybe I_can help [13:56:07] !log Upgrading oozie packages on an-test-coord1001 to test new log4j versions [13:56:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:56:14] i don't recall what produces mediasearch_interaction, but perhaps it is WikimediaEvents extension? [13:56:17] are you using that? [14:00:46] (03CR) 10Ottomata: [C: 03+1] rdf-streaming-updater: add a "reconcile" operation [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/737429 (https://phabricator.wikimedia.org/T279541) (owner: 10DCausse) [14:01:58] (03CR) 10Ottomata: Update refine netflow_augment transform function (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/747561 (https://phabricator.wikimedia.org/T263277) (owner: 10Joal) [14:11:55] Bigtop 3.1 (the next release) will include kafka 2.8.1 in theory, that has the first support for raft (to remove the zookeeper dep) [14:12:38] it doesn't have feature parity with the zookeeper codebase, but in kafka 3.0 they should release something more stable [14:12:57] I am reading https://kafka.apache.org/20/documentation/streams/upgrade-guide and upgrading from 1.1 to 2.x doesn't seem horrible [14:13:26] Do we have an issue with the zookeeper dependency at the moment? 
[14:14:13] it would just be reallllyyyyy nice not to need it [14:14:14] but [14:14:22] i think we should likely wait until that feature is widely used [14:14:30] we could probably upgrade kafka and still use zk for now [14:14:47] but once again...the reason we haven't upgraded kafka brokers is that the broker code hasn't really changed that much [14:15:08] most of the version bumps are about clients and frameworks, like kafka streams and kafka connect [14:15:18] MirrorMaker 2 looks really great [14:15:20] and we could even run it in k8s [14:15:55] sure sure bumping to 2.8.1 would need zk to be stable, but it would be an intermediate step between 1.1 and 3.x [14:16:27] I am pretty sure that even on the broker side they fixed a ton of bugs [14:18:03] also https://issues.apache.org/jira/browse/KAFKA-7264 [14:18:22] 2.x supports java 11 [14:19:13] https://issues.apache.org/jira/browse/KAFKA-7251 [14:19:15] tls 1.3 [14:19:17] etc.. [14:19:50] yeah, I think the intermediate step argument is the most important, two-major-version upgrades are nerve-wracking [14:20:24] (03CR) 10Mforns: [C: 03+1] Update refine netflow_augment transform function (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/747561 (https://phabricator.wikimedia.org/T263277) (owner: 10Joal) [14:21:49] I am in favor of keeping track, once in a while, upstream versions to avoid big upgrades when needed (like security vulnerability etc..) [14:22:15] not running the latest code with latest features in production to be clear :) [14:24:18] 10Analytics-Radar, 10fundraising-tech-ops: puppetize CA changes for kafkatee on fundraising banner loggers - https://phabricator.wikimedia.org/T296765 (10elukey) @Jgreen Hi! Let me know when it would be a good moment to work on this, I am more than happy to help! [14:25:12] btullis: we can work on https://phabricator.wikimedia.org/T296990 when JGreen gives us the green light! 
[14:25:25] (I moved kafka main eqiad this morning, the procedure is safe) [14:27:15] elukey: Will do. [14:27:19] 10Data-Engineering, 10observability, 10serviceops, 10Patch-For-Review: Move kafka clusters to fixed uid/gid - https://phabricator.wikimedia.org/T296982 (10elukey) [14:33:12] hello mforns ! [15:14:55] heya ottomata! :] [15:16:08] hellLoOooo [15:18:12] got lots to sync up on about conda envs when you have time :) [15:20:46] (03CR) 10Mforns: [C: 03+1] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/740589 (https://phabricator.wikimedia.org/T258834) (owner: 10Joal) [15:37:09] heya ottomata wanna sync now? [15:42:14] mforns: gimme 7 mins to finish my current thought [15:42:25] k! [15:43:15] actually finished my though! now good [15:43:17] bc [15:43:34] mforns: ^ [15:43:35] ottomata: omw [15:43:37] oh [15:43:58] if you haven't yet, read from https://phabricator.wikimedia.org/T296543#7585042 and below [15:44:12] ah! 2FA [15:44:13] ok [15:53:26] mforns: all DE team now has perms to restart cassandra on aqs (https://gerrit.wikimedia.org/r/c/operations/puppet/+/751104) [15:53:46] thanks elukey :] [16:05:43] SandraEbele: Heya - standup? [16:08:11] SandraEbele: we just realized you were not invited to today's instance - fixed! [16:35:27] 10Analytics-Radar, 10fundraising-tech-ops: puppetize CA changes for kafkatee on fundraising banner loggers - https://phabricator.wikimedia.org/T296765 (10Jgreen) >>! In T296765#7609049, @elukey wrote: > @Jgreen Hi! Let me know when it would be a good moment to work on this, I am more than happy to help! Hey @... 
[16:39:57] 10Data-Engineering, 10Airflow: [Airflow] Troubleshoot MySQL connection issues - https://phabricator.wikimedia.org/T298893 (10mforns) [16:52:21] (03CR) 10Mforns: [C: 03+1] "LGTM" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/740590 (https://phabricator.wikimedia.org/T258834) (owner: 10Joal) [17:43:52] 10Analytics-Radar, 10Cite, 10Reference Previews, 10WMDE-Technical-Wishes-Maintenance: Remove or simplify tracking metrics - https://phabricator.wikimedia.org/T242127 (10thiemowmde) [17:54:41] 10Analytics-Radar, 10fundraising-tech-ops: puppetize CA changes for kafkatee on fundraising banner loggers - https://phabricator.wikimedia.org/T296765 (10Jgreen) 05Open→03Resolved p:05Triage→03Medium a:03Jgreen >>! In T296765#7609049, @elukey wrote: > @Jgreen Hi! Let me know when it would be a good m... [17:54:44] 10Analytics-Radar, 10Data-Engineering-Radar, 10Event-Platform, 10Patch-For-Review: Move Kafka Jumbo's TLS clients to the new bundle - https://phabricator.wikimedia.org/T296064 (10Jgreen) [18:50:02] joal: and/or mforns yt? i know why yarn cluster mode isn't working right, looking for a brain bounce for what to do about it [18:50:16] I'm here! [18:50:19] bc? [18:50:29] ya [19:06:36] joining as well ottomata and mforns [19:06:48] joal: we're done [19:06:52] ah ok :) [19:06:56] not joining then :) [19:07:00] heheh [19:07:05] you guys are fast! [19:07:15] no no, we didn't finish [19:07:25] but andrew will be back soon, and you can join [19:07:33] I'm not going to be there though... [19:23:06] bye teammm! [19:25:56] Good night mforns [19:29:10] back joal in case you are still here! :) [19:29:29] hey ottomata - happy to help :) [19:29:36] ya i think you might be able to [19:29:37] bc? [20:33:14] 10Analytics-Clusters, 10Data-Engineering, 10Data-Engineering-Kanban, 10Cassandra, and 2 others: Investigate high levels of garbage collection on new AQS nodes - https://phabricator.wikimedia.org/T298516 (10Eevans) >>! In T298516#7608491, @BTullis wrote: > [ ...
] > If anyone else has any ideas on how to ge... [20:38:13] 10Analytics-Clusters, 10Data-Engineering, 10Data-Engineering-Kanban, 10Cassandra, and 2 others: Investigate high levels of garbage collection on new AQS nodes - https://phabricator.wikimedia.org/T298516 (10Eevans) >>! In T298516#7604802, @BTullis wrote: > I have created the heap dump file. I had to chown t... [21:01:27] hello! I was wondering if there are aggregate statistics for raw UAs out there (or, if not, whether there could be a way to get them) I know about the periodic reports but I was looking for something with the full UAs [21:02:02] Hi GenNotability - Full UAs are considered PII, we don't release them publicly [21:02:06] basically - I'm a checkuser and I'd be interested in writing a tool that annotates CU results with the prevalence of each useragent [21:02:37] joal: yeah, that's what I figured the answer would be [21:05:41] though since CUs have signed the ANPDP (which I know isn't the "real" NDA that analytics folks deal with), is there any chance of something being worked out such that only CUs have access? [21:05:45] just spitballing [21:07:28] I guess it probably depends on the need: a one off with somehow anonymised data? An interface CU could visit to regularly chekc up to date data? 
[21:07:39] I'm imagining something where a UA comes up during CU, gets sent to a service, and the service replies with "80% of traffic the past three months has that UA" (or something like that) [21:07:39] If the latter, it's "a lot of work" :) [21:08:34] (I also imagine me writing this hypothetical query endpoint, I don't want to just shove a big request on you all) [21:08:56] As in: we (analytics) have not yet set up dedicated interfaces for subsets of users, and security risks are not neglectible here [21:09:14] If we do aggregatio [21:10:47] GenNotability: You know what - best is to create a ticket with a precise enough descirption of your need - I'm giving you answers from the top of my head while the team should decide on this :) [21:11:25] GenNotability: I'm sure we have the data you're after (in some kind or another) - now how to present it the question [21:11:26] joal: sure, can do! and thanks for your time :) [21:11:45] You're very welcome GenNotability - I wish we can help :) [21:12:16] And with that, I'm gonna sign off! have a good rest of the da folks [21:17:48] 10Data-Engineering-Kanban, 10Airflow: Tooling for Deploying Conda Environments - https://phabricator.wikimedia.org/T296543 (10Ottomata) Alright, figured some stuff out (thanks Marcel and Joseph for the brian bounces). The problem is between call.py and the serialization of the spark code for the executors. c... [21:31:21] 10Analytics: Access to aggregate User Agent statistics - https://phabricator.wikimedia.org/T298912 (10GeneralNotability) [23:11:40] 10Analytics, 10MediaWiki-General: Update pingback "PHP Version" dashboards - https://phabricator.wikimedia.org/T298922 (10Reedy) [23:52:41] 10Analytics, 10MediaWiki-General: Update pingback "PHP Version" dashboards - https://phabricator.wikimedia.org/T298922 (10Reedy) https://meta.wikimedia.org/w/index.php?title=Config%3ADashiki%3APingback&type=revision&diff=22570140&oldid=19523789 1.35, 1.36 and 1.37 added... 
[23:55:12] 10Analytics, 10MediaWiki-General: Update pingback "PHP Version" dashboards - https://phabricator.wikimedia.org/T298922 (10Reedy) {F34913719 size=full} This obviously isn't so helpful... Can we start a start date in the graphs objects? We partially want to reverse sort it too... So most recent info is at the... [23:55:14] milimetric: about? :) [23:55:32] hi Reedy, what's up [23:55:42] Couple of dashiki questions wrt ^ [23:55:50] On https://meta.wikimedia.org/wiki/Config:Dashiki:Pingback [23:55:58] ah, yep, will look [23:56:11] milimetric: Can I set a "start date" (and similarly end date) in the "PHP Version (MediaWiki 1.32) Tabular View" graph etc?