[00:15:34] 10Analytics, 10Analytics-Dashiki: Dashiki Cleanup - https://phabricator.wikimedia.org/T168573 (10Milimetric) 05Open→03Declined [00:20:00] 10Analytics: Add edit count to mediawiki history reconstruction for anonymous editors - https://phabricator.wikimedia.org/T225536 (10Milimetric) 05Open→03Declined hasn't come up since this was created [00:21:05] 10Analytics: Update grouped-wiki files for sqoop - https://phabricator.wikimedia.org/T219326 (10Milimetric) 05Open→03Declined Declining in favor of the not quite duplicate T190700 [00:22:53] 10Analytics, 10Patch-For-Review, 10Unplanned-Sprint-Work: [reportupdater] consider not requiring date as a first column of query/script results - https://phabricator.wikimedia.org/T193174 (10Milimetric) 05Open→03Declined [00:22:57] 10Analytics: reportupdater TLC - https://phabricator.wikimedia.org/T193167 (10Milimetric) [00:24:21] 10Analytics, 10Analytics-EventLogging, 10Performance-Team (Radar), 10Readers-Web-Backlog (Tracking): Make it easier to enable EventLogging's debug mode - https://phabricator.wikimedia.org/T188640 (10Milimetric) 05Open→03Declined cc-ing #product-infrastructure-team-backlog as an FYI [02:24:36] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10Ottomata) Thanks Petr! For consitency's sake (;p), let's keep discussions about consistency i... [08:52:58] 10Analytics: reportupdater TLC - https://phabricator.wikimedia.org/T193167 (10awight) [08:53:16] ottomata: thanks for the doc! [08:53:23] 10Analytics, 10Patch-For-Review, 10Unplanned-Sprint-Work: [reportupdater] Add a configurable hive client - https://phabricator.wikimedia.org/T193169 (10awight) 05Declined→03Resolved I would say the work is complete. Migrating the format of existing jobs can be left up to the owners, or never.
[08:54:34] (03CR) 10DCausse: [C: 03+1] Add user_identifier field to sparql/query [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735445 (https://phabricator.wikimedia.org/T293462) (owner: 10Ebernhardson) [10:19:12] (03CR) 10Joal: [C: 03+1] "Hacky but ok - The test should work though" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/735444 (https://phabricator.wikimedia.org/T294361) (owner: 10Ottomata) [10:37:23] 10Analytics-Radar, 10Data-Engineering, 10Event-Platform, 10SRE, 10Patch-For-Review: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10elukey) >>! In T291905#7435799, @jbond wrote: >>>! In T291905#7435523, @jbond wrote: >>>>! In T291905#7431136,... [10:39:51] !log roll restart of kafka-test to pick up new truststore (root PKI added) [10:39:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:40:04] (going afk but the cookbook is running) [12:53:56] 10Analytics, 10Patch-For-Review: HiveExtensions.convertToSchema does not properly convert arrays of structs - https://phabricator.wikimedia.org/T259924 (10Ottomata) I can't 100% recall but I believe that is correct. [12:59:29] (03CR) 10Ottomata: talk_page_edit schema (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch) [13:06:32] (03CR) 10Ottomata: Add user_identifier field to sparql/query (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735445 (https://phabricator.wikimedia.org/T293462) (owner: 10Ebernhardson) [13:29:41] btullis: o/ [13:29:50] Hiya. [13:29:53] yt? thinking of using pontoon to test some an-db puppet stuff i'm working on [13:30:10] you've been setting it up in cloud, yes? how far along is it? can/should I use it? [13:30:33] Ooh, great. Haven't made any progress on it yet, but there is a new horizon project that we're both admins on. [13:31:21] name? 
I don't see it in my drop down [13:31:37] Sorry, it's kind of been in my next-up for a while. Name: `data-engineering` [13:31:44] Hang on, will check. [13:34:48] ottomata, btullis thanks for the kafka reviews! I am planning to proceed with kafka-test, ok for you?? [13:34:58] * btullis fallen at 2FA hurdle, waiting for backup phone to boot [13:35:25] elukey: +1 [13:35:28] elukey: Yes, fine by me [13:35:32] thanks :) [13:35:32] btullis: hahah yea.>.>...> [13:35:46] at least your phone isn't yubisneezing all over your friends [13:35:53] ccccccvfnunttckecdrvjunrtlvglhtbieeibniderdl [13:36:03] whenever I yubisneeze it kinda looks like I'm mad [13:36:06] and frustrated [13:36:30] and i have to apologize [13:37:07] ahahahhaahh [13:37:38] I wouldn't mind having to use the Yubithingy, but Horizon makes me use the Authenticator app. Right now my personal phone is borked, so I'm sharing an old phone with the kids. It's flat. [13:38:03] i had that issue a few weeks ago [13:38:16] (minus the sharing with kids part) [13:41:27] It would have been quicker to remove 2FA on my account. [13:51:37] ottomata: I have now added you to the data-engineering project. I thought I had done so before, sorry. [14:02:01] (03CR) 10Mforns: Add the SearchSatisfaction legacy schema to the allowlist (035 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/715055 (https://phabricator.wikimedia.org/T274607) (owner: 10MNeisler) [14:06:22] (03CR) 10Mforns: "@sguebo Hi! We have a question around the skin field." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/715055 (https://phabricator.wikimedia.org/T274607) (owner: 10MNeisler) [14:07:15] query jbond [14:07:18] uff :) [14:28:30] thanks btullis, in the meantime am trying pontoon in analytics project [14:28:43] not quite working it seems, there was a recent refactor of some puppet base classes [14:29:15] Cool. Following along in #sre as well. Apologies that I didn't make more progress sooner on this.
[15:01:50] (03CR) 10Samuel (WMF): [C: 03+1] Add the SearchSatisfaction legacy schema to the allowlist (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/715055 (https://phabricator.wikimedia.org/T274607) (owner: 10MNeisler) [15:31:15] rolled back the state for kafka-test, need to fix the truststore bit but the rest looks good [15:31:24] going to update the task in a bit [15:32:24] aye [15:33:37] ack. What was the issue with the new truststore? [15:34:04] Did it need the Puppet CA certificate in addition to the new PKI one? [15:39:54] (03CR) 10Mforns: [C: 03+1] "Thank you @sguebo!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/715055 (https://phabricator.wikimedia.org/T274607) (owner: 10MNeisler) [16:01:50] btullis: sorry I missed the msg! So all the kafka brokers are now using a truststore that contains only the puppet CA (generated by cergen), and they don't even consider the system cacerts one (when a truststore is set). To make things tidy we should create a new truststore with Puppet CA + Root PKI (maybe also with the kafka intermediate, but probably not needed) and distribute it to all [16:01:56] brokers and clients [16:02:36] at this point, we could flip the TLS settings for one broker at a time, and they should trust each other [16:02:49] Gotcha. Thanks. [16:03:12] going to add all the info in the task [16:06:27] 10Analytics-Radar, 10Data-Engineering, 10Event-Platform, 10SRE, 10Patch-For-Review: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10elukey) A lot of progress today with @jbond, here's a summary: * The new keystore contains the intermediate... [16:06:40] done --^ [16:09:41] joal, any thoughts on this?
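The migration plan above (a single truststore that holds both the old Puppet CA and the new root PKI, so brokers trust each other while flipping TLS settings one broker at a time) can be sketched conceptually. Everything below is illustrative: the filenames and placeholder "certificates" are invented, not the real Puppet-managed paths.

```shell
# Placeholder CA "certificates" standing in for the real PEM files;
# the actual deployment would use the real Puppet CA and root PKI certs.
printf 'puppet-ca-cert\n' > puppet_ca.pem
printf 'root-pki-cert\n' > root_pki.pem

# A PEM trust bundle trusting both CAs is just their concatenation.
cat puppet_ca.pem root_pki.pem > combined_bundle.pem

# For the JKS truststores Kafka uses, the analogous (not run here) step
# is one keytool import per CA into the same .jks file, e.g.:
#   keytool -importcert -alias puppet_ca -file puppet_ca.pem -keystore truststore.jks
#   keytool -importcert -alias root_pki  -file root_pki.pem  -keystore truststore.jks
cat combined_bundle.pem
```

The point of the sketch is that a dual-CA truststore makes the broker-by-broker certificate flip safe: every peer trusts both the old and the new issuer for the duration of the migration.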
https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/735648 [16:10:42] reading ottomata [16:13:11] ottomata: just sent a comment about the commit message, except from that it looks good [16:15:00] joal: thoughts on the comment convo about auto generating the id? [16:26:14] Hi all, curious if there's a better procedure to submit a superset bug than just going to phabricator.wikimedia.org and clicking "report a software bug" and clearing the template [16:26:23] Also happy friday \o/ [16:27:08] razzi: report to us? i'd just make a phab ticket however you like [16:29:20] cool yeah [16:29:45] this is for the superset timeouts not canceling the query, I'll create a phab ticket and try to find a good parent ticket [16:42:40] ottomata: is an-db-1 you? [16:42:50] Spookreeeno: yes [16:42:58] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10bd808) >>! In T291120#7467254, @Ottomata wrote: > But, the problem state does suggest that we... [16:43:15] 16:17:04 (PuppetAgentNoResources) firing: No Puppet resources found on instance an-db-1 on project analytics  - https://prometheus-alerts.wmcloud.org [16:43:38] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10Ottomata) > I know this is jumping to "solutioning" but the 'cheap' thing that is as reliable... [16:43:56] that's weird. [16:43:58] i just ran puppet on it [16:44:02] but.. 
it is using pontoon [16:44:11] so it is not using the default cloud puppetmaster [16:45:41] Can't load log handler "java.util.logging.FileHandler" [16:45:41] java.io.FileNotFoundException: /tmp/parquet-2.log (Permission denied) [16:45:48] Not had that before while using hive [16:48:29] razzi: heya [16:48:49] razzi: I have 10 mins before interview - could we talk about the superset/presto timeout? [16:50:26] joal: I can chat here for a minute, yeah [16:50:30] ack batcave? [16:51:02] Let's just chat here [16:51:25] I have a working solution to make presto time out after a predefined query duration [16:51:37] ok very cool! [16:52:04] Does it work to only trigger a timeout for non-cli sources? [16:52:07] It's not exactly a superset-timeout --> presto query cancellation, but close enough [16:52:31] It's a session setting, so it's defined in the superset database config and only applies to superset [16:53:56] You can find it in the "Extra" tab of the presto_test_joal database on superset:
"session_props": { "query_max_run_time": "70s" }, [16:55:26] Setting the timeout to 70s for presto makes superset time out first, and then the presto query fails if it's too long [16:56:18] razzi: shall I let you update the prod DB? [16:57:25] Oh and, I checked that queries reaching the timeout in presto show up as failed in the system table, so we'll have more info - We could even go for a 55s limit for presto, to enforce a presto timeout before the superset one and ensure we have the query marked as failed [16:58:53] anyone got a pointer to how i can increase the heap size of the `hive` cli command? [17:02:21] joal: my q is: do you think event-utilities should auto-gen a meta.id uuid? or is that weird behavior? [17:02:34] hm [17:02:34] addshore: i think export HADOOP_HEAPSIZE=... [17:02:37] IIRC [17:02:40] <3 [17:03:20] addshore: for reference: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries#Out_of_Memory_Errors_on_Client [17:05:11] ottomata: given the past behavior was to not generate, it would be kinda weird in that regard. But to me best would probably be to generate it if not provided (whether the value is null or the function is called without an id) [17:07:14] (03CR) 10Ebernhardson: Add user_identifier field to sparql/query (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735445 (https://phabricator.wikimedia.org/T293462) (owner: 10Ebernhardson) [17:09:41] joal: That's great, I think a sub-queryset timeout makes sense, since the superset response time is at least the presto response time [17:10:04] One could even say... the time superset takes is a superset [17:25:36] Making superset a superset works for me razzi :) [17:27:25] * addshore cries slightly at his year old wikidata_entity queries not working any more [17:31:19] joal: I suspect you're too close to the end of the european day to look at anything with me?
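For reference, a sketch of what the full "Extra" JSON on the Superset database might look like with the session property in place. The nesting follows the `engine_params.connect_args.session_props.query_max_run_time` path recorded in the !log entry below; everything else about the surrounding configuration is an assumption, not a copy of the production config.

```json
{
    "engine_params": {
        "connect_args": {
            "session_props": {
                "query_max_run_time": "70s"
            }
        }
    }
}
```

Because this is a Presto session property set per connection, it bounds query runtime only for queries issued through this Superset database entry, which is why CLI users are unaffected.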
I'm in a meeting now addshore - can spend a minute after [17:31:45] <3 [17:32:09] (03CR) 10Ottomata: Add user_identifier field to sparql/query (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735445 (https://phabricator.wikimedia.org/T293462) (owner: 10Ebernhardson) [17:32:43] joal ok thanks, we will generate then [17:33:16] ottomata: And actually, should we force schemas to have non-null ids? [17:33:48] as in require meta.id? maybe. [17:34:01] i think there might have been a backwards compat reason we don't do that [17:34:15] ok [17:40:05] btullis: in case you read this - there's very high disk usage on cassandra2 AQS :S [17:40:21] btullis: I'm assuming we maybe store old snapshots in there? [17:45:05] !log set presto_analytics_hive extra parameter engine_params.connect_args.session_props.query_max_run_time to 55s on superset.wikimedia.org [17:45:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:45:23] Thanks for the tip joal, the setting is live [17:45:30] \o/ [17:46:15] razzi: would you mind adding lines to the superset wikitech page and possibly sending a message to product-analytics on this? That'd be great :) [18:16:57] Gone for tonight team [18:17:09] Have a good weekend and see you on Wednesday! [18:36:41] 10Analytics, 10Analytics-Jupyter, 10Data-Engineering, 10Product-Analytics: conda list does not show all packages in environment - https://phabricator.wikimedia.org/T294368 (10nshahquinn-wmf) 05Resolved→03Open @odimitrijevic Do you mind if we keep this open, with whatever priority (low or lowest) you th...
[18:37:47] (03PS2) 10Ebernhardson: Add user_identifier field to sparql/query [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735445 (https://phabricator.wikimedia.org/T293462) [19:11:47] (03PS3) 10Ebernhardson: Add performer field to sparql/query [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735445 (https://phabricator.wikimedia.org/T293462) [19:23:32] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10nskaggs) [20:04:08] 10Analytics, 10Infrastructure-Foundations: Netflow data pipeline - https://phabricator.wikimedia.org/T257554 (10Aklapper) [20:15:30] (03CR) 10Ottomata: [C: 03+1] Add performer field to sparql/query [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735445 (https://phabricator.wikimedia.org/T293462) (owner: 10Ebernhardson) [20:24:12] (03PS4) 10Ebernhardson: Add performer field to sparql/query [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735445 (https://phabricator.wikimedia.org/T293462) [20:29:50] Oh, I think I am encountering https://issues.apache.org/jira/browse/HIVE-22757 ... [20:30:11] affected version 2.3.6, and seemingly we are on Hive 2.3.6 [20:30:22] oh my rly? [20:30:32] the stack is the same [20:30:47] i spent a while chasing it around thinking I was doing something wrong, but eventually came to this ticket [20:31:01] huh wow [20:31:19] stacktrace is a bit different, but AbstractCollection.addAll(AbstractCollection.java:343) is the same [20:32:10] and the PR isn't merged and a fix is not in a release D: [20:32:30] oh my how are you conjuring it?
I could still be doing something evil, but afaik this query worked a year ago :D [20:33:11] https://www.irccloud.com/pastebin/JvpzF0dG/ [20:33:25] the only change for the new year is an updated snapshot version [20:33:41] I did think it could be something null in the data, but I tried a query ruling out all the null things there too [20:37:01] i think we did upgrade to that version of hive this past year (was it in the spring?) [20:37:55] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10Patch-For-Review: Refactor analytics-meta MariaDB layout to multi instance with failover - https://phabricator.wikimedia.org/T284150 (10Ottomata) Ok, I finally got something workable over in https://gerrit.wikimedia.org/r/c... [20:38:51] it being when I last ran it? or? [20:39:23] I don't know enough about hive, but is this just the client version and thus something i can sneak a different version of? ;) or something more complicated [20:39:56] probably something more complicated [20:40:02] addshore: you could try spark instead [20:40:27] * addshore will need to read the docs, but that's fine [20:40:28] (it being when we upgraded) [20:40:43] I won't be able to copy pasta the query to something in spark I expect?
probably not exactly, but maybe mostly [20:41:23] i think get_json_object is a hive function [20:41:40] currently googling "how to use spark client" :D jo_al has always told me to use spark, somehow I have never flipped my hive addiction [20:41:47] and your parameterization will be different [20:41:53] addshore: i think you will like it :) [20:42:00] if you want to try just pure sql [20:42:05] you could try spark2-sql [20:42:12] but i suspect you will want to use pyspark2 [20:50:21] nice, query working, now I just gotta figure out how to do more memory here too :D java.lang.OutOfMemoryError: Java heap space [20:50:28] using spark2-sql for now [20:52:02] looks like https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark will help! [20:54:51] <3<3<3<3<3 [21:25:26] epic, everything fully working, thanks for the guidance and brain power ottomata ! [22:50:26] 10Analytics-Radar, 10Data-Services, 10cloud-services-team (Kanban): Mitigate breaking changes from the new Wiki Replicas architecture - https://phabricator.wikimedia.org/T280152 (10bd808) [22:58:15] !log deleted old snapshots from aqs1006 and aqs1009 [22:58:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [23:04:25] !log deleted all remaining old cassandra snapshots on aqs100x servers. [23:04:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
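The spark2-sql route discussed above can be sketched as follows. The flag values and file name are illustrative assumptions; the only grounded facts are that spark2-sql exists on the analytics clients and that, since it wraps spark-submit, a client-side `java.lang.OutOfMemoryError: Java heap space` can typically be addressed by raising the driver heap.

```shell
# Illustrative sketch (values are made up): spark2-sql passes
# spark-submit options through, so the driver JVM heap, which is where
# the client-side OOM occurred, can be raised with --driver-memory.
DRIVER_MEM="4g"             # assumed size, tune to the query
QUERY_FILE="my_query.hql"   # hypothetical HiveQL file
CMD="spark2-sql --master yarn --driver-memory ${DRIVER_MEM} -f ${QUERY_FILE}"
# On an analytics client you would now run: $CMD
echo "would run: $CMD"
```

Note that porting a Hive query is usually close to copy-paste since Spark SQL supports most HiveQL, though parameterization differs; pyspark2 is the recommended route for anything beyond pure SQL.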