[06:53:41] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Set up an-web1001 and decommission thorium - https://phabricator.wikimedia.org/T285355 (elukey) @Ottomata what should we do about the dirs mentioned above? (just trying to confirm if we can drop them or not to avoid a "OH NOESSS" moment :))
[08:28:28] Analytics-Radar, MediaWiki-extensions-WikibaseRepository, Wikidata, wdwb-tech: ApiAction log in data lake doesn't record Wikibase API actions - https://phabricator.wikimedia.org/T174474 (Addshore) Stalled→Invalid The original case for this ticket can easily be covered by the dataset in `e...
[08:34:24] Analytics, Data-Engineering, Patch-For-Review: Upgrade Refinery Jobs to Spark 3 - https://phabricator.wikimedia.org/T291386 (JAllemandou) >>! In T291386#7366295, @Ottomata wrote: > Hm, I think this task is also about installing and supporting Spark 3 in favor of Spark 2, with the eventual goal of rem...
[08:35:38] Analytics: Upgrade analytics-hadoop to Spark 3 + scala 2.12 - https://phabricator.wikimedia.org/T291464 (JAllemandou)
[08:35:53] Analytics: Upgrade analytics-hadoop to Spark 3 + scala 2.12 - https://phabricator.wikimedia.org/T291464 (JAllemandou)
[08:35:55] Analytics, Data-Engineering, Patch-For-Review: Upgrade Refinery Jobs to Spark 3 - https://phabricator.wikimedia.org/T291386 (JAllemandou)
[08:36:14] Analytics: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453 (JAllemandou)
[08:36:16] Analytics: Upgrade analytics-hadoop to Spark 3 + scala 2.12 - https://phabricator.wikimedia.org/T291464 (JAllemandou)
[08:39:52] Analytics: Analytics-test-hadoop Spark3 package upgrade - https://phabricator.wikimedia.org/T291465 (JAllemandou)
[08:41:03] Analytics: Analytics-hadoop Spark3 package upgrade (production) - https://phabricator.wikimedia.org/T291466 (JAllemandou)
[08:41:30] Analytics: Analytics-hadoop Spark3 package upgrade (production) - https://phabricator.wikimedia.org/T291466 (JAllemandou)
[08:41:33] Analytics: Upgrade analytics-hadoop to Spark 3 + scala 2.12 - https://phabricator.wikimedia.org/T291464 (JAllemandou)
[08:42:44] Analytics: Analytics-hadoop Spark3 package upgrade (production) - https://phabricator.wikimedia.org/T291466 (JAllemandou)
[08:42:46] Analytics, Data-Engineering, Patch-For-Review: Upgrade Refinery Jobs to Spark 3 - https://phabricator.wikimedia.org/T291386 (JAllemandou)
[08:46:11] Analytics, Analytics-Kanban: Check AQS with cassandra (serving + data) - https://phabricator.wikimedia.org/T290068 (JAllemandou)
[08:47:12] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-MoritzMuehlenhoff: Improve user experience for Kerberos by creating automatic token renewal service - https://phabricator.wikimedia.org/T268985 (BTullis) My ticket expired yesterday, but the ticket for @elukey is still valid. Last night...
[08:48:26] Analytics, Analytics-Kanban: Repair and reload all cassandra-2 data tables but the 2 big ones - https://phabricator.wikimedia.org/T291469 (JAllemandou)
[08:49:00] joal: What's your feeling on the next steps with the Cassandra 3 migration? Should we import all of the repaired smaller tables now, so that it is done?
[08:50:30] Analytics, Analytics-Kanban: Repair and reload cassandra2 mediarequest_per_file data table - https://phabricator.wikimedia.org/T291470 (JAllemandou)
[08:51:12] Hi btullis - we discussed it yesterday with folks in the team; do you wish to spend a minute in batcave for me to summarize?
[08:51:37] Yes please. Apologies for missing that discussion.
[08:51:46] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-MoritzMuehlenhoff: Improve user experience for Kerberos by creating automatic token renewal service - https://phabricator.wikimedia.org/T268985 (elukey) Tested as well an-test-client1001, and I was able to use the spark-shell without ki...
[08:51:50] no problem at all :)
[08:51:53] let's batcave
[09:18:25] Analytics, Analytics-Kanban: Snapshot and Reload cassandra2 pageview_per_file data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (JAllemandou)
[09:23:16] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (JAllemandou)
[09:23:21] btullis: --^
[09:24:17] Analytics, Analytics-Kanban: Repair and reload all cassandra-2 data tables but the 2 big ones - https://phabricator.wikimedia.org/T291469 (JAllemandou)
[09:24:58] Analytics, Analytics-Kanban: Repair and reload cassandra2 mediarequest_per_file data table - https://phabricator.wikimedia.org/T291470 (JAllemandou)
[09:26:12] Great, thanks joal. I'll pick that up and work on it now.
[09:26:38] Thanks a lot btullis :) I'm refining the various tasks with steps to be performed
[09:27:19] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (BTullis) Open→In progress p:Triage→High a:JAllemandou→BTullis
[09:27:22] Analytics, Analytics-Kanban: Check AQS with cassandra (serving + data) - https://phabricator.wikimedia.org/T290068 (BTullis)
[09:35:35] Analytics, Analytics-Kanban: Snapshot and Reload cassandra2 pageview_per_file data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (JAllemandou)
[09:35:37] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (JAllemandou)
[10:04:36] Analytics, Analytics-Kanban: Check AQS with cassandra (serving + data) - https://phabricator.wikimedia.org/T290068 (BTullis)
[10:18:31] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (BTullis)
[10:20:02] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (BTullis) I am planning to use the following command from cumin. ` sudo cumin --mode async 'aqs100[5-9].eqiad.wmnet' 'nodetool-a snapshot -t T291473 local_group_...
[10:49:35] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (BTullis) That worked and `nodetool-[ab] listsnapshots` shows them correctly with size. ` btullis@cumin1001:~$ sudo cumin --mode async 'aqs100[5-9].eqiad.wmnet' '...
[10:49:42] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (BTullis)
[10:52:40] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (BTullis) Created the destination directories: ` btullis@cumin1001:~$ sudo cumin --mode async 'aqs100[5-9].eqiad.wmnet' 'mkdir /srv/cassandra-a/tmp' 'mkdir /srv/c...
[11:18:00] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (BTullis) I think that for the purposes of this operation, the destination rack for the snapshot is arbitrary, but we must have a 1:1 mapping. So I'll define it h...
[11:27:33] (CR) Mforns: [V: +2 C: +2] "LGTM! Thanks!" [analytics/refinery] - https://gerrit.wikimedia.org/r/716339 (https://phabricator.wikimedia.org/T281511) (owner: MNeisler)
[12:09:18] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:47:29] (PS1) Jgiannelos: Fix example event for maps/tile_change [schemas/event/primary] - https://gerrit.wikimedia.org/r/722604
[12:55:31] Heya btullis
[12:56:45] quick note on the loading of pageview-top across nodes: we need to reload data from aqs100[5689] only - not [47]
[12:56:48] btullis: --^
[12:57:46] btullis: the [47] nodes have been repaired, so obviously they contain expected data - here we wish to test with the already loaded version of [47], and load new stuff from the rest of the hosts
[13:02:54] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (JAllemandou) >>! In T291473#7368626, @BTullis wrote: > I think that for the purposes of this operation, the destination rack for the snapshot is arbitrary, but w...
[13:11:57] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Set up an-web1001 and decommission thorium - https://phabricator.wikimedia.org/T285355 (Ottomata) Oh I think we should drop them, I wasn't sure where the communication was between you and @Milimetric. Let's verify with him and I'll follow up.
[13:13:09] Analytics, Analytics-Kanban: Check home/HDFS leftovers of fdans - https://phabricator.wikimedia.org/T290231 (elukey) Fran on IRC told me that there is a lot of test data and nothing worth to keep. If people don't see anything worth to backup, I think that we can drop all!
[13:13:24] joal: Yes. I'm only doing those hosts aqs100[5-9]. You were vexed by this comment? https://phabricator.wikimedia.org/T291473#7368626
[13:14:25] btullis: [5-9] without 6, right? ;)
[13:14:55] btullis: not vexed, but I wanted to be sure we were on the same page :)
[13:15:27] Oh yeah, hang on. Without 7 :-)
[13:15:44] Yeah, vexed was too strong. Apologies.
[13:17:18] So I've created unnecessary snapshots here. I'll delete these to avoid inadvertently transferring and loading them.
[13:17:22] https://www.irccloud.com/pastebin/EaAOVd9i/
[13:18:10] you're absolutely right, without 7 - I also pasted in the task "for the glory of history" (to keep the archive happy) :)
[13:22:21] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (BTullis) :+1: Thanks. I realised that while I correctly excluded aqs1004 from the `nodetool snapshot` commands, I mistakenly included aqs1007. SO I've created sn...
[13:26:06] (CR) Ottomata: [C: +2] Fix example event for maps/tile_change [schemas/event/primary] - https://gerrit.wikimedia.org/r/722604 (owner: Jgiannelos)
[13:37:39] Analytics, User-Elukey: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712 (Ottomata) From what I can tell, SULWatcher was the only reason stated so far (did I miss something?) for keeping the #central channel. Is this correct?
[13:51:19] (PS4) Joal: Grow mediawiki-history oozie jobs resources [analytics/refinery] - https://gerrit.wikimedia.org/r/719111 (https://phabricator.wikimedia.org/T290469)
[13:51:28] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (BTullis) Now running the following command on aqs1005 to start the first of the 8 rsync processes. ` root@aqs1005:~# rsync -av --include-from=/root/include_file...
[13:51:32] (CR) Joal: Grow mediawiki-history oozie jobs resources (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/719111 (https://phabricator.wikimedia.org/T290469) (owner: Joal)
[14:00:18] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:37:01] Analytics-Radar, CX-analytics, Language-analytics, Product-Analytics (Kanban): Add content_translation_event data stream to the sanitization allowlist - https://phabricator.wikimedia.org/T281511 (MNeisler) Open→Resolved
[14:54:17] heya ottomata :] will you be able to come to airflow meeting (in 10 min)? Also, anyone in the team interested in talking about airflow repos and dependency management!
[14:54:27] a-team ^
[14:55:23] yup i'm there
[14:55:34] mforns: i'd be happy to talk, whatcha mean?
[14:56:10] I would be interested, but I'm double-booked.
[14:56:29] ottomata: the meeting is in 4 minutes
[14:56:46] I meant we'll discuss airflow repo and dependency management
[14:57:03] np btullis :]
[14:58:23] oh great ya i'll be there
[14:58:27] Analytics, Analytics-Kanban: Test snapshot-reload from all instances using pageview-top data table - https://phabricator.wikimedia.org/T291473 (BTullis) That worked well, so now proceeding to run the command on the remaining 7 snapshots. ` root@aqs1005:/home/btullis# rsync -av --include-from=/root/inclu...
[14:58:30] 👍
[15:24:15] Analytics-Radar, Privacy Engineering, WMDE-Analytics-Engineering, Wikidata, Wikidata Analytics: Privacy Policy Review for Global South Wikidata edits and active editors datasets - https://phabricator.wikimedia.org/T291186 (Htriedman) Hi @GoranSMilovanovic and @Manuel! My name is Hal — I'm a p...
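BTullis notes on T291473 that the destination for each snapshot is arbitrary but the mapping must be 1:1, and the log counts 8 rsync processes. That count would be consistent with 4 hosts (aqs1005, aqs1006, aqs1008, aqs1009) times two Cassandra instances each (the `nodetool-a`/`nodetool-b` and `/srv/cassandra-a`/`/srv/cassandra-b` names), though this is an inference and the log does not state it. The sketch below shows one way to build and sanity-check such a mapping before launching transfers. `one_to_one` is a hypothetical helper, and the destination names are placeholders because the real targets are not named in this log.

```python
from itertools import product

# Source Cassandra instances: each remaining host is assumed to run two
# instances, "a" and "b". 4 hosts x 2 instances = 8 snapshots to transfer.
SOURCE_HOSTS = ["aqs1005", "aqs1006", "aqs1008", "aqs1009"]
INSTANCES = ["a", "b"]


def one_to_one(sources, destinations):
    """Pair each source instance with exactly one destination instance.

    Raises ValueError unless both sides are duplicate-free and the same
    size, i.e. unless a 1:1 mapping is actually possible.
    """
    sources, destinations = list(sources), list(destinations)
    if len(set(sources)) != len(sources) or len(set(destinations)) != len(destinations):
        raise ValueError("duplicate instance in sources or destinations")
    if len(sources) != len(destinations):
        raise ValueError(f"{len(sources)} sources vs {len(destinations)} destinations")
    # Which destination receives which snapshot is arbitrary; sorting both
    # sides only makes the assignment deterministic and easy to review.
    return dict(zip(sorted(sources), sorted(destinations)))


if __name__ == "__main__":
    sources = list(product(SOURCE_HOSTS, INSTANCES))
    # Hypothetical destination instances (placeholders, not real hosts).
    destinations = [(f"dest-host-{i}", inst) for i in range(1, 5) for inst in INSTANCES]
    for (src_host, src_inst), (dst_host, dst_inst) in one_to_one(sources, destinations).items():
        print(f"{src_host}:cassandra-{src_inst} -> {dst_host}:cassandra-{dst_inst}")
```

Any bijection satisfies the 1:1 requirement; the size and duplicate checks are there to fail loudly if, for example, a stray snapshot such as the aqs1007 one is still in the source list.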
[15:59:04] Analytics, Event-Platform, serviceops: eventgate helm chart should use common_templates _tls_helpers.tpl instead of its own custom copy - https://phabricator.wikimedia.org/T291504 (Ottomata)
[16:42:07] Analytics, EventStreams: EventStreams doesn't connect to multiple streams - https://phabricator.wikimedia.org/T291505 (SD0001)
[17:51:00] https://downloads.apache.org/kafka/3.0.0/RELEASE_NOTES.html
[18:01:14] Analytics-Radar, Privacy Engineering, WMDE-Analytics-Engineering, Wikidata, Wikidata Analytics: Privacy Policy Review for Global South Wikidata edits and active editors datasets - https://phabricator.wikimedia.org/T291186 (GoranSMilovanovic) @Htriedman I think @Manuel as a Wikidata Analytic...
[19:09:37] Analytics-Clusters, Analytics-Kanban, Cassandra, Data-Engineering, Patch-For-Review: Set up a testing environment for the AQS Cassandra 3 migration - https://phabricator.wikimedia.org/T257572 (odimitrijevic)
[19:28:09] Analytics: Refactor analytics-meta MariaDB layout to multi instance with failover - https://phabricator.wikimedia.org/T284150 (Jclark-ctr)
[20:10:52] Analytics, Analytics-Kanban, Traffic: Review use of realloc in varnishkafka - https://phabricator.wikimedia.org/T287561 (odimitrijevic) @elukey Thanks for reviewing the patch. Based on your question in the pr it is unclear that there is a specific issue that this PR addresses. > Trying to add some...
[20:59:56] Analytics-Clusters, DC-Ops, Data-Engineering, SRE, ops-eqiad: Q1:(Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (Ottomata) Awesome! Servers in the DC! I can/would work on these boxes ASAP...in case that factors into the priority for t...
[21:01:20] Analytics-Clusters, DC-Ops, Data-Engineering, SRE, ops-eqiad: Q1:(Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (Jclark-ctr)
[21:15:07] Analytics, EventStreams: EventStreams doesn't connect to multiple streams - https://phabricator.wikimedia.org/T291505 (Ottomata) I don't seem to be able to reproduce. How long did you wait to see page-create events? Perhaps the initial connection streamed you page-deletes before you got any page-create...
[23:44:27] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Upgrade Superset to 1.3 - https://phabricator.wikimedia.org/T288115 (razzi) Definitely some more work to be done here - for any "end users" seeing this ticket, know that we're aware most things...
[23:55:14] Analytics, Analytics-Kanban, Data-Engineering: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (razzi)
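Ottomata's suggestion on T291505 is that, with several streams multiplexed over one EventStreams connection (as we understand it, by listing comma-separated stream names in the URL path, e.g. `/v2/stream/page-create,page-delete`), events arrive interleaved in whatever order they occur, so a short session may show only one stream. One way to check is to tally received events per stream. The sketch below parses Server-Sent Events text without touching the network. It assumes each event's JSON payload carries its stream name under `meta.stream`, and the stream names in the sample are illustrative only.

```python
import json
from collections import Counter


def iter_sse_data(lines):
    """Yield the joined `data:` payload of each Server-Sent Event.

    An event ends at a blank line and may span several `data:` lines, which
    are joined with newlines. Other fields (event:, id:, retry:) and ":"
    comment lines are ignored. An event left incomplete at end of input is
    not dispatched, as in the SSE format.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)


def count_by_stream(lines):
    """Tally events per stream, assuming each JSON event has meta.stream."""
    counts = Counter()
    for payload in iter_sse_data(lines):
        event = json.loads(payload)
        counts[event.get("meta", {}).get("stream", "<unknown>")] += 1
    return counts


if __name__ == "__main__":
    # Illustrative captured SSE text (not real EventStreams output).
    sample = [
        ":ok",
        "event: message",
        'data: {"meta": {"stream": "mediawiki.page-delete"}}',
        "",
        "event: message",
        'data: {"meta": {"stream": "mediawiki.page-delete"}}',
        "",
        "event: message",
        'data: {"meta": {"stream": "mediawiki.page-create"}}',
        "",
    ]
    print(count_by_stream(sample))
```

A tally that shows only page-delete events over a short window does not by itself mean the page-create subscription failed. Letting the connection run longer and watching whether the other counter starts moving is the distinction Ottomata's question points at.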