[06:36:55] 10Analytics: Check home/HDFS leftovers of mholloway-shell - https://phabricator.wikimedia.org/T291353 (10MoritzMuehlenhoff) [06:59:43] 10Analytics, 10Analytics-Kanban: Check home/HDFS leftovers of fdans - https://phabricator.wikimedia.org/T290231 (10elukey) @fdansv holaaaa! Anything worth keeping?? [07:01:41] 10Analytics: Check home/HDFS leftovers of kaywong - https://phabricator.wikimedia.org/T291060 (10elukey) ` ====== stat1004 ====== total 0 ====== stat1005 ====== total 4 drwxrwxr-x 6 28580 wikidev 4096 Jul 28 04:42 WikiReliability ====== stat1006 ====== total 0 ====== stat1007 ====== total 0 ====== stat1008 =... [07:03:44] 10Analytics: Check home/HDFS leftovers of jmads - https://phabricator.wikimedia.org/T290715 (10elukey) @MNovotny_WMF Hi! We are wondering if any files belonging to the old account of Jim Maddock are worth keeping/backing up. If you have context, could you please review the above? [07:06:49] 10Analytics, 10Performance-Team: Check home/HDFS leftovers of gilles - https://phabricator.wikimedia.org/T290232 (10elukey) @Krinkle ping :) To unblock this task we could either move all the old home dirs under yours (something like /home/krinkle/gilles/etc..) or only some files, and then drop the rest. What d... [07:53:24] 10Analytics-Clusters, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10Product-Analytics: Upgrade Superset to 1.3 - https://phabricator.wikimedia.org/T288115 (10elukey) Tried to do more tests on this, and in my test presto database settings I didn't tick Security -> Impersonate users... [08:12:18] !log remove old /reportcard (password protected, old files from 2012) httpd settings for stats.wikimedia.org [08:12:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:19:32] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Set up an-web1001 and decommission thorium - https://phabricator.wikimedia.org/T285355 (10elukey) The following dirs/files are no longer accessible from stats.wikimedia.org (not sure if anybody has really used them in years): ` elukey@an-web1001:~... [08:54:03] joal: Based on current progress, I estimate that this full repair of the Cassandra 2 cluster is going to take a couple of months at the current rate. We might need to look at changing our technique to make this quicker. [08:54:34] Hi btullis - I had expected that :D [08:54:57] The 2nd biggest table took 2 days and 5 hours. [08:55:01] https://www.irccloud.com/pastebin/yIarQylu/ [08:55:51] And the biggest one is really bigger than the 2nd biggest, right? [08:55:57] But the biggest table (pageviews_per_article_flat) has been running for 13 hours and is only around 3% of the way through. [08:56:11] hm [08:56:41] ...and our current draft of how to do it says we repeat this procedure 4 times in total. [08:57:17] The thing is, I don't know how cassandra provides us with feedback - could it be that the beginning of the process takes longer (in regard to the percentage reported)? [08:59:40] I don't think so. I think it's linear. I can paste the whole command output as a pastebin if you like. [08:59:41] hm - the other strategy I can think of is to load 8 dumps - not sure if it'll be faster though [08:59:51] nah I trust you btullis [09:00:57] btullis: given the time it takes, we could go for full-repairs for all-tables-but-1, and take the other approach for the biggest tables?
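The progress figures quoted above come from watching the repair as it runs. As a rough illustration (not necessarily the exact commands used here), repair activity on a node can be observed with standard nodetool subcommands; the `nodetool-a`/`nodetool-b` per-instance wrappers are an assumption about this multi-instance setup, and plain `nodetool -p <jmx-port>` would be the generic form:

    # Validation compactions triggered by an in-flight repair show up here:
    nodetool-a compactionstats

    # Streaming between replicas caused by the repair shows up here:
    nodetool-a netstats

    # The `nodetool repair --full ...` command itself also logs per-range progress to its
    # stdout and to the Cassandra system.log, which is presumably where rough
    # "percent complete" estimates like the 3% above are derived from.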
[09:01:13] Other options: [09:01:13] 1) Use the `--partitioner-range` option https://cassandra.apache.org/doc/latest/cassandra/operating/repair.html#other-options to restrict the work of each repair operation [09:01:13] 2) Use more threads on the source. [09:02:18] btullis: the load has already been relatively high the past 2 days - I'm not sure if adding more threads would be a good idea for the system :S [09:02:19] This is useful: https://cassandra.apache.org/doc/latest/cassandra/operating/repair.html#usage-and-best-practices [09:02:19] > By default, repair will operate on all token ranges replicated by the node you're running repair on, which will cause duplicate work if you run it on every node. The -pr flag will only repair the "primary" ranges on a node, so you can repair your entire cluster by running nodetool repair -pr on each node in a single datacenter. [09:02:54] btullis: we wish to only run repair on 1004 and 1007 - therefore NOT using -pr [09:05:30] Yes, I see what you mean. I was just wondering if what they meant by 'datacentre' in that statement was equivalent to the 'rack' that we are using. [09:06:07] btullis: datacenter is a different concept for cassandra than the rack one [09:07:01] btullis: cassandra can replicate data across datacenters, making the DCs host different rings (therefore for repair you need to consider a full DC/ring) [09:08:53] OK, gotcha. Thanks. [09:15:09] Why 8 dumps? Do you think that would contain everything? Wouldn't we need 12 dumps to be sure? [09:16:09] you're absolutely right btullis - 12 dumps needed [09:16:14] instead of 4 [09:24:15] We could try parallel `sstableloader` loading of dumps on the destination servers. Would stress the network and the aqs101[1-5] hosts a lot more, but we can't run the repair operations in parallel at all. [09:33:26] btullis: hm - knowing that there is compaction at work I wouldn't do multi-loading - it would stress the system a great deal [09:50:46] Agreed, but they're not actually *serving* anything at the moment, so that stress wouldn't necessarily cause any threat to a production service. I mean, we could theoretically take the new servers to the red-line in terms of load, just while the data is being loaded and compacted. I'm not necessarily advocating going that far, just trying to think of ways to reduce this 2-month lag without serious risk. [09:52:30] makes sense btullis [09:52:57] btullis: Could we try launching repair on hosts for all-but-1? Do we agree on that? [09:59:44] Yes. There is no option to exclude a table from the nodetool command, so we would have to script several `nodetool repair --full` commands per instance (4-7,a-b), each specifying keyspace and tables. [10:00:48] Given that we have already repaired one `mediarequest_per_file` and it took 2 days, we can complete the remainder of these repairs in 6 more days. [10:01:25] Maybe 7 [10:03:39] right [10:04:06] I'm still wondering about the strategy [10:04:33] We should in any case go the repair way for the smaller ones (all-but-2) [10:06:06] Yes, agreed. Repair all-but-2 tables today. [10:08:36] ack btullis - thanks for that - I'm gonna do some more thinking around this [10:09:47] So are you considering what to do about `mediarequest_per_file`? Choosing between: [10:09:48] 1) 7-day repair, create 4 snapshots, then transfer and load [10:09:48] 2) create 12 snapshots, then transfer and load in parallel (or load sequentially if needed) [10:10:19] correct btullis - I can't think of any other solution [10:12:03] OK.
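A minimal sketch of the "script several `nodetool repair --full` commands per instance, skipping the biggest tables" idea agreed on above. The `tables_to_repair.txt` file, the skip pattern, and the `nodetool-a`/`nodetool-b` per-instance wrappers are illustrative assumptions, not the actual script used:

    #!/bin/bash
    # Run a full (non-incremental) repair for each keyspace/table on both local instances,
    # skipping the two largest tables, which are handled separately.
    set -euo pipefail

    SKIP_PATTERN='pageviews_per_article_flat|mediarequest_per_file'

    for instance in a b; do
        while read -r keyspace table; do
            if [[ "${keyspace}.${table}" =~ ${SKIP_PATTERN} ]]; then
                echo "Skipping ${keyspace}.${table} (too large for this pass)"
                continue
            fi
            echo "Full repair of ${keyspace}.${table} on instance ${instance}"
            # nodetool repair accepts an optional keyspace and table list after its options.
            "nodetool-${instance}" repair --full "${keyspace}" "${table}"
        done < tables_to_repair.txt   # hypothetical file of "<keyspace> <table>" pairs, one per line
    done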
Shall I interrupt the current repair operation on `pageviews_per_article_flat` and concentrate on repairing all of the other small tables? [10:13:11] btullis: I think it's wise [10:16:19] OK, will do. [13:03:23] o/ [13:03:28] \o [13:03:34] Hello. [13:04:08] "The A-team's handshake" :D [13:05:23] 10Analytics: Agree on a repository structure for Airflow-related code - https://phabricator.wikimedia.org/T290664 (10Ottomata) airflow-jobs ? [13:29:59] 10Analytics: Agree on a repository structure for Airflow-related code - https://phabricator.wikimedia.org/T290664 (10mforns) > airflow-jobs ? I like [13:32:11] hello teammm [13:36:42] helllo! [13:37:20] 10Analytics, 10Analytics-Kanban: Improve Refine bad data handling - https://phabricator.wikimedia.org/T289003 (10Ottomata) Hm, I just tried adding some tests in refinery-source for this, and everywhere I try `is_wmf_domain` gets set to false. I cannot repro :/ [13:41:44] All of the following keyspaces have now been repaired on aqs1004 and aqs1007: [13:41:49] https://www.irccloud.com/pastebin/tgJf8dTn/ [13:42:30] \o/ [13:46:38] 10Analytics: Standardize the stats system user uid - https://phabricator.wikimedia.org/T291384 (10Ottomata) [13:47:09] 10Analytics: Standardize the stats system user uid - https://phabricator.wikimedia.org/T291384 (10Ottomata) The stats user is declared in `statistics::user`. [13:47:53] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Set up an-web1001 and decommission thorium - https://phabricator.wikimedia.org/T285355 (10Ottomata) For reference, here is Dan's response from Slack: > The old geowiki data has been disabled for years and everyone I know uses geoeditors instead,... [13:55:13] joal: do we have a task for spark 3? [13:55:15] i don't think so, right? [13:55:24] hm, I think we do - let me check [13:56:05] actually you're right ottomata - we don't!!! [13:56:09] i will make one! [13:56:14] joal: .........can we do it?! [13:56:19] thank you :) [13:56:40] Well, I hope we do!!! [13:56:45] 10Analytics: Upgrade to Spark 3 - https://phabricator.wikimedia.org/T291386 (10Ottomata) [13:57:00] 10Analytics: Upgrade to Spark 3 - https://phabricator.wikimedia.org/T291386 (10Ottomata) [13:57:02] 10Analytics: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453 (10Ottomata) [13:57:09] (03PS6) 10Ottomata: [WIP] Update to spark-3 and scala-2.12 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/656897 (https://phabricator.wikimedia.org/T291386) (owner: 10Joal) [13:57:35] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Update to spark-3 and scala-2.12 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/656897 (https://phabricator.wikimedia.org/T291386) (owner: 10Joal) [14:01:20] Gone for kids, back at standup [14:04:48] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Remove all debian python-* and other user requested packages installed for analytics clients, use conda instead - https://phabricator.wikimedia.org/T275786 (10Ottomata) I'm going to close this task. Remaining work can be done as part of {T286743} [14:10:19] 10Analytics, 10Analytics-Kanban, 10Discovery-Search, 10Patch-For-Review: Publish both shaded and unshaded artifacts from analytics refinery - https://phabricator.wikimedia.org/T217967 (10Ottomata) We still need to update various jobs that use these jars, but that can happen whenever we need to upgrade vers...
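For the snapshot-then-load option weighed above for the largest tables, the flow per keyspace would look roughly like the sketch below. The snapshot tag, data-directory layout, destination path, and the `.eqiad.wmnet` host suffix are assumptions for illustration; only `nodetool snapshot`, `rsync`, and `sstableloader` themselves are standard tools:

    # 1) On each source instance, snapshot the keyspace (placeholders in angle brackets).
    nodetool-a snapshot -t pre-migration <keyspace>

    # 2) Copy the snapshot SSTables to one of the destination hosts (aqs101[1-5]).
    rsync -a /srv/cassandra-a/data/<keyspace>/<table>-<uuid>/snapshots/pre-migration/ \
        aqs1011.eqiad.wmnet:/srv/loading/<keyspace>/<table>/

    # 3) On the destination, stream the SSTables into the new cluster; sstableloader
    #    expects the final two path components to be <keyspace>/<table>.
    sstableloader -d aqs1011.eqiad.wmnet /srv/loading/<keyspace>/<table>

    # Several sstableloader processes could run in parallel against different tables,
    # at the cost of the extra compaction load discussed above.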
[14:14:24] (03PS7) 10Ottomata: [WIP] Update to spark-3 and scala-2.12 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/656897 (https://phabricator.wikimedia.org/T291386) (owner: 10Joal) [14:48:52] (03PS3) 10MNeisler: Add the content_translation_event stream to the allowlist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/716339 (https://phabricator.wikimedia.org/T281511) [14:53:56] (03CR) 10MNeisler: Add the content_translation_event stream to the allowlist (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/716339 (https://phabricator.wikimedia.org/T281511) (owner: 10MNeisler) [14:54:00] (03CR) 10Mforns: "LGTM! I left an indentation comment. Once fixed, will merge! Thanks" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/716339 (https://phabricator.wikimedia.org/T281511) (owner: 10MNeisler) [14:54:31] hmm oh i thought spark 3 was in bigtop [14:54:33] elukey: ^ do you know? [14:54:42] seems like just 2.4.5? [14:55:25] ottomata: it is in the next bigtop release that should come out in a few weeks [14:55:32] ohhhh hm [14:55:33] cool [14:58:00] (03PS4) 10MNeisler: Add the content_translation_event stream to the allowlist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/716339 (https://phabricator.wikimedia.org/T281511) [15:04:08] (03CR) 10Neil P. Quinn-WMF: [C: 03+1] "Looks good! Thanks, Megan 😊" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/716339 (https://phabricator.wikimedia.org/T281511) (owner: 10MNeisler) [15:40:15] 10Analytics: Refactor analytics-meta MariaDB layout to multi instance with failover - https://phabricator.wikimedia.org/T284150 (10Ottomata) [16:20:42] 10Analytics, 10Analytics-Kanban: Check AQS with cassandra (serving + data) - https://phabricator.wikimedia.org/T290068 (10JAllemandou) [16:20:46] 10Analytics-Clusters, 10Cassandra, 10Data-Engineering, 10Data-Engineering-Kanban, and 2 others: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (10JAllemandou) [16:36:44] 10Analytics, 10Patch-For-Review: Upgrade Refinery Jobs to Spark 3 - https://phabricator.wikimedia.org/T291386 (10odimitrijevic) [16:37:41] 10Analytics, 10Data-Engineering, 10Patch-For-Review: Upgrade Refinery Jobs to Spark 3 - https://phabricator.wikimedia.org/T291386 (10odimitrijevic) p:05Triage→03Medium [16:38:09] 10Analytics, 10Data-Engineering, 10Patch-For-Review: Upgrade Refinery Jobs to Spark 3 - https://phabricator.wikimedia.org/T291386 (10odimitrijevic) [16:39:04] 10Analytics: Check home/HDFS leftovers of mholloway-shell - https://phabricator.wikimedia.org/T291353 (10odimitrijevic) p:05Triage→03High [16:39:21] 10Analytics, 10Analytics-Kanban: Check home/HDFS leftovers of kaywong - https://phabricator.wikimedia.org/T291060 (10odimitrijevic) [16:39:52] 10Analytics, 10Analytics-Kanban: Check home/HDFS leftovers of mholloway-shell - https://phabricator.wikimedia.org/T291353 (10odimitrijevic) [16:42:33] 10Analytics, 10Analytics-Kanban: Check home/HDFS leftovers of jmads - https://phabricator.wikimedia.org/T290715 (10odimitrijevic) [16:53:03] ottomata: how long before I can borrow a few minutes to talk about the event-s ticket? [17:23:45] oh joal how about now? [17:23:51] or maybe in 5 mins? [17:36:18] 10Analytics, 10Data-Engineering, 10Patch-For-Review: Upgrade Refinery Jobs to Spark 3 - https://phabricator.wikimedia.org/T291386 (10Ottomata) Hm, I think this task is also about installing and supporting Spark 3 in favor of Spark 2, with the eventual goal of removing Spark 2. This means making sure everyth...
[17:45:28] 10Analytics, 10Event-Platform, 10Patch-For-Review: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (10Ottomata) @EYener ah so, the ask is different than the steps outlined in T259163. This task is about making the WikipediaPortal code itself work with Event Platform. Ri... [17:46:03] 10Analytics, 10Event-Platform, 10Patch-For-Review: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (10Ottomata) See also {T262433} [17:49:47] 10Analytics, 10Event-Platform, 10Patch-For-Review: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (10EYener) Hi @Ottomata! Actually that would be super helpful. Would you mind picking anything on my calendar that is open and works for you? I'll remove unnecessary events,... [18:10:32] (03CR) 10Andrew Bogott: [C: 03+2] "recheck" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/721363 (owner: 10Andrew Bogott) [18:11:18] (03CR) 10jerkins-bot: [V: 04-1] Added test_user.py [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/721363 (owner: 10Andrew Bogott) [18:11:57] (03CR) 10Andrew Bogott: [C: 03+2] "recheck" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/721363 (owner: 10Andrew Bogott) [18:14:27] (03Merged) 10jenkins-bot: Added test_user.py [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/721363 (owner: 10Andrew Bogott) [19:00:23] hey ottomata - I went for dinner :) [19:01:02] ottomata: tomorrow it'll be :) [19:52:32] 10Analytics, 10Event-Platform, 10Patch-For-Review: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (10Ottomata) @EYener and I met today and we are going to have to sync up with some FRtech team members about this. @EYener, so any of the tasks listed under 'Schemas produc... [19:57:06] joal: ahhhh sorry [20:43:20] 10Analytics, 10Performance-Team: Check home/HDFS leftovers of gilles - https://phabricator.wikimedia.org/T290232 (10Krinkle) That's fine yeah, just transfer them all and I'll take care of it.