[01:51:21] 10Analytics-Radar, 10Wikipedia-iOS-App-Backlog, 10iOS-app-v6.9-Carp-On-A-Zamboni: Metrics around existing Echo notifications volume - https://phabricator.wikimedia.org/T291663 (10SNowick_WMF) Data from [[ https://meta.wikimedia.org/wiki/Schema:EchoMail | Schema:EchoMail ]] (which logs notifications mailed t... [02:28:37] (03PS1) 10GoranSMilovanovic: currentevents 20211008 [analytics/wmde/WD/WikidataAnalytics] - 10https://gerrit.wikimedia.org/r/727764 [02:28:46] (03CR) 10GoranSMilovanovic: [V: 03+2 C: 03+2] currentevents 20211008 [analytics/wmde/WD/WikidataAnalytics] - 10https://gerrit.wikimedia.org/r/727764 (owner: 10GoranSMilovanovic) [07:12:31] joal: for when around, I have 2 CR for you: [07:12:43] https://gerrit.wikimedia.org/r/c/analytics/gobblin-wmf/+/727463/ [07:12:51] gehel: Hi! I'm actually fighting with gerrit config - would you have a minute for me ? [07:12:52] https://gerrit.wikimedia.org/r/c/analytics/gobblin-wmf/+/727469/ [07:12:57] sure [07:24:34] (03CR) 10Joal: [C: 03+2] "LGTM! Merging" [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/727463 (owner: 10Gehel) [07:25:10] (03CR) 10Joal: [C: 03+2] "LGTM! Merging" [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/727469 (owner: 10Gehel) [07:27:20] (03Merged) 10jenkins-bot: Upgraded to wikimedia-eventutilities 1.0.9 [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/727463 (owner: 10Gehel) [07:27:52] (03Merged) 10jenkins-bot: Removed properties already defined in parent pom.xml [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/727469 (owner: 10Gehel) [08:19:00] (03PS1) 10Gehel: Added basic configuration of log4j for testing. 
[analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728245 [08:19:44] joal: ^ [08:20:25] reviewing :) [08:34:35] 10Analytics-Radar, 10Data-Engineering, 10Event-Platform: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10elukey) @jbond we'd basically need to expose/get a TLS certificate for the hostname (signed by the puppet CA would be very nice) on every kafk... [08:48:31] (03CR) 10Joal: [C: 03+2] "LGTM!" [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728245 (owner: 10Gehel) [08:48:41] Thanks gehel :) [08:51:01] (03Merged) 10jenkins-bot: Added basic configuration of log4j for testing. [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728245 (owner: 10Gehel) [09:07:25] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (10BTullis) The second snapshot loading operation completed successfully in 47 hours. ` progress: [/10.64.32.128]0:2618/2618 10... [09:11:19] 10Analytics, 10Release-Engineering-Team: Add Olja as a member of the Analytics team on Gerrit - https://phabricator.wikimedia.org/T292823 (10Gehel) [09:14:45] joal: quick check: are you working on cleaning up warnings on the gobblin-wmf project? Can we split the work? [09:15:15] gehel: I've actually concentrated on cleaning/revamping my last upstream CR [09:15:31] gehel: you can start the cleaning work, I'll join you once the PR is done [09:15:39] Cool! 
I'll start on the cleanup, going package by package, starting from the bottom [09:17:34] ack - I'll synchronize with you before starting - remember: no cleaning of the copied classes ;) [09:17:37] gehel: --^ [09:17:42] yep [09:19:15] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (10BTullis) Below is a record of the amount of disk space free at the moment (sorted for clarity): 73% is the highest amount of... [09:24:28] 10Analytics, 10Gerrit-Privilege-Requests, 10Release-Engineering-Team: Add Olja as a member of the Analytics team on Gerrit - https://phabricator.wikimedia.org/T292823 (10hashar) 05Open→03Resolved a:03hashar The [[https://gerrit.wikimedia.org/r/admin/groups/d34747bee94be39cff54b5fda1ae36b575107792 | Ana... [09:24:51] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (10BTullis) The script is running now. ` ### Moving table data in keyspace local_group_default_T_pageviews_per_article_flat fo... [09:25:58] (03PS1) 10Gehel: Replace Guava's Optional with the standard JDK one. [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728277 [09:26:31] joal: and my attempt to do it package by package has already failed. 
Some changes need to span multiple classes :/ [09:26:48] I'll try to at least commit in small batches so that we can synchronize often [09:27:08] don't worry gehel, I'll ping you when I start and we can sync [09:36:35] (03PS1) 10Gehel: fix sonarlint warnings on the 'writer' package [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728280 [09:37:16] (03CR) 10jerkins-bot: [V: 04-1] fix sonarlint warnings on the 'writer' package [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728280 (owner: 10Gehel) [09:38:28] ok, PRs upstream are to my liking - I can start helping on the cleaning [09:38:43] gehel: let's sync when you wish [09:39:26] (03PS2) 10Gehel: fix sonarlint warnings on the 'writer' package [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728280 [09:39:42] joal: I've pushed 2 CR. I need to get Oscar from school [09:39:48] let's sync this afternoon [09:39:59] +1 gehel - later! [09:40:09] you can build on top of my 2 CR (if they don't look too terrible) [09:40:21] Will review, probably merge, and continue :) [09:41:00] I'm also gonna spend a minute making slides for the prez - nothing big [09:43:19] hm - my changes in notifications for gerrit haven't helped :( [09:43:27] will need to look at it again [09:43:35] elukey: would you have a minute by any chance? [09:46:23] joal: sure [09:46:33] elukey: gerrit question [09:47:08] elukey: I'd like to be assigned as reviewer by default on a project (or at least receive notifications for new patches) - I have not managed to find the way :( [09:47:35] 10Analytics-Radar, 10Data-Engineering, 10Event-Platform: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10jbond) having a requirement to use the puppet ca means that you cant move to the pki service (cfssl). however it is worth noting that the pki... [09:48:01] joal: no idea, never done it :( [09:48:15] thanks anyway elukey :) [09:49:20] Hang on, I think I know this one. 
https://www.mediawiki.org/wiki/Git/Reviewers [09:50:11] joal: -^ [09:50:21] reading btullis - thanks a lot! [10:04:58] (03PS2) 10Joal: Replace Guava's Optional with the standard JDK one. [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728277 (owner: 10Gehel) [10:05:32] (03CR) 10Joal: [C: 03+2] "Merging!" [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728277 (owner: 10Gehel) [10:07:31] (03Merged) 10jenkins-bot: Replace Guava's Optional with the standard JDK one. [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728277 (owner: 10Gehel) [10:11:25] (03CR) 10Joal: [C: 03+2] "Merging!" [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728280 (owner: 10Gehel) [10:34:09] A pleasure. Did it work? [10:37:27] 10Analytics-Radar, 10Data-Engineering, 10Event-Platform: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10BTullis) I'd tend to agree that the `cfssl` approach that @jbond has outlined would be preferable to using certificates signed by the Puppet C... [12:01:54] 10Analytics-Radar, 10Data-Engineering, 10Event-Platform: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10jbond) >One potential concern with the cfssl PKI method is that the lifetime of the certificates might be lower than that of the puppet certif... [12:08:39] (03PS3) 10Gehel: fix sonarlint warnings on the 'writer' package [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728280 [12:09:32] joal: I'm back, want to do a quick sync? [12:30:12] (03PS1) 10Gehel: cleanup of minor SonarQube violations [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728427 [12:38:49] (03PS1) 10Gehel: cleanup of minor SonarQube violations - kafka package [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728429 [12:39:13] (03PS2) 10Gehel: cleanup of minor SonarQube violations. 
[analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728429 [12:58:34] gehel: I'm back as well! [12:58:39] gehel: are you still here? [12:58:44] yep [13:04:00] (03CR) 10Joal: [C: 03+2] "Merging!" [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728427 (owner: 10Gehel) [13:06:30] (03Merged) 10jenkins-bot: cleanup of minor SonarQube violations [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728427 (owner: 10Gehel) [13:08:10] (03CR) 10Joal: [C: 03+2] "Merging!" [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728429 (owner: 10Gehel) [13:10:23] (03Merged) 10jenkins-bot: cleanup of minor SonarQube violations. [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728429 (owner: 10Gehel) [13:17:04] (03PS1) 10Gehel: Remove Spotbugs warnings. [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728442 [13:25:57] (03PS1) 10Gehel: Configure SCM in main pom.xml [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728455 [13:26:46] (03PS2) 10Gehel: Configure SCM in main pom.xml [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728455 [13:36:58] (03CR) 10Joal: "One question, otherwise all good" [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728442 (owner: 10Gehel) [13:39:48] (03CR) 10Gehel: Remove Spotbugs warnings. (031 comment) [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728442 (owner: 10Gehel) [13:41:27] (03CR) 10Joal: [C: 03+2] "Merging!" [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728442 (owner: 10Gehel) [13:41:57] (03PS1) 10Gehel: minor cleanup of javadoc [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728458 [13:43:31] (03Merged) 10jenkins-bot: Remove Spotbugs warnings. [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728442 (owner: 10Gehel) [13:49:19] (03CR) 10Joal: [C: 03+2] "Merging!" 
[analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728455 (owner: 10Gehel) [13:51:42] (03Merged) 10jenkins-bot: Configure SCM in main pom.xml [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728455 (owner: 10Gehel) [13:52:44] (03CR) 10Joal: [C: 03+2] "Merging as well!" [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728458 (owner: 10Gehel) [13:52:53] Wow - what a merge flow :) [13:55:22] (03Merged) 10jenkins-bot: minor cleanup of javadoc [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728458 (owner: 10Gehel) [14:02:01] joal: prez looks good! nothing to add! [14:09:52] 10Analytics, 10Analytics-Kanban: HDFS check topology alert is currently broken - https://phabricator.wikimedia.org/T292846 (10BTullis) a:03BTullis [14:15:53] (03PS1) 10Gehel: Cleanup checkstyle warnings. [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728477 [14:16:12] joal: this might be the last one! ^ [14:19:00] (03PS1) 10Gehel: make logging slightly less verbose during tests [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728479 [14:19:12] looks like I was wrong... here is one more ' [14:40:37] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (10hnowlan) Just a note - if you're importing tables from two racks, doing a `nodetool cleanup` will probably return a signific... [14:43:51] joal: yesterday at a team meeting I referred to you as "joal" (pronounced like "joel") instead of joseph. the ones who hang out on IRC understood me and everyone else was like "wait, who?" haha [14:46:56] bearloga: I've done that too. 🙂 These double names for people take a while to get used to. 
[14:56:03] 10Analytics-Radar, 10Data-Engineering, 10Event-Platform: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10elukey) A couple of notes about clients: * varnishkafka is a good example of non-java client that uses TLS to authenticate from cp nodes to k... [15:00:42] hehe bearloga :) [15:31:48] 10Analytics-Radar, 10Data-Engineering, 10Event-Platform: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10dpifke) For Python client examples, see the linked patches on T290131. These three services (Navtiming, Coal, and statsv) will potentially ne... [15:53:52] joal: 2 more CR before I call this done! [15:53:59] On it gehel :) [15:54:02] (already published, just needs merging) [15:54:13] No emergency! Enjoy your Friday and come back on Monday! [15:54:50] 10Analytics, 10Analytics-Kanban: HDFS check topology alert is currently broken - https://phabricator.wikimedia.org/T292846 (10BTullis) I can confirm this, but I haven't yet found the reason for it. As an aside, I'm not sure that the logs in the third code block in the description are correct. The log lines re... [15:54:53] (03CR) 10Joal: [C: 03+2] "Merging!" [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728477 (owner: 10Gehel) [15:55:10] gehel: I'll leave in ~1h, just the time to merge :) [15:55:34] (03CR) 10Joal: [C: 03+2] "Merging!" 
[analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728479 (owner: 10Gehel) [15:56:10] done gehel - nothing too big to check - I wonder if I'll get a try at refactoring the methods that need it [15:56:49] Next step for us is to get rid of the gobblin repo, release and update our scripts [15:57:20] ottomata: you were right in thinking it was feasible, but I needed help ;) [15:59:13] 10Analytics, 10Analytics-Kanban: HDFS check topology alert is currently broken - https://phabricator.wikimedia.org/T292846 (10elukey) Ah yes I copied the wrong block, but the commands executed for the hdfs topology is basically the same! Sorry :) [16:02:53] 10Analytics, 10Analytics-Kanban: HDFS check topology alert is currently broken - https://phabricator.wikimedia.org/T292846 (10BTullis) Not a problem. It's just a bit of a mystery why the correct return code isn't coming back over NRPE. [16:02:58] (03Merged) 10jenkins-bot: Cleanup checkstyle warnings. [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728477 (owner: 10Gehel) [16:03:01] (03Merged) 10jenkins-bot: make logging slightly less verbose during tests [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728479 (owner: 10Gehel) [16:04:51] milimetric: heya - would you have a minute? [16:08:46] btullis: ok if I hack one thing on an-master1001? [16:08:55] (not sure if you are already testing something) [16:09:22] I want to check what happens if I change /etc/nagios/nrpe.d/check_check_hdfs_topology.cfg removing the sudo parts etc.. [16:11:34] btullis: doing it :) [16:12:16] no luck [16:22:33] joal: bc? [16:22:37] sure milimetric [16:24:25] elukey: Feel free. 
[16:24:49] btullis: I may have found the issue, cr incoming so you can let me know [16:30:49] btullis: https://gerrit.wikimedia.org/r/c/operations/puppet/+/728562/ [16:31:33] tried to quickly hack it and I got the CRITICAL on alert1001 [16:35:11] going to try it, in theory if it works we'll see a nice alert in here :) [16:40:02] PROBLEM - HDFS topology check on an-master1001 is CRITICAL: CRITICAL: There is at least one node in the default rack. https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_topology_check [16:40:15] \o/ [16:40:32] sigh it has never worked, probably my bad [16:42:18] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: HDFS check topology alert is currently broken - https://phabricator.wikimedia.org/T292846 (10elukey) The change seems to have worked, we are now seeing an alert for topology. In theory after merging + deploying https://gerrit.wikimedia.org/r/c/operations/p... [16:54:17] ACKNOWLEDGEMENT - HDFS topology check on an-master1001 is CRITICAL: CRITICAL: There is at least one node in the default rack. Elukey T292846 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_topology_check [17:01:11] next step is to restart our dear HDFS namenodes after fixing the topology, but we can do it next week :) [17:39:05] 10Analytics, 10Analytics-Kanban, 10SRE: ~1 request/minute to intake-logging.wikimedia.org times out at the traffic/service interface - https://phabricator.wikimedia.org/T264021 (10BBlack) Removing #Traffic for now - although it could get added back if some further investigation indicates our infra is the cau... 
[18:38:56] 10Analytics, 10SRE: Downloading from Archiva.wikimedia.org seems slower than Maven Central - https://phabricator.wikimedia.org/T273086 (10BBlack) [19:20:51] 10Analytics, 10Analytics-Dashiki, 10Data-Engineering, 10Developer-Advocacy (Oct-Dec 2021): https://wmcs-edits.wmflabs.org/ not showing time series data since 2020-12-31 - https://phabricator.wikimedia.org/T292871 (10bd808) [19:34:06] 10Analytics, 10Analytics-Kanban, 10Traffic: Review use of realloc in varnishkafka - https://phabricator.wikimedia.org/T287561 (10BBlack) The last time I looked at the patches, I was a bit baffled and left it alone. It's not clear that there's any active issue affecting us that this will solve, and these kin... [20:06:49] 10Analytics-Clusters, 10SRE, 10ops-eqiad: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (10Cmjohnson) a:05Cmjohnson→03Jclark-ctr @BTullis or @razzi please coordinate next week with @Jclark-ctr. @Jclark-ctr this server needs the flea power drained... [20:57:21] (03PS1) 10ODimitrijevic: exclude conflicting dependencies [analytics/gobblin-wmf] - 10https://gerrit.wikimedia.org/r/728654 [21:33:29] 10Analytics, 10Analytics-Dashiki, 10Data-Engineering, 10Developer-Advocacy (Oct-Dec 2021): https://wmcs-edits.wmflabs.org/ not showing time series data since 2020-12-31 - https://phabricator.wikimedia.org/T292871 (10Milimetric) I think there was some project about moving these logs to logstash to make it e... 
[21:36:22] (03PS1) 10Milimetric: Fix union all by selecting the same fields [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/728659 (https://phabricator.wikimedia.org/T292871) [21:37:14] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Fix union all by selecting the same fields [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/728659 (https://phabricator.wikimedia.org/T292871) (owner: 10Milimetric) [21:38:11] 10Analytics, 10Analytics-Dashiki, 10Analytics-Kanban, 10Data-Engineering, and 2 others: https://wmcs-edits.wmflabs.org/ not showing time series data since 2020-12-31 - https://phabricator.wikimedia.org/T292871 (10Milimetric) p:05Triage→03High a:03Milimetric [21:56:29] 10Analytics-Radar, 10Wikipedia-iOS-App-Backlog, 10iOS-app-v6.9-Carp-On-A-Zamboni: Metrics around existing Echo notifications volume - https://phabricator.wikimedia.org/T291663 (10JMinor) Awesome, thanks! [22:15:08] 10Analytics: Presto error in Superset - https://phabricator.wikimedia.org/T292879 (10cchen)