[06:56:00] (PS47) Cyndywikime: Add analytics for Impressions, Success and Abandonment of account creation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273)
[07:24:25] (PS48) Cyndywikime: Add analytics for Impressions, Success and Abandonment of account creation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273)
[09:30:05] hello folks! Later on I am planning to roll out https://gerrit.wikimedia.org/r/c/operations/puppet/+/1013566. TL;DR: I am going to change the Cassandra instances' TLS truststore on aqs1010, as test/pre-step for switching to PKI
[09:30:18] if the test goes fine I'll file a change for the whole cluster
[09:30:40] once all truststores are in place, we'll be able to migrate every instance's TLS cert separately
[09:30:49] (Eric knows what I am doing)
[09:30:58] lemme know if you have anything against it :)
[09:44:37] all good for me elukey - thanks for doing this :)
[09:51:02] <3
[09:52:12] Likewise, all good from my point of view elukey. :-)
[09:52:42] <3
[10:18:13] (PS1) Kosta Harlan: ip_reputation/score: Add action for account auto creation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1017812 (https://phabricator.wikimedia.org/T354597)
[11:40:22] (CR) Urbanecm: "generally, looks good. i suggest reordering though." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273) (owner: Cyndywikime)
[12:00:16] !log rebooting stat1011 due to unresponsiveness
[12:00:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:00:33] Analytics-Radar, Data-Engineering-Icebox, [DEPRECATED] wdwb-tech, Dumps-Generation, Wikidata: Proposal: Generate Wikidata JSON & RDF dumps from Hadoop - https://phabricator.wikimedia.org/T291089#9697079 (Lydia_Pintscher)
[12:01:38] Data-Engineering, Web-Team-Backlog, Patch-For-Review: Update Sample Rates for Metrics Platform Events - https://phabricator.wikimedia.org/T361962#9697087 (phuedx) I agree that it would be valuable to capture the sampling config for the instrument (not just for this instrument either). Data Products...
[12:03:32] Data-Engineering, Web-Team-Backlog, Patch-For-Review: Update Sample Rates for Metrics Platform Events - https://phabricator.wikimedia.org/T361962#9697090 (phuedx) My preference would be for #3 (Support curation-rule-like rules in the stream config) as it keeps the stream config as the single source o...
[12:20:50] Data-Engineering, Data Products, Web-Team-Backlog, Patch-For-Review: Update Sample Rates for Metrics Platform Events - https://phabricator.wikimedia.org/T361962#9697129 (lbowmaker)
[14:01:09] Data-Engineering, Data-Platform, Patch-For-Review: Add movement insights group/users to MWH denormalize job alerts - https://phabricator.wikimedia.org/T357472#9697630 (CodeReviewBot) joal merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/642 Update analytics MW...
[14:01:56] Data-Engineering, MediaWiki-Engineering, serviceops, WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#9697632 (JMeybohm) We had a bunch of error again today. All of them connection errors to ev...
[14:36:53] Quarry, ChangeProp, collaboration-services, GitLab, and 10 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9697832 (joanna_borun) p:Triage→Medium
[14:40:47] Data-Engineering, MediaWiki-Engineering, serviceops, WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#9697842 (akosiaris) >>! In T249745#9640217, @Ottomata wrote: > >> This doesn't mean that M...
[14:45:15] (PS5) Mforns: WIP: Clean up and parameterize SQL code for Common Impact Metrics. [analytics/refinery] - https://gerrit.wikimedia.org/r/1016796 (https://phabricator.wikimedia.org/T358681) (owner: Xcollazo)
[14:47:31] (CR) Mforns: "Changed the code of category_and_media_with_usage_map.hql to use parent_categories, primary_categories and ancestor_categories. Will chang" [analytics/refinery] - https://gerrit.wikimedia.org/r/1016796 (https://phabricator.wikimedia.org/T358681) (owner: Xcollazo)
[15:04:07] folks as FYI I am going to roll out the new truststore to all the aqs nodes
[15:04:14] worked nicely on aqs1010
[15:05:26] Ack, many thanks. I see that this ticket was raised as a result of your earlier testing: T361964
[15:05:27] T361964: (some?) golang-based Cassandra clients do not perform TLS host verification - https://phabricator.wikimedia.org/T361964
[15:06:03] btullis: yeah :(
[15:06:09] it will be easier to migrate to PKI
[15:06:20] but we'll have to turn on that option eventually
[15:07:37] Agreed. I'll try to help out with regard to the AQS 2.0 clients, if I can.
[15:07:53] sure!
[15:14:34] Data-Engineering, Data Products, Web-Team-Backlog, Patch-For-Review: Update Sample Rates for Metrics Platform Events - https://phabricator.wikimedia.org/T361962#9698029 (KSarabia-WMF) @phuedx Thank you. We can try this.
[15:25:27] !log decommissioning dumpsdata1001
[15:25:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:43:41] !log decommissioning dumpsdata1002 for T362065
[15:43:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:43:49] T362065: decommission dumpsdata1002.eqiad.wmnet - https://phabricator.wikimedia.org/T362065
[16:04:34] Quarry, ChangeProp, collaboration-services, GitLab, and 10 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9698336 (CodeReviewBot) bd808 merged https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/28 Re...
[16:53:31] new truststore deployed to aqs-codfw, will do eqiad tomorrow (soo many nodesss)
[16:54:02] Awesome! Thanks elukey.
[16:54:33] btullis: I also found https://gerrit.wikimedia.org/r/c/operations/puppet/+/1017882 while working on it, Luca from the past caused some issues :(
[16:55:01] (2018 so I should be excused buuut :)
[16:58:43] Ooh, Grafana has been updated.
[17:00:39] elukey: Will we still have some Grafana dashboard cleanups to do as a result?
[17:01:04] Data-Engineering, Data Products: Past edits increase in wmf.edit_hourly with every new snapshot - https://phabricator.wikimedia.org/T355182#9698502 (VirginiaPoundstone) @mpopov thinking through scheduling this and urgency. Does this block anything currently?
[17:01:25] Data-Engineering, Data Products (Data Products Sprint 13): Past edits increase in wmf.edit_hourly with every new snapshot - https://phabricator.wikimedia.org/T355182#9698504 (VirginiaPoundstone)
[17:03:24] Am I right in thinking that, after your change, we would expect to see the aqs cluster show up here? https://grafana.wikimedia.org/d/000000453/cassandra-tables?orgId=1&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=&var-node=All&var-keyspace=&var-table=&var-quantile=99p https://usercontent.irccloud-cdn.com/file/B52TL1Sn/image.png
[17:16:23] Data-Engineering, Product-Analytics: Creating a Spark session causes a torrent of log spam - https://phabricator.wikimedia.org/T315024#9698551 (Arinaigu) Hey folks! I think I found a pretty easy ipython-side solution to this problem: **How to suppress Spark output in Jupyter notebooks: ** 1. In termin...
[18:11:41] Quarry: refreshing a running query changes favicon from orange to blue - https://phabricator.wikimedia.org/T362101 (Novem_Linguae) NEW
[18:25:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[20:43:24] Quarry: [bug] "Access denied" for quarry database user - https://phabricator.wikimedia.org/T362111 (bvibber) NEW
[20:57:31] Quarry: [bug] "Access denied" for quarry database user - https://phabricator.wikimedia.org/T362111#9699071 (rook) I've restarted the deployments. See how it behaves now?
[21:34:02] Quarry: [bug] "Access denied" for quarry database user - https://phabricator.wikimedia.org/T362111#9699167 (bvibber) Open→Resolved a:bvibber Confirmed good now. Thanks!
[22:25:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[22:30:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage