[01:27:30] PROBLEM - Check unit status of monitor_refine_event on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:37:08] !log rerun refine_event for `event`.`mediawiki_content_translation_event` year=2021/month=10/day=10/hour=16
[07:37:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:57:26] (CR) Gehel: [C: -1] "Congratulation on your first Gerrit CR! I have a few comments inline. Mostly me not being entirely sure about the thought process that lea" [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/728654 (owner: ODimitrijevic)
[08:58:47] Analytics-Clusters, SRE, ops-eqiad: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (BTullis) Yes I'm more than happy to help out on this. @Jclark-ctr if you have a suggested time when you'd like to do the work, I'll sort out downtime and shut...
[09:06:35] Analytics, Analytics-Dashiki, Analytics-Kanban, Data-Engineering, Developer-Advocacy (Oct-Dec 2021): https://wmcs-edits.wmflabs.org/ not showing time series data since 2020-12-31 - https://phabricator.wikimedia.org/T292871 (BTullis) I'm not aware of a conversation to put these logs into Logst...
[09:16:04] Analytics, Analytics-Kanban, Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) Thanks @hnowlan - Perhaps we should run nodetool cleanup sequentially on all instances after importing everything....
[09:19:01] Analytics, Analytics-Kanban, Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) Cleared the space from `aqs1010:/srv/cassandra-b/` ` root@aqs1010:/srv/cassandra-b/tmp# df -h /srv/cassandra-a/ /sr...
[09:23:22] Analytics, Analytics-Kanban, Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) Fourth snapshot loading operation is under way now. ` ### Moving table data in keyspace local_group_default_T_pagev...
[09:45:58] Thank you btullis for the long-loading of cassandra :)
[09:47:21] My pleasure. It's good experience at trying to be thorough and accurate over a long period of time, with accurate record keeping and by minimizing potential typing mistakes.
[09:55:00] Analytics-Radar, Data-Engineering, Event-Platform: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (jbond) > I never done it but in theory we should swap this with a .pem file that combines the Puppet CA + Root or intermediate PKI right? (in...
[10:24:07] Analytics-Radar, EventStreams, MediaWiki-API, MW-1.37-notes (1.37.0-wmf.12; 2021-06-28), User-Urbanecm: Add user field to mediawiki/api/request - https://phabricator.wikimedia.org/T285113 (Urbanecm) Open→Resolved ` spark-sql (default)> select performer from event.mediawiki_api_request...
[11:22:17] Analytics, Analytics-Kanban: Jupyter notebook logs should appear in Logstash - https://phabricator.wikimedia.org/T288348 (BTullis) I think that I can see why logs aren't appearing in Logstash for this. On each of the stat100x servers we can list the notebooks that have been run since this configuration...
[11:38:22] Analytics, Analytics-Kanban: HDFS check topology alert is currently broken - https://phabricator.wikimedia.org/T292846 (BTullis) Thanks @elukey - I confirm that your fix of removing the double sudo call seems to have fixed the check. I'll merge and deploy https://gerrit.wikimedia.org/r/c/operations/puppe...
[11:52:39] I've merged two small changes to the hadoop masters' configuration, so I'll run the `sre.hadoop.roll-restart-masters` cookbook shortly.
[20:25:04] Analytics, Data-release, Privacy Engineering, Research, Privacy: Apache Beam go prototype code for DP evaluation - https://phabricator.wikimedia.org/T280385 (Nuria) Love openDP @Htriedman
[20:26:07] (PS1) ODimitrijevic: exclude conflicting dependencies [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/730044
[20:34:38] (CR) ODimitrijevic: "Added comments as per suggestion." [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/728654 (owner: ODimitrijevic)
[20:37:31] Analytics-Clusters, SRE, ops-eqiad: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (Jclark-ctr) @BTullis I am available tomorrow morning 2:00 PM UTC. 10AM EST
[20:45:46] (PS2) ODimitrijevic: exclude conflicting dependencies [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/728654
[20:48:28] (PS1) ODimitrijevic: remove redundant exclusion [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/730045
[20:50:40] (Abandoned) ODimitrijevic: exclude conflicting dependencies [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/730044 (owner: ODimitrijevic)
[20:57:17] (PS1) ODimitrijevic: add exclusion comment [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/730047
[21:12:32] RECOVERY - HDFS topology check on an-master1001 is OK: OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_topology_check
[21:34:22] (PS1) GoranSMilovanovic: T283015 [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730052
[21:34:39] (CR) GoranSMilovanovic: [V: +2 C: +2] T283015 [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730052 (owner: GoranSMilovanovic)
[21:38:54] (PS1) GoranSMilovanovic: minor [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730053
[21:39:25] (CR) GoranSMilovanovic: [V: +2 C: +2] minor [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730053 (owner: GoranSMilovanovic)
[21:41:55] (PS1) GoranSMilovanovic: minor [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730067
[21:42:44] (CR) GoranSMilovanovic: [V: +2 C: +2] minor [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730067 (owner: GoranSMilovanovic)
[21:42:56] (PS1) GoranSMilovanovic: minor [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730068
[21:43:05] (CR) GoranSMilovanovic: [V: +2 C: +2] minor [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730068 (owner: GoranSMilovanovic)
[21:45:00] (PS1) GoranSMilovanovic: rm Rproj [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730069
[21:45:11] (CR) GoranSMilovanovic: [V: +2 C: +2] rm Rproj [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730069 (owner: GoranSMilovanovic)
[21:46:09] (PS1) GoranSMilovanovic: rm Rproj [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730070
[21:46:20] (CR) GoranSMilovanovic: [V: +2 C: +2] rm Rproj [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730070 (owner: GoranSMilovanovic)
[21:47:26] (PS1) GoranSMilovanovic: hard rm Rproj [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730072
[21:47:36] (CR) GoranSMilovanovic: [V: +2 C: +2] hard rm Rproj [analytics/wmde/WD/WikidataAdHocAnalytics] - https://gerrit.wikimedia.org/r/730072 (owner: GoranSMilovanovic)