[06:23:55] Analytics: Remove support for the (deprecated) Druid datasources (in favor of Druid Tables) on Superset - https://phabricator.wikimedia.org/T263972 (elukey) @odimitrijevic it is not, last time that I checked there was some usage of Druid datasources. We should do the following: - review what dashboards are...
[06:32:52] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (elukey) >>! In T255148#7276687, @BTullis wrote: > > Seems to work a treat. Really nice! Before closing can you update https://wikitech.wikimedia.org/wiki/Analytics...
[06:45:40] good morning
[06:46:00] an-druid1003's zookeeper metrics seems not showing up, but the exporter works locally, weird
[06:47:55] but thanos.wikimedia.org doesn't see them
[06:51:24] ahh there you go, these are the targets on prometheus1003
[06:51:26] - labels:
[06:51:26] cluster: druid_analytics
[06:51:26] zookeeper_cluster: druid-analytics-eqiad
[06:51:26] targets:
[06:51:28] - an-druid1001:12181
[06:51:31] - an-druid1002:12181
[06:51:33] - druid1003:12181
[06:51:45] puppet disabled, okok makes sense
[06:53:33] we'll have to wait for puppet to run
[07:08:44] elukey: if it's not immediately obvious don't sweat it, but I could use some help figuring out how to get rsync working from `thorium`->`an-web1001`
[07:09:02] `sudo rsync --progress --stats -avzh /srv/ ryankemper@an-web1001.eqiad.wmnet:/srv/` just hangs silently, which I would assume means a firewall
[07:09:31] but I've got the equivalent of `&R_SERVICE(tcp, 873, @resolve((thorium.eqiad.wmnet an-web1001.eqiad.wmnet)));` configured in ferm (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/710371 for patch w/ the rsync module which adds that ferm rule)
[07:09:43] so not sure if there's something obvious I might be missing
[07:10:06] (you prob already have some context but as a reminder we want to copy over the `/srv/` of thorium over to the new `an-web1001`)
[07:20:22] hey ryankemper
[07:21:35] so usually what I do is to copy from the host exposing the rsync service/module, starting the rsync command from the target
[07:21:52] lemme check the configs though
[07:22:28] ok I see /etc/rsync.d/frag-transfer_from_thorium on thorium
[07:22:42] so I'd expect to be able to rsync /srv from an-web1001
[07:22:46] have you tried it?
[07:30:17] ryankemper: for example
[07:30:19] elukey@an-web1001:~$ rsync -avr thorium.eqiad.wmnet::transfer_from_thorium/log .
[07:30:23] receiving incremental file list
[07:30:25] log/
[07:30:28] sent 32 bytes received 61 bytes 186.00 bytes/sec
[07:30:30] this works --^
[07:30:33] total size is 0 speedup is 0.00
[07:30:44] so instead of /log you can use / (since it refers to /srv)
[07:30:49] and you should be done
[07:31:03] (maybe as root it will be better for perms etc.)
[07:31:18] * elukey bbiab
[07:31:53] elukey: ah I was trying to run the rsync command I put above on thorium (and without using the module but I think it would still work without it)
[07:32:05] so that would make sense :P
[07:42:49] ryankemper: :) another thing that is "peculiar" on thorium - IIRC the host rsyncs data from various stat100x hosts, and it runs a script periodically to hardlink everything to look more "host agnostic" (for example having paths not mentioning hostnames etc..)
[07:43:19] then the result is exposed as https://analytics.wikimedia.org/published/
[07:43:45] when you'll move away from thorium this must be taken into consideration
[07:44:25] for example, on stat1005's root crontab there is
[07:44:26] # Puppet Name: rsync-published
[07:44:26] */15 * * * * /usr/local/bin/published-sync -q
[07:44:49] if you see in the script
[07:44:49] dest='thorium.eqiad.wmnet::published-destination/stat1005/'
[07:45:04] so this pushes from stat1005 to thorium
[07:45:14] in a stat1005's dedicated dir
[07:45:20] same thing for the others
[07:45:38] so rsyncing /srv to an-web1001 is probably something to do multiple times
[07:46:06] (say you do it once and people in the meantime try to publish more data etc..)
[10:43:48] zookeeper metrics for https://grafana.wikimedia.org/d/000000261/zookeeper?orgId=1&refresh=5m&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=druid-analytics-eqiad&var-zookeeper_hosts=All are now up!
[10:43:55] (for an-druid1003)
[10:45:32] elukey: great. I spotted that. Did you re-enable puppet on prometheus1003 ?
[10:46:04] I asked to Filippo (he forgot yesterday) and then the new targets appeared
[10:47:30] 👍
[11:01:02] (PS1) Btullis: Change preferred Druid coordinator URL [analytics/refinery] - https://gerrit.wikimedia.org/r/712209 (https://phabricator.wikimedia.org/T255148)
[11:33:40] (CR) Elukey: "LGTM modulo the extra an- nit (left a comment)" [analytics/refinery] - https://gerrit.wikimedia.org/r/712209 (https://phabricator.wikimedia.org/T255148) (owner: Btullis)
[11:35:19] (PS2) Btullis: Change preferred Druid coordinator URL [analytics/refinery] - https://gerrit.wikimedia.org/r/712209 (https://phabricator.wikimedia.org/T255148)
[11:54:50] (CR) Elukey: [C: +1] Change preferred Druid coordinator URL [analytics/refinery] - https://gerrit.wikimedia.org/r/712209 (https://phabricator.wikimedia.org/T255148) (owner: Btullis)
[12:11:04] (CR) Btullis: Change preferred Druid coordinator URL (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/712209 (https://phabricator.wikimedia.org/T255148) (owner: Btullis)
[13:03:50] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (BTullis)
[13:29:11] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (BTullis) I notice that the three new hosts are still showing as **staged** in Netbox. Can I just set these to be **Active** manually, or is there another step for t...
[13:34:01] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (elukey) Manual change of state is ok!
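The thorium → an-web1001 copy worked out between 07:08 and 07:46 involves rsync in two directions. First, an-web1001 pulls /srv from thorium's `transfer_from_thorium` daemon module; the push started from thorium hung instead. Second, the stat100x hosts keep pushing into thorium's `published-destination` module every 15 minutes. The bash below is a minimal sketch of both directions. The helper names (`module_src`, `pull_srv`, `published_dest`) and the `PUBLISHED_HOST` variable are hypothetical. The real `published-sync` script does not appear in the log beyond its `dest=` line.

```shell
#!/usr/bin/env bash
# Sketch only: copy thorium:/srv to an-web1001:/srv by PULLING from the
# rsync daemon module on thorium. Run on an-web1001, as root so -a can
# keep file ownership.
set -euo pipefail

SRC_HOST="thorium.eqiad.wmnet"   # host exposing the module (rsyncd, TCP 873)
MODULE="transfer_from_thorium"   # module from /etc/rsync.d/frag-transfer_from_thorium
DEST="/srv/"                     # local destination on an-web1001

# rsync daemon syntax is host::module/path; the module root maps to
# thorium's /srv, so an empty path selects all of /srv.
module_src() {
  printf '%s::%s/%s' "$SRC_HOST" "$MODULE" "${1:-}"
}

# Re-runnable: later passes skip files whose size and mtime are unchanged,
# which matters while stat100x hosts keep publishing new data to thorium.
# -H keeps hard links as links (thorium hardlinks the published data);
# without it each link would arrive as a separate full copy.
pull_srv() {
  rsync -aH --stats "$(module_src "${1:-}")" "$DEST"
}

# Opposite direction: each stat100x host PUSHES to thorium every 15 min
# via published-sync. PUBLISHED_HOST is a hypothetical knob marking the
# part of that dest that would have to point at the new host.
PUBLISHED_HOST="${PUBLISHED_HOST:-thorium.eqiad.wmnet}"
published_dest() {
  printf '%s::published-destination/%s/' "$PUBLISHED_HOST" "$1"
}
```

Run `pull_srv` once early and again right at the cut-over. The second pass then only carries what was published in between, as suggested at 07:45:38.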
[13:39:08] (PS3) David Caro: Add database autocompletion [analytics/quarry/web] - https://gerrit.wikimedia.org/r/711456 (https://phabricator.wikimedia.org/T287471)
[13:39:14] (CR) David Caro: Add database autocompletion (2 comments) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/711456 (https://phabricator.wikimedia.org/T287471) (owner: David Caro)
[14:06:22] btullis: remember to open a task to dcops to track the host decommission work
[14:06:38] (nice decom job for druid1003 :)
[14:13:25] elukey: Thanks. I've created this ticket and I'm about to assign it to DC Ops: https://phabricator.wikimedia.org/T288736
[14:14:21] btullis: awesome job, thanks :)
[14:43:29] Analytics, Analytics-Kanban, Product-Analytics: Fix default ownership and permissions for Hive managed databases in /user/hive/warehouse - https://phabricator.wikimedia.org/T280175 (mpopov)
[14:44:04] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (BTullis) Thanks. I've set them all to active now.
[14:45:36] !log btullis@druid1002:/etc/zookeeper/conf$ sudo systemctl stop druid-broker druid-coordinator druid-historical druid-middlemanager druid-overlord
[14:45:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:46:10] !log btullis@druid1002:/etc/zookeeper/conf$ sudo systemctl disable druid-broker druid-coordinator druid-historical druid-middlemanager druid-overlord
[14:46:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:13:24] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (BTullis)
[15:49:54] Analytics, Infrastructure-Foundations, SRE: Import the openjdk8 packages in Bullseye - https://phabricator.wikimedia.org/T287960 (MoritzMuehlenhoff) Open→Resolved OpenJDK 8u302 has been rebuilt against the bootstrap packages (which were removed) and eventually imported. Resolving this, please...
[16:02:16] (CR) David Caro: tox: Add python to the allowlist_externals (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/711134 (owner: David Caro)
[16:09:01] Analytics: LVS in Analytics VLANs - https://phabricator.wikimedia.org/T288750 (elukey)
[16:09:10] got some time to create --^
[16:30:55] Analytics, Data-Engineering, Data-Engineering-Kanban, Epic: Alluxio for Improved Superset Query Performance - https://phabricator.wikimedia.org/T288252 (odimitrijevic)
[16:30:57] Analytics: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (odimitrijevic)
[16:31:33] Analytics, Data-Engineering: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (odimitrijevic)
[16:32:55] Analytics, Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (odimitrijevic)
[16:32:57] Analytics, Data-Engineering, Data-Engineering-Kanban, Epic: Alluxio for Improved Superset Query Performance - https://phabricator.wikimedia.org/T288252 (odimitrijevic)
[16:32:59] Analytics, Analytics-Kanban: Analytics Presto improvements - https://phabricator.wikimedia.org/T266639 (odimitrijevic)
[16:40:10] Analytics, Data-Engineering, Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (odimitrijevic)
[17:14:54] Analytics, Data-Engineering, Epic: AQS Cassandra 3 Upgrade - https://phabricator.wikimedia.org/T288755 (odimitrijevic)
[17:15:09] Analytics, Data-Engineering, Epic: AQS Cassandra 3 Upgrade - https://phabricator.wikimedia.org/T288755 (odimitrijevic)
[17:15:11] Analytics: Cleanup cassandra keyspaces and host - https://phabricator.wikimedia.org/T278231 (odimitrijevic)
[17:16:29] Analytics, Data-Engineering, Epic: AQS Cassandra 3 Upgrade - https://phabricator.wikimedia.org/T288755 (odimitrijevic)
[17:16:31] Analytics: Cleanup cassandra keyspaces and host - https://phabricator.wikimedia.org/T278231 (odimitrijevic)
[17:17:02] Analytics, Data-Engineering, Cassandra: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (odimitrijevic)
[17:18:32] Analytics, Data-Engineering, Cassandra, Epic: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (odimitrijevic)
[17:23:15] Analytics, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (odimitrijevic)
[17:24:11] Analytics, Data-Engineering, Epic: AQS Cassandra 3 Upgrade - https://phabricator.wikimedia.org/T288755 (odimitrijevic) Open→Invalid
[17:26:05] Analytics, Data-Engineering, Cassandra, Data-Engineering-Kanban, Epic: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (odimitrijevic)
[17:36:51] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Wikistats should allow more than one project - https://phabricator.wikimedia.org/T283254 (odimitrijevic)
[17:37:26] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Change state to store project as an array - https://phabricator.wikimedia.org/T283624 (odimitrijevic)
[17:37:58] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Expand Wikiselector to allow more than one wiki - https://phabricator.wikimedia.org/T285050 (odimitrijevic)
[17:46:58] Analytics-Clusters, Analytics-Kanban, Data-Engineering: Upgrade Matomo to latest upstream - https://phabricator.wikimedia.org/T275144 (odimitrijevic)
[17:53:27] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Upgrade Matomo to latest upstream - https://phabricator.wikimedia.org/T275144 (odimitrijevic)
[17:58:05] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Upgrade the Cassandra AQS cluster to Cassandra 3.11 - https://phabricator.wikimedia.org/T255141 (odimitrijevic)
[18:44:46] Analytics: actor_signature_per_project_family does not work for apps - https://phabricator.wikimedia.org/T258101 (odimitrijevic) a:razzi→None
[18:52:45] Analytics, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (odimitrijevic) a:BTullis
[18:53:36] Analytics, Data-Engineering: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (odimitrijevic) a:razzi
[18:56:04] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Deploy an-test-coord1002 as a Ganeti VM to facilitate failover testing of analytics coordinator role - https://phabricator.wikimedia.org/T287864 (odimitrijevic)
[18:56:50] Analytics, Data-Engineering, Data-Engineering-Kanban, Epic: Alluxio for Improved Superset Query Performance - https://phabricator.wikimedia.org/T288252 (odimitrijevic)
[18:56:52] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Deploy an-test-coord1002 as a Ganeti VM to facilitate failover testing of analytics coordinator role - https://phabricator.wikimedia.org/T287864 (odimitrijevic)
[18:57:06] Analytics, Data-Engineering: Deploy an-test-presto1002 as a Ganeti VM to test Presto and Alluxio integration - https://phabricator.wikimedia.org/T288766 (BTullis)
[18:58:33] Analytics, Data-Engineering: Deploy an-test-launcher1002 as a Ganeti VM to test high-availability of scheduled jobs - https://phabricator.wikimedia.org/T288767 (BTullis)
[19:01:22] Analytics, Data-Engineering: Deploy an-test-presto1002 as a Ganeti VM to test Presto and Alluxio integration - https://phabricator.wikimedia.org/T288766 (BTullis)
[19:01:24] Analytics, Data-Engineering, Data-Engineering-Kanban, Epic: Alluxio for Improved Superset Query Performance - https://phabricator.wikimedia.org/T288252 (BTullis)
[19:42:57] (PS1) Fdans: Refactor graphmodel to allow more than one project [analytics/wikistats2] - https://gerrit.wikimedia.org/r/712547
[19:44:13] (CR) jerkins-bot: [V: -1] Refactor graphmodel to allow more than one project [analytics/wikistats2] - https://gerrit.wikimedia.org/r/712547 (owner: Fdans)
[20:12:13] Analytics: actor_signature_per_project_family does not work for apps - https://phabricator.wikimedia.org/T258101 (razzi) Hi @Isaac, I was hoping to do this task last year, but I was still getting up to speed and now I can see this isn't really in the purview of SRE (my role), and unfortunately prioritized ag...
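The two `!log` entries at 14:45 and 14:46 stop and then disable the five Druid units on druid1002 as separate commands. On a systemd that supports `disable --now`, one call has the same effect. The bash below is a minimal sketch of that. `decom_druid_units` and the `DRY_RUN` switch are hypothetical names for illustration and do not come from the log.

```shell
#!/usr/bin/env bash
# Sketch only: stop and disable the Druid daemons on a host being
# decommissioned (druid1002 in the log). DRY_RUN=1 prints the command.
set -euo pipefail

DRUID_UNITS=(
  druid-broker
  druid-coordinator
  druid-historical
  druid-middlemanager
  druid-overlord
)

decom_druid_units() {
  # "disable --now" disables the units AND stops them, matching the
  # separate "stop" + "disable" commands recorded in the log.
  local cmd=(sudo systemctl disable --now "${DRUID_UNITS[@]}")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Afterwards, `systemctl is-enabled druid-broker` should report `disabled` and `systemctl is-active druid-broker` should report `inactive`. The same holds for the other four units.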
[23:58:29] Analytics-Radar, Product-Analytics, Growth-Team (Current Sprint), MW-1.37-notes (1.37.0-wmf.18; 2021-08-09), Patch-For-Review: Add geolocation information to Growth schemas - https://phabricator.wikimedia.org/T287121 (Etonkovidova) Open→Resolved Checked in production (`wmf.18`) - sch...