[02:54:05] Analytics: hdfs dfsadmin saveNamespace fails on an-master1001 - https://phabricator.wikimedia.org/T283733 (razzi) > Historical note: we followed what indicated in https://community.cloudera.com/t5/Community-Articles/Scaling-the-HDFS-NameNode-part-2/ta-p/246681, basically half of the log2(#datanodes) * 20 fig...
[03:22:02] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade the Hadoop masters to Debian Buster - https://phabricator.wikimedia.org/T278423 (razzi) As @ottomata commented in https://phabricator.wikimedia.org/T283733#7121008, we're going to try putting the cluster in safe mode again and taking a s...
[05:34:15] Analytics: hdfs dfsadmin saveNamespace fails on an-master1001 - https://phabricator.wikimedia.org/T283733 (elukey) >>! In T283733#7131036, @razzi wrote: >> Historical note: we followed what indicated in https://community.cloudera.com/t5/Community-Articles/Scaling-the-HDFS-NameNode-part-2/ta-p/246681, basical...
[05:45:44] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade the Hadoop masters to Debian Buster - https://phabricator.wikimedia.org/T278423 (elukey) >>! In T278423#7131039, @razzi wrote: > As @ottomata commented in https://phabricator.wikimedia.org/T283733#7121008, we're going to try putting the...
[06:26:40] Good morning
[06:28:29] bonjour Joseph
[07:08:10] Analytics, SRE, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (elukey) @JAnstee_WMF do you recall what password you used when creating the ssh key? It may be different from what you have saved, have you tried others? We canno...
[09:25:16] Analytics, LDAP-Access-Requests, SRE: Grant Access to Superset/Turnilo for Kgordon - https://phabricator.wikimedia.org/T283057 (MoritzMuehlenhoff) Resolved→Open Contractors with a foo-ctr@wikimedia.org address should be in cn=wmf, not cn=nda.
[11:00:27] Analytics, Analytics-Kanban: Crunch and delete many old dumps logs - https://phabricator.wikimedia.org/T280678 (gmodena) Hey @WDoranWMF @Milimetric, I had a chat with @ArielGlenn today; I can help with a one-off analysis, but I'd need to understand needs and scope. Before moving forward let's make sure...
[12:43:56] hi team!
[12:44:52] Analytics-Clusters, Analytics-Kanban, DBA, Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (Kormat)
[13:28:36] Analytics, Analytics-Kanban, Event-Platform: WMDEBanner* Event Platform Migration - https://phabricator.wikimedia.org/T282562 (mforns) Ping again? :-)
[13:49:18] Analytics, Platform Team Workboards (Image Suggestion API): Airflow collaborations - https://phabricator.wikimedia.org/T282033 (Ottomata)
[13:52:33] (PS1) Jgreen: fix typo in superset_database_exists.py [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/697972
[14:03:21] Analytics, LDAP-Access-Requests, SRE: Grant Access to Superset/Turnilo for Kgordon - https://phabricator.wikimedia.org/T283057 (colewhite) Open→Resolved Moved to group cn=wmf.
[14:05:11] Analytics, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to production analytics data and cluster for htriedman - https://phabricator.wikimedia.org/T283368 (Ottomata) @Htriedman I just created your Kerberos principal, you should receive an email asking you to log in and set your p...
[14:21:29] Analytics, Platform Engineering, Research: Create airflow instances for Platform Engineering and Research - https://phabricator.wikimedia.org/T284225 (Ottomata)
[14:27:25] Analytics, Platform Engineering, Research: Create airflow instances for Platform Engineering and Research - https://phabricator.wikimedia.org/T284225 (elukey) @Ottomata what is the plan for the related databases?
[14:51:06] cool ok razzi so i'll add a comment about the kafka rule and verify
[14:51:35] from eyeballing the removals, they look fine to me
[14:51:44] do you think there are any that need to be double checked?
[14:52:00] Nothing in the removals stuck out to me
[14:52:35] only 1 port change that makes sense
[14:52:42] k
[14:54:12] commented on ticket
[14:54:16] Analytics, SRE, netops: Audit analytics firewall filters - https://phabricator.wikimedia.org/T279429 (Ottomata) Ok, for the kafka term, we no longer need any logstash hosts. kafka logging cluster used to be colocated on a few logstash hosts, but no longer, they are all on kafka-loggingXXXX. This [[ ht...
[14:55:27] Analytics, Platform Engineering, Research: Create airflow instances for Platform Engineering and Research - https://phabricator.wikimedia.org/T284225 (Ottomata) @elukey I think the thing to do for now is create them in analytics meta mariadb instance, and then refactor everything as part of {T284150}...
[15:01:58] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade the Hadoop masters to Debian Buster - https://phabricator.wikimedia.org/T278423 (Ottomata) > I am ok in setting up maintenance to attempt a saveNamespace, but I don't think we should upgrade until we have fixed this problem. I think the...
[15:02:44] Analytics, Platform Engineering, Research: Create airflow instances for Platform Engineering and Research - https://phabricator.wikimedia.org/T284225 (elukey) In this case make sure to check the max-conns + innodb buffer for meta, I am pretty sure that there may be some tuning to do :)
[15:03:31] (CR) Ottomata: [C: +1] "Haven't looked at this in forever, if this works then go for it! :)" [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/697972 (owner: Jgreen)
[15:12:27] Analytics, Platform Engineering, Research: Create airflow instances for Platform Engineering and Research - https://phabricator.wikimedia.org/T284225 (Ottomata)
[15:17:15] Analytics, Analytics-Kanban: Crunch and delete many old dumps logs - https://phabricator.wikimedia.org/T280678 (Milimetric) > Would we need to ask a security review for exporting aggregated data out of hadoop? As I understood it this data is just for internal use, so I don't think a review is needed. I...
[15:17:34] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade the Hadoop masters to Debian Buster - https://phabricator.wikimedia.org/T278423 (elukey) >>! In T278423#7131996, @Ottomata wrote: >> I am ok in setting up maintenance to attempt a saveNamespace, but I don't think we should upgrade until...
[15:19:07] elukey: FYI, I just created a new user on an-coord1001 analytics meta
[15:19:11] and added grants
[15:19:17] they propagated to all replicas
[15:20:26] !log created airflow_analytics database and user on an-coord1001 analytics-meta instance - T272973
[15:20:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:20:29] T272973: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973
[15:20:40] ottomata: ah nicer than expected then!
[15:20:48] ya!
[15:23:25] elukey am looking for list of dbs that are backed up on db1108
[15:23:27] is it just all dbs in an instance?
[15:23:59] I think that there is a list in puppet, they are not all backed up by default
[15:24:47] right trying to find...
[15:24:50] modules/profile/templates/mariadb/grants/dumps-eqiad-analytics_meta.sql.erb
[15:24:55] this should be the file
[15:25:22] ah ha
[15:25:23] thank you
[15:25:35] I've never done it so I think that Jaime needs to be involved
[15:25:50] will make a patch and add them as reviewer
[15:33:17] the grants etc.. will likely need to be applied manually by data persistence
[15:33:32] I think that the file is only there for documentation purposes
[15:42:34] Analytics, I18n, RTL: Support right-to-left languages in Wikistats - https://phabricator.wikimedia.org/T251376 (Milimetric) Thanks for the work, this is great. I think the static nature of the site isn't too much of a problem, we've solved similar problems with bundles and, worst case, some Apache c...
[15:46:02] !log add airflow_2.1.0-py3.7-1_amd64.deb to apt.wm.org
[15:46:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:47:45] Analytics, Analytics-Wikistats: "Page views by edition of Wikipedia" for each country - https://phabricator.wikimedia.org/T257071 (Milimetric) > Should I create another feature request for that? Or is this idea too far-fetched? I don't think it's too far-fetched, and I think you should make a task. It'...
[15:58:40] heyaa a-team, airflow@analytics up and running on an-launcher1002
[15:58:41] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow#analytics
[15:58:45] \o/
[15:58:47] ssh -t -N -L8600:127.0.0.1:8600 an-test-coord1001.eqiad.wmnet
[15:58:52] http://localhost:8600
[15:59:54] the above with an-launcher1002 right?
[16:01:24] yep looks like it works fine :)
[16:01:40] oh whoops
[16:01:41] fixing thank you
[16:01:58] oh right, wrong paste
[16:02:04] ssh -t -N -L8600:127.0.0.1:8600 an-launcher1002.eqiad.wmnet
[16:04:28] Analytics, I18n, RTL: Support right-to-left languages in Wikistats - https://phabricator.wikimedia.org/T251376 (razzi) > I'd like to refactor the CSS and get off of Semantic before we incorporate something like this, though. It's been EOL for a long time and it has lots of security issues that I don'...
[16:15:39] Analytics, SRE, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (Dzahn) > debug1: Executing proxy command: exec ssh -a -W stat1006.ulsfo.wmnet:22 janstee@bast4003.wikimedia.org "stat1006.ulsfo.wmnet" in there caught my eye....
[16:16:31] (PS1) Milimetric: Use python 3 http server in README [analytics/dashiki] - https://gerrit.wikimedia.org/r/698000
[16:16:45] (CR) Milimetric: [V: +2 C: +2] Use python 3 http server in README [analytics/dashiki] - https://gerrit.wikimedia.org/r/698000 (owner: Milimetric)
[16:25:47] (CR) Razzi: "The current import works and will import superset_config, so I wouldn't call this a typo. Is there a specific reason to make this change?" [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/697972 (owner: Jgreen)
[16:33:36] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade the Hadoop masters to Debian Buster - https://phabricator.wikimedia.org/T278423 (Ottomata) > I am suggesting to stop changing things until this basic operation works again, just to avoid changing other variables Ok, makes sense > The s...
[16:37:46] (CR) Jgreen: "> Patch Set 1:" [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/697972 (owner: Jgreen)
[16:41:01] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade the Hadoop masters to Debian Buster - https://phabricator.wikimedia.org/T278423 (elukey) Yes yes drain + safe mode + save namespace are totally optional, we could simply reimage an-master1002 (preserving /srv) and restarting it right aft...
[16:43:46] Analytics: Kerberos identity for phuedx - https://phabricator.wikimedia.org/T284096 (odimitrijevic) p:Triage→High a:razzi
[16:44:24] Analytics: Kerberos identity for jdl - https://phabricator.wikimedia.org/T284081 (odimitrijevic) p:Triage→High a:razzi
[16:44:37] Analytics, SRE, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (JAnstee_WMF) Thank you both -- @elukey I thought I did, but nothing worked. I am hoping the fix needed that @dzahn spotted will work, I will try that and report...
[16:46:23] Analytics, Analytics-Kanban: Reportupdater SQL jobs failing with Python error - https://phabricator.wikimedia.org/T284074 (odimitrijevic) p:Triage→High a:mforns
[16:48:45] Analytics, Analytics-Kanban: Reportupdater should stop running a job after some fixed number of failures - https://phabricator.wikimedia.org/T284037 (odimitrijevic) p:Triage→High a:mforns
[16:50:11] Analytics-Radar, Product-Analytics, Product-Data-Infrastructure, Language-Team (Language-2021-April-June): All events in the contenttranslationabusefilter data stream failing validation - https://phabricator.wikimedia.org/T283872 (odimitrijevic)
[17:27:36] Analytics, Product-Analytics, Epic: Replace Oozie with better workflow scheduler - https://phabricator.wikimedia.org/T271429 (Ottomata)
[17:27:38] Analytics, Platform Team Workboards (Image Suggestion API): Airflow collaborations - https://phabricator.wikimedia.org/T282033 (Ottomata)
[17:37:06] https://wikitech.wikimedia.org/wiki/Talk:Analytics/Systems/Cluster/Access
[17:37:30] TLDR; hive-jdbc-1.1.0-cdh5.16.1-standalone.jar doesn't exist, can hive-jdbc-2.3.6.jar be used instead?
[17:38:04] can/should/whatever
[17:38:41] Analytics, Analytics-Kanban: Data drifts between superset_production on an-coord1001 and db1108 - https://phabricator.wikimedia.org/T279440 (elukey) Before we pull the trigger, I'd like to run `pt-table-checksum` between an-coord1001 and db1108 (or any equivalent tool) to see if we have data drifts and w...
[17:59:49] Analytics, SRE, SRE-Access-Requests: Requesting access to production analytics data and cluster for htriedman - https://phabricator.wikimedia.org/T283368 (Htriedman) Resolved→Open Hi all, reopening this task so that I can get access to https://superset.wikimedia.org as a Hive GUI. Tagging @...
[18:02:40] Reedy: it looks so, also there isn't really any useful info on that page
[18:02:52] datagrip was never something we supported, i think an analyst just got it to work long ago
[18:02:56] i'm just going to delete that page
[18:03:35] Analytics, SRE, SRE-Access-Requests: Requesting access to production analytics data and cluster for htriedman - https://phabricator.wikimedia.org/T283368 (elukey) Added the user to the `wmf` LDAP group. @Htriedman can you retry now? You are probably going to check https://superset.wikimedia.org/super...
[18:03:42] haha
[18:08:50] Analytics, SRE, SRE-Access-Requests: Requesting access to production analytics data and cluster for htriedman - https://phabricator.wikimedia.org/T283368 (Htriedman) Open→Resolved It's working perfectly! Thanks so much for the responsiveness.
[18:38:47] Analytics, SRE, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (JAnstee_WMF) @elukey the correction to eqiad host did not resolve the problem and terminal continues to ask for the passphrase. Should I just delete the existing...
[18:49:40] Analytics, SRE, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (Dzahn) Would you mind pasting the contents of ` /Users/janstee/.ssh/config`, Jaime? One more thing to try is, try to SSH just to the bastion host directly, and...
[19:02:01] Analytics, SRE, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (Dzahn) ` 440601 Jun 2 23:05:38 bast4003 sshd[6968]: Accepted key ED25519 SHA256:plaVmNDA1Ug/00RQCUV2WfIKRDNwP7GLq9NouyMKMJM found at /etc/ssh/userkeys/janstee:1...
[20:15:57] Analytics, SRE, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (JAnstee_WMF) @dzahn Here is my config file paste: Host * ForwardAgent no IdentitiesOnly yes Host * AddKeysToAgent yes UseKeychain yes # From https...
[20:32:49] Analytics, SRE, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (Dzahn) Ok, thank you. Hmm.. Let's try this: Comment out or temp. remove these lines: ` > Host * > AddKeysToAgent yes > UseKeychain yes ` and then `ssh -i...
[20:33:22] Analytics, SRE, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (Dzahn) Resolved→Open
[20:33:55] Analytics, SRE, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (Dzahn) If it turns out you need to make new keys, just paste the public part here and ask us to update the repository.
[20:57:48] Analytics, Analytics-Kanban, Patch-For-Review: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (Ottomata) > We might be able to run Dask on Yarn and use it for remote Airflow executors @JAllemandou I just tried this...
[21:12:59] mforns: yt still?
[21:15:56] want to check if the hdfs + LocalExecutor thing is still a bug
[21:16:05] OH WAIT
[21:16:06] https://gerrit.wikimedia.org/r/c/analytics/refinery/+/597623/2/airflow/pyarrow/pyarrow_concurrency_test.py
[21:16:08] this is it ya?
[21:38:49] Analytics, Analytics-Kanban, Patch-For-Review: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (Ottomata) @mforns I just tried your [[ https://gerrit.wikimedia.org/r/c/analytics/refinery/+/597623/2/airflow/pyarrow/py...
[21:39:54] Analytics, SRE, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (JAnstee_WMF) I created new keys, please use the following to update the repository: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKIp5RxtQOU35h+P/B+MgpSarZJnr73c8aIMBGEa...
[21:56:45] Analytics: labstore1006 kerberos issue - https://phabricator.wikimedia.org/T284261 (Bstorm)
[21:57:39] Analytics: labstore1006 possible kerberos issue - https://phabricator.wikimedia.org/T284261 (Bstorm)
[22:24:24] (CR) Razzi: "> Patch Set 1:" [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/697972 (owner: Jgreen)
[22:32:13] !log sudo manage_principals.py create phuedx --email_address=phuedx@wikimedia.org
[22:32:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:32:28] !log sudo manage_principals.py create jdl --email_address=jlinehan@wikimedia.org
[22:32:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:38:09] Analytics, Patch-For-Review: Kerberos identity for phuedx - https://phabricator.wikimedia.org/T284096 (razzi) Open→Resolved Should be all set
[22:38:44] Analytics, Patch-For-Review: Kerberos identity for jdl - https://phabricator.wikimedia.org/T284081 (razzi) Open→Resolved Should be all set
[23:18:40] Analytics, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (Dzahn) @JAnstee_WMF Alright, I made a patch to replace your key and uploaded it to code review. https://gerrit.wikimedia.org/r/c/operations...