[00:44:13] PROBLEM - Check unit status of monitor_refine_eventlogging_legacy on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_legacy https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:28:36] (PS2) Seddon: Mediasearch_Interaction: Add search_result_page_title field [schemas/event/secondary] - https://gerrit.wikimedia.org/r/745966 (https://phabricator.wikimedia.org/T297400) (owner: Eric Gardner)
[03:13:27] Analytics, Data-Engineering, Event-Platform, SRE, Sustainability (Incident Followup): Pool eventgate-main in both datacenters (active/active) - https://phabricator.wikimedia.org/T296699 (Ottomata) Open→Resolved a:Ottomata Yup should be!
[03:22:03] (CR) Seddon: [C: +2] Mediasearch_Interaction: Add search_result_page_title field [schemas/event/secondary] - https://gerrit.wikimedia.org/r/745966 (https://phabricator.wikimedia.org/T297400) (owner: Eric Gardner)
[03:22:47] (Merged) jenkins-bot: Mediasearch_Interaction: Add search_result_page_title field [schemas/event/secondary] - https://gerrit.wikimedia.org/r/745966 (https://phabricator.wikimedia.org/T297400) (owner: Eric Gardner)
[03:36:03] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:47:27] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:48:26] Analytics-Radar, SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (akosiaris) This is starting to show up rather frequently, so I am wondering whether it is starting to consume enough time to warrant solving it somehow. Finding the race might prove...
[09:02:52] hello folks
[09:02:56] https://forge.softwareheritage.org/T2544 is very interesting
[09:04:51] in theory we could have puppet executing the command to reload the keystore when it gets renewed
[09:08:29] I'll open a task
[09:24:39] heya elukey - We're trying to rsync from a stat machine to the an-test-client machine, and can't (
[09:24:58] elukey: do you know how we can try to make that work?
[09:25:05] ping btullis as well on this --^
[09:25:40] yeah it is not in the allowed hosts list
[09:25:50] is it a one off or something more permanent?
[09:25:50] elukey: would it be ok to add it?
[09:26:17] elukey: I think it's worth adding it permanently, as we'll be willing to test stuff on the cluster
[09:26:36] in theory yes, an-test-client is already something that people can access so it should be ok
[09:26:59] there is a hiera list in puppet for the list
[09:27:07] (too many 'list')
[09:27:14] huhu
[09:27:21] there is a hiera data structure in puppet that manages the allowed host list :)
[09:27:39] ack :) thanks for that - if you give its name I can try to update it :)
[09:28:30] it is called 'statistics_servers' in hieradata/common.yaml
[09:28:46] all the hosts in the list can pull from each other
[09:29:02] let's try to review case-by-case but should be ok
[09:35:11] elukey, btullis : https://gerrit.wikimedia.org/r/c/operations/puppet/+/754869/ please :)
[09:36:54] Thanks a lot btullis :)
[09:38:28] btullis: Not sure about this - would a run of puppet on an-test-client1001 unlock my rsync, or does the change apply somewhere else?
[09:38:30] That's a pleasure.
[09:39:30] Which stat machine are you trying to sync from at the moment?
[09:39:55] I'm on stat1008, sandra is on stat1004
[09:40:16] btullis: the puppet run that needs to happen is on the sending hosts?
[09:40:41] I ran puppet on an-test-client1001 and it didn't apply anything, so I'm running it on both of those sending hosts as we speak.
[09:41:44] Yup. All done on those two.
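Since hosts in the 'statistics_servers' list pull from each other, a transfer is started on the receiving host rather than pushed from the sender. A minimal sketch of such a pull follows. It assumes the source host exposes an rsync module named `home`, as in the `::home/joal/` path quoted in this log; the function name `pull_from_stat`, the `RUN` dry-run hook, and the source file path are hypothetical and only for illustration.

```shell
#!/bin/sh
set -eu

# Pull a file from a stat host's rsync "home" module (module name is an assumption).
# $1 = source host, $2 = path inside the module, $3 = local destination
pull_from_stat() {
  # RUN is a hypothetical dry-run hook: with RUN=echo the command is printed, not executed.
  ${RUN:-} rsync -av "$1::home/$2" "$3"
}

# Dry run on the receiving host (e.g. an-test-client1001): prints the rsync command.
RUN=echo pull_from_stat stat1008.eqiad.wmnet joal/commons-20220117-mediainfo.json.bz2 .
```

Dropping `RUN=echo` would execute the transfer; the file still has to be readable on the source side for the pull to succeed.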
[09:43:00] will try btullis - thanks again
[09:48:34] btullis: still failing for me :( Could it be that the puppet patch has an impact on network rules and should be run on a different machine?
[09:49:07] joal: what command are you running?
[09:49:13] (and from what node)
[09:49:47] From stat1008: rsync -av commons-20220117-mediainfo.json.bz2 an-test-client1001.eqiad.wmnet::home/joal/
[09:50:02] nono do the opposite
[09:50:15] on an-test-client1001 pull from stat1008
[09:50:19] ah - like pull from the client
[09:50:21] ok will try
[09:50:23] yeah
[09:50:52] What he said -^ :-)
[09:50:56] :)
[09:54:46] hm - no rsync problem, but permission denied - we have progress :)
[09:55:34] Interesting. Permission to read, or permission to write?
[09:55:47] btullis, elukey: do you know if user-ids should be the same on those hosts
[09:55:50] ?
[09:56:05] permission to read it seems
[09:56:27] Yes, they are all defined in data.yaml so should be identical on all hosts.
[09:57:32] meh - interesting - it seems to be working for me, but not for sandra!
[09:59:04] --verbose :)
[09:59:15] (namely command and result :D)
[10:00:20] actually PBCAK - doesn't work for me either
[10:00:42] Do you want to jump into the BC and work it through?
[10:01:02] btullis: will run in another meeting - Sandra will try to reach out to you I think
[10:01:14] OK.
[10:01:22] thanks a lot btullis and elukey
[10:01:58] My source file is u+rw g+r
[10:08:40] RECOVERY - Check unit status of check_webrequest_partitions on an-launcher1002 is OK: OK: Status of the systemd unit check_webrequest_partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:08:53] \o/ --^
[11:07:00] Analytics-Radar, SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (akosiaris) There is indeed a race condition between `networking.service` and `ifup@ens5.service`.
Checked on a couple of VMs that did not exhibit this problem as well as some that di...
[11:17:36] Analytics-Radar, SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (MoritzMuehlenhoff) >>! In T273026#7627740, @akosiaris wrote: > * Get rid of ifupdown and /etc/network/interfaces and get a proper and modern network interface manager. See T234207. T...
[11:27:32] Analytics-Radar, SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (akosiaris) >>! In T273026#7627758, @MoritzMuehlenhoff wrote: >>>! In T273026#7627740, @akosiaris wrote: >> * Get rid of ifupdown and /etc/network/interfaces and get a proper and mode...
[13:42:33] Analytics-Radar, Anti-Harassment, CheckUser, Privacy Engineering, and 4 others: Deal with Google Chrome User-Agent deprecation - https://phabricator.wikimedia.org/T242825 (JAllemandou)
[13:43:09] Data-Engineering, Data-Engineering-Kanban, Airflow: Migrate AQS hourly job - https://phabricator.wikimedia.org/T299398 (Antoine_Quhen)
[13:44:47] Data-Engineering, Anti-Harassment, Privacy Engineering, Product-Analytics, and 2 others: Measure user-agent client hints already sent in browsers requests - https://phabricator.wikimedia.org/T299397 (JAllemandou) Flagging teams already flagged in parent task
[13:44:51] Data-Engineering, Data-Engineering-Kanban, Airflow: Migrate AQS hourly job - https://phabricator.wikimedia.org/T299398 (Antoine_Quhen)
[13:47:38] Data-Engineering, Traffic: VarnishKafka to propagate user agent client hints headers to webrequest - https://phabricator.wikimedia.org/T299401 (JAllemandou)
[13:49:53] Data-Engineering: Add user agent client hints to the `webrequest` table - https://phabricator.wikimedia.org/T299402 (JAllemandou)
[14:11:04] Analytics, Data-Engineering, Event-Platform: Automate EventGate validation error reporting -
https://phabricator.wikimedia.org/T268027 (Ottomata) Not specifically, but there is this [[ https://logstash.wikimedia.org/app/dashboards#/view/AXN5OoJu3_NNwgAUlbUT | eventgate validation error dashboard ]],...
[14:13:52] Analytics, Data-Engineering, Event-Platform: Automate EventGate validation error reporting - https://phabricator.wikimedia.org/T268027 (Ottomata) Just checked, and some of the dashboard links had changed, so I updated them in https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#V...
[14:18:17] Data-Engineering, Data-Engineering-Kanban, SRE-swift-storage: Deploy research_poc Swift credidentials to Hadoop - https://phabricator.wikimedia.org/T296945 (Ottomata) Hm, perhaps, although I'm not sure where. This is sort of a one off. We'd love to have more first class support for exporting to swi...
[14:18:26] Data-Engineering, Data-Engineering-Kanban, SRE-swift-storage: Deploy research_poc Swift credidentials to Hadoop - https://phabricator.wikimedia.org/T296945 (Ottomata) Open→Resolved
[14:33:31] Data-Engineering, SRE: Allow kafka brokers to reload the TLS keystore - https://phabricator.wikimedia.org/T299409 (elukey)
[14:35:25] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:46:43] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[15:37:16] Analytics, Data-Engineering, Event-Platform, Platform Engineering, tech-decision-forum: MediaWiki Event Carried State Transfer - Problem Statement - https://phabricator.wikimedia.org/T291120 (Ottomata)
[15:37:35] Analytics, Data-Engineering, Event-Platform, Platform Engineering, tech-decision-forum:
MediaWiki Event Carried State Transfer - Problem Statement - https://phabricator.wikimedia.org/T291120 (Ottomata) Ok, going with 'MediaWiki Event Carried State Transfer' as title.
[16:11:18] Data-Engineering, Data-Engineering-Kanban, User-razzi: Run Atlas on test cluster - https://phabricator.wikimedia.org/T296670 (BTullis) I have been doing some more work on this too, given its priority. I'm currently blocked by Kerberos when trying to import metadata from Hive. I've joined the mailing...
[17:04:57] Analytics, Analytics-Kanban, Data-Engineering-Kanban, wmfdata-python, Product-Analytics (Kanban): wmfdata-python's Hive query output includes logspam - https://phabricator.wikimedia.org/T275233 (Milimetric) This is sort of blocked on {T292699}
[17:09:10] Analytics, Data-Engineering, Event-Platform, Patch-For-Review, Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Jdlrobson) a:Ottomata→Jdrewniak Jan can you scope the work involved in this ticket, so we ca...
[17:26:03] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Conda's CPPFLAGS may not be correct when pip installing a package that needs c/cpp compilation - https://phabricator.wikimedia.org/T292699 (Ottomata) a:Ottomata
[17:26:09] Data-Engineering, Data-Engineering-Kanban, Product-Analytics, Structured-Data-Backlog, and 3 others: Write an Airflow job converting commons structured data dump to Hive - https://phabricator.wikimedia.org/T299059 (Snwachukwu) a:Snwachukwu
[17:34:14] Analytics, Analytics-Kanban, Data-Engineering-Kanban, wmfdata-python, Product-Analytics (Kanban): wmfdata-python's Hive query output includes logspam - https://phabricator.wikimedia.org/T275233 (nshahquinn-wmf) >>! In T275233#7629084, @Milimetric wrote: > This is sort of blocked on {T292699}...
[17:37:15] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Conda's CPPFLAGS may not be correct when pip installing a package that needs c/cpp compilation - https://phabricator.wikimedia.org/T292699 (nshahquinn-wmf) Once this is installed on the servers, will it automatically take effect within user...
[17:56:52] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Conda's CPPFLAGS may not be correct when pip installing a package that needs c/cpp compilation - https://phabricator.wikimedia.org/T292699 (Ottomata) I believe restarting Jupyter servers will be necessary, but that should be all.
[18:10:39] Data-Engineering, Anti-Harassment, Privacy Engineering, Product-Analytics, and 2 others: Measure user-agent client hints already sent in browsers requests - https://phabricator.wikimedia.org/T299397 (kzimmerman) @JAllemandou we're moving this to tracking on our end, but do reach out if there are...
[18:50:47] joal: would you have some moments to talk about airflow stuff today? just haven't had a sync in a while and want to get focused
[18:51:08] ottomata: in 1-1 in 10 mins, and then I'll stop :S
[18:51:12] 10 mins now?
[18:51:15] ya
[18:51:17] bch
[19:29:38] Analytics, Analytics-Wikistats, Product-Analytics: Wikistats pageview data missing counts for Mobile App pageviews on Commons, going back to 2020-11 - https://phabricator.wikimedia.org/T299439 (SNowick_WMF)
[19:39:10] hi all! yet another quick question here... how do I grant privileges to other users for tables I have in my own database on Hive?
[19:39:48] specifically for making a Superset dashboard that all NDA users can see, based on tables in my user Hive db
[19:39:50] thx in advance!!
[19:41:22] AndyRussG: i believe if you hdfs dfs -chgrp and hdfs dfs -chmod the underlying files properly, folks will be able to see them
[19:41:38] your table's files are likely in /user/hive/warehouse/.db/
[19:41:52] there is a -R option for both chgrp and chmod
[19:42:02] you could chgrp them to analytics-privatedata-users
[19:42:15] ottomata: ahh cool thx I'll try that :) I was somehow looking for something SQL-y
[19:42:19] and then chmod them g+rx
[19:42:41] hive is really just sql metadata mapping on top of files :)
[19:42:41] right makes sense :)
[19:42:54] if you do show create table
[19:43:00] it should show you where the table's file LOCATION is
[19:46:13] ah yee
[19:53:08] ottomata: thx hmmm it seems everything already is in group analytics-privatedata-users and with permissions r-x for group
[19:53:25] for example hadoop fs -ls /user/hive/warehouse/andyrussg.db/lang_country_acccess_method
[19:53:39] hmm, the issue then might be who is viewing them
[19:54:02] there are several levels of access, for someone to be able to view that data, they need to have a posix account in the analytics-privatedata-users group
[19:54:20] not all dashboards use hive (via presto), so not all users that can access superset need posix accounts in analytics-privatedata-users
[19:54:42] hmmm
[19:54:49] who are you trying to get to access this?
[19:54:57] https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Levels
[19:55:52] ah I did see that
[23:08:35] Analytics, Product-Analytics (Kanban): Add mediawiki_skin_diff to the allowlist - https://phabricator.wikimedia.org/T287255 (jwang) Open→Resolved
[23:14:19] Data-Engineering, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install stat1009 - https://phabricator.wikimedia.org/T299466 (RobH)
[23:14:46] Data-Engineering, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install stat1009 - https://phabricator.wikimedia.org/T299466 (RobH)
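
The chgrp/chmod steps suggested at 19:41 for sharing a Hive table's files can be sketched roughly as below. This is only an illustration: the function name `share_table` and the `RUN` dry-run hook are hypothetical, and the path is the example one from the 19:53 message. In the conversation the files already had the right group and mode, so the remaining question was whether the viewers belong to analytics-privatedata-users.

```shell
#!/bin/sh
set -eu

# Give a group read/traverse access to the files backing a Hive table.
# $1 = HDFS path of the table's files, $2 = group name
share_table() {
  # RUN is a hypothetical dry-run hook: with RUN=echo the commands are printed, not executed.
  ${RUN:-} hdfs dfs -chgrp -R "$2" "$1"
  ${RUN:-} hdfs dfs -chmod -R g+rx "$1"
}

# Dry run: prints the two hdfs commands for the example table path from the log.
RUN=echo share_table /user/hive/warehouse/andyrussg.db/lang_country_acccess_method analytics-privatedata-users
```

Dropping `RUN=echo` would apply the changes; `hdfs dfs -ls` on the same path shows the resulting group and mode.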