[08:50:19] Analytics, Analytics-Kanban, Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) Compactions from previous operations have now completely finished, so it is now back to a regular daily pattern. {F34...
[11:38:56] Analytics, Chinese-Sites: Some pageviews data are missing for Oct 21, 2021 - https://phabricator.wikimedia.org/T294193 (Shizhao) Open→Resolved a:Shizhao
[11:39:18] Analytics, Chinese-Sites: Some pageviews data are missing for Oct 21, 2021 - https://phabricator.wikimedia.org/T294193 (Shizhao) Resolved→Open
[11:40:06] Analytics, Chinese-Sites: Some pageviews data are missing for Oct 21, 2021 - https://phabricator.wikimedia.org/T294193 (Shizhao) a:Shizhao→None
[12:42:43] Analytics, Data-Engineering, Event-Platform, Platform Engineering, tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (thiemowmde)
[12:50:32] joal: I'm ready to start the loading of snapshot 7 of 12 snapshots for `local_group_default_T_pageviews_per_article_flat` - Happy for me to proceed?
[12:51:53] !log btullis@aqs1007:~$ sudo nodetool-a clearsnapshot
[12:51:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:02:22] !log btullis@an-coord1002:~$ sudo systemctl restart hive-server2 hive-metastore
[13:02:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:14:49] Analytics-Kanban, Data-Engineering: Purge any Kerberos keytab files that are not managed by puppet - https://phabricator.wikimedia.org/T294124 (Ottomata) +1
[13:19:04] Analytics, Event-Platform, EventStreams, Wikidata, Wikidata-Query-Service: Expose rdf-streaming-updater.mutation content through EventStreams - https://phabricator.wikimedia.org/T294133 (Ottomata) > Here you must consume only one.
Should we expose both? We'll need to declare these stream...
[13:19:23] Analytics, Event-Platform, EventStreams, Wikidata, Wikidata-Query-Service: Expose rdf-streaming-updater.mutation content through EventStreams - https://phabricator.wikimedia.org/T294133 (Ottomata) Oh, we'll also want to add create event schema and add it to the schema repo.
[13:21:00] Analytics, Analytics-Kanban, Data-Engineering: SPIKE - Will Hadoop 3 container support help us for Airflow deployment pipelines? - https://phabricator.wikimedia.org/T288247 (Ottomata) Open→Resolved It won't work for our Hadoop YARN setup though, we'll still encounter the Kerberos barrier. Ro...
[13:21:03] Analytics, Platform Team Workboards (Image Suggestion API): Airflow collaborations - https://phabricator.wikimedia.org/T282033 (Ottomata)
[13:21:35] Analytics, Data-Engineering, Event-Platform, Platform Engineering, tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (thiemowmde) I just responded via the Problem Statement Feedback process, but would like to lea...
[13:29:42] Analytics, Data-Engineering, Event-Platform, Platform Engineering, tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (Ottomata) > Which part of this is controversial / of wide impact / needs a shared decision? If...
[13:35:24] Analytics, Data-Engineering, Event-Platform, Platform Engineering, tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (Ottomata) > I have a hard time understanding what the exact definition of "MediaWiki data" is....
[13:40:36] (CR) Ottomata: Add wikibase/rdf/update_stream/1.0.0 (3 comments) [schemas/event/primary] - https://gerrit.wikimedia.org/r/594098 (owner: DCausse)
[13:42:09] Analytics-Kanban, Data-Engineering: Purge any Kerberos keytab files that are not managed by puppet - https://phabricator.wikimedia.org/T294124 (elukey) +1
[13:51:00] Analytics, Data-Engineering, Desktop Improvements, MediaWiki-extensions-WikimediaEvents, Readers-Web-Backlog (Kanbanana-FY-2021-22): Add agent_type and access_method to event data - https://phabricator.wikimedia.org/T294246 (Ottomata)
[13:51:14] Analytics, Data-Engineering, Desktop Improvements, MediaWiki-extensions-WikimediaEvents, Readers-Web-Backlog (Kanbanana-FY-2021-22): Add agent_type and access_method to event data - https://phabricator.wikimedia.org/T294246 (Ottomata) a:cjming→None
[13:53:05] (CR) Ottomata: talk_page_event schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: DLynch)
[13:53:48] Analytics, Analytics-Kanban, Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) This is under way now. ` ### Reloading table data in keyspace local_group_default_T_pageviews_per_article_flat from...
[13:54:42] I've started the loading of snapshot 7 of 12 for cassandra3 pageview_per_article table.
[14:07:20] (PS8) Clare Ming: Add new schema for desktop UI scroll tracking. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/731156 (https://phabricator.wikimedia.org/T292586)
[14:09:06] (CR) Clare Ming: Add new schema for desktop UI scroll tracking. (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/731156 (https://phabricator.wikimedia.org/T292586) (owner: Clare Ming)
[14:19:57] Heya btullis - Please excuse me I completely missed your ping - Thank you for having started the loading
[14:26:33] Analytics, Performance-Team: Check home/HDFS leftovers of gilles - https://phabricator.wikimedia.org/T290232 (Ottomata) Ok, I copied all *.{py,sql,hql} from stat1007 and stat1004 to @krinkle's homedir on stat1007:/home/krinkle/gilles-homedir-leftovers. Will delete everything else now.
[14:31:02] Analytics, Performance-Team: Check home/HDFS leftovers of gilles - https://phabricator.wikimedia.org/T290232 (Ottomata) Open→Resolved
[14:51:02] Analytics, Analytics-Kanban: Check home/HDFS leftovers of fdans - https://phabricator.wikimedia.org/T290231 (Ottomata) Ok, most was straightforward and was removed with: ` sudo cumin 'stat*' 'rm -rf /home/fdans' ` ` sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r /user/fdans ` ` DROP DATABASE f...
[14:58:32] Analytics, Analytics-Kanban: Check home/HDFS leftovers of kaywong - https://phabricator.wikimedia.org/T291060 (Ottomata) Open→Resolved a:Ottomata ` sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r /user/kaywong ` Thanks, removed files.
[14:59:14] Heya ottomata - have you rerun the failed cassandra job from last Friday?
[14:59:22] Analytics, Analytics-Kanban: Check home/HDFS leftovers of mholloway-shell - https://phabricator.wikimedia.org/T291353 (jlinehan) >>! In T291353#7448702, @odimitrijevic wrote: > @jlinehan @dr0ptp4kt putting this on your radar again. Thanks, I don't think any of this needs to be kept, you can proceed.
[15:05:15] Analytics, Analytics-Kanban, Chinese-Sites: Some pageviews data are missing for Oct 21, 2021 - https://phabricator.wikimedia.org/T294193 (Ottomata) a:Ottomata
[15:10:48] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Remove all debian python-* and other user requested packages installed for analytics clients, use conda instead - https://phabricator.wikimedia.org/T275786 (Ottomata) Open→Resolved
[15:10:50] Analytics, Patch-For-Review: Newpytyer python spark kernels - https://phabricator.wikimedia.org/T272313 (Ottomata)
[15:28:13] Analytics, Event-Platform, EventStreams, Wikidata, Wikidata-Query-Service: Expose rdf-streaming-updater.mutation content through EventStreams - https://phabricator.wikimedia.org/T294133 (MPhamWMF) p:Triage→High
[15:30:18] Analytics, Analytics-Kanban, Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) Unfortunately, this failed with a streaming error to all peers. ` ERROR 15:24:06,580 [Stream #28804bf0-35a7-11ec-89...
[15:34:45] Analytics, Data-Engineering, Desktop Improvements, MediaWiki-extensions-WikimediaEvents, Readers-Web-Backlog (Kanbanana-FY-2021-22): Add agent_type and access_method to event data - https://phabricator.wikimedia.org/T294246 (odimitrijevic) p:Triage→Medium
[15:34:55] Analytics-Clusters, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (Ottomata)
[15:35:38] Analytics, Analytics-Kanban, Chinese-Sites: Some pageviews data are missing for Oct 21, 2021 - https://phabricator.wikimedia.org/T294193 (odimitrijevic) p:Triage→High
[15:36:26] Analytics-Clusters, Analytics-Kanban, observability, Patch-For-Review: Setup Analytics team in VO/splunk oncall - https://phabricator.wikimedia.org/T273064 (Ottomata) a:razzi→BTullis
[15:37:07] Analytics, Data-Engineering: Results have expired error in Hue - https://phabricator.wikimedia.org/T294144 (odimitrijevic) @EYener what is the query(ies) that you are running?
[15:43:21] Analytics, Data-Services, Privacy Engineering, cloud-services-team (Kanban): Increased visibility in wiki-replicas for volunteers fighting vandals - https://phabricator.wikimedia.org/T284944 (odimitrijevic) a:odimitrijevic
[15:43:47] Analytics, Data-Services, Privacy Engineering, cloud-services-team (Kanban): Increased visibility in wiki-replicas for volunteers fighting vandals - https://phabricator.wikimedia.org/T284944 (odimitrijevic) @sguebo_WMF Is this data visible on the wikis? @nskaggs do you know if this data is in u...
[15:47:38] Analytics-Clusters, Data-Engineering: Create analytics-test-eqiad zookeeper cluster - https://phabricator.wikimedia.org/T289056 (Ottomata) p:Triage→Low Could do this, or allow Analytics VLAN to access test-eqiad ZK cluster. Priority low, unless we need to test a zookeeper specific upgrade and do...
[15:48:24] Analytics-Clusters, Data-Engineering: Deploy an-test-presto1002 as a Ganeti VM to test Presto and Alluxio integration - https://phabricator.wikimedia.org/T288766 (Ottomata) Open→Declined Decided not to do Alluxio so for now we don't need this.
[15:48:26] Analytics, Data-Engineering, Data-Engineering-Kanban, Epic: Alluxio for Improved Superset Query Performance - https://phabricator.wikimedia.org/T288252 (Ottomata)
[15:49:31] Analytics: Create new table for 'referer' aggregated data - https://phabricator.wikimedia.org/T112284 (odimitrijevic) @Milimetric What is the list of the search engines that we keep the data for?
[15:53:48] Analytics-Clusters, Data-Engineering: Set hive.warehouse.subdir.inherit.perms to false - https://phabricator.wikimedia.org/T291664 (Ottomata) a:Ottomata
[15:54:19] Analytics: Create new table for 'referer' aggregated data - https://phabricator.wikimedia.org/T112284 (odimitrijevic) p:Triage→Medium
[15:55:07] Analytics-Clusters, Analytics-Kanban, Data-Engineering: Set hive.warehouse.subdir.inherit.perms to false - https://phabricator.wikimedia.org/T291664 (Ottomata) p:Triage→Medium
[15:57:08] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Write document about making Superset fast enough - https://phabricator.wikimedia.org/T294046 (odimitrijevic) p:Triage→High
[15:57:49] Analytics: Create new table for 'referer' aggregated data - https://phabricator.wikimedia.org/T112284 (Milimetric) >>! In T112284#7455411, @odimitrijevic wrote: > @Milimetric What is the list of the search engines that we keep the data for? The current list of search engines: https://gerrit.wikimedia.org/g...
[15:58:21] Analytics, Data-Services, Privacy Engineering, cloud-services-team (Kanban): Raw IPs of logged-out users disclosed in wiki-replicas - https://phabricator.wikimedia.org/T284948 (odimitrijevic)
[15:59:45] Analytics: Check home/HDFS leftovers of tonina - https://phabricator.wikimedia.org/T293676 (odimitrijevic) p:Triage→Medium
[16:03:32] !log btullis@an-coord1001:~$ sudo systemctl restart hive-server2 hive-metastore
[16:03:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:03:55] Analytics-Clusters: Enforce authentication and authorization for webrequest_* topics in Kafka jumbo-eqiad cluster - https://phabricator.wikimedia.org/T294264 (Ottomata)
[16:04:34] Analytics, Event-Platform, EventStreams, Wikidata, Wikidata-Query-Service: Expose rdf-streaming-updater.mutation content through EventStreams - https://phabricator.wikimedia.org/T294133 (odimitrijevic) p:High→Medium
[16:04:38] Analytics: Enforce authentication and authorization for webrequest_* topics in Kafka jumbo-eqiad cluster - https://phabricator.wikimedia.org/T294264 (Ottomata) Perhaps this is a Q3 or Q4 task?
[16:05:30] Analytics: Create new table for 'referer' aggregated data - https://phabricator.wikimedia.org/T112284 (Milimetric) Side note: the excluded countries are hardcoded here: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/655804/14/oozie/referrer/daily/referrer.hql#36, perhaps we should revisit and use the...
[16:22:31] Analytics-Radar, Product-Analytics, wmfdata-python: Upstream relevant parts of wmfdata-python into refinery - https://phabricator.wikimedia.org/T293700 (odimitrijevic)
[16:31:46] joal: razzi hello!
[16:31:55] Analytics, Data-Services, Privacy Engineering, cloud-services-team (Kanban): Increased visibility in wiki-replicas for volunteers fighting vandals - https://phabricator.wikimedia.org/T284944 (JJMC89) > Is this data visible on the wikis?
All AF modifications are publicly listed in [[ https://en...
[16:31:57] got hopeful plans to work with both of you today
[16:31:59] whoz ready?
[16:32:17] i got an interview starting in 2.5 hours
[16:32:21] am free until then
[16:33:32] ottomata: I have meetings until your interview time! meh
[16:33:55] ottomata: if you manage to get razzi with you today, I shall have time available in your morning tomorrow
[16:34:05] Analytics, Analytics-Kanban, Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) Ah, I found this from the docs: > Because sstableloader uses the streaming protocol, it requires a direct connecti...
[16:35:02] joal i have 1 hour in my morning tomorrow free! 2 hours before standup tomorrow, ping me then!
[16:35:39] ack ottomata - actually sending an invite
[16:36:35] cooo
[16:37:47] Analytics-Radar, Product-Analytics (Kanban), Wikipedia-Android-App-Backlog (Android Release FY2021-22): What percentage of app editors are IP editors? - https://phabricator.wikimedia.org/T291866 (SNowick_WMF)
[16:45:44] Analytics, Analytics-Kanban: Check home/HDFS leftovers of mholloway-shell - https://phabricator.wikimedia.org/T291353 (Ottomata) Done: ` sudo cumin 'C:profile::analytics::cluster::client or C:profile::hadoop::master or C:profile::hadoop::master::standby' 'rm -rf /home/mholloway-shell' ` ` DROP DATABASE...
[16:46:05] Analytics-Radar, Event-Platform, Metrics-Platform, Product-Analytics: Draft of full process for instrumentation using new client libraries - https://phabricator.wikimedia.org/T275694 (kzimmerman) Moving to blocked on our board until we get information about where this falls on Metrics Platform ro...
[17:14:23] Analytics, Analytics-Kanban: Check home/HDFS leftovers of fdans - https://phabricator.wikimedia.org/T290231 (Ottomata) Talked with Data Eng team, we determined we can just remove these. Done.
` sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r /user/hive/warehouse/geowiki_archive_monthly_edits_cou...
[17:15:14] Analytics, Analytics-Kanban: Check home/HDFS leftovers of fdans - https://phabricator.wikimedia.org/T290231 (Ottomata) Also needed to remove homedir from other hadoop nodes too: ` sudo cumin 'C:profile::analytics::cluster::client or C:profile::hadoop::master or C:profile::hadoop::master::standby' 'rm -r...
[17:15:30] Analytics, Analytics-Kanban: Check home/HDFS leftovers of fdans - https://phabricator.wikimedia.org/T290231 (Ottomata) Open→Resolved a:Ottomata
[17:18:40] Analytics, Analytics-Kanban, Chinese-Sites: Some pageviews data are missing for Oct 21, 2021 - https://phabricator.wikimedia.org/T294193 (Ottomata) Hi, I reran this job, and I believe it succeeded, and Cat is showing pageviews now. Does this look right to you? If so, can we resolve?
[17:28:06] Analytics, Analytics-Kanban: Kerberos identity for cicalese - https://phabricator.wikimedia.org/T293850 (Ottomata) a:razzi→Ottomata Done, @CCicalese_WMF you should have an email with instructions.
[17:29:50] Analytics, Analytics-Kanban: Kerberos request ticket for Naray-ctr - https://phabricator.wikimedia.org/T293814 (Ottomata) As far as I can tell, this was done as part of {T293810}. Merging this in as a duplicate. Please follow up there if you have trouble with access.
[17:30:05] Analytics, Analytics-Kanban: Kerberos request ticket for Naray-ctr - https://phabricator.wikimedia.org/T293814 (Ottomata)
[17:30:35] Analytics, Analytics-Kanban: Kerberos request ticket for Naray-ctr - https://phabricator.wikimedia.org/T293814 (Ottomata) a:razzi→Ottomata
[17:34:57] Analytics, User-razzi: Presto error in Superset - https://phabricator.wikimedia.org/T292879 (Ottomata) a:razzi→Ottomata Hi @JAnstee_WMF, that is a strange error indeed! As far as I can tell you have all the right permissions. I can't reproduce from here.
Can you ping me on IRC or on Slack and...
[17:38:53] Analytics: Check home/HDFS leftovers of tonina - https://phabricator.wikimedia.org/T293676 (Ottomata) a:Ottomata
[17:39:27] Analytics: Check home/HDFS leftovers of tonina - https://phabricator.wikimedia.org/T293676 (Ottomata) ` ====== stat1004 ====== total 0 ====== stat1005 ====== total 0 ====== stat1006 ====== total 139536 -rw-rw-r-- 1 18640 wikidev 141 Oct 23 2018 query_banner_closed_event.sql -rw-rw-r-- 1 18640 wikide...
[17:39:50] Analytics: Check home/HDFS leftovers of tonina - https://phabricator.wikimedia.org/T293676 (Ottomata) @WMDE-leszek can we remove all of the data listed above?
[17:40:03] Analytics, Data-Engineering, Desktop Improvements, MediaWiki-extensions-WikimediaEvents, Readers-Web-Backlog (Kanbanana-FY-2021-22): Add agent_type and access_method to event data - https://phabricator.wikimedia.org/T294246 (Jdlrobson) a:Ottomata
[17:43:17] Analytics, Analytics-Kanban: Kerberos request ticket for Naray-ctr - https://phabricator.wikimedia.org/T293814 (Dzahn) >>! In T293814#7455974, @Ottomata wrote: > As far as I can tell, this was done as part of {T293810}. Merging this in as a duplicate. Please follow up there if you have trouble with acc...
[17:44:17] Analytics, Analytics-Kanban: Kerberos request ticket for Naray-ctr - https://phabricator.wikimedia.org/T293814 (Ottomata) thank you!
[17:49:47] Analytics, Analytics-Kanban: Check home/HDFS leftovers of mholloway-shell - https://phabricator.wikimedia.org/T291353 (Ottomata) Open→Resolved a:Ottomata
[17:58:34] Analytics-Clusters, Analytics-Kanban, Data-Engineering: Set hive.warehouse.subdir.inherit.perms to false - https://phabricator.wikimedia.org/T291664 (Ottomata)
[17:58:55] Analytics-Clusters, Analytics-Kanban, Data-Engineering: Set hive.warehouse.subdir.inherit.perms to false - https://phabricator.wikimedia.org/T291664 (Ottomata)
[18:14:27] Analytics, Analytics-EventLogging, Analytics-Kanban, Data-Engineering, and 3 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (MNeisler)
[18:47:45] Analytics, Analytics-Kanban, wmfdata-python, Product-Analytics (Kanban): wmfdata-python's Hive query output includes logspam - https://phabricator.wikimedia.org/T275233 (nshahquinn-wmf) a:Milimetric→nshahquinn-wmf Oh, that was quick! Thank you so much for putting together the pull request...
[18:59:08] Analytics, Data-Engineering: Results have expired error in Hue - https://phabricator.wikimedia.org/T294144 (EYener) Thanks for taking a look @odimitrijevic! Here is one where both @JMando and I received the `Results have expired, rerun the query if needed.` error in Hue last Friday: `SELECT count(event....
[20:04:22] Analytics: Check home/HDFS leftovers of tonina - https://phabricator.wikimedia.org/T293676 (WMDE-leszek) @Ottomata yes, please!
[20:41:57] Analytics, Analytics-Kanban: Kerberos identity for cicalese - https://phabricator.wikimedia.org/T293850 (CCicalese_WMF) Thank you!
[20:52:41] Analytics, Analytics-Kanban: Kerberos identity for cicalese - https://phabricator.wikimedia.org/T293850 (Ottomata) Open→Resolved
[23:32:42] Analytics-Radar, Fundraising-Backlog, Product-Analytics, Wikipedia-iOS-App-Backlog, and 2 others: Understand impact of Apple's Relay Service - https://phabricator.wikimedia.org/T289795 (nettrom_WMF)
[23:33:51] Analytics-Radar, Fundraising-Backlog, Product-Analytics, Wikipedia-iOS-App-Backlog, and 2 others: Understand impact of Apple's Relay Service - https://phabricator.wikimedia.org/T289795 (nettrom_WMF)
[23:49:50] (CR) Clare Ming: "waiting for naming convention resolution before updating schema name again" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/731156 (https://phabricator.wikimedia.org/T292586) (owner: Clare Ming)