[01:19:14] Data-Engineering, Platform Engineering: Deploy AQS service to codfw clusters - https://phabricator.wikimedia.org/T309808 (Eevans)
[01:19:16] Data-Engineering-Radar, Cassandra, Generated Data Platform: AQS multi-datacenter cluster expansion - https://phabricator.wikimedia.org/T307641 (Eevans)
[01:22:42] Data-Engineering-Radar, Cassandra, Generated Data Platform: AQS multi-datacenter cluster expansion - https://phabricator.wikimedia.org/T307641 (Eevans)
[01:22:46] Data-Engineering, Platform Engineering: Deploy AQS service to codfw clusters - https://phabricator.wikimedia.org/T309808 (Eevans)
[06:31:28] !log restart memcached on an-tool1005 to pick up puppet settings and clear an alert in icinga
[06:31:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:35:57] Data-Engineering, CheckUser, MW-1.38-notes (1.38.0-wmf.26; 2022-03-14), MW-1.39-notes (1.39.0-wmf.15; 2022-06-06), and 3 others: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004 (dom_walden) >>! In T233004#7975137, @Zabe wrote: >>>! In T233004#7972925, @dom...
[08:54:31] Data-Engineering, Data-Catalog, SRE, serviceops, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) >>! In T303049#7976511, @JMeybohm wrote: > > Sorry for nudging @BTullis - do you miss any information or need any assistance regarding the remaining s...
[09:33:05] RECOVERY - Check unit status of monitor_refine_eventlogging_legacy on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_legacy https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:35:23] Data-Engineering, Data-Engineering-Kanban: Check home/HDFS leftovers of razzi - https://phabricator.wikimedia.org/T309000 (BTullis) I have carried out this removal of files. ` btullis@cumin1001:~$ sudo cumin 'C:profile::analytics::cluster::client or C:profile::hadoop::master or C:profile::hadoop::master:...
[09:41:48] Data-Engineering, Data-Engineering-Kanban: Check home/HDFS leftovers of razzi - https://phabricator.wikimedia.org/T309000 (BTullis) Open→Resolved
[09:45:04] Data-Engineering, Data-Engineering-Kanban: RAID battery malfunction in an-worker1081 - https://phabricator.wikimedia.org/T308267 (BTullis)
[09:45:28] Data-Engineering, Data-Engineering-Kanban: RAID battery malfunction in an-worker1081 - https://phabricator.wikimedia.org/T308267 (BTullis) Open→Resolved
[09:47:24] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Fix turnilo after upgrade - https://phabricator.wikimedia.org/T308778 (BTullis) Still no news of resolution from the upstream authors of turnilo on their Slack: https://turnilo.slack.com/archives/CEQMX06NB/p1652789689569899
[09:59:59] Data-Engineering-Kanban, Data-Catalog: User Experience: Authentication - https://phabricator.wikimedia.org/T307711 (BTullis) I've investigated this a little and I can't yet find any cause for this. Here are the database references to `echetty`: ` MariaDB [datahub]> select * from metadata_aspect_v2 where...
[10:18:28] Data-Engineering, Data-Engineering-Kanban, Airflow: Improve monitoring for airflow-scheduler services - https://phabricator.wikimedia.org/T307739 (BTullis) p:Triage→Medium The monitoring is already more comprehensive than we thought, so nothing remains to be done for this task. See T307102#792...
[10:18:49] Data-Engineering, Data-Engineering-Kanban, Airflow: Improve monitoring for airflow-scheduler services - https://phabricator.wikimedia.org/T307739 (BTullis) a:BTullis
[11:05:13] Data-Engineering, Data-Engineering-Kanban, SRE, Traffic, and 2 others: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (BTullis) I'm a bit confused by the state of things now. 1) Has the update to service-runner 3.1.0 be...
[11:11:36] Data-Engineering, Data-Engineering-Kanban, SRE, Traffic, and 2 others: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (akosiaris) >>! In T306181#7982366, @BTullis wrote: > I'm a bit confused by the state of things now. >...
[11:25:36] Data-Engineering, Data-Engineering-Kanban, SRE, Traffic, and 2 others: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (BTullis) Great, thanks for the summary @akosiaris - So the reduction in replicas alone explains the s...
[12:24:42] Data-Engineering, Data-Engineering-Kanban: Add the conftool pooled/depooled status and weight into prometheus for each service - https://phabricator.wikimedia.org/T309189 (BTullis) I've pushed what I believe will be a working confd template for this, but I'm unsure what to do about the rspec tests that I...
[12:36:20] Data-Engineering-Kanban, Data-Catalog: User Experience: Authentication - https://phabricator.wikimedia.org/T307711 (BTullis) Open→Resolved Confirmed, logging in with a lower case username works for @EChetty. I have added a note to this page about the requirement to use a lower case username as w...
[13:06:04] Data-Engineering, Data-Engineering-Kanban: Analytics Data Lake - Hadoop Namenode failure - standby namenode backups filled up namenode data partition - https://phabricator.wikimedia.org/T309649 (BTullis) Regarding the alerts, it did send IRC alerts but only to #wikimedia-operations and not #wikimedia-ana...
[13:06:14] Data-Engineering, Data-Engineering-Kanban: Analytics Data Lake - Hadoop Namenode failure - standby namenode backups filled up namenode data partition - https://phabricator.wikimedia.org/T309649 (BTullis)
[13:31:36] Data-Engineering, Data-Engineering-Kanban: Analytics Data Lake - Hadoop Namenode failure - standby namenode backups filled up namenode data partition - https://phabricator.wikimedia.org/T309649 (BTullis) I have double-checked that the backups are correctly configured for the namenode fsimage backups. Fro...
[13:31:59] Data-Engineering, Data-Engineering-Kanban: Analytics Data Lake - Hadoop Namenode failure - standby namenode backups filled up namenode data partition - https://phabricator.wikimedia.org/T309649 (BTullis)
[13:45:35] !log restarting archiva service for new JRE
[13:45:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:25:25] Data-Engineering, Data-Engineering-Kanban: Create `research` hive user - https://phabricator.wikimedia.org/T309922 (odimitrijevic) p:Triage→High
[14:25:59] Data-Engineering, Data-Engineering-Kanban: Create `research` hive user - https://phabricator.wikimedia.org/T309922 (odimitrijevic)
[14:31:36] Data-Engineering, Data-Engineering-Kanban: Create `research` hive user - https://phabricator.wikimedia.org/T309922 (fkaelin) We went ahead and manually created a `knowledge_gaps` database in hive. We haven't verified, but assuming that the `analytics-research` user can read&write from this database, I w...
[14:32:35] Data-Engineering, Data-Engineering-Kanban: Create `research` hive user - https://phabricator.wikimedia.org/T309922 (fkaelin) Open→Resolved a:fkaelin
[14:33:10] Data-Engineering, Data-Persistence (Consultation): Move Mediawiki QueryPages computation to Hadoop - https://phabricator.wikimedia.org/T309738 (odimitrijevic) @Milimetric In order to evaluate impact of doing this work do we have info on how frequently these queries run, the duration and resource allocati...
[14:34:36] Data-Engineering, Data-Persistence (Consultation): Move Mediawiki QueryPages computation to Hadoop - https://phabricator.wikimedia.org/T309738 (Milimetric) >>! In T309738#7982848, @odimitrijevic wrote: > @Milimetric In order to evaluate impact of doing this work do we have info on how frequently these qu...
[14:37:31] Data-Engineering, Data-Persistence (Consultation): Move Mediawiki QueryPages computation to Hadoop - https://phabricator.wikimedia.org/T309738 (Milimetric)
[14:37:49] Data-Engineering, Data-Persistence (Consultation): Move Mediawiki QueryPages computation to Hadoop - https://phabricator.wikimedia.org/T309738 (Milimetric)
[14:38:46] Data-Engineering, Data-Catalog: DataHub rights assignment is case-sensitive - https://phabricator.wikimedia.org/T309382 (odimitrijevic) p:Triage→High
[14:44:40] Data-Engineering, Data-Engineering-Kanban: Mediawiki History delayed 2022-06 - https://phabricator.wikimedia.org/T309987 (Milimetric)
[15:15:43] Data-Engineering, Data-Engineering-Kanban, Airflow: Fix airflow interlanguage job - https://phabricator.wikimedia.org/T308766 (Milimetric) a:NOkafor-WMF→JAllemandou
[15:17:32] Data-Engineering, Data-Engineering-Kanban, Airflow: [Airflow] Migrate Oozie's mediawiki_history_load jobs to Airflow - https://phabricator.wikimedia.org/T309718 (mforns)
[15:18:21] Data-Engineering-Kanban, Patch-For-Review: The effect of sqooping large tables on mediawiki history - https://phabricator.wikimedia.org/T309806 (Milimetric) a:Milimetric
[15:18:41] (PS1) Snwachukwu: Update wikidata metrics hql script [analytics/refinery] - https://gerrit.wikimedia.org/r/803306 (https://phabricator.wikimedia.org/T300021)
[15:30:37] (CR) Snwachukwu: [C: +1] Update wikidata metrics hql script [analytics/refinery] - https://gerrit.wikimedia.org/r/803306 (https://phabricator.wikimedia.org/T300021) (owner: Snwachukwu)
[15:48:47] Data-Engineering: Large number of web requests from Iran are likely incorrectly flagged with 'user' agent type - https://phabricator.wikimedia.org/T309710 (JArguello-WMF)
[15:52:11] Data-Engineering, Product-Analytics, SDAW-MediaSearch, Structured-Data-Backlog (Current Work): [M] No data from ptwikinews in event.mediawiki_mediasearch_interaction table - https://phabricator.wikimedia.org/T308815 (JArguello-WMF)
[15:52:36] Data-Engineering, Metrics-Platform, MW-1.39-notes (1.39.0-wmf.12; 2022-05-16): TypeError: navigator.sendBeacon is not a function - https://phabricator.wikimedia.org/T308311 (JArguello-WMF)
[15:53:05] Data-Engineering, EventStreams: EventStreams doesn't show the Wikistories-* streams - https://phabricator.wikimedia.org/T307679 (JArguello-WMF)
[15:53:25] Analytics-Wikistats, Data-Engineering: Feature requests for Active Editors by Country - https://phabricator.wikimedia.org/T304720 (JArguello-WMF)
[15:54:35] Data-Engineering: Upgrade db1108 to Bullseye - https://phabricator.wikimedia.org/T304492 (JArguello-WMF)
[15:55:25] Data-Engineering: Clarify how users can opt-out of intake-analytics - https://phabricator.wikimedia.org/T304426 (JArguello-WMF)
[15:55:47] Data-Engineering, Event-Platform, Metrics-Platform: jsonschema-tools tests should fail if schema $id does not match title or path - https://phabricator.wikimedia.org/T300404 (JArguello-WMF)
[15:56:14] Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Wikistats reports no mobile unique devices for Wikidata and MediaWiki.org - https://phabricator.wikimedia.org/T299559 (JArguello-WMF)
[15:56:41] Data-Engineering, MediaWiki-General: Pingback dashboard data normalisation - https://phabricator.wikimedia.org/T298928 (JArguello-WMF)
[15:58:44] Analytics-Kanban, Data-Engineering, Pageviews-Anomaly: Article on Carles Puigdemont has inflated pageviews in many projects - https://phabricator.wikimedia.org/T263908 (JArguello-WMF)
[16:00:15] Data-Engineering, Data-Engineering-Kanban, Event-Platform, Metrics-Platform, and 2 others: Problem with delay caused by intake-analytics.wikimedia.org - https://phabricator.wikimedia.org/T295427 (JArguello-WMF)
[16:00:58] Data-Engineering: jmads requesting Kerberos password - https://phabricator.wikimedia.org/T250560 (JArguello-WMF)
[16:01:09] Data-Engineering-Radar, Metrics-Platform, MW-1.39-notes (1.39.0-wmf.12; 2022-05-16): TypeError: navigator.sendBeacon is not a function - https://phabricator.wikimedia.org/T308311 (odimitrijevic)
[16:03:02] Data-Engineering, API Platform, Code-Health-Objective, Epic, and 3 others: Implement aggregate endpoint of the pageviews API - https://phabricator.wikimedia.org/T299731 (JArguello-WMF)
[16:03:18] Data-Engineering, SRE: Also intake Network Error Logging events into the Analytics Data Lake - https://phabricator.wikimedia.org/T304373 (JArguello-WMF)
[16:03:45] Data-Engineering, SRE, SRE Observability: dropped packets to kafkamon 9000/tcp - https://phabricator.wikimedia.org/T238794 (JArguello-WMF)
[16:03:56] Analytics-Wikistats, Data-Engineering: Confusing filtering on "Active editors by country" topic - https://phabricator.wikimedia.org/T300365 (JArguello-WMF)
[16:04:24] Analytics-Wikistats, Data-Engineering-Radar, Product-Analytics, Wikipedia-Android-App-Backlog, and 2 others: Wikistats pageview data missing counts for Mobile App pageviews on Commons, going back to 2020-11 - https://phabricator.wikimedia.org/T299439 (JArguello-WMF)
[16:04:50] Data-Engineering, Event-Platform, Observability-Alerting, Patch-For-Review: Apparent latency warning in 90th centile of eventgate-logging-external - https://phabricator.wikimedia.org/T294911 (JArguello-WMF)
[16:05:34] Data-Engineering, Event-Platform, EventStreams: Expose mediawiki/revision/tags-change in stream.wikimedia.org - https://phabricator.wikimedia.org/T294391 (JArguello-WMF)
[16:05:53] Analytics-Wikistats, Data-Engineering, Product-Analytics: Support including edits to deleted pages in editing metrics - https://phabricator.wikimedia.org/T295212 (JArguello-WMF)
[16:06:09] Analytics-Jupyter, Data-Engineering: Autocomplete is very slow (unusable) in Newpyter - https://phabricator.wikimedia.org/T290008 (JArguello-WMF)
[16:06:29] Analytics-Jupyter, Data-Engineering, Product-Analytics: conda list does not show all packages in environment - https://phabricator.wikimedia.org/T294368 (JArguello-WMF)
[16:29:29] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/803306 (https://phabricator.wikimedia.org/T300021) (owner: Snwachukwu)
[16:31:47] Quarry, Patch-For-Review: Pressing the Stop button in Quarry results in a 500 error - https://phabricator.wikimedia.org/T290146 (rook) I believe with the last merge we will have cleared up the 500 errors, so the user experience should be good in this front now. Background issues mentioned in this ticket...
[16:31:56] Quarry, Patch-For-Review: Pressing the Stop button in Quarry results in a 500 error - https://phabricator.wikimedia.org/T290146 (rook) Open→Resolved
[16:32:05] Quarry, Patch-For-Review: Pressing the Stop button in Quarry results in a 500 error - https://phabricator.wikimedia.org/T290146 (rook) a:rook
[17:16:18] Data-Engineering, Data-Engineering-Kanban: Procure MaxMind GeoIP2 Database License - https://phabricator.wikimedia.org/T303453 (odimitrijevic) Open→Resolved The two licenses have been extended to 2023-05-13.
[17:16:20] Data-Engineering: Migrate to MaxMind GeoIP2 - https://phabricator.wikimedia.org/T302989 (odimitrijevic)
[17:17:26] Analytics-Jupyter, Data-Engineering, Infrastructure-Foundations, CAS-SSO, User-MoritzMuehlenhoff: Allow login to JupyterHub via CAS - https://phabricator.wikimedia.org/T260386 (JArguello-WMF)
[17:17:44] Analytics-Jupyter, Data-Engineering, Product-Analytics: Internal nbviewer instance for sharing notebooks among 'wmf' and 'nda' members - https://phabricator.wikimedia.org/T290693 (JArguello-WMF)
[17:18:01] Analytics-Jupyter, Data-Engineering, Product-Analytics: Functionality to share & view notebooks - https://phabricator.wikimedia.org/T156934 (JArguello-WMF)
[17:18:15] Analytics-Jupyter, Data-Engineering: Notebook machine to double as RStudio Server? - https://phabricator.wikimedia.org/T190769 (JArguello-WMF)
[17:21:54] Data-Engineering, Discovery, Event-Platform, SRE, Platform Team Workboards (Clinic Duty Team): Avoid accepting Kafka messages with whacky timestamps - https://phabricator.wikimedia.org/T282887 (JArguello-WMF)
[17:27:11] Analytics, Data-Engineering: Druid loading of navigationtiming gets stuck - https://phabricator.wikimedia.org/T273216 (JArguello-WMF)
[17:28:09] Data-Engineering: Druid loading of navigationtiming gets stuck - https://phabricator.wikimedia.org/T273216 (JArguello-WMF)
[17:30:34] Analytics, Data-Engineering: Archive /home/ezachte data on stat1007 - https://phabricator.wikimedia.org/T238243 (JArguello-WMF)
[17:31:49] Data-Engineering: Archive /home/ezachte data on stat1007 - https://phabricator.wikimedia.org/T238243 (JArguello-WMF)
[17:34:50] update: mw history job finished ok with two small wikis. I'm going to try enwiki real quick
[17:35:35] application_1651744501826_194467 is the enwiki run
[17:37:47] 👍
[17:38:53] mforns: I'm trying it with slightly larger values than the oozie job uses. My guess is that if it works with enwiki, it should work for everything, since everything's partitioned by wiki_db
[17:39:16] Analytics-Wikistats, Data-Engineering: Automate creation of sqoop list of wikis to import data for from sitematrix - https://phabricator.wikimedia.org/T190700 (JArguello-WMF)
[17:39:22] so if enwiki works, I'll just start the full job with these settings. But if it doesn't, wanna brainstorm different settings?
[17:39:26] Data-Engineering, Epic: Upgrade analytics-hadoop to Spark 3 + scala 2.12 - https://phabricator.wikimedia.org/T291464 (JArguello-WMF)
[17:47:08] Analytics, Data-Engineering, Pageviews-API, User-Elukey: Improve user management for AQS Cassandra - https://phabricator.wikimedia.org/T142073 (odimitrijevic) @Eevans Is this request still relevant given the latest AQS plans?
[17:50:04] milimetric: sure!
[18:18:30] Analytics, Data-Engineering, Pageviews-API, User-Elukey: Improve user management for AQS Cassandra - https://phabricator.wikimedia.org/T142073 (Eevans) >>! In T142073#7983633, @odimitrijevic wrote: > @Eevans Is this request still relevant given the latest AQS plans? It is, yeah. It looks like i...
[18:27:13] Analytics, Data-Engineering, Pageviews-API, User-Elukey: Improve user management for AQS Cassandra - https://phabricator.wikimedia.org/T142073 (Eevans)
[18:27:46] Analytics, Data-Engineering, Cassandra, Pageviews-API, User-Elukey: Improve user management for AQS Cassandra - https://phabricator.wikimedia.org/T142073 (Eevans)
[18:41:22] hm, almost an hour and still processing
[18:46:16] mforns: wanna talk for a second about the keytab/principal passing to make sure I understand your last review, and then look at the mw history job?
[18:46:21] (I'll be in the cave)
[18:46:36] milimetric: omw!
[19:16:10] Data-Engineering: jmads requesting Kerberos password - https://phabricator.wikimedia.org/T250560 (jmads) Open→Resolved
[19:43:52] Data-Engineering, Data-Engineering-Kanban: Mediawiki History delayed 2022-05 - https://phabricator.wikimedia.org/T309987 (Milimetric)
[20:00:14] Data-Engineering: Remove unused Gerrit repository - https://phabricator.wikimedia.org/T309731 (Milimetric) I'm not sure how we can remove it, https://www.mediawiki.org/wiki/Gerrit/Inactive_projects seems to say we just mark repositories as "Read Only". Is this enough? Does someone know if we have a more pe...
[20:01:19] Data-Engineering: Remove unused Gerrit repository - https://phabricator.wikimedia.org/T309731 (Aklapper) I don't know but if you want something in Gerrit you should add a #Gerrit tag so Gerrit folks could see it :)
[20:01:28] Data-Engineering: Remove unused Gerrit repository mediawiki/services/aqs/deploy - https://phabricator.wikimedia.org/T309731 (Aklapper)
[21:35:01] (PS3) Bearloga: movement_metrics: Migrate Content Interactions tables and ETL [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/799417 (https://phabricator.wikimedia.org/T308695) (owner: Mayakpwiki)
[21:36:10] (PS4) Bearloga: movement_metrics: Migrate Content Interactions tables and ETL [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/799417 (https://phabricator.wikimedia.org/T308695) (owner: Mayakpwiki)
[21:36:38] (CR) Bearloga: [V: +2 C: +2] movement_metrics: Migrate Content Interactions tables and ETL [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/799417 (https://phabricator.wikimedia.org/T308695) (owner: Mayakpwiki)