[01:16:06] (PS1) Milimetric: Migrate pageview_hourly and related jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/887869 (https://phabricator.wikimedia.org/T324482)
[07:02:45] Data-Engineering, DBA, Data-Persistence, Discovery-Search, and 9 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[07:58:03] (CR) Joal: "I don't see why this would be a problem but I'm no git-fat expert. Letting @otto merge." [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/887000 (https://phabricator.wikimedia.org/T328473) (owner: Chad)
[07:58:08] (CR) Joal: [C: +1] Drop vestiges of git-fat [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/887000 (https://phabricator.wikimedia.org/T328473) (owner: Chad)
[08:16:52] (CR) Joal: "Minor comments to discuss" [analytics/refinery] - https://gerrit.wikimedia.org/r/887869 (https://phabricator.wikimedia.org/T324482) (owner: Milimetric)
[08:18:12] Hi team - I plan on finalizing the deploy of a dataset change this morning - I'll start in 40 mins (10am my time) - Anyone interested in pairing with me, please ping :)
[08:22:57] Deploy delayed by 15 mins to wait for nfraison :)
[08:49:49] Data-Engineering, CheckUser, MW-1.38-notes (1.38.0-wmf.26; 2022-03-14), MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), and 4 others: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004 (Zabe)
[09:03:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4044 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4044%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[09:08:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4044 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4044%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[09:42:35] Data-Engineering-Planning, Observability-Alerting, SRE, Traffic, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 08): Reduce/eliminate false positives for VarnishKafkaNoMessages alert - https://phabricator.wikimedia.org/T324522 (nfraison) False alert has still been reported today in (Var...
[09:51:59] I'm late, but now is the time :) Starting to deploy the pageview-learning change with nfraison
[09:59:17] !log Kill oozie pageview-learning jobs
[09:59:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:01:20] !log Move data and update hive tables from learning/actor convention to webrequest_actor convention
[10:01:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:10:17] PROBLEM - Check systemd state on an-airflow1005 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_airflow-kerberos@search.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:10:46] !log Merge airflow code for learning/actor -> webrequest_actor move
[10:10:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:25:32] !log Setup airflow start-date variables for new dags
[10:25:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:26:39] !log Deploy analytics-airflow
[10:26:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:36:54] !log Start airflow webrequest_actor jobs
[10:36:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:41:17] FYI, I'm installing containerd security updates on the dse-k8s cluster (these are already live on the main wikikube cluster and don't affect running pods)
[10:47:35] jennifer_ebe: not sure if you saw joal's ping earlier - he's doing a more complicated deploy, it might be fun to follow along
[10:47:50] Hi milimetric: mostly done :S
[10:51:27] milimetric: if you have a minute of brain power despite the early time - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/229
[10:51:35] otherwise I'll merge myself :)
[10:54:32] looking joal
[10:54:47] milimetric: I set it up to merge when the tests finish
[10:57:00] joal: I thought that looked weird when I first saw it but figured there was some weird (a,b] interval thing going on
[10:57:26] absolutely right - I tried to explain in the comment, probably not clearly enough :)
[10:57:33] milimetric: --^
[11:04:59] Data-Engineering-Planning, Observability-Alerting, SRE, Traffic, and 2 others: Reduce/eliminate false positives for VarnishKafkaNoMessages alert - https://phabricator.wikimedia.org/T324522 (nfraison) From those graphs we can see that no requests have been received on the varnish which leads to no...
[11:09:53] Data-Engineering-Planning, Observability-Alerting, SRE, Traffic, and 2 others: Reduce/eliminate false positives for VarnishKafkaNoMessages alert - https://phabricator.wikimedia.org/T324522 (nfraison) The drop is indeed due to a depool ` 09:09 pool cp4044 with ESI testing enabled...
[11:16:29] (PS2) Milimetric: Migrate pageview_hourly and related jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/887869 (https://phabricator.wikimedia.org/T324482)
[11:17:21] joal: no that new interval comment made sense. btw, thx for the review, I applied the changes. The only thing I wasn't quite sure about was the destructure/restructure of the user agent map, do you have any better ideas there?
[11:18:00] https://gerrit.wikimedia.org/r/c/analytics/refinery/+/887869/2/hql/pageview/hourly/aggregate_pageview_actor_to_pageview_hourly.hql#45
[11:31:28] Data-Engineering, Equity-Landscape: Access input metrics - https://phabricator.wikimedia.org/T324968 (KCVelaga_WMF) a:KCVelaga_WMF→JAnstee_WMF @JAnstee_WMF Access inputs QC is complete ([[ https://docs.google.com/spreadsheets/d/1RsO2BBNSK2iQp45Gx1wZnFH4gKNaPNkMuAJgS0suyr0/edit?pli=1#gid=0&range=N...
[11:32:09] Data-Engineering, Equity-Landscape: Access output metrics - https://phabricator.wikimedia.org/T329185 (KCVelaga_WMF) a:KCVelaga_WMF→JAnstee_WMF @JAnstee_WMF Access outputs QC is complete ([[ https://docs.google.com/spreadsheets/d/1RsO2BBNSK2iQp45Gx1wZnFH4gKNaPNkMuAJgS0suyr0/edit?pli=1#gid=1915485...
[11:39:51] moritzm: Ack, thanks.
[12:01:32] !log Shutting down an-worker109[89] and dse-k8s-worker1002 for another GPU move - T318696
[12:01:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:01:35] T318696: Move some GPUs from Hadoop to the DSE-K8S cluster - https://phabricator.wikimedia.org/T318696
[12:22:33] Data-Engineering-Planning, Observability-Alerting, SRE, Traffic, and 2 others: Reduce/eliminate false positives for VarnishKafkaNoMessages alert - https://phabricator.wikimedia.org/T324522 (nfraison) I've looked back at the alerts we have faced on the 7th morning and those ones were due to a rol...
[12:41:28] Meh - Another bug in my deployed code today
[12:43:10] joal: Another bug discovered :-) It's a win!
[12:43:43] milimetric: No better idea than the way you did it - I would otherwise do it using map_keys and map_values - your solution feels as good as mine
[12:43:47] True btullis :)
[12:44:28] (PS10) Kosta Harlan: image-suggestions-feedback: Bump to version 2.0.0 [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925)
[12:44:32] (CR) Kosta Harlan: image-suggestions-feedback: Bump to version 2.0.0 (6 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) (owner: Kosta Harlan)
[12:45:03] (CR) CI reject: [V: -1] image-suggestions-feedback: Bump to version 2.0.0 [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) (owner: Kosta Harlan)
[13:10:04] Data-Engineering, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)): Some users' presto queries are no longer working in Superset - https://phabricator.wikimedia.org/T328152 (awight) My SQL Lab doesn't work either. I tried to log out of superset but the Logout menu brings me back to the same page,...
[13:12:01] (PS11) Kosta Harlan: image-suggestions-feedback: Bump to version 2.0.0 [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925)
[13:27:14] Data-Engineering, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)): Some users' presto queries are no longer working in Superset - https://phabricator.wikimedia.org/T328152 (BTullis) @awight - I have added the `sql_lab` role to your account, so it should work now. Apologies for the inconvenience....
[13:32:50] (CR) Ottomata: "Do we use git-fat to deploy artifacts with this repository? Joal, how are hdfs-tools deployed to e.g. the labstore servers for hdfs rsync" [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/887000 (https://phabricator.wikimedia.org/T328473) (owner: Chad)
[13:38:55] (CR) Joal: [C: -1] "I triple checked after Andrew's comment, and indeed we use git-fat to deploy the artifact. I should have been more careful when answering " [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/887000 (https://phabricator.wikimedia.org/T328473) (owner: Chad)
[13:41:53] Data-Engineering, Event-Platform Value Stream, Machine-Learning-Team: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (elukey) Thanks a lot for the details! We'll let the Research and Search team decide what's best, but we (as ML team) have discussed it a...
[13:45:31] (CR) Ottomata: "Right, yeah, we do use git-fat to rsync artifacts from archiva, for this repo and for analytics/refinery." [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/887000 (https://phabricator.wikimedia.org/T328473) (owner: Chad)
[13:46:35] (CR) Ottomata: "artifact syncing doc: https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils/-/tree/main#artifact-module" [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/887000 (https://phabricator.wikimedia.org/T328473) (owner: Chad)
[14:00:27] Analytics, Data-Engineering, Patch-For-Review: Fix broken image on front page of analytics.wikimedia.org - https://phabricator.wikimedia.org/T327687 (Aklapper) p:Triage→Medium a:Aklapper
[14:39:13] (CR) Milimetric: [V: +2 C: +2] Correct invalid URL of image on frontpage of analytics.wikimedia.org [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/882714 (https://phabricator.wikimedia.org/T327687) (owner: Aklapper)
[14:40:21] Analytics, Data-Engineering, Patch-For-Review: Fix broken image on front page of analytics.wikimedia.org - https://phabricator.wikimedia.org/T327687 (Milimetric) Thanks for catching that! Merged - it will auto-deploy
[14:40:54] joal: need help with this other bug? I see a few failed instances
[14:41:11] Data-Engineering, MediaWiki-extensions-EventLogging: Determine what UserBucketService::getUserEditCountBucket should return for anons - https://phabricator.wikimedia.org/T329292 (phuedx)
[14:42:06] hey milimetric I'm investigating some problems - could do with some help
[14:42:11] omw cave
[15:10:49] Data-Engineering, Event-Platform Value Stream, Machine-Learning-Team: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (Isaac) Not to further muddy the water, but I'm just realizing that this model could also be triggered with just page-links changes as th...
[15:13:50] joal: it does! It's just at the root of the partition path. So check this out for your new dataset: /wmf/data/wmf/webrequest_actor/label/hourly/_SUCCESS
[15:31:59] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 08), Patch-For-Review: Streaming services errors should be routed to an error event topic. - https://phabricator.wikimedia.org/T326536 (Ottomata) a:Ottomata
[15:36:07] Analytics, Data-Engineering: Fix broken image on front page of analytics.wikimedia.org - https://phabricator.wikimedia.org/T327687 (Aklapper) Open→Resolved
[15:39:54] joal: I noticed in transform_projectview (similar to the job I'm migrating) we're not compressing in spark, but instead I guess letting the archive task do that in airflow? Any preference? I kind of prefer compressing early so we write less/move less.
[15:40:19] https://github.com/wikimedia/analytics-refinery/blob/master/hql/projectview/hourly/transform_projectview_to_legacy_format.hql
[15:54:59] milimetric: if we don't compress here, I don't think we compress in the archive task
[15:59:59] ah, indeed, that job just doesn't compress at all. Ok... well, I guess that nobody uses that data or nobody cares :)
[16:00:11] they're tiny files anyway
[16:35:47] Data-Engineering-Icebox: Improve Bot Detection Heuristics - https://phabricator.wikimedia.org/T310846 (Isaac) I recently discovered the `public_cloud` key in [[https://wikitech.wikimedia.org/wiki/X-Analytics|X-Analytics]]. If I understand it correctly, it might be a simple additional heuristic to just label...
[16:50:43] hey milimetric can you help me a sec? There was a bug in my deployed code, and I created a fix. Can you review and approve if OK, please? https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/233
[16:52:21] in pa sync mforns, but I'll check it in a min.
[17:00:27] milimetric: no worries!
[17:01:13] milimetric: sorry I totally missed the pa meeting, was deploying and got this error, and was trying to fix it
[17:01:53] looks good mforns, merged
[17:07:16] thank you milimetric!
[17:14:46] (PS7) Aqu: Remove Guava from dependency [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072)
[17:27:46] Data-Engineering: Deprecate old mobile datasets - https://phabricator.wikimedia.org/T329310 (JAllemandou)
[17:28:34] Data-Engineering: Deprecate old mobile datasets - https://phabricator.wikimedia.org/T329310 (JAllemandou)
[17:30:43] milimetric: I'm about to redeploy airflow, and I see there are changes to analytics/dags/pageview/pageview_actor_hourly_dag.py. Can I deploy that too?
[17:31:04] joal: ^
[17:31:07] I mean, I imagine I can, just letting you know
[17:31:21] I think that's Jo's change, I think it was deployed manually...
[17:31:41] ok ok
[17:32:10] please mforns - all good, the changes are just naming and they'll be useful
[17:32:21] !log deployed airflow
[17:32:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:40:57] Data-Engineering, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)): Some users' presto queries are no longer working in Superset - https://phabricator.wikimedia.org/T328152 (jwang) @BTullis, thanks for the followup. My account works now.
[18:02:36] Data-Engineering: Deprecate old mobile datasets - https://phabricator.wikimedia.org/T329310 (SNowick_WMF) Confirming these tables are not in use and ok to deprecate.
[18:07:39] (CR) Milimetric: Migrate pageview_hourly and related jobs (4 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/887869 (https://phabricator.wikimedia.org/T324482) (owner: Milimetric)
[18:52:20] Data-Engineering, Event-Platform Value Stream, Machine-Learning-Team: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (Ottomata) Interesting. I think the output score data model could still be an entity change based model, but the input wouldn't be revis...
[19:30:38] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): migrate mjolnir application and dag to airflow v2 and spark3 - https://phabricator.wikimedia.org/T329239 (EBernhardson)
[19:32:46] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): migrate mjolnir application and dag to airflow v2 and spark3 - https://phabricator.wikimedia.org/T329239 (EBernhardson) The mjolnir repo has been migrated to [[ https://gitlab.wikimedia.org/repos/search-platform/mjolnir | git...
[20:12:09] Analytics-Radar, Analytics-Wikistats, Data-Engineering: WiViVi Broken in Firefox 50 (Linux only) - https://phabricator.wikimedia.org/T172304 (Aklapper)
[20:17:07] Analytics-Radar, Analytics-Wikistats, Data-Engineering: WiViVi Broken in Firefox 50 (Linux only) - https://phabricator.wikimedia.org/T172304 (Aklapper) Open→Stalled This task lists browser console output, but what exactly is "broken" and how does the brokenness show?
[20:30:42] Data-Engineering, Project-Admins, PM: Archive Analytics tag - https://phabricator.wikimedia.org/T298671 (Aklapper) This task got opened a year ago. @odimitrijevic, @JArguello-WMF: Could this please see decisions? * There are 218 open #Analytics tasks in https://phabricator.wikimedia.org/maniphest/qu...
[20:55:00] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): Create airflow v2 instance and supporting repos for search platform - https://phabricator.wikimedia.org/T327970 (Antoine_Quhen) I think the Airflow upgrade + Postgres migration deployment for our two instances is a matter of...
[21:07:09] (PS8) Aqu: Remove Guava from dependency [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072)
[21:32:24] Data-Engineering-Planning, Data Pipelines, Pageviews-Anomaly, Wikipedia-iOS-App-Backlog, and 6 others: Analyze possible bot traffic for enwiki article Index (statistics), Index & XXX:_Return_of_Xander_Cage - https://phabricator.wikimedia.org/T328127 (SNowick_WMF) Findings thus far for `Index (sta...
[21:49:39] Data-Engineering, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0:Wikistats 2 service - https://phabricator.wikimedia.org/T288301 (VirginiaPoundstone)