[01:47:45] Data-Engineering (Sprint 9), Data Products, Movement-Insights, Movement-Metrics, Patch-For-Review: Skip Wikidata when loading XML dumps to the Data Lake - https://phabricator.wikimedia.org/T357859#9664218 (nshahquinn-wmf) I've updated the documentation on [wikitech:Analytics/Data L...
[03:50:27] (PS6) Snwachukwu: Mediawiki History Data Quality Metrics [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1008934 (https://phabricator.wikimedia.org/T354692)
[04:07:11] (CR) TChin: development: add webrequest schema (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/983898 (https://phabricator.wikimedia.org/T314956) (owner: Ottomata)
[07:42:43] (CR) Gmodena: Mediawiki History Data Quality Metrics (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1008934 (https://phabricator.wikimedia.org/T354692) (owner: Snwachukwu)
[07:59:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[08:24:03] joal: ping me if you want more context on these CRs. I tried to split things up to make them easier to review, but there is quite a bit of noise...
[09:54:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[10:12:36] Data-Engineering, Data-Platform-SRE (2024.03.25 - 2024.04.14), Patch-For-Review: Update the From: addresses of all email from DPE pipelines so that they use routable addresses - https://phabricator.wikimedia.org/T358675#9664676 (BTullis) I can confirm receipt of a new refinemonitor report, using the...
[10:44:11] hello, an-worker1096 is reported to not be in puppetdb by a Netbox report, I can't access it via mgmt. Hosts should not be in this state for a longer period of time, please have a look and either power it off, reimage it or decommission it.
[10:57:55] volans: Will do. Thanks for the heads-up.
[11:08:16] Data-Engineering: [Developer Experience] Implement CI hql Linting - https://phabricator.wikimedia.org/T360967#9664873 (Antoine_Quhen) https://docs.sqlfluff.com/en/stable/dialects.html#hive
[11:26:23] ack, thx
[11:31:13] (PS1) Mforns: Productionize CommonsCategoryGraphBuilder for CIM project [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1015013 (https://phabricator.wikimedia.org/T358681)
[11:50:26] Data-Engineering, Event-Platform: [NEEDS GROOMING] Orchestrate gobblin ingestion task with Airflow - https://phabricator.wikimedia.org/T361094 (gmodena) NEW
[13:31:19] Data-Engineering (Sprint 9), Patch-For-Review: [Refine Refactoring] Refactor refinery code for compatibility with Airflow integration - https://phabricator.wikimedia.org/T356363#9665322 (BTullis) I had a question about this, because I just found out that refinery has its own email sending code. (re: http...
[13:38:56] Data-Engineering, Event-Platform, Patch-For-Review: [Event Platform] Declare webrequest as an Event Platform stream - https://phabricator.wikimedia.org/T314956#9665337 (BTullis) Hello. FYI we are receiving some alerts about failed produce_canary_events jobs, due to being unable to find the webrequest...
[13:47:15] Data-Engineering (Sprint 9), Patch-For-Review: [Refine Refactoring] Refactor refinery code for compatibility with Airflow integration - https://phabricator.wikimedia.org/T356363#9665385 (JAllemandou) >>! In T356363#9665322, @BTullis wrote: > Would we still want this integrated email functionality within...
[13:47:29] Data-Engineering (Sprint 9), Data Products, Movement-Insights, Movement-Metrics, Patch-For-Review: Skip Wikidata when loading XML dumps to the Data Lake - https://phabricator.wikimedia.org/T357859#9665388 (JAllemandou) Thanks a lot @nshahquinn-wmf :)
[13:50:31] Data-Engineering, Data Products, MediaWiki-extensions-WikimediaEvents, Patch-For-Review, Web-Team-Backlog (FY2023-24 Q4 Sprint 1): Update mediawiki.web_ui_actions Stream Config - https://phabricator.wikimedia.org/T360955#9665406 (ovasileva) p:Triage→High
[13:58:10] Data-Engineering, Data Pipelines: Refine jobs should be scheduled by Airflow - https://phabricator.wikimedia.org/T307505#9665457 (BTullis) We (#data-platform-sre) have been working on updating the alerting system so that all emails sent by automated monitoring systems use //routable domains//. This work...
[13:59:03] (PS1) Gehel: Sort some refinery modules according to sortPom. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1015035 (https://phabricator.wikimedia.org/T360219)
[14:02:03] Analytics-Radar, Data-Engineering-Icebox, Web-Team-Backlog: % of "none" referers seems too high - https://phabricator.wikimedia.org/T195880#9665485 (ovasileva) Open→Declined Closing this as it seems it hasn't been updated in the past four years
[14:50:14] Data-Engineering (Sprint 9): We should provide DQ integration with Python - https://phabricator.wikimedia.org/T353940#9665686 (xcollazo) This is looking pretty cool!
[15:02:15] !log dropping the hue.wikimedia.org CNAME - T341895
[15:02:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:02:18] T341895: Deprecate Hue and stop the services - https://phabricator.wikimedia.org/T341895
[15:02:39] Data-Engineering, Data-Platform-SRE (2024.03.25 - 2024.04.14), Patch-For-Review: Update the From: addresses of all email from DPE pipelines so that they use routable addresses - https://phabricator.wikimedia.org/T358675#9665775 (BTullis) Open→Resolved I believe that this is all don...
[15:04:45] Data-Engineering, Data Products, Data-Platform, Movement-Insights: New Data Pipeline for New and Returning Active Editor metrics by Geo (Country & Region) and wiki - https://phabricator.wikimedia.org/T359646#9665791 (VirginiaPoundstone) [[ https://wikimedia.slack.com/archives/C05KTS2S1J4/p1711374...
[15:07:06] (PS2) Mforns: Productionize CommonsCategoryGraphBuilder for CIM project [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1015013 (https://phabricator.wikimedia.org/T358681)
[15:10:46] (PS3) Mforns: Productionize CommonsCategoryGraphBuilder for CIM project [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1015013 (https://phabricator.wikimedia.org/T358681)
[15:14:49] !log decommissioning an-tool1009 now that hue is fully offline - T341895
[15:14:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:14:52] T341895: Deprecate Hue and stop the services - https://phabricator.wikimedia.org/T341895
[15:17:39] all alerts related to AQS are now gone from alerts.wikimedia.org. I've had to systemctl stop, disable and reset-failed aqs.service on all aqs hosts
[15:18:28] brouberol: Much wikilove to you for killing hue :)
[15:18:44] my pleasure :)
[15:18:52] brouberol: I don't really understand the message about aqs alerts - would you explain a bit more please?
[15:20:11] basically, I've decommissioned AQS 1.0 this week. However, AQS was a bit of a baroque setup for WMF: we had 2 services running on the same hosts: aqs and cassandra. Meaning that I disabled the service but left the hosts alive, to keep cassandra
[15:20:33] the aqs systemd service was however in a failed state, causing alerts, that I had silenced
[15:20:45] ok, makes sense brouberol - thank you for the details :)
[15:21:03] I finally came around and understood what was happening, reset the failure counter of the aqs service after having force stopped them all, and the alert went quiet
[15:23:31] ack brouberol - thanks so much
[15:23:42] Data-Engineering, Data-Platform-SRE (2024.03.25 - 2024.04.14), Patch-For-Review: Deprecate Hue and stop the services - https://phabricator.wikimedia.org/T341895#9665890 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by brouberol@cumin2002 for hosts: `an-tool1009.eqiad.wmnet` - an-tool...
[15:23:50] again, my pleasure
[15:30:04] Data-Engineering, MediaWiki-extensions-WikimediaEvents, Data Products (Data Products Sprint 11), Patch-For-Review, Web-Team-Backlog (FY2023-24 Q4 Sprint 1): Update mediawiki.web_ui_actions Stream Config - https://phabricator.wikimedia.org/T360955#9665928 (phuedx)
[15:32:36] Analytics-Radar, Data-Engineering, Data Products, Metrics Platform Backlog: mw.user.generateRandomSessionId should return a UUID - https://phabricator.wikimedia.org/T266813#9665933 (VirginiaPoundstone) @Ottomata and @lbowmaker I think this is a library that Data Engineering owns? Should this get...
[15:40:11] Data-Engineering, Data-Engineering-Wikistats, Data Products, I18n, Patch-For-Review: Wikistats 2 should translate month names and abbreviations - https://phabricator.wikimedia.org/T336815#9665961 (VirginiaPoundstone)
[15:41:29] Data-Engineering, Data-Engineering-Wikistats, Data Pipelines, Data Products: Wikistats in Uzbek - https://phabricator.wikimedia.org/T314477#9665965 (VirginiaPoundstone) Open→Resolved Main task is done. Marking as resolved.
[15:45:13] Analytics, Data Products: Add cawiki to clickstream dataset - https://phabricator.wikimedia.org/T327982#9665976 (VirginiaPoundstone) This require privacy review. @Htriedman would this be you? Once reviewed we can prioritize for implementation
[15:52:12] Data-Engineering, Data-Engineering-Wikistats, Data Products, Movement-Insights: arywiki view stats too low for agent = user? - https://phabricator.wikimedia.org/T359004#9665996 (VirginiaPoundstone)
[15:54:20] Data-Engineering, Data-Engineering-Wikistats, Data Products, Movement-Insights: arywiki view stats too low for agent = user? - https://phabricator.wikimedia.org/T359004#9666021 (VirginiaPoundstone) @Mayakp.wiki and @nshahquinn-wmf added this to #movement-insights so you can check for data quality...
[15:58:24] Data-Engineering, Data-Engineering-Wikistats, Data Products, Movement-Insights: arywiki view stats too low for agent = user? - https://phabricator.wikimedia.org/T359004#9666035 (VirginiaPoundstone) @Maurusian I am removing the wikistats tag on this task since this is a data quality question. Plea...
[15:59:22] (PS1) Gehel: Correct stlye issues with spotless. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1015075
[16:03:01] (CR) Gehel: [C:-1] "We need to discuss the risk related to this change before merging!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1015075 (owner: Gehel)
[16:06:42] Data-Engineering, MediaWiki-extensions-WikimediaEvents, Data Products (Data Products Sprint 11), Patch-For-Review, Web-Team-Backlog (FY2023-24 Q4 Sprint 1): Update mediawiki.web_ui_actions Stream Config - https://phabricator.wikimedia.org/T360955#9666088 (VirginiaPoundstone)
[16:06:50] Data-Engineering, MediaWiki-extensions-WikimediaEvents, Data Products (Data Products Sprint 11), Patch-For-Review, Web-Team-Backlog (FY2023-24 Q4 Sprint 1): Update mediawiki.web_ui_actions Stream Config - https://phabricator.wikimedia.org/T360955#9666089 (VirginiaPoundstone)
[16:53:32] joal: I have a scary patch for you! https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1015075
[17:20:24] * joal is scared X)
[18:56:43] (PS15) Gmodena: development: add webrequest schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/983898 (https://phabricator.wikimedia.org/T314956) (owner: Ottomata)
[19:00:07] (CR) Gmodena: development: add webrequest schema (2 comments) [schemas/event/primary] - https://gerrit.wikimedia.org/r/983898 (https://phabricator.wikimedia.org/T314956) (owner: Ottomata)
[19:00:19] (CR) Gmodena: [C:+2] development: add webrequest schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/983898 (https://phabricator.wikimedia.org/T314956) (owner: Ottomata)
[19:00:48] (Merged) jenkins-bot: development: add webrequest schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/983898 (https://phabricator.wikimedia.org/T314956) (owner: Ottomata)
[19:22:31] (CR) Snwachukwu: Mediawiki History Data Quality Metrics (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1008934 (https://phabricator.wikimedia.org/T354692) (owner: Snwachukwu)
[19:32:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[20:58:23] (PS4) Gmodena: Add gobblin job webrequest_frontend_rc0 [analytics/refinery] - https://gerrit.wikimedia.org/r/983926 (https://phabricator.wikimedia.org/T314956) (owner: Ottomata)
[21:00:46] (CR) Joal: [C:+2] "LGTM!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1014004 (https://phabricator.wikimedia.org/T358675) (owner: Btullis)
[21:11:34] (Merged) jenkins-bot: Update the from address of refine reports to be routable [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1014004 (https://phabricator.wikimedia.org/T358675) (owner: Btullis)
[21:37:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[22:09:39] (PS1) Cooltey: Clean up old MobileWikiApp schemas [analytics/refinery] - https://gerrit.wikimedia.org/r/1015126 (https://phabricator.wikimedia.org/T360579)
[22:12:36] (PS2) Cooltey: Clean up old MobileWikiApp schemas [analytics/refinery] - https://gerrit.wikimedia.org/r/1015126 (https://phabricator.wikimedia.org/T360579)
[22:26:25] Data-Engineering (Sprint 9), Event-Platform: ProduceCanaryEvents job should be scheduled by Airflow and/or a k8s service - https://phabricator.wikimedia.org/T341229#9667372 (Ahoelzl) a:JAllemandou
[22:46:11] Analytics-Radar, Data-Engineering, Data Products, Metrics Platform Backlog: mw.user.generateRandomSessionId should return a UUID - https://phabricator.wikimedia.org/T266813#9667409 (Ottomata) > I think this is a library that Data Engineering owns? @VirginiaPoundstone I don't think so. I believe `...
[23:06:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage