[00:14:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage [05:07:53] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage [05:40:22] is there any publicly-available data on the most requested thumbnails? [06:32:18] 10Data-Engineering (Q4 2024 April 1st - June 30th), 06Data-Platform, 13Patch-For-Review: Unique devices tables have missing or incorrect data for January and February 2024 - https://phabricator.wikimedia.org/T361242#9721339 (10JAllemandou) The problem has been fixed. The bug has been introduced when we migra... [06:46:17] FYI: Another ~3-4 hour job on stat1009 from our side. [06:47:15] Running. [07:12:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage [07:22:34] 06Data-Engineering, 06Data-Platform-SRE: Package request: install elixir and erlang-otp to the analytics clients - https://phabricator.wikimedia.org/T362678#9721459 (10MoritzMuehlenhoff) It's worth noting that the stat hosts are on Bullseye/Debian 11, which provides the following versions: - Erlang 23.2...
[07:37:53] !log disable puppet on an-test-client1002 to test new conda analytics deb T362648 [07:37:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:37:56] T362648: Rebuild conda-analytics container on Bullseye - https://phabricator.wikimedia.org/T362648 [07:39:09] !log analytics/refinery deploy begin (added source jars 0.2.35) [07:39:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:00:51] !log enable puppet on an-test-client1002 done testing new conda analytics deb T362648 [08:00:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:00:54] T362648: Rebuild conda-analytics container on Bullseye - https://phabricator.wikimedia.org/T362648 [08:40:54] !log Deployed refinery using scap, then deployed onto hdfs [08:40:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:07:46] 06Data-Engineering, 10Data-Platform-SRE (2024.04.15 - 2024.05.05), 13Patch-For-Review: Migrate the matomo host to bookworm - https://phabricator.wikimedia.org/T349397#9722238 (10BTullis) [12:17:39] (03CR) 10Mforns: Productionize CommonsCategoryGraphBuilder for CIM project (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1015013 (https://phabricator.wikimedia.org/T358681) (owner: 10Mforns) [12:18:51] (03PS7) 10Mforns: Productionize CommonsCategoryGraphBuilder for CIM project [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1015013 (https://phabricator.wikimedia.org/T358681) [13:53:00] 06Data-Engineering, 06Data-Platform-SRE: Package request: install elixir and erlang-otp to the analytics clients - https://phabricator.wikimedia.org/T362678#9722515 (10BTullis) Hi @awight - I'm happy to try to help here, but as @MoritzMuehlenhoff points out, trying to get packages from the Debian repositories... [13:56:32] (03PS3) 10Gehel: Sort pom.xml according to standard sortpom order.
[analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1014516 (https://phabricator.wikimedia.org/T360219) [13:56:32] (03PS5) 10Gehel: Start using wmf-jvm-parent-pom. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1014517 [13:56:32] (03PS4) 10Gehel: Remove duplication from parent pom. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1014546 (https://phabricator.wikimedia.org/T360219) [13:56:33] (03PS3) 10Gehel: Sort the dependencyManagement section according to sortPom. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1014564 (https://phabricator.wikimedia.org/T360219) [13:56:34] (03PS3) 10Gehel: Move version configuration of dependencies to main pom. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1014601 (https://phabricator.wikimedia.org/T360219) [13:56:38] (03PS3) 10Gehel: Sort some refinery modules according to sortPom. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1015035 (https://phabricator.wikimedia.org/T360219) [13:56:42] (03PS3) 10Gehel: Correct stlye issues with spotless. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1015075 [13:56:58] (03CR) 10CI reject: [V:04-1] Sort pom.xml according to standard sortpom order. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1014516 (https://phabricator.wikimedia.org/T360219) (owner: 10Gehel) [13:57:10] (03CR) 10CI reject: [V:04-1] Start using wmf-jvm-parent-pom. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1014517 (owner: 10Gehel) [13:57:12] (03CR) 10CI reject: [V:04-1] Sort the dependencyManagement section according to sortPom. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1014564 (https://phabricator.wikimedia.org/T360219) (owner: 10Gehel) [13:57:15] (03CR) 10CI reject: [V:04-1] Remove duplication from parent pom. 
[analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1014546 (https://phabricator.wikimedia.org/T360219) (owner: 10Gehel) [13:59:06] (03CR) 10CI reject: [V:04-1] Sort some refinery modules according to sortPom. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1015035 (https://phabricator.wikimedia.org/T360219) (owner: 10Gehel) [13:59:07] (03CR) 10CI reject: [V:04-1] Correct stlye issues with spotless. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1015075 (owner: 10Gehel) [13:59:09] (03CR) 10CI reject: [V:04-1] Move version configuration of dependencies to main pom. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1014601 (https://phabricator.wikimedia.org/T360219) (owner: 10Gehel) [14:21:38] 06Data-Engineering, 06Data-Platform-SRE, 10Scap, 07git-lfs, and 2 others: analytics/refinery: Stop using git-fat - https://phabricator.wikimedia.org/T328472#9722622 (10BTullis) 05Open→03Resolved Thanks so much @dancy and @hashar and everyone else who has helped. I believe that this is resolved. If... 
[15:03:15] 06Data-Engineering: [DQ] Add support for distribution metrics in data quality exporters - https://phabricator.wikimedia.org/T362780 (10gmodena) 03NEW [15:08:57] 06Data-Engineering: [DQ][NEEDS GROOMING] Add support for deequ's RowLevelSchemaValidator in refinery - https://phabricator.wikimedia.org/T362782 (10gmodena) 03NEW [15:16:00] 10Data-Engineering (Q4 2024 April 1st - June 30th): Add instrumentation for actor signatures - https://phabricator.wikimedia.org/T362783 (10gmodena) 03NEW [15:18:30] 10Data-Engineering (Q4 2024 April 1st - June 30th): Add host level instrumentation on webrequest - https://phabricator.wikimedia.org/T362785 (10gmodena) 03NEW [15:20:14] 06Data-Engineering, 10MediaWiki-extensions-WikimediaEvents, 10Data Products (Data Products Sprint 11), 13Patch-For-Review, 10Web-Team-Backlog (FY2023-24 Q4 Sprint 1): Update mediawiki.web_ui_actions Stream Config - https://phabricator.wikimedia.org/T360955#9722897 (10WDoranWMF) 05Open→03Resolved... [15:21:07] 06Data-Engineering, 06MediaWiki-Engineering, 06serviceops, 10WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#9722942 (10Ottomata) > see the CAP theorem C != eventual-C. Eventual Consistency + AP is fea... [15:21:31] 14Analytics, 06Data-Engineering, 06DBA, 10Event-Platform: Eventually-Consistent MediaWiki state change events | MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242#9722943 (10Ottomata) [15:25:42] 06Data-Engineering, 10Data-Engineering-Wikistats: Missing contributor stats for Singapore - https://phabricator.wikimedia.org/T344624#9722985 (10VirginiaPoundstone) [15:25:43] 06Data-Engineering, 10Data-Engineering-Wikistats: Missing contributor stats for Singapore - https://phabricator.wikimedia.org/T344624#9722984 (10VirginiaPoundstone) @Htriedman do you know if there is a reason we do not publish data on Singapore editor numbers? 
[15:29:34] 06Data-Engineering, 10Data-Engineering-Wikistats: Page views by country and total page namespaces are confusingly displayed - https://phabricator.wikimedia.org/T354932#9723019 (10VirginiaPoundstone) [15:29:35] 14Analytics, 06Data-Engineering, 10Data-Engineering-Wikistats: prefix symbol that modifies unit magnitude - https://phabricator.wikimedia.org/T356534#9723018 (10VirginiaPoundstone) [15:29:39] 06Data-Engineering, 10Data-Engineering-Wikistats: Contradictory descriptions in "Total page views" - https://phabricator.wikimedia.org/T354931#9723020 (10VirginiaPoundstone) [15:29:40] 06Data-Engineering, 10Data-Engineering-Wikistats: Add Farsi/Persian to WikiStats interface languages - https://phabricator.wikimedia.org/T348674#9723022 (10VirginiaPoundstone) [15:29:42] 14Analytics, 06Data-Engineering, 10Data-Engineering-Wikistats: Make wikistats pages, sections and individual infoboxes transcludable - https://phabricator.wikimedia.org/T351053#9723021 (10VirginiaPoundstone) [15:29:48] 06Data-Engineering: Codex, Graph, and Wikistats walk into a bar graph - https://phabricator.wikimedia.org/T336544#9723025 (10VirginiaPoundstone) [15:29:52] 06Data-Engineering, 10Data-Engineering-Wikistats, 07I18n, 13Patch-For-Review: Wikistats 2 should translate month names and abbreviations - https://phabricator.wikimedia.org/T336815#9723024 (10VirginiaPoundstone) [16:09:26] a-team: is there any publicly-available data on the most requested thumbnails across wikimedia projects? [16:13:18] ori: Apologies, I did see your message above, but failed to respond. I don't know of anything that relates to thumbnails specifically. 
[16:14:36] The nearest that I believe we have is https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests#Top_files_by_mediarequests which allows one to filter by `mediatype=image` [16:15:04] 10Data-Engineering (Q4 2024 April 1st - June 30th): Update MW history data quality job to use Deequ Anomaly detection Capability - https://phabricator.wikimedia.org/T362803 (10Snwachukwu) 03NEW [16:18:49] btullis: thanks! That's close to what I want, but it looks like it aggregates requests for thumbnails by the canonical file name, so it's not possible to tell how often files are requested at particular resolutions. [16:20:34] My use-case: I've been reading with interest about a JPEG coding library that is purported to produce more compact and better-looking JPEGs than the jpeg libraries currently used to generate thumbnails. I wanted to see what the space savings and quality difference would be like for the top N jpegs by request volume. [16:21:14] hey ori, the google library Jpegli? :-) [16:21:31] ori: Thanks for the context. I can't think of anything else that is public that could help you work this out, but maybe other people might know. [16:21:52] yes :) as a disclosure I work for Google, but this has nothing to do with my employment, I don't know the people working on Jpegli, my motive is entirely as a Wikimedian [16:22:10] I wasn't implying that :) [16:22:16] I know :) [16:22:33] ( if anyone else is curious) [16:25:12] I guess I'll file a request via phab and take it from there, unless anyone has a better idea [16:25:33] ori: I think you're still in the NDA group in Phab right? If so I guess I could extract some top-N of the last month and put it in an NDA phaste [16:25:41] unless others see something wrong with it [16:26:13] volans|off: I don't see anything wrong with that approach [16:27:33] volans|off: right.
I am hoping the results will be cool enough that I'd want to share them, so want to make sure in advance that I'm working with a dataset that others could look at too [16:30:05] volans|off: btw i still love cumin :D [16:30:21] <3 [16:33:54] (03PS1) 10Snwachukwu: Update MediaWikiHistory Metrics - Use AWS deequ Anomalydetection capabilities - Use FileMetricRepository to store previous snapshot metric [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1020864 (https://phabricator.wikimedia.org/T362803) [16:39:16] 06Data-Engineering, 06MediaWiki-Engineering, 06serviceops, 10WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#9723515 (10akosiaris) >>! In T249745#9722942, @Ottomata wrote: >> see the CAP theorem > C !=... [16:45:27] I guess I'll wait for some "approval" then [16:46:21] 06Data-Engineering, 10EventStreams, 10MediaWiki-Page-protection, 10MediaWiki-Revision-deletion, and 4 others: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page - https://phabricator.wikimedia.org/T354577#9723542 (10Htriedman) going to investigate the feasibility of... [16:46:56] I can try to spread the word a bit, too. [16:48:18] FYI I already have a query ready on superset sql lab for the last 30 days sampled dataset, if o.ri wants data spread across more time and not sampled I'll leave it to analytics to extract it ;) [16:48:56] * volans|off back off [17:31:28] 06Data-Engineering, 06MediaWiki-Engineering, 06serviceops, 10WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#9723704 (10Ottomata) >> For replicating state changes (T120242) [...] > Why though? Why is 99...
[17:53:06] 10Data-Engineering (Q4 2024 April 1st - June 30th): Add host level instrumentation on webrequest - https://phabricator.wikimedia.org/T362785#9723789 (10CodeReviewBot) gmodena updated https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/650 Draft: analytics: webrequest: actor: add dq... [17:53:07] 10Data-Engineering (Q4 2024 April 1st - June 30th): Add instrumentation for actor signatures - https://phabricator.wikimedia.org/T362783#9723790 (10CodeReviewBot) gmodena updated https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/650 Draft: analytics: webrequest: actor: add dq job [18:05:38] (03PS5) 10Gmodena: refinery-job: add webrequest instrumentation. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1019867 (https://phabricator.wikimedia.org/T351117) [18:07:38] 10Data-Engineering (Q4 2024 April 1st - June 30th), 06Data-Platform, 06Movement-Insights, 13Patch-For-Review: Unique devices tables have missing or incorrect data for January and February 2024 - https://phabricator.wikimedia.org/T361242#9723859 (10Mayakp.wiki) [18:42:09] hello [18:42:42] (03PS6) 10Gmodena: refinery-job: add webrequest instrumentation. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1019867 (https://phabricator.wikimedia.org/T351117) [18:56:31] (03PS7) 10Gmodena: refinery-job: add webrequest instrumentation. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1019867 (https://phabricator.wikimedia.org/T351117) [19:17:18] (03CR) 10Xcollazo: [C:03+1] "This looks great from my side. LGTM!" 
[analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1015013 (https://phabricator.wikimedia.org/T358681) (owner: 10Mforns) [19:23:28] 06Data-Engineering, 06Data-Platform, 06Movement-Insights, 13Patch-For-Review: Unique devices tables have missing or incorrect data for January and February 2024 - https://phabricator.wikimedia.org/T361242#9724119 (10lbowmaker) [20:06:22] 06Data-Engineering, 06MediaWiki-Engineering, 06serviceops, 10WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#9724249 (10Ladsgroup) >>! In T249745#9723704, @Ottomata wrote: >>> For replicating state chan... [20:31:50] (03CR) 10Mforns: Productionize CommonsCategoryGraphBuilder for CIM project (035 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1015013 (https://phabricator.wikimedia.org/T358681) (owner: 10Mforns) [20:32:03] (03CR) 10Mforns: [C:03+2] Productionize CommonsCategoryGraphBuilder for CIM project [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1015013 (https://phabricator.wikimedia.org/T358681) (owner: 10Mforns) [20:32:44] (03CR) 10Mforns: [V:03+2 C:03+2] Productionize CommonsCategoryGraphBuilder for CIM project [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1015013 (https://phabricator.wikimedia.org/T358681) (owner: 10Mforns) [20:37:02] (03PS19) 10Mforns: Clean up and parameterize SQL code for Common Impact Metrics. [analytics/refinery] - 10https://gerrit.wikimedia.org/r/1016796 (https://phabricator.wikimedia.org/T358681) (owner: 10Xcollazo) [20:37:10] (03CR) 10Mforns: [C:03+2] Clean up and parameterize SQL code for Common Impact Metrics. [analytics/refinery] - 10https://gerrit.wikimedia.org/r/1016796 (https://phabricator.wikimedia.org/T358681) (owner: 10Xcollazo) [20:37:16] (03CR) 10Mforns: [V:03+2 C:03+2] Clean up and parameterize SQL code for Common Impact Metrics. 
[analytics/refinery] - 10https://gerrit.wikimedia.org/r/1016796 (https://phabricator.wikimedia.org/T358681) (owner: 10Xcollazo) [20:45:50] (03PS1) 10Mforns: Update changelog.md for 0.2.36 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1020919 [20:46:12] (03CR) 10Mforns: [V:03+2 C:03+2] Update changelog.md for 0.2.36 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/1020919 (owner: 10Mforns) [20:47:40] Starting build #2 for job analytics-refinery-maven-release [21:03:03] Project analytics-refinery-maven-release build #2: 09SUCCESS in 15 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release/2/ [21:25:49] Starting build #2 for job analytics-refinery-update-jars [21:27:47] (03PS1) 10Maven-release-user: Add refinery-source jars for v0.2.36 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/1020720 [21:27:47] Project analytics-refinery-update-jars build #2: 09SUCCESS in 1 min 58 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars/2/ [21:39:44] (03CR) 10Mforns: [C:03+2] Add refinery-source jars for v0.2.36 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/1020720 (owner: 10Maven-release-user) [21:39:46] (03CR) 10Mforns: [V:03+2 C:03+2] Add refinery-source jars for v0.2.36 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/1020720 (owner: 10Maven-release-user) [21:40:49] !log Deployed refinery-source using jenkins [21:40:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [21:47:09] !log don't have time to deploy refinery today, will do it tomorrow first thing [21:47:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [22:16:46] 10Data-Engineering (Q4 2024 April 1st - June 30th): Update converted reportupdater DAG queries to correct CSV options - https://phabricator.wikimedia.org/T362699#9724602 (10amastilovic) [22:17:19] 10Data-Engineering (Q4 2024 April 1st - June 30th): 
Update converted reportupdater DAG queries to correct CSV options - https://phabricator.wikimedia.org/T362699#9724603 (10amastilovic) >>! In T362699#9722376, @Aklapper wrote: > Hi @amastilovic, can you please associate one or more active project tags with this... [22:23:48] 10Data-Engineering (Q4 2024 April 1st - June 30th): Migrate refinery HQL files to CI/CD supported GitLab repository - https://phabricator.wikimedia.org/T362832 (10amastilovic) 03NEW