[00:02:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[00:12:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[01:00:06] I had the kernel in my jupyter notebook die on me two times now (I think that is what happened? Had to 'restart my server') while working with a big pd df. Last time was around 00:31h UTC. My log shows a note at the same time about
[01:00:06] "Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8." and "NumExpr defaulting to 8 threads.". Is this related? Should I somehow pick different settings?
[01:00:32] (working on stat1008)
[02:02:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[02:07:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[03:14:07] effeietsanders: I don't know much about it, but there's a snippet of documentation here telling you to check grafana for memory use: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#My_kernel_restarts_when_I_run_a_large_query
[03:14:12] and there may be some other tips there
[03:14:58] AndyRussG: I have no idea, nobody's ever asked that! I don't think we enforce a particular limit, but I mean we don't have infinite space. What size were you thinking and for what purpose? Maybe it should be productionized if it's too big?
[03:17:50] milimetric: oh thanks! I don't know the sizes of the data I was thinking of storing yet. I'll be able to get a sense after running some test queries, so I wanted to get a sense of what limits I might hit before settling on a strategy, from among a few options.
[03:18:53] It's to look at distributions of articles over weekly pageviews, in total and by referrer, by project-country pairs
[03:20:17] not a specific ask from my team btw, something I'm looking at myself just as a personal project (though folks in Product are also digging into the same general topic, pageview trends)
[03:21:14] I did some initial queries with large samples, but it looks like using the full data will work better
[03:22:24] (if you're curious, here is some of the initial exploration: https://gitlab.wikimedia.org/-/snippets/9)
[04:03:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[04:08:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[04:58:34] thanks milimetric
[06:02:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[06:07:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[08:03:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[08:08:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[10:02:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[10:07:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[12:02:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[12:07:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[14:03:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[14:08:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[15:07:45] hey teamm
[15:22:12] Analytics, Stewards-and-global-tools: Collect information about users affected by blocks - https://phabricator.wikimedia.org/T297051 (Urbanecm)
[15:40:04] Hi folks! I'm a Data Science student from ITU in Copenhagen. I'm interested in monthly top viewed wikipedia page statistics per *country* going back a few years. Is there an avenue for getting access to this? Merry Christmas :)
[15:47:46] Hello syrkis! Welcome to the channel. https://wikimedia.org/api/rest_v1/#/Pageviews%20data/get_metrics_pageviews_top_by_country__project___access___year___month_ looks like what you need. Can you check that?
[15:49:34] (or the one below)
[15:49:47] https://wikimedia.org/api/rest_v1/metrics/pageviews/top-per-country/CZ/desktop/2021/12/10 would be top visited pages from Czech Republic on 2021-12-10, for example
[15:51:32] Thanks for the reply. I've looked at that, yes. The one below is exactly the right format, but it seems it only goes back about a year. I'm interested in as far back in time as possible.
[15:53:23] Like from 2015 or 2011 :)
[15:55:20] I see. Noting there were changes in what's deemed a pageview in about 2015 (see https://meta.wikimedia.org/wiki/Research:Page_view for more details), so data from 2014 and, let's say, 2020 will likely not be directly comparable. Let me check if there are dumps of similar information to the ones provided by the API I linked you to.
[16:04:07] Woah thanks a lot :)
[16:04:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[16:09:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[16:10:13] syrkis: so, unfortunately, the top-per-country endpoint appears to have been introduced in 2021 (see https://phabricator.wikimedia.org/T207171 for more details). This is very likely the reason why it only has 2021 data. Maybe it would be possible to backfill the public API endpoint using internal datasets to provide older information – but that's a "maybe".
[16:11:08] There are regularly held Wikimedia Research Office Hours (once per month). You might want to come and bring the question there (you'd likely receive a better answer than what I'm able to provide). More information is at https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[16:13:02] Got it. Thanks. Do you happen to know if there's a way to potentially access the data for research purposes? :)
[16:15:25] syrkis: Formal collaborations exist, which can grant access to private datasets, if the research conducted requires this level of access.
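The top-per-country URL quoted above follows a regular pattern, so requests for many dates can be generated in a loop. A minimal Python sketch, assuming the path segments are country / access / year / month / day exactly as in the CZ example; the function names and the User-Agent string are hypothetical, and the shape of the JSON response is not assumed here. As noted in the conversation, the endpoint only seems to hold data from 2021 onward.

```python
import json
import urllib.request

# Base path taken from the example URL quoted in the conversation.
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top-per-country"


def top_per_country_url(country, access, year, month, day):
    """Build the request URL, zero-padding month and day as in the example."""
    return f"{BASE_URL}/{country}/{access}/{year:04d}/{month:02d}/{day:02d}"


def fetch_top_per_country(country, access, year, month, day):
    """Fetch and decode the JSON response (hypothetical helper, needs network)."""
    url = top_per_country_url(country, access, year, month, day)
    # Placeholder User-Agent: replace with something identifying your project.
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-research-script (contact: you@example.org)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Reproduces the URL from the conversation without touching the network.
print(top_per_country_url("CZ", "desktop", 2021, 12, 10))
```

Calling `fetch_top_per_country("CZ", "desktop", 2021, 12, 10)` would issue the actual request; inspect the returned dictionary before relying on any particular field names.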
[16:18:11] Awesome. Thanks. Have a nice evening kind sir :)
[16:18:24] syrkis: You too :). I still suggest you come to next office hours though ;)
[16:18:49] It's in the calendar ;)
[16:18:56] Excellent :)
[16:21:10] syrkis: also, while this is not about pageviews, there is published information about editors in countries. This might be useful, or also not (I'm not sure what your project is about). Linking in case you wish to take a look.
[16:32:44] urbanecm: I did see that dump, yeah. I'm in early stages of research, but the idea is to see if wikipedia activity correlates with a country's responses to a subset of the European Social Survey (or similar). Editor activity is not representative enough, probably...
[16:35:03] syrkis: and it's also a different kind of activity (editors are more involved than readers, for obvious reasons). Well, good luck with your research :)
[16:56:57] (PS1) BryanDavis: Update old *.wmflabs.org URLs to modern equivalents [analytics/quarry/web] - https://gerrit.wikimedia.org/r/749756
[17:09:46] milimetric: standuppp?? :]
[17:38:55] Sorry mforns, been feeling ill today too. I'm a little better now after a nap. I'm eating and will catch up with you in a bit if you're still around
[17:39:40] don't worry milimetric I was just pinging in case you were distracted, please take rest!!
[18:03:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[18:08:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[19:00:15] thx so much for the great answers urbanecm. syrkis: we do have that data internally, and could probably backfill, but it would be fighting with a lot of other priorities. Best thing to do is open up a ohab task and a formal collaboration request as you see fit. Office hours for the research team would be a great place to start.
[19:00:37] happy to help milimetric :)
[19:00:39] *phabricator task
[19:00:57] see https://www.mediawiki.org/wiki/Phabricator/Help for details about Phabricator
[19:19:41] (PS4) Sharvaniharan: Android MEP schema for customizing toolbar [schemas/event/secondary] - https://gerrit.wikimedia.org/r/747226 (https://phabricator.wikimedia.org/T297818)
[20:03:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[20:08:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[22:02:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org
[22:07:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-test-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-test-coord1001:10100 - https://alerts.wikimedia.org