[09:07:50] joal: I managed to make it all run :)
[09:07:52] https://usercontent.irccloud-cdn.com/file/AkoFd7mT/image.png
[09:08:10] \o/
[09:08:37] that's a page view graph for the last 2 years for the covid topic on wikipedia with a log Y axis, and there are definitely some other minor peaks throughout the 2 year period, but nothing like the start of the whole thing
[09:09:42] but I still think I should look into a little more optimization of this part https://usercontent.irccloud-cdn.com/file/BiPEHndL/image.png
[09:09:46] I guess using joins
[09:10:02] and also I wonder if repartitioning would help, I need to re-find the docs for that :)
[09:10:12] now to refactor graph generation :)
[09:12:32] I think spark optimizes this IN clause as a join - I just prefer it explicitly written (that also lets you give hints for partitioning)
[09:13:28] ooo, nice
[09:13:49] I have no idea how long those queries took, as I left it going overnight, but glad that I woke up to a filled pandas frame :)
[09:15:41] running all night is all good - it takes some resources for a long time, no problem :)
[10:57:42] !log restarted archiva T300626
[10:57:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:00:30] !log roll-restarting aqs T300626
[11:00:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:04:28] tweaking the graphs more and more and seeing more and more joal!
[11:04:28] Such as the difference in spike between US and India right at the start
[11:04:31] https://usercontent.irccloud-cdn.com/file/bhnCeQVQ/image.png
[11:21:57] !log roll-restarting druid-test T300626
[11:21:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:54:07] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Send some existing Gobblin metrics to prometheus - https://phabricator.wikimedia.org/T294420 (fgiunchedi) >>! In T294420#7758676, @Ottomata wrote: > @fgiunchedi Q for you. > > I think using `task_number` in the groupi...
[12:07:21] hmm, mobile_app page view data just stops around march 2021, is that expected?
[12:08:01] !log roll-restarting druid-public. T300626
[12:08:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:12:01] (PS1) Sergio Gimeno: Add a link: add number_phrases_shown value for add a link impressions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/769034 (https://phabricator.wikimedia.org/T301095)
[12:25:12] (CR) Kosta Harlan: Add a link: add number_phrases_shown value for add a link impressions (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/769034 (https://phabricator.wikimedia.org/T301095) (owner: Sergio Gimeno)
[12:47:05] !log roll-restarting druid-analytics T300626
[12:47:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:49:43] addshore: I'm not sure I follow you. I can see recent 'mobile app' data with the following query.
[12:49:48] `hive (wmf)> select * from pageview_hourly where year=2022 and month=3 and day=3 and hour=9 and access_method = 'mobile app' limit 10;`
[12:50:04] interesting
[12:50:08] Could it be related to the underscore in your `mobile_app`?
[12:50:53] * addshore scratches his head and looks harder
[13:06:33] PROBLEM - Check unit status of eventlogging_to_druid_navigationtiming_hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:09:57] !log killing and rerunning webrequest-load-text-wf for webrequest_source=text/year=2022/month=3/day=7/hour=17, it was stuck in add_partition task as SUSPENDED, not sure why.
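The `mobile_app` vs `'mobile app'` mix-up above is easy to hit when filters are typed by hand: an unknown value just returns zero rows. A minimal sketch, assuming the `access_method` values in `wmf.pageview_hourly` are `desktop`, `mobile web` and `mobile app` (check the table documentation), of a hypothetical helper that validates the value before building a query like the one joal ran:

```python
# Assumed set of access_method values in wmf.pageview_hourly; verify against the table docs.
KNOWN_ACCESS_METHODS = {"desktop", "mobile web", "mobile app"}


def pageview_hourly_query(year: int, month: int, day: int, hour: int,
                          access_method: str, limit: int = 10) -> str:
    """Build a HiveQL query like the one above, rejecting unknown access methods.

    Hypothetical helper for illustration only. Values are interpolated
    directly into the string, so only pass trusted input.
    """
    if access_method not in KNOWN_ACCESS_METHODS:
        # Fail loudly instead of silently returning an empty result set.
        raise ValueError(
            f"unknown access_method {access_method!r}; "
            f"expected one of {sorted(KNOWN_ACCESS_METHODS)}"
        )
    return (
        "SELECT * FROM wmf.pageview_hourly "
        f"WHERE year={year} AND month={month} AND day={day} AND hour={hour} "
        f"AND access_method = '{access_method}' "
        f"LIMIT {limit}"
    )
```

With this, `pageview_hourly_query(2022, 3, 3, 9, "mobile app")` returns the query string, while passing `"mobile_app"` raises a `ValueError` rather than producing an empty result.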
[13:09:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:10:25] (PS2) Sergio Gimeno: Add a link: add number_phrases_shown value for add a link impressions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/769034 (https://phabricator.wikimedia.org/T301095)
[13:10:46] Agh, oops i pushed the wrong button in hue and killed the whole webrequest-load-coord-text
[13:10:56] i think i need to resubmit on the CLI...
[13:24:40] ottomata: good :D IIRC you have to kill the bundle, and resubmit it as a whole (starting from the lowest hour still not processed in upload/text)
[13:27:21] yeah doing
[13:27:43] i had to submit an individual coordinator with a stop time too to get the 1 unloaded text hour from yesterday
[13:27:50] i had to make a coordinator.properties file to do that
[13:27:53] mostly the same.
[13:28:09] getting a weird error when starting the bundle now tho
[13:28:15] The following 1 parameters are required but were not defined and no default values are available: webrequest_source
[13:28:20] not sure why but i will figure it out
[13:30:25] OH right right properties files given to oozie are local file paths
[13:31:02] !log restarted webrequest-load oozie bundle as 0073173-220113112502223-oozie-oozi-B starting at 2022-03-08T12:00Z
[13:31:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:44:48] (CR) Kosta Harlan: [C: +2] Add a link: add number_phrases_shown value for add a link impressions (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/769034 (https://phabricator.wikimedia.org/T301095) (owner: Sergio Gimeno)
[13:45:28] (Merged) jenkins-bot: Add a link: add number_phrases_shown value for add a link impressions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/769034 (https://phabricator.wikimedia.org/T301095) (owner: Sergio Gimeno)
[13:45:42] joal: o/
[13:49:05] o/ not sure if this is specific to a table or a broader issue, but I noticed that the `referrer_daily` job didn't seem to run today: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/referrer_daily
[13:49:27] (last partition is for March 6th and yesterday's data is always available by this time of day)
[13:50:16] isaac ya i think things are a little blocked up atm
[13:50:18] i just freed the blockage
[13:50:27] am watching to make sure jobs start flowing
[13:50:40] there was an hour of webrequest yesterday that didn't get refined properly
[13:58:49] excellent -- many thanks, i'll give it a few hours then
[13:59:38] (increases my excitement for our future airflow world where i don't rely on crontabs that just hope that the previous hour's crontab went as planned...)
[14:05:27] RECOVERY - Check unit status of eventlogging_to_druid_navigationtiming_hourly on an-launcher1002 is OK: OK: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:08:48] well, oozie is doing this
[14:08:52] so it is just as good as airflow!
[14:08:58] i can see the blockage in a UI!
[14:45:31] joal: just realized ANOTHER complexity in generating these metrics
[14:45:36] task attempts!!!
[14:45:54] even though only a single task will ever work a kafka partition...it might fail and then be attempted again later!
[14:46:31] although...maybe (at least with the Gobblin Event), the task related events we are collecting will only happen after a task attempt really completes?
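joal's advice at 13:24 is to resubmit the whole bundle starting from the lowest hour that is still unprocessed in either `upload` or `text`. A minimal sketch of that computation, assuming you already have the set of processed hours per webrequest source (how to list them, e.g. from `_SUCCESS` flags or Hive partitions, is left out and the function names are hypothetical):

```python
from __future__ import annotations

from datetime import datetime, timedelta


def first_unprocessed_hour(processed: set[datetime], since: datetime) -> datetime:
    """Return the first hour at or after `since` that is not in `processed`."""
    hour = since
    while hour in processed:  # terminates because `processed` is finite
        hour += timedelta(hours=1)
    return hour


def bundle_restart_time(processed_by_source: dict[str, set[datetime]],
                        since: datetime) -> datetime:
    """Lowest unprocessed hour across all sources: where the bundle must restart."""
    return min(first_unprocessed_hour(p, since)
               for p in processed_by_source.values())
```

With made-up data where `text` has hours 00:00 to 04:00 of 2022-03-08 done and `upload` has 00:00 to 06:00 done, `bundle_restart_time` returns 05:00, and `.strftime("%Y-%m-%dT%H:%MZ")` renders it as `2022-03-08T05:00Z`, the same shape as the start time in the `!log` above. Taking the minimum means the source that is further ahead gets a few hours reprocessed, which is the price of resubmitting the bundle as a whole.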
[14:51:51] hmm actually, yes, it won't matter, as long as I set the values in the prometheus CollectorRegistry in order before I push
[14:51:59] hmm, no
[14:52:00] no
[14:52:11] because a different attempt will probably be in a different worker
[14:52:19] okay i just hope the event doesn't fire if the attempt fails
[14:52:52] Data-Engineering-Kanban, Data-Catalog, Patch-For-Review: Set up opensearch cluster for datahub - https://phabricator.wikimedia.org/T301382 (BTullis) I've added the firewall rule, so port 9200 is now open to the production networks. Not the analytics vlan, but that's OK.
[15:04:12] ottomata: Hi! sorry for not popping up, plenty of meetings, and now kids - can we talk after standup?
[15:04:51] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Patch-For-Review: Define LVS load-balancing for OpenSearch cluster - https://phabricator.wikimedia.org/T301458 (BTullis) I have now merged that change and applied it to the datahubsearch servers. We can see that it now has the realserver IP...
[15:05:35] ya!
[15:23:04] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Send some existing Gobblin metrics to prometheus - https://phabricator.wikimedia.org/T294420 (Ottomata) BTW, just had another go at getting `gobblin_task_duration`; this of course won't work because taskCommitted and T...
[16:04:19] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (Jclark-ctr) @BTullis Are we able to rack these in new cage Row E and F
[16:05:40] mforns: airflow sync?
[16:25:06] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (BTullis) Hi @Jclark-ctr, yes that would be fine. Many thanks.
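The worry at 14:45 to 14:52 is that a failed task attempt followed by a retry, possibly on a different worker, could report metrics for the same Kafka partition twice. One way around it is to aggregate in a single place that sees every attempt and count only committed ones before setting values in the Prometheus registry. A minimal sketch, assuming each event carries a task id, an attempt number and a committed flag (hypothetical fields, not Gobblin's actual event schema):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TaskAttemptEvent:
    """Hypothetical shape of a per-attempt task event (not Gobblin's real API)."""
    task_id: str
    attempt: int
    kafka_partition: int
    records_written: int
    committed: bool


def records_per_partition(events: list[TaskAttemptEvent]) -> dict[int, int]:
    """Sum records per Kafka partition, counting only committed attempts.

    If a task has several committed events (e.g. duplicated reports), only the
    highest attempt number is kept, so a retried task is never counted twice.
    """
    latest: dict[str, TaskAttemptEvent] = {}
    for ev in events:
        if not ev.committed:
            continue  # failed attempts are ignored entirely
        prev = latest.get(ev.task_id)
        if prev is None or ev.attempt > prev.attempt:
            latest[ev.task_id] = ev
    totals: dict[int, int] = {}
    for ev in latest.values():
        totals[ev.kafka_partition] = totals.get(ev.kafka_partition, 0) + ev.records_written
    return totals
```

The resulting per-partition totals are what would then be set on gauges and pushed once; the deduplication only works if it runs where all attempts' events are visible, not per worker.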
[16:40:48] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (Milimetric) > Perhaps a way forward would be to find a way to serve those use cases by design instead of by accident....
[16:42:51] Is it possible for me to clean up / delete something I have at analytics.wikimedia.org/published? Or will I need to file a ticket?
[16:44:42] Analytics, Data-Engineering, Data-Engineering-Kanban, Event-Platform, Patch-For-Review: Users should run explicit commands to materialize schema versions, rather than using magic git hooks - https://phabricator.wikimedia.org/T290074 (Milimetric) Maybe instead of letting CI do it, we can let t...
[16:44:51] a-team nothing on train == no deploy, ya?
[16:45:13] 👍
[16:47:26] addshore: i can do it for you i think.
[16:47:36] hmm, idea: we should just use HDFS for published datasets
[16:47:45] then we wouldn't have the multi-sync problem!
[16:47:52] addshore: what do you need cleaned up?
[16:47:53] ottomata: <3 This directory should be culled https://analytics.wikimedia.org/published/notebooks/addshore/wd-topic-pageviews/wd-topic-pageviews/
[16:48:03] addshore: and you've removed it from the source stat box?
[16:48:06] yup
[16:48:17] unfortunately not before the sync already grabbed it :D
[16:49:19] Analytics, Analytics-Wikistats, Data-Engineering, I18n: WikiReportsLocalizations.pm still fetches language names from SVN - https://phabricator.wikimedia.org/T64570 (Milimetric) To be clear, though, @Aklapper, Wikistats 1 will not be shut down anytime soon. We will finish migrating some more rep...
[16:49:36] addshore: what node was it on?
[16:49:39] what stat box?
[16:49:42] 1005
[16:52:06] oh! addshore i was wrong, it would eventually be deleted
[16:52:10] i just ran the scripts that would have run
[16:52:18] oh, awesome! :)
[16:52:31] * addshore doesn't want to know how the syncing stuff works
[16:52:41] xD
[16:56:56] if by doesn't you mean does: https://github.com/wikimedia/puppet/blob/production/modules%2Fstatistics%2Ffiles%2Fhardsync.sh
[16:57:30] Data-Engineering, Data-Engineering-Kanban, Product-Analytics: 22 small wikis missing from the mediawiki_history dataset - https://phabricator.wikimedia.org/T299548 (Milimetric) The wikis were sqooped and are available in wikistats 2: https://stats.wikimedia.org/#/nia.wiktionary.org https://stats.wik...
[16:58:11] i always like scripts with phrases such as `Hard syncing` and `temp_dest_trash`
[16:58:18] hahah
[17:00:35] Analytics, Analytics-Wikistats, Data-Engineering: Automate creation of sqoop list of wikis to import data for from sitematrix - https://phabricator.wikimedia.org/T190700 (Milimetric) p:Low→Triage I think this should be much higher priority now. Usually we would move tasks back to incoming to...
[17:03:11] Data-Engineering, Data-Engineering-Kanban, Product-Analytics: 22 small wikis missing from the mediawiki_history dataset - https://phabricator.wikimedia.org/T299548 (Milimetric) a:Milimetric
[17:04:31] Data-Engineering, Airflow: Low Risk Oozie Migration: session length - https://phabricator.wikimedia.org/T300029 (Antoine_Quhen) a:Antoine_Quhen
[17:08:20] Data-Engineering-Kanban, Airflow: Investigate unifying HDFS Sensor and FSSPEC Sensor - https://phabricator.wikimedia.org/T302392 (Ottomata)
[17:09:13] Data-Engineering, Data-Engineering-Kanban, Airflow: Investigate unifying HDFS Sensor and FSSPEC Sensor - https://phabricator.wikimedia.org/T302392 (Snwachukwu)
[17:09:31] Data-Engineering, Data-Engineering-Kanban, Product-Analytics: 22 small wikis missing from the mediawiki_history dataset - https://phabricator.wikimedia.org/T299548 (Urbanecm) Hello data engineering people, I'm one of the people who are usually involved with wiki creation. We have a maintenance bot th...
[17:38:54] Data-Engineering, Data-Engineering-Kanban, Airflow: Investigate unifying HDFS Sensor and FSSPEC Sensor - https://phabricator.wikimedia.org/T302392 (Ottomata)
[17:39:43] joal: question for e?
[17:39:44] me?
[17:40:12] for olja sorry
[17:40:22] ottomata: --^
[17:42:37] i'm sure you'll have questions for me one day
[17:43:49] Data-Engineering, Product-Analytics: Support on understanding traffic and behaviors for users on legacy browsers (somewhat timely) - https://phabricator.wikimedia.org/T303301 (STHart)
[18:04:49] joal: gonna make lunch but i'm here for talking if you are thinking of leaving soon!
[18:05:21] ottomata: I'll be in planning sessions, I like to listen to them - and I also have other stuff to do so I'm not gone
[18:05:40] Please take your time for lunch, I'll be there when you're back (normally)
[18:07:04] ok
[18:18:11] Data-Engineering, Data-Engineering-Kanban, Airflow: Investigate using a HiveToGraphite connector job instead of individual jobs - https://phabricator.wikimedia.org/T303308 (JAllemandou)
[18:22:09] Data-Engineering, MediaWiki-extensions-WikimediaEvents, Product-Analytics: Remove InputDeviceDynamics EventLoggingSchemas entry - https://phabricator.wikimedia.org/T302896 (kzimmerman) @phuedx is there something here that might impact our work in Product Analytics, or data that we should check? (Or i...
[18:29:57] PROBLEM - Check if active EventStreams endpoint is delivering messages. on alert1001 is CRITICAL: CRITICAL: No EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[18:30:46] mforns: hello! would you have a minute for me?
[18:32:33] Hmm ^^
[18:32:48] ottomata: if you do, let's go!
[18:33:12] ottomata: I didn't want to ask you a question during your lunch :)
[18:33:40] Ah! you were mentioning eventstreams - excuse me
[18:33:47] let me know if I can help
[18:34:41] joal let's go!
[18:34:49] in da cave
[19:00:22] RECOVERY - Check if active EventStreams endpoint is delivering messages. on alert1001 is OK: OK: An EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[19:37:28] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (Jclark-ctr) name rack Unit Port CableID an-worker1142 e1 27u 27 an-worker1143 e2 27u 27 an-worker1144 f1 27u 27...
[19:58:51] Data-Engineering, Data-Engineering-Kanban, SRE: Increase max.incremental.fetch.session.cache.slots on Kafka jumbo eqiad - https://phabricator.wikimedia.org/T303324 (Ottomata)
[20:24:33] Analytics-Radar, Product-Analytics, wmfdata-python: Consider rewriting wmfdata-python to use omniduct - https://phabricator.wikimedia.org/T275038 (EChetty) Just to note -> After speaking to a friend of mine that used to work at AirBnB - they are not actively using/improving Omniduct anymore as their...
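The "multi-sync problem" mentioned at 16:47 comes from `/published` being merged from several stat boxes: a file removed on its source host only disappears once the merged destination is recomputed from all hosts. A minimal sketch of that planning step, assuming you have a listing of relative paths per host (host names and the function are hypothetical; this is not a description of what `hardsync.sh` actually does):

```python
from __future__ import annotations


def plan_sync(sources: dict[str, set[str]],
              dest: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_add, to_delete) for a destination merged from several hosts.

    `sources` maps a hypothetical host name to the relative paths it publishes.
    The desired destination is the union of all sources, so a path that no host
    publishes any more is scheduled for deletion.
    """
    desired: set[str] = set().union(*sources.values())
    return desired - dest, dest - desired
```

For example, with `{"host-a": {"notebooks/a.html"}, "host-b": {"notebooks/b.html"}}` as sources and a destination that still holds `notebooks/old.html`, nothing needs adding and `notebooks/old.html` is scheduled for deletion. Serving published datasets straight from HDFS, as suggested at 16:47, would avoid needing this merge step at all.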