[02:05:55] Analytics, Data-Engineering-Radar, MediaWiki-extensions-EventLogging, QuickSurveys, and 2 others: QuickSurveys should show an error when response is blocked - https://phabricator.wikimedia.org/T256463 (Ottomata) > Instead, the response would be returned via an API, and saved to a new database tab...
[02:13:47] (CR) Ottomata: [C: +1] "I haven't fully followed the updates, but it seems like relevant folks have had their comments responded to so I'm leaving a +1. Maybe ge" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772934 (owner: Sharvaniharan)
[05:30:33] Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation), Data-Services, cloud-services-team (Kanban): View 'centralauth_p.localuser' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T304733 (Marostegui) This probably requires some more...
[06:58:49] (PS1) MewOphaswongse: Add welcomeemail-april2022 to homepagevisit schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/774367 (https://phabricator.wikimedia.org/T304805)
[07:54:37] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (fgiunchedi) >>! In T300246#7806879, @BTullis wrote: > Tagging @fgiunchedi who might be able to advise further. In the mean...
[08:28:14] (CR) Kosta Harlan: [C: +2] Add welcomeemail-april2022 to homepagevisit schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/774367 (https://phabricator.wikimedia.org/T304805) (owner: MewOphaswongse)
[08:28:51] (Merged) jenkins-bot: Add welcomeemail-april2022 to homepagevisit schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/774367 (https://phabricator.wikimedia.org/T304805) (owner: MewOphaswongse)
[08:30:38] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (BTullis) > Indeed the pooled/depooled status as a metric is sth we want for sure (I don't have the bandwidth to work on it...
[08:47:30] (PS1) Aqu: Add archiving job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/774383 (https://phabricator.wikimedia.org/T300039)
[08:54:51] (CR) jerkins-bot: [V: -1] Add archiving job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/774383 (https://phabricator.wikimedia.org/T300039) (owner: Aqu)
[10:34:43] Hi team - I'm gonna cowork with Arzhel (network engineer) this afternoon - I'll be online and at meetings, but will not have my usual IRC access - please ping me on slack :)
[10:36:42] joal: Ack. Something sflow related or netflow related, or something else?
[10:44:47] btullis: cross-team bonding :)
[10:46:14] Awesome :-)
[10:54:48] Analytics-Radar, Data-Engineering-Radar, Product-Analytics, wmfdata-python: Consider rewriting wmfdata-python to use omniduct - https://phabricator.wikimedia.org/T275038 (EChetty)
[10:58:50] Data-Engineering, Data-Engineering-Kanban, Airflow, Epic, Patch-For-Review: Define and implement archiving for Airflow - https://phabricator.wikimedia.org/T300039 (Antoine_Quhen) https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/40
[12:53:08] Data-Engineering-Radar, MW-on-K8s, serviceops: IPInfo MediaWiki extension depends on presence of maxmind db in the container/host - https://phabricator.wikimedia.org/T288375 (akosiaris) >>! In T288375#7804357, @BTullis wrote: > Could we deploy the GeoIP databases to the kube-workers and then mount it...
[13:03:36] Data-Engineering-Radar, MediaWiki-General: Update pingback "PHP Version" dashboards - https://phabricator.wikimedia.org/T298922 (mforns) I see the charts for 1.35, 1.36 and 1.37 already in the dashboard (https://pingback.wmflabs.org/#php-version). It seems they were updated on January 10 2022: https://me...
[13:04:28] Data-Engineering-Radar, MW-on-K8s, serviceops: IPInfo MediaWiki extension depends on presence of maxmind db in the container/host - https://phabricator.wikimedia.org/T288375 (BTullis) >>! In T288375#7810592, @akosiaris wrote: >>>! In T288375#7804357, @BTullis wrote: >> Could we deploy the GeoIP datab...
[13:07:56] (PS2) Aqu: Add archiving job for Airflow [analytics/refinery/source] - https://gerrit.wikimedia.org/r/774383 (https://phabricator.wikimedia.org/T300039)
[13:09:36] (CR) Mforns: [C: +2] "LGTM!" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/773818 (https://phabricator.wikimedia.org/T303990) (owner: Milimetric)
[13:20:17] (CR) Mforns: "LGTM overall! One question about additive:true." [analytics/wikistats2] - https://gerrit.wikimedia.org/r/773816 (https://phabricator.wikimedia.org/T300365) (owner: Milimetric)
[13:25:38] (CR) Sbisson: [C: +2] Create schemas for Wikistories instrumentation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/773382 (https://phabricator.wikimedia.org/T287639) (owner: Neil P. Quinn-WMF)
[13:27:34] (Merged) jenkins-bot: Create schemas for Wikistories instrumentation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/773382 (https://phabricator.wikimedia.org/T287639) (owner: Neil P. Quinn-WMF)
[13:47:14] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (Volans) >>! In T300246#7806879, @BTullis wrote: > I asked a question [[https://wm-bot.wmflabs.org/libera_logs/%23wikimedia-...
[14:09:41] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (BTullis) > Just to make sure the expectations are correct, one thing is the desired pooled state in etcd, another is if the...
[14:10:22] (PS4) Aqu: Fix: Prevent empty normalized host [analytics/refinery/source] - https://gerrit.wikimedia.org/r/772027
[14:11:47] Data-Engineering, Data-Engineering-Kanban, Airflow: [Airflow] Research, discuss and decide on DAG/task dependencies VS. success/failure files (Oozie style) - https://phabricator.wikimedia.org/T301568 (mforns) > It allows cascading triggering of jobs to build or rebuild the dependent datasets. Agree,...
[14:17:41] Data-Engineering-Kanban, Airflow: Medium Risk Oozie Migration: mediarequest - https://phabricator.wikimedia.org/T302876 (Antoine_Quhen) a:Antoine_Quhen
[14:27:13] milimetric: yt? I want to add a 'data lifecycle example' to shared data platform, based on some feedback so far
[14:27:17] want to brainstorm one with me?
[14:27:49] Data-Engineering, Privacy Engineering: Investigate releasing historical top-pageview-per-country data - https://phabricator.wikimedia.org/T299627 (JAllemandou) Hi @Htriedman - Indeed the pageview data is available without the actor signature since mid-2015, aggregated hourly over a set of dimensions (se...
[14:28:27] Gimme a minute ottomata
[14:30:17] sure np
[14:33:28] ok, in bc ottomata
[14:39:35] oh ho coming
[14:56:58] (CR) Mforns: [V: +2 C: +2] "LGMT, thanks!" [analytics/refinery] - https://gerrit.wikimedia.org/r/770900 (https://phabricator.wikimedia.org/T303712) (owner: Phuedx)
[15:03:09] Data-Engineering, Data-Engineering-Kanban, Airflow: Hosting of GDI use case specific source-code - https://phabricator.wikimedia.org/T304539 (ntsako) a:ntsako
[15:04:13] Data-Engineering, Data-Engineering-Kanban, Airflow: Hosting of GDI use case specific source-code - https://phabricator.wikimedia.org/T304539 (ntsako) https://gitlab.wikimedia.org/repos/data-engineering/gdi-jobs created
[15:19:54] (CR) Joal: "Still one comment we discussed online the other day - ready after that." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/772027 (owner: Aqu)
[15:21:59] Data-Engineering, Data-Engineering-Kanban: Add the commons-entity dataset to the refinery-drop-mediawiki-snapshots script - https://phabricator.wikimedia.org/T303993 (JAllemandou) a:JAllemandou
[15:24:32] Analytics, Data-Engineering, Event-Platform, Patch-For-Review, Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Jdrewniak) a:Jdrewniak→Edtadros
[15:24:35] Analytics, Data-Engineering, Event-Platform, Patch-For-Review, Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Jdrewniak) a:Edtadros→Jdrewniak
[15:24:41] (CR) Milimetric: Fix usability bugs on active editors by country (2 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/773816 (https://phabricator.wikimedia.org/T300365) (owner: Milimetric)
[15:25:24] Analytics-Dashiki, Data-Engineering, Data-Engineering-Kanban: Dashiki fixes needed for Pingback dashboard - https://phabricator.wikimedia.org/T298929 (Milimetric) Open→Resolved
[15:28:22] Data-Engineering, Data-Engineering-Kanban: Add CU-UA high entropy hints to Hive webrequest tables - https://phabricator.wikimedia.org/T304850 (JAllemandou) a:DAbad→JAllemandou
[15:41:13] Data-Engineering-Radar, Product-Analytics: Support on understanding traffic and behaviors for users on legacy browsers (somewhat timely) - https://phabricator.wikimedia.org/T303301 (STHart) Thanks all, I followed up Maya, and as well with the Web and Growth teams on the browsers I mentioned above, so we...
[15:44:12] Data-Engineering-Radar, Product-Analytics: Support on understanding traffic and behaviors for users on legacy browsers (somewhat timely) - https://phabricator.wikimedia.org/T303301 (STHart) Also a quick note for the thread, and the future since I might be back! Please tag me using my Phabricator name and...
[15:46:15] Data-Engineering, Data-Engineering-Kanban: Add CU-UA high entropy hints to Hive webrequest tables - https://phabricator.wikimedia.org/T304850 (JJMC89)
[15:47:50] Data-Engineering-Radar, Product-Analytics: Support on understanding traffic and behaviors for users on legacy browsers (somewhat timely) - https://phabricator.wikimedia.org/T303301 (STHart) Open→Resolved
[15:48:54] Data-Engineering, Airflow: Reduce the number of files generated by geoeditors airflor jobs - https://phabricator.wikimedia.org/T304852 (JAllemandou)
[15:53:30] Data-Engineering, Privacy Engineering: Investigate releasing historical top-pageview-per-country data - https://phabricator.wikimedia.org/T299627 (Htriedman) @JAllemandou Thanks so much for getting back to me on this with some more information. We're currently in the middle of establishing protocols and...
[16:16:25] Data-Engineering, Privacy Engineering: Investigate releasing historical top-pageview-per-country data - https://phabricator.wikimedia.org/T299627 (JAllemandou) I'd love if we could use this use case as a first release of DiffPriv data :)
[16:44:44] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host an-worker1142.eqiad.wmnet with OS buster
[16:47:45] Data-Engineering, Data-Engineering-Kanban: Add CU-UA high entropy hints to Hive webrequest tables - https://phabricator.wikimedia.org/T304850 (EChetty) p:Triage→High
[16:57:04] hm, the webrequest re-runs failed even with thresholds=100...
[16:57:20] mforns: hi
[16:57:28] heya
[16:58:42] ottomata: :] ?
[16:58:47] looking
[17:01:05] ok
[17:01:15] mforns: how did you run?
[17:01:20] i see data_loss_threshold=5 in the yarn logs
[17:01:44] maybe i'm looking at the wrong job
[17:01:49] ottomata: I modified the bundle.properties file to point to the workflow file and ran:
[17:02:00] https://www.irccloud.com/pastebin/iOGrQafN/
[17:02:28] just realized I'm missing some stuff here
[17:02:33] oh i think i am
[17:02:51] i was looking at wrong run
[17:02:53] i see data_loss_threshold=100
[17:03:19] but okay you missing something you say?
[17:03:31] the refinery_directory
[17:03:37] and queue_name
[17:04:14] trying again with upload
[17:04:19] hm okay
[17:05:50] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host an-worker1142.eqiad.wmnet with OS buster exec...
[17:10:43] ottomata: it failed again, at check_sequence_statistics
[17:12:13] hm actually mforns yeah i think it is failing at sending the email?
[17:12:21] oozie job -info 0097216-220113112502223-oozie-oozi-W@send_email_with_attachments
[17:12:22] libpath [hdfs://analytics-hadoop/wmf/refinery/2022-03-16T17.08.51+00.00--scap_sync_2022-03-16_0001-dirty/oozie/util/send_error_email/lib] does not exist
[17:12:26] Error Message : EM007: Encountered an error while sending the email message over SMTP.
[17:12:35] huh interesting
[17:13:06] weird!
[17:13:09] why would it send an email though,,,
[17:13:10] dunno what that /lib is about
[17:13:24] mforns: i guess because there was data loss
[17:13:28] we get the email anyway
[17:13:29] but the job won't fail
[17:13:39] but the threshold was 100
[17:13:43] right
[17:13:47] don't we get emails anyway?
[17:14:05] there are times when we get emails that are like "there has been loss, but it is below thresholds, so job succeeding anyway"
[17:14:13] anyway...i guess if it gets this far, and the hive partitions are in place and look okay
[17:14:19] you could just ignore
[17:14:21] java.io.FileNotFoundException: File does not exist: /wmf/data/raw/webrequests_data_loss/upload/2022/3/28/15/ERROR/000000_0
[17:14:30] unless you want to figure out why the email didn't work
[17:14:37] ERRO?
[17:14:40] but, it didn't refine!
[17:14:44] no?
[17:14:58] java.io.FileNotFoundException: File does not exist: /wmf/data/raw/webrequests_data_loss/upload/2022/3/28/15/ERROR/000000_0
[17:14:58] ?
[17:15:00] that's a weird path
[17:15:16] oh that is in data_loss/ sorry
[17:16:52] hm, yes it did not refine.
[17:18:48] hm but mforns 0097212-220113112502223-oozie-oozi-W@mark_add_partition_done SUCCEEDED?
[17:19:36] yes, I don't know why it does this first! If you look at the graph in hue, you can see it does the stats checking first and then the refine, no?
[17:19:42] so did 0097212-220113112502223-oozie-oozi-W@add_partition
[17:20:15] OH right that is for raw
[17:20:16] right
[17:20:19] add partition is for raw
[17:20:28] ok you are right
[17:20:41] so this email failing is causing the refine step to not run
[17:20:45] so we need to fix the email failure
[17:21:07] right
[17:22:36] yes
[17:26:19] ottomata: I understand the workflow checks for errors first, and if the error file is present and has size > 0, then it sends the error email
[17:26:33] but then the error email subworkflow does not find the error file..
[17:28:11] mforns: i think maybe the thresholds at 100 is causing the file not to be written at all
[17:28:21] hm
[17:28:22] not sure
[17:28:35] yea, maybe
[17:28:47] let me check if the file is there
[17:29:50] does not
[17:29:56] exist
[17:30:27] yeah it doesn't exist, but hm
[17:30:39] it does seem like the hive ql should write something?
[17:31:26] but then, how does oozie's decision node trigger sending the email???
[17:31:37]
[17:31:37]
[17:31:37]
[17:31:37] ${fs:fileSize(concat(error_data_loss_directory, "/000000_0")) eq 0}
[17:31:37]
[17:31:38]
[17:31:38]
[17:31:38]
[17:32:00] well that is checking filesize
[17:32:07] Ah! I got that wrong,
[17:32:18] so the file didn't ever exist
[17:32:33] what happens with INSERT OVERWRITE DIRECTORY '${target}' if there are no query results?
[17:32:42] it looks like this code is expecting it to write a file of 0 size
[17:32:45] but that did not happen?
[17:33:04] the directory was created though...
[17:37:40] directory may have been created by previous run?
[17:38:01] mforns: just did a test
[17:38:06] it should write a file of 0 size
[17:38:10] aha
[17:38:17] INSERT OVERWRITE DIRECTORY '/tmp/ottoe1' select "abc" where 1 > 2;
[17:39:30] strange that it isn't!
[17:39:31] hm
[17:43:47] mforns: ...how would you feel about just manually running a refine_webrequest.hql query?
[17:43:58] and then touching the _SUCCESS file?
[17:44:00] heheh
[17:44:26] OK!
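The decision-node check quoted at 17:31 can be sketched in Python to show why a missing error file still ends up on the email path. This is a minimal illustration, not the workflow's actual code. It assumes that `fs:fileSize` returns -1 for a path that does not exist (as the Oozie documentation describes), that the `eq 0` case is the no-error branch, and that everything else falls through to the email step. The branch names `continue_to_refine` and `send_error_email` are hypothetical, and a local directory stands in for HDFS.

```python
import os


def file_size(path):
    """Mimic fs:fileSize: size in bytes, or -1 if the path is missing or not a file."""
    return os.path.getsize(path) if os.path.isfile(path) else -1


def next_step(error_dir):
    """Return the (hypothetical) branch name a `fileSize(...) eq 0` decision would take."""
    size = file_size(os.path.join(error_dir, "000000_0"))
    # Only an existing, empty file satisfies `eq 0`.
    # A missing file yields -1, so it falls through to the default branch.
    return "continue_to_refine" if size == 0 else "send_error_email"
```

Under these assumptions, a run in which the query never writes `000000_0` behaves like a run with real data loss: the check is not satisfied, the email branch is taken, and the email step then fails because the file it wants to attach is absent.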
[17:45:15] also, mark the raw dataset done
[17:46:04] ottomata: I will put together the hive command and I'll let you review
[17:46:11] okay
[17:48:10] mforns: i think the raw dataset part succeeded
[17:48:20] it has _PARTITIONED flag
[17:48:24] and that is what the oozie job said too
[17:48:29] that part happens before check stats
[17:49:41] but.. check_sequence_statistics has:
[17:49:42]
[17:51:18] looking
[17:51:42] oh hm
[17:51:53] yes mark_add_partition_done adds _PARTITIONED
[17:52:11] ok yes so I guess we need a _SUCCESS in raw too
[17:52:22] yes sorry, hour=14 has that
[17:52:28] sorry been years since I looked at this stuff
[17:55:33] yea, me too
[17:55:37] ottomata:
[17:55:42] https://www.irccloud.com/pastebin/639Ic7QN/
[17:56:36] lgtm mforns
[17:56:43] k
[17:57:48] running
[18:03:10] ottomata: ???
[18:03:13] https://www.irccloud.com/pastebin/R99AobY8/
[18:15:58] hmm..
[18:15:58] Exception in thread "main" java.lang.NoClassDefFoundError: org/wikimedia/analytics/refinery/core/UAParser
[18:18:19] ok, seems to be running now, the newest refinery-hive jar does not contain all necessary for the query to run, I ran it with 0.1.2 and it's working
[18:37:25] huh okay
[18:37:28] that's strange
[18:40:46] text finished successfully, running upload
[18:41:38] Data-Engineering: Data Quality: Airflow migration - https://phabricator.wikimedia.org/T304884 (Milimetric)
[18:41:47] ok gr8
[18:52:51] upload successful too
[18:59:59] (PS1) Milimetric: [WIP] Restore logic from oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/774535 (https://phabricator.wikimedia.org/T304884)
[19:00:46] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Data Quality: Airflow migration - https://phabricator.wikimedia.org/T304884 (Milimetric) p:Triage→High a:Milimetric
[19:01:36] mforns: great
[19:01:39] thank you
[19:01:49] success files okay?
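A quick way to answer the "success files okay?" question is to list which flag files are missing from a partition directory. The sketch below is only an illustration: it checks a local directory with `pathlib` as a stand-in for HDFS, and the helper name `missing_flags` is hypothetical. The flag names `_PARTITIONED` and `_SUCCESS` are the ones mentioned in the discussion above.

```python
from pathlib import Path


def missing_flags(partition_dir, flags=("_PARTITIONED", "_SUCCESS")):
    """Return the flag files (in the given order) that are absent from partition_dir."""
    directory = Path(partition_dir)
    return [flag for flag in flags if not (directory / flag).is_file()]
```

An empty list means every expected flag is in place. On a real cluster the same check would have to go through an HDFS client rather than the local filesystem.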
[19:11:42] Data-Engineering, Data-Engineering-Kanban, SRE: Create conda .deb and docker image - https://phabricator.wikimedia.org/T304450 (Ottomata) Alright, I seem to have got reprepro to pull the update: https://apt.wikimedia.org/wikimedia/pool/thirdparty/conda/c/conda/ And, now conda is listed in both buste...
[19:39:36] (PS2) Milimetric: [WIP] Restore logic from oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/774535 (https://phabricator.wikimedia.org/T304884)
[20:57:39] (CR) Bearloga: [C: +2] "Nice! I really like the thorough documentation" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772934 (owner: Sharvaniharan)
[20:58:04] (CR) Bearloga: [C: +2] New schema for measuring article screen interactions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772910 (owner: Sharvaniharan)
[20:58:42] (Merged) jenkins-bot: New schema for edit history screen interactions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772934 (owner: Sharvaniharan)
[20:59:00] (Merged) jenkins-bot: New schema for measuring article screen interactions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/772910 (owner: Sharvaniharan)