[01:29:28] (CR) Sharvaniharan: "@Ottomata @Jason Linehan please review my changes:" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/761452 (owner: Sharvaniharan)
[01:36:29] (PS10) Sharvaniharan: Add a required variable to app analytics fragment [schemas/event/secondary] - https://gerrit.wikimedia.org/r/761452 (https://phabricator.wikimedia.org/T299239)
[06:07:37] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (AndyRussG) Wow, thanks so much, everyone, for finding this and for all the work figuring out the impact! I'...
[11:36:22] Data-Engineering, Data-Engineering-Kanban: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (BTullis)
[11:37:26] (PS21) Phuedx: [WIP] Metrics Platform event schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[11:48:59] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Define the Kubernetes Deployments for Datahub - https://phabricator.wikimedia.org/T301454 (BTullis)
[11:49:15] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Deploy DataHub in MVP phase - https://phabricator.wikimedia.org/T301385 (BTullis)
[11:51:11] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Proof-of-concept Karapace as Confluent schema registry replacement - https://phabricator.wikimedia.org/T301386 (BTullis)
[11:51:35] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (BTullis)
[12:01:45] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Define LVS load-balancing for OpenSearch cluster - https://phabricator.wikimedia.org/T301458 (BTullis)
[12:14:15] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Configure MariaDB database for DataHub on an-coord1001 - https://phabricator.wikimedia.org/T301459 (BTullis)
[12:17:32] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Update DNS for the DataHub MVP services - https://phabricator.wikimedia.org/T301460 (BTullis)
[12:28:47] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Configure CAS-SSO authentication for the DataHub frontend - https://phabricator.wikimedia.org/T301462 (BTullis)
[13:16:53] (CR) Ottomata: Add a required variable to app analytics fragment (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/761452 (https://phabricator.wikimedia.org/T299239) (owner: Sharvaniharan)
[13:30:12] (CR) Ottomata: [WIP] Metrics Platform event schema (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[13:34:26] (CR) Ottomata: [WIP] Metrics Platform event schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[13:36:59] (CR) Ottomata: "https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#Do_not_remove_fields" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/761452 (https://phabricator.wikimedia.org/T299239) (owner: Sharvaniharan)
[13:55:27] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (dcausse) For some production jobs we still use the proxy to access: - MW APIs (all our sites) - ores.wikimedia.org F...
[14:03:06] Quarry: investigate quarry on k8s - https://phabricator.wikimedia.org/T301469 (mdipietro)
[14:05:27] (PS1) Michael DiPietro: initial helm chart [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761629
[14:05:30] (PS1) Michael DiPietro: modification of quarry to allow for deployment to minikube. Additional changes will be needed when k8s is avaliable in production to allow for production to be deployed to k8s. [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761630 (https://phabricator.wikimedia.org/T301469)
[14:05:43] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (Ottomata) > MW APIs (all our sites) BTW, the proper way to access MW APIs from within our networks is to use e.g. htt...
[14:06:07] mforns: o/ :)
[14:06:36] (Abandoned) Michael DiPietro: initial helm chart [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761629 (owner: Michael DiPietro)
[14:07:58] (PS1) Michael DiPietro: minikube helm chart [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761631 (https://phabricator.wikimedia.org/T301469)
[14:08:05] (Abandoned) Michael DiPietro: modification of quarry to allow for deployment to minikube. Additional changes will be needed when k8s is avaliable in production to allow for production to be deployed to k8s. [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761630 (https://phabricator.wikimedia.org/T301469) (owner: Michael DiPietro)
[14:08:33] Data-Engineering, Airflow: Use arrow_hdfs:// fsspec protocol in workflow_utils artifact syncing - https://phabricator.wikimedia.org/T300876 (Ottomata) This can be done automatically by calling `fsspec.registry.register_implementation` with `clobber=True` to make hdfs:// use arrow_hdfs by default. Just...
[14:10:18] (CR) jerkins-bot: [V: -1] minikube helm chart [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761631 (https://phabricator.wikimedia.org/T301469) (owner: Michael DiPietro)
[14:14:37] hello ottomata :]
[14:14:45] helLoOooO
[14:17:48] mforns: what we gotta to do merge artifact-registry?
[14:17:55] maybe add some tests?
[14:18:02] or we can do that later?
[14:18:15] i want to get research instance deployment up today
[14:20:32] ok, I think we can merge that already, also to unblock the other jobs that are waiting for CR
[14:20:39] and I will work on the tests in parallel
[14:20:49] ottomata: ^
[14:22:13] okay greatt, mforns you just gotta revert your personal changes ya?
[14:22:20] i.e. # TODO REVERT
[14:24:12] yes
[14:24:15] will do now
[14:34:04] ottomata: done, if you want to review and meerrrgeee?
[14:37:22] mforns: ya will do, in meeting til start of next hour i think will do then
[14:39:02] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (akosiaris) >>! In T300977#7700725, @Ottomata wrote: >> MW APIs (all our sites) > BTW, the proper way to access MW API...
[14:42:22] Analytics, Data-Engineering, Event-Platform, EventStreams, and 2 others: Expose rdf-streaming-updater.mutation content through EventStreams - https://phabricator.wikimedia.org/T294133 (Ottomata) > Here you must consume only one. Actually, this is curious. These are really distinct streams. We p...
[14:43:59] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (Ottomata) > avoiding a SPOF (there aren't that many web proxies nor is it a highly available setup cause there isn't...
[14:46:14] Analytics, Data-Engineering, Event-Platform, EventStreams, and 2 others: Expose rdf-streaming-updater.mutation content through EventStreams - https://phabricator.wikimedia.org/T294133 (Ottomata) (Oh, past me said this already... :p)
[14:47:04] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (jbond) > This has bitten me before when I used to use the webproxy internally. Don't do it! :) Its worth mentioning...
[14:48:41] (PS22) Phuedx: Metrics Platform event schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[14:58:33] otto-mata: I just had an idea about the user dev configs: make them mandatory arguments of the dev instance script. The script will then put them in a yaml file with the right structure under i.e. AIRFLOW_HOME and then the dag_config file can access that as we discussed. We also can make the script accept a properties file for convenience (if you like it, we can do this -> not now though)
[15:19:27] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (Ottomata) Hahah, maybe what we should do is excludelist the internal domains in the webproxy!
[15:20:06] interesting mforns , but what arguments would we make mandatory?
[15:20:09] its all dag specific, isn't it?
[15:23:38] mforns: there is still a /user/mforns in artifact_config.yaml
[15:23:57] actually we need to decide on our offical place for artifact cache
[15:24:02] since we are really going to start using it
[15:24:08] i'll make patch
[15:30:48] sorry ottomata was in meeting. argh yes sorry for the /user/mforns, forgot
[15:31:07] mforns:
[15:31:08] if you want to pair-deploy I'm in the meeting
[15:31:16] am thinking
[15:31:17] hdfs:///wmf/cache/artifacts/airflow
[15:31:26] although, i'm not sure about the airflow part
[15:31:36] maybe just hdfs:///wmf/cache/artifacts
[15:31:37] thoughts?
[15:31:38] right, sounds good to me
[15:31:44] ok with both
[15:31:48] or: should we have instance specific cache dirs?
[15:32:13] hdfs:///wmf/cache/artifacts/airflow/{analytics,research,platform_eng} ?
[15:32:41] ideally we should be able to share existing artifacts no?
[15:32:44] yes
[15:32:54] but there's the danger of name collisions
[15:33:01] but, we have instance specific artifact.config files
[15:33:06] and in order for artifact() function to work
[15:33:12] the artifact needs to be declared there
[15:33:25] so if two instances want to use refinery-job-shaded
[15:33:29] they'll both declare it
[15:33:34] which means that on deploy, it will be synced
[15:33:48] if they use diferent ids, it will go to different files
[15:33:51] which is ok
[15:33:59] but if they use the same, it will be overwritten (or not -recached)
[15:34:05] yes, but scap will notice that the artifact is already there, and not download it again, no?
[15:34:06] which my be a problem if they change hte url and not the id
[15:34:18] yes, but it requires that everyone version their artifacts in the same way
[15:34:43] aren't the maven coords unique?
[15:34:45] i.e. the artifacts declared in different artifact.yaml files need to be identical
[15:34:47] yes
[15:34:57] but they could change them under the same id
[15:35:05] is the id in the path??
[15:35:06] like, if someone didn't put the version in their id
[15:35:11] in the cache patht, i think yes
[15:35:18] hm. let me check
[15:35:49] ottomata: this is the filename of the artifact I defined:
[15:35:54] /home/mforns/Projects/developerNotes/notes10.txt
[15:35:58] https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils/-/blob/main/workflow_utils/artifact/cache.py#L21-44
[15:36:08] xD sorry not that
[15:36:20] org.wikimedia.analytics.refinery.job_refinery-job_jar_shaded_0.1.23
[15:36:32] it doesn't have the artifact_id
[15:37:08] it's the safe filename of the coordinates
[15:37:15] hm
[15:38:42] huh, how is that happening...i don't see the code doing that. gotta investigate
[15:40:35] oh
[15:40:37] right mf
[15:40:39] mforns
[15:40:46] id: org.wikimedia.analytics.refinery.job:refinery-job:jar:shaded:0.1.23
[15:40:53] the id is is the maven coordinate
[15:40:57] aha
[15:41:06] so hm, yes in that case it will be cached by the unique maven coordinate
[15:41:07] you are right
[15:41:10] not the name
[15:41:30] will that work with gitlab artifacts as well?
[15:41:33] the name is just a logical description
[15:41:38] aha
[15:41:47] yes, the id for a url source will be the url
[15:41:55] awesome, then
[15:42:03] yes....
[15:42:04] i think
[15:42:07] hehe
[15:42:15] i need to add some checksuming or at least filesize checking
[15:42:20] right now it only checks if the file exists
[15:42:30] ok
[15:42:38] thinking though...yes i think this should be okay
[15:42:43] the ids are unique to the source
[15:42:52] unless the underlying source content changes somehow
[15:42:53] like
[15:43:01] if someone has some unversioned artifact in gitlab
[15:43:02] and they chanbge it
[15:43:10] I see
[15:43:12] but
[15:43:21] that would be a problem without checksumming right now
[15:43:26] hmmmm
[15:43:36] okay, lets go with the shared artifact directory for now
[15:43:39] ithink it will be okay
[15:44:05] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (mpopov) A couple of questions/comments: >>! In T300977#7700842, @jbond wrote: > Its worth mentioning that when i to...
[15:44:05] i'm going to keep airlfow in there though, just for now atlest to be safe
[15:44:18] and it can be overriden by instance config, right?
[15:44:25] yes it can
[15:44:25] ok, yes
[15:45:17] ok pushed, everyrthing else looks good
[15:45:19] i'm going to merge mforns
[15:45:33] wait ottomata
[15:45:37] I have one more change...
[15:45:58] okay
[15:46:57] done, I added another recipient to the traffic anomaly ddtection job
[15:48:13] okay
[15:48:18] man what is up with these rebases/?!?!
[15:49:53] arff
[15:58:59] okay i rebased but i dunno what happened
[15:59:03] the final result looks good
[15:59:09] some commits are duplicated and i don't know wy
[15:59:14] merging
[16:01:06] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (EBernhardson) > Oh and ORES is also available under https://ores.discovery.wmnet (and it's the exact same service!)...
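Editor's note on the artifact cache discussion above: the sketch below illustrates the behaviour described in the chat, where the cache file is named after a filesystem-safe form of the artifact id, the current check is existence-only, and a checksum comparison is the proposed improvement. It is a minimal sketch under stated assumptions, not the workflow_utils implementation: `safe_filename`, `sha256_of` and `is_cached` are hypothetical names, the character-replacement rule is a guess that happens to reproduce the filename quoted in the log, and a local directory stands in for the HDFS cache.

```python
import hashlib
import re
from pathlib import Path
from typing import Optional


def safe_filename(artifact_id: str) -> str:
    """Hypothetical rule: replace anything outside [A-Za-z0-9._-] with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", artifact_id)


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large jars are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_cached(cache_dir: Path, artifact_id: str,
              expected_sha256: Optional[str] = None) -> bool:
    """Return True if the artifact is present and, when a checksum is given, matches it."""
    path = cache_dir / safe_filename(artifact_id)
    if not path.is_file():
        return False
    if expected_sha256 is None:
        # Existence-only check, the behaviour the chat says is in place today.
        return True
    # Proposed stricter check: catches content that changed under the same id.
    return sha256_of(path) == expected_sha256
```

With a checksum supplied, an unversioned artifact whose content changed under the same id would be reported as not cached and could be synced again, which is the failure case raised in the conversation.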
[16:01:28] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (mpopov) > **First**: How difficult & how much overhead would it be to make the proxy redirect requests made to intern...
[16:02:36] ok ottomata
[16:05:03] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (Ottomata) > Is the intention to allow us to talk to prod in a more general fashion then? I think so, see the parent t...
[16:06:19] mforns: deploying to analytics_test
[16:06:28] k!
[16:21:23] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (Mayakp.wiki) @AndyRussG : For next steps we are planning to look into the impact of data loss on US traffic...
[16:28:06] mforns: i think it looks good!
[16:28:24] had to fix some scap things, but i believe useragent_distribution
[16:28:24] is running just fine!
[16:28:28] in analytics_test
[16:28:29] can you verify?
[16:28:36] ottomata: the useragent_distribution dag is blocked, but I think it's there for some time
[16:28:49] re-running
[16:29:24] ottomata: TypeError: sequence item 2: expected str instance, int found
[16:29:45] Data-Engineering, Infrastructure-Foundations, Product-Analytics, Research, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (jbond) > First: How difficult & how much overhead would it be to make the proxy redirect requests made to internal do...
[16:30:31] we fixed that already no?
[16:32:38] hmmm
[16:32:43] i had thought so!~
[16:32:53] i see it
[16:32:57] investigating
[16:35:08] ottomata: can this be related to hooks/spark.pyL399?
[16:35:37] no, thats for skein
[16:35:43] mfornsi think the fix got lost in the rebases
[16:36:39] ok
[16:36:45] fixing and pushing and deploying
[16:37:36] restarting ariflow just in case then will clear task
[16:37:37] ok
[16:42:49] mforns: it worked!
[16:42:55] shall we deploy to analytics?
[16:43:00] yay, sure!
[16:44:15] okay i'm goign to publish this deb first and install it the right way
[16:45:22] 👍
[16:48:42] !log deploying airflow analytics with lots of recent changes to airflow-dags repository
[16:48:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:51:15] k mforns deployed to analytics
[16:51:23] looking!
[16:51:33] looks like todays job has already run, maybe clear and rerun to super check?
[16:51:57] yes, doing
[16:55:31] ottomata: seems to be working :D
[16:56:40] yeehaw
[17:00:40] (PS23) Phuedx: Metrics Platform event schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[17:01:03] ottomata: finished successfully!
[17:01:04] (CR) Phuedx: Metrics Platform event schema (3 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[17:01:45] nice!
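Editor's note on the `TypeError: sequence item 2: expected str instance, int found` reported at 16:29:24: this is the message Python raises when `str.join` is given a sequence whose third element is an int. The log does not show the failing call or the fix, so the argument list below is purely hypothetical; it is a small sketch of how the error arises and of one common remedy (casting every item to `str` before joining).

```python
# Hypothetical argument list: the item at index 2 is an int, not a str.
args = ["--executor-memory", "4G", 8]

try:
    " ".join(args)
except TypeError as err:
    # str.join refuses non-string items and names the offending index.
    message = str(err)

# One common fix: convert every item to str before joining.
command = " ".join(str(a) for a in args)
```

Casting at the join site keeps the configuration values typed (ints stay ints in the config) while still producing a valid command string.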
[17:03:19] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Wikistats pageview data missing counts for Mobile App pageviews on Commons, going back to 2020-11 - https://phabricator.wikimedia.org/T299439 (JAllemandou)
[17:03:37] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Wikistats reports no mobile unique devices for Wikidata and MediaWiki.org - https://phabricator.wikimedia.org/T299559 (JAllemandou)
[17:05:45] mforns: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/16
[17:05:58] oh dags dir needs there too
[17:06:02] hmm, i'll just do it with __init__.py
[17:06:24] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban, Product-Analytics: Wikistats reports no mobile unique devices for Wikidata and MediaWiki.org - https://phabricator.wikimedia.org/T299559 (JAllemandou) a:JAllemandou
[17:08:47] ottomata: merged!
[17:11:01] ty!
[18:01:39] (CR) Ottomata: Metrics Platform event schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[18:14:17] Analytics, Data-Engineering: Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (odimitrijevic)
[18:54:54] !log setting up research airflow-dags scap deployment, recreating airflow database and starting from scractch (fab okayed this) - T295380
[18:54:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:54:57] T295380: [Airflow] Set up scap deployment - https://phabricator.wikimedia.org/T295380
[19:05:54] Data-Engineering, Data-Engineering-Kanban, Airflow, Patch-For-Review: [Airflow] Set up scap deployment - https://phabricator.wikimedia.org/T295380 (Ottomata) Ok @fkaelin @bmansurov, the research airflow instance is now using [[ https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags | da...
[19:06:41] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (AndyRussG) >>! In T300164#7701197, @Mayakp.wiki wrote: > @AndyRussG : For next steps we are planning to loo...
[19:08:59] mforns: woo hooo, done! research instance good to go!
[19:35:01] ottomata: \\\o///
[19:36:24] Data-Engineering, Airflow: Use arrow_hdfs:// fsspec protocol in workflow_utils artifact syncing - https://phabricator.wikimedia.org/T300876 (Ottomata) Ah, it was only a JAVA_HOME not set properly problem. It works fine. Now just to figure out where to set this. I think in workflow_utils?
[19:47:19] mforns: got a sec for a brain bounce about fsspec and pyarrow?
[19:49:31] yes ottomata bc?
[19:49:42] ya, i think i'm landing on an answer but i want to see what you think
[19:49:42] bc
[20:00:39] oh mforns someone already has made an issue! reading https://github.com/fsspec/filesystem_spec/issues/874
[20:02:08] ottomata: cool! seems recent, and seems they are open to change it
[20:02:26] ya, lets focus on that then
[20:02:28] instead of magic
[20:03:05] "if we are convinced it does all we need"
[20:09:23] Data-Engineering, Airflow: Use arrow_hdfs:// fsspec protocol in workflow_utils artifact syncing - https://phabricator.wikimedia.org/T300876 (Ottomata) Ah, this is being discussed in upstream fsspec too: https://github.com/fsspec/filesystem_spec/issues/874 Let's see what they say.
[20:35:55] (Abandoned) Clare Ming: POC: add new stream for VectorPrefDiffInstrumentation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/717622 (https://phabricator.wikimedia.org/T289622) (owner: Clare Ming)
[23:16:16] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (Mayakp.wiki) > since the two datacenters affected are in the US, the impact on reported pageviews will be e...
[23:21:26] (PS1) MewOphaswongse: Add navigation_type action_data [schemas/event/secondary] - https://gerrit.wikimedia.org/r/761742 (https://phabricator.wikimedia.org/T301486)