[04:16:57] Analytics, Analytics-Wikistats, Data-Engineering, Language-strategy, and 2 others: Add more popular articles per country data to AQS - https://phabricator.wikimedia.org/T263697 (MusikAnimal)
[05:18:13] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (AndyRussG) Ran a quick test query to get a by-country breakdown for pageviews on 2021-12-01. I'm not super...
[07:35:45] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (elukey) If it helps: https://github.com/wikimedia/operations-dns/blob/master/geo-maps We resolve our publi...
[09:20:43] elukey: good morning! would you have a minute for a brainbounce?
[09:21:53] joal: bonjour! sure
[09:22:53] elukey: I'm working on the data loss from the varnishkafka issue
[09:23:24] elukey: I'm pretty much ok with webrequest flows, but I have questions for statsv and eventlogging
[09:24:27] elukey: my understanding is that the problem was traffic data not being sent from caches - Would that even affect eventlogging?
[09:24:39] And obviously, same question for statsv
[09:25:51] joal: I think so yes, the topics from which eventlogging and statsv pull data from would have seen lower data coming in
[09:27:13] elukey@cp3050:~$ sudo grep -rni 'kafka.topic =' /etc/varnishkafka
[09:27:13] /etc/varnishkafka/eventlogging.conf:234:kafka.topic = eventlogging-client-side
[09:27:16] /etc/varnishkafka/webrequest.conf:238:kafka.topic = webrequest_text
[09:27:17] joal: --^
[09:27:19] /etc/varnishkafka/statsv.conf:234:kafka.topic = statsv
[09:28:06] the vk EL instances send traffic to the client-side topic, that IIRC is pulled from EL and then refined
[09:28:12] elukey: ok - this means those two flows rely on "get" http posts, not post - I thought they might have
[09:28:13] same thing for statsv
[09:28:45] And the data is sent as URL parameters
[09:29:23] so EL listens for /beacon/event(.gif)? URIs
[09:29:47] and statsv "^/beacon/statsv\?
[09:29:58] ok makes sense - one more question then - do we have every webrequest-text cache host also doing statsv and eventlogging?
[09:30:22] the match for ReqURL, it can be also POST data
[09:30:44] yes in theory cache-text nodes are also doing statsv/el
[09:30:59] elukey: if it is post data - how would it be retrieved backend? varnishkafka doesn't send that, right?
[09:31:28] joal: what do you mean?
[09:31:52] if data is sent by post, it wouldn't show up on kafka topics
[09:33:02] And we're only talking about eventlogging-client-side indeed
[09:33:14] I don't recall exactly so it may be GETs, but for statsv this is the format of the json that is sent to kafka:
[09:33:17] $format = "%{fake_tag0@hostname?${::fqdn}}x %{%FT%T@dt}t %{X-Client-IP@ip}o %{@uri_path}U %{@uri_query}q %{User-Agent@user_agent}i"
[09:33:30] ack elukey
[09:33:35] these are all fields available in the req in theory
[09:33:51] I can check live on nodes if you want
[09:34:37] not needed elukey thank you - I'm more interested in a confirmation that there is an exact match on caches doing webrequest-text and eventlogging :)
[09:35:35] yep yep :)
[09:35:56] cache-upload hosts are the only ones having vk-webrequest only
[09:39:12] thanks a lot elukey - this helps :)
[09:56:15] <3
[10:21:46] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (BTullis) The repository has now been created and I have permission to push to it: I have created the wmf branch I will begin...
[10:55:06] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (elukey) @BTullis I saw the task passing by, what is the goal of forking the linkedin's datahub repo? Are you interested only i...
[11:07:32] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (BTullis) @elukey - Oh right, yes that looks really useful. So this would be the intermediate step from here, right? https://wi...
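A minimal Python sketch of the routing discussed above, assuming the URI patterns quoted in the chat (`/beacon/event(.gif)?` for EventLogging, `^/beacon/statsv\?` for statsv) are matched against the request URL with a query string following, and that every cache-text request also lands in `webrequest_text`. `route` and `statsv_message` are hypothetical helpers, not varnishkafka code; the JSON field names come from the statsv `$format` string pasted above, and the sample host, IP and metric name are made up. It shows why the data has to travel as URL parameters: none of those fields carries a request body.

```python
import json
import re
from urllib.parse import parse_qs

# Assumed patterns, paraphrased from the chat; the real varnishkafka
# instance configs may match differently.
BEACON_TOPICS = [
    ("eventlogging-client-side", re.compile(r"^/beacon/event(\.gif)?\?")),
    ("statsv", re.compile(r"^/beacon/statsv\?")),
]


def route(uri: str) -> list[str]:
    """Hypothetical: Kafka topics a cache-text request would be sent to."""
    topics = ["webrequest_text"]  # assumption: webrequest instance logs every request
    topics += [topic for topic, pattern in BEACON_TOPICS if pattern.match(uri)]
    return topics


def statsv_message(hostname: str, dt: str, ip: str, uri: str, user_agent: str) -> str:
    """Hypothetical: JSON shaped like the statsv $format fields.

    Only the request line and headers are captured; there is no field for a
    POST body, so metrics must be encoded in the query string to reach Kafka.
    """
    path, sep, query = uri.partition("?")
    return json.dumps({
        "hostname": hostname,
        "dt": dt,
        "ip": ip,
        "uri_path": path,
        "uri_query": sep + query,  # assumption: keeps the leading "?"
        "user_agent": user_agent,
    })


uri = "/beacon/statsv?MediaWiki.example.timing=120ms"  # made-up metric
print(route(uri))  # ['webrequest_text', 'statsv']
msg = json.loads(statsv_message("cp3050", "2022-01-01T00:00:00", "192.0.2.1", uri, "curl"))
print(parse_qs(msg["uri_query"].lstrip("?")))
```

Under these assumptions, a host that dropped traffic in its webrequest instance would have dropped the matching beacon requests too, since the same request feeds all three topics.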
[11:15:16] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (elukey) >>! In T301453#7707085, @BTullis wrote: > @elukey - Oh right, yes that looks really useful. So this would be the inter...
[11:32:54] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (elukey) @akosiaris is there any problem in avoiding the pipeline/blubber config and going through production-images instead?
[11:32:56] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (akosiaris) >>! In T301453#7707100, @elukey wrote: >>>! In T301453#7707085, @BTullis wrote: >> @elukey - Oh right, yes that loo...
[11:37:09] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (elukey) @akosiaris I am a little confused then, I didn't use any of it for istio/knative/kserve/etc.., those were all services...
[12:28:50] Analytics, SRE, SRE Observability: dropped packets to kafkamon 9000/tcp - https://phabricator.wikimedia.org/T238794 (ayounsi)
[12:59:26] Data-Engineering, Data-Engineering-Kanban, Data-Services, cloud-services-team (Kanban): Recreate views for globaluser table - https://phabricator.wikimedia.org/T301674 (Majavah)
[14:19:48] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (akosiaris) >>! In T301453#7707174, @elukey wrote: > @akosiaris I am a little confused then, I didn't use any of it for istio/k...
[14:22:32] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (JAllemandou) Adding precisions >>! In T300164#7705155, @Iflorez wrote: > > @Jallemandou A few questions t...
[14:24:55] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (Milimetric) I agree we should look at this loss, @AndyRussG, and estimate as accurately as possible along a...
[15:12:30] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Release Pipeline, Patch-For-Review: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453 (BTullis) Thanks both. This is really helpful. I have begun work on the deployment pipeline work and star...
[15:15:41] ottomata: dumb question - isn't pyspark installed on our default python3 kernel for jupyter?
[15:21:33] ottomata: nevermind, figured it out :)
[16:08:51] !log sudo cookbook sre.ganeti.makevm --vcpus 4 --memory 8 --disk 50 eqiad_B datahubsearch1002 for T301383
[16:08:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:08:54] T301383: eqiad: 3 VMs requested for datahub opensearch cluster - https://phabricator.wikimedia.org/T301383
[16:38:56] Data-Engineering, Data-Engineering-Kanban, Airflow: Create Custom Hdfssensor - https://phabricator.wikimedia.org/T300276 (Snwachukwu)
[16:59:17] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Define the Kubernetes Deployments for Datahub - https://phabricator.wikimedia.org/T301454 (Milimetric) As designed, we asked SRE if we can deploy on the existing Service Ops main kubernetes cluster (WikiKube). We are planning on moving to the...
[17:01:40] Data-Engineering, Data-Catalog: Connect Metadata Sources to the MVP [Mile Stone 5] - https://phabricator.wikimedia.org/T299899 (Milimetric)
[17:02:55] Data-Engineering, Data-Catalog: Connect MVP to Hive metastore [Mile Stone 4] - https://phabricator.wikimedia.org/T299897 (Milimetric)
[17:03:35] Data-Engineering, Data-Catalog: Connect Kafka to the MVP [Mile Stone 5] - https://phabricator.wikimedia.org/T299899 (Milimetric)
[17:04:58] Data-Engineering-Kanban, Data-Catalog: [[wikitech:Data Catalog Application Evaluation Rubric]] links to some non-public Google Doc "execution plan" - https://phabricator.wikimedia.org/T299900 (Milimetric) a:Milimetric
[17:06:45] Data-Engineering, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (Milimetric)
[17:09:07] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (Milimetric)
[17:12:50] folks there is a failed unit on an-test-client1001, namely airflow-scheduler@analytics-test.service, does it need a reset-failed?
[17:15:54] Analytics, Data-Engineering: Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (odimitrijevic) @Marostegui let's coordinate downtime is the same as our cloud host cc @BTullis @razzi
[17:25:17] Analytics-Radar, Anti-Harassment, CheckUser, Privacy Engineering, and 4 others: Deal with Google Chrome User-Agent deprecation - https://phabricator.wikimedia.org/T242825 (CBogen)
[17:27:22] Analytics, Data-Engineering: Upgrade dbstore100* hosts to Bullseye - https://phabricator.wikimedia.org/T299481 (Marostegui) @odimitrijevic it is really up to you all. It only requires stopping all mariadb instances, doing the reimage and then starting them back. I can provide the commands in detail if yo...
[17:27:46] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (AndyRussG) Thanks so much, @elukey, @JAllemandou, @Milimetric! Heheh ok I guess following @Milimetric, I'l...
[17:28:02] Data-Engineering, Product-Analytics: wikidata unique devices per-project-family overcount offset - https://phabricator.wikimedia.org/T301403 (odimitrijevic) We need to document unique devices metrics and establish ownership
[17:30:31] ping razzi (see above :)
[17:31:10] Let me look into this an-test-client error
[17:31:36] ``` UNIT LOAD ACTIVE SUB DESCRIPTION
[17:31:36] ● airflow-scheduler@analytics-test.service not-found failed failed airflow-scheduler@analytics-test.service
[17:31:36] ```
[17:33:58] Looks like the unit was removed so it is no longer needed at all
[17:35:39] Data-Engineering, Data-Engineering-Kanban, Product-Analytics: 22 small wikis missing from the mediawiki_history dataset - https://phabricator.wikimedia.org/T299548 (odimitrijevic)
[17:36:41] milimetric: heya - do we spend a minute in the cave about webrequests?
[17:36:47] Data-Engineering, Data-Engineering-Kanban, Product-Analytics: 22 small wikis missing from the mediawiki_history dataset - https://phabricator.wikimedia.org/T299548 (odimitrijevic) Documentation on adding new wikis: https://wikitech.wikimedia.org/wiki/Data_Engineering/Ops_week#Adding_new_wikis_to_the_...
[17:36:56] milimetric: actually maybe in 5 minutes?
[17:37:53] razzi: sure let's reset-fail it then
[17:38:22] Ok elukey I was wondering if there was a different command but didn't see any, will reset-failed
[17:38:40] !log razzi@an-test-client1001:~$ sudo systemctl reset-failed airflow-scheduler@analytics-test.service
[17:38:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:44:20] Data-Engineering, Data-Engineering-Kanban, Product-Analytics: 22 small wikis missing from the mediawiki_history dataset - https://phabricator.wikimedia.org/T299548 (Milimetric) This has been a manual process so far, and my opinion is that we should take this opportunity to fix technical debt: We mai...
[17:44:37] Ok actually the unit was not removed, I was just looking in the wrong place
[17:44:37] Nevertheless I reset-failed it, and it seems to be running fine, so it's still an ok resolution
[17:48:44] (PS1) DLynch: talk_page_edit: add a new component_type for topics [schemas/event/secondary] - https://gerrit.wikimedia.org/r/762497 (https://phabricator.wikimedia.org/T301496)
[17:50:45] razzi: the unit's name that is currently running is different, see _
[17:50:48] airflow-scheduler@analytics_test.service
[17:51:05] so the one that you reset-failed, marked as "not-found", wasn't there anymore
[17:51:14] you cleared the alert with the reset-failed
[18:15:22] Data-Engineering, Data-Engineering-Kanban: Some varnishkafka instances dropped traffic for a long time due to the wrong version of the package installed - https://phabricator.wikimedia.org/T300164 (Iflorez) >>! In T300164#7707661, @JAllemandou wrote: > Adding precisions > thank you
[18:19:57] (PS5) Michael DiPietro: minikube helm chart [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761631 (https://phabricator.wikimedia.org/T301469)
[18:25:26] (Abandoned) Michael DiPietro: noop [analytics/quarry/web] - https://gerrit.wikimedia.org/r/761984 (owner: Michael DiPietro)
[18:26:05] milimetric: ping?
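A small Python sketch of the check behind the unit-name confusion above: pick out failed units whose LOAD state is `not-found` (leftovers that only need a `systemctl reset-failed`) and compare the instance part of templated unit names. `stale_failed_units` and `instance_name` are hypothetical helpers that parse text shaped like the systemctl output pasted above; only `systemctl reset-failed` is a real command here, and the script just prints it rather than running it.

```python
def stale_failed_units(listing: str) -> list[str]:
    """Hypothetical: failed units whose unit file no longer exists."""
    stale = []
    for line in listing.splitlines():
        # Drop the "●" status marker so columns read UNIT LOAD ACTIVE SUB ...
        fields = line.replace("\u25cf", " ").split()
        if len(fields) >= 4 and fields[1] == "not-found" and fields[2] == "failed":
            stale.append(fields[0])
    return stale


def instance_name(unit: str) -> str:
    """Instance part of a templated unit: 'x@foo.service' -> 'foo'."""
    _, _, rest = unit.partition("@")
    return rest.rsplit(".", 1)[0]


listing = """UNIT LOAD ACTIVE SUB DESCRIPTION
\u25cf airflow-scheduler@analytics-test.service not-found failed failed airflow-scheduler@analytics-test.service"""
running = "airflow-scheduler@analytics_test.service"  # the unit elukey pointed at

for unit in stale_failed_units(listing):
    # "analytics-test" vs "analytics_test": different instances, so the
    # failed entry is stale and clearing it does not touch the live scheduler.
    print(instance_name(unit), "!=", instance_name(running))
    print(f"sudo systemctl reset-failed {unit}")
```

Matching on the LOAD column rather than on the unit name is the design choice here: a `not-found` failed unit has no unit file behind it, which is exactly the case where `reset-failed` only clears the alert.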
[18:34:19] hi joal, here
[18:34:27] milimetric: batcave?
[18:34:29] omw
[19:35:45] Analytics, Analytics-Wikistats, Data-Engineering, Data-Engineering-Kanban, and 5 others: Wikistats pageview data missing counts for Mobile App pageviews on Commons, going back to 2020-11 - https://phabricator.wikimedia.org/T299439 (LGoto) p:Triage→Medium
[19:56:57] (PS25) Phuedx: Metrics Platform event schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[20:01:07] (CR) Phuedx: Metrics Platform event schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[20:01:14] (CR) jerkins-bot: [V: -1] Metrics Platform event schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[20:23:58] (CR) Ottomata: Metrics Platform event schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[20:33:39] Gone for tonight
[20:54:09] (CR) Bartosz Dziewoński: [C: +2] talk_page_edit: add a new component_type for topics [schemas/event/secondary] - https://gerrit.wikimedia.org/r/762497 (https://phabricator.wikimedia.org/T301496) (owner: DLynch)
[20:55:13] (Merged) jenkins-bot: talk_page_edit: add a new component_type for topics [schemas/event/secondary] - https://gerrit.wikimedia.org/r/762497 (https://phabricator.wikimedia.org/T301496) (owner: DLynch)
[22:19:37] Data-Engineering, Product-Analytics: conda-create-stacked breaks wmfdata.presto - https://phabricator.wikimedia.org/T301734 (nettrom_WMF)
[22:21:17] Data-Engineering, Product-Analytics: conda-create-stacked breaks wmfdata.presto - https://phabricator.wikimedia.org/T301734 (nettrom_WMF)