[06:38:26] (PS1) Jdrewniak: [WIP] Updating desktopwebuiactionstracking with viewport buckets [schemas/event/secondary] - https://gerrit.wikimedia.org/r/767439 (https://phabricator.wikimedia.org/T301391)
[06:47:21] Analytics, Data-Engineering, Event-Platform, Patch-For-Review, Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Jdrewniak) @Ottomata great, should we merge the schema addition to analytics/legacy before, after, o...
[07:08:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org
[07:13:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org
[07:59:15] Data-Engineering-Kanban, Data-Catalog: datahubsearch nodes alerting with "Rate of JVM GC Old generation-s runs" - https://phabricator.wikimedia.org/T302818 (elukey) This is due to elasticsearch puppet code: `icinga::monitor::elasticsearch::old_jvm_gc_checks` The prometheus masters are not configured to p...
[08:07:01] Data-Engineering-Kanban, Data-Catalog: datahubsearch nodes alerting with "Rate of JVM GC Old generation-s runs" - https://phabricator.wikimedia.org/T302818 (elukey) Interesting. So the Datahub role includes `profile::opensearch::server`, but the other clusters don't do it, they use `profile::opensearch::...
[09:56:29] RECOVERY - cache_upload: Varnishkafka webrequest Delivery Errors per second -drmrs- on alert1001 is OK: (C)5 ge (W)1 ge 0.7667 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=drmrs+prometheus/ops&var-source=webrequest&var-cp_cluster=cache_upload&var-instance=All
[09:58:46] Data-Engineering-Kanban, Data-Catalog: datahubsearch nodes alerting with "Rate of JVM GC Old generation-s runs" - https://phabricator.wikimedia.org/T302818 (BTullis) Right. Thanks @elukey. I think we may also need to include `profile::opensearch::monitoring::base_checks` which sets up the rest of the...
[10:02:02] Data-Engineering, Data-Engineering-Kanban, Airflow, Data-Catalog: datahubsearch nodes alerting with "Rate of JVM GC Old generation-s runs" - https://phabricator.wikimedia.org/T302818 (BTullis)
[10:25:13] Data-Engineering-Kanban, Data-Catalog, Patch-For-Review: Set up opensearch cluster for datahub - https://phabricator.wikimedia.org/T301382 (BTullis) Looking at the existing logstash nodes, they have a ferm configuration fragment present, which is not present on the datahubsearch hosts. {F34972124} {F...
[10:38:42] Data-Engineering-Kanban, Data-Catalog, Patch-For-Review: Set up opensearch cluster for datahub - https://phabricator.wikimedia.org/T301382 (BTullis) p:Triage→High
[10:39:03] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Define LVS load-balancing for OpenSearch cluster - https://phabricator.wikimedia.org/T301458 (BTullis) p:Triage→High
[10:39:23] Data-Engineering, Data-Catalog, Epic: Data Catalog MVP - https://phabricator.wikimedia.org/T299910 (BTullis) p:Triage→High
[10:42:56] Data-Engineering, Data-Engineering-Kanban, SRE, observability, and 2 others: Upgrade Kafka Risk Evaluation - https://phabricator.wikimedia.org/T302610 (JMeybohm) p:Triage→Medium
[13:08:44] Analytics, Data-Engineering, Event-Platform, Patch-For-Review, Readers-Web-Backlog (Kanbanana-FY-2021-22): WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata) Schema should go first. Then stream config, then instrumentation deployment. :)
[13:10:18] o/ joal
[13:13:34] Data-Engineering, Data-Engineering-Kanban, SRE, observability, and 2 others: Upgrade Kafka Risk Evaluation - https://phabricator.wikimedia.org/T302610 (elukey) @EChetty hi! Could you add some details about what you expect to see in this task?
[13:28:59] ottomata: hello!
[13:29:15] joal: hello!
[13:29:24] ottomata: morning talk?
[13:30:03] ya gimme 3 mins to finish something up
[13:30:07] sure!
[13:31:29] (PS1) Btullis: Override the location of the pidfile for datahub-frontend [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/767506 (https://phabricator.wikimedia.org/T301454)
[13:32:13] ok joal in bc
[13:32:18] joining!
[13:34:41] (PS1) Btullis: Correct the location of the MAE and MCE consumer jars [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/767507 (https://phabricator.wikimedia.org/T301453)
[13:40:14] RECOVERY - cache_text: Varnishkafka webrequest Delivery Errors per second -drmrs- on alert1001 is OK: (C)5 ge (W)1 ge 0.8833 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=drmrs+prometheus/ops&var-source=webrequest&var-cp_cluster=cache_text&var-instance=All
[14:17:27] Analytics, Data-Engineering, MediaWiki-extensions-EventLogging, QuickSurveys, and 2 others: QuickSurveys should show an error when response is blocked - https://phabricator.wikimedia.org/T256463 (awight)
[14:18:59] (CR) Btullis: [C: +2] Override the location of the pidfile for datahub-frontend [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/767506 (https://phabricator.wikimedia.org/T301454) (owner: Btullis)
[14:19:17] (CR) Btullis: [C: +2] Correct the location of the MAE and MCE consumer jars [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/767507 (https://phabricator.wikimedia.org/T301453) (owner: Btullis)
[14:24:56] Data-Engineering, Data-Engineering-Kanban, Airflow, Patch-For-Review: Airflow concurrency limits - https://phabricator.wikimedia.org/T300870 (Antoine_Quhen) Related to: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/29
[15:07:38] joal: I'm around and meetingless for 2 more hours
[15:16:11] Hi milimetric
[15:16:13] batcave?
[15:16:18] omw
[15:22:28] joal: is the -jar-with-dependencies being created by maven shade plugin?
[15:22:50] ottomata: Can't remember if it's shaded or something else
[15:22:56] but yes it should
[15:24:57] hmmkay
[15:25:05] i'm hacking pom to try and get something to work
[15:25:09] might need help shortly...
[15:36:06] GAGHHH
[15:47:22] Data-Engineering, CirrusSearch, Discovery-Search, Event-Platform, and 2 others: "Could not enqueue jobs" when trying to delete a page - https://phabricator.wikimedia.org/T302887 (dom_walden)
[15:47:35] Data-Engineering, CirrusSearch, Discovery-Search, Event-Platform, and 2 others: "Could not enqueue jobs" when trying to delete a page - https://phabricator.wikimedia.org/T302887 (dom_walden)
[16:03:55] SandraEbele: wanna help me develop my first airflow job?
[16:22:47] does anybody remember why we create /wmf/data/archive/projectview/geo/hourly/? :)
[16:28:38] joal: got a separate issue on stat1005, but i also got this when i tried to download and use provided hadoop jars locally
[16:28:44] java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
[16:29:02] maybe the commons-cli with hadoop is a different version than what gobblin needs?
[16:29:06] trying to edit pom to fix
[16:29:20] x
[16:34:26] ottomata: you coming to the PA sync?
[16:34:29] (you have an item)
[16:34:35] okay, i'm in the tech dept meeting
[16:34:37] but okay
[16:34:45] ah, no rush, up to you
[16:43:44] Analytics, Data-Engineering, Event-Platform, Patch-For-Review: Users should run explicit commands to materialize schema versions, rather than using magic git hooks - https://phabricator.wikimedia.org/T290074 (Ottomata) Okay, PAs are okay with this. TODO - look in git history for schema committe...
[16:45:24] Analytics, Data-Engineering, Data-Engineering-Kanban, Event-Platform, Patch-For-Review: Users should run explicit commands to materialize schema versions, rather than using magic git hooks - https://phabricator.wikimedia.org/T290074 (Ottomata) a:Ottomata
[16:45:54] Data-Engineering, CirrusSearch, Discovery-Search, Event-Platform, and 2 others: "Could not enqueue jobs" when trying to delete a page - https://phabricator.wikimedia.org/T302887 (dom_walden) p:Triage→Unbreak!
[16:46:19] Data-Engineering, CirrusSearch, Discovery-Search, Event-Platform, and 2 others: "Could not enqueue jobs" when trying to delete a page - https://phabricator.wikimedia.org/T302887 (dom_walden)
[16:46:55] Data-Engineering, CirrusSearch, Discovery-Search, Event-Platform, and 2 others: "Could not enqueue jobs" error in a lot of places - https://phabricator.wikimedia.org/T302887 (dom_walden)
[16:47:12] hiiiii! I just saw that tomorrow's "Analytics Systems Hangtime" overlaps with a new meeting, "Team sharing (Product Analytics)". I'd actually really love to go to both, and I imagine a few other folks might also be in the same boat...? any thoughts on options for de-overlapping them? thx!!
[16:53:31] Data-Engineering, CirrusSearch, Discovery-Search, Event-Platform, and 3 others: "Could not enqueue jobs" error in a lot of places - https://phabricator.wikimedia.org/T302887 (dom_walden)
[17:05:33] oh, yes
[17:05:59] AndyRussG: I asked lauren to invite folks from hangtime to the PA team sharing this week, because of the overlap
[17:06:07] we'll be giving a 'lifecycle of an event' presentation
[17:06:10] so come to the sharing
[17:06:53] Data-Engineering, Anti-Harassment, CirrusSearch, Discovery-Search, and 5 others: "Could not enqueue jobs" error in a lot of places - https://phabricator.wikimedia.org/T302887 (dom_walden) We are investigating whether this was caused by https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/...
[17:08:32] AndyRussG: i'll cancel this hangtime and add a note
[17:13:26] ottomata: ah cool beans, thanks!
[17:15:55] Analytics, Data-Engineering, Data-Engineering-Kanban: Check home/HDFS leftovers of bumeh-ctr - https://phabricator.wikimedia.org/T300607 (JAnstee_WMF) Our team has a new analyst (@KCVelaga) and data engineer (@ntsako) who will be onboarding to the project fully starting next week - Following that we...
[17:16:01] Data-Engineering, Anti-Harassment, CirrusSearch, Discovery-Search, and 5 others: "Could not enqueue jobs" error in a lot of places - https://phabricator.wikimedia.org/T302887 (dom_walden) Reverting to see if that fixes this https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/767101.
[17:45:55] Data-Engineering, Anti-Harassment, CirrusSearch, Discovery-Search, and 5 others: "Could not enqueue jobs" error in a lot of places - https://phabricator.wikimedia.org/T302887 (dom_walden) >>! In T302887#7748065, @dom_walden wrote: > Reverting to see if that fixes this https://gerrit.wikimedia.org...
[18:01:23] Data-Engineering, Anti-Harassment, CirrusSearch, Discovery-Search, and 5 others: "Could not enqueue jobs" error in a lot of places - https://phabricator.wikimedia.org/T302887 (dom_walden) We pushed a new config patch: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/767558
[18:06:55] Data-Engineering, MediaWiki-extensions-WikimediaEvents, Product-Analytics: Remove CompleteSuggestions EventLoggingSchemas entry - https://phabricator.wikimedia.org/T302894 (phuedx)
[18:15:11] Data-Engineering, MediaWiki-extensions-WikimediaEvents: Remove SearchSatisfactionErrors EventLoggingSchemas entry - https://phabricator.wikimedia.org/T302895 (phuedx)
[18:18:20] joal: yt? i'm very close
[18:18:26] i have gobblin running in hadoop
[18:18:32] ottomata: in minutes
[18:18:35] but i don't think it is doing anything
[18:18:36] okay
[18:22:33] Data-Engineering, MediaWiki-extensions-WikimediaEvents, Product-Analytics: Remove CompleteSuggestions EventLoggingSchemas entry - https://phabricator.wikimedia.org/T302894 (phuedx)
[18:24:35] milimetric: can you try again to accept the meeting?
[18:30:43] nm (for now joal) figured out what was wrong...still trying more things tho
[18:31:15] ottomata: I have time now if you wish
[18:31:48] its running in yarn!
[18:31:53] \o/
[18:32:12] less easy to debug, but eh, at least it's runnin
[18:32:27] ya
[18:32:40] i had to actually remove a jar from the hadoop classpath
[18:32:44] the commons-cli jar
[18:32:52] our hadoop's version is older
[18:32:59] and actually, it might have worked locally if i had kept trying that
[18:33:50] hm - interesting - you run it in yarn with a fat jar that contains all the hadoop deps?
[18:33:56] no
[18:33:57] Data-Engineering, MediaWiki-extensions-WikimediaEvents, Product-Analytics: Remove InputDeviceDynamics EventLoggingSchemas entry - https://phabricator.wikimedia.org/T302896 (phuedx)
[18:34:00] ok
[18:34:08] back to normal pom
[18:34:10] but okay
[18:34:17] no data written yet, but probably just a bad config on my side
[18:34:22] and you still needed to remove that jar?
[18:34:25] yes
[18:34:30] WEIRD!
[18:34:34] Data-Engineering, MediaWiki-extensions-WikimediaEvents: Remove SearchSatisfactionErrors EventLoggingSchemas entry - https://phabricator.wikimedia.org/T302895 (phuedx)
[18:35:42] joal why do we set data.publisher.replace.final.dir=false
[18:35:44] and what is that?
[18:36:09] ottomata: this is needed when you want to override folders
[18:36:15] we don't do that
[18:36:21] we always add
[18:36:39] ok
[18:36:41] cool
[18:36:48] just trying to figure out why no data
[18:36:53] can't tell if its not reading from kafka
[18:36:55] assuming its not
[18:37:20] ottomata: if you've taken config from ours, you should have metrics!
[18:37:44] i have some metrics files ya
[18:37:52] and i see workunits being created for my topic and offsets
[18:38:25] ok, with work unit creation you get an expected number of items to be read
[18:39:05] yes
[18:39:13] joal bc real quick?
[18:39:16] sure
[18:39:16] i'm sure its something silly
[18:51:50] https://www.irccloud.com/pastebin/kWuGyl3P/
[18:51:52] joal ^
[20:00:05] ottomata: has the thing been working?
[20:16:26] Data-Engineering, MediaWiki-extensions-WikimediaEvents, Product-Analytics: Remove CompleteSuggestions EventLoggingSchemas entry - https://phabricator.wikimedia.org/T302894 (EBernhardson) As far as I'm aware this is correct, there shouldn't be anything generating or consuming from this schema.
[20:17:23] Data-Engineering, MediaWiki-extensions-WikimediaEvents: Remove SearchSatisfactionErrors EventLoggingSchemas entry - https://phabricator.wikimedia.org/T302895 (EBernhardson)
[20:18:36] Anyone got any ideas on how to solve my error of `Py4JError: org.apache.spark.api.python.PythonUtils.isEncryptionEnabled does not exist in the JVM` while trying to do a spark sql query in a notebook?
[20:21:07] Data-Engineering, MediaWiki-extensions-WikimediaEvents: Remove SearchSatisfactionErrors EventLoggingSchemas entry - https://phabricator.wikimedia.org/T302895 (EBernhardson) The related code was [[ https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/670010 | dropped ]] last year. This...
[20:21:56] addshore: no clue on my side :/ pyspark is not my close friend
[20:22:57] ack! If I want to do something in a notebook with sql, got a suggestion to use instead of wmfdata.spark.run ?
[20:28:31] Think I have been saved / spared pain with this comment https://phabricator.wikimedia.org/T275233#7108441 <3 otto_mata :D
[20:34:40] addshore: I use wmfdata but not the run
[20:35:46] addshore: https://gist.github.com/jobar/a3c47cc2b2f0d2c015822074553d4993
[20:35:58] OOO, thanks, will look at that too!
[20:36:05] this has worked for me with the default python3 kernel
[20:43:21] (CR) Krinkle: build: Document simpler alternative contribution flow (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/714875 (https://phabricator.wikimedia.org/T290074) (owner: Krinkle)
[21:53:36] (CR) Ottomata: build: Document simpler alternative contribution flow (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/714875 (https://phabricator.wikimedia.org/T290074) (owner: Krinkle)
[21:59:17] (CR) Ottomata: [C: +1] "Thanks Timo." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/714874 (owner: Krinkle)
[22:01:10] addshore: ya the Spark API is pretty powerful, I usually prefer it to wmfdata.spark.run
[22:56:15] (EventgateLoggingExternalLatency) firing: Elevated latency for GET events on eventgate-logging-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org
[23:01:15] (EventgateLoggingExternalLatency) resolved: Elevated latency for GET events on eventgate-logging-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org