[08:35:10] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:47:02] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:14:47] Data-Engineering, Data-Engineering-Kanban: Procure MaxMind GeoIP2 Database License - https://phabricator.wikimedia.org/T303453 (jbond) >>! In T303453#7765476, @odimitrijevic wrote: > Here are some open questions: > * It looks like there is no GeoIP2 replacement for Region. Who are the users and what are...
[10:17:22] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (akosiaris) >>! In T303049#7753274, @BTullis wrote: > How can I tell what the source IP address(es) of my services will be, as seen by the bac...
[10:20:49] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) Great. Thanks both. I'm now working through the first set of comments left by @JMeybohm on the patch, trying to make it use the scaf...
[10:29:20] Data-Engineering, Data-Engineering-Kanban: Procure MaxMind GeoIP2 Database License - https://phabricator.wikimedia.org/T303453 (phuedx) >>! In T303453#7765476, @odimitrijevic wrote: > Here are the subscriptions currently on our account: > > * GeoIP Legacy Region Database 2012-02-28 2022-04-07 > * GeoIP2...
[10:30:17] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (akosiaris) As far as I am concerned, this service request LGTM. Thanks for the very detailed diagram (including a link to the source), repos...
[10:47:49] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (JMeybohm) I'd like to add the proposal of using Ingress (T290966) for the frontend (to not have to configure LVS for that). For the consumers...
[10:54:29] Data-Engineering, Data-Catalog: Set up karapace instance for datahub - https://phabricator.wikimedia.org/T301562 (BTullis) The karapace1001.eqiad.wmnet machine has now booted and is ready for karapace to be installed.
[11:06:02] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) > I'd like to add the proposal of using Ingress (T290966) for the frontend (to not have to configure LVS for that). Sounds good to m...
[11:08:15] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) > Per my undestanding the service will reside in the wikikube cluster for the MVP phase, despite being a bad fit for it per https://...
[11:33:51] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (phuedx) > | **Schema** | **Migrate or Decom?** | **Owners?** | **code...
[11:42:17] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (phuedx)
[11:53:03] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (JMeybohm) >>! In T303049#7766702, @BTullis wrote: >> I'd like to add the proposal of using Ingress (T290966) for the frontend (to not have to...
[12:05:37] Data-Engineering, ExternalGuidance, MediaWiki-extensions-WikimediaEvents: Decommission the ExternalGuidance instrument - https://phabricator.wikimedia.org/T303508 (phuedx)
[12:06:06] Data-Engineering, ExternalGuidance, MediaWiki-extensions-WikimediaEvents: Decommission the ExternalGuidance instrument - https://phabricator.wikimedia.org/T303508 (phuedx)
[12:06:09] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (phuedx)
[12:08:12] Data-Engineering, ExternalGuidance, MediaWiki-extensions-WikimediaEvents: Decommission the ExternalGuidance instrument - https://phabricator.wikimedia.org/T303508 (phuedx) > Remove the ExternalGuidance event sanitisation allowlist entry Should the entry be removed? I wasn't sure if removing it would...
[12:09:26] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) > Sorry, totally my fault! I meant the GMS, not consumer. From what you wrote in T301454#7741876 it sounds like you just don't want...
[12:14:45] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (JMeybohm) >>! In T303049#7766821, @BTullis wrote: > Yes, that's right. Great! >>! In T303049#7766821, @BTullis wrote: > So I'll change the `...
[12:41:54] Data-Engineering, Data-Catalog, SRE, serviceops, Service-deployment-requests: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) > Actually I was just referring to the diagram, as is mentions specific ports and I wanted to make sure that's not a fixed requirem...
[14:39:38] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (phuedx)
[14:40:48] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (phuedx)
[14:49:48] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[15:00:15] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:23:22] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (Jclark-ctr) a:Jclark-ctr→Cmjohnson
[16:24:23] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (bd808) >>! In T282131#7766753, @phuedx wrote: >> | **Schema** | **Mig...
[16:59:45] milimetric: I'm working on unifying 2 dag factories, and since I'm doing a refactor, I'm giving a try to the idea we discussed yesterday: task subgraphs instead of DAG factories.
[17:01:53] mforns: totally, wanna pick an approach here together?
[17:02:13] yes! wanna chat after meeting?
[17:09:36] intrested ^ too
[17:20:29] Data-Engineering, Data-Engineering-Kanban, ContentTranslation, Language-analytics, Product-Analytics: Abuse filter analytics dashboard is broken - https://phabricator.wikimedia.org/T302970 (Ottomata) Interesting, what is creating this data? I see that permissions on that table directory are:...
[17:48:16] Data-Engineering-Radar, ExternalGuidance, MediaWiki-extensions-WikimediaEvents: Decommission the ExternalGuidance instrument - https://phabricator.wikimedia.org/T303508 (odimitrijevic)
[17:58:43] Data-Engineering, Data-Engineering-Kanban, SRE: Increase max.incremental.fetch.session.cache.slots on Kafka jumbo eqiad - https://phabricator.wikimedia.org/T303324 (odimitrijevic) p:Triage→Medium
[18:00:55] Data-Engineering-Radar, Privacy Engineering, Privacy: Privacy review for dataset publishing (Wikidata topic -> pageview data) - https://phabricator.wikimedia.org/T303304 (odimitrijevic)
[18:02:01] Data-Engineering-Radar, Product-Analytics: Support on understanding traffic and behaviors for users on legacy browsers (somewhat timely) - https://phabricator.wikimedia.org/T303301 (odimitrijevic)
[18:03:49] Data-Engineering, Data-Engineering-Kanban, Airflow: Projectviews by country Airflow job - https://phabricator.wikimedia.org/T303193 (odimitrijevic)
[18:05:18] Data-Engineering-Radar, Product-Analytics: Support on understanding traffic and behaviors for users on legacy browsers (somewhat timely) - https://phabricator.wikimedia.org/T303301 (Milimetric) Curious if this dashboard is known and/or helps: https://analytics.wikimedia.org/dashboards/browsers But Produ...
[18:12:36] Data-Engineering, Event-Platform, serviceops, Patch-For-Review, Sustainability (Incident Followup): eventgate-* tls telemetry is disabled - https://phabricator.wikimedia.org/T303042 (odimitrijevic) Updating to the latest helm chart template would allow for the settings to be picked up automat...
[18:25:12] Data-Engineering, Product-Analytics: Consider not using anaconda as base conda environment - https://phabricator.wikimedia.org/T302819 (odimitrijevic) p:Triage→Low
[18:26:57] Analytics-Radar, Data-Engineering-Radar, Event-Platform, MediaWiki-Recent-changes, and 2 others: Remove deprecated RCFeedEngine support - https://phabricator.wikimedia.org/T250628 (odimitrijevic)
[18:28:47] Data-Engineering, Inuka-Team, Product-Analytics, Superset: Superset timeouts for KaiOS dashboard - https://phabricator.wikimedia.org/T277320 (nshahquinn-wmf) Open→Resolved Yes, I think this is resolved now. I think I did a few optimizations but it was mainly fixed when the timeout was inc...
[18:31:59] milimetric, ottomata, aqu_ (& joal?) if you want we can brainbounce on Airflow task subgraphs in 5/10 mins?
[18:32:10] sure mforns - joining the cave
[18:36:32] actually mforns, I need to leave in 5 minutes, so no batcave - I'll be back later tonight though
[18:36:44] mforns: later for me too.
[18:37:07] ok, then let's make it later!
[18:39:17] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 4 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (phuedx)
[18:40:32] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 4 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (phuedx)
[18:43:49] Analytics, Analytics-Wikistats, Data-Engineering-Radar, Product-Analytics, and 4 others: Wikistats pageview data missing counts for Mobile App pageviews on Commons, going back to 2020-11 - https://phabricator.wikimedia.org/T299439 (SNowick_WMF) @Sharvaniharan is going to look into this for Androi...
[18:44:04] omw cave mforns
[18:44:09] oh
[18:44:11] sorry, later :)
[18:49:14] ping me when yall chattin
[18:56:20] ok
[19:11:27] Data-Engineering, Event-Platform, serviceops, Sustainability (Incident Followup): eventgate-* tls telemetry is disabled - https://phabricator.wikimedia.org/T303042 (JMeybohm) Open→Resolved a:JMeybohm Change is applied and rolled out to all clusters. Data incoming.
[19:11:32] Data-Engineering, Event-Platform, SRE, Traffic, and 2 others: Banner sampling leading to a relatively wide site outage (mostly esams) - https://phabricator.wikimedia.org/T303036 (JMeybohm)
[19:14:24] Data-Engineering, Event-Platform, serviceops, Sustainability (Incident Followup): eventgate-* tls telemetry is disabled - https://phabricator.wikimedia.org/T303042 (Ottomata) Thank you
[19:19:19] Analytics, Analytics-Wikistats, Data-Engineering: Automate creation of sqoop list of wikis to import data for from sitematrix - https://phabricator.wikimedia.org/T190700 (Milimetric)
[19:19:40] Data-Engineering, Data-Engineering-Kanban, Product-Analytics: 22 small wikis missing from the mediawiki_history dataset - https://phabricator.wikimedia.org/T299548 (Milimetric) @Urbanecm thanks much for the offer, while that would be cool I think we can also pull this data from the sitematrix for now...
[19:21:12] Analytics, Analytics-Kanban, Data-Engineering-Kanban, wmfdata-python, Product-Analytics (Kanban): wmfdata-python's Hive query output includes logspam - https://phabricator.wikimedia.org/T275233 (Milimetric) good call, I should follow up on that release procedure
[19:40:08] Analytics, Data-Engineering, Data-Engineering-Kanban, Product-Analytics, Superset: Help with data that's not appearing on charts - https://phabricator.wikimedia.org/T301895 (Milimetric) Resolved→Open I didn't understand this at all! I'll update the description to reflect what I learned.
[19:46:15] Analytics, Data-Engineering, Data-Engineering-Kanban, Product-Analytics, Superset: Help with data that's not appearing on charts - https://phabricator.wikimedia.org/T301895 (Milimetric)
[19:52:42] (PS1) MewOphaswongse: Help panel: update schema to reflect new post-edit experience [schemas/event/secondary] - https://gerrit.wikimedia.org/r/769769 (https://phabricator.wikimedia.org/T301603)
[19:53:17] Analytics, Analytics-Wikistats, Data-Engineering: Country pageview breakdown by language - https://phabricator.wikimedia.org/T250001 (Milimetric) I'm currently working with a volunteer to make this happen. The job that will produce the data we need is being developed in T303193. And indeed the work...
[19:54:12] heya milimetric, ottomata, aqu_, joal : wanna brainbounce now?
[19:54:33] I got 5 min mforns, but yall should go after that, I'm not that opinionated
[19:54:39] to the cave!
[19:54:43] omw!
[20:09:09] ottomata, aqu, if you're here later and you want to discuss, ping me! :]
[20:34:17] oh oops
[20:34:24] i'm here, gimme 5 mins
[20:37:25] mforns: can chat now
[20:38:04] (PS7) Ottomata: [WIP] Add prometheus metrics reporter [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/767178 (owner: Joal)
[20:39:49] (CR) jerkins-bot: [V: -1] [WIP] Add prometheus metrics reporter [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/767178 (owner: Joal)
[20:56:15] Data-Engineering, Product-Analytics: Investigate easier methods for WMF staff to access Superset - https://phabricator.wikimedia.org/T258962 (Milimetric) > I think to really get this fixed, someone from high up in Tech and Product need to convince SRE that this is something that needs resources to be spe...
[21:11:59] ottomata: back, you still there?
[21:14:30] mforns: ya but i think i just have 5 or 10 mins!
[21:15:40] in bc
[21:15:57] (PS8) Ottomata: [WIP] Add prometheus metrics reporter [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/767178 (owner: Joal)
[21:15:59] milimetric: ^^
[21:16:08] ottomata: omw
[21:17:29] (CR) jerkins-bot: [V: -1] [WIP] Add prometheus metrics reporter [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/767178 (owner: Joal)
[21:19:20] (I'm afk, I already deferred to Marcel's approach so yall do your thing)
[21:52:40] milimetric: btw, there is already a thing for this in airflow!
[21:52:41] https://github.com/apache/airflow/blob/b1fdcdfe6778574c53bdf6bcbd59090c59605287/airflow/example_dags/example_task_group.py
[21:58:11] ah very cool, so that's what yall decided? Task Groups are the "subdags" and we make as many as we need and as few as possible?