[01:18:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[01:23:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[07:01:12] (VarnishkafkaNoMessages) firing: (3) varnishkafka on cp1085 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[07:06:12] (VarnishkafkaNoMessages) resolved: (3) varnishkafka on cp1085 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[09:30:21] Data-Engineering, Data Pipelines: Allow Cormac Parle and Marco Fossati to deploy analytics-platform-eng Airflow instance - https://phabricator.wikimedia.org/T321925 (mfossati)
[09:33:10] Data-Engineering, Data Pipelines: Allow Cormac Parle and Marco Fossati to deploy analytics-platform-eng Airflow instance - https://phabricator.wikimedia.org/T321925 (mfossati)
[09:34:00] Data-Engineering, Equity-Landscape: Wiki DB Map - https://phabricator.wikimedia.org/T309283 (ntsako) Data loaded as: ` select * from ntsako.wiki_db_map_input_metrics; `
[11:07:08] Data-Engineering, API Platform (API Platform Roadmap), Platform Engineering Roadmap, User-Eevans: AQS 2.0: Unique Devices service - https://phabricator.wikimedia.org/T288298 (SGupta-WMF)
[11:11:22] Data-Engineering, API Platform (API Platform Roadmap), Platform Engineering Roadmap, User-Eevans: AQS 2.0: Pageviews Service - https://phabricator.wikimedia.org/T288296 (SGupta-WMF)
[11:16:10] Analytics, API Platform (API Platform Roadmap), Code-Health-Objective, Epic, and 3 others: AQS 2.0 - https://phabricator.wikimedia.org/T263489 (SGupta-WMF)
[11:20:56] Data-Engineering, Event-Platform Value Stream (Sprint 03), Shared-Data-Infrastructure (Sprint 03): Create kubernetes namespace and user for the stream_enrichment PoC project - https://phabricator.wikimedia.org/T321682 (BTullis) >> I'd prefer not to have the _poc suffix just in case it stays there for...
[11:23:39] Data-Engineering-Planning, Shared-Data-Infrastructure, Event-Platform Value Stream (Sprint 03): [SPIKE] Deploy event driven stateless Flink service to DSE cluster - https://phabricator.wikimedia.org/T320812 (gmodena) >>! In T320812#8349345, @gmodena wrote: [...] > 1. A chart for a service that submit...
[11:37:28] Data-Engineering, Event-Platform Value Stream (Sprint 03), Shared-Data-Infrastructure (Sprint 03): Create kubernetes namespace and user for the stream_enrichment PoC project - https://phabricator.wikimedia.org/T321682 (gmodena) >>! In T321682#8356153, @BTullis wrote: > * user: `stream_enrichment_po...
[12:32:22] Data-Engineering, Event-Platform Value Stream: [NEEDS GROOMING] Flink SQL queries should access Kafka topics from a Catalog - https://phabricator.wikimedia.org/T322022 (gmodena)
[13:02:38] (PS1) Snwachukwu: [WIP] Add Custom Authentication Configuration Class for Cassandra. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/851077 (https://phabricator.wikimedia.org/T306895)
[13:02:48] Data-Engineering, Event-Platform Value Stream (Sprint 03), Patch-For-Review, Shared-Data-Infrastructure (Sprint 03): Create kubernetes namespace and user for the stream_enrichment PoC project - https://phabricator.wikimedia.org/T321682 (BTullis) Slight correction; it has to be hyphens and not und...
[13:28:09] (PS1) Btullis: Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907)
[13:34:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4047%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[13:36:56] Data-Engineering, Product-Analytics: Strange values in stored event data generated before instrumentation code was deployed - https://phabricator.wikimedia.org/T321960 (Ottomata) Hi, I suspect this is an issue with how Presto is doing column resolution between Hive partitions with different Parquet file...
[13:39:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4047%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[13:39:43] (CR) CI reject: [V: -1] Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907) (owner: Btullis)
[13:48:05] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 03), Spike: Easy Flink Python UDF + SQL enrichment - https://phabricator.wikimedia.org/T320968 (Ottomata) Nice @tchin! FYI: > 1. Create the virtual environment > We don't use conda [...] A problem with virtualenvs is that they don't inc...
[13:51:56] Data-Engineering, Event-Platform Value Stream: [NEEDS GROOMING] Flink SQL queries should access Kafka topics from a Catalog - https://phabricator.wikimedia.org/T322022 (gmodena)
[13:55:02] Data-Engineering, Event-Platform Value Stream (Sprint 03), Patch-For-Review, Shared-Data-Infrastructure (Sprint 03): Create kubernetes namespace and user for the stream_enrichment PoC project - https://phabricator.wikimedia.org/T321682 (Ottomata) > or do we need to permit more users (e.g. @gmoden...
[14:12:03] Data-Engineering, Event-Platform Value Stream: [NEEDS GROOMING] Flink SQL queries should access Kafka topics from a Catalog - https://phabricator.wikimedia.org/T322022 (Ottomata) > Implement a user-defined Flink Catalog atop eventutilities. I'm partial to this one if it is possible!
[14:14:28] Data-Engineering, Event-Platform Value Stream (Sprint 03), Patch-For-Review, Shared-Data-Infrastructure (Sprint 03): Create kubernetes namespace and user for the stream_enrichment PoC project - https://phabricator.wikimedia.org/T321682 (elukey) Folks one suggestion - we should aim to use the Depl...
[14:37:21] (PS2) Btullis: Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907)
[14:49:07] Data-Engineering, Product-Analytics: Strange values in stored event data generated before instrumentation code was deployed - https://phabricator.wikimedia.org/T321960 (mforns) Interesting, @Ottomata! It says [[ https://github.com/prestodb/presto/pull/16011 | here ]] that the fix for the bug you paste wa...
[14:52:31] Data-Engineering, Product-Analytics: Strange values in stored event data generated before instrumentation code was deployed - https://phabricator.wikimedia.org/T321960 (Ottomata) Ah I was looking for that, I see it now. Interesting, it does indeed sound very similar though.
[14:57:11] (CR) CI reject: [V: -1] Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907) (owner: Btullis)
[15:01:32] Data-Engineering, Data Pipelines (Sprint 03): Reduce the number of files generated by geoeditors airflor jobs - https://phabricator.wikimedia.org/T304852 (mforns) a:mforns
[15:14:06] Data-Engineering, Data Pipelines: Implement periodical cleaning of Airflow databases - https://phabricator.wikimedia.org/T322036 (mforns)
[15:21:01] Data-Engineering, Event-Platform Value Stream (Sprint 03), Patch-For-Review, Shared-Data-Infrastructure (Sprint 03): Create kubernetes namespace and user for the stream_enrichment PoC project - https://phabricator.wikimedia.org/T321682 (gmodena) >>! In T321682#8356624, @BTullis wrote: >>>! In T32...
[15:32:35] Data-Engineering-Planning, Event-Platform Value Stream, MediaWiki-libs-HTTP, Beta-Cluster-reproducible, Wikimedia-production-error: PHP Warning: curl_multi_remove_handle(): supplied resource is not a valid cURL Multi Handle resource - https://phabricator.wikimedia.org/T288624 (Ottomata) I'm n...
[15:57:55] joal: hello! I'm looking into https://phabricator.wikimedia.org/T304852 but I can not find where the geoeditors jobs are generating lots of files, I can only see 1 file everywhere I look, all queries have the /*+ COALESCE(...) */ snippet
[15:58:40] * mforns runs a 20-min errand
[16:09:53] Data-Engineering-Planning, Event-Platform Value Stream: EventGate should support producing keyed messages for Kafka partitioning - https://phabricator.wikimedia.org/T318846 (Ottomata) A question to answer: Do message keys need schemas? Probably yes...but this might be pretty annoying to accomplish.
[16:19:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4052 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4052%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[16:19:54] ^ depooled
[16:24:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4052 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4052%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[16:49:41] Data-Engineering-Planning, Shared-Data-Infrastructure, Data Pipelines (Sprint 03): Create Plan for Spark 2 Deprecation - https://phabricator.wikimedia.org/T318367 (JArguello-WMF) @mpopov Could you please help Marcel review the inventory of all the remaining jobs still using spark 2 in your team? Than...
[16:51:54] (PS3) Btullis: Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907)
[17:22:04] (CR) CI reject: [V: -1] Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907) (owner: Btullis)
[18:50:17] (PS4) Btullis: Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907)
[18:56:18] (CR) CI reject: [V: -1] Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907) (owner: Btullis)
[19:03:09] (PS5) Btullis: Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907)
[19:05:42] (PS6) Btullis: Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907)
[19:11:53] (PS7) Btullis: Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907)
[19:20:26] (CR) CI reject: [V: -1] Bump to version 0.9.0 of DataHub [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/851082 (https://phabricator.wikimedia.org/T321907) (owner: Btullis)
[20:45:20] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 03), Patch-For-Review: [Shared Event Platform] Produce new mediawiki.page-change stream from MediaWiki EventBus - https://phabricator.wikimedia.org/T311129 (Ottomata) Alright! We are live in beta, and ready to go in selective wikis in pr...
[20:50:22] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:00:20] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:06:16] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:12:50] Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review: EventGate should support producing keyed messages for Kafka partitioning - https://phabricator.wikimedia.org/T318846 (Ottomata) > B. Using stream config to to set the fields in the value that should be used for the key. I thin...
[21:36:40] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 03), Spike: Easy Flink Python UDF + SQL enrichment - https://phabricator.wikimedia.org/T320968 (tchin) > A problem with virtualenvs is that they don't include the python executable. Can you elaborate on that? I thought the executable is `...
[21:56:36] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 03), Spike: Easy Flink Python UDF + SQL enrichment - https://phabricator.wikimedia.org/T320968 (Ottomata) > Can you elaborate on that? I thought the executable is venv/bin/python3 Uh hm. I just checked and I also see the executable in a...
[22:15:20] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:21:16] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:33:13] (VarnishkafkaNoMessages) firing: (3) varnishkafka on cp5010 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[22:38:13] (VarnishkafkaNoMessages) resolved: (3) varnishkafka on cp5010 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[22:43:42] (VarnishkafkaNoMessages) firing: (3) varnishkafka on cp5010 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[22:45:06] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:48:42] (VarnishkafkaNoMessages) resolved: (3) varnishkafka on cp5010 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[22:51:02] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:30:24] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:36:20] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state