[01:56:18] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Remove Spark session timeout functionality from Wmfdata-Python - https://phabricator.wikimedia.org/T298179 (nshahquinn-wmf) a:nshahquinn-wmf This is currently up for review in [PR36](https://github.com/wikimedia/wmfdata-python/pull/36).
[01:57:10] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Remove Spark session timeout functionality from Wmfdata-Python - https://phabricator.wikimedia.org/T298179 (nshahquinn-wmf) Waiting for @xcollazo's review.
[01:58:57] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Remodel Wmfdata-Python's Spark API to match underlying behavior - https://phabricator.wikimedia.org/T273210 (nshahquinn-wmf) a:nshahquinn-wmf Currently up for review in [PR36](https://github.com/wikimedia/wmfdata-python/pull/36).
[01:59:18] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Remodel Wmfdata-Python's Spark API to match underlying behavior - https://phabricator.wikimedia.org/T273210 (nshahquinn-wmf) Waiting for @xcollazo's review.
[02:38:18] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Release Wmfdata-Python 2.0 - https://phabricator.wikimedia.org/T300442 (nshahquinn-wmf)
[02:38:20] Analytics-Jupyter, Data-Engineering, Product-Analytics: Replace anaconda-wmf with smaller, non-stacked Conda environments - https://phabricator.wikimedia.org/T302819 (nshahquinn-wmf)
[02:39:09] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Release Wmfdata-Python 2.0 - https://phabricator.wikimedia.org/T300442 (nshahquinn-wmf)
[02:39:13] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Remove Spark session timeout functionality from Wmfdata-Python - https://phabricator.wikimedia.org/T298179 (nshahquinn-wmf)
[02:39:15] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Remodel Wmfdata-Python's Spark API to match underlying behavior - https://phabricator.wikimedia.org/T273210 (nshahquinn-wmf)
[02:39:30] Analytics-Jupyter, Data-Engineering, Product-Analytics: Replace anaconda-wmf with smaller, non-stacked Conda environments - https://phabricator.wikimedia.org/T302819 (nshahquinn-wmf)
[02:44:36] Data-Engineering, Product-Analytics, Wmfdata-Python: Update anaconda-wmf's wmfdata-python to 1.4.0 - https://phabricator.wikimedia.org/T305067 (nshahquinn-wmf) Open→Declined Soon, we are going to be moving from `anaconda-wmf` to `conda-analytics` as the base for new Conda environments (T32108...
[02:48:30] Data-Engineering, Product-Analytics, Wmfdata-Python: wmfdata.spark module should provide easy access to pyspark - https://phabricator.wikimedia.org/T293722 (nshahquinn-wmf) Open→Resolved a:nshahquinn-wmf I've verified that `import pyspark` just works in the new conda-analytics environment...
[02:52:15] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Release Wmfdata-Python 2.0 - https://phabricator.wikimedia.org/T300442 (nshahquinn-wmf) The removals have been merged. This will stay open until we actually release version 2.0, likely late this week or early next.
[04:05:28] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: eventlogging_to_druid_editattemptstep_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:54:22] (VarnishkafkaNoMessages) firing: varnishkafka on cp5003 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5003%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[05:00:37] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp5003 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[05:00:48] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:44:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2031 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2031%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[05:45:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[05:49:12] (VarnishkafkaNoMessages) resolved: (3) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[05:50:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[08:04:11] (CR) Joal: Add script for HDFS XML fsimage to bin folder (6 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/850169 (https://phabricator.wikimedia.org/T321167) (owner: Aqu)
[08:23:16] (CR) Joal: Put wikihadoop into refinery/source (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/856530 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[08:39:22] (PS3) Aqu: Add script for HDFS XML fsimage to bin folder [analytics/refinery] - https://gerrit.wikimedia.org/r/850169 (https://phabricator.wikimedia.org/T321167)
[08:39:52] (CR) Aqu: Add script for HDFS XML fsimage to bin folder (6 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/850169 (https://phabricator.wikimedia.org/T321167) (owner: Aqu)
[09:05:26] (PS11) Aqu: Add HdfsXMLFsImageConverter to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168)
[09:06:30] (CR) Aqu: Add HdfsXMLFsImageConverter to refinery-job (7 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[09:07:50] (CR) Aqu: Add HdfsXMLFsImageConverter to refinery-job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[09:12:13] (CR) CI reject: [V: -1] Add HdfsXMLFsImageConverter to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[09:35:50] Good morning btullis - would you have a minute for me to talk about the Hive CVE?
[09:53:55] joal: Apologies for missing the ping. Yes, should we meet after your catchup with Antoine?
[09:54:14] Yes all good :) Thanks btullis
[09:55:27] (PS12) Aqu: Add HdfsXMLFsImageConverter to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168)
[09:58:09] (CR) Aqu: Add HdfsXMLFsImageConverter to refinery-job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[10:00:12] Data-Engineering-Planning, Data Pipelines, Foundational Technology Requests, Traffic, and 2 others: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (elukey) >>! In T314981#8391260, @elukey wrote: > * Meeting between me Joseph Andrew Filipp...
[10:00:46] (CR) CI reject: [V: -1] Add HdfsXMLFsImageConverter to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[10:52:40] (CR) DCausse: "lgtm, left a couple of minor comments" [schemas/event/primary] - https://gerrit.wikimedia.org/r/856507 (owner: Peter Fischer)
[11:06:40] (CR) Phuedx: [C: +1] "This LGTM. Nice work DRYing up the schemas!" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/855675 (owner: Mazevedo)
[11:19:44] Data-Engineering-Planning, Data Pipelines (Sprint 04): Allow Cormac Parle and Marco Fossati to deploy analytics-platform-eng Airflow instance - https://phabricator.wikimedia.org/T321925 (mfossati) @Ottomata I don't see any complaints from `scap deploy` now, thanks!
[11:22:52] (CR) Joal: "Still 2 comments, but then all good :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/850169 (https://phabricator.wikimedia.org/T321167) (owner: Aqu)
[11:39:30] (CR) Joal: "Comments on already resolved comments for posterity :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[11:41:42] (PS3) Aqu: Put wikihadoop into refinery/source [analytics/refinery/source] - https://gerrit.wikimedia.org/r/856530 (https://phabricator.wikimedia.org/T321168)
[11:50:13] !log `elukey@kafka-jumbo1001:~$ kafka topics --create --topic webrequest_sampled --partitions 3 --replication-factor 3` - T314981
[11:50:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:50:16] T314981: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981
[11:50:24] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:26:12] (PS1) Elukey: druid: add new supervisor for webrequest_sampled [analytics/refinery] - https://gerrit.wikimedia.org/r/856949 (https://phabricator.wikimedia.org/T314981)
[12:27:43] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:29:12] (CR) Elukey: "https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Realtime_indexation_to_Druid is a good starting point for testing in my opinio" [analytics/refinery] - https://gerrit.wikimedia.org/r/856949 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[12:48:28] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:21:43] (CR) Aqu: Put wikihadoop into refinery/source (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/856530 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[13:23:09] Data-Engineering, Equity-Landscape: Load country data - https://phabricator.wikimedia.org/T310712 (ntsako)
[13:25:23] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:28:59] Data-Engineering-Planning, Cassandra, Data Pipelines (Sprint 04), Patch-For-Review: Write dedicated cassandra authorization code to read password from file when loading - https://phabricator.wikimedia.org/T306895 (Snwachukwu) We now have a custom **`AuthConfFactory`** that will be passed as a par...
[13:44:27] (CR) Filippo Giunchedi: "I can't meaningfully vote for the druid-related bits but virtual +1" [analytics/refinery] - https://gerrit.wikimedia.org/r/856949 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[13:50:03] (CR) Joal: Put wikihadoop into refinery/source (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/856530 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[13:58:49] (CR) Joal: druid: add new supervisor for webrequest_sampled (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/856949 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[14:01:36] (PS2) Elukey: druid: add new supervisor for webrequest_sampled [analytics/refinery] - https://gerrit.wikimedia.org/r/856949 (https://phabricator.wikimedia.org/T314981)
[14:01:46] (CR) Elukey: "Thanks! Tried to apply all the suggestions :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/856949 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[14:04:51] (CR) Joal: [C: +1] "Merge when you wish :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/856949 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[14:07:45] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04): [NEEDS GROOMING] Flink SQL queries should access Kafka topics from a Catalog - https://phabricator.wikimedia.org/T322022 (tchin) a:tchin
[14:10:16] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04): [Shared Event Platform] Mediawiki Stream Enrichment should consume the consolidated page-change stream. - https://phabricator.wikimedia.org/T311084 (gmodena)
[14:10:27] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04): Prototype Flink job for content Dumps - https://phabricator.wikimedia.org/T320966 (Milimetric)
[14:15:52] (CR) Elukey: druid: add new supervisor for webrequest_sampled (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/856949 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[14:16:07] (PS3) Elukey: druid: add new supervisor for webrequest_sampled [analytics/refinery] - https://gerrit.wikimedia.org/r/856949 (https://phabricator.wikimedia.org/T314981)
[14:16:23] (CR) Elukey: [V: +2 C: +2] druid: add new supervisor for webrequest_sampled [analytics/refinery] - https://gerrit.wikimedia.org/r/856949 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[14:16:52] joal: thank youuu for the review.. can I try to start the supervisor?
[14:17:42] Please go ahead elukey :)
[14:17:58] elukey: I'll setup the data retention after a few hours of running
[14:21:58] <3<#
[14:21:59] <3
[14:24:19] !log started webrequest_sampled supervisor on Druid Analytics - T314981
[14:24:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:24:22] T314981: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981
[14:28:36] joal: running!
[14:58:37] (CR) Tsevener: [C: +1] Add http client_ip to iOS schemas (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/855675 (owner: Mazevedo)
[14:59:13] aaand https://gerrit.wikimedia.org/r/c/operations/puppet/+/856991 for turnilo!
[15:00:05] elukey: Fantastic!
[15:16:52] metrics also look good for indexations
[15:51:31] Data-Engineering-Planning, Data Pipelines, GitLab, Release-Engineering-Team, serviceops-collab: Experiencing pipeline failure due to disk-space issues - https://phabricator.wikimedia.org/T310593 (LSobanski) p:Triage→Low Likely needs a design discussion between RelEng and ServiceOps.
[15:55:42] Data-Engineering-Planning, Cloud-Services, Shared-Data-Infrastructure, serviceops-collab, Patch-For-Review: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937 (LSobanski) Open→Resolved a:LSobanski...
[16:00:58] https://w.wiki/5x$Y \o/
[16:01:04] we need to check sampling etc..
[16:01:08] but it looks working
[16:02:32] Awesome work folks.
[16:11:06] joal: ok if I add P24H as default retention rule in druid coord? (also replicants 2 as for webrequest_sampled_128)
[16:11:48] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Fix turnilo after upgrade - https://phabricator.wikimedia.org/T308778 (Stevemunene) Got some input from the Turnilo Slack referencing [[ https://github.com/allegro/turnilo/issues/...
[16:14:33] Data-Engineering-Radar, Cassandra: Bootstrap new Cassandra nodes (eqiad) - https://phabricator.wikimedia.org/T307802 (Eevans)
[16:18:12] Data-Engineering-Planning, Data Pipelines (Sprint 04): Allow Cormac Parle and Marco Fossati to deploy analytics-platform-eng Airflow instance - https://phabricator.wikimedia.org/T321925 (Ottomata) Yeehaw
[16:26:37] (PS3) Mazevedo: Add http client_ip to iOS schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/855675 (https://phabricator.wikimedia.org/T322790)
[16:29:06] (CR) Ottomata: Put wikihadoop into refinery/source (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/856530 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[16:31:30] (CR) Ottomata: "LGTM in general" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/855675 (https://phabricator.wikimedia.org/T322790) (owner: Mazevedo)
[16:35:50] that time when Luca just reimplemented refine in Benthos https://gerrit.wikimedia.org/r/c/operations/puppet/+/854499/26/modules/profile/templates/benthos/instances/webrequest_live.yaml.erb#24 (so cool!)
[16:35:58] (webrequest refine)
[16:40:57] (PS4) Mazevedo: Add http client_ip to iOS schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/855675 (https://phabricator.wikimedia.org/T322790)
[16:42:35] (CR) Mazevedo: Add http client_ip to iOS schemas (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/855675 (https://phabricator.wikimedia.org/T322790) (owner: Mazevedo)
[16:43:10] (CR) Ottomata: [C: +1] Add http client_ip to iOS schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/855675 (https://phabricator.wikimedia.org/T322790) (owner: Mazevedo)
[17:02:30] elukey: sorry I was away - I'm looking at the new datasource now, checking stuff - I can add the load/droprule
[17:08:23] Data-Engineering-Planning, Data Pipelines (Sprint 04): Allow Cormac Parle and Marco Fossati to deploy analytics-platform-eng Airflow instance - https://phabricator.wikimedia.org/T321925 (EChetty)
[17:13:09] (PS4) Peter Fischer: Provide internal schema for CirrusSearch update-pipeline updates. [schemas/event/primary] - https://gerrit.wikimedia.org/r/856507
[17:13:59] (PS5) Peter Fischer: Provide internal schema for CirrusSearch update-pipeline updates. [schemas/event/primary] - https://gerrit.wikimedia.org/r/856507 (https://phabricator.wikimedia.org/T317202)
[17:14:32] (CR) Peter Fischer: "Processed comments" [schemas/event/primary] - https://gerrit.wikimedia.org/r/856507 (https://phabricator.wikimedia.org/T317202) (owner: Peter Fischer)
[17:19:30] elukey: I added the druid rule
[17:32:19] joal: <3
[17:32:26] thanks a lot for all the help folks!
[17:49:53] congrats on the new datasource!
[18:01:49] (CR) Kosta Harlan: "This change is ready for review." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) (owner: Kosta Harlan)
[20:13:48] Data-Engineering-Planning, Cassandra, Data Pipelines (Sprint 04), Patch-For-Review: Write dedicated cassandra authorization code to read password from file when loading - https://phabricator.wikimedia.org/T306895 (Eevans) >>! In T306895#8395895, @Snwachukwu wrote: > We now have a custom **`AuthCo...
[21:40:40] (PS1) Ottomata: Bump guava version to match wikimedia-event-utiltiies version [analytics/refinery/source] - https://gerrit.wikimedia.org/r/857075