[00:55:12] Data-Engineering: NEW FEATURE REQUEST: Dataset with active and non-active Wikis - https://phabricator.wikimedia.org/T323662 (kzimmerman)
[01:00:36] Data-Engineering: NEW FEATURE REQUEST: Dataset with active and non-active Wikis - https://phabricator.wikimedia.org/T323662 (Milimetric) Just for anyone that grabs this, we already define "active wikis" and use it in datasets like [[ https://github.com/wikimedia/analytics-refinery/blob/master/hql/geoeditors/...
[01:47:15] Data-Engineering, Product-Analytics, Wmfdata-Python, Documentation: Create end-user documentation for Wmfdata-Python - https://phabricator.wikimedia.org/T298178 (nshahquinn-wmf)
[01:48:08] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Update Wmfdata-Python quickstart notebook - https://phabricator.wikimedia.org/T323426 (nshahquinn-wmf) Open→Resolved Merged in [PR40](https://github.com/wikimedia/wmfdata-python/pull/40).
[01:48:10] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Release Wmfdata-Python 2.0 - https://phabricator.wikimedia.org/T300442 (nshahquinn-wmf)
[02:27:04] Data-Engineering, SRE, SRE-Access-Requests: Grant ssh access to analytics-admins to dcausse and gmodena - https://phabricator.wikimedia.org/T323280 (andrea.denisse)
[02:33:40] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Release Wmfdata-Python 2.0 - https://phabricator.wikimedia.org/T300442 (nshahquinn-wmf) Okay, I've merged the documentation improvements and version 2.0.0 changes to `main` and sent a pre-announcement to several Slack channels and analytic...
[02:38:15] Data-Engineering, Wmfdata-Python: Release Wmfdata-Python 2.0 - https://phabricator.wikimedia.org/T300442 (nshahquinn-wmf) a:nshahquinn-wmf→xcollazo
[02:38:55] Analytics-Jupyter, Data-Engineering-Planning, Product-Analytics, Data Pipelines (Sprint 04), Patch-For-Review: Add support for jupyterlab on conda-analytics - https://phabricator.wikimedia.org/T321088 (nshahquinn-wmf) Cool, thank you @xcollazo! 🎉
[02:47:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2042 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp2042%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[02:48:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2041 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2041%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[02:52:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2042 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp2042%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[02:53:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2041 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2041%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[04:21:50] (PS1) MNeisler: Add the mediawiki_web_ab_test_enrollment stream to the allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/859652 (https://phabricator.wikimedia.org/T323664)
[04:23:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[04:26:29] Data-Engineering, Patch-For-Review, Product-Analytics (Kanban): Add mediawiki_web_ab_test_enrollment to the allowlist - https://phabricator.wikimedia.org/T323664 (MNeisler)
[04:28:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[07:39:50] Data-Engineering-Planning, DBA, Data-Services: Prepare and check storage layer for guwwiktionary - https://phabricator.wikimedia.org/T309056 (Marostegui) @BTullis this is ready
[07:40:57] Data-Engineering-Planning, DBA, Data-Services: Prepare and check storage layer for pcmwiki - https://phabricator.wikimedia.org/T310879 (Marostegui) @BTullis this is ready
[07:41:41] Data-Engineering-Planning, DBA, Data-Services: Prepare and check storage layer for bjnwiktionary - https://phabricator.wikimedia.org/T312214 (Marostegui) Reminder: @BTullis this is ready
[07:42:02] Data-Engineering-Planning, DBA: Prepare and check storage layer for blkwiki - https://phabricator.wikimedia.org/T310872 (Marostegui) Reminder: @BTullis this is ready
[08:03:01] (PS1) Phedenskog: navtiming: Add cumulative layout shift and largest contentful paint. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/859968 (https://phabricator.wikimedia.org/T281022)
[08:11:44] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04): [NEEDS GROOMING] Flink SQL queries should access Kafka topics from a Catalog - https://phabricator.wikimedia.org/T322022 (tchin) I was able to implement a flink catalog that acts as an options passthrough to the built-in kafka connector...
[09:29:15] btullis, ottomata, steve_munene: o/
[09:29:18] SSL WARNING - Certificate kafka_jumbo-eqiad_broker valid until 2022-12-04 14:47:46 +0000 (expires in 11 days
[09:29:41] We have plenty of time but I thought to ping you anyway to warn you
[09:47:22] Data-Engineering-Planning, Data Pipelines, Foundational Technology Requests, Traffic, and 2 others: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (elukey) After a chat with Filippo, we agreed that the work on this task seems done. There m...
[10:37:57] elukey: Thanks. This is cergen work in the private repo, yes? https://wikitech.wikimedia.org/wiki/Kafka/Administration#Kafka_Certificates - We haven't switched these to cfssl yet, have we?
[10:47:44] btullis: o/ yes correct
[10:47:59] in theory regeneration + deploy + roll restart should suffice
[10:49:10] Great, I will make a ticket and get on with it. This will be useful for steve_munene to get involved with as well.
[11:00:56] Great
[11:01:38] Data-Engineering-Planning, Data Pipelines, Foundational Technology Requests, Traffic, and 2 others: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (Volans) I agree. The only thing maybe left is to check if the segment size is the correct o...
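Editorial sketch, not part of the channel log: the 09:29:18 SSL warning reports a "valid until" timestamp and a remaining-days count. A minimal Python illustration of that arithmetic follows. The function name `days_until_expiry` is hypothetical, and the reference time used in the example (2022-11-23 09:29:18 UTC) is an assumption inferred from the "expires in 11 days" text, since the log itself does not state its date.

```python
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Return whole days between `now` and a 'valid until' timestamp.

    `not_after` uses the format shown in the warning,
    e.g. "2022-12-04 14:47:46 +0000".
    """
    expiry = datetime.strptime(not_after, "%Y-%m-%d %H:%M:%S %z")
    # timedelta.days counts whole days; a partial day is ignored
    # for a future expiry.
    return (expiry - now).days


# Assumed reference time: the log's date is not stated, so this value
# is a guess chosen to be consistent with the "expires in 11 days" text.
assumed_now = datetime(2022, 11, 23, 9, 29, 18, tzinfo=timezone.utc)
remaining = days_until_expiry("2022-12-04 14:47:46 +0000", assumed_now)
print(f"certificate expires in {remaining} days")
```

A check like this only reads a timestamp. The renewal steps discussed in the channel (regeneration, deploy, rolling restart) are separate operational work.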
[11:20:52] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04): Flink SQL queries should access Kafka topics from a Catalog - https://phabricator.wikimedia.org/T322022 (gmodena)
[11:22:10] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04): Flink SQL queries should access Kafka topics from a Catalog - https://phabricator.wikimedia.org/T322022 (gmodena)
[11:29:00] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04): Flink SQL queries should access Kafka topics from a Catalog - https://phabricator.wikimedia.org/T322022 (gmodena) > However, once you try to insert something, it gets a bit messy. The kafka connector only allows you to sink to one topic,...
[11:46:53] Data-Engineering-Planning: Create puppet defined type for adding/updating/deleting secrets or other small files on HDFS - https://phabricator.wikimedia.org/T323692 (BTullis)
[11:56:48] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update kafka-jumbo certificates - https://phabricator.wikimedia.org/T323697 (BTullis)
[11:58:43] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update kafka-jumbo certificates - https://phabricator.wikimedia.org/T323697 (BTullis) p:Triage→High
[12:14:14] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Fix turnilo after upgrade - https://phabricator.wikimedia.org/T308778 (Stevemunene) Upgrade to 1.38.2 is done and all data cubes are visible. Something to note the pre configured...
[12:16:11] Data-Engineering, Patch-For-Review, Product-Analytics (Kanban): Add mediawiki_web_ab_test_enrollment to the allowlist - https://phabricator.wikimedia.org/T323664 (MNeisler) @mforns: Would it be possible to backfill any available data from 2022-07-06 once this is added to the allowlist? We'd like to p...
[12:37:13] (VarnishkafkaNoMessages) firing: varnishkafka on cp5031 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5031%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[12:42:13] (VarnishkafkaNoMessages) resolved: varnishkafka on cp5031 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5031%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[13:16:41] (PS1) Phedenskog: painttimings: Collect skin information. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/860022 (https://phabricator.wikimedia.org/T323124)
[13:18:06] (PS2) Phedenskog: painttiming: Collect skin information. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/860022 (https://phabricator.wikimedia.org/T323124)
[14:52:11] ACKNOWLEDGEMENT - MegaRAID on an-worker1090 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough Btullis T318659 - Added more downtime, but replacement batteries are on their way https://wikitech.wikimedia.org/wiki/MegaCli%23M
[14:52:11] ng
[14:58:03] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update kafka-jumbo certificates - https://phabricator.wikimedia.org/T323697 (BTullis) We have verified the date of expiry by using the following command: ` btullis@stat1004:~$ openssl s_client -connect kafka-jumbo1001.eqi...
[15:20:26] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update kafka-jumbo certificates - https://phabricator.wikimedia.org/T323697 (BTullis) Verified the existing certificates' status on the puppetmaster: ` root@puppetmaster1001:/srv/private/modules/secret/secrets/certificate...
[15:34:23] Data-Engineering-Planning, Data Pipelines, Foundational Technology Requests, Traffic, and 2 others: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (Milimetric) @Volans asked me, basically, how come `count(distinct ip)` gives slightly inacc...
[15:35:18] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update kafka-jumbo certificates - https://phabricator.wikimedia.org/T323697 (BTullis) The certificates have been committed to the private repo and merged. I have verified that rolling out the new certificates to the broke...
[15:38:27] !log roll-restarting kafka-jumbo brokers to pick up new certificates. T323697
[15:38:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:38:33] T323697: Update kafka-jumbo certificates - https://phabricator.wikimedia.org/T323697
[15:39:01] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update kafka-jumbo certificates - https://phabricator.wikimedia.org/T323697 (BTullis) Now re-enabling puppet on the remaining 8 hosts and running puppet to pull down the certificates. ` btullis@cumin2002:~$ sudo cumin A:k...
[15:48:13] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Update kafka-jumbo certificates - https://phabricator.wikimedia.org/T323697 (BTullis) The cookbook to restart the brokers is running, but we have verified that the first broker has already restarted with the new certifica...
[15:53:11] Data-Engineering-Planning, Data Pipelines, Foundational Technology Requests, Traffic, and 2 others: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (Volans) Thanks a lot for the deep dive and the explanation with examples @Milimetric, much...
[16:01:39] (CR) Krinkle: [C: +2] painttiming: Collect skin information. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/860022 (https://phabricator.wikimedia.org/T323124) (owner: Phedenskog)
[16:02:13] (Merged) jenkins-bot: painttiming: Collect skin information. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/860022 (https://phabricator.wikimedia.org/T323124) (owner: Phedenskog)
[16:17:01] Data-Engineering, Data-Services, cloud-services-team (Kanban): clouddb* hosts with ipv6 access timeout from cumin - https://phabricator.wikimedia.org/T323550 (dcaro) I think @Andrew might have been the one changing it: https://netbox.wikimedia.org/ipam/ip-addresses/7085/changelog/, not sure why though
[16:27:48] (PS1) Phedenskog: painttiming: Add missing action and namespace. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/860078 (https://phabricator.wikimedia.org/T321398)
[16:28:53] (CR) Phedenskog: "Adding the tests for first contentful paint in navtiming.py I saw that I missed that we miss out on a couple of other labels too." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/860078 (https://phabricator.wikimedia.org/T321398) (owner: Phedenskog)
[17:08:57] Data-Engineering-Planning, Product-Analytics, Data Pipelines (Sprint 04): Presto returns incorrect data for an added field - https://phabricator.wikimedia.org/T321960 (BTullis) a:BTullis
[17:37:56] Data-Engineering-Planning, Data Pipelines: NEW FEATURE REQUEST: Upgrade superset to 1.5.2 - https://phabricator.wikimedia.org/T323458 (BTullis)
[17:38:13] (VarnishkafkaNoMessages) firing: varnishkafka on cp5029 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5029%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[17:38:57] Data-Engineering-Planning, Shared-Data-Infrastructure: NEW FEATURE REQUEST: Upgrade superset to 1.5.2 - https://phabricator.wikimedia.org/T323458 (BTullis)
[17:39:25] Data-Engineering-Radar, Cassandra: Bootstrap new Cassandra nodes (eqiad) - https://phabricator.wikimedia.org/T307802 (Eevans)
[17:41:25] Data-Engineering-Planning, Shared-Data-Infrastructure: NEW FEATURE REQUEST: Upgrade superset to 1.5.2 - https://phabricator.wikimedia.org/T323458 (BTullis) @EChetty - Fettled the tags on this one and merged a duplicate. Hope that's OK. I thought it was more #shared-data-infrastructure than #data_pipelines.
[17:43:13] (VarnishkafkaNoMessages) resolved: varnishkafka on cp5029 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5029%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:00:14] Data-Engineering, SRE, SRE-Access-Requests: Grant ssh access to analytics-admins to dcausse and gmodena - https://phabricator.wikimedia.org/T323280 (MNadrofsky) @BTullis I approve this for @gmodena . With Will currently away, I'm acting manager for Gabriele. Let me know if you need anything else!
[18:35:01] Analytics-Jupyter, Data-Engineering, Product-Analytics: Replace anaconda-wmf with smaller, non-stacked Conda environments - https://phabricator.wikimedia.org/T302819 (xcollazo)
[18:35:03] Data-Engineering, Wmfdata-Python: Release Wmfdata-Python 2.0 - https://phabricator.wikimedia.org/T300442 (xcollazo) Open→Resolved Version 2.0.0 has now been released to https://github.com/wikimedia/wmfdata-python/tree/release. Thanks for all the work @nshahquinn-wmf!
[19:16:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:21:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:18:37] Data-Engineering, Data-Services, cloud-services-team (Kanban): clouddb* hosts with ipv6 access timeout from cumin - https://phabricator.wikimedia.org/T323550 (bd808) >>! In T323550#8412241, @Marostegui wrote: > So this needs to be looked at anyways as it is also affecting maintain-dbusers script (per...