[00:08:40] Data-Engineering-Planning, Product-Analytics, Data Pipelines (Sprint 07): Include EU Registered Country in the canonical country database - https://phabricator.wikimedia.org/T324995 (nshahquinn-wmf) a:mforns→None The PR to review is here: https://github.com/wikimedia-research/canonical-data/p...
[00:30:49] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:03:30] Data-Engineering-Radar, Gerrit-Privilege-Requests, Release-Engineering-Team (Blocking 🧱): Requesting membership of the analytics group in gerrit for 'snwachukwu', 'nokafor', and 'xcollazo' - https://phabricator.wikimedia.org/T314592 (xcollazo) Great, thanks!
[03:21:58] Data-Engineering, Event-Platform Value Stream (Sprint 07), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Tgr) >>! In T308017#8519809, @Ottomata wrote: > But recently, a new major version of the PageProperties...
[09:31:45] (PS1) Aqu: [WIP] Java preparations before migrating webrequest to Spark [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118
[09:50:08] Analytics: jmx_presto prometheus job down for some an-presto hosts - https://phabricator.wikimedia.org/T327753 (fgiunchedi)
[09:59:15] (PS2) Aqu: [WIP] Java preparations before migrating webrequest to Spark [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072)
[10:32:41] (VarnishkafkaNoMessages) firing: varnishkafka on cp4046 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4046%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[10:37:41] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4046 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4046%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:20:43] btullis: o/ ok if I merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/883100 ?
[11:21:39] elukey: Yep, fine by me. Thanks.
[11:22:14] super thanks
[11:50:51] Data-Engineering-Planning, Patch-For-Review, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): NEW FEATURE REQUEST: Upgrade superset to 1.5.3 - https://phabricator.wikimedia.org/T323458 (Manuel) Hi @BTullis, this solved the issues for me, thank you!
[12:28:52] Analytics, Patch-For-Review: Fix broken image on front page of analytics.wikimedia.org - https://phabricator.wikimedia.org/T327687 (Aklapper)
[13:53:50] Data-Engineering, Event-Platform Value Stream (Sprint 07), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Ottomata) Oh, okay. Thanks.
[14:14:03] ottomata: aarora was looking for a one-year-old version of pagelinks... which doesn't exist. We have snapshots back a few months, but nothing a year old. This is what I usually talk about - tracking dependencies over time. So I tried to find aarora on other chats, but I couldn't
[14:16:41] (VarnishkafkaNoMessages) firing: varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4047%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[14:20:22] Data-Engineering-Planning, DC-Ops, SRE, Shared-Data-Infrastructure, ops-eqiad: Q1:rack/setup/install druid10[09-11] - https://phabricator.wikimedia.org/T314335 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host druid1010.eqiad.wmnet with OS bull...
[14:21:41] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4047%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[14:30:52] Data-Engineering-Planning: Data Engineering Pairing system - https://phabricator.wikimedia.org/T327790 (JArguello-WMF)
[14:44:16] Data-Engineering-Planning: Data Engineering Pairing system - https://phabricator.wikimedia.org/T327790 (JArguello-WMF)
[15:12:24] (PS14) Snwachukwu: Refactor and Expand External referer classification [analytics/refinery/source] - https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769)
[16:08:30] Analytics: jmx_presto prometheus job down for some an-presto hosts - https://phabricator.wikimedia.org/T327753 (Stevemunene) Hi @fgiunchedi the 10 servers were taken out of the cluster due to challenges in joining the presto cluster documented here T325809 T325331 T323783 Last message from @BTullis being "...
[16:10:48] Data-Engineering, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Datahub errors in staging-codfw - https://phabricator.wikimedia.org/T327799 (BTullis) p:Triage→High
[16:57:27] Data-Engineering-Planning, Event-Platform Value Stream: Replace refinery-source Guava caches by Caffeine - https://phabricator.wikimedia.org/T325266 (Antoine_Quhen)
[17:54:29] Data-Engineering, Patch-For-Review, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Datahub errors in staging-codfw - https://phabricator.wikimedia.org/T327799 (JMeybohm)
[17:59:50] Data-Engineering, Equity-Landscape: Population input metrics - https://phabricator.wikimedia.org/T309279 (ntsako) Final input metrics table moved from `ntsako.population_data_input_metrics` which now stores the csv data to `ntsako.population_leadership_input_metrics`
[18:07:13] Data-Engineering, Product-Analytics (Kanban), Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Superset Date Filter fix needed - https://phabricator.wikimedia.org/T318299 (mpopov) a:BTullis→Mayakp.wiki
[18:07:19] Data-Engineering, Product-Analytics (Kanban), Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Superset Date Filter fix needed - https://phabricator.wikimedia.org/T318299 (mpopov) p:Triage→Medium
[18:12:26] Data-Engineering-Planning, Data Pipelines (Sprint 07), Product-Analytics (Kanban): Include EU Registered Country in the canonical country database - https://phabricator.wikimedia.org/T324995 (mpopov) a:nshahquinn-wmf
[19:05:49] Data-Engineering-Planning, DC-Ops, SRE, Shared-Data-Infrastructure, ops-eqiad: Q1:rack/setup/install druid10[09-11] - https://phabricator.wikimedia.org/T314335 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host druid1010.eqiad.wmnet with OS bullseye...
[19:06:35] Data-Engineering-Planning, DC-Ops, SRE, Shared-Data-Infrastructure, ops-eqiad: Q1:rack/setup/install druid10[09-11] - https://phabricator.wikimedia.org/T314335 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host druid1011.eqiad.wmnet with OS bull...
[19:36:18] PROBLEM - Webrequests Varnishkafka log producer on cp5032 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[20:04:51] Analytics-Radar, Infrastructure-Foundations, Puppet: modules/udp2log/manifests/instance/monitoring.pp has unreachable code - https://phabricator.wikimedia.org/T152104 (Dzahn) It's been years since my last comment that it's been years.
[20:06:24] RECOVERY - Webrequests Varnishkafka log producer on cp5032 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[20:07:06] Quarry, observability: Develop the monitoring of Quarry - https://phabricator.wikimedia.org/T205150 (Dzahn) Shouldn't wmcs be added to this?
[20:09:32] Analytics-Radar, Analytics-Wikistats, Data-Engineering: 500 error on wikimedia stats.wikimedia.org - https://phabricator.wikimedia.org/T205163 (Dzahn) I never received a response to this ticket and it's been years. So seems unlikely it will be acted upon. Let's give up on it then...
[20:10:12] Analytics-Radar, Analytics-Wikistats, Data-Engineering: 500 error on wikimedia stats.wikimedia.org - https://phabricator.wikimedia.org/T205163 (Dzahn) Open→Declined :/
[20:15:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2041 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2041%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:20:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2041 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2041%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:20:28] joal: since you asked about the code review at today's standup, I'd actually like to know if you are OK with my answer to your regex question :-)
[20:20:47] the other ones, I'm about to address as you suggested
[20:21:41] (VarnishkafkaNoMessages) firing: varnishkafka on cp6009 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=drmrs%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp6009%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:23:11] oh, joal: and also the one question I had about your catalog.schema.table comment, please
[20:26:41] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2041 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:35:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2041 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2041%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:39:36] Data-Engineering, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0: Device Analytics service - https://phabricator.wikimedia.org/T288298 (BPirkle)
[20:39:56] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2041 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:40:43] Data-Engineering, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0: Device Analytics service - https://phabricator.wikimedia.org/T288298 (BPirkle)
[20:50:57] Data-Engineering, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0: Page Analytics Service - https://phabricator.wikimedia.org/T288296 (BPirkle)
[20:52:43] Data-Engineering, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0: Media Analytics Service - https://phabricator.wikimedia.org/T288303 (BPirkle)
[20:53:04] Data-Engineering, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0: Geo Analytics Service - https://phabricator.wikimedia.org/T288305 (BPirkle)
[21:07:40] Data-Engineering-Planning, DC-Ops, SRE, Shared-Data-Infrastructure, ops-eqiad: Q1:rack/setup/install druid10[09-11] - https://phabricator.wikimedia.org/T314335 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host druid1011.eqiad.wmnet with OS bullseye...
[21:08:05] Data-Engineering-Planning, DC-Ops, SRE, Shared-Data-Infrastructure, ops-eqiad: Q1:rack/setup/install druid10[09-11] - https://phabricator.wikimedia.org/T314335 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host druid1009.eqiad.wmnet with OS bull...
[21:22:38] Data-Engineering, Event-Platform Value Stream: [NEEDS GROOMING] Set PYTHONPATH and FLINK_CLASSPATH in Flink docker images. - https://phabricator.wikimedia.org/T327494 (Ottomata) > We should meet the install_requires https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/883278 Tr...
[21:44:43] Data-Engineering-Planning, DC-Ops, SRE, Shared-Data-Infrastructure, ops-eqiad: Q1:rack/setup/install druid10[09-11] - https://phabricator.wikimedia.org/T314335 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host druid1009.eqiad.wmnet with OS bullseye...
[23:29:58] Data-Engineering, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0:Wikistats 2 service - https://phabricator.wikimedia.org/T288301 (BPirkle)
[23:31:25] Data-Engineering, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0:Wikistats 2 service - https://phabricator.wikimedia.org/T288301 (BPirkle) Open→Invalid Closing as invalid. We will instead do {T327817} and {T327818}