[00:00:40] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:35:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4044 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4044%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [00:36:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4046 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4046%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [00:40:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4044 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4044%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [00:41:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4046 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4046%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [00:43:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4044 is not sending enough cache_text requests - 
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4044%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [00:48:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4044 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4044%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [01:25:48] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:52:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4043 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4043%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [03:57:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4043 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4043%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [04:39:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4043 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - 
https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4043%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [04:44:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4043 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4043%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [06:14:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4047%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [06:19:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4047%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [07:35:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4044 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4044%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [07:40:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4044 is not sending 
enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4044%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [08:58:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [08:58:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp1085 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqiad%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp1085%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [09:03:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp1085 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [09:03:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp1085 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqiad%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp1085%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [09:12:25] PROBLEM - MegaRAID on an-worker1083 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, 
WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [09:23:13] RECOVERY - MegaRAID on an-worker1083 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [10:15:01] PROBLEM - MegaRAID on an-worker1083 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [10:23:56] Data-Engineering-Planning, Data Pipelines (Sprint 03), Patch-For-Review, Technical-Debt: Prepare the fsimage - https://phabricator.wikimedia.org/T321167 (Antoine_Quhen) @EChetty, there is a question about the openness of this dataset: * It contains a list of all files on HDFS, including the `/use... 
[10:29:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [10:34:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [10:46:22] RECOVERY - MegaRAID on an-worker1083 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [11:51:22] Data-Engineering-Operations, Data-Engineering-Planning, Mail, SRE, Patch-For-Review: Change the analytics-alerts email alias to a mailman distribution list - https://phabricator.wikimedia.org/T315486 (BTullis) OK, thanks all. I'll make that change to the exim aliases file: `analytics-alerts:... 
[11:52:28] PROBLEM - MegaRAID on an-worker1083 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [12:03:28] RECOVERY - MegaRAID on an-worker1083 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [12:51:39] Data-Engineering-Planning, Shared-Data-Infrastructure, Event-Platform Value Stream (Sprint 03): [SPIKE] Deploy event driven stateless Flink service to DSE cluster - https://phabricator.wikimedia.org/T320812 (gmodena) Here's a summary of discussions I had with folks currently involved with Flink and k... [12:54:51] (PS1) Kosta Harlan: HomepageVisit: Add specialcontribute as valid referer_route [schemas/event/secondary] - https://gerrit.wikimedia.org/r/850125 (https://phabricator.wikimedia.org/T320826) [13:15:36] Data-Engineering, Event-Platform Value Stream (Sprint 03), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (JArguello-WMF) [13:18:15] Data-Engineering, Event-Platform Value Stream (Sprint 03), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (JArguello-WMF) [13:20:28] Data-Engineering, Event-Platform Value Stream (Sprint 03), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (JArguello-WMF) @Ottomata Is the decision made something worth documenting in the decision log? 
[13:21:34] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (JArguello-WMF) [13:30:30] Data-Engineering-Planning, DC-Ops, SRE, Shared-Data-Infrastructure, ops-eqiad: Q1:rack/setup/install druid10[09-11] - https://phabricator.wikimedia.org/T314335 (Cmjohnson) The mgmt links are still not working, The DNS is correct but I am unable to ping the servers. [13:37:50] (Abandoned) Btullis: Update the email used for alerting the data engineering team [analytics/refinery] - https://gerrit.wikimedia.org/r/848269 (https://phabricator.wikimedia.org/T315486) (owner: Btullis) [13:53:29] Data-Engineering, Equity-Landscape: World Bank Data - https://phabricator.wikimedia.org/T309282 (ntsako) Table renamed to ` select * from ntsako.world_bank_data_input_metrics ` [13:56:48] Data-Engineering-Planning, Equity-Landscape: Load language data - https://phabricator.wikimedia.org/T315886 (ntsako) [13:59:37] Data-Engineering, Equity-Landscape: Programs input metric - https://phabricator.wikimedia.org/T309277 (ntsako) Data moved to: ` select * from ntsako.programs_input_metrics ` [14:03:17] Data-Engineering-Planning, Data Pipelines (Sprint 03), Technical-Debt: Create and deploy the fsimage job. 
- https://phabricator.wikimedia.org/T321168 (Antoine_Quhen) [14:05:02] PROBLEM - MegaRAID on an-worker1083 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [14:05:51] Data-Engineering, Equity-Landscape: Affiliates input metric - https://phabricator.wikimedia.org/T309275 (ntsako) Reads from affiliate_data_csv and joins data with affiliate leadership to output affiliate_leadership_input_metrics: Queries used: ` -- Run in Hue WITH country_data AS ( SELECT distinc... [14:06:59] Data-Engineering: RAID battery alert in an-worker1083 - https://phabricator.wikimedia.org/T321809 (BTullis) [14:13:23] (PS1) Aqu: Move bash script to generate & put xml fsimage [analytics/refinery] - https://gerrit.wikimedia.org/r/850169 (https://phabricator.wikimedia.org/T321167) [14:35:45] RECOVERY - MegaRAID on an-worker1083 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [15:03:57] PROBLEM - Checks that the local airflow scheduler for airflow @analytics is working properly on an-launcher1002 is CRITICAL: CRITICAL: /usr/bin/env AIRFLOW_HOME=/srv/airflow-analytics /usr/lib/airflow/bin/airflow jobs check --job-type SchedulerJob --hostname an-launcher1002.eqiad.wmnet did not succeed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow [15:05:41] RECOVERY - Checks that the local airflow scheduler for airflow @analytics is working properly on an-launcher1002 is OK: OK: /usr/bin/env AIRFLOW_HOME=/srv/airflow-analytics /usr/lib/airflow/bin/airflow jobs check --job-type SchedulerJob --hostname an-launcher1002.eqiad.wmnet succeeded https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow [15:15:29] Data-Engineering, 
Equity-Landscape: Population input metrics - https://phabricator.wikimedia.org/T309279 (ntsako) Query used: ` -- Run in Hue WITH country_data AS ( SELECT distinct iso3_country_code, first_value(country_area_label) over(PARTITION BY iso3_country_code) country_name,... [15:17:10] Data-Engineering, Equity-Landscape: Overall Engagement input metric - https://phabricator.wikimedia.org/T309278 (ntsako) Overall engagement is no longer needed as this is calculated on the Output Rank level by @KCVelaga_WMF [15:17:24] Data-Engineering, Equity-Landscape: Overall Engagement input metric - https://phabricator.wikimedia.org/T309278 (ntsako) Open→Invalid [15:17:25] Data-Engineering, Equity-Landscape: Extract + Transformation Raw Data into Input Metrics - https://phabricator.wikimedia.org/T306625 (ntsako) [15:21:59] Data-Engineering-Operations, Data-Engineering-Planning, Mail, SRE: Change the analytics-alerts email alias to a mailman distribution list - https://phabricator.wikimedia.org/T315486 (BTullis) After modifying the alias I also needed to set the following option in mailman. {F35641648,width=80%} [15:22:36] Data-Engineering-Operations, Data-Engineering-Planning, Mail, SRE: Change the analytics-alerts email alias to a mailman distribution list - https://phabricator.wikimedia.org/T315486 (BTullis) Open→Resolved [16:07:45] Data-Engineering-Planning, Shared-Data-Infrastructure, Data Pipelines (Sprint 03): Create Plan for Spark 2 Deprecation - https://phabricator.wikimedia.org/T318367 (mforns) [16:10:49] Data-Engineering-Planning, Data Pipelines (Sprint 03), Technical-Debt: Create and deploy the fsimage job. 
- https://phabricator.wikimedia.org/T321168 (JArguello-WMF) [16:17:59] Data-Engineering, Event-Platform Value Stream (Sprint 03), Shared-Data-Infrastructure (Sprint 03): Create kubernetes namespace and user for the stream_enrichment PoC project - https://phabricator.wikimedia.org/T321682 (BTullis) Sorry, you weren't a subscriber @gmodena - Is the suggestion above for a... [16:30:07] Data-Engineering, Data Pipelines: Back-fill Wikidata reliability Grapite metrics - https://phabricator.wikimedia.org/T321838 (mforns) [16:32:18] Data-Engineering-Planning, Wikidata, Wikidata Analytics, Data Pipelines (Sprint 03): Some reliability metrics missing since June 20th '22 - https://phabricator.wikimedia.org/T314131 (mforns) I've created a task to specifically tackle the back-filling: https://phabricator.wikimedia.org/T321838 [16:40:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4050 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4050%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [16:45:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4050 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4050%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [16:57:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4050 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - 
https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4050%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [17:02:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4050 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4050%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [17:24:38] !log re-running webrequest-load-wf-text-2022-10-27-10 with lower thresholds [17:24:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:37:45] Data-Engineering-Planning, DC-Ops, Event-Platform Value Stream, SRE, and 2 others: Q1:rack/setup/install kafka-stretch100[12] - https://phabricator.wikimedia.org/T314156 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host kafka-stretch1001.eqiad.w... 
[17:38:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4051 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4051%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [17:43:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4051 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4051%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [17:47:48] Data-Engineering, Event-Platform Value Stream: Add schema diffing support to jsonschema-tools and run diff in CI - https://phabricator.wikimedia.org/T321850 (Ottomata) [17:48:20] Data-Engineering-Planning, SRE, SRE-swift-storage, Wikidata, and 3 others: Clean up the rdf-streaming-updater-codfw container from thanos-swift. - https://phabricator.wikimedia.org/T316031 (bking) Open→Resolved [17:48:31] Data-Engineering-Planning, SRE, SRE-swift-storage, Wikidata, and 4 others: wdqs space usage on thanos-swift - https://phabricator.wikimedia.org/T314835 (bking) [17:48:51] Data-Engineering, Event-Platform Value Stream: Add schema diffing support to jsonschema-tools and run diff in CI - https://phabricator.wikimedia.org/T321850 (Ottomata) I wrote a very dirty script that does this: https://gist.github.com/ottomata/e21aaae6fd1be3f58ab59341a79cc2d7 Running this script while... 
[17:59:47] Data-Engineering-Planning, DC-Ops, Event-Platform Value Stream, SRE, and 2 others: Q1:rack/setup/install kafka-stretch100[12] - https://phabricator.wikimedia.org/T314156 (Cmjohnson) @Ottomata this is failing in the installer because of the raid configuration. I probably do not have it set correct... [18:20:26] Data-Engineering, Event-Platform Value Stream (Sprint 03), Shared-Data-Infrastructure (Sprint 03): Create kubernetes namespace and user for the stream_enrichment PoC project - https://phabricator.wikimedia.org/T321682 (Ottomata) > I'd still be quite keen on the flink-operator approach We are too! >... [18:24:58] Data-Engineering-Planning, DC-Ops, Event-Platform Value Stream, SRE, and 2 others: Q1:rack/setup/install kafka-stretch100[12] - https://phabricator.wikimedia.org/T314156 (Ottomata) What's the error you are getting? See https://phabricator.wikimedia.org/T314160#8166075 and below. In codfw, sda a... [18:46:32] Data-Engineering, Event-Platform Value Stream: Move Spark JsonSchemaConverter out of analytics/refinery/source and into wikimedia-event-utilities - https://phabricator.wikimedia.org/T321854 (Ottomata) [18:48:13] Data-Engineering, Event-Platform Value Stream: Move Spark JsonSchemaConverter out of analytics/refinery/source and into wikimedia-event-utilities - https://phabricator.wikimedia.org/T321854 (Ottomata) Along the way , we could consider implementing {T278467}. Using the migrated converter would then requi... 
[19:13:48] Data-Engineering, Event-Platform Value Stream (Sprint 03), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Ottomata) [19:29:16] Analytics-Jupyter, Data-Engineering, Product-Analytics, Data Pipelines (Sprint 03), Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (xcollazo) [19:29:34] Data-Engineering-Planning, DC-Ops, Event-Platform Value Stream, SRE, and 2 others: Q1:rack/setup/install kafka-stretch100[12] - https://phabricator.wikimedia.org/T314156 (Cmjohnson) @Ottomata yes, that is what's happening here [19:42:30] Data-Engineering, Event-Platform Value Stream, MediaWiki-Core-Hooks: Add $comment and $performer to ArticleRevisionVisibilitySet params - https://phabricator.wikimedia.org/T321411 (Ottomata) Ah, from {T240307}: > One drawback of using interfaces is that it will no longer be possible to add parameters... [19:52:36] Data-Engineering-Planning, DC-Ops, Event-Platform Value Stream, SRE, and 2 others: Q1:rack/setup/install kafka-stretch100[12] - https://phabricator.wikimedia.org/T314156 (Ottomata) K, looks like RobH was able to [[ https://phabricator.wikimedia.org/T314160#8166665 | fix it somehow ]]. [19:56:08] Data-Engineering-Planning, Cassandra, Image-Suggestions, Section-Level-Image-Suggestions: Section Level Image Suggestions - Data Persistence Request - https://phabricator.wikimedia.org/T320831 (Eevans) >>! In T320831#8343829, @Eevans wrote: >> >> [ ... ] >> >> **Size and Growth:** >> >> - Still... 
[20:23:19] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:20:21] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:41:19] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state