[00:12:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[00:17:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[00:31:07] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:37:03] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_eventlogging_legacy.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:00:35] !log Rerunning on an-launcher1002 sudo -u analytics kerberos-run-command analytics refine_eventlogging_legacy --ignore_failure_flag=true --table_include_regex='homepagemodule' --since='2022-11-04T15:00:00.000Z' --until='2022-11-05T16:00:00.000Z'
[06:00:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:24:43] !log sudo systemctl reset-failed monitor_refine_eventlogging_legacy.service
[06:24:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:25:47] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:11:23] Data-Engineering-Planning, API Platform (Sprint 00), Platform Engineering Roadmap, User-Eevans: Obtain security review of uniqueDevices - https://phabricator.wikimedia.org/T320976 (Atieno) Depends on https://phabricator.wikimedia.org/T320983
[08:12:51] Hi team - I've been disconnected this weekend, and I'm back :)
[08:18:33] bonjour
[08:19:33] PROBLEM - SSH on an-coord1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:07:07] Hello
[10:20:57] RECOVERY - SSH on an-coord1002.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:21:34] Analytics-Jupyter, Data-Engineering, Product-Analytics, Data Pipelines (Sprint 04), Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (EChetty)
[10:22:02] Data-Engineering-Planning, Cassandra, Data Pipelines (Sprint 04), Patch-For-Review: Write dedicated cassandra authorization code to read password from file when loading - https://phabricator.wikimedia.org/T306895 (EChetty)
[10:23:08] Data-Engineering-Planning, Product-Analytics, wmfdata-python, Data Pipelines (Sprint 04): Upgrade WMFData Python Package to use Spark3 - https://phabricator.wikimedia.org/T318587 (EChetty)
[10:23:18] Data-Engineering-Planning, Data Pipelines (Sprint 04), Patch-For-Review, Technical-Debt: Create a dashboard from the fsImage Dataset extracted from the HDFS FsImage - https://phabricator.wikimedia.org/T321169 (EChetty)
[10:23:30] Data-Engineering, Data Pipelines (Sprint 04): Reduce the number of files generated by geoeditors airflor jobs - https://phabricator.wikimedia.org/T304852 (EChetty)
[10:23:34] Data-Engineering-Planning, Data Pipelines (Sprint 04), Patch-For-Review, Technical-Debt: Create and deploy the fsimage job. - https://phabricator.wikimedia.org/T321168 (EChetty)
[10:23:36] Data-Engineering-Planning, Shared-Data-Infrastructure, Data Pipelines (Sprint 04): Create Plan for Spark 2 Deprecation - https://phabricator.wikimedia.org/T318367 (EChetty)
[10:24:40] Data-Engineering-Planning, Shared-Data-Infrastructure, Data Pipelines (Sprint 04): Create Plan for Spark 2 Deprecation - https://phabricator.wikimedia.org/T318367 (EChetty)
[10:26:07] Data-Engineering-Planning, Data Pipelines: Bug: Deleted pages are accidentally excluded from mediawiki_history_reduced - https://phabricator.wikimedia.org/T313955 (EChetty)
[10:26:09] Data-Engineering-Planning, Data Pipelines, Shared-Data-Infrastructure: [Iceberg] Debianize and install iceberg support for Spark, Presto, and optionally Hive - https://phabricator.wikimedia.org/T311738 (EChetty)
[10:26:27] Data-Engineering-Planning, Data Pipelines: Airflow Upgrade Compatibility with V2.3.2 - https://phabricator.wikimedia.org/T309552 (EChetty)
[10:26:36] Data-Engineering-Planning, Data Pipelines, Patch-For-Review: [Iceberg] Update Refine Sanitize to insert into Iceberg tables - https://phabricator.wikimedia.org/T311739 (EChetty)
[10:27:46] Data-Engineering-Planning, Beta-Cluster-Infrastructure, CirrusSearch, Discovery-Search, Event-Platform Value Stream: cirrusSearchCheckerJob JobQueueErrors (Could not enqueue jobs) on Beta Cluster - https://phabricator.wikimedia.org/T322491 (EChetty)
[10:27:48] Data-Engineering-Planning: NEW FEATURE REQUEST: - https://phabricator.wikimedia.org/T322423 (EChetty)
[10:27:50] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04): Prototype Spark Streaming Job for Content Dumps - https://phabricator.wikimedia.org/T322326 (EChetty)
[10:27:51] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04), Spike: Investigate using Spark Streaming as an Event Service Platform - https://phabricator.wikimedia.org/T322320 (EChetty)
[10:27:54] Data-Engineering-Planning: Check home/HDFS leftovers of ejoseph - https://phabricator.wikimedia.org/T322182 (EChetty)
[10:27:55] Data-Engineering-Planning, Event-Platform Value Stream: [NEEDS GROOMING] Improve reliability of simple stateless services - https://phabricator.wikimedia.org/T322125 (EChetty)
[10:27:57] Data-Engineering-Planning: Check home/HDFS leftovers of faidon - https://phabricator.wikimedia.org/T322107 (EChetty)
[10:28:00] Data-Engineering-Planning, Data Pipelines: Implement periodical cleaning of Airflow databases - https://phabricator.wikimedia.org/T322036 (EChetty)
[10:28:02] Data-Engineering-Planning, Event-Platform Value Stream: [NEEDS GROOMING] Flink SQL queries should access Kafka topics from a Catalog - https://phabricator.wikimedia.org/T322022 (EChetty)
[10:28:03] Data-Engineering-Planning, Product-Analytics: Presto returns incorrect data for an added field - https://phabricator.wikimedia.org/T321960 (EChetty)
[10:28:05] Data-Engineering-Planning, Data Pipelines: Allow Cormac Parle and Marco Fossati to deploy analytics-platform-eng Airflow instance - https://phabricator.wikimedia.org/T321925 (EChetty)
[10:28:08] Data-Engineering-Planning, Event-Platform Value Stream: Move Spark JsonSchemaConverter out of analytics/refinery/source and into wikimedia-event-utilities - https://phabricator.wikimedia.org/T321854 (EChetty)
[10:28:09] Data-Engineering-Planning, Data Pipelines: Back-fill Wikidata reliability Grapite metrics - https://phabricator.wikimedia.org/T321838 (EChetty)
[10:28:12] Data-Engineering-Planning, Event-Platform Value Stream: Add schema diffing support to jsonschema-tools and run diff in CI - https://phabricator.wikimedia.org/T321850 (EChetty)
[10:28:13] Data-Engineering-Planning: RAID battery alert in an-worker1083 - https://phabricator.wikimedia.org/T321809 (EChetty)
[10:28:15] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04), MW-1.40-notes (1.40.0-wmf.8; 2022-10-31): EventBus' stream config destination_event_service setting should move into producers.mediawikI_eventbus specific settings. - https://phabricator.wikimedia.org/T321557 (EChetty)
[10:28:19] Data-Engineering-Planning: Check home/HDFS leftovers of bscarone - https://phabricator.wikimedia.org/T321542 (EChetty)
[10:28:22] Data-Engineering-Planning, Data Pipelines: refinery scap deployment to thin nodes is broken - https://phabricator.wikimedia.org/T321506 (EChetty)
[10:28:24] Data-Engineering-Planning, Event-Platform Value Stream, MediaWiki-Core-Hooks: Create PageUndeleteComplete hook, analogous to PageDeleteComplete - https://phabricator.wikimedia.org/T321412 (EChetty)
[10:28:26] Data-Engineering-Planning, MediaWiki-Core-Hooks, Event-Platform Value Stream (Sprint 07): Add $comment and $performer to ArticleRevisionVisibilitySet params - https://phabricator.wikimedia.org/T321411 (EChetty)
[10:28:31] Data-Engineering-Planning, Unstewarded-production-error, Wikimedia-production-error: '.client_dt' should match format "date-time", '.event.pageNamespace' should be integer, '.event.skinVersion' should be integer - https://phabricator.wikimedia.org/T321329 (EChetty)
[10:28:35] Data-Engineering-Planning: Bug: User History has mismatching order of fields in Parquet vs. Hive - https://phabricator.wikimedia.org/T321231 (EChetty)
[10:28:43] Data-Engineering-Planning, GitLab, Release-Engineering-Team, serviceops-collab: Experiencing pipeline failure due to disk-space issues - https://phabricator.wikimedia.org/T310593 (EChetty)
[10:28:47] Analytics-Jupyter, Data-Engineering-Planning, Product-Analytics, Data Pipelines (Sprint 04), Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (EChetty)
[10:28:51] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04), Patch-For-Review, Shared-Data-Infrastructure (Sprint 03): Create kubernetes namespace and user for the stream_enrichment PoC project - https://phabricator.wikimedia.org/T321682 (EChetty)
[10:28:57] Data-Engineering-Planning, Cloud-Services, serviceops-collab, Patch-For-Review: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937 (EChetty)
[10:29:01] Data-Engineering-Planning, Shared-Data-Infrastructure (Sprint 03): Grant analytics-admins the right to run commands as the yarn user - https://phabricator.wikimedia.org/T321378 (EChetty)
[10:29:05] Data-Engineering-Planning, DC-Ops, SRE, ops-eqiad: Q2:rack/setup/install an-coord100[3,4] & an-mariadb100[1,2] - https://phabricator.wikimedia.org/T321119 (EChetty)
[10:29:13] Data-Engineering-Planning, DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for bnwikiquote - https://phabricator.wikimedia.org/T319190 (EChetty)
[10:29:21] Data-Engineering-Planning, DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for tlwikiquote - https://phabricator.wikimedia.org/T317111 (EChetty)
[10:29:30] Data-Engineering-Planning, DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for bclwikiquote - https://phabricator.wikimedia.org/T316456 (EChetty)
[10:29:37] Data-Engineering-Planning, DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for igwikiquote - https://phabricator.wikimedia.org/T314639 (EChetty)
[10:31:31] Data-Engineering-Planning, Product-Analytics: Presto returns incorrect data for an added field - https://phabricator.wikimedia.org/T321960 (EChetty) p:Triage→High
[10:31:47] Data-Engineering-Planning, Data Pipelines, Product-Analytics: Presto returns incorrect data for an added field - https://phabricator.wikimedia.org/T321960 (EChetty)
[10:36:21] Data-Engineering-Planning, Data Pipelines, Product-Analytics: Presto returns incorrect data for an added field - https://phabricator.wikimedia.org/T321960 (EChetty)
[10:38:16] Data-Engineering-Planning, Data Pipelines, Product-Analytics: Presto returns incorrect data for an added field - https://phabricator.wikimedia.org/T321960 (EChetty) Modified slightly to conform to new data issue template ( Unpublished at the time of original filing). Will look at it for this weeks sp...
[10:42:36] Data-Engineering-Planning: Bug: User History has mismatching order of fields in Parquet vs. Hive - https://phabricator.wikimedia.org/T321231 (EChetty) p:Triage→High
[10:43:04] Data-Engineering-Planning, Data Pipelines: Bug: User History has mismatching order of fields in Parquet vs. Hive - https://phabricator.wikimedia.org/T321231 (EChetty)
[10:43:34] Data-Engineering-Planning, DC-Ops, SRE, Shared-Data-Infrastructure, ops-eqiad: Q2:rack/setup/install an-coord100[3,4] & an-mariadb100[1,2] - https://phabricator.wikimedia.org/T321119 (EChetty)
[10:44:08] Data-Engineering-Planning, DBA, Data-Services, Shared-Data-Infrastructure, cloud-services-team (Kanban): Prepare and check storage layer for bnwikiquote - https://phabricator.wikimedia.org/T319190 (EChetty)
[10:44:44] Data-Engineering-Planning, Data Pipelines, GitLab, Release-Engineering-Team, serviceops-collab: Experiencing pipeline failure due to disk-space issues - https://phabricator.wikimedia.org/T310593 (EChetty)
[10:45:58] Data-Engineering-Planning, Data Pipelines: Make mediawiki-history page and user sorting complete for denormalization - https://phabricator.wikimedia.org/T321493 (EChetty)
[10:49:17] Data-Engineering-Planning, Cloud-Services, Shared-Data-Infrastructure, serviceops-collab, Patch-For-Review: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937 (EChetty)
[10:50:14] Data-Engineering-Planning, DBA, Data Pipelines, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for tlwikiquote - https://phabricator.wikimedia.org/T317111 (EChetty)
[10:50:38] Data-Engineering-Planning, DBA, Data Pipelines, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for bclwikiquote - https://phabricator.wikimedia.org/T316456 (EChetty)
[10:51:12] Data-Engineering-Planning, DBA, Data Pipelines, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for igwikiquote - https://phabricator.wikimedia.org/T314639 (EChetty)
[10:54:07] Data-Engineering-Planning, Data Pipelines: Fix mediawiki-history page computation for deleted pages having the same title - https://phabricator.wikimedia.org/T320860 (EChetty)
[10:56:10] Data-Engineering-Planning, Cassandra, Data Pipelines: Final cleanup tasks related to the AQS cluster migration - https://phabricator.wikimedia.org/T302278 (EChetty)
[10:56:53] Data-Engineering-Planning, Data Pipelines, SRE, Traffic: Add a rolled-up cache_status field to druid webrequest_sampled_128 - https://phabricator.wikimedia.org/T319344 (EChetty)
[10:57:15] Data-Engineering-Planning, Data Pipelines, Discovery-Search: Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (EChetty)
[10:57:36] Data-Engineering-Planning, Event-Platform Value Stream: Drop GuidedTour* tables - https://phabricator.wikimedia.org/T317460 (EChetty)
[10:58:13] Data-Engineering: 503 on Superset (reproducible) - https://phabricator.wikimedia.org/T322525 (Michael)
[10:58:22] Data-Engineering-Planning, Event-Platform Value Stream: Move archiva to private IPs + CDN - https://phabricator.wikimedia.org/T317182 (EChetty)
[10:58:46] Data-Engineering-Planning, Data Pipelines: Support for moving data from HDFS to public http file server - https://phabricator.wikimedia.org/T317167 (EChetty)
[11:01:56] Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review: Migrate pagecounts-ez generation to hadoop - https://phabricator.wikimedia.org/T192474 (EChetty)
[11:02:01] Data-Engineering-Planning, Shared-Data-Infrastructure: an-worker1090 MegaRaid issues - https://phabricator.wikimedia.org/T315748 (EChetty)
[11:03:17] Data-Engineering-Planning, Data Pipelines, Product-Analytics: PySpark warning messages - https://phabricator.wikimedia.org/T315024 (EChetty)
[11:03:34] Data-Engineering-Planning, Data Pipelines: Add a spark-history-server to our cluster - https://phabricator.wikimedia.org/T312541 (EChetty)
[11:04:20] Data-Engineering: 503 on Superset (reproducible) - https://phabricator.wikimedia.org/T322525 (Michael)
[11:04:44] Data-Engineering-Planning, Data Pipelines: Drop MediaViewer and MultimediaViewer* tables - https://phabricator.wikimedia.org/T311229 (EChetty)
[11:05:48] Data-Engineering-Planning, Data Pipelines: Generate data to count langswitches for every article - https://phabricator.wikimedia.org/T310975 (EChetty)
[11:06:03] Data-Engineering-Planning, Data Pipelines: Drop ArticleCreationWorkflow data - https://phabricator.wikimedia.org/T310863 (EChetty)
[11:06:24] Data-Engineering-Planning, Shared-Data-Infrastructure: Late events in wdqs-external.sparql-query? - https://phabricator.wikimedia.org/T310790 (EChetty)
[11:07:39] Data-Engineering-Planning, Event-Platform Value Stream, EventStreams: EventStreams doesn't show the Wikistories-* streams - https://phabricator.wikimedia.org/T307679 (EChetty)
[11:08:59] Data-Engineering-Planning, Shared-Data-Infrastructure: RAID battery alert in an-worker1083 - https://phabricator.wikimedia.org/T321809 (EChetty)
[11:09:08] Analytics-Clusters, Data-Engineering-Planning, Event-Platform Value Stream, Voice & Tone: Rename geoeditors_blacklist_country - https://phabricator.wikimedia.org/T259804 (EChetty)
[11:09:15] Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review: Create conda .deb and docker image - https://phabricator.wikimedia.org/T304450 (EChetty)
[11:10:00] Data-Engineering-Planning, Data Pipelines, Product-Analytics, Research: Update HDFS links tables as Mediawiki changes - https://phabricator.wikimedia.org/T304979 (EChetty)
[11:11:28] Analytics-Wikistats, Data-Engineering-Planning, Data Pipelines: Non-mobile UAs on mobile (2g/gprs, etc) IP-blocks - https://phabricator.wikimedia.org/T58628 (EChetty)
[11:13:29] Analytics-Jupyter, Data-Engineering, Product-Analytics, Data Pipelines (Sprint 04), Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (EChetty)
[11:14:34] Analytics, Data-Engineering, Event-Platform Value Stream, Patch-For-Review, User-Elukey: Port architecture of irc-recentchanges to Kafka - https://phabricator.wikimedia.org/T234234 (EChetty)
[11:28:43] hi team, could I get a review on https://gerrit.wikimedia.org/r/c/operations/puppet/+/839512?
[11:28:54] context: T319324
[11:28:54] T319324: Consider adding X-Analytics subfield for 'has a session cookie' - https://phabricator.wikimedia.org/T319324
[12:34:35] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:46:26] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:57:19] Analytics-Jupyter, Data-Engineering-Planning, Product-Analytics, Data Pipelines (Sprint 04), Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (EChetty)
[12:58:03] Analytics-Clusters, Data-Engineering-Planning, Event-Platform Value Stream, Voice & Tone: Rename geoeditors_blacklist_country - https://phabricator.wikimedia.org/T259804 (EChetty)
[12:58:29] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, User-Elukey: Port architecture of irc-recentchanges to Kafka - https://phabricator.wikimedia.org/T234234 (EChetty)
[12:59:54] Data-Engineering, Data Pipelines: 503 on Superset (reproducible) - https://phabricator.wikimedia.org/T322525 (EChetty)
[13:00:04] Data-Engineering-Planning, Data Pipelines: 503 on Superset (reproducible) - https://phabricator.wikimedia.org/T322525 (EChetty)
[13:00:38] Data-Engineering-Planning: requesting Kerberos password for mikeraish (MRaishWMF) - https://phabricator.wikimedia.org/T313316 (EChetty)
[13:01:47] Analytics-Jupyter, Data-Engineering-Planning, Data Pipelines, Product-Analytics, Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (EChetty)
[13:14:07] Data-Engineering-Planning, API Platform (Sprint 00), Platform Engineering Roadmap, User-Eevans: Obtain security review of uniqueDevices - https://phabricator.wikimedia.org/T320976 (JArguello-WMF)
[13:20:44] Data-Engineering-Planning, Data Pipelines: 503 on Superset (reproducible) - https://phabricator.wikimedia.org/T322525 (EChetty) Weird. Can reproduce on my end too. {F35714228} @JAllemandou, @Milimetric Thoughts? Is this a superset thing?
[13:26:50] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04), Spike: Easy Flink Python UDF + SQL enrichment - https://phabricator.wikimedia.org/T320968 (Ottomata) If you want to run Flink in k8s and write to HDFS, then this will be a problem: this is the [[ https://phabricator.wikimedia.org/T31...
[13:31:23] Data-Engineering, Data Pipelines: Migrate 1+ Refine jobs - https://phabricator.wikimedia.org/T307505 (Antoine_Quhen) Migrating refine from Airflow may trigger upgrading the refine jobs to Spark 3. The last version of the refinery source includes more error logs, which will ship at the same time: * https...
[13:41:32] (CR) Ottomata: "FWIW, we did some thinking about what a good name for this field was, and came up with 'wiki_id' instead of just 'wiki' or 'database'. Th" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) (owner: Kosta Harlan)
[13:41:40] Data-Engineering, Data Pipelines: wmf.virtualpageview_hourly's language_variant field is corrupted - https://phabricator.wikimedia.org/T322545 (mforns)
[14:00:10] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04), Spike: Easy Flink Python UDF + SQL enrichment - https://phabricator.wikimedia.org/T320968 (tchin) The exact error I get is `org.apache.kafka.common.errors.ClusterAuthorizationException: Cluster authorization failed` when trying to pr...
[14:16:18] (CR) TChin: image-suggestions-feedback: Bump to version 1.0.1 (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) (owner: Kosta Harlan)
[14:31:37] Data-Engineering-Planning, Data Pipelines: Allow Cormac Parle and Marco Fossati to deploy analytics-platform-eng Airflow instance - https://phabricator.wikimedia.org/T321925 (xcollazo) @EChetty: this task is currently blocking @mfossati and @Cparle. From @Ottomata : > hm it looks like they are in the ri...
[14:58:43] (CR) Kosta Harlan: image-suggestions-feedback: Bump to version 1.0.1 (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) (owner: Kosta Harlan)
[14:59:09] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04): [NEEDS GROOMING] Flink SQL queries should access Kafka topics from a Catalog - https://phabricator.wikimedia.org/T322022 (lbowmaker)
[15:27:18] Analytics-Jupyter, Data-Engineering-Planning, Data Pipelines, Product-Analytics, Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (xcollazo) We deployed the changes manually on Friday Nov 4 to `an-test-coord1001`. On first deploy, th...
[15:32:51] Analytics-Jupyter, Data-Engineering-Planning, Data Pipelines, Product-Analytics, Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (xcollazo) Installing latest wmfdata (1.4.0) fails on `an-test-client1001` while it succeeds on `stat100...
[15:49:13] btullis: o/ I'd need to rebuild the istio docker images, ok if I do the spark ones too?
[15:52:49] (proceeding, it should be ok in theory)
[15:58:24] (wow it takes a lot to build those images)
[16:01:56] Yes please. (retrospectively :-) )
[16:06:19] Data-Engineering-Planning, Data Pipelines: 503 on Superset (reproducible) - https://phabricator.wikimedia.org/T322525 (JAllemandou) This is expected. Superset uses `presto` behind the scene, for which there is "only" 5 hosts (in comparison to 90 on the hadoop cluster). Querying the `webrequest` dataset w...
[16:13:41] Analytics-Jupyter, Data-Engineering-Planning, Product-Analytics, Data Pipelines (Sprint 04), Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (EChetty)
[16:16:51] Data-Engineering, Data Pipelines: Reduce the number of files generated by geoeditors airflor jobs - https://phabricator.wikimedia.org/T304852 (EChetty)
[16:16:58] joal: thx <3
[16:17:25] you're welcome vgutierrez - thanks for adding this, I'm sure it'll be widely useful
[16:20:42] Data-Engineering-Planning, Data Pipelines: 503 on Superset (reproducible) - https://phabricator.wikimedia.org/T322525 (Michael) Mh, while that is understandable and not an issue once one knows about it, it would be great if it would fail in a more legible manner and maybe show a "Timeout" error in the UI?
[16:21:27] Data-Engineering-Planning, Data Pipelines: 503 on Superset (reproducible) - https://phabricator.wikimedia.org/T322525 (Michael)
[16:23:39] Data-Engineering-Planning, Data Pipelines, Product-Analytics: Presto returns incorrect data for an added field - https://phabricator.wikimedia.org/T321960 (EChetty)
[16:27:45] Data-Engineering-Planning, Data Pipelines: Allow Cormac Parle and Marco Fossati to deploy analytics-platform-eng Airflow instance - https://phabricator.wikimedia.org/T321925 (EChetty)
[16:30:18] btullis: it is still building :D
[16:31:05] afaics it is using maven to pull dependencies and then it is compiling them
[16:32:24] Data-Engineering-Planning, Cassandra, Data Pipelines: Final cleanup tasks related to the AQS cluster migration - https://phabricator.wikimedia.org/T302278 (EChetty)
[16:32:37] Data-Engineering-Planning, Data Pipelines: Bug: User History has mismatching order of fields in Parquet vs. Hive - https://phabricator.wikimedia.org/T321231 (EChetty)
[16:34:53] Data-Engineering-Planning, Beta-Cluster-Infrastructure, Event-Platform Value Stream: cirrusSearchCheckerJob JobQueueErrors (Could not enqueue jobs) on Beta Cluster - https://phabricator.wikimedia.org/T322491 (Gehel) Removing the Search team from this, it looks to be related to JobQueue and not touchi...
[16:37:41] Data-Engineering-Planning, DBA, Data Pipelines, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for bclwikiquote - https://phabricator.wikimedia.org/T316456 (EChetty)
[16:39:47] elukey: Yep, it's a hefty build process alright.
[16:40:23] Data-Engineering-Planning, Data Pipelines (Sprint 04): Allow Cormac Parle and Marco Fossati to deploy analytics-platform-eng Airflow instance - https://phabricator.wikimedia.org/T321925 (EChetty)
[16:40:25] Data-Engineering-Planning, Data Pipelines (Sprint 04): Bug: User History has mismatching order of fields in Parquet vs. Hive - https://phabricator.wikimedia.org/T321231 (EChetty)
[16:40:27] Data-Engineering-Planning, Product-Analytics, Data Pipelines (Sprint 04): Presto returns incorrect data for an added field - https://phabricator.wikimedia.org/T321960 (EChetty)
[16:40:37] Data-Engineering-Planning, Cassandra, Data Pipelines (Sprint 04): Final cleanup tasks related to the AQS cluster migration - https://phabricator.wikimedia.org/T302278 (EChetty)
[16:40:50] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for bclwikiquote - https://phabricator.wikimedia.org/T316456 (EChetty)
[16:41:05] Data-Engineering-Planning, Data Pipelines: 503 on Superset (reproducible) - https://phabricator.wikimedia.org/T322525 (Milimetric) @Michael: to add a little more detail on what Joseph said, querying 5 days of webrequest (only text) means moving `5 * 1.3T = 6.5T` over the network. So there are two import...
[16:57:12] Data-Engineering-Planning, Product-Analytics, Data Pipelines (Sprint 04): Presto returns incorrect data for an added field - https://phabricator.wikimedia.org/T321960 (JAllemandou) I'd like us to try: ` hive.parquet.use-column-names=true ` (look for the string in https://prestodb.io/docs/current/rele...
[17:06:54] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for tlwikiquote - https://phabricator.wikimedia.org/T317111 (EChetty)
[17:07:06] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for tlwikiquote - https://phabricator.wikimedia.org/T317111 (EChetty)
[17:07:30] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for igwikiquote - https://phabricator.wikimedia.org/T314639 (EChetty)
[17:07:46] Data-Engineering-Planning, DBA, Data-Services, Data Pipelines (Sprint 04), cloud-services-team (Kanban): Prepare and check storage layer for igwikiquote - https://phabricator.wikimedia.org/T314639 (EChetty)
[17:10:34] Data-Engineering-Planning, Data Pipelines: 503 on Superset (reproducible) - https://phabricator.wikimedia.org/T322525 (Michael) Thank you for elaborating on the current situation 🙏 The implications of this did not occur to me when looking at [Analytics/Data Lake](https://wikitech.wikimedia.org/wiki/Analy...
[17:34:54] btullis: images built!
[17:50:04] Data-Engineering, Product-Analytics, wmfdata-python: Remodel Wmfdata-Python's Spark API to match underlying behavior - https://phabricator.wikimedia.org/T273210 (nshahquinn-wmf)
[17:58:27] Data-Engineering, Product-Analytics, wmfdata-python: Release Wmfdata-Python 2.0 - https://phabricator.wikimedia.org/T300442 (nshahquinn-wmf)
[17:58:31] Hey folks - power cut at home - disconnecting until it comes back
[18:47:05] Data-Engineering-Planning, Data Pipelines (Sprint 04): Bug: User History has mismatching order of fields in Parquet vs. Hive - https://phabricator.wikimedia.org/T321231 (Mayakp.wiki)
[19:25:18] Data-Engineering-Planning, Product-Analytics (Kanban): Superset Date Filter fix needed - https://phabricator.wikimedia.org/T318299 (Mayakp.wiki) @BTullis : assigning to you. Can we try upgrading Superset as discussed during the [[ https://docs.google.com/document/d/1MkRw0GRti8u1SSPdaJytOJSNxdNe4ggBRVJg8...
[19:25:27] Data-Engineering-Planning, Product-Analytics (Kanban): Superset Date Filter fix needed - https://phabricator.wikimedia.org/T318299 (Mayakp.wiki) a:Mayakp.wiki→BTullis
[20:40:08] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04), Spike: Easy Flink Python UDF + SQL enrichment - https://phabricator.wikimedia.org/T320968 (Ottomata) Hm, maybe you can do a different topic? It might be better to do a temp topic with your name in it, so it is clear that is just you...
[20:47:01] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04), Spike: Easy Flink Python UDF + SQL enrichment - https://phabricator.wikimedia.org/T320968 (Ottomata) I got sidetracked into getting a working content enrichment pyflink UDF SQL thing working. Finally got it! https://gist.github.com/...
[21:30:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp4040 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[21:30:12] (VarnishkafkaNoMessages) firing: (5) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[21:35:12] (VarnishkafkaNoMessages) resolved: (7) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[21:35:12] (VarnishkafkaNoMessages) resolved: (6) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[21:42:28] Data-Engineering, wmfdata-python, Product-Analytics (Kanban): Release Wmfdata-Python 2.0 - https://phabricator.wikimedia.org/T300442 (nshahquinn-wmf)
[21:49:37] Data-Engineering, Wmfdata-Python, Product-Analytics (Kanban): Release Wmfdata-Python 2.0 - https://phabricator.wikimedia.org/T300442 (nshahquinn-wmf) a:nshahquinn-wmf I will be doing all of this with the exception of T318587. It's a long list but it's all pretty simple.
[23:33:06] PROBLEM - SSH on an-coord1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:40:12] (VarnishkafkaNoMessages) firing: (5) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[23:45:12] (VarnishkafkaNoMessages) resolved: (9) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages