[00:01:10] Data-Platform-SRE (2024.01.22 - 2024.02.11): Stale data/failed queries on wikidatawiki index - https://phabricator.wikimedia.org/T356941 (bking)
[00:03:29] (SystemdUnitFailed) firing: (21) monitor_refine_event_sanitized_analytics_delayed.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:06:47] Data-Platform-SRE (2024.01.22 - 2024.02.11): Stale data/failed queries on wikidatawiki index - https://phabricator.wikimedia.org/T356941 (bking)
[00:14:53] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1003:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1003:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[00:23:29] (SystemdUnitFailed) firing: (21) monitor_refine_event_sanitized_analytics_delayed.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:28:29] (SystemdUnitFailed) firing: (21) monitor_refine_event_sanitized_analytics_delayed.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:17:00] (DiskSpace) firing: Disk space an-test-worker1001:9100:/ 0.002007% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[02:13:29] (SystemdUnitFailed) firing: (20) monitor_refine_event_sanitized_analytics_delayed.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:58:29] (SystemdUnitFailed) firing: (21) monitor_refine_event_sanitized_analytics_delayed.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:46:49] (PuppetFailure) firing: Puppet has failed on an-test-worker1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[04:23:29] (SystemdUnitFailed) firing: (21) monitor_refine_event_sanitized_analytics_delayed.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:21:36] (DiskSpace) firing: Disk space an-test-worker1001:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[06:39:46] Data-Engineering, Data-Persistence: Migrate dbstore* hosts to 10.6 - https://phabricator.wikimedia.org/T356961 (Marostegui)
[06:40:02] Data-Engineering, Data-Persistence: Migrate dbstore* hosts to 10.6 - https://phabricator.wikimedia.org/T356961 (Marostegui) @BTullis only dbstore1007 pending
[07:26:49] (PuppetFailure) resolved: Puppet has failed on an-test-worker1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[07:28:29] (SystemdUnitFailed) firing: (20) monitor_refine_event_sanitized_analytics_delayed.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:03:29] (SystemdUnitFailed) firing: (18) monitor_refine_event_sanitized_analytics_delayed.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:06:05] Good morning folks - There seems to be a puppet error on an-test-worker1001 - We received an error for a spark test job failure, but I can't troubleshoot it as I can't find the related systemd-timer and service
[08:07:55] Also, I could do with some help in understanding the emails we receive from
[08:07:58] sre-observability@wikimedia.org
[08:08:27] They mention systemd unit failed, but it feels like they don't take recovery into account - happy to talk about this
[08:55:16] Data-Engineering (Sprint 8): [BUG] webrequest analyzer DQ jobs fails to store data - https://phabricator.wikimedia.org/T356401 (gmodena) db and tables have been created: ` spark-sql (default)> use wmf_data_ops; Response code Time taken: 2.698 seconds spark-sql (default)> show tables; database tableName isTem...
[08:56:26] Data-Engineering (Sprint 8): [Data quality] Create database and tables for DQ backend - https://phabricator.wikimedia.org/T356628 (gmodena)
[09:05:14] joal: looking now
[09:21:31] joal: Have you got a few minutes to discuss the emails relating to systemd units?
[09:21:43] Sure btullis
[09:21:49] batcave?
[09:21:55] On my way...
[09:21:59] (DiskSpace) firing: Disk space an-test-worker1001:9100:/ 0.04464% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:31:50] Data-Platform-SRE, SRE, SRE-Access-Requests: Production data & systems access restoration for Connie Chen - https://phabricator.wikimedia.org/T356645 (Jelto)
[09:34:38] Data-Platform-SRE, SRE, SRE-Access-Requests: Production data & systems access restoration for Connie Chen - https://phabricator.wikimedia.org/T356645 (Jelto)
[09:36:41] Data-Platform-SRE, SRE, SRE-Access-Requests: Production data & systems access restoration for Connie Chen - https://phabricator.wikimedia.org/T356645 (Jelto) @cchen I unblocked your wikitech account. I checked all services above which should work. Can you try again accessing superset? (or resetting...
[09:50:11] !log failover hadoop namenode back to an-master1003 T353776
[09:50:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:50:14] T353776: Bring an-worker11[57-75] into service - https://phabricator.wikimedia.org/T353776
[09:54:37] Data-Platform-SRE (2024.01.22 - 2024.02.11): Bring an-worker11[57-75] into service - https://phabricator.wikimedia.org/T353776 (Stevemunene) Open→Resolved
[09:54:40] Data-Platform-SRE: Decommission an-worker10[78-95] & an-worker1116 - https://phabricator.wikimedia.org/T353784 (Stevemunene)
[10:25:14] Data-Platform-SRE, Epic: [Epic] define a strategy around alerting for Data Platform SRE and implement it - https://phabricator.wikimedia.org/T345698 (BTullis)
[10:25:31] Data-Platform-SRE, observability, Epic: [Epic] Review alerting strategy for Data Platform SRE - https://phabricator.wikimedia.org/T346438 (BTullis)
[10:27:15] (HdfsFSImageAge) firing: The HDFS FSImage on analytics-hadoop:an-master1003:10080 is too old. - https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Alerts#HDFS_FSImage_too_old - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&viewPanel=129&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsFSImageAge
[10:32:15] (HdfsFSImageAge) firing: (2) The HDFS FSImage on analytics-hadoop:an-master1003:10080 is too old. - https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Alerts#HDFS_FSImage_too_old - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&viewPanel=129&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsFSImageAge
[10:42:49] Data-Platform-SRE, observability, Epic: [Epic] Review alerting strategy for Data Platform SRE - https://phabricator.wikimedia.org/T346438 (BTullis) A recent change by SRE observability has made this even more pressing, because the data-engineering-alerts@lists.wikimedia.org list is being overloaded w...
[10:57:16] (HdfsFSImageAge) resolved: (2) The HDFS FSImage on analytics-hadoop:an-master1003:10080 is too old. - https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Alerts#HDFS_FSImage_too_old - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&viewPanel=129&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsFSImageAge
[10:58:34] Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Ensure Data Platform SREs have a contact group in puppet/alerting - https://phabricator.wikimedia.org/T342578 (BTullis) a:bking→BTullis @bking - I hope you don't mind, but I'm going to claim this ticket for a little bit. Recent [[http...
[11:08:35] Data-Platform-SRE, SRE, SRE-Access-Requests: Remove production data access for former WMDE staff member goransm - https://phabricator.wikimedia.org/T356279 (AndrewTavis_WMDE) @Manuel and I would suggest that this task remain open. Decisions on the data processes that require this account's private da...
[12:03:43] (SystemdUnitFailed) firing: (17) refinery-sqoop-whole-mediawiki.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:13:30] (SystemdUnitFailed) firing: (18) refinery-sqoop-whole-mediawiki.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:20:16] Data-Engineering (Sprint 8), Patch-For-Review: [Data Quality] Develop Airflow post processing instrumentation to collect and log configurable data metrics - https://phabricator.wikimedia.org/T349763 (gmodena) tl;dr: our approach to address this spike is currently documented at https://wikitech.wikimedia....
[13:21:59] (DiskSpace) firing: Disk space an-test-worker1001:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[13:34:45] Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Ensure Data Platform SREs have a contact group in puppet/alerting - https://phabricator.wikimedia.org/T342578 (BTullis)
[13:34:51] Data-Platform-SRE, Observability-Alerting, observability: Create VictorOps config for new Data Platform SRE team - https://phabricator.wikimedia.org/T344202 (BTullis) Stalled→Open p:Low→Medium a:BTullis Could someone from the #observability team rename the `analytics` routing key...
[13:37:37] Data-Platform-SRE (2024.01.22 - 2024.02.11), Observability-Alerting, observability: Create VictorOps config for new Data Platform SRE team - https://phabricator.wikimedia.org/T344202 (BTullis)
[13:47:02] Data-Platform-SRE (2024.01.22 - 2024.02.11), Observability-Alerting, observability: Create VictorOps config for new Data Platform SRE team - https://phabricator.wikimedia.org/T344202 (fgiunchedi) >>! In T344202#9524747, @BTullis wrote: > Could someone from the #observability team rename the `analytic...
[13:48:30] (SystemdUnitFailed) firing: (19) refinery-sqoop-whole-mediawiki.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:53:22] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (gmodena) >>! In T351117#9521093, @Fabfur wrote: > Some updates about the ongoing work: Hey @Fabfur, thanks for this! Blo...
[14:14:07] Data-Engineering, Data-Platform-SRE (2024.01.22 - 2024.02.11): [Iceberg Migration] P.O.C. on Iceberg sensor using Postgres table to keep status of updates - https://phabricator.wikimedia.org/T340466 (mforns) @BTullis, I think the Data Engineering team wants to use the dataset state store for this. I wond...
[14:23:40] Data-Platform-SRE (2024.01.22 - 2024.02.11), Infrastructure-Foundations, Puppet-Core, SRE, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff) >>! In T349619#9521720, @Volans wrote: > We could either catch the exception and retry or acquire a loc...
[14:40:00] Data-Engineering, MediaWiki-extensions-EventLogging, Data Products (Data Products Sprint 09), Technical-Debt: Fix public documentation for mw.eventLog.submit() and dispatch() - https://phabricator.wikimedia.org/T357003 (phuedx)
[14:42:54] Data-Engineering, MediaWiki-extensions-EventLogging, Data Products (Data Products Sprint 09), Technical-Debt: Fix public documentation for mw.eventLog.submit() and dispatch() - https://phabricator.wikimedia.org/T357003 (phuedx)
[14:43:19] Data-Engineering, MediaWiki-extensions-EventLogging, Metrics Platform Backlog, Data Products (Data Products Sprint 09), Technical-Debt: Fix public documentation for mw.eventLog.submit() and dispatch() - https://phabricator.wikimedia.org/T357003 (phuedx)
[14:46:05] (KafkaReplicationFactorTooLow) firing: ...
[14:46:11] Kafka topic codfw.ios.edit_interaction replication factor is too low on jumbo-eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Increase_a_topic's_replication_factor - https://grafana.wikimedia.org/d/000000234/kafka-by-topic?var-kafka_cluster=jumbo-eqiad&var-kafka_broker=All&var-topic=codfw.ios.edit_interaction&viewPanel=40 - https://alerts.wikimedia.org/?q=alertname%3DKafkaReplicationFactorTooLow
[14:46:48] (PuppetFailure) firing: Puppet has failed on an-test-worker1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[14:51:05] (KafkaReplicationFactorTooLow) resolved: ...
[14:51:05] Kafka topic codfw.ios.edit_interaction replication factor is too low on jumbo-eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Increase_a_topic's_replication_factor - https://grafana.wikimedia.org/d/000000234/kafka-by-topic?var-kafka_cluster=jumbo-eqiad&var-kafka_broker=All&var-topic=codfw.ios.edit_interaction&viewPanel=40 - https://alerts.wikimedia.org/?q=alertname%3DKafkaReplicationFactorTooLow
[14:52:11] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (Fabfur) Some updates: * For **backend**, **dt**, **http_status**, **ip**, **response_size** keys, they are now aligned t...
[15:10:29] Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Ensure Data Platform SREs have a contact group in puppet/alerting - https://phabricator.wikimedia.org/T342578 (BTullis) I have modified the group settings for data-platform-alerts@wikimedia.org so that anyone on the web can post. I believe th...
[15:32:08] (CR) Ottomata: Fix convertToSchema to work with array of structs (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/619034 (https://phabricator.wikimedia.org/T259924) (owner: Ottomata)
[15:33:47] Data-Engineering: [NEEDS GROOMING][SPIKE] Extract refine schema management into a dedicated tool - https://phabricator.wikimedia.org/T356762 (Ottomata) Hello! I'm not entirely sure what this ticket is trying to do, but here's some hopefully useful information: RefineTarget has a `useMergedSchemaForRead` opt...
[15:56:21] Data-Platform-SRE: Fix/monitor Elastic S3 repository status - https://phabricator.wikimedia.org/T357018 (bking)
[16:10:11] Data-Platform-SRE, SRE, SRE-Access-Requests: Remove production data access for former WMDE staff member goransm - https://phabricator.wikimedia.org/T356279 (Dzahn) In progress→Stalled @AndrewTavis_WMDE Ok, thanks for the update. Confirmed. Keeping open and just setting to stalled for the mome...
[16:11:04] Data-Platform-SRE, SRE, SRE-Access-Requests: Production data & systems access restoration for Connie Chen - https://phabricator.wikimedia.org/T356645 (Dzahn) a:cchen
[16:11:18] Data-Platform-SRE, SRE, SRE-Access-Requests: Production data & systems access restoration for Connie Chen - https://phabricator.wikimedia.org/T356645 (Dzahn) Open→In progress
[16:30:08] seems like an-test-worker1001's disk is full "hadoop-yarn-nodemanager[3665503]: /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh: line 129: echo: write error: No space left on device"
[16:32:18] hdfs data is taking most of that disk space. Is there something we can clean up?
[16:32:43] Data-Platform-SRE (2024.01.22 - 2024.02.11): Stale data/failed queries on wikidatawiki index - https://phabricator.wikimedia.org/T356941 (bking) Open→In progress
[16:32:47] Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Migrate cloudelastic from public to private IPs - https://phabricator.wikimedia.org/T355617 (bking)
[16:38:28] joal: you have about 40GB of data in your user HDFS directory in the test-analytics cluster. Is there something you can clean up? Thanks!
[16:39:20] also, /tmp/ebysans is about 50GB. I'm not sure how useful it is
[16:52:13] oops, never mind. The root device is full, not the HDFS device. Forget I said anything
[16:58:54] the /tmp folder was full of .jar files that had accumulated. I deleted everything that hadn't been accessed in > 1 day. / is now back at 23% usage
[17:01:36] (DiskSpace) resolved: Disk space an-test-worker1001:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[17:06:11] Data-Engineering, Data-Platform-SRE, Epic: Migrate the Analytics Superset instances to our DSE Kubernetes cluster - https://phabricator.wikimedia.org/T347710 (brouberol)
[17:13:29] (SystemdUnitFailed) firing: (19) refinery-sqoop-whole-mediawiki.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:16:47] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (xcollazo) >The sequence field now is determined as timestamp + request counter. Even if HAProxy restarts, the timestamp se...
[17:26:49] (PuppetFailure) resolved: Puppet has failed on an-test-worker1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:26:56] Data-Engineering, CX-cxserver, Citoid, Content-Transform-Team-WIP, and 10 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118 (CodeReviewBot) jforrester closed https://gitlab.wikimedia.org/repos/abstract-wiki/wikifunctions/function-orchestrator/...
[17:27:06] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (Fabfur) @xcollazo Unfortunately the **timestamp** is referred to the request timestamp. I can check if I can somehow use t...
[17:28:29] (SystemdUnitFailed) firing: (19) refinery-sqoop-whole-mediawiki.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:38:28] Data-Platform-SRE: Remove nickifeajika from analytics-privatedata-users - https://phabricator.wikimedia.org/T353665 (lbowmaker)
[17:42:36] Data-Platform-SRE (2024.01.22 - 2024.02.11): DataHub is throwing errors on search - https://phabricator.wikimedia.org/T356783 (BTullis)
[17:43:04] Data-Platform-SRE (2024.01.22 - 2024.02.11): DataHub is throwing errors on search - https://phabricator.wikimedia.org/T356783 (BTullis) Open→Resolved
[17:46:00] Data-Platform-SRE, Wikidata, Wikidata-Query-Service, Patch-For-Review: Allow federated queries with the MiMoTextBase SPARQL endpoint - https://phabricator.wikimedia.org/T351488 (HinMar) > @HinMar Thanks, that context is helpful. I'm merging the patch now; if you could provide an example query to...
[17:49:29] Data-Platform-SRE, SRE, SRE-Access-Requests: Production data & systems access restoration for Connie Chen - https://phabricator.wikimedia.org/T356645 (cchen) @Jelto I just reset wikitech account. Superset, hue and Jupyterhub access all work now. thank you!
[17:51:45] Data-Platform-SRE, SRE, SRE-Access-Requests: Production data & systems access restoration for Connie Chen - https://phabricator.wikimedia.org/T356645 (Dzahn)
[17:52:00] Data-Platform-SRE, SRE, SRE-Access-Requests: Production data & systems access restoration for Connie Chen - https://phabricator.wikimedia.org/T356645 (Dzahn) a:cchen→None
[17:53:27] Data-Platform-SRE: Fix/monitor Elastic S3 repository status - https://phabricator.wikimedia.org/T357018 (bking) I've deleted the elastic snapshot repo config from all eqiad and codfw clusters. Example command: `curl -XDELETE "https://search.svc.eqiad.wmnet:9243/_snapshot/elastic_snaps"` Next steps are to d...
[17:53:40] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (Fabfur) I've adapted the Benthos configuration to produce an output similar to the current (webrequest) data: ` { "acc...
[17:55:46] Data-Engineering, Data Products: Adapt Sqoop to pagelinks schema change - https://phabricator.wikimedia.org/T345771 (lbowmaker)
[17:56:29] Data-Engineering, Data Products: Modify ClickStreamBuilder pipeline to cope with pagelinks schema changes - https://phabricator.wikimedia.org/T355588 (lbowmaker)
[17:58:36] Data-Platform-SRE, SRE, SRE-Access-Requests: Production data & systems access restoration for Connie Chen - https://phabricator.wikimedia.org/T356645 (Dzahn) p:High→Medium
[21:28:43] (SystemdUnitFailed) firing: (17) refinery-sqoop-whole-mediawiki.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed