[00:16:11] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:19:58] (SystemdUnitFailed) firing: monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:20:35] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_eventlogging_analytics.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:16:33] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 3.733% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[04:19:58] (SystemdUnitFailed) firing: monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:16:33] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 3.473% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[08:19:59] (SystemdUnitFailed) firing: monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:22:43] Hi folks morning!
[08:23:00] I am going to upgrade istio on DSE, lemme know if you are ok or not
[08:23:12] (buster to bullseye, no upstream version change)
[08:32:45] Data-Platform-SRE, serviceops, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (pfischer) @brouberol, if you have time, could you configure `codfw.mediawiki.cirrussearch.page_rerender.v1` and `eqiad.mediaw...
[08:39:15] (proceeding)
[08:42:10] all good!
[08:55:07] Data-Engineering (Sprint 6), Patch-For-Review: [Data Quality] Implement Simple Monitoring Dashboard for Airflow Jobs - https://phabricator.wikimedia.org/T349532 (Antoine_Quhen) [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/979118 | In this puppet patch, we are adding configuration to send more...
[09:03:59] Data-Platform-SRE, serviceops, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (elukey) >>! In T351503#9377865, @pfischer wrote: > @brouberol, if you have time, could you configure `codfw.mediawiki.cirruss...
[09:05:52] btullis: seeing as we don't have any ingress DNS record for an ingress gateway for the dse k8s cluster, I suppose we don't have such an ingress gateway currently running, right?
[09:07:24] although I'm seeing https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/master/custom_deploy.d/istio/dse-k8s/config.yaml#93, so it might be a matter of enabling it?
[09:08:49] ah, I actually see istio-ingressgateway pods running in the istio-system NS on dse, so I think we're just lacking the DNS record
[09:08:50] brouberol: yes, I think it's there, but we haven't ever used it yet so there is no DNS record set up for it.
[09:08:58] snap
[09:13:52] nice thanks 👍 I found the IP exposed by the istio-ingressgateway service, so I'll add that to our DNS
[09:16:34] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 3.336% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[09:17:54] Data-Engineering (Sprint 6), Data-Platform-SRE: Configure ingress to the spark history servers - https://phabricator.wikimedia.org/T352639 (brouberol)
[09:20:55] Data-Engineering (Sprint 6), Data-Platform-SRE: [Data Platform] Deploy Spark History Service - https://phabricator.wikimedia.org/T330176 (brouberol)
[09:57:39] Data-Engineering, Epic: Upgrade analytics-hadoop to Spark 3 + scala 2.12 - https://phabricator.wikimedia.org/T291464 (BTullis) Can this task be closed now?
[10:01:01] !log Marked TaskInstance: projectview_geo.move_data_to_archive scheduled__2023-12-02T04:00:00 as succeeded in airflow analytics.
[10:01:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:32:52] Data-Platform-SRE, serviceops, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (pfischer) @elukey, for page re-render, we're definitely interested only in the latest event, since we only care for the fact...
[10:45:32] Data-Platform-SRE, serviceops, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (elukey) >>! In T351503#9378213, @pfischer wrote: > @elukey, for page re-render, we're definitely interested only in the lates...
[10:47:42] Data-Engineering (Sprint 6), Data-Platform-SRE, Patch-For-Review: Configure ingress to the spark history servers - https://phabricator.wikimedia.org/T352639 (BTullis) We have configured the following two IP addresses in netbox, for the ingress gateway service on dse-k8s * `10.2.2.91/32` k8s-ingress-...
[10:58:12] Data-Engineering, Epic: Upgrade analytics-hadoop to Spark 3 + scala 2.12 - https://phabricator.wikimedia.org/T291464 (JAllemandou) I think it can be closed, yes - the remaining tasks are almost all about Refine improvements we can now implement thanks to the move to Spark3.
[11:03:43] !log re-ran refine_eventlogging_analytics for MobileWikiAppiOSSessions
[11:03:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:05:37] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:05:54] Data-Platform-SRE, serviceops, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (pfischer) > We currently don't have any git-ops-like way to apply specific settings to topics Okay, I get your concern. Is t...
[11:09:44] (SystemdUnitFailed) resolved: monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:09:49] Data-Platform-SRE, serviceops, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (elukey) >>! In T351503#9378361, @pfischer wrote: >> We currently don't have any git-ops-like way to apply specific settings t...
[11:14:39] Data-Platform-SRE: Check home/HDFS leftovers of aranyap - https://phabricator.wikimedia.org/T340945 (MoritzMuehlenhoff) >>! In T340945#9376049, @Jcross wrote: > Hi, sorry for the delay on this. Aranya does require production shell access and we'd like to keep her in the analytics-privatedata-users group i...
[11:17:16] joal: I have seen some odd looking flags from a refine job, whilst investigating an ops week task. Have you seen this before?
[11:17:23] https://www.irccloud.com/pastebin/YNmbCkOK/
[11:21:44] btullis: I have seen this before, weird states sometimes happen - I'm in a meeting, will ping you after
[11:51:19] Data-Engineering (Sprint 6), Data-Platform-SRE, Patch-For-Review: Configure ingress to the spark history servers - https://phabricator.wikimedia.org/T352639 (BTullis) I think that you will also need to add `k8s-ingress-dse: {}` to the `profile::lvs::realserver::pools` hash in `hieradata/role/common/d...
[12:14:00] !log pool druid1010 T336043
[12:14:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:14:03] T336043: Decommission druid100[4-6] - https://phabricator.wikimedia.org/T336043
[12:24:10] Data-Platform-SRE: Check home/HDFS leftovers of andyrussg - https://phabricator.wikimedia.org/T338234 (BTullis) Intermediate tarballs are at: ` stat1005:/home/andyrussg/stat1005-andyrussg-hql.tar.gz stat1005:/home/andyrussg/stat1005-andyrussg-ipynb.tar.gz stat1007:/home/andyrussg/stat1007-andyrussg-hql.tar.g...
[13:16:34] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 3.209% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[13:41:44] !log starting a rolling restart of the daemons on the analytics druid cluster, to make sure that they restart cleanly after the puppet 7 upgrade
[13:41:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:43:30] Data-Platform-SRE, serviceops, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (pfischer) > Not right now, but if needed we'll surely be able to create something. It's not blocking us right now, so we cou...
[13:47:17] Data-Platform-SRE: Check home/HDFS leftovers of ntsako - https://phabricator.wikimedia.org/T343189 (BTullis) a:BTullis
[13:47:35] Data-Platform-SRE, serviceops, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (pfischer)
[13:50:56] Data-Platform-SRE: Check home/HDFS leftovers of aranyap - https://phabricator.wikimedia.org/T340945 (BTullis) Open→Resolved @JCross has confirmed that the files may be deleted, so I'll do that now. ` btullis@cumin1001:~$ sudo cumin 'C:profile::analytics::cluster::client or C:profile::hadoop::master o...
[13:53:30] !log bringing an-coord1003 into service as an `analytics_cluster::coordinator` for T336045
[13:53:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:53:32] T336045: Bring an-coord100[3-4] into service - https://phabricator.wikimedia.org/T336045
[14:05:27] Data-Engineering, Observability-Logging, Traffic: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (Fabfur) Hi @Milimetric sorry for the late reply, I'll try to answer your question but consider we're still investigating all pros and cons of this "migrati...
[14:15:42] (SystemdUnitFailed) firing: (2) hive-metastore.service Failed on an-coord1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:16:50] Data-Engineering: [Iceberg Migration] Migrate aqs hourly tables to Iceberg - https://phabricator.wikimedia.org/T352669 (lbowmaker)
[14:18:38] Data-Engineering: [Iceberg Migration] Migrate browser_general tables to Iceberg - https://phabricator.wikimedia.org/T352670 (lbowmaker)
[14:20:09] Data-Engineering: [Iceberg Migration] Migrate interlanguage tables to Iceberg - https://phabricator.wikimedia.org/T352671 (lbowmaker)
[14:20:42] (SystemdUnitFailed) resolved: (2) hive-metastore.service Failed on an-coord1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:21:38] Data-Engineering: [Iceberg Migration] Migrate session length tables to Iceberg - https://phabricator.wikimedia.org/T352672 (lbowmaker)
[14:29:27] btullis: Heya - please excuse me I got completely sidetracked
[14:30:17] btullis: the weird state can happen if the refine job fails in non-usual ways, and then we end up with both _REFINED and _REFINE_FAILED flags
[14:30:49] In that case, I delete the _REFINED one, and rerun the job with a --ignore-failure parameter
[14:32:49] (CR) Mforns: [V: +2 C: +2] Add Commons Impact Metrics code drafts for later [analytics/refinery] - https://gerrit.wikimedia.org/r/979341 (https://phabricator.wikimedia.org/T351836) (owner: Mforns)
[14:38:27] joal: Thanks. I will do that.
[14:38:55] !log cleared some space on an-test-worker1002 by running: `sudo find /tmp -type f -mtime +30 -delete`
[14:38:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:41:16] (DiskSpace) resolved: Disk space an-test-worker1002:9100:/ 3.254% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[14:42:18] !log restarted archiva service on archiva1002
[14:42:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:47:35] !log re-running refine_event for mediawiki_cirrussearch_request failure
[14:47:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:53:43] Analytics, Data-Engineering (Sprint 6), Event-Platform, Patch-For-Review, User-notice: [Event Platform] Enable canary events for all MediaWiki streams - https://phabricator.wikimedia.org/T266798 (JArguello-WMF)
[14:58:40] Data-Platform-SRE: Downloading from Archiva.wikimedia.org is slower than Maven Central - https://phabricator.wikimedia.org/T273086 (xcollazo) (Just passing by to +1 this issue since it hit me quite badly recently while trying to build a maven based project on a stat machine)
[15:18:24] btullis: your ops-week email answers are awesome <3
[15:39:54] Data-Engineering (Sprint 6): [Iceberg Migration] Migrate session length tables to Iceberg - https://phabricator.wikimedia.org/T352672 (lbowmaker)
[15:39:56] Data-Engineering (Sprint 6): [Iceberg Migration] Migrate interlanguage tables to Iceberg - https://phabricator.wikimedia.org/T352671 (lbowmaker)
[15:39:58] Data-Engineering (Sprint 6): [Iceberg Migration] Migrate aqs hourly tables to Iceberg - https://phabricator.wikimedia.org/T352669 (lbowmaker)
[15:39:59] (PuppetFailure) firing: Puppet has failed on an-coord1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[15:40:11] Data-Engineering (Sprint 6): [Iceberg Migration] Migrate pageview tables to Iceberg - https://phabricator.wikimedia.org/T347690 (lbowmaker)
[15:40:13] Data-Engineering (Sprint 6): [Iceberg Migration] Migrate browser_general tables to Iceberg - https://phabricator.wikimedia.org/T352670 (lbowmaker)
[15:50:49] Data-Engineering, Data Products: Airflow unittests failing with TypeError: Pool.create_or_update_pool() - https://phabricator.wikimedia.org/T352577 (xcollazo) Briefly looked into this, some notes: * Generating a conda env via `conda env create --name airflow-dags-2_7_3 -f conda-environment.yml` on maco...
[15:51:31] Data-Engineering, Data Products: [blocker] Airflow unittests failing with TypeError: Pool.create_or_update_pool() - https://phabricator.wikimedia.org/T352577 (xcollazo) p:Triage→High
[15:54:17] Data-Engineering (Sprint 6): [Data Quality] Finalize Data Quality Metrics Schema - https://phabricator.wikimedia.org/T352683 (lbowmaker)
[15:58:00] Data-Engineering, Data Products: [blocker] Airflow unittests failing with TypeError: Pool.create_or_update_pool() - https://phabricator.wikimedia.org/T352577 (xcollazo) Example CI pipeline with no changes other than to the `README.md` that fails: https://gitlab.wikimedia.org/repos/data-engineering/airflo...
[16:01:08] Data-Engineering (Sprint 6): [Data Quality] Metrics Alerting - https://phabricator.wikimedia.org/T352685 (lbowmaker)
[16:01:39] Data-Engineering, Data-Platform-SRE, Data Products: [blocker] Airflow unittests failing with TypeError: Pool.create_or_update_pool() - https://phabricator.wikimedia.org/T352577 (JAllemandou)
[16:02:18] Data-Engineering, Data-Platform-SRE, Data Products: [blocker] Airflow unittests failing with TypeError: Pool.create_or_update_pool() - https://phabricator.wikimedia.org/T352577 (JAllemandou) Just added data-platform-SRE project to the list of projects. Ping @BTullis on this as well.
[16:04:21] Data-Engineering (Sprint 6): [Data Quality] Adopt iceberg as the data quality metrics table backend - https://phabricator.wikimedia.org/T352687 (lbowmaker)
[16:06:38] Data-Engineering (Sprint 6): [Data Quality] Move MetricsExporter to refinery-spark - https://phabricator.wikimedia.org/T352688 (lbowmaker)
[16:19:52] Data-Platform-SRE, Discovery-Search (Current work): Test backfilling for cirrus-streaming-updater - https://phabricator.wikimedia.org/T350826 (Gehel) a:bking
[16:22:00] Data-Platform-SRE, Discovery-Search (Current work): Test backfilling for cirrus-streaming-updater - https://phabricator.wikimedia.org/T350826 (Gehel) Once we have a working deployment on Cloudelastic (T352335), we can just re-run a backfill operation there.
[16:33:06] Data-Platform-SRE, Discovery-Search (Current work): Investigate performance differences between wdqs2022 and older hosts - https://phabricator.wikimedia.org/T336443 (Gehel)
[16:33:44] Data-Engineering, Data-Platform-SRE, Data Products: [blocker] Airflow unittests failing with TypeError: Pool.create_or_update_pool() - https://phabricator.wikimedia.org/T352577 (BTullis) The latest pipeline that you linked to has a different error: ` /opt/miniconda/etc/profile.d/conda.sh: line 5: 37...
[16:34:10] Data-Platform-SRE, Discovery-Search (Current work): Investigate performance differences between wdqs2022 and older hosts - https://phabricator.wikimedia.org/T336443 (Gehel)
[18:42:19] Analytics, Data-Engineering-Icebox, Tool-Pageviews: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (mforns) @Dominicbm, hi! We Data Products team are reviewing this task now to see what we can do. We realized that there might be some overlap betw...
[19:11:00] Quarry: Allow search within SQL - https://phabricator.wikimedia.org/T352212 (Aklapper)
[19:15:04] Data-Platform-SRE, Discovery-Search (Current work): Investigate performance differences between wdqs2022 and older hosts - https://phabricator.wikimedia.org/T336443 (bking) Created an [[ https://etherpad.wikimedia.org/p/wdqs-T336443 | Etherpad ]] for brainstorming/test results/etc.
[19:18:23] Data-Platform-SRE: Test hardware-based performance optimizations for WDQS import - https://phabricator.wikimedia.org/T351662 (bking) @dr0ptp4kt mentioned that it's possible to limit the scope of a data reload. Specifically, it's possible to execute this against a single munged file instead of completing a wh...
[19:40:00] (PuppetFailure) firing: Puppet has failed on an-coord1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[19:52:28] Data-Platform-SRE, serviceops, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (EBernhardson) Current plan for gradual deploy is to start with a selection of wikis that add up to ~25% of the total rate. If...
[19:56:49] Data-Platform-SRE, sre-alert-triage: Alert in need of triage: SmartNotHealthy (instance an-worker1086:9100) - https://phabricator.wikimedia.org/T352168 (Jclark-ctr)
[20:23:34] (PS2) Xcollazo: Fix recursion for Maps with Structs on SanitizeTransformation [analytics/refinery/source] - https://gerrit.wikimedia.org/r/979406 (https://phabricator.wikimedia.org/T349121)
[20:24:39] (CR) Xcollazo: "Ready for reviews. Let's get a bunch of eyes on this as this is my first refine contribution." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/979406 (https://phabricator.wikimedia.org/T349121) (owner: Xcollazo)
[20:27:32] Starting build #30 for job wikimedia-event-utilities-maven-release-docker
[20:30:57] Project wikimedia-event-utilities-maven-release-docker build #30: SUCCESS in 3 min 25 sec: https://integration.wikimedia.org/ci/job/wikimedia-event-utilities-maven-release-docker/30/
[20:45:42] Data-Engineering, Release-Engineering-Team, GitLab (CI & Job Runners): Unblock Dockerfile syntax to build images with Gitlab trusted runner - https://phabricator.wikimedia.org/T351792 (CodeReviewBot) aqu updated https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/553 Fi...
[20:56:31] Data-Engineering, Release-Engineering-Team, GitLab (CI & Job Runners): Unblock Dockerfile syntax to build images with Gitlab trusted runner - https://phabricator.wikimedia.org/T351792 (CodeReviewBot) milimetric merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/5...
[21:30:41] Data-Engineering (Sprint 6), Event-Platform: [Event Platform] mediawiki.page_content_change.v1 topic should be partitioned. - https://phabricator.wikimedia.org/T345806 (gmodena) Pick this up now that mediawiki-event-utilities 1.3.3 has been released. I'll start by version bumping deps in the python wrap...
[21:43:37] Data-Platform-SRE: Test hardware-based performance optimizations for WDQS import - https://phabricator.wikimedia.org/T351662 (bking) @Addshore has [[ https://addshore.com/2021/02/testing-wdqs-blazegraph-data-load-performance/ | completed an extensive battery of tests related to data reloading ]]. Any benchma...
[22:28:04] (CR) Kimberly Sarabia: [C: +2] "This looks great. We have tested this locally and events firing normally. Thanks so much." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/978718 (https://phabricator.wikimedia.org/T351298) (owner: Clare Ming)
[22:28:43] (Merged) jenkins-bot: Add custom schema for *uiactionstracking instruments [schemas/event/secondary] - https://gerrit.wikimedia.org/r/978718 (https://phabricator.wikimedia.org/T351298) (owner: Clare Ming)
[23:00:16] Data-Platform-SRE: Test hardware-based performance optimizations for WDQS import - https://phabricator.wikimedia.org/T351662 (bking) ^^ last comment was meant for T336443 , apologies for repeating the earlier comment.
[23:00:30] Data-Platform-SRE, Discovery-Search (Current work): Investigate performance differences between wdqs2022 and older hosts - https://phabricator.wikimedia.org/T336443 (bking) @Addshore has [[ https://addshore.com/2021/02/testing-wdqs-blazegraph-data-load-performance/ | completed an extensive battery of tes...
[23:40:14] (PuppetFailure) firing: Puppet has failed on an-coord1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure