[00:22:42] (SystemdUnitFailed) firing: monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:24:28] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_eventlogging_analytics.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:06:52] PROBLEM - Check systemd state on an-worker1085 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:07:42] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:08:16] RECOVERY - Check systemd state on an-worker1085 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:12:42] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:00:24] Quarry: [bug] "Internal Server Error" when logging into Quarry - https://phabricator.wikimedia.org/T333043 (Novem_Linguae)
[03:06:53] Quarry: On first visit to Quarry in that browser session, error 500 (intermittent) - https://phabricator.wikimedia.org/T345685 (Novem_Linguae)
[03:07:04] Quarry: On first visit to Quarry in that browser session, error 500 (intermittent) - https://phabricator.wikimedia.org/T345685
(Novem_Linguae)
[03:08:03] Quarry: [bug] "Internal Server Error" when logging into Quarry - https://phabricator.wikimedia.org/T333043 (Novem_Linguae) Thanks. I filed {T345685}
[05:12:42] (SystemdUnitFailed) firing: monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:06:07] (CR) Phuedx: [C: +1] Add Metrics Platform fragments by platform only [schemas/event/secondary] - https://gerrit.wikimedia.org/r/951191 (https://phabricator.wikimedia.org/T343557) (owner: Clare Ming)
[07:08:52] (PS13) Phuedx: Add analytics/metrics_platform/{app,web}/{click,view} schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/952252 (https://phabricator.wikimedia.org/T344833)
[07:09:27] (CR) CI reject: [V: -1] Add analytics/metrics_platform/{app,web}/{click,view} schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/952252 (https://phabricator.wikimedia.org/T344833) (owner: Phuedx)
[08:00:49] (CR) Elukey: [C: +1] Increase the max kafka message size for gobblin [analytics/refinery] - https://gerrit.wikimedia.org/r/954968 (https://phabricator.wikimedia.org/T307959) (owner: Btullis)
[08:02:04] (CR) Elukey: [C: +1] "I'd be very curious to know what value Gobblin is using now, the kafka defaults are around 1MB afaics, it should already have failed for m" [analytics/refinery] - https://gerrit.wikimedia.org/r/954968 (https://phabricator.wikimedia.org/T307959) (owner: Btullis)
[08:32:09] Data-Platform-SRE, Discovery-Search: Find/fix logstash logging for rdf-streaming-updater - https://phabricator.wikimedia.org/T345668 (Gehel) p:Triage→High
[08:33:42] Data-Platform-SRE: Rolling operation cookbook: Detect and remove failed index aliases - https://phabricator.wikimedia.org/T345449 (Gehel)
p:Triage→Medium
[08:34:25] (PS14) Phuedx: Add analytics/metrics_platform/{app,web}/{click,view} schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/952252 (https://phabricator.wikimedia.org/T344833)
[08:34:58] Data-Engineering, Data-Platform-SRE, Product-Analytics: Conda analytics environments breakage - conflicting dependencies between r-base and other - https://phabricator.wikimedia.org/T343823 (Gehel)
[08:37:31] Data-Engineering, Data-Platform-SRE, Product-Analytics: Conda analytics environments breakage - conflicting dependencies between r-base and other - https://phabricator.wikimedia.org/T343823 (Gehel) One related solution: T321512
[08:43:35] Data-Platform-SRE: [Epic] define a strategy around alerting for Data Platform SRE and implement it - https://phabricator.wikimedia.org/T345698 (Gehel)
[08:43:42] Data-Platform-SRE, Epic: [Epic] define a strategy around alerting for Data Platform SRE and implement it - https://phabricator.wikimedia.org/T345698 (Gehel)
[08:45:45] Data-Platform-SRE, Epic: [Epic] define a strategy around alerting for Data Platform SRE and implement it - https://phabricator.wikimedia.org/T345698 (Gehel)
[08:45:47] Data-Engineering, Data-Platform-SRE, Observability-Alerting: Explore the use of Airflow notifiers for more flexible DAG failure handling - https://phabricator.wikimedia.org/T343234 (Gehel)
[08:45:54] Data-Platform-SRE, Epic: [Epic] define a strategy around alerting for Data Platform SRE and implement it - https://phabricator.wikimedia.org/T345698 (Gehel)
[08:45:56] Data-Engineering, Data-Platform-SRE, Observability-Metrics: Configure Airflow to send metrics to Prometheus - https://phabricator.wikimedia.org/T343232 (Gehel)
[08:50:15] Data-Engineering: Need data insight on the Hindi Wikipedia and Wikisource Edit-a-thon - https://phabricator.wikimedia.org/T345655 (Aklapper)
[09:02:18] Data-Engineering, Data-Platform-SRE, SRE: Grant IdempotentWrite
Kafka Cluster ACL to User:ANONYMOUS in all Kafka clusters - https://phabricator.wikimedia.org/T334733 (BTullis) Should we resolve this ticket now? The change has been applied to all clusters other than kafka-logging, and it's been noted...
[09:12:57] (SystemdUnitFailed) firing: monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:45:22] Data-Platform-SRE, Product-Analytics: Allow connections to presto UI port - https://phabricator.wikimedia.org/T331455 (Gehel) I'm uncomfortable adding more dependencies on SSH tunnels, and exposing a full admin interface just to get visibility on workloads. I'm declining this for now. If there is a stron...
[09:45:31] Data-Platform-SRE, Product-Analytics: Allow connections to presto UI port - https://phabricator.wikimedia.org/T331455 (Gehel) Open→Declined
[10:08:42] (CR) Btullis: Increase the max kafka message size for gobblin (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/954968 (https://phabricator.wikimedia.org/T307959) (owner: Btullis)
[10:24:56] (CR) Elukey: [C: +1] Increase the max kafka message size for gobblin (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/954968 (https://phabricator.wikimedia.org/T307959) (owner: Btullis)
[10:44:32] Data-Platform-SRE, Data-Catalog: Errors from datahub relating to the search indices - https://phabricator.wikimedia.org/T345616 (BTullis) Open→Resolved No conclusive answer to the investigation and no recurrence, but I will be on the lookout for any further errors of this type.
[12:01:25] Data-Platform-SRE, SRE, SRE-Access-Requests: Requesting Creation of a new POSIX group and system user for the Analytics WMDE team.
- https://phabricator.wikimedia.org/T345726 (Stevemunene)
[12:01:53] Data-Platform-SRE, SRE, SRE-Access-Requests: Requesting Creation of a new POSIX group and system user for the Analytics WMDE team. - https://phabricator.wikimedia.org/T345726 (Stevemunene)
[12:01:59] Data-Platform-SRE, Patch-For-Review: [Airflow] Setup Airflow instance for WMDE - https://phabricator.wikimedia.org/T340648 (Stevemunene)
[12:07:06] Data-Platform-SRE, Patch-For-Review: [Airflow] Setup Airflow instance for WMDE - https://phabricator.wikimedia.org/T340648 (Stevemunene) Thank you for your response @Manuel , we shall be moving forward with `analytics-wmde` user, I have sent out the access request for this. Corresponding patches to follow.
[12:26:48] (CR) Gmodena: "Changes LGTM. Makes sense to avoid a version bump if the schema was never used." [schemas/event/primary] - https://gerrit.wikimedia.org/r/951829 (https://phabricator.wikimedia.org/T325315) (owner: Peter Fischer)
[12:50:16] (CR) Gmodena: [C: +1] Skip schema-deterministic-types for metrics_event schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/954097 (https://phabricator.wikimedia.org/T344511) (owner: TChin)
[12:57:13] brouberol: Hey! Could you add your contact infos to https://office.wikimedia.org/wiki/Contact_list#Search_Platform_and_Data_Platform_SRE ? If you need help understanding how that template works, give us a shout here!
[13:02:02] (CR) Mforns: [C: +1] "LGTM!!" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/952252 (https://phabricator.wikimedia.org/T344833) (owner: Phuedx)
[13:09:02] gehel: for sure!
will do
[13:09:52] (CR) Gmodena: WIP: Create a job to dump XML/SQL MW history files to HDFS (5 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/938941 (https://phabricator.wikimedia.org/T335862) (owner: Aqu)
[13:12:04] (CR) Mforns: [C: +1] Add analytics/metrics_platform/{app,web}/{click,view} schemas (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/952252 (https://phabricator.wikimedia.org/T344833) (owner: Phuedx)
[13:12:57] (SystemdUnitFailed) firing: monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:33:21] Data-Platform-SRE, Discovery-Search (Current work), Patch-For-Review: Provision Zookeeper Cluster for storing Flink HA data - https://phabricator.wikimedia.org/T341792 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host flink-zk2001.codfw.wmnet with OS bo...
[13:55:27] gehel: done
[13:55:31] Data-Engineering, Data Engineering and Event Platform Team, Wikidata, Wikidata-Query-Service, and 2 others: Five deleted Wikidata items pertaining to Wikimedia category pages still present in the Query Service - https://phabricator.wikimedia.org/T342593 (dcausse) a:dcausse Going to work on im...
[14:14:11] Data-Platform-SRE: Investigate an-presto1002 failures - https://phabricator.wikimedia.org/T344808 (BTullis) Open→Resolved We haven't seen any further issues since reimaging, so I'm going to resolve this ticket for now..
[14:15:37] (CR) Clare Ming: [C: +1] "lgtm too" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/952252 (https://phabricator.wikimedia.org/T344833) (owner: Phuedx)
[14:39:35] Data-Platform-SRE: Write a cookbook for rolling reboot/restart of datahubsearch servers - https://phabricator.wikimedia.org/T344798 (brouberol) @BTullis I see that we already have a [[ https://gerrit.wikimedia.org/r/plugins/gitiles/operations/cookbooks/+/refs/heads/master/cookbooks/sre/elasticsearch/rolling-...
[14:52:46] Data-Platform-SRE, Discovery-Search (Current work), Patch-For-Review: Provision Zookeeper Cluster for storing Flink HA data - https://phabricator.wikimedia.org/T341792 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host flink-zk2001.codfw.wmnet with OS bookwo...
[15:28:20] Data-Platform-SRE: Write a cookbook for rolling reboot/restart of datahubsearch servers - https://phabricator.wikimedia.org/T344798 (BTullis) >>! In T344798#9146488, @brouberol wrote: > @BTullis I see that we already have a [[ https://gerrit.wikimedia.org/r/plugins/gitiles/operations/cookbooks/+/refs/heads/m...
[15:29:13] (CR) Joal: "Reading this patch, a new wonder came to me regarding the setting change: Would the change to 10Mb max per partition lead to gobblin tryin" [analytics/refinery] - https://gerrit.wikimedia.org/r/954968 (https://phabricator.wikimedia.org/T307959) (owner: Btullis)
[15:35:31] (CR) Btullis: Increase the max kafka message size for gobblin (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/954968 (https://phabricator.wikimedia.org/T307959) (owner: Btullis)
[15:36:48] (CR) Gmodena: Increase the max kafka message size for gobblin (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/954968 (https://phabricator.wikimedia.org/T307959) (owner: Btullis)
[15:46:58] (PS1) Peter Fischer: npm install [schemas/event/primary] - https://gerrit.wikimedia.org/r/955357
[15:51:52] (PS4) Peter Fischer: Adapt schema to meet latest requirements. [schemas/event/primary] - https://gerrit.wikimedia.org/r/951829 (https://phabricator.wikimedia.org/T325315)
[15:53:40] (CR) Peter Fischer: Adapt schema to meet latest requirements. (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/951829 (https://phabricator.wikimedia.org/T325315) (owner: Peter Fischer)
[16:30:41] Data-Platform-SRE, Discovery-Search, Infrastructure-Foundations: Unable to provision Ganeti VMs in CODFW - https://phabricator.wikimedia.org/T345754 (bking)
[16:38:47] Data-Platform-SRE, Discovery-Search, Infrastructure-Foundations: Unable to provision Ganeti VMs in CODFW - https://phabricator.wikimedia.org/T345754 (taavi) You can log in with the [[ https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Manual_installation | install-console ]] utility if the first Pu...
[16:40:44] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-worker1132.eqiad.wmnet with OS bullseye
[16:53:03] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis) The error I see is this: {F37667677,width=50%} I go to this over IPMI with: `ipmitool -I lanplus -H "an-worker1132.mgmt.eqiad.wmnet" -U root -E sol activate` I also logged in with the `sudo install_co...
[17:08:34] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis) I have executed: ` lvcreate -L 10g -n journalnode an-worker1132-vg ` Now we can see that there is a 10 GB journalnode volume. ` # lvs LV VG Attr LSize Pool Origin Data%...
[17:12:57] (SystemdUnitFailed) firing: monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:25:38] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis) The installation looks to be proceeding as expected now. I will check the other nodes to see if any others will experience the same issue.
[17:26:29] (CR) Gmodena: [C: +1] Adapt schema to meet latest requirements. [schemas/event/primary] - https://gerrit.wikimedia.org/r/951829 (https://phabricator.wikimedia.org/T325315) (owner: Peter Fischer)
[17:30:01] Data-Platform-SRE, Discovery-Search, Infrastructure-Foundations: Unable to provision Ganeti VMs in CODFW - https://phabricator.wikimedia.org/T345754 (bking) Thanks @taavi ! I'll give it a shot. However, the reason I changed it in the first place is because the VM wouldn't provision in the first place...
[17:32:55] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis) I ran `sudo cumin A:hadoop-worker "lvs | grep journalnode"` from cumin1001 and it looks like this is the only host that is going to be affected by this issue. There are a few discrepancies in the VG na...
[17:48:53] Data-Platform-SRE: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-worker1132.eqiad.wmnet with OS bullseye completed: - an-worker1132 (**PASS**) - Removed from Puppet and Puppet...
[20:13:00] Data-Platform-SRE, Discovery-Search, Infrastructure-Foundations: Unable to provision Ganeti VMs in CODFW - https://phabricator.wikimedia.org/T345754 (taavi) And just to mention it, [[ https://puppetboard.wikimedia.org/ | puppetboard ]] can also be used to figure out why a particular VM is failing to...
[20:15:09] Data-Platform-SRE, Discovery-Search (Current work), Patch-For-Review: Provision Zookeeper Cluster for storing Flink HA data - https://phabricator.wikimedia.org/T341792 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host flink-zk2002.codfw.wmnet with OS bo...
[20:27:13] Data-Platform-SRE, Discovery-Search, Infrastructure-Foundations: Unable to provision Ganeti VMs in CODFW - https://phabricator.wikimedia.org/T345754 (bking) Thanks, I checked Puppetboard too, but my VM wasn't there. On the bright side, your Puppet suggestion seems to have worked. Thanks again for yo...
[20:55:38] Data-Platform-SRE, Patch-For-Review: Implement depool (source only) and keep-downtime options on data-transfer cookbook - https://phabricator.wikimedia.org/T340793 (RKemper)
[20:56:42] Data-Platform-SRE, Discovery-Search (Current work), Patch-For-Review: Provision Zookeeper Cluster for storing Flink HA data - https://phabricator.wikimedia.org/T341792 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host flink-zk2002.codfw.wmnet with OS bookwo...
[20:58:07] Data-Platform-SRE: Migrate WDQS and WCQS servers to Debian Bullseye - https://phabricator.wikimedia.org/T343124 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host wdqs1006.eqiad.wmnet with OS bullseye
[20:58:34] Data-Platform-SRE: Migrate WDQS and WCQS servers to Debian Bullseye - https://phabricator.wikimedia.org/T343124 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host wdqs1007.eqiad.wmnet with OS bullseye
[21:12:42] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:15:14] PROBLEM - Check systemd state on an-worker1085 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:20:54] RECOVERY - Check systemd state on an-worker1085 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:21:00] Data-Platform-SRE: Decommission wdqs100[3-5] - https://phabricator.wikimedia.org/T344198 (RKemper)
[21:22:43] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 -
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:32:40] Data-Platform-SRE, Discovery-Search (Current work), Patch-For-Review: Provision Zookeeper Cluster for storing Flink HA data - https://phabricator.wikimedia.org/T341792 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host flink-zk2003.codfw.wmnet with OS bo...
[21:34:40] (DruidSegmentsUnavailable) firing: More than 10 segments have been unavailable for mediawiki_history_reduced_2023_08 on the druid_public Druid cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid/Alerts#Druid_Segments_Unavailable - https://grafana.wikimedia.org/d/000000538/druid?refresh=1m&var-cluster=druid_public&panelId=49&fullscreen&orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DDruidSegmentsUnavailable
[21:38:12] Data-Platform-SRE: Migrate WDQS and WCQS servers to Debian Bullseye - https://phabricator.wikimedia.org/T343124 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host wdqs1006.eqiad.wmnet with OS bullseye completed: - wdqs1006 (**WARN**) - Downtimed on Icinga/Alertman...
[21:40:30] Data-Platform-SRE: Migrate WDQS and WCQS servers to Debian Bullseye - https://phabricator.wikimedia.org/T343124 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host wdqs1007.eqiad.wmnet with OS bullseye completed: - wdqs1007 (**WARN**) - Downtimed on Icinga/Alertman...
[21:54:40] (DruidSegmentsUnavailable) resolved: More than 10 segments have been unavailable for mediawiki_history_reduced_2023_08 on the druid_public Druid cluster.
- https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid/Alerts#Druid_Segments_Unavailable - https://grafana.wikimedia.org/d/000000538/druid?refresh=1m&var-cluster=druid_public&panelId=49&fullscreen&orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DDruidSegmentsUnavailable
[22:10:25] Data-Platform-SRE, Discovery-Search (Current work), Patch-For-Review: Provision Zookeeper Cluster for storing Flink HA data - https://phabricator.wikimedia.org/T341792 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host flink-zk2003.codfw.wmnet with OS bookwo...
[22:21:32] (PS1) Tsevener: Add watchlist-specific properties to ios_watchlists [schemas/event/secondary] - https://gerrit.wikimedia.org/r/955405 (https://phabricator.wikimedia.org/T334968)
[23:44:24] Data-Engineering, WMDE-Analytics-Engineering, Wikidata-Campsite, Event-Platform: Validation Error for eventlogging_WMDEBannerSizeIssue - https://phabricator.wikimedia.org/T344027 (Jdlrobson)