[02:12:05] (KafkaReplicationFactorTooLow) firing: ...
[02:12:11] Kafka topic eqiad.cpjobqueue.retry.mediawiki.job.phonosIPAFilePersist replication factor is too low on main-eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Increase_a_topic's_replication_factor - https://grafana.wikimedia.org/d/000000234/kafka-by-topic?var-kafka_cluster=main-eqiad&var-kafka_broker=All&var-topic=eqiad.cpjobqueue.retry.mediawiki.job.phonosIPAFilePersist&viewPanel=40 - ...
[02:12:11] https://alerts.wikimedia.org/?q=alertname%3DKafkaReplicationFactorTooLow
[02:17:05] (KafkaReplicationFactorTooLow) resolved: ...
[02:17:05] Kafka topic eqiad.cpjobqueue.retry.mediawiki.job.phonosIPAFilePersist replication factor is too low on main-eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Increase_a_topic's_replication_factor - https://grafana.wikimedia.org/d/000000234/kafka-by-topic?var-kafka_cluster=main-eqiad&var-kafka_broker=All&var-topic=eqiad.cpjobqueue.retry.mediawiki.job.phonosIPAFilePersist&viewPanel=40 - ...
[02:17:05] https://alerts.wikimedia.org/?q=alertname%3DKafkaReplicationFactorTooLow
[10:53:15] (PS1) Kosta Harlan: Add ip_reputation/score schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1011281 (https://phabricator.wikimedia.org/T354597)
[13:17:20] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032#9633851 (rook) @SD0001 @Audiodude could yinz take a look at quarry-test.wmcloud.org and see if there are any obvious problems? The data is all a duplicate of the production quarry, though now everything is running in k8s (ex...
[14:21:12] (PS4) Kosta Harlan: Add ip_reputation/score schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1011281 (https://phabricator.wikimedia.org/T354597)
[14:25:23] (PS5) Kosta Harlan: Add ip_reputation/score schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1011281 (https://phabricator.wikimedia.org/T354597)
[16:02:35] (CR) Ottomata: Add ip_reputation/score schema (3 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1011281 (https://phabricator.wikimedia.org/T354597) (owner: Kosta Harlan)
[16:15:37] Data-Engineering, Metrics Platform Backlog, Event-Platform: Document instructions for deleting an event stream and its usages - https://phabricator.wikimedia.org/T360210 (Ottomata) NEW
[17:11:15] (PS6) Kosta Harlan: Add ip_reputation/score schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1011281 (https://phabricator.wikimedia.org/T354597)
[17:11:24] (CR) Kosta Harlan: Add ip_reputation/score schema (3 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1011281 (https://phabricator.wikimedia.org/T354597) (owner: Kosta Harlan)
[17:55:22] Quarry: Scrape prometheus metrics from Quarry - https://phabricator.wikimedia.org/T360220#9634600 (taavi) They are supposed to be available on https://prometheus.wmcloud.org ([[ https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Monitoring | docs ]]) but seems like my code to merge the results of the...
[17:59:13] Data-Engineering, Data-Platform-SRE: spark3 in yarn master mode exhibits warnings when the HDFS namenodes are in the failed over state - https://phabricator.wikimedia.org/T338137#9634622 (BTullis) a:BTullis→None
[18:00:11] Data-Engineering, Data-Platform-SRE, Epic: Alluxio for Improved Superset Query Performance - https://phabricator.wikimedia.org/T288252#9634628 (BTullis) a:BTullis→None
[18:00:30] Data-Engineering (Sprint 9), Data-Platform-SRE, Patch-For-Review: [Data Platform] Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641#9634629 (BTullis) a:BTullis→None
[18:53:14] Data-Engineering, Data-Platform-SRE, Discovery-Search, Java-Scala-Standardization, and 3 others: Adapt gitlab pipelines for the new wmf-jvm-parent-pom - https://phabricator.wikimedia.org/T358841#9634740 (RKemper)
[20:44:54] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032#9634942 (SD0001) Had a brief look and it looks good to me. Thanks! I'm no longer able to invoke `kubectl` from quarry-bastion, though. It says `Unable to connect to the server: dial tcp 172.16.4.237:6443: connect: no route...
[20:55:44] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032#9634948 (rook) Oh there's a new k8s cluster for it. I've put the updated config in /opt/quarry-123-2.config
[20:57:46] Quarry: store quarry state in object storage - https://phabricator.wikimedia.org/T360233 (rook) NEW
[20:57:59] Quarry: store quarry state in object storage - https://phabricator.wikimedia.org/T360233#9634995 (rook)
[20:58:01] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032#9634994 (rook)
[20:59:18] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032#9634997 (rook) I've opened T360233 to manage the tofu state in an object store.
That way things shouldn't be stored locally, and tofu/deploy.sh can be run from wherever in the bastion.
[21:58:33] (PS1) Joal: Update ProduceCanaryEvents job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1011354 (https://phabricator.wikimedia.org/T341229)
[21:59:58] Data-Engineering (Sprint 9), Patch-For-Review: [Dataset Config Store] Deploy poc to dse-k8s - https://phabricator.wikimedia.org/T357434#9635088 (CodeReviewBot) tchin opened https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/64 Add dataset config store project to trusted run...
[22:02:58] Data-Engineering, Data-Platform-SRE: Cleanup superset related resources from puppet - https://phabricator.wikimedia.org/T358570#9635115 (brouberol) a:brouberol
[22:03:07] Data-Engineering, Data-Platform-SRE, Epic, Patch-For-Review: Remove all resources associated with the superset-(next-)k8s.wimedia.org domains - https://phabricator.wikimedia.org/T358480#9635116 (brouberol) a:brouberol
[22:04:31] Starting build #31 for job wikimedia-event-utilities-maven-release-docker
[22:04:37] (CR) CI reject: [V:-1] Update ProduceCanaryEvents job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1011354 (https://phabricator.wikimedia.org/T341229) (owner: Joal)
[22:09:02] Project wikimedia-event-utilities-maven-release-docker build #31: SUCCESS in 4 min 30 sec: https://integration.wikimedia.org/ci/job/wikimedia-event-utilities-maven-release-docker/31/
[22:12:51] (CR) Joal: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1011354 (https://phabricator.wikimedia.org/T341229) (owner: Joal)
[22:29:41] (CR) Gmodena: [C:+1] "Code changes LGTM, we went over it in pair programming."
[analytics/refinery/source] - https://gerrit.wikimedia.org/r/1011354 (https://phabricator.wikimedia.org/T341229) (owner: Joal)
[22:45:15] (PS2) Joal: Update ProduceCanaryEvents job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1011354 (https://phabricator.wikimedia.org/T341229)