[09:05:52] pfischer: o/ if you have some time could you review https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1146938 and the two other patches in this same chain, I'd like to backport them this afternoon if possible
[09:06:16] dcausse: sure, one sec.
[09:06:20] thanks!
[09:42:56] errand+lunch
[09:50:09] dcausse: LGTM, just found nits
[12:09:17] pfischer: thanks, looking
[12:23:28] pfischer: fixed
[12:25:22] dcausse: LGTM, +1
[12:26:33] pfischer: for mediawiki code repos, the reviewer generally hits +2 (unless they prefer someone else to do it)
[12:27:29] don't hesitate to +2 but if you prefer Erik to have a look it's completely fine :)
[12:27:51] dcausse: Oh, now I am confused, I thought +2 leads to a merge and you wanted to wait for a backport window.
[12:29:38] pfischer: yes... that's confusing... for backports you'll see that it's targeted at a branch named like "wmf/x.y.z-wmf.p" (e.g. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1146643)
[12:29:57] these ones must only be +2ed during the backport window
[12:30:16] the only exception is mediawiki-config, which does not have branches like that
[12:30:37] where all patches must be +2ed only during the deployment window
[12:32:02] dcausse: Okay, so I’ll +2 your chain of CRs
[12:34:32] thanks!
[13:13:09] o/
[13:19:34] o/
[13:49:10] we're getting an alert for `CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-search`, anything I can do to help?
[13:49:19] looking
[13:53:25] it's in eqiad: "ElasticsearchStatusException[Unable to parse response body]; nested: ResponseException[method [POST], host [https://search.svc.eqiad.wmnet:9243], URI [/_bulk?timeout=120000ms], status line [HTTP/1.1 504 Gateway Timeout]"
[13:53:54] perhaps a host in eqiad was pooled but not yet ready to handle connections?
[13:54:30] "upstream request timeout" or slow
[13:54:46] seems to be back
[13:54:59] looks like it resolved on its own...and yeah, that might be. I'm already preparing a puppet patch to add more hosts back into conftool, but it shouldn't matter all that much
[13:56:42] It would be nice if we could tell at a glance which hosts are failing health checks but I'm not sure how to do that. I thought config-master showed it but apparently that's not the case
[14:18:30] apparently there's not an easy way to find this yet, created T394676 for when I have time
[14:18:32] T394676: Create tool that displays real-time load balancer health status per pool/node - https://phabricator.wikimedia.org/T394676
[14:31:41] inflatador: should failed hosts show up in the graph "Monitors Down" at https://grafana.wikimedia.org/d/000000421/pybal?orgId=1&from=now-24h&to=now&timezone=utc&var-datasource=000000006&var-server=$__all&var-service=search-https_9243
[14:41:32] dcausse: Nice find! I think that does the trick
[14:50:22] quick CR for starting EQIAD row C if anyone has time to look https://gerrit.wikimedia.org/r/c/operations/puppet/+/1147779
[15:00:14] firefox is being stupid so might be a bit late to mtg
[15:38:11] pfischer: it's https://gerrit.wikimedia.org/r/q/project:wikidata/query/deploy for the project using git-lfs to pull from archiva & https://gerrit.wikimedia.org/r/q/project:wikidata/query/rdf for the java project pushing these artifacts
[15:38:29] dcausse: thanks!
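For illustration, a git-lfs-based artifact pull of the kind discussed here might look roughly like the sketch below in script form. The artifact paths and repo layout are invented for the example, and the real deploy-prepare.sh (linked just below) may do this differently:

```python
#!/usr/bin/env python3
"""Sketch: fetch large deploy artifacts via git-lfs instead of archiva.

Assumes a checkout of wikidata/query/deploy where the jars/wars are
tracked as LFS pointers; the paths below are hypothetical examples.
"""
import subprocess

ARTIFACTS = [
    "lib/blazegraph-service.war",   # hypothetical artifact path
    "lib/streaming-updater.jar",    # hypothetical artifact path
]

def lfs_pull(paths, repo_dir="."):
    """Download only the listed LFS objects into the working tree."""
    # `git lfs pull --include` takes a comma-separated path filter,
    # so only the named artifacts are fetched rather than every object.
    subprocess.run(
        ["git", "lfs", "pull", "--include", ",".join(paths)],
        cwd=repo_dir,
        check=True,
    )

if __name__ == "__main__":
    lfs_pull(ARTIFACTS)
```

If the repo already tracks these files with LFS, as noted at 15:47 below, the separate archiva download step could presumably just go away.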
[15:39:24] might be easy to fix actually, the pull is on our side https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/deploy/+/refs/heads/master/deploy-prepare.sh
[15:47:31] It looks like we already use git lfs, so we probably don't have a dependency on archiva
[17:17:03] dinner
[18:00:44] Oops, forgot I have a doctor appointment at 2...lunch time!
[19:02:10] ebernhardson: 1:1?
[19:06:35] gehel: doh, sec
[19:58:16] back
[20:50:09] we got a systemd alert for `curator_actions_apifeatureusage_codfw.service on apifeatureusage1001:9100` ... probably cross-cluster settings again? Checking it out
[20:50:41] hmm, apifeatureusage shouldn't really interact with cross-cluster directly. maybe indirectly?
[20:52:34] oops, yeah, I was thinking of search-loader
[20:53:51] seems to be a different problem anyway, our version of curator doesn't work with OpenSearch
[20:53:52] `ERROR Elasticsearch version 1.3.20 incompatible with this version of Curator (5`
[20:54:43] I believe Observability has their own version of Curator that works. We also have a ticket somewhere about using the built-in features of OS for this instead. Checking...
[20:56:19] ahh, yea i suppose that makes sense
[20:56:37] T386525 is for using the built-in features
[20:56:41] T386525: Replace curator with opensearch index state management - https://phabricator.wikimedia.org/T386525
[21:07:33] Codesearch is down for me, but maybe that O11y fork is in apt-browser somewhere
[21:09:18] https://apt-browser.toolforge.org/bullseye-wikimedia/thirdparty/opensearch1/ yeah, I think we could probably install the pkg from here and see what happens
[22:27:08] ryankemper: looks like the relforge decom set off some alerts. I acked and downtimed but we should probably disable alerts for relforge, or at least keep them from going to #operations
[23:10:47] ack
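For reference on T386525, replacing the curator delete job with OpenSearch's built-in index state management would boil down to registering a policy roughly like the sketch below. The endpoint, index pattern, and 90-day retention are assumptions for illustration, not the actual production config:

```python
#!/usr/bin/env python3
"""Sketch for T386525: an ISM policy standing in for curator's
delete-old-indices action. Endpoint, pattern, and retention are assumed."""
import json
import requests

OPENSEARCH = "https://search.svc.eqiad.wmnet:9243"  # assumed endpoint

policy = {
    "policy": {
        "description": "Delete apifeatureusage indices after 90 days (assumed retention)",
        "default_state": "hot",
        "states": [
            {
                # Indices sit in "hot" doing nothing until they age out.
                "name": "hot",
                "actions": [],
                "transitions": [
                    {"state_name": "delete", "conditions": {"min_index_age": "90d"}}
                ],
            },
            {
                # Terminal state: the index is deleted on entry.
                "name": "delete",
                "actions": [{"delete": {}}],
                "transitions": [],
            },
        ],
        # Auto-attach the policy to newly created daily indices.
        "ism_template": [
            {"index_patterns": ["apifeatureusage-*"], "priority": 100}
        ],
    }
}

resp = requests.put(
    f"{OPENSEARCH}/_plugins/_ism/policies/delete-apifeatureusage",
    json=policy,
    timeout=30,
)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```

Once a policy like this is attached, the index lifecycle runs inside the cluster itself, so the curator systemd timer (and its version-compatibility problem) would no longer be needed.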