[09:58:59] lunch
[13:06:37] I'm going to try a rolling restart on eqiad, as we haven't tested the rolling-operation cookbook on our large clusters yet
[13:21:10] aaannddd...we have our first failure
[13:21:55] it's pretty minor, we just need to decom the servers mentioned in T395331
[13:21:55] T395331: Find/ban "ghost" Elastic hosts - https://phabricator.wikimedia.org/T395331
[13:41:33] o/
[13:45:59] \o
[13:55:43] o/
[13:58:49] .o/
[13:59:56] E_TOO_MANY_MEETINGS: I won't be able to join the Wednesday meeting.
[14:23:33] WDQS is not happy now ;( https://grafana.wikimedia.org/public-dashboards/5f8884e809234a35b90213608f2a8dbf
[14:33:33] ^^ this was due to a maintenance on wikidata DB, we're all good for now
[15:01:41] Created T395465 to discuss the titlesuggest rebuild issue
[15:01:41] T395465: Investigate EQIAD daily completion suggester rebuild failure - https://phabricator.wikimedia.org/T395465
[15:25:33] CR for updating curator on apifeatureusage if anyone has time to look https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151687
[15:48:03] ^^ nm, been reviewed
[16:42:41] back
[16:43:54] heading out
[16:45:16] .o/
[17:08:51] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151754 'nother CR for apifeatureusage if anyone has time to look
[17:21:01] lunch, back in ~40
[18:22:17] ^^ merged the above
[18:26:35] inflatador: yup change looks good. we can merge a followup patch to remove the now-absented resource block next (once puppet has run everywhere ofc)
[18:33:46] ryankemper I'm still getting puppet failures, something is hard-coding the curator version to `5.8.5-1~wmf5+deb11u1`. Best guess is https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/opensearch/manifests/curator.pp#16
[18:34:42] I thought https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/role/common/apifeatureusage/logstash.yaml#64 would avoid the hard-coding but I guess not
[18:35:53] It'll probably work if we stop importing that profile and just install the package with 'present' in our own puppet plan
[18:38:04] heading to lunch but what's the value you expect for curator version?
[18:42:14] I just want it to install, no preference on version. I think that's being imposed by the curator.pp
[19:09:04] OK, https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151775 is up. Based on the output of PCC, this should do the trick
[19:16:36] confirmed, I just merged it...now let's see if the curator units can start
[19:32:41] yeah, looks like they can. I should've let the timer start them though, looks like they just run forever if not
[19:32:53] err...not forever, but they don't background
[19:35:24] awesome!
[19:44:57] Yeah, the next bit of fun is gonna be decomming all those eqiad elastic hosts, ref https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151784
[20:16:23] ryankemper ^^ is ready for review. I'm a bit confused why PCC is failing on one of the hosts we're decommissioning, but I don't think it matters much
[21:05:09] ryankemper you're frozen in the Meet, can you try rejoining? Assuming you didn't lose Internet completely
[21:18:10] inflatador: https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1151797