[08:19:01] quick q: remind me what we need to do process wise for rebooting mwmaint hosts. Just announce it a couple of days in advance so that people don't start new maint jobs in screen and reboot outside of a deployment window ?
[08:26:24] <_joe_> akosiaris: basically, yes
[08:34:26] yeah, that was what was done in the past for mwmaint/eqiad reboots (although we often scheduled eqiad reboots for the periods when eqiad was passive in data centre failovers)
[09:43:28] _joe_ / akosiaris any objections to https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/808208 moving forward, now that the patch for T312225 has been merged?
[09:43:32] T312225: Envoy cannot connect to image-suggestion service - https://phabricator.wikimedia.org/T312225
[09:43:50] <_joe_> I'd ask jayme
[09:44:14] 👀
[09:46:12] the unresolved comments from _joe_ seem to be resolved. Not sure about labs, though
[09:46:43] <_joe_> uhh
[09:46:49] labs needs to continue using the WCMS instance until there is a public API for the new app
[09:46:54] <_joe_> yep
[09:47:21] <_joe_> kostajh: gimme 5 mins to verify a couple things
[09:47:30] they are two different applications, with a mostly similar spec and output format 😅
[09:48:10] from the services-proxy side it's fine now. curl localhost:6030/public/image_suggestions/suggestions/enwiki/2383439 looks good on appservers
[09:48:40] from k8s it probably won't work though
[09:49:48] that would need https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/811744 to be merged
[09:51:50] afaik, we don't need it to work from k8s unless there are imminent plans around debug servers or appservers switching to that?
[09:53:47] <_joe_> I would merge that too but we can live with the k8s mwdebug cluster being unable to call that service for a few hours :)
[09:54:04] <_joe_> so yeah green light I'd say
[09:54:10] +1
[09:54:51] thanks!
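The curl check at 09:48:10 could be scripted roughly as follows. This is only a sketch: the port (6030) and the path come from the log, the plain-http scheme is assumed because curl defaults to it when none is given, and the function names `suggestions_url` and `fetch_suggestions` are hypothetical. The response format is not shown in the log, so the body is returned as raw bytes.

```python
import urllib.request
from typing import Tuple

# Port of the local services-proxy listener, as quoted in the log.
PROXY_PORT = 6030


def suggestions_url(wiki: str, page_id: int,
                    host: str = "localhost", port: int = PROXY_PORT) -> str:
    """Build the URL used in the curl check (http scheme is an assumption)."""
    return (f"http://{host}:{port}"
            f"/public/image_suggestions/suggestions/{wiki}/{page_id}")


def fetch_suggestions(wiki: str, page_id: int,
                      timeout: float = 5.0) -> Tuple[int, bytes]:
    """Fetch the endpoint and return (HTTP status, raw body).

    Only meaningful on a host where the services proxy is listening;
    the body is left undecoded because its format is not given in the log.
    """
    url = suggestions_url(wiki, page_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


if __name__ == "__main__":
    # Print the URL only; call fetch_suggestions() on an appserver to test.
    print(suggestions_url("enwiki", 2383439))
```

Running the same check from a k8s pod would be expected to fail until the deployment-charts change mentioned at 09:49:48 is merged.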
[10:06:25] it seems to work fine in terms of connections, but we had to revert in order to fix some application code issues and how it interacts with the new service. Will try again later today, probably.
[11:46:44] So I've run into an "interesting" problem with our recent update of ORES to Buster. Ever since, one of the Kafka topics it feeds has fallen quiet. Specifically, the uwsgi app sends request logs to rsyslogd on localhost, which does some mangling and then sends it to Kafka proper. Long story short, when uwsgi is configured with logger=logstash socket:localhost:11514, things break. Configuring
[11:46:46] it with logger=logstash socket:127.0.0.1:11514 seems to work. Anyone seen this before?
[12:06:06] Rsyslog not listening on ipv6?
[12:06:49] No, it does listen on both v4 and v6
[12:49:39] it could be related to this? "Jul 7 00:00:13 ores1009 rsyslogd: omkafka: action will suspended due to kafka error -187: Local: All broker connections are down [v8.1901.0 try https://www.rsyslog.com/e/2422 ]"
[13:12:50] Nah, that happens every now and then, and doesn't need restarts or addressing
[13:14:04] this may be happening also to other uwsgi-based logs
[13:14:44] maybe netbox etc..?
[13:16:54] having issues running the downtime cookbook, can anyone assist?
I'm using the syntax as described here https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hosts/downtime.py#L28 and getting "no hosts provided"
[13:17:51] https://phabricator.wikimedia.org/P30946 full cmd and stack trace
[13:18:38] you're missing a star at the end of the hostname
[13:18:45] otherwise you need to provide a FQDN
[13:19:02] vgutierrez tried that too, same issue...will post the response on the paste
[13:19:32] used FQDN too, with or w/out single quotes on the host
[13:20:56] inflatador: hmmm cloudelastic1003 isn't a valid host on the PuppetDB
[13:21:08] vgutierrez@cumin1001:~$ sudo -i cumin 'cloudelastic10*'
[13:21:08] 5 hosts will be targeted:
[13:21:09] cloudelastic[1001-1002,1004-1006].wikimedia.org
[13:21:57] vgutierrez Awesome, thank you. That one has failed reimages
[13:22:06] confusing error message though :)
[13:22:23] Yeah, I'll file a ticket, but thanks again for clearing that up
[14:01:40] Turns out it was not the localhost thing. Something else is fucky. We have two loggers in uwsgi: a plaintext one to a file and a JSON one sending to rsyslog. The latter one is broken. If I add another logger that is identical to the JSON but logging to a local file, the JSON-by-udp one starts working.
[14:18:35] klausman: let's open a task with details, so people can chime in
[14:18:43] ack
[14:20:33] You doing it or should I?
[14:28:33] https://phabricator.wikimedia.org/T312550 <- uwsgi weirdness
[15:48:52] brett: puppet-merged your change as well
[16:02:51] akosiaris: Thank you
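For reference, the folded host expression in the cumin output at 13:21:09 can be expanded to see which host is absent. The sketch below is an illustration only, not Cumin's own implementation; `expand_hosts` and `missing_hosts` are hypothetical helpers, and the expected set (1001 through 1006) is an assumption inferred from the range shown in the log.

```python
import re
from typing import List


def expand_hosts(expr: str) -> List[str]:
    """Expand 'prefix[a-b,c]suffix' into individual host names.

    Handles a single bracket group only; expressions without brackets
    are returned unchanged as a one-element list.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        return [expr]
    prefix, ranges, suffix = match.groups()
    hosts = []
    for part in ranges.split(","):
        start, _, end = part.partition("-")
        end = end or start           # a bare number is a range of one
        width = len(start)           # preserve zero padding, if any
        for n in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


def missing_hosts(expr: str, expected: List[str]) -> List[str]:
    """Return the expected hosts that the folded expression does not cover."""
    present = set(expand_hosts(expr))
    return [h for h in expected if h not in present]


if __name__ == "__main__":
    folded = "cloudelastic[1001-1002,1004-1006].wikimedia.org"
    # Assumed full set of hosts, 1001 through 1006.
    expected = [f"cloudelastic{n}.wikimedia.org" for n in range(1001, 1007)]
    print(len(expand_hosts(folded)), "hosts targeted")
    print("missing:", missing_hosts(folded, expected))
```

Under that assumption, the only host missing from the expression is cloudelastic1003, matching the 13:20:56 observation that it is not a valid host in PuppetDB.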