[06:02:32] parsercache in codfw is getting too many connections
[06:06:18] This DELETE has been running for hours and got everything overloaded: https://phabricator.wikimedia.org/P43429
[10:27:07] YA stupid cumin question; can I do input redirection? e.g. if I have a file of grep patterns and wanted to do something like sudo cumin --force hosts "grep -f - /some/logfile" collected_output ?
[10:34:45] moritzm: FYI, I got this in the dns netbox cookbook:
[10:34:58] https://www.irccloud.com/pastebin/yWpmf2dw/
[10:42:44] arturo: will be fixed with https://gerrit.wikimedia.org/r/c/operations/dns/+/884276/
[10:43:44] Emperor: if the /path/containing/patterns is on the targeted hosts, yes, but I am willing to bet it isn't?
[10:44:48] arturo: the patch has been merged, you can re-run the cookbook
[10:45:01] (merged and deployed on authdns)
[10:45:08] e.g. this works. sudo cumin deploy1002.eqiad.wmnet 'cat godog (or o11y), any idea what's up with https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=flowspec1001 ? I decommissioned it using the cookbook (cf. https://phabricator.wikimedia.org/T328009#8560574 ) but it's still in Icinga
[11:03:56] hmm, looks like a puppetDB bug? it's still in https://puppetboard.wikimedia.org/node/flowspec1001.eqiad.wmnet (cc jbond )
[11:06:22] looking
[11:06:47] XioNoX: mmhh yeah it'll show up in icinga as long as it is active in puppet(db)
[11:06:57] ditto for prometheus FWIW
[11:10:17] akosiaris: yeah, I was (lazily) hoping to avoid having to copy the patterns file around.
[11:13:20] you can just put it under people.wikimedia.org and do something like cumin target 'grep -f - /var/log $(< curl -s http://people.w.o/emperor/blah)'
[11:13:46] it's a hack, but it should work
[11:14:57] Emperor: bit of a vague question but are there established historical patterns/examples of what causes 500 errors similar to the thumbor ones?
[11:16:35] hnowlan: "Emperor is on vacation or deep in the weeds of debugging something else"
[11:17:28] hnowlan: Um, less unhelpfully, my best current theory is that it's typically some sort of load-related problem, and a poor failure mode of some bit of the swift/memcached/auth interaction
[11:18:07] hnowlan: I think they are sometimes something else in the stack timing out waiting for swift
[11:18:28] XioNoX: it should be gone now, still investigating the change in behaviour
[11:18:43] akosiaris: hadn't thought of people.w.o, thanks for the suggestion
[11:19:31] jbond: confirmed, thx
[11:20:55] Emperor: cool, thanks
[13:03:38] jelto: mutante: I have fixed Apache 2 not starting up when Gerrit hosts are rebooted (https://phabricator.wikimedia.org/T326125). It was all due to Apache listening explicitly on an IPv6 address, but when it starts on machine boot the IPv6 address is not ready yet, which causes Apache to fail to start
[13:03:56] solved by making it listen on any address (`Listen 80` and `Listen 443`) and it works like a charm
[13:04:12] next reboot should have the service come back without human intervention :]
[13:05:14] great, thank you for taking care of that :)
[14:16:00] If I needed to run something on a server in an ad-hoc fashion, and the easiest way was via its docker image, what would be the best way to go about that? Just `apt install ...` docker, and `apt uninstall ...` afterward?
[14:22:35] I always use `apt purge` but it's probably OK as docker doesn't have a ton of deps.
[14:23:32] urandom: that lines up with what I've been told in the past, just also !log what you're doing in SAL
[14:24:01] cool; thanks!
[14:27:55] on paper that seems like a very clear L3 violation though
[14:38:11] if anybody has time for a quick python review, I filed https://gerrit.wikimedia.org/r/c/operations/software/httpbb/+/884285 for httpbb :)
[14:44:40] taavi: definitely at least goes against the spirit (if not the letter)
[17:13:55] elukey: thank you! I need to cut a new httpbb release anyway, I'll roll that in
[17:49:43] elukey: btw if you're in a hurry, you can test with the string "123" instead of the int 123 :) same HTTP body
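For the cumin question at 10:27: because `grep -f -` reads its patterns from standard input, one way to avoid copying the patterns file to every target is to carry the patterns inside the remote command string itself. A minimal sketch, assuming the patterns are available locally as a Python list; `build_remote_grep` is a hypothetical helper, not part of cumin, and very long pattern lists may exceed the remote shell's argument-length limit.

```python
import shlex


def build_remote_grep(patterns, logfile):
    """Build a shell command that feeds `patterns` to `grep -f -` on the target.

    Hypothetical helper: the patterns travel inside the command string, so no
    pattern file has to exist on the remote hosts.
    """
    if not patterns:
        # With no arguments, printf would emit a single empty pattern,
        # which makes grep match every line.
        raise ValueError("need at least one pattern")
    # shlex.quote keeps spaces and shell metacharacters in each pattern literal.
    quoted = " ".join(shlex.quote(p) for p in patterns)
    # printf reuses its format for each argument, emitting one pattern per
    # line; grep -f - then reads those patterns from stdin.
    return f"printf '%s\\n' {quoted} | grep -f - {shlex.quote(logfile)}"
```

For example, `build_remote_grep(["foo", "bar baz"], "/some/logfile")` returns `printf '%s\n' foo 'bar baz' | grep -f - /some/logfile`, which can then be passed as the command argument to `sudo cumin --force hosts ...`. The people.w.o suggestion at 11:13:20 works on the same principle if curl's output is piped into grep (`curl -s URL | grep -f - /some/logfile`); as typed, `$(< curl ...)` treats `curl` as an input file name rather than running it.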
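The Apache fix at 13:03 comes down to how `bind()` behaves: binding to a specific address fails with "Cannot assign requested address" (`EADDRNOTAVAIL`) while that address is not yet configured on the host, whereas the wildcard address binds regardless. A minimal sketch of that behaviour, assuming Linux defaults (no `ip_nonlocal_bind` sysctl enabled) and using IPv4 for portability: `192.0.2.1` is a documentation-only address (RFC 5737) standing in for the not-yet-ready IPv6 address, and `try_bind` is a hypothetical helper.

```python
import errno
import socket


def try_bind(host, port=0):
    """Try to bind a TCP socket to (host, port); return 0 on success, else the errno.

    Port 0 asks the kernel for any free port, so no privileges are needed.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


# Wildcard address, the equivalent of a bare `Listen 80`.
print("0.0.0.0   ->", try_bind("0.0.0.0"))
# An address not present on any interface, like `Listen [addr]:80` before it is up.
result = try_bind("192.0.2.1")
print("192.0.2.1 ->", errno.errorcode.get(result, result))
```

The trade-off of the wildcard is that Apache then accepts connections on every address of the host, so restricting access falls to firewall rules or vhost configuration.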
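On the 17:49 tip: once serialized into an HTTP body, the integer 123 and the string "123" are the same bytes, so a body check can normalize the expected value to text before comparing. A minimal sketch of that idea, assuming a UTF-8 body; `body_contains` is a hypothetical helper and not httpbb's actual code.

```python
def body_contains(body: bytes, expected) -> bool:
    """Return True if `expected`, rendered as text, occurs in `body`.

    Hypothetical helper: str() makes an int 123 and the string "123" compare
    identically, mirroring the fact that both serialize to the bytes b"123".
    """
    needle = str(expected).encode("utf-8")
    return needle in body


print(body_contains(b'{"count": 123}', 123))    # True
print(body_contains(b'{"count": 123}', "123"))  # True
```

One caveat of plain `str()`: a boolean parsed from a config file renders as `True` rather than the `true` a body would contain, so quoting such values in the test file stays the most predictable option.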