[12:24:43] FYI about to merge a refactor of monitoring classes 725045
[12:36:08] reverting, there was a minor bug
[12:47:08] ack, thanks jbond
[12:48:24] fyi some checks lost all their parents for drmrs, also the anycast monitoring broke for about 10 mins
[12:48:37] should all be fixing itself now
[13:11:15] *nod*, is https://gerrit.wikimedia.org/r/c/operations/puppet/+/744787 good to be reviewed jbond ?
[13:11:47] also what was wrong with the first iteration?
[13:11:54] godog: I'm just running a pcc now https://puppet-compiler.wmflabs.org/compiler1002/32843/
[13:12:18] godog: specifically https://gerrit.wikimedia.org/r/c/operations/puppet/+/744787/4/hieradata/common/profile/monitoring.yaml was missing the drmrs parent
[13:12:39] the first patch in monitoring::host #23 had
[13:12:58] `$nagios_address = pick($ip_address, $host_fqdn)` and not `$nagios_address = pick($host_fqdn, $ip_address)`
[13:14:06] ack, I'll take a look at the new patch shortly
[13:14:27] ack thanks
[13:38:45] godog: fyi going for a second attempt at that patch. this time I'll disable puppet and roll it out a bit more gradually
[13:39:36] jbond: +1
[15:18:06] did the prometheus servers in eqiad just oom and restart...? https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?orgId=1&refresh=1m
[15:22:08] looks like it
[15:22:21] codfw too
[15:22:50] codfw had 3 months uptime (!) well deserved I'd say
[15:27:23] chanced across this post, not sure if it's a worry for us?
[15:27:24] https://discord.com/channels/766613591994007562/766613591994007565/917749600717258773
[15:27:32] Not that sorry
[15:27:48] https://twitter.com/Dinosn/status/1468144253239021574
[15:30:29] topranks: interesting! thanks for the heads up, I'm taking a look
[15:31:14] thread suggests it affects all plugins, but unsure if we have any? Also don't see mention of a CVE or other post yet.
[15:36:19] yeah, unclear how to reproduce it, good to keep an eye on it heh!
[15:41:28] We can't reproduce on 8.3 godog topranks
[15:41:40] I just asked our team to check Miraheze's
[15:42:31] ack, thanks RhinosF1! appreciate it
[15:44:29] RhinosF1: thanks, apologies for the false alarm
[15:46:59] topranks: I'd rather a false alarm than an xmas day RCE
[16:20:01] (the bug is real btw, only 8.x is affected though)
[16:20:12] we're running 7 still
[16:22:39] cc moritzm ^
[16:23:56] ack, thx
[16:24:31] is there a formal announcement/confirmation by Grafana yet?
[16:25:16] not afaik
[17:28:38] moritzm: I sent security@ an email 40 minutes ago
[17:29:18] RhinosF1: thanks :-)
[17:31:03] moritzm: https://grafana.com/blog/2021/12/07/grafana-8.3.1-8.2.7-8.1.8-and-8.0.7-released-with-high-severity-security-fix/
[17:31:08] about 10 seconds ago
[17:31:25] cc godog
[17:32:22] Thanks for the update!
[17:55:26] o/ we just saw a ~30 min gap (missing data) in some graphite metrics coming from MW
[17:57:43] dcausse: there were some issues with graphite1004, I believe herron was looking into it
[17:58:05] thanks!
[18:00:00] yes, the host had a very high load avg and I power cycled it, it is back online now though still looking into it
[19:39:04] Hello observability folks. Can someone tell me what the size limit is that will result in a log message going to the jsonTruncated channel?
[19:39:19] asking in regard to T297219
[20:16:14] dancy: the jsonTruncated tag gets applied when type=syslog, program=mediawiki, and the message does not match `^{.*}$`. Truncation could happen within MediaWiki itself (monolog), in the UDP transport, or somewhere in rsyslog. On the output side, Elasticsearch caps keyword fields at 32k bytes because of Lucene's term byte-length limit.
[21:09:15] thx
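
For context on the 13:12 exchange: Puppet's pick() returns the first argument that is defined and non-empty, so swapping the argument order changes which value ends up as the Nagios address. Below is a rough Python analogue of that behaviour; the sample IP and FQDN values are hypothetical and only there to show why the order matters.

```python
def pick(*args):
    """Rough analogue of Puppet's pick(): return the first argument that is
    not undef/empty; Puppet's version raises an error if none qualifies."""
    for value in args:
        if value not in (None, ""):
            return value
    raise ValueError("pick(): no suitable value found")

# Hypothetical values, for illustration only.
ip_address = "10.192.0.17"
host_fqdn = "an-example1001.eqiad.wmnet"

print(pick(ip_address, host_fqdn))  # "10.192.0.17" -- the IP wins when both are set
print(pick(host_fqdn, ip_address))  # "an-example1001.eqiad.wmnet" -- the FQDN wins
```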
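
To make the 20:16 explanation concrete, here is a minimal sketch of the tagging condition as described in the chat. The field names, the helper function, and the sample record are assumptions for illustration, not the actual rsyslog/Logstash configuration.

```python
import re

# A message that is a complete JSON object starts with "{" and ends with "}".
JSON_RE = re.compile(r"^{.*}$")

def is_json_truncated(event: dict) -> bool:
    """Mirror of the condition described above: syslog events from mediawiki
    whose message is not a complete JSON object get tagged jsonTruncated."""
    return (
        event.get("type") == "syslog"
        and event.get("program") == "mediawiki"
        and not JSON_RE.match(event.get("message", ""))
    )

# Hypothetical example: a JSON log line cut off in transport loses its closing brace.
truncated = {
    "type": "syslog",
    "program": "mediawiki",
    "message": '{"channel": "exception", "msg": "...',
}
print(is_json_truncated(truncated))  # True
```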