[04:28:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on cp3070:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:43:48] FIRING: [2x] PuppetZeroResources: Puppet has failed generate resources on cp3070:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:48:48] RESOLVED: [2x] PuppetZeroResources: Puppet has failed generate resources on cp3070:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:34:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on cp3069:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:49:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on cp3069:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:42:36] Amir1: what's the canonical source for standard sizes?
[10:57:41] vgutierrez: this is one $wgThumbnailSteps = [ 20, 40, 60, 120, 250, 330, 500, 960 ]; there is another set we could exclude which are for file loading when you click on it https://noc.wikimedia.org/wiki.php#wgUploadThumbnailRenderMap
[10:57:57] [ 20, 40, 60, 120, 250, 330, 500, 960 ] are the 90% of traffic now
[12:14:19] Traffic, Data-Engineering, DPE HAProxy Migration, Patch-For-Review: Add HAproxy termination field to webrequest - https://phabricator.wikimedia.org/T387454#10767900 (JAllemandou) Thanks @Fabfur !
When the data flows in, we need a schema change and a job modification on our side to make it appear...
[12:33:06] topranks, XioNoX we've been experiencing issues with magru RIPE ipv6 atlas anchor during the week, and by issues I mean getting alerted on -operations
[12:33:07] https://grafana.wikimedia.org/goto/prIISQJNR?orgId=1
[12:46:19] nothing interesting on NEL though
[12:52:41] vgutierrez: yeah I'm just looking at the stats now to see if I can find any pattern
[12:53:16] NELs were my next step. I need to get more eyes on those alerts if they fire and try to investigate when the problem occurs, not much we can do now.
[13:07:30] topranks: we had another occurrence a few minutes ago
[13:07:40] and it doesn't seem to back to previous values yet
[13:09:37] the ping success ratio?
[13:10:29] yup
[13:10:53] now it's back to the baseline
[13:11:25] yeah marginal things like that will happen, I don't think much point chasing
[13:11:38] when there is a steady decrease for a period of time it's a different story
[13:24:30] ack
[15:04:40] FIRING: [6x] VarnishHighThreadCount: Varnish's thread count on cp5025:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:05:09] FIRING: [2x] LVSHighCPU: The host lvs5005:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[15:05:17] that's not good...
[15:05:19] hmm
[15:06:32] >2Mpps
[15:09:40] FIRING: [8x] VarnishHighThreadCount: Varnish's thread count on cp5025:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:10:09] RESOLVED: [2x] LVSHighCPU: The host lvs5005:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[15:19:40] FIRING: [8x] VarnishHighThreadCount: Varnish's thread count on cp5025:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:24:40] FIRING: [8x] VarnishHighThreadCount: Varnish's thread count on cp5025:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:29:40] RESOLVED: [8x] VarnishHighThreadCount: Varnish's thread count on cp5025:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[17:30:45] Traffic, [Archived]Wikidata Dev Team, Prod-Kubernetes, SRE, and 5 others: Frequent 500 Errors and Timeouts When Adding Statements to New Item or Lexeme-typed Properties - https://phabricator.wikimedia.org/T374230#10769015 (ArthurPSmith) Problem is still there - I just created P13478 (first item-v...