[01:47:38] (LVSHighCPU) firing: (4) The host lvs5002:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5002 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[01:52:38] (LVSHighCPU) resolved: (4) The host lvs5002:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5002 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[06:36:19] Traffic, Performance-Team, SRE, serviceops, Patch-For-Review: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling)
[06:44:14] Traffic, Performance-Team, SRE, serviceops, Patch-For-Review: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling)
[08:36:09] netops, Infrastructure-Foundations, SRE, netbox: Netbox: Allocation of .0 and .255 IP address from 10.65.3.0/16 and 10.65.2.0/16 network - https://phabricator.wikimedia.org/T314183 (ayounsi) Even though they look surprising, they are valid IPs. We do `ip_address = prefix.get_first_available_ip()`...
[13:25:01] netops, Infrastructure-Foundations, SRE: Lumen link between cr2-eqiad and cr2-esams down - July 2022 - https://phabricator.wikimedia.org/T313783 (ayounsi) Open→Resolved a:ayounsi > Issue on the Subsea portion, between Bellport and Bude. No event and unable to isolate the cause
[14:59:39] bblack: flagging https://phabricator.wikimedia.org/T138093#8117992 for your attention / input :)
[15:01:05] we abused std.random() in the past to redirect a fraction of traffic
[15:01:18] but we didn't need to take into account the URL IIRC
[15:02:39] stuff like https://gerrit.wikimedia.org/r/c/operations/puppet/+/550868
[15:25:44] netops, Infrastructure-Foundations, SRE, netbox: Netbox: Allocation of .0 and .255 IP address from 10.65.3.0/16 and 10.65.2.0/16 network - https://phabricator.wikimedia.org/T314183 (Papaul) @ayounsi make sense if it is /16 and yes it is working on IDRAC. The only issue is we received an alert on...
[15:47:57] vgutierrez: ack; I think random might not work well, for the reason mentioned in the ticket: the cache key for a given ReqURL would bounce between the normal and non-normal forms, so we'd be fragmenting the cache
[15:53:11] ori: https://phabricator.wikimedia.org/T138093#8120045
[16:41:44] nice, thank you!
[16:42:55] netops, Infrastructure-Foundations, SRE, netbox: Netbox: Allocation of .0 and .255 IP address from 10.65.3.0/16 and 10.65.2.0/16 network - https://phabricator.wikimedia.org/T314183 (ayounsi) Open→Resolved a:ayounsi Assuming the duplicate IPs issue got solved. Feel free to re-open if...
[17:36:06] (am i right that math operators are not available in native VCL in varnish 6? they're in the manual for version 7 but not 6.)
[19:07:20] ori: I'm pretty sure VCL has always had basic math operators: + - * / > < == and such
[19:10:21] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/varnish/templates/analytics.inc.vcl.erb#60
[19:10:35] ^ an example in existing VCL which uses div and mul
[19:24:20] bblack: nice :) but no arbitrary variables it seems?
[19:24:52] everything has to be squirreled in some header field?
[19:25:04] (and can those have type int?)
[19:42:01] yeah no arbitrary variables in basic VCL. There is a vmod_var that gives you true temporaries, but we haven't ever moved over all our pseudo-headers to it.
[19:42:22] but also, all our vcl files are ERB templates. So you can always use that for cleanliness
[19:42:34] (for non-runtime calculations)
[19:42:43] yeah, I think that's easiest
[19:42:49] since it compiles to C anyways, constant math will get precalculated
[19:43:23] but for clarity, you could define a named variable like QUERYSORT_ROLLOUT_PERCENT in an ERB fragment.
[19:44:26] it's tempting to want to pre-do the math in ruby too ("4294967296 * QUERYSORT_ROLLOUT_PERCENT / 100" in my example)
[19:44:48] yeah, I think that might be easiest
[19:44:51] but then you've gotta figure out if that really works correctly across multiple programming languages in the way you'd want, with int limits, etc
[19:45:07] whereas it's pretty straightforward to know that expression works in C, given we know the data types involved
[19:45:43] (and that the compiler will do the * and / at compile-time)
[19:46:27] makes sense
[19:47:37] I think there should be an intermediate step in the roll-out before we start doing a fraction of all text requests
[19:47:52] where we target a particular wiki
[19:49:03] maybe: normalize all requests to test/test2 wikis, wait a couple of days, normalize all requests for mediawiki.org, wait a week(?), then start the fractional rollout?
[20:07:55] netops, Infrastructure-Foundations, observability: Grafana posting to http://wpt-graphite.wmftest.org:8080/ - https://phabricator.wikimedia.org/T307445 (lmata) >>! In T307445#8047790, @ayounsi wrote: > @jbond i think this can be closed? also curious whether this task can be closed
[20:14:40] I'll write it in the task
[23:52:15] Traffic, Performance-Team, SRE, serviceops, Patch-For-Review: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling)
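
Annotation on the 15:47-19:45 exchange: the concern with std.random() is that the same URL could land on either side of the rollout from one request to the next, while the expression "4294967296 * QUERYSORT_ROLLOUT_PERCENT / 100" suggests comparing a 32-bit per-URL value against a fixed threshold. The following is a minimal Python sketch of that idea, not the VCL that was deployed. QUERYSORT_ROLLOUT_PERCENT is the name from the log; its value of 5, the helper names rollout_threshold and in_rollout, and the use of zlib.crc32 as the 32-bit hash are all assumptions made for illustration.

```python
import zlib

# Name taken from the discussion; the value 5 is a made-up example.
QUERYSORT_ROLLOUT_PERCENT = 5

# 2**32 == 4294967296: the size of a 32-bit hash space.
HASH_SPACE = 4294967296


def rollout_threshold(percent: int) -> int:
    """Number of hash values (out of 2**32) that fall inside the rollout."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Multiply before dividing so small percentages are not truncated to 0.
    return HASH_SPACE * percent // 100


def in_rollout(url: str, percent: int = QUERYSORT_ROLLOUT_PERCENT) -> bool:
    """Decide from the URL alone, so a given URL always gets the same answer."""
    # zlib.crc32 is an assumption standing in for whatever 32-bit hash is used.
    bucket = zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF
    return bucket < rollout_threshold(percent)
```

Because the decision depends only on the URL, repeated requests for one URL always get the same treatment, which is what keeps the cache key from bouncing between forms. Python integers do not overflow, so this sketch sidesteps the int-limit question raised at 19:44:51; in a language with 32-bit integers the intermediate product 4294967296 * percent would need a 64-bit type.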