[04:32:32] Traffic, SRE: increased 5xx rate for esams frontend traffic - https://phabricator.wikimedia.org/T342121 (Joe) Open→Resolved Hi @TheDJ, what you're seeing there is a big influx of 429 from our systems rate-limiting some very aggressive api user from a public cloud. To put this in perspective, we...
[09:01:02] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (cmooney) >>! In T341992#9029300, @RobH wrote: > Cool, I understand now. I'll move and update netbox/homer for these two hosts tomorrow to move them to 10G configured ports 44/45 I renumbered...
[11:58:19] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Joe)
[11:59:45] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Joe)
[12:08:42] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (RobH) links moved, servers online for remote os installation.
[12:18:22] 👋 I've a quick service catalog question -- in this change, we're changing the target IPs of the releases servers, but the comment points out that it's not an LVS service. So can someone help me understand what this change is going to affect? (We're in the process of decommissioning the old VMs and finding any references to them, including these ones)
[12:20:23] Traffic, SRE: Disable keep-alive on HAProxy port 80 - https://phabricator.wikimedia.org/T342211 (Fabfur) All cp hosts in esams and eqsin have keep-alive disabled on port 80. Drop in number of sessions on port 80: {F37144516} {F37144519} The number of (correctly redirected) requests managed by those h...
[13:07:14] Traffic, SRE: Disable keep-alive on HAProxy port 80 - https://phabricator.wikimedia.org/T342211 (Fabfur)
[13:08:37] eoghan: hi, could you point us to the service itself?
[13:08:43] (on the service catalog I mean)
[13:09:12] Oh, I forgot to include the change, sorry! https://gerrit.wikimedia.org/r/c/operations/puppet/+/938889/3/hieradata/common/service.yaml
[13:10:08] eoghan: so, it means that pybal isn't involved at all
[13:10:24] confctl is still involved though
[13:15:24] vgutierrez: So what does that affect, or where would that be used if it's not in lvs?
[13:17:14] eoghan: actually nothing?
[13:17:30] That's what I think. I'm just not sure what I'm missing tbh.
[13:17:42] it looks like releases.discovery.wmnet isn't using the geoip!disc-releases
[13:17:58] so it's hardcoded as a CNAME to releases1003
[13:19:18] eoghan: configuring the service as non-LVS in the service catalog enables https://wikitech.wikimedia.org/wiki/DNS/Discovery
[13:19:53] and the dnscdisc is created
[13:20:00] https://www.irccloud.com/pastebin/gDD3P0Kb/
[13:20:14] but you aren't leveraging it cause the DNS setup is not done
[13:20:48] https://github.com/wikimedia/operations-dns/blob/347bb9f571c20e93bfcf96a2fb3daecae8ec5d02/templates/wmnet#L869
[13:21:10] it's just a CNAME to releases1003 rather than "IN DYNA geoip!disc-releases"
[13:29:27] eoghan: oh, and of course that also configures the blackbox prometheus probes
[13:29:30] https://grafana.wikimedia.org/goto/IsFZ09j4z?orgId=1
[13:42:54] eoghan: I hope that answers your question :)
[13:57:51] Yep, thanks a million, that’s really helpful
[14:37:51] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (RobH) a:RobH→Vgutierrez Ready for installation!
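(Editorial note on the 13:17-13:21 exchange: the point is that the discovery name is a static CNAME to releases1003 rather than an "IN DYNA geoip!disc-releases" record, so the catalog entry has no DNS effect. The sketch below shows one way to tell the two record forms apart. Both sample lines are hypothetical reconstructions, not copied from the linked template: the TTL values, the `releases1003.example.` target, and the `classify` helper are all assumptions made for illustration.)

```python
import re

# Hypothetical zone-file lines modelled on the two forms named in the log.
# TTLs and the CNAME target domain are placeholders, not the real values.
STATIC_LINE = "releases 300 IN CNAME releases1003.example."
DYNAMIC_LINE = "releases 300/10 IN DYNA geoip!disc-releases"

# name, TTL, class IN, record type, record data
RECORD = re.compile(
    r"^(?P<name>\S+)\s+(?P<ttl>\S+)\s+IN\s+(?P<type>\S+)\s+(?P<rdata>\S+)$"
)


def classify(line):
    """Return 'discovery', 'static', 'other', or 'unparsed' for one record line."""
    match = RECORD.match(line.strip())
    if match is None:
        return "unparsed"
    if match["type"] == "DYNA" and match["rdata"].startswith("geoip!disc-"):
        # Resolved dynamically through the geoip plugin's discovery resource.
        return "discovery"
    if match["type"] == "CNAME":
        # Fixed alias: changes to the discovery resource have no effect here.
        return "static"
    return "other"


if __name__ == "__main__":
    for sample in (STATIC_LINE, DYNAMIC_LINE):
        print(f"{classify(sample):10s} {sample}")
```

(Run against a rendered zone file, a check like this would flag discovery names that are still hardcoded, which is the situation described for releases.discovery.wmnet.)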
[14:59:10] Traffic, SRE, Patch-For-Review: Disable keep-alive on HAProxy port 80 - https://phabricator.wikimedia.org/T342211 (Fabfur)
[15:14:51] Traffic, SRE: Disable keep-alive on HAProxy port 80 - https://phabricator.wikimedia.org/T342211 (Fabfur)
[16:49:59] Traffic, SRE, Patch-For-Review: Disable keep-alive on HAProxy port 80 - https://phabricator.wikimedia.org/T342211 (Fabfur)
[18:08:06] Traffic, SRE: Upgrade to pdns-recursor 4.8.4 - https://phabricator.wikimedia.org/T341611 (ssingh) `pdns-recursor 4.8.4-1+wmf11u1` has been running in production on the following hosts for a while: dns1004, 2004, 4003, 5003 doh6001 No issues observed, so we will be rolling out to all hosts that use it on...
[18:11:21] Traffic, SRE: Upgrade to pdns-recursor 4.8.4 - https://phabricator.wikimedia.org/T341611 (ssingh)
[18:59:46] (HAProxyRestarted) firing: HAProxy server restarted on cloudcontrol1005:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=eqiad%20prometheus/ops&var-instance=cloudcontrol1005&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[19:06:32] ^^ that's not under our scope
[19:06:40] We need to refine that alert
[19:40:28] yeah
[19:40:38] will look into it
[22:59:46] (HAProxyRestarted) firing: HAProxy server restarted on cloudcontrol1005:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=eqiad%20prometheus/ops&var-instance=cloudcontrol1005&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted