[09:38:37] Traffic, conftool, Patch-For-Review, Sustainability (Incident Followup): requestctl can't act on cache hits - https://phabricator.wikimedia.org/T317794#9980527 (Joe) To clarify a bit, I didn't take the route described in the task. In fact, we want: * Rules with `cache_miss_only: true` to only be...
[09:41:16] Traffic, conftool, Patch-For-Review, Sustainability (Incident Followup): requestctl can't act on cache hits - https://phabricator.wikimedia.org/T317794#9980543 (Joe) a:Joe
[09:58:48] Traffic, conftool: Allow integrating requestctl rules into haproxy - https://phabricator.wikimedia.org/T369606#9980594 (Joe) Haproxy has a logic that's very different from varnish, but it should be possible to translate most of our current patterns or ipblocks to something haproxy can read. Specifically...
[10:57:40] Traffic, conftool: Allow integrating requestctl rules into haproxy - https://phabricator.wikimedia.org/T369606#9980817 (Joe) As @Fabfur made me notice, conditions can also be expressed inline: ` http-request silent-drop if (req.hdr(user-agent) path_reg Googlebot.* ) && ... ` and ipblocks will be transf...
[11:08:15] Domains, Traffic: [toolforge] transfer/adopt toolsbeta.org domain to the foundation - https://phabricator.wikimedia.org/T362253#9980857 (dcaro) Thinking a bit more about this, I think I'd prefer going with openstack as the NS, as we want any toolforge root to be able to play around and change things in t...
[13:36:04] netops, Infrastructure-Foundations: cr3-ulsfo flapping on July 14 - https://phabricator.wikimedia.org/T370048 (ssingh) NEW
[13:36:05] netops, Infrastructure-Foundations: cr3-ulsfo flapping on July 14 - https://phabricator.wikimedia.org/T370048#9981303 (ssingh) p:Triage→Medium
[13:36:15] netops, Infrastructure-Foundations, SRE: cr3-ulsfo flapping on July 14 - https://phabricator.wikimedia.org/T370048#9981304 (ssingh)
[13:39:27] netops, Infrastructure-Foundations, SRE: cr3-ulsfo flapping on July 14 - https://phabricator.wikimedia.org/T370048#9981315 (ssingh)
[13:57:44] netops, Infrastructure-Foundations, SRE: cr3-ulsfo flapping on July 14 - https://phabricator.wikimedia.org/T370048#9981361 (ayounsi) If I was paranoid, I'd say it's possibly a bug being exploited that can cause a DDoS and we should prioritize T364092. We have a couple runbooks that could fit the sit...
[13:58:36] netops, Infrastructure-Foundations, SRE: cr3-ulsfo flapping on July 14 - https://phabricator.wikimedia.org/T370048#9981370 (ssingh) >>! In T370048#9981361, @ayounsi wrote: > If I was paranoid, I'd say it's possibly a bug being exploited that can cause a DDoS and we should prioritize T364092. > > We...
[15:26:05] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9981821 (Volans) Sorry if I'm late to the task, I discovered it just today as I was not subscribed to it. Allow me to be really sad that in this whole discu...
[15:42:42] netops, Traffic, Infrastructure-Foundations, SRE: Upgrade anycast-healthchecker to 0.9.8 (from 0.9.1-1+wmf12u1) - https://phabricator.wikimedia.org/T370068 (ssingh) NEW
[15:57:57] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9982145 (ssingh) >>! In T369366#9981821, @Volans wrote: > Sorry if I'm late to the task, I discovered it just today as I was not subscribed to it. > > Allow...
[17:19:39] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9982548 (Volans) Thanks for the clarification. I didn't meant to imply that you didn't want a cookbook as end goal (although it was not mentioned). >>! In T...
[17:55:42] Traffic, conftool: Allow integrating requestctl rules into haproxy - https://phabricator.wikimedia.org/T369606#9982784 (CDanis) >>! In T369606#9980816, @Joe wrote: > and ipblocks will be transformed to haproxy acls like we do in varnish, something like > ` > acl ipblock_cloud_google src -f /ipblocks/clou...
[18:06:04] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9982911 (ssingh) >>! In T369366#9982548, @Volans wrote: > Thanks for the clarification. I didn't meant to imply that you didn't want a cookbook as end goal (...
[21:33:25] FIRING: [5x] SystemdUnitCrashLoop: varnishmtail@default.service crashloop on cp3066:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[21:34:25] FIRING: SystemdUnitFailed: varnishmtail@internal.service on cp3068:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:38:07] ?
[21:38:16] huh
[21:38:25] FIRING: [6x] SystemdUnitCrashLoop: varnishmtail@default.service crashloop on cp3066:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[21:39:20] surge in requests
[21:39:39] https://grafana.wikimedia.org/goto/NWUBCq_SR?orgId=1
[21:40:55] log overrun and then the restart as expected
[21:41:21] Ack
[21:41:35] leave it to me, late for you. making dinner but will be around
[21:41:51] seems in practice just in esams
[21:41:57] fabfur: yep
[21:43:25] RESOLVED: [6x] SystemdUnitCrashLoop: varnishmtail@default.service crashloop on cp3066:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[21:43:28] great
[21:44:25] RESOLVED: SystemdUnitFailed: varnishmtail@internal.service on cp3068:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:35:43] Traffic, Patch-For-Review: [ncmonitor] ncredir should check whether second-level domains are used - https://phabricator.wikimedia.org/T369114#9983682 (BCornwall) Open→Resolved
[23:36:05] Traffic, Patch-For-Review: [ncmonitor] ncredir should check whether second-level domains are used - https://phabricator.wikimedia.org/T369114#9983688 (BCornwall) Resolved→In progress