[09:57:18] Traffic: Upgrade haproxy to 2.8.13 on cp hosts - https://phabricator.wikimedia.org/T383111#10452383 (Vgutierrez)
[14:50:23] netops, Ceph, Infrastructure-Foundations, SRE, Patch-For-Review: Configure DSCP marking for cloudceph* hosts - https://phabricator.wikimedia.org/T371501#10453489 (cmooney) @dcaro is there anything left to be done here? I see traffic profiled in the low and high classes across the cloud switc...
[14:57:38] Traffic, DNS, MediaWiki-Platform-Team, SRE, and 2 others: Set up auth.wikimedia.org - https://phabricator.wikimedia.org/T377187#10453514 (Tgr) a:Tgr
[15:01:37] Traffic, DNS, MediaWiki-Platform-Team, SRE, and 2 others: Set up auth.wikimedia.org - https://phabricator.wikimedia.org/T377187#10453543 (Tgr) Notes from @elukey on IRC: > 17:12 < elukey> IIUC the config needs to run on the deployment servers via puppet run, so the correspondent yaml files for he...
[15:48:30] netops, Infrastructure-Foundations: Multiple unreachable hosts in eqiad - https://phabricator.wikimedia.org/T382772#10453873 (cmooney) p:Triage→Low a:cmooney
[15:57:21] netops, Ceph, Infrastructure-Foundations, SRE, Patch-For-Review: Configure DSCP marking for cloudceph* hosts - https://phabricator.wikimedia.org/T371501#10453986 (dcaro) >>! In T371501#10453489, @cmooney wrote: > @dcaro is there anything left to be done here? I see traffic profiled in the lo...
[16:02:37] netops, Infrastructure-Foundations: peering issues with Meta? - https://phabricator.wikimedia.org/T383442#10454029 (cmooney) Open→Resolved a:cmooney Thanks for the task Daniel. I actually picked up on that email last week and re-enabled the sessions. So we should be ok here. The backgr...
[16:08:59] netops, fundraising-tech-ops, Infrastructure-Foundations, SRE: Manage frack switches with Netbox - https://phabricator.wikimedia.org/T268802#10454053 (cmooney)
[17:21:55] Traffic, DNS, SRE, Patch-For-Review: Remove leftover DNS from declined chapter wikis causing language Wikipedia to resolve incorrectly on a *.wikimedia.org - https://phabricator.wikimedia.org/T382730#10454523 (Dzahn) In progress→Resolved @Dylsss Thanks for reporting this! The 2 DNS r...
[17:25:48] Traffic, DNS, SRE, Patch-For-Review: Remove leftover DNS from declined chapter wikis causing language Wikipedia to resolve incorrectly on a *.wikimedia.org - https://phabricator.wikimedia.org/T382730#10454546 (Dylsss) Thanks for actioning!
[17:37:24] FIRING: SystemdUnitFailed: nginx.service on ncredir1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:42:24] RESOLVED: SystemdUnitFailed: nginx.service on ncredir1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:47:04] ^checking
[17:53:03] brett: that got solved already.. see -operations :D
[17:53:51] ay yi yi
[17:55:21] Traffic: puppet restarts nginx instead of reloading it on ncredir instances - https://phabricator.wikimedia.org/T383599 (Vgutierrez) NEW
[18:28:27] Domains, Traffic, SRE: Register wiki(m|p)edia.ro - https://phabricator.wikimedia.org/T222080#10455072 (Dzahn) New acme-chief config has been deployed and ncredir* hosts now have a TLS cert for wikimedia.ro and wikipedia.ro.
[19:56:57] sukhe: any objections to letting limit-by-path just roll itself out via puppet?
[19:58:19] cdanis: as long as you test it in one of (not eqiad/eqsin/ulsfo), which I know you will, no objections :)
[19:58:27] 👍
[20:18:54] https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?orgId=1&var-site=esams&var-instance=cp3080&from=now-1h&to=now looking good to me :)
[20:20:16] same https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=esams%20prometheus%2Fops&var-instance=cp3080&from=now-1h&to=now
[20:20:33] thanks! and yeah, I had a brief look and it looked good, at least the basic/obvious stuff for such a change (haproxy reloading, nothing else breaking in our usual checks)
[20:21:14] yeah and no obvious perf or cpu differences afaict
[20:21:26] thanks :)
[21:12:08] Traffic: puppet restarts nginx instead of reloading it on ncredir instances - https://phabricator.wikimedia.org/T383599#10455905 (BCornwall) Open→In progress p:Triage→Low
[21:17:40] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: WMF RIPE Atlas probe in Eqiad offline - https://phabricator.wikimedia.org/T382518#10455925 (VRiley-WMF) Open→In progress
[21:17:53] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: WMF RIPE Atlas probe in Eqiad offline - https://phabricator.wikimedia.org/T382518#10455927 (VRiley-WMF) Rebooting Now
[21:18:39] Traffic, Patch-For-Review: puppet restarts nginx instead of reloading it on ncredir instances - https://phabricator.wikimedia.org/T383599#10455928 (BCornwall) a:BCornwall
[21:23:04] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: WMF RIPE Atlas probe in Eqiad offline - https://phabricator.wikimedia.org/T382518#10455949 (VRiley-WMF) This has been rebooted @cmooney would you be able to check this when you have a chance?
[22:26:42] netops, Infrastructure-Foundations, observability, Observability-Alerting, SRE: Alertmanager rule for network interface errors? - https://phabricator.wikimedia.org/T335350#10456238 (andrea.denisse) Hi @cmooney, I noticed that patch 915489 has been merged. Do you know if there’s any remaining...
[22:28:33] Traffic, Patch-For-Review: puppet restarts nginx instead of reloading it on ncredir instances - https://phabricator.wikimedia.org/T383599#10456249 (BCornwall) In progress→Resolved Changed a file, ran puppet and observed an appropriate result: ` root@ncredir1001:/etc/nginx/conf.d# run-puppet-...
[22:50:37] Domains, Traffic, SRE: Register wiki(m|p)edia.ro - https://phabricator.wikimedia.org/T222080#10456325 (BCornwall) In progress→Resolved This is all done now. Thanks all!