[00:20:26] good point. then the question is just.. does anyone at WMF actually get those emails if that would happen with certs from acme_chief. if traffic does and that's good enough then I would keep saying subteams don't need to worry about it for each of their certs
[00:20:35] laters
[01:45:44] (VarnishHighThreadCount) firing: Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5021 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[01:50:44] (VarnishHighThreadCount) firing: (14) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[02:00:44] (VarnishHighThreadCount) firing: (16) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[02:20:44] (VarnishHighThreadCount) firing: (10) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[02:30:44] (VarnishHighThreadCount) firing: (18) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[02:50:44] (VarnishHighThreadCount) resolved: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[04:41:38] Traffic, Infrastructure-Foundations, SRE, Performance-Team (Radar): GeoIP mapping experiments - https://phabricator.wikimedia.org/T332024 (Krinkle) (I'm responding here in response to an email to the Performance Team.) This is an exciting project to see happen. We love measuring stuff and are happ...
[06:12:00] mutante: we don't have specific checks for that but acmechief will attempt to reissue any configured certificate 1 month before its expiration date
[06:13:19] but that won't cover certain glitches like a service failing to use a new TLS cert after renewal
[06:21:46] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Allow managing drmrs DHCP settings with Homer - https://phabricator.wikimedia.org/T328737 (ayounsi) a:ayounsi→cmooney
[06:30:08] netops, Infrastructure-Foundations, SRE: Add generic mechanism to add static routes on switches - https://phabricator.wikimedia.org/T334281 (cmooney) Open→Resolved
[06:55:59] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ayounsi) a:ayounsi→cmooney
[07:04:01] Traffic, netops, DBA, Data-Engineering, and 8 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (ayounsi) a:cmooney
[07:04:44] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:09:44] (VarnishHighThreadCount) firing: (11) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:19:44] (VarnishHighThreadCount) firing: (11) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:29:44] (VarnishHighThreadCount) firing: (10) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:42:14] Traffic, Patch-For-Review: Test ESI feasibility with current Varnish installation - https://phabricator.wikimedia.org/T308799 (Vgutierrez) In progress→Resolved We didn't experience any major issues with ESI enabled under normal load but under heavy traffic nodes running the ESI experiment starve...
[07:49:44] (VarnishHighThreadCount) firing: (16) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[08:09:44] (VarnishHighThreadCount) resolved: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[08:39:51] Traffic, SRE, Wikidata, wdwb-tech, wmde-wikidata-tech: Wikidata seems to still be utilizing insecure HTTP URIs - https://phabricator.wikimedia.org/T331356 (ItamarWMDE)
[09:06:18] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Clement_Goubert)
[11:22:38] Traffic, netops, DBA, Data-Engineering, and 8 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (Vgutierrez)
[13:42:12] Traffic, SRE, conftool, serviceops: Pybal maintenances break safe-service-restart.py (and thus prevent scap deploys of mediawiki) - https://phabricator.wikimedia.org/T334703 (Clement_Goubert) For future reference, this left 89 out of 280 appservers and 9 out of 20 parsoid servers depooled in codf...
[13:54:22] Traffic, RESTBase, Wikipedia-iOS-App-Backlog, iOS-app-feature-Performance, and 3 others: PCS caching and pregeneration when restbase is decommissioned - https://phabricator.wikimedia.org/T319365 (Jgiannelos)
[13:54:57] Traffic, RESTBase, Wikipedia-iOS-App-Backlog, iOS-app-feature-Performance, and 3 others: PCS caching and pregeneration when restbase is decommissioned - https://phabricator.wikimedia.org/T319365 (Jgiannelos) Just an update after experimenting with PCS/Summary not using pregenerated content in pro...
[14:05:14] Traffic, SRE: Deprecate pybal test hosts pybal-test200[12] - https://phabricator.wikimedia.org/T334745 (ssingh)
[14:38:12] Traffic, SRE: Deprecate pybal test hosts pybal-test200[12] - https://phabricator.wikimedia.org/T334745 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin2002 for hosts: `pybal-test2001.codfw.wmnet` - pybal-test2001.codfw.wmnet (**PASS**) - Downtimed host on Icinga/Alertmanage...
[14:44:26] Traffic, Infrastructure-Foundations, SRE, Performance-Team (Radar): GeoIP mapping experiments - https://phabricator.wikimedia.org/T332024 (CDanis) >>! In T332024#8780999, @Krinkle wrote: > (I'm responding here in response to an email to the Performance Team.) > > This is an exciting project to se...
[14:49:41] Traffic, SRE: Deprecate pybal test hosts pybal-test200[12] - https://phabricator.wikimedia.org/T334745 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin2002 for hosts: `pybal-test2002.codfw.wmnet` - pybal-test2002.codfw.wmnet (**PASS**) - Downtimed host on Icinga/Alertmanage...
[15:12:51] Traffic, SRE: Deprecate pybal test hosts pybal-test200[12] - https://phabricator.wikimedia.org/T334745 (ssingh) Open→Resolved a:ssingh Hosts decommissioned and removed from Puppet.
[15:14:02] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh)
[15:24:46] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh)
[15:37:00] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs1013.eqiad.wmnet with OS bullseye
[15:38:30] Traffic, Upstream: HAProxy 2.6.12 segfaults - https://phabricator.wikimedia.org/T334448 (Vgutierrez) upstream has decided to rollback the commit triggering the underlying issue: http://git.haproxy.org/?p=haproxy-2.6.git;a=commit;h=d66823ece6e40cf27dca767591097f13d9aac57b
[15:55:47] netops, Infrastructure-Foundations, SRE, cloud-services-team: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (fnegri)
[16:06:17] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs1013.eqiad.wmnet with OS bullseye completed: - lvs1013 (**PASS**) - Downtimed on Icinga/Aler...
[16:12:04] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh)
[16:47:27] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs1015.eqiad.wmnet with OS bullseye
[17:15:52] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs1015.eqiad.wmnet with OS bullseye completed: - lvs1015 (**PASS**) - Downtimed on Icinga/Aler...
[17:16:20] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh)
[17:17:31] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs1014.eqiad.wmnet with OS bullseye
[17:29:45] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs1016.eqiad.wmnet with OS bullseye
[17:53:50] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs1014.eqiad.wmnet with OS bullseye completed: - lvs1014 (**PASS**) - Downtimed on Icinga/Aler...
[17:57:10] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs1016.eqiad.wmnet with OS bullseye completed: - lvs1016 (**PASS**) - Downtimed on Icinga/Aler...
[18:26:44] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[18:35:20] Traffic, Infrastructure-Foundations, SRE, Performance-Team (Radar): GeoIP mapping experiments - https://phabricator.wikimedia.org/T332024 (JameelKaisar) First of all thank you Timo and Chris for the detailed information. ## Measurement domain - The shuffling the targets/domains part is implemen...
[18:51:54] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh)
[19:08:09] Traffic, Infrastructure-Foundations, SRE, Performance-Team (Radar): GeoIP mapping experiments - https://phabricator.wikimedia.org/T332024 (BBlack) It's awesome to see this moving along! One minor point: >> This would then be immediately queryable in Grafana by DC and Country code, where you c...
[19:37:45] (HAProxyRestarted) firing: HAProxy server restarted on cp2033:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=codfw%20prometheus/ops&var-instance=cp2033&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[21:09:02] Traffic, netops, DBA, Data-Engineering, and 8 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (colewhite)
[23:38:00] (HAProxyRestarted) firing: HAProxy server restarted on cp2033:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=codfw%20prometheus/ops&var-instance=cp2033&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted