[07:13:31] Traffic, Content-Transform-Team, Data-Persistence: ATSBackendErrorsHigh cache_text sre (restbase.discovery.wmnet eqiad) - https://phabricator.wikimedia.org/T359234#9604644 (Joe) Hi @Eevans, I'm a bit perplexed by why you think serviceops should be able to assist with this issue. This seems like an ap...
[07:37:27] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Create parsoid mediawiki deployment and migrate parsoid-php.discovery.wmnet traffic to it - https://phabricator.wikimedia.org/T357392#9604693 (akosiaris)
[07:41:42] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604721 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse1014.eqiad.wmnet with OS bullseye
[09:46:13] Traffic, Content-Transform-Team, Data-Persistence: ATSBackendErrorsHigh cache_text sre (restbase.discovery.wmnet eqiad) - https://phabricator.wikimedia.org/T359234#9604786 (Jgiannelos) a:Jgiannelos
[09:49:25] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604826 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse1014.eqiad.wmnet with OS bullseye comp...
[09:50:17] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604842 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2008.codfw.wmnet with OS bullseye
[09:50:37] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604847 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2009.codfw.wmnet with OS bullseye
[09:51:05] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604848 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2010.codfw.wmnet with OS bullseye
[09:51:33] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604849 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2011.codfw.wmnet with OS bullseye
[09:52:01] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604850 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2012.codfw.wmnet with OS bullseye
[09:52:37] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604852 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2013.codfw.wmnet with OS bullseye
[09:54:21] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604859 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2014.codfw.wmnet with OS bullseye
[09:54:50] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604860 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2015.codfw.wmnet with OS bullseye
[10:02:28] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604968 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2009.codfw.wmnet with OS bullseye comp...
[10:03:33] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9604974 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2008.codfw.wmnet with OS bullseye comp...
[10:04:37] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9605005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2013.codfw.wmnet with OS bullseye comp...
[10:06:45] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9605029 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2015.codfw.wmnet with OS bullseye comp...
[10:07:54] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9605037 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2012.codfw.wmnet with OS bullseye comp...
[10:09:22] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9605060 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2014.codfw.wmnet with OS bullseye comp...
[10:10:15] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9605070 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2010.codfw.wmnet with OS bullseye comp...
[10:11:35] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9605090 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2011.codfw.wmnet with OS bullseye comp...
[10:15:56] Traffic, Content-Transform-Team, Data-Persistence: ATSBackendErrorsHigh cache_text sre (restbase.discovery.wmnet eqiad) - https://phabricator.wikimedia.org/T359234#9605122 (hnowlan) Is this the same bug as T356369?
[10:29:34] Traffic, Content-Transform-Team, Data-Persistence, Patch-For-Review: ATSBackendErrorsHigh cache_text sre (restbase.discovery.wmnet eqiad) - https://phabricator.wikimedia.org/T359234#9605273 (Jgiannelos) @hnowlan similar issue. I just sent a patch that fixes it.
[10:44:25] (SystemdUnitFailed) firing: benthos@haproxy_cache.service on cp4037:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:49:25] (SystemdUnitFailed) resolved: benthos@haproxy_cache.service on cp4037:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:05:07] ^^ this is me, already fixed
[11:10:02] Traffic, Data Products, Data-Engineering, Observability-Logging, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9605542 (Fabfur) Update: Benthos is installed on cp4037 and after some minor fixes, is finally ready to ingest, process and...
[11:24:45] Traffic, Content-Transform-Team, Data-Persistence, Patch-For-Review: ATSBackendErrorsHigh cache_text sre (restbase.discovery.wmnet eqiad) - https://phabricator.wikimedia.org/T359234#9605611 (Jgiannelos) I just deployed a patch that should improve things for this issue: From production: ` jgiannel...
[11:25:15] Traffic, Content-Transform-Team, Content-Transform-Team-WIP, Data-Persistence, Patch-For-Review: ATSBackendErrorsHigh cache_text sre (restbase.discovery.wmnet eqiad) - https://phabricator.wikimedia.org/T359234#9605618 (Jgiannelos)
[12:40:46] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606067 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2002.codfw.wmnet with OS bullseye
[12:41:33] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606068 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2003.codfw.wmnet with OS bullseye
[12:42:06] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606070 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2004.codfw.wmnet with OS bullseye
[12:42:43] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606072 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2005.codfw.wmnet with OS bullseye
[12:43:16] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606073 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2006.codfw.wmnet with OS bullseye
[12:44:00] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606080 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host parse2007.codfw.wmnet with OS bullseye
[13:17:58] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606271 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2002.codfw.wmnet with OS bullseye comp...
[13:20:37] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606285 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2005.codfw.wmnet with OS bullseye comp...
[13:24:45] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606341 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2003.codfw.wmnet with OS bullseye comp...
[13:25:01] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606342 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2006.codfw.wmnet with OS bullseye comp...
[13:27:23] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606371 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2007.codfw.wmnet with OS bullseye comp...
[13:30:24] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606405 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host parse2004.codfw.wmnet with OS bullseye comp...
[14:20:11] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Create parsoid mediawiki deployment and migrate parsoid-php.discovery.wmnet traffic to it - https://phabricator.wikimedia.org/T357392#9606851 (akosiaris)
[14:23:31] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606911 (akosiaris) Almost all parsoid hosts have been reimaged as kubernetes nodes. Scandium, testreduce1002, parse1001 and parse1002 being the exce...
[14:24:35] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Create parsoid mediawiki deployment and migrate parsoid-php.discovery.wmnet traffic to it - https://phabricator.wikimedia.org/T357392#9606936 (akosiaris)
[14:26:23] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 2 others: Reimage parse* hosts as kubernetes nodes - https://phabricator.wikimedia.org/T358752#9606934 (akosiaris) Open→Resolved
[14:34:37] Traffic, Content-Transform-Team, MW-on-K8s, SRE, and 3 others: Create parsoid mediawiki deployment and migrate parsoid-php.discovery.wmnet traffic to it - https://phabricator.wikimedia.org/T357392#9607078 (akosiaris) In progress→Resolved
[14:35:17] Traffic, MW-on-K8s, SRE, serviceops, Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536#9607081 (akosiaris)
[14:44:27] Traffic, Content-Transform-Team, Content-Transform-Team-WIP, Data-Persistence: ATSBackendErrorsHigh cache_text sre (restbase.discovery.wmnet eqiad) - https://phabricator.wikimedia.org/T359234#9607232 (Eevans) >>! In T359234#9604644, @Joe wrote: > Hi @Eevans, I'm a bit perplexed by why you think s...
[15:14:33] Traffic, Content-Transform-Team, Content-Transform-Team-WIP, Data-Persistence: ATSBackendErrorsHigh cache_text sre (restbase.discovery.wmnet eqiad) - https://phabricator.wikimedia.org/T359234#9607423 (Jgiannelos) Open→Resolved Error rate seems to be at previous levels after deploying the...
[15:34:40] (VarnishHighThreadCount) firing: Varnish's thread count on cp1106:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqiad&var-instance=cp1106 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:39:41] (VarnishHighThreadCount) firing: (2) Varnish's thread count on cp1106:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:44:40] (VarnishHighThreadCount) firing: (2) Varnish's thread count on cp1106:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:54:41] (VarnishHighThreadCount) firing: (8) Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:17:23] the maps tile invalidation service just failed sending data to eventgate - interestingly it was a connection refused rather than a 503 but possibly related to T249745. We had a spike in 503 errors for jobs failing to enqueue again today
[16:17:24] T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745
[16:34:40] (VarnishHighThreadCount) firing: (16) Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:54:41] (VarnishHighThreadCount) resolved: (8) Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[17:44:43] oops, just realised the above was in the wrong channel :)
[18:32:22] Traffic, netops, Infrastructure-Foundations, SRE: Support PyBal routes announced with lower priority than "backup" - https://phabricator.wikimedia.org/T354839#9608338 (cmooney) p:Medium→Low
[19:36:41] (VarnishHighThreadCount) firing: (8) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[19:46:40] (VarnishHighThreadCount) firing: (12) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[19:51:41] (VarnishHighThreadCount) firing: (15) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[19:56:41] (VarnishHighThreadCount) firing: (15) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:06:41] (VarnishHighThreadCount) firing: (11) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:11:41] (VarnishHighThreadCount) firing: (9) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:16:41] (VarnishHighThreadCount) firing: (9) Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:26:40] (VarnishHighThreadCount) resolved: (3) Varnish's thread count on cp5021:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount