[07:20:15] Traffic, netops, DBA, Data-Engineering-Planning, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[07:31:08] Traffic, netops, DBA, Data-Engineering-Planning, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui)
[07:56:52] Traffic, SRE, Patch-For-Review: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (Vgutierrez)
[08:03:23] Traffic, SRE: varnish test text/02-frontend-headers.vtc is currently failing in production - https://phabricator.wikimedia.org/T328898 (Vgutierrez)
[08:03:58] Traffic, SRE: varnish test text/02-frontend-headers.vtc is currently failing in production - https://phabricator.wikimedia.org/T328898 (Vgutierrez) p:Triage→Medium
[08:44:02] Traffic, netops, DBA, Data-Engineering-Planning, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ayounsi)
[08:49:46] Traffic, netops, DBA, Data-Engineering-Planning, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (MoritzMuehlenhoff)
[09:57:06] Traffic, SRE: varnish test text/02-frontend-headers.vtc is currently failing in production - https://phabricator.wikimedia.org/T328898 (Vgutierrez) This could be an intermittent issue as my latest run against a text node worked as expected: `0 tests failed, 0 tests skipped, 35 tests passed`
[09:57:57] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (akosiaris) >>! In T327925#8587186, @Marostegui wrote: >>>! In T327925#8587104, @Joe wrote: >> I would suggest that instead of handling individual systems, we d...
[10:11:37] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui) Cool! I am going to repool the hosts then :)
[10:21:07] Traffic, SRE, Patch-For-Review: Add DP cookie for pageview filtering - https://phabricator.wikimedia.org/T315676 (Vgutierrez) @Jcross @Htriedman https://gerrit.wikimedia.org/r/886337 got merged a few minutes ago, initial tests in cp6016 look good: ` vgutierrez@cp6016:~$ curl -v -o /dev/null "https://...
[10:51:43] Traffic, SRE: varnish test text/02-frontend-headers.vtc is currently failing in production - https://phabricator.wikimedia.org/T328898 (Vgutierrez) Open→Declined hmmm this is triggered by https://gerrit.wikimedia.org/r/c/operations/puppet/+/886401, error was caused by running a non rebased test s...
[11:38:37] I'd like to merge this https://gerrit.wikimedia.org/r/c/operations/puppet/+/883151/
[11:39:00] Is https://grafana.wikimedia.org/d/000000399/dns-recursors?orgId=1 the right dash to watch for possible issues?
[12:47:42] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (dcaro)
[12:51:50] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (Marostegui) I am repooling all the databases since we are going to fully depool codfw for reads.
[13:47:39] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (dcaro) So currently we can't take down all the osds on rack C8 (14), as we don't have enough space to allocate their data on...
[13:54:15] Traffic, Patch-For-Review, Performance-Team (Radar): unwind the Puppetized /etc/hosts override of statsd.eqiad.wmnet - https://phabricator.wikimedia.org/T239862 (fgiunchedi)
[14:09:55] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ssingh)
[14:12:14] Traffic, DC-Ops, SRE, ops-eqsin: eqsin hosts are not rebooting when running sre.hosts.reimage cookbook - https://phabricator.wikimedia.org/T327812 (ssingh) omething and so just writing it here: - The PXE boot worked fine for us in cases with the old firmware as well; the DHCP in d-i failed and t...
[14:39:48] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (MatthewVernon) If we're "just" depooling codfw it's worth noting we will still need to depool the affected ms-fe* nodes (since mw always tries to write to both...
[15:06:30] Traffic, Patch-For-Review, Performance-Team (Radar): unwind the Puppetized /etc/hosts override of statsd.eqiad.wmnet - https://phabricator.wikimedia.org/T239862 (Clement_Goubert) Is https://grafana.wikimedia.org/d/000000399/dns-recursors?orgId=1 the right dash to watch for possible issues after mergi...
[15:10:07] Traffic, Patch-For-Review, Performance-Team (Radar): unwind the Puppetized /etc/hosts override of statsd.eqiad.wmnet - https://phabricator.wikimedia.org/T239862 (CDanis) >>! In T239862#8589709, @Clement_Goubert wrote: > Is https://grafana.wikimedia.org/d/000000399/dns-recursors?orgId=1 the right dash...
[15:26:12] claime: yep, that's right
[15:26:17] (sorry about the delay)
[15:26:29] No worries
[15:26:38] I'm going to merge it now
[15:26:59] 🍿
[15:28:57] Merged 👀
[15:30:34] vgutierrez: I imagine mostly slow answers are to watch?
[15:31:32] and/or a spike in number of questions
[15:32:03] ack
[16:02:01] Traffic, SRE: Add DP cookie for pageview filtering - https://phabricator.wikimedia.org/T315676 (Htriedman) @Vgutierrez thanks so much! taking a look now
[17:22:03] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (jbond)
[18:09:57] Traffic, DC-Ops, SRE, ops-eqsin: eqsin hosts are not rebooting when running sre.hosts.reimage cookbook - https://phabricator.wikimedia.org/T327812 (BCornwall) > I guess the main question is for the hosts in eqsin that failed where you restarted the cookbook, was there a firmware upgrade in betwee...
[18:33:14] Traffic, SRE: oom killed varnish on cp4052 - https://phabricator.wikimedia.org/T325797 (BBlack) Open→Resolved a:BBlack
[22:51:59] Traffic, netops, DBA, Data-Engineering-Planning, and 12 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=e0e96453-af13-467f-a75e-ebd1c4122a32) set by bking@cumin2002 for 1 day, 0:00:00 on 13 host(s)...