[04:02:22] Traffic, SRE, serviceops, Performance-Team (Radar): Reconcile MediaWiki POST timeout and Varnish/ATS timeouts - https://phabricator.wikimedia.org/T294800 (Krinkle)
[06:41:59] Traffic, SRE, serviceops, Performance-Team (Radar): Reconcile MediaWiki POST timeout and Varnish/ATS timeouts - https://phabricator.wikimedia.org/T294800 (Joe) If anything, I think we should go in the other direction, and progressively and drastically reduce our timeouts for any synchronous reque...
[09:48:55] Good morning, as I missed Fri and Mon, what's new on the drmrs side? Anything I could do or should look at wrt reimages or similar? (cc XioNoX, topranks, mmandere)
[09:49:26] I was out yesterday too
[09:49:40] so I'm interested too :)
[09:54:38] I think traffic are hoping to try to image one of the dns hosts later today or tomorrow. Marc is taking care of it to get the experience so they aren’t rushing it.
[09:55:07] Would be worth testing imaging and dhcp relay when we can.
[09:55:41] Also they will need us to announce the public IP ranges so those hosts can talk to internet when they are up.
[09:57:32] ack, anyone involved feel free to ping me in case of issues, the first reimage might hit some hiccups due to missing configs for the new DC
[10:58:00] volans: we'll update you... so currently working on the ulsfo as testing ground before we try the dns in drmrs
[11:36:00] ack thanks
[12:12:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp4033:9331 is unreachable - https://alerts.wikimedia.org
[12:17:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp4033:9331 is unreachable - https://alerts.wikimedia.org
[12:19:36] ^ we're currently reimaging the instance, safe to ignore
[13:32:11] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:(Need By: TBD) rack/setup/install cp403[3-6].ulsfo.wmnet - https://phabricator.wikimedia.org/T290694 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp4035.ulsfo.wmnet with OS buster
[13:41:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp4035:9331 is unreachable - https://alerts.wikimedia.org
[13:44:22] ^^ being reimaged as we speak
[13:44:41] ^^ volans I guess that the reimage script should be able to downtime those as well
[13:44:52] s/script/cookbook/
[13:45:08] vgutierrez: yes, not yet though, there is a task to add support for alertmanager to spicerack
[13:45:16] thx <3
[13:45:25] I was made aware that we're starting to use alertmanager for ops only very recently
[13:45:41] I'll try to find the time at some point, should not be hard
[13:46:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp4035:9331 is unreachable - https://alerts.wikimedia.org
[14:24:27] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:(Need By: TBD) rack/setup/install cp403[3-6].ulsfo.wmnet - https://phabricator.wikimedia.org/T290694 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp4035.ulsfo.wmnet with OS buster completed: - cp4035 (**WARN**...
[14:31:21] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:(Need By: TBD) rack/setup/install cp403[3-6].ulsfo.wmnet - https://phabricator.wikimedia.org/T290694 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[14:34:59] vgutierrez, mmandere: FYI unless there is something specifically needed on your side, the reimage script can take already care of depooling hosts from conftool
[14:35:11] I've seen the current reimages are finding the hosts already depooled
[14:37:30] volans: new hosts that haven't been pooled ever
[14:37:45] that explains :)
[14:37:49] but --new doesn't apply cause they're already on PuppetDB
[14:38:35] ack, I've sent a patch as the last confctl message in https://phabricator.wikimedia.org/T290694#7474784 is a bit confusing
[14:38:41] https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/736239
[14:40:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp4034:9331 is unreachable - https://alerts.wikimedia.org
[14:50:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp4034:9331 is unreachable - https://alerts.wikimedia.org
[15:17:59] netops, Infrastructure-Foundations: Management routers: use BGP instead of OSPF - https://phabricator.wikimedia.org/T294845 (ayounsi) p:Triage→Medium
[15:22:13] netops, Infrastructure-Foundations: Management routers: use BGP instead of OSPF - https://phabricator.wikimedia.org/T294845 (ayounsi) First draft here, v6 is not complete, but it's more to agree on the general direction, @cmooney, let me know what you think. Once configured and confirmed working, I'll s...
[15:32:12] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:(Need By: TBD) rack/setup/install cp403[3-6].ulsfo.wmnet - https://phabricator.wikimedia.org/T290694 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp4034.ulsfo.wmnet with OS buster completed: - cp4034 (**WARN**...
[15:38:58] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:(Need By: TBD) rack/setup/install cp403[3-6].ulsfo.wmnet - https://phabricator.wikimedia.org/T290694 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp4036.ulsfo.wmnet with OS buster
[16:30:28] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:(Need By: TBD) rack/setup/install cp403[3-6].ulsfo.wmnet - https://phabricator.wikimedia.org/T290694 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp4036.ulsfo.wmnet with OS buster completed: - cp4036 (**WARN**...
[16:48:04] volans: for the dns reimaging, we're good to go tomorrow. We're now done reimaging the 4 instances in ulsfo
[16:54:38] mmandere: ack, thanks for the update!
[20:55:43] Traffic, Analytics-Radar, SRE, WMF-General-or-Unknown, and 2 others: Requests for /static get an invalid WMF-Last-Access cookie for wikipedia.org on non-Wikipedia requests - https://phabricator.wikimedia.org/T261803 (Krinkle)
[21:05:47] Traffic, Analytics-Radar, SRE, WMF-General-or-Unknown, and 2 others: Requests for /static get an invalid WMF-Last-Access cookie for wikipedia.org on non-Wikipedia requests - https://phabricator.wikimedia.org/T261803 (Krinkle) The URLs that use w/extensions, w/skins, and w/resources are also used...