[03:10:19] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (ori) Unfortunately `tasks_per_second` was only added in 2.27, and we're running 2.10.
[06:13:31] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (Marostegui)
[06:13:41] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (Marostegui)
[06:15:07] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (Marostegui) Both masters, s7 and x1, have been switched over and no longer live in this rack.
[06:26:23] Traffic, Performance-Team, SRE, serviceops, Patch-For-Review: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling)
[06:39:32] netops, Infrastructure-Foundations, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (ayounsi) p: Triage→Medium
[06:41:18] netops, Infrastructure-Foundations, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (ayounsi)
[06:41:26] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (ayounsi)
[06:42:24] netops, Infrastructure-Foundations, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (ayounsi)
[06:42:32] netops, Infrastructure-Foundations, SRE: Packet Drops on Eqiad ASW -> CR uplinks - https://phabricator.wikimedia.org/T291627 (ayounsi)
[07:08:56] netops, Infrastructure-Foundations, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (ayounsi)
[07:52:27] jbond: happy to resume https://gerrit.wikimedia.org/r/c/operations/puppet/+/815728 when you're ready
[07:58:00] Traffic, SRE, Security-Team, SecTeam-Processed, Security: US Department of Homeland Security (DHS) IP blocks - https://phabricator.wikimedia.org/T303055 (ayounsi) Open→Resolved Thank you all, network block removed.
[08:44:32] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (MoritzMuehlenhoff) >>! In T211661#8093119, @ori wrote: > Unfortunately `tasks_per_second` was only added in 2.27, and we're runni...
[09:11:21] vgutierrez: great, looking now, will ping when ready
[09:15:13] vgutierrez: ready to disable puppet and start the deploy when you give the green light :)
[09:16:25] go ahead :)
[09:17:34] cool thx
[11:30:30] I wondered what I'd done then :P
[12:11:19] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (fgiunchedi) What @MoritzMuehlenhoff said (though we'll be upgrading to 2.26). At any rate `object-expirer` will remove the actual...
[13:14:45] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (Cmjohnson)
[13:14:53] netops, Infrastructure-Foundations, SRE: Move asw2-d5-eqiad to spares - https://phabricator.wikimedia.org/T313115 (Cmjohnson)
[15:29:55] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (ori) The reason ratelimiting via `tasks_per_second` was introduced (per the [[ https://bugs.launchpad.net/swift/+bug/1784753 | bu...
[17:37:31] Traffic, SRE: pontoon.traffic.eqiad1.wikimedia.cloud unable to run puppet agent due to certificate mismatch - https://phabricator.wikimedia.org/T310303 (BCornwall) Open→Resolved The instances have been replaced.
[17:44:52] Traffic, SRE: DRMRS: Geodns Configuration -- Phase 2 - https://phabricator.wikimedia.org/T311472 (BCornwall) Open→In progress
[17:44:58] Traffic, netops, Infrastructure-Foundations, SRE, Patch-For-Review: drmrs: initial geodns configuration - https://phabricator.wikimedia.org/T304089 (BCornwall)
[17:45:23] Traffic, SRE: DRMRS: Geodns Configuration -- Phase 2 - https://phabricator.wikimedia.org/T311472 (BCornwall) p: Triage→Medium
[17:56:00] Hello traffic folks! I'm trying to add some new web nodes and could use a hand with the last few pieces. This is probably where the trail starts... https://gerrit.wikimedia.org/r/c/operations/puppet/+/815378/3/conftool-data/node/eqiad.yaml
[17:56:16] I'm guessing that now I have to run some conftool commands to get the new hosts with non-0 weight?
[18:01:06] andrewbogott: confctl select name=<hostname> set/weight=10
[18:01:13] confctl select name=<hostname> set/pooled=yes
[18:01:21] nice, trying...
[18:01:34] do 'pool/depool' commands still work as well? Once the weight is set?
[18:02:17] hmm IIRC "pool" should do the trick for you as well
[18:02:54] nah.. pool will complain
[18:02:58] ok. It tells me the weight is 0 which makes sense
[18:03:07] confctl is throwing an exception though :(
[18:03:14] Probably I've failed to config something important
[18:03:22] why?
[18:03:39] https://www.irccloud.com/pastebin/mB974H2e/
[18:04:00] oh, perhaps I need to run confctl on an etcd node, not on the webserver?
[18:04:36] {"cloudweb1003.wikimedia.org": {"weight": 10, "pooled": "no"}, "tags": "dc=eqiad,cluster=labweb,service=labweb"}
[18:05:04] it worked
[18:05:08] looks like the irc logging code is timing out as it can't talk to logmsgbot?
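[Editor's note: the confctl output pasted at 18:04:36 above is a single JSON object per host. A minimal sketch of how such a line could be parsed to check weight and pooled state before pooling — `parse_confctl_line` is a hypothetical helper for illustration, not part of conftool itself:]

```python
import json

def parse_confctl_line(line: str) -> dict:
    """Parse one JSON object as printed by `confctl select ... get`.

    Illustrative helper (not part of conftool): after removing the
    "tags" key, a single host entry remains in the object.
    """
    obj = json.loads(line)
    tags = obj.pop("tags", "")
    (host, state), = obj.items()
    return {"host": host, "weight": state["weight"],
            "pooled": state["pooled"], "tags": tags}

# The line pasted in the log above:
line = ('{"cloudweb1003.wikimedia.org": {"weight": 10, "pooled": "no"}, '
        '"tags": "dc=eqiad,cluster=labweb,service=labweb"}')
info = parse_confctl_line(line)
print(info["host"], info["weight"], info["pooled"])  # → cloudweb1003.wikimedia.org 10 no
```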
[18:05:13] it looks to me that it failed to log it, yeah
[18:05:35] huh. Well ok, maybe I don't care about that :)
[18:06:03] on WMCS I'd say that you wanna !log pooling a new server but.. 🤷‍♂️
[18:06:15] Yep, I'll log by hand
[18:06:16] sorry.. on production, dunno about WMCS :)
[18:08:53] BTW those instances don't seem to be healthy
[18:08:59] Jul 21 18:08:09 lvs1020 pybal[34158]: [labweb-ssl_7443 ProxyFetch] WARN: cloudweb1003.wikimedia.org (enabled/partially up/not pooled): Fetch failed (https://wikitech.wikimedia.org:7443), 1.085 s
[18:08:59] Jul 21 18:08:11 lvs1020 pybal[34158]: [labweb-ssl_7443 ProxyFetch] WARN: cloudweb1004.wikimedia.org (enabled/up/pooled): Fetch failed (https://wikitech.wikimedia.org:7443), 1.070 s
[18:08:59] Jul 21 18:08:11 lvs1020 pybal[34158]: [labweb-ssl_7443] ERROR: Monitoring instance ProxyFetch reports server cloudweb1004.wikimedia.org (enabled/up/pooled) down: 500 Internal Server Error
[18:09:00] Jul 21 18:08:20 lvs1020 pybal[34158]: [labweb-ssl_7443 ProxyFetch] WARN: cloudweb1003.wikimedia.org (enabled/partially up/not pooled): Fetch failed (https://wikitech.wikimedia.org:7443), 1.083 s
[18:10:05] yeah
[18:10:22] it's still progress :) I'm investigating.
[18:10:53] looks like we're missing the mediawiki database grants for the new servers
[18:11:25] taavi: likely, although more things are broken than that.
[18:13:14] ok, so you might want to keep them as inactive to avoid pybal considering them for depooling threshold purposes
[18:14:21] cause right now, assuming the default 0.5 depool threshold, if one of your healthy servers fails, pybal will force it to stay pooled even if it fails
[18:14:42] ok. I'm going to hack for a few minutes first but I'll depool them before leaving anything
[18:14:58] Thank you for the quick/easy answer vgutierrez !
[18:16:21] no problem
[18:22:27] ah, if the health check is /only/ hitting wikitech then maybe the db access really is the only issue...
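[Editor's note: the depool-threshold behaviour vgutierrez describes at 18:13–18:14 can be sketched as follows. This is a simplified model of the rule "don't depool below threshold × total servers", not PyBal's actual implementation:]

```python
def can_depool(total_servers: int, currently_pooled: int,
               depool_threshold: float = 0.5) -> bool:
    """Simplified model of PyBal's depool-threshold check.

    PyBal refuses to depool a failing server if doing so would leave
    fewer than depool_threshold * total_servers servers pooled.
    Illustrative sketch only, not PyBal's real code.
    """
    min_pooled = total_servers * depool_threshold
    return (currently_pooled - 1) >= min_pooled

# With 2 servers and the default 0.5 threshold, the first failing
# server may be depooled, but a second failure is forced to stay
# pooled even though its health checks fail:
print(can_depool(2, 2))  # True: 1 would remain, minimum is 1
print(can_depool(2, 1))  # False: 0 would remain, below the minimum
```

This is why keeping the unhealthy new hosts inactive (rather than pooled) matters: inactive hosts don't count toward `total_servers`, so they can't drag a genuinely failing server above the threshold.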
[18:25:43] you got two healthchecks on port 7443
[18:25:53] IdleConnection that keeps a TCP socket open against port 7443
[18:26:04] and ProxyFetch against https://wikitech.wikimedia.org:7443/
[18:26:21] and considers a 301 as a healthy HTTP status code
[18:27:43] ok. So if wikitech.wikimedia.org is down it won't send traffic to the other sites either, that fits what I was seeing.
[18:27:44] https://github.com/wikimedia/puppet/blob/production/hieradata/common/service.yaml#L1049-L1056 -> that's the config
[18:27:52] andrewbogott: indeed
[18:27:54] depooled for now, pending the db grants
[23:33:58] Traffic, DNS, SRE, WMF-Legal, and 4 others: Setup redirect of policy.wikimedia.org to Advocacy portal on Foundation website - https://phabricator.wikimedia.org/T310738 (Dzahn) This has been deployed to all appservers and passes the tests in redirects and all other tests on all the hosts: ` [depl...
[23:40:33] Traffic, DNS, SRE, WMF-Legal, and 4 others: Setup redirect of policy.wikimedia.org to Advocacy portal on Foundation website - https://phabricator.wikimedia.org/T310738 (Dzahn) test from external: ` curl -H "Host: policy.wikimedia.org" https://dyna.wikimedia.org .. The document has moved
Traffic, DNS, SRE, WMF-Legal, and 4 others: Setup redirect of policy.wikimedia.org to Advocacy portal on Foundation website - https://phabricator.wikimedia.org/T310738 (Dzahn) >>! In T310738#8070654, @Varnent wrote: > Just wanted to check on if there is anything else you are waiting from me on. I...
[23:59:22] Traffic, DNS, SRE, WMF-Legal, and 4 others: Setup redirect of policy.wikimedia.org to Advocacy portal on Foundation website - https://phabricator.wikimedia.org/T310738 (Dzahn) Open→In progress p: Medium→High