[07:28:10] volans: considering we want the original reason rather than the one massaged by Spicerack.admin_reason, is there any motive to instantiate admin_reason rather than use args.puppet_reason directly?
[07:30:14] vgutierrez: actually my suggestion from yesterday would not work
[07:30:42] so https://doc.wikimedia.org/spicerack/master/api/spicerack.puppet.html#spicerack.puppet.PuppetHosts.run wants a spicerack.administrative.Reason as reason argument
[07:31:24] ok, so we should document somehow that puppet needs to be disabled in the way that spicerack admin_reason expects
[07:32:24] and disable-puppet adds the user
[07:32:44] is SUDO_USER is set, so not when run from cumin
[07:32:48] hmmm via cumin it doesn't
[07:32:50] s/is/if/
[07:33:08] so yeah just use a reason that follows
[07:33:08] https://doc.wikimedia.org/spicerack/master/api/spicerack.administrative.html#spicerack.administrative.Reason.__str__
[07:33:20] sorry about that
[07:38:38] no problem
[07:38:58] I guess we should pass the TASK ID to the administrative reason as well
[07:39:36] cause as a user of the cookbook I'd expect that setting the task id would result in the task id being used on the puppet reason
[07:43:24] up to you, pick one of the two incantations and stick with it :)
[07:57:35] fabfur: ^^
[08:05:27] really don't know, maybe we could assume that if the "task id" is not specified, the puppet reason will be used as puppet id
[08:05:47] don't know if also the opposite could be done
[08:09:48] you can freely pass the task id from args to the puppet reason instance and know that if you set it you have to disable puppet with: "Given reason - username@hostname - Txxxxx" and if not with "Given reason - username@hostname"
[08:09:52] that's it
[08:11:42] vgutierrez: if it's ok for you we could test it
[08:12:00] fabfur: what do you wanna test?
[08:12:16] the cookbook on cp2042
[08:12:40] fabfur: don't you wanna merge https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/924590 first?
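The two disable-puppet incantations agreed on at 08:09:48 can be illustrated with a minimal Python sketch. `format_admin_reason` is a hypothetical helper, not Spicerack's API. It only assumes the "Given reason - username@hostname" layout, with " - Txxxxx" appended when a task ID is set, as quoted in the log, and it is not a copy of `spicerack.administrative.Reason.__str__`. The username, hostname and task ID values are placeholders.

```python
from typing import Optional


def format_admin_reason(reason: str, username: str, hostname: str,
                        task_id: Optional[str] = None) -> str:
    """Build a puppet disable reason in the layout quoted in the log.

    Hypothetical helper: "Given reason - username@hostname", plus
    " - Txxxxx" only when a task ID is given. Not Spicerack's own code.
    """
    parts = [reason, f"{username}@{hostname}"]
    if task_id:  # append the task ID only when one was actually provided
        parts.append(task_id)
    return " - ".join(parts)


# Placeholder values, for illustration only.
print(format_admin_reason("Given reason", "username", "hostname"))
# Given reason - username@hostname
print(format_admin_reason("Given reason", "username", "hostname", task_id="T123456"))
# Given reason - username@hostname - T123456
```

Under this assumption, the reason passed to disable-puppet would need to match the string the cookbook builds, including the optional task ID suffix, for the cookbook's puppet run to accept it.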
[08:12:58] we should get a clean dry-run after that
[08:13:11] yep
[08:37:12] Traffic, Data-Engineering: Move varnishkafka to PKI - https://phabricator.wikimedia.org/T337825 (elukey)
[08:37:31] Traffic, Data-Engineering, Data-Platform-SRE: Move varnishkafka to PKI - https://phabricator.wikimedia.org/T337825 (elukey)
[09:13:40] volans: slightly annoying feature.. if the cookbook targets just one host (--query P{cp2042.codfw.wmnet}) it still enforces the grace sleep
[09:14:30] 301 jbond ^^^
[09:15:26] patches are welcome, it's just an if away on cookbooks/sre/__init__.py
[09:15:37] I think around line 417
[09:25:50] FYI https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/924897
[09:26:40] .thx
[09:34:33] jbond: <3
[09:36:16] Hi. Would it be okay if I moved a service to lvs_setup today? https://gerrit.wikimedia.org/r/c/operations/puppet/+/920664
[09:36:29] * jbond merged
[09:39:21] hnowlan: yep, https://wikitech.wikimedia.org/wiki/Deployments#Wednesday,_May_31 looks clear
[09:44:17] hnowlan: just mind the next deployment starting in 18 minutes :)
[09:47:48] vgutierrez: good point :) I can probably get it done, thanks!
[09:53:16] going ahead with secondaries
[09:55:14] actually, no - at this point I will wait until the deployments are done.
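The 09:13:40 complaint is that a single-host run still pays the grace sleep, and the reply notes the fix is "just an if away". A minimal Python sketch of that idea follows. `run_in_batches` and its parameters are hypothetical; this is neither the code in cookbooks/sre/__init__.py nor the patch linked at 09:25:50. It sleeps only between batches, so a run with one batch never waits.

```python
import time
from typing import Callable, List, Sequence


def run_in_batches(hosts: Sequence[str], batch_size: int, grace_sleep: float,
                   action: Callable[[List[str]], None],
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Apply `action` to `hosts` in batches, sleeping only between batches.

    Hypothetical sketch, not the real cookbook code.
    Returns how many grace sleeps were performed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = [list(hosts[i:i + batch_size])
               for i in range(0, len(hosts), batch_size)]
    sleeps = 0
    for index, batch in enumerate(batches):
        action(batch)
        # No grace sleep after the last batch, so a single-host run
        # (e.g. --query P{cp2042.codfw.wmnet}) never waits at all.
        if index < len(batches) - 1:
            sleep(grace_sleep)
            sleeps += 1
    return sleeps


# Dry run with a no-op sleep so the example finishes instantly.
seen: List[List[str]] = []
print(run_in_batches(["cp2042.codfw.wmnet"], 1, 60.0, seen.append, sleep=lambda s: None))  # 0
```

Injecting `sleep` keeps the sketch testable. The log does not say whether the real cookbook sleeps between batches or before each one, so the placement of the check here is an assumption.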
[10:01:06] ack :)
[10:59:08] Traffic, SRE, SRE-swift-storage: Revisit CDN<-->Swift communication - https://phabricator.wikimedia.org/T317616 (MatthewVernon)
[11:15:39] vgutierrez: I'll proceed now
[11:16:34] Ack
[11:24:54] ahh, cookbook is hanging on waiting for lvs2010's icinga checks because there's a warning: PYBAL WARNING - WARNING - Pool schema_443 is too small to allow depooling
[11:24:59] seems like the restart was fine otherwise though
[11:32:42] yeah we need to rollback that change
[11:32:48] joe mentioned that it was ok to be reverted
[11:33:04] I'll do it after lunch if you don't mind
[11:36:41] no problem
[11:37:18] doesn't really affect the restarts - secondaries are done, should I leave doing the primaries or should I proceed?
[11:52:46] I'll leave it til after lunch also :)
[12:07:48] ack
[13:45:12] hnowlan: let's proceed with the primaries?
[13:45:21] vgutierrez: sounds good
[13:45:25] doing it now
[13:45:31] cheers
[14:02:12] all done!
[14:29:37] Traffic, SRE-OnFire, conftool, serviceops, and 2 others: Pybal maintenances break safe-service-restart.py (and thus prevent scap deploys of mediawiki) - https://phabricator.wikimedia.org/T334703 (BBlack) We've got a pair of patches to review now which configure this on the pybal and safe-service-...
[15:01:53] vgutierrez: would you mind if I did the move to production today as well? https://gerrit.wikimedia.org/r/c/operations/puppet/+/920664
[15:02:16] err https://gerrit.wikimedia.org/r/c/operations/puppet/+/920667/1
[15:08:16] hnowlan: I guess that you should turn paging on as well then?
[15:08:47] vgutierrez: ideally not for now - it won't be receiving any traffic until we make further changes.
[15:09:00] but if you think it's worth doing now it'll be relatively stable either way
[15:09:39] no problem.. just don't forget about it :)
[15:10:23] oh yeah absolutely :)
[15:13:32] Traffic, SRE, SRE-swift-storage: Revisit CDN<-->Swift communication - https://phabricator.wikimedia.org/T317616 (MatthewVernon) @Vgutierrez we are now running bullseye everywhere, so if you are wanting to look at TLS termination on the swift frontends, I think we're not longer blocking you by having...
[15:14:55] Traffic, DNS, SRE, Abstract Wikipedia team (Phase κ – Clean-up): Establish wikifunctions.org - https://phabricator.wikimedia.org/T275904 (Jdforrester-WMF) Open→Resolved I believe this is long-since done, thanks to SRE Traffic.
[15:19:50] Traffic: Deploy Wikimedia DNS: DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT) public resolver - https://phabricator.wikimedia.org/T252132 (ssingh)
[15:35:39] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp2035:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[15:35:46] (PurgedHighBacklogQueue) firing: Large backlog queue for purged on cp2035:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2035 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[15:39:57] (PurgedHighBacklogQueue) firing: (2) Large backlog queue for purged on cp2035:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2035 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[15:40:12] (PurgedHighBacklogQueue) firing: Large backlog queue for purged on cp2035:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2035 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[15:42:28] ^^ expected noise due to cp2035 issues
[15:43:07] vgutierrez: Am I clear for the codfw mh rollout or should I hold off?
[15:43:25] go ahead
[15:43:29] thanks :)
[15:43:31] we've stopped our work
[15:43:48] cp2035 is currently depooled but it shouldn't be a big deal
[17:40:33] Traffic, SRE, Epic: Deploy Wikimedia DNS: DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT) public resolver - https://phabricator.wikimedia.org/T252132 (bd808)
[18:51:42] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (RobH) >>! In T326685#8816875, @Jclark-ctr wrote: > dns1004. A6. U.8 PORT. 11 CABLEID 1038 > dns1005. B6 U.5 PORT. 0 CABLEID 1969 > dns1006. C6 U27. PORT.27 CABLEID 3249 cabl...
[18:54:06] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (RobH)
[19:10:43] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (RobH)
[19:14:30] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (RobH)
[19:15:31] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (RobH) Open→In progress a:Jclark-ctr→RobH ran network port setup steps, bios/idrac setup steps/dns/network steps applying firmware updates
[20:27:57] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (RobH)
[20:28:59] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (RobH) a:RobH→ssingh These are now ready for imaging! =] >>! In T326685#8876909, @ssingh wrote: > @Jclark-ctr: Hi John, Traffic has completed its work on the dns hosts in codfw,...
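The 11:24:54 "PYBAL WARNING - WARNING - Pool schema_443 is too small to allow depooling" appears to come from a depool-safety check. The sketch below assumes a PyBal-style rule in which a server may be depooled only if a configured fraction of the pool stays pooled afterwards. The function name, its parameters and the exact inequality are illustrative assumptions, not PyBal's source. Under this rule a very small pool, such as one with a single backend, can never be depooled, which is one plausible reading of "too small to allow depooling".

```python
def can_depool(total_servers: int, pooled_servers: int,
               depool_threshold: float) -> bool:
    """Return True if one more server may be depooled.

    Hypothetical rule: after depooling, at least
    depool_threshold * total_servers servers must remain pooled.
    """
    if total_servers <= 0 or pooled_servers <= 0:
        return False
    remaining = pooled_servers - 1
    return remaining >= total_servers * depool_threshold


# A one-server pool can never lose its only backend under this rule.
print(can_depool(total_servers=1, pooled_servers=1, depool_threshold=0.5))  # False
print(can_depool(total_servers=4, pooled_servers=4, depool_threshold=0.5))  # True
```

The threshold value 0.5 is only an example. The log does not state which threshold or pool size applied to schema_443.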