[12:38:27] hi folks!
[12:38:50] as anticipated on friday, I'd need one/two canary cp nodes to deploy a new version of glibc
[12:39:15] the task is https://phabricator.wikimedia.org/T367978, nothing major, it is just an exercise basically
[12:39:49] the idea is to restart all glibc-related daemons on the canary nodes (just to be sure) and roll out the upgrade to all cp nodes afterwards (without requiring a major roll restart)
[13:04:00] 06Traffic, 06Infrastructure-Foundations, 13Patch-For-Review: Migrate ldap-ro and ldap-ro-ssl to IPIP encapsulation - https://phabricator.wikimedia.org/T367861#9917457 (10MoritzMuehlenhoff) p:05Triage→03Medium
[13:22:53] elukey: sounds good to me
[13:23:32] elukey: cp4034 && cp4052?
[13:27:00] vgutierrez: perfect! What is the best way to proceed? Is it ok for me to depool, downtime, restart, check, repool?
[13:27:52] elukey: yeah.. depool and !log :)
[13:28:22] and maybe alert oncall people
[13:28:26] * fabfur whistles...
[13:30:18] for a 1 host thingie? come on :)
[13:31:04] fabfur rightfully doesn't trust me and wants to be as safe as possible :D
[13:32:04] naaah, was just a superstitious joke as I'm oncall :D
[13:35:55] ok, see what's in -private... @elukey proceed without alerting anyone! sukhe will take care of it! :D
[13:36:12] haha
[13:36:18] happy to :)
[13:37:19] :D
[13:38:07] vgutierrez: I can't find cp4034 though; in site.pp the first node available seems to be 4037
[13:38:30] yes that seems right
[13:38:36] elukey: that was some serious brain-farting on my side
[13:38:38] 4037
[13:38:47] aahahahahahah
[13:38:54] yep, cp4037 :)
[13:39:04] I stared at my monitor for a solid 5 mins because I was sure it was my pebcak
[13:39:13] my L8 is solid too lol
[13:42:11] wow, apparently nothing needs to be restarted
[13:42:23] I was almost sure that libc was used somewhere
[13:44:23] and on cp4052 indeed a lot of things need to be restarted
[13:44:36] what's the list for cp4052?
[13:45:49] most of the daemons that we run.. but I think that on cp4037 we already had the correct version deployed
[13:46:01] this is why I didn't see a list from debdeploy
[13:46:50] yep yep
[13:47:42] cp4037 repooled, will do 4052 after meetings!
[13:47:46] thanks :)
[13:47:58] ok! thanks
[13:54:45] 06Traffic, 06Data-Engineering, 13Patch-For-Review: Investigate increase in CD termination state after upgrading eqsin/ulsfo to HAProxy 2.8.10 - https://phabricator.wikimedia.org/T367963#9917695 (10Vgutierrez) this is caused by a bug in the mtail regex used to parse haproxy logs, on haproxy 2.6.17 http status ge...
[13:59:18] 06Traffic, 06SRE, 13Patch-For-Review: Anycast NTP and update the list of timeservers for P:systemd::timesyncd - https://phabricator.wikimedia.org/T366360#9917704 (10ssingh) >>! In T366360#9914657, @Dwisehaupt wrote: > Frack config has been updated to use the new ntp-[abc].anycast.wmnet servers. The previous...
[14:54:06] cp4052 upgraded and repooled! If I don't see any issue I'll roll out the glibc upgrades to all cp nodes tomorrow
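A minimal sketch of the depool → upgrade → restart-check → repool flow described above, assuming the host-local `depool`/`pool` conftool wrappers and `checkrestart` from debian-goodies; the package list and service names here are illustrative, not a record of what was actually run:

  # drain traffic from the canary before touching glibc
  sudo depool
  sudo apt-get update
  sudo apt-get install --only-upgrade libc6 libc-bin
  # list daemons still mapping the deleted, pre-upgrade libc
  sudo checkrestart    # or: sudo lsof +c0 -d DEL | awk '/libc/ {print $1}' | sort -u
  # restart whatever was reported (example units only)
  sudo systemctl restart haproxy.service trafficserver.service
  # verify the host looks healthy, then put it back in rotation
  sudo pool

This also matches the cp4037/cp4052 observation above: a node that already had the new glibc deployed reports nothing to restart (hence no list from debdeploy), while one upgraded in place reports most of its daemons.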
[14:54:10] 06Traffic, 06Infrastructure-Foundations, 13Patch-For-Review: Migrate ldap-ro and ldap-ro-ssl to IPIP encapsulation - https://phabricator.wikimedia.org/T367861#9917940 (10Vgutierrez) 05Open→03In progress
[14:58:34] 10netops, 06cloud-services-team, 06Infrastructure-Foundations, 06SRE, 13Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9917973 (10aborrero)
[15:38:51] 06Traffic, 06Data-Engineering, 13Patch-For-Review: Investigate increase in CD termination state after upgrading eqsin/ulsfo to HAProxy 2.8.10 - https://phabricator.wikimedia.org/T367963#9918190 (10Vgutierrez) 05Open→03Resolved a:05Fabfur→03Vgutierrez cache_haproxy.mtail failed to accept -1 as an...
[15:54:28] I apparently can't run the unit tests for ats' lua (even from the production branch), I get "attempt to call upvalue 'multi_dc_file' (a nil value)". I'm probably missing something stupidly simple, anyone know what off the top of their head?
[15:54:55] running `busted -c multi-dc_test.lua` inside the directory
[15:55:16] Same for mw-on-k8s tests, which i could have sworn I was able to make work some time ago
[16:06:05] claime: /bin/sh -c 'busted --verbose --helper=modules/profile/files/trafficserver/mock.helper.lua --lpath=modules/profile/files/trafficserver/?.lua ./modules/profile/files/trafficserver/*.lua'
[16:06:15] claime: that's how you run those tests
[16:06:23] ah, should I update wikitech?
[16:06:38] please :)
[16:09:29] vgutierrez: done :)
[16:26:19] thx
[16:33:52] 10netops, 06Infrastructure-Foundations, 06SRE, 13Patch-For-Review: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322#9918514 (10cmooney) >>! In T326322#9615636, @fgiunchedi wrote: > Yeah having some ballpark numbers will be a great help @cmooney, unless...
[17:01:06] vgutierrez: ty for your review on [Configurably remove varnish handling of /beacon/event (1042278)](https://gerrit.wikimedia.org/r/c/operations/puppet/+/1042278)
[17:01:10] (I somehow didn't notice until today)
[17:01:27] I resolved your comments, does your +1 mean I should merge when ready?
[17:05:49] Maybe sync with somebody on the traffic team but yes
[17:05:54] I don't see any blocker
[17:06:17] ottomata: just finishing one more task and we can do it today
[17:08:02] You got the eqsin stuff tonight :)
[17:08:15] vgutierrez: yeah, plan is to wrap up soon
[17:12:38] ottomata: can we move this to tomorrow please? happy to do, same time or earlier
[17:12:51] finishing up one more task and then I have to disappear for overnight work
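On the HAProxy/mtail resolution above (T367963): HAProxy logs -1 as the HTTP status when no response status was produced (e.g. the CD client-disconnect termination state), and a regex that only accepts three-digit codes silently drops those log lines. A hedged illustration of the kind of pattern change involved — the field layout and capture groups shown here are invented for the example and the real ones in cache_haproxy.mtail may well differ:

  # a three-digit-only pattern misses HAProxy's -1 sentinel status
  echo 'status=-1' | grep -P 'status=(\d{3})'       # no match, line is dropped
  # also accepting -1 lets the line be parsed and counted
  echo 'status=-1' | grep -P 'status=(-1|\d{3})'    # matches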
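On the busted question from 15:54: the "attempt to call upvalue ... (a nil value)" failure is consistent with running the suite without the mock helper and Lua search path, so the stubs the tests rely on never get loaded. The working invocation from 16:06, reflowed for readability and assuming it is run from the root of the operations/puppet checkout:

  busted --verbose \
    --helper=modules/profile/files/trafficserver/mock.helper.lua \
    --lpath='modules/profile/files/trafficserver/?.lua' \
    ./modules/profile/files/trafficserver/*.lua

(The `?.lua` pattern is quoted here so the shell does not glob it; in the original it was already protected by the surrounding sh -c single quotes.)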
[20:05:40] FIRING: [8x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:10:40] FIRING: [10x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:15:22] oh sukhe for sure i'm busy too! sorry i missed your ping!
[20:15:40] FIRING: [10x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:15:50] around the same time you pinged me today would work tomorrow
[20:20:40] FIRING: [11x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:25:40] FIRING: [11x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:30:40] FIRING: [9x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:40:40] RESOLVED: [3x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[21:09:47] 06Traffic, 10DNS, 06SRE: Cleanup unused DNS subdomains - https://phabricator.wikimedia.org/T367012#9919440 (10Dzahn)
[21:10:42] 06Traffic, 10DNS, 10fundraising-tech-ops, 06SRE: Cleanup unused DNS subdomains - https://phabricator.wikimedia.org/T367012#9919442 (10Dzahn)
[21:14:19] 06Traffic, 10DNS, 10fundraising-tech-ops, 06SRE: Cleanup unused DNS subdomains - https://phabricator.wikimedia.org/T367012#9919450 (10Dzahn) Hello, fundraising-tech-ops, is https://benefactors.wikimedia.org/ still being used somehow for email campaigns? (mandrillapp.com ?) Would you agree with the sugges...
[21:18:49] 06Traffic, 06DC-Ops, 10ops-codfw, 06serviceops, 06SRE: lvs2011 Memory failure on slot B1 - https://phabricator.wikimedia.org/T368165#9919461 (10BCornwall) a:05Papaul→03Jhancock.wm
[22:02:50] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9919586 (10RobH)
[22:03:49] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9919588 (10RobH)
[23:33:23] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9919721 (10BCornwall)
[23:33:52] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9919722 (10BCornwall)