[07:16:30] wow https://gerrit.wikimedia.org/r/q/topic:calico-v3.23.3 :)
[09:44:13] I don't have any serviceops meeting in my schedule for today - are we not going to do them?
[09:49:26] jayme: today's serviceops meeting was canceled because of US holiday and PTOs
[09:58:24] ah, I see. Thanks
[10:00:55] jayme: the SRE meeting stands though, so you won't miss us
[10:17:03] serviceops, Data-Persistence-Backup, serviceops-collab, GitLab (Infrastructure), and 2 others: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Jelto) @Dzahn thanks for merging the partman change and reimaging `gitlab2003`! #### Backup keep time reduction In the last GitLab IC me...
[10:34:41] serviceops, Beta-Cluster-Infrastructure: Serve beta cluster via PHP 7.4 by default - https://phabricator.wikimedia.org/T306042 (Joe) In progress→Resolved
[10:34:48] serviceops, Dumps-Generation, Patch-For-Review, Performance-Team (Radar): Migrate WMF production from PHP 7.2 to PHP 7.4 - https://phabricator.wikimedia.org/T271736 (Joe)
[10:37:03] effie: so glad :)
[10:44:48] serviceops: Decommission wtp10[25-48].eqiad.wmnet - https://phabricator.wikimedia.org/T317025 (Clement_Goubert)
[10:46:52] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar): Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (Clement_Goubert) >>! In T312638#8209540, @Zabe wrote: > Currently none of the parse1* hosts is a canary server, is that intended to be changed at some point?...
[10:48:41] serviceops, Beta-Cluster-Infrastructure: Serve beta cluster via PHP 7.4 by default - https://phabricator.wikimedia.org/T306042 (Joe) As of now, deployment-prep is using php 7.4 only. We can cleanup later and remove php 7.2 completely.
[11:33:09] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar): Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=5982b372-9469-405c-a18d-48d12b854a91) set by cgoubert@cumin1001 for 7 days, 0:00:00 on 3 host...
[11:57:06] sre.hosts.downtime will set a downtime on (for instance) parse1001.mgmt if invoked with parse1001* as a host
[11:57:26] But sre.hosts.remove-downtime won't remove it when invoked with the same query
[11:57:32] That's odd
[11:58:47] claime: what's the specific issue? the remove-downtime has not been updated to remove the downtime on Alertmanager btw, also because it would require the downtime ID
[11:59:41] volans: I added a 1w downtime on parse100* servers while they were being prepped for production
[12:00:12] Now I'm removing the downtimes one by one as I add the servers
[12:00:34] But it only removes it for parse100x, not parse100x.mgmt
[12:00:59] It's just a bit counterintuitive to have two cookbooks that seem like they should be symmetric but aren't
[12:01:53] mmmh the downtime one doesn't downtime .mgmt automatically, as those are not in puppetdb
[12:02:21] you have to pass --force to pass a verbatim Icinga "hostname"
[12:02:24] that is not in puppetdb
[12:03:27] I don't have the logs at hand, but if you check for instance https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=parse1003.mgmt
[12:03:47] I did not set it into downtime manually, it was a sre.hosts.downtime on parse1*
[12:06:39] I don't really mind, I'll remove them by hand, it just surprised me, that's all
[12:06:50] it wasn't the downtime cookbook
[12:07:00] your parse1* translated to icinga-status -j "parse[1001-1024]"
[12:07:16] Scheduling downtime on Icinga server alert1001.wikimedia.or
[12:07:17] g for hosts: parse[1001-1024]
[12:08:15] you can remove them via the downtime cookbook with --force parse[1001-1024].mgmt
[12:08:22] or let them expire (they expire in few hours)
[12:08:34] *the remove-downtime cookbook with...
[12:09:08] I've checked the cookbook logs, and it wasn't the cookbook :)
[12:09:26] Thank you :)
[12:11:22] That reminds me I need to renew the downtime on those that are not yet in rotation
[12:15:04] anytime :)
[12:20:45] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar): Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=83cf85ce-8731-463b-9d53-4500611c52ac) set by cgoubert@cumin1001 for 7 days, 0:00:00 on 18 hos...
[13:15:42] I'll have to skip today's SRE Monday update meeting btw.
[13:17:06] feel free to add anything you want in the updates section of our team and talk about it though
[13:46:37] serviceops: Put parse parse10[01-24] in production - https://phabricator.wikimedia.org/T307219 (Clement_Goubert) `parse1003` replaced `wtp1036` `parse1005` replaced `wtp1037`
[13:47:10] serviceops: Put parse parse10[01-24] in production - https://phabricator.wikimedia.org/T307219 (Clement_Goubert) marked `wtp1034` and `wtp1035` as `inactive`
[14:41:09] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar): Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=3d4522be-4ec5-4d1a-8bba-1d5621e4d400) set by cgoubert@cumin1001 for 7 days, 0:00:00 on 3 host...
[14:43:06] serviceops: Put parse parse10[01-24] in production - https://phabricator.wikimedia.org/T307219 (Clement_Goubert) Icinga downtime and Alertmanager silence (ID=3d4522be-4ec5-4d1a-8bba-1d5621e4d400) set by cgoubert@cumin1001 for 7 days, 0:00:00 on 3 host(s) and their services with reason: Downtiming replace wtp...
[14:45:26] serviceops: Put parse parse10[01-24] in production - https://phabricator.wikimedia.org/T307219 (Clement_Goubert) `parse1005` replaced `wtp1038` `parse1006` replaced `wtp1039` 32% of parse traffic in php7.4 only marked `wtp1036` and `wtp1037` as `inactive`
[16:26:06] serviceops: Put parse parse10[01-24] in production - https://phabricator.wikimedia.org/T307219 (Clement_Goubert) `parse1007` replaced `wtp1040` 36% of parse traffic in php7.4 only marked `wtp1038` as inactive
[17:17:57] serviceops: Decommission wtp10[25-48].eqiad.wmnet - https://phabricator.wikimedia.org/T317025 (Clement_Goubert) p:Triage→Medium