[05:52:34] netops, Infrastructure-Foundations, SRE: TATA SKY Broadband (AS134674) issues with connecting to upload.wikimedia.org - https://phabricator.wikimedia.org/T275234 (ayounsi) Open→Resolved a:ayounsi No more news from Tata Sky and nothing we can do at our network layer either. To be reopened...
[07:57:22] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Test dhcp-option 82 - https://phabricator.wikimedia.org/T221388 (Volans) Open→Resolved The test of the option 82 has been successful and we're switching to this system for all physical hosts DHCP settings in T269855. In the final...
[08:04:41] netops, Infrastructure-Foundations, SRE, Traffic-Icebox: externally-hosted NEL report forwarders for more timely report reception - https://phabricator.wikimedia.org/T292870 (ayounsi) I'd be wary of the complexity of the setup. As I'm not quite familiar with NEL setup, is there a downside of puttin...
[08:07:34] netbox, Infrastructure-Foundations, Patch-For-Review: Manage DHCP from Netbox - https://phabricator.wikimedia.org/T269855 (Volans) >>! In T269855#7415861, @gerritbot wrote: > Change 727415 **merged** by Volans: > %%%[operations/puppet@production] cumin: remove wmf-auto-reimage scripts%%% > https://ge...
[08:22:53] netbox, Infrastructure-Foundations, Patch-For-Review: Manage DHCP from Netbox - https://phabricator.wikimedia.org/T269855 (Volans) Removed the old directory for the renamed cookbook: `lang=shell $ sudo cumin 'A:cumin' 'rm -rfv /srv/deployment/spicerack/cookbooks/sre/experimental' 3 hosts will be tar...
[08:24:26] netbox, Infrastructure-Foundations, Patch-For-Review: Manage DHCP from Netbox - https://phabricator.wikimedia.org/T269855 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by volans@cumin1001 for host sretest1001.eqiad.wmnet
[08:41:11] netbox, Infrastructure-Foundations, Patch-For-Review: Manage DHCP from Netbox - https://phabricator.wikimedia.org/T269855 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by volans@cumin2002 for host sretest1002.eqiad.wmnet
[08:48:38] netbox, Infrastructure-Foundations, Patch-For-Review: Manage DHCP from Netbox - https://phabricator.wikimedia.org/T269855 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by volans@cumin1001 for host sretest1001.eqiad.wmnet completed: - sretest1001 (**PASS**) - Downtimed on Icinga...
[09:05:55] netbox, Infrastructure-Foundations, Patch-For-Review: Manage DHCP from Netbox - https://phabricator.wikimedia.org/T269855 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by volans@cumin2002 for host sretest1002.eqiad.wmnet completed: - sretest1002 (**PASS**) - Downtimed on Icinga...
[10:23:23] topranks: thank you for the detailed debugging about management interface flapping ( https://phabricator.wikimedia.org/T283582#7394322 ) :D
[10:28:34] hashar: np. I see the update there nice to get clarification we're on the right track!
[10:28:56] SRE-tools, Infrastructure-Foundations: Spicerack: split wmf-auto-reimage-lib into Spicerack modules - https://phabricator.wikimedia.org/T205884 (Volans) In progress→Resolved With the completion of the conversion of the reimage script to the sre.hosts.reimage cookbook, all the needed bits that wer...
[10:29:02] SRE-tools, Infrastructure-Foundations, SRE, Goal: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (Volans)
[10:30:52] SRE-tools, Infrastructure-Foundations: Cookbooks: convert wmf-auto-reimage scripts to Cookbooks - https://phabricator.wikimedia.org/T205885 (Volans) In progress→Resolved a:Volans The conversion to the sre.hosts.reimage cookbook has been completed. The new procedure is outlined in https://wiki...
[10:30:58] SRE-tools, Infrastructure-Foundations, SRE, Goal: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (Volans)
[10:31:41] SRE-tools, Infrastructure-Foundations: Cookbooks: convert wmf-auto-reimage scripts to Cookbooks - https://phabricator.wikimedia.org/T205885 (Volans)
[10:31:47] SRE-tools, Infrastructure-Foundations, SRE: wmf-auto-reimage tries to remove from Debmonitor even with --new - https://phabricator.wikimedia.org/T204789 (Volans) Open→Resolved a:Volans The current procedure outlined at https://wikitech.wikimedia.org/wiki/Server_Lifecycle/Reimage doesn't h...
[10:31:56] SRE-tools, Infrastructure-Foundations, SRE, Goal: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (Volans)
[10:32:34] SRE-tools, Infrastructure-Foundations, SRE, Goal: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (Volans) Open→Resolved With the conversion of the reimage scripts to the sre.hosts.reimage cookbook this has been finally completed. T...
[10:34:10] SRE-tools, Infrastructure-Foundations, SRE: wmf-auto-reimage errors: failure to downtime (w/ no rename), pytho gc whine - https://phabricator.wikimedia.org/T239897 (Volans) Open→Resolved a:Volans Resolving as the reimage scripts have been ported to the sre.hosts.reimage cookbook and this...
[10:39:56] SRE-tools, Infrastructure-Foundations: wmf-auto-reimage-host on HP gen10 WARNING: unable to verify that BIOS boot parameters are back to normal, got: - https://phabricator.wikimedia.org/T234358 (Volans) Open→Resolved a:Volans I'm not sure why I did comment that way back in 2019, but from the...
[10:41:25] SRE-tools, Infrastructure-Foundations, SRE, observability: cookbook sre.hosts.downtime: add feature to remove downtimes - https://phabricator.wikimedia.org/T251519 (Volans) Open→Resolved a:Volans Since a while there is the `sre.hosts.remove-downtime: Remove the Icinga downtime for the...
[10:44:36] SRE-tools, Infrastructure-Foundations, SRE: Exception raised while executing cookbook sre.hosts.downtime - https://phabricator.wikimedia.org/T259158 (Volans) The reimage scripts have been converted to the sre.hosts.reimage cookbook and don't have anymore the race condition that was present here, henc...
[10:45:13] SRE-tools, Infrastructure-Foundations, SRE: Exception raised while executing cookbook sre.hosts.downtime - https://phabricator.wikimedia.org/T259158 (Volans) Open→Resolved a:Volans
[10:48:09] SRE-tools, Infrastructure-Foundations, SRE: wmf-auto-reimage: 'execution expired' on first puppet run - https://phabricator.wikimedia.org/T201317 (Volans) @ema do you know if this is still happening? If so it seems more of a puppetization issue than a reimage one, should we remove the SRE-tools tag?
[10:50:04] SRE-tools, Infrastructure-Foundations, SRE: wmf-auto-reimage should retry on ipmi failures - https://phabricator.wikimedia.org/T201669 (Volans) Open→Resolved a:Volans The reimage scripts have been converted to the sre.hosts.reimage cookbook that checks that a working IPMI connection to th...
[10:54:14] SRE-tools, Infrastructure-Foundations: Better detection for "reboot into PXE failed" conditions in wmf-auto-reimage - https://phabricator.wikimedia.org/T261956 (Volans) p:Triage→Medium The reimage scripts have been converted to the sre.hosts.reimage cookbook. While this issue could still happenin...
[10:54:48] sorry for the spam, I've been closing/updating a lot of reimage/DHCP related tasks
[10:56:52] closing tasks, the best kind of spam!
[11:12:52] Puppet, Infrastructure-Foundations: Admin module should use systemd-sysuser for syustem accounts - https://phabricator.wikimedia.org/T292965 (jbond) p:Triage→Lowest
[11:18:31] Puppet, Infrastructure-Foundations: Admin module should use systemd-sysuser for system accounts - https://phabricator.wikimedia.org/T292965 (Lucas_Werkmeister_WMDE)
[11:40:15] Puppet, Infrastructure-Foundations, GitLab (Infrastructure), Patch-For-Review, and 3 others: Puppetise gitlab-ansible playbook - https://phabricator.wikimedia.org/T283076 (hashar) @brennen Given the configuration has been moved from Ansible to Puppet may we archive the Gerrit repo? ( https://gerr...
[14:31:29] hi all, i forgot to add to last Monday's meeting but it is a national holiday tomorrow (https://en.wikipedia.org/wiki/National_Day_of_Spain) so i will be off
[14:42:41] enjoy!
[14:51:00] thanks :)