[07:12:15] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (MoritzMuehlenhoff)
[07:14:49] we should probably add another column to the switch buffer tasks to track which actions have been reverted. some of them are idempotent (like "Notify Jaime"), but we should probably also track that e.g. servers which were depooled are pooled again?
[07:16:11] large phab tables are impossible to manage
[07:17:19] a list with checkboxes could be an option though
[07:19:52] yeah, it needs a visual editor :-)
[07:20:22] fine either way on the format, but let's make sure everything gets properly reverted, especially since there's plenty of people involved
[07:33:13] moritzm: indeed, and yes the format isn't good, I think next time a spreadsheet or something might be a better option.
[07:34:19] even managing the verification that all work has been carried out prior is a bit of a hassle, in terms of the format.
[07:34:58] What I may do is wait until we are complete later, then add that additional column and start filling it out?
[07:37:04] or what we could do:
[07:37:46] Phab markup supports strikethrough using ~foo~~: https://secure.phabricator.com/book/phabricator/article/remarkup/
[07:38:13] once the maintenance is over, we can simply strikethrough all action items, which were done or which aren't needed
[07:38:23] this way we can track this and don't need to fiddle with another column
[07:38:34] should be ~~foo~~
[08:10:49] hmmm thanks Moritz that's not a bad idea
[08:15:01] XioNoX, topranks: fwiw I'd like to mention one thing for today's switch maintenance
[08:16:01] with the codfw switch going down, one of the ES hosts needed a restart of elasticsearch because it was stuck after having failed to resolve a DNS entry (logstash in that case). I don't know if that was because it was reloaded/restarted after we lost the switch or what.
[08:16:25] But it seemed something worth mentioning, I hope that elastic doesn't get stuck just for a few seconds of network unreachability
[08:20:33] Hopefully. Guillaume and Ryan asked that we ping them after so they can do a quick health check so we will do that.
[08:21:17] #greatsoftware
[08:21:30] ok, great, just to be on the safe side, remind them to avoid running wdqs cookbooks (those take days) that involve those rows
[09:52:58] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (aborrero)
[09:54:33] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (aborrero)
[09:54:50] SRE-tools, Infrastructure-Foundations, SRE Observability, IPv6: Some Observability clusters do not support IPv6. - https://phabricator.wikimedia.org/T271138 (fgiunchedi)
[12:34:07] netops, Infrastructure-Foundations, SRE, Datacenter-Switchover: Record traffic flows in and out of eqiad during switchover - https://phabricator.wikimedia.org/T286038 (ayounsi) Pushing the following (and similar on cr2) should do the trick. As it's only for a few days, and it would not be trivial...
[12:56:56] CAS-SSO, Cloud Services Proposals, Infrastructure-Foundations, LDAP, and 3 others: Create solution for developer account authentication for services hosted in Cloud VPS - https://phabricator.wikimedia.org/T286716 (jbond) We currently have a cloud version of [[ https://idp.wmfcloud.org | idp ]] wh...
[13:16:05] CAS-SSO, Cloud Services Proposals, Infrastructure-Foundations, LDAP, and 3 others: Create solution for developer account authentication for services hosted in Cloud VPS - https://phabricator.wikimedia.org/T286716 (jbond) Also worth pointing out the following debugging end point https://idp-test-l...
[13:16:24] CAS-SSO, Cloud Services Proposals, Infrastructure-Foundations, LDAP, and 3 others: Create solution for developer account authentication for services hosted in Cloud VPS - https://phabricator.wikimedia.org/T286716 (jbond) p:Triage→Medium
[13:48:59] netops, Infrastructure-Foundations, SRE, Datacenter-Switchover: Record traffic flows in and out of eqiad during switchover - https://phabricator.wikimedia.org/T286038 (cmooney) Looks good to me @ayounsi if you want to commit. I would totally agree btw, Netflow is probably handled in silicon, o...
[13:59:56] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (ArielGlenn)
[14:03:11] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (MoritzMuehlenhoff)
[14:40:56] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (ops-monitoring-bot) Icinga downtime set by vgutierrez@cumin1001 for 1:00:00 4 host(s) and their services with reason: eqiad row D maintenance ` c...
[14:43:17] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (Vgutierrez)
[14:48:40] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (ops-monitoring-bot) Icinga downtime set by vgutierrez@cumin1001 for 1:00:00 1 host(s) and their services with reason: eqiad row D maintenance ` d...
[14:49:40] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (Vgutierrez)
[14:51:18] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (ops-monitoring-bot) Icinga downtime set by vgutierrez@cumin1001 for 1:00:00 1 host(s) and their services with reason: eqiad row D maintenance ` l...
[14:55:36] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney)
[15:06:14] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney)
[15:11:42] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney)
[15:52:21] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney) All works complete, no signs of any issues really, I had no ping loss on 16 pings towards 2 hosts connected off each member switch. Ver...
[15:53:19] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Adjust egress buffer allocations on ToR switches - https://phabricator.wikimedia.org/T284592 (cmooney)
[17:08:50] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (Andrew)
[17:53:37] netops, Analytics, DBA, Infrastructure-Foundations, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (Bstorm)