[09:42:23] For today's meeting I propose to do it async as we have half team, thoughts?
[09:48:49] +1
[10:05:50] netbox, Infrastructure-Foundations, Patch-For-Review: Netbox/DNS, browse parents prefixes to set site - https://phabricator.wikimedia.org/T294082 (Volans) Open→Resolved a:Volans With the fix of the above patch merged I've run the dns.netbox cookbook that has moved over the records to the...
[11:18:12] netops, Infrastructure-Foundations, SRE, observability, Sustainability (Incident Followup): Alert that should have paged did not reach VictorOps because of partial networking outage - https://phabricator.wikimedia.org/T294166 (Volans) AFAIK we're still alerting just sending emails to VO inste...
[11:56:23] +1
[13:47:23] Puppet, Infrastructure-Foundations, GitLab (Infrastructure), Patch-For-Review, and 3 others: Puppetise gitlab-ansible playbook - https://phabricator.wikimedia.org/T283076 (Jelto) I identified at least two issues which prevent us from having a successful restore: One is puppet agent runs automati...
[14:22:34] Puppet, Infrastructure-Foundations, GitLab (Infrastructure), Patch-For-Review, and 3 others: Puppetise gitlab-ansible playbook - https://phabricator.wikimedia.org/T283076 (Dzahn) >>! In T283076#7454868, @Jelto wrote: > So we have to make sure GitLab is not started by puppet agent runs during the...
[14:44:50] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul) @Dzahn I need mw2253 and contint2001 down for me to reset the IDRAC befor...
[14:46:51] netops, Infrastructure-Foundations, SRE, observability, Sustainability (Incident Followup): Alert that should have paged did not reach VictorOps because of partial networking outage - https://phabricator.wikimedia.org/T294166 (herron) To clarify, the alert did make it to VO after a delay http...
[14:52:28] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) @Papaul mw2253 is not a problem. done. it's shut down and downtimed. cont...
[14:54:25] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (ayounsi) p:Medium→High Could you try to roll that fiber? Telxius is not receiving light from our side. Nor the other way around.
[15:14:29] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (RobH)
[16:32:29] CAS-SSO, Infrastructure-Foundations, GitLab (Auth & Access), Release-Engineering-Team (Radar): Attempting to login to gitlab.wikimedia.org sometimes results in CAS 500 Internal Server Error - https://phabricator.wikimedia.org/T291964 (jbond) Hi all i have updated idp.wikimedia.org today, could yo...
[17:01:34] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[17:02:37] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (RobH)
[17:02:43] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) @Papaul Let's go ahead with mw2253. For contint2001 please consider it sta...
[17:03:46] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (RobH) a:Cmjohnson→Jclark-ctr
[17:15:33] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (Jclark-ctr)
[17:16:08] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (Jclark-ctr) using light meter no light. from ports 31/32 on patch panel
[17:16:38] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (Jclark-ctr) a:Jclark-ctr→RobH
[17:26:26] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (RobH)
[17:27:37] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[17:27:51] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[17:28:31] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul) @Dzahn mw2253 done
[17:41:37] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) @Papaul. Thank you! - scap pulled - confirmed icinga green - repooled to...
[17:51:09] Packaging, Infrastructure-Foundations, SRE, Patch-For-Review: Disable man-db in pbuilder in package_builder on deneb - https://phabricator.wikimedia.org/T276632 (Legoktm) Open→Resolved ` ... Setting up man-db (2.8.5-2) ... Not building database; man-db/auto-update is not 'true'. `
[19:33:36] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[19:34:24] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) @Papaul Afraid this is a long story. just saw `mw2255.mgmt` alerting in Ic...
[20:00:40] Packaging, Infrastructure-Foundations, SRE, Toolforge, Patch-For-Review: Please add php-imagick and php-redis packages to apt.wikimedia.org thirdparty/php72 - https://phabricator.wikimedia.org/T200666 (Dzahn) This is needed for 7.4 now < James_F> We're depending on php-imagick which doesn't...
[20:18:33] netops, Infrastructure-Foundations, SRE, Patch-For-Review, and 2 others: Alert that should have paged did not reach VictorOps because of partial networking outage - https://phabricator.wikimedia.org/T294166 (herron)
[22:08:39] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo: (Need By: TBD) rack/setup/install new mr1-ulsfo - https://phabricator.wikimedia.org/T294314 (RobH)
[22:08:49] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo: (Need By: TBD) rack/setup/install new mr1-ulsfo - https://phabricator.wikimedia.org/T294314 (RobH)