[11:18:10] netops, Infrastructure-Foundations, SRE, observability, Sustainability (Incident Followup): Alert that should have paged did not reach VictorOps because of partial networking outage - https://phabricator.wikimedia.org/T294166 (Volans) AFAIK we're still alerting just sending emails to VO inste...
[13:08:43] Traffic, Observability-Logging, SRE, Patch-For-Review, User-ema: varnishmtail metric loss due to mtail not reading from pipe fast enough - https://phabricator.wikimedia.org/T293879 (ema) >>! In T293879#7454657, @gerritbot wrote: > Change 732925 **merged** by Ema: > %%%[operations/puppet@produ...
[14:44:50] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul) @Dzahn I need mw2253 and contint2001 down for me to reset the IDRAC befor...
[14:46:51] netops, Infrastructure-Foundations, SRE, observability, Sustainability (Incident Followup): Alert that should have paged did not reach VictorOps because of partial networking outage - https://phabricator.wikimedia.org/T294166 (herron) To clarify, the alert did make it to VO after a delay http...
[14:52:28] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) @Papaul mw2253 is not a problem. done. it's shut down and downtimed. cont...
[14:54:25] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (ayounsi) p:Medium→High Could you try to roll that fiber? Telxius is not receiving light from our side. Nor the other way around.
[15:09:27] Traffic, SRE, observability, Discovery-Search (Current work), Patch-For-Review: flapping icinga Letsencrypt TLS cert alerts around renewal time - https://phabricator.wikimedia.org/T293826 (MPhamWMF)
[15:14:29] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (RobH)
[15:49:40] Traffic, SRE, observability, Discovery-Search (Current work), Patch-For-Review: flapping icinga Letsencrypt TLS cert alerts around renewal time - https://phabricator.wikimedia.org/T293826 (MPhamWMF)
[17:01:36] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[17:02:37] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (RobH)
[17:02:45] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) @Papaul Let's go ahead with mw2253. For contint2001 please consider it sta...
[17:03:46] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (RobH) a:Cmjohnson→Jclark-ctr
[17:15:34] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (Jclark-ctr)
[17:16:08] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (Jclark-ctr) using light meter no light. from ports 31/32 on patch panel
[17:16:39] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (Jclark-ctr) a:Jclark-ctr→RobH
[17:26:28] netops, Infrastructure-Foundations, SRE, ops-eqiad: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (RobH)
[17:27:37] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[17:27:51] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[17:28:31] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul) @Dzahn mw2253 done
[17:41:37] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) @Papaul. Thank you! - scap pulled - confirmed icinga green - repooled to...
[19:33:36] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[19:34:24] netops, Continuous-Integration-Infrastructure, DC-Ops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) @Papaul Afraid this is a long story. just saw `mw2255.mgmt` alerting in Ic...
[20:17:56] (EdgeTrafficDrop) firing: 51% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org
[20:18:33] netops, Infrastructure-Foundations, SRE, Patch-For-Review, and 2 others: Alert that should have paged did not reach VictorOps because of partial networking outage - https://phabricator.wikimedia.org/T294166 (herron)
[20:22:56] (EdgeTrafficDrop) resolved: 67% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org
[21:42:31] (log authdns1001 (DNS) - sudo authdns-update, add new project language "pwn" (Paiwan) for T292415) - edited langlist.tmpl which regenerates all project zones
[21:42:31] T292415: Create Wikipedia Paiwan - https://phabricator.wikimedia.org/T292415
[21:50:14] !log log authdns1001 (DNS) - sudo authdns-update, add new project language "ami" (Amis) for T292414 - edited langlist.tmpl which regenerates all project zones
[21:50:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:50:22] T292414: Create Wikipedia Amis - https://phabricator.wikimedia.org/T292414
[22:08:40] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo: (Need By: TBD) rack/setup/install new mr1-ulsfo - https://phabricator.wikimedia.org/T294314 (RobH)
[22:08:49] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo: (Need By: TBD) rack/setup/install new mr1-ulsfo - https://phabricator.wikimedia.org/T294314 (RobH)