[01:19:12] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Rancid unable to login to network devices - https://phabricator.wikimedia.org/T314936 (andrea.denisse) Hello team, after further testing it the least disruptive and simplest approach is to create the `.ssh` directory using Puppet. It nee...
[03:13:07] netops, Infrastructure-Foundations, Observability-Metrics, SRE Observability (FY2022/2023-Q1): LibreNMS seemingly not scraping many devices after migration to netmon1003 - https://phabricator.wikimedia.org/T314972 (CDanis)
[03:13:13] netops, Infrastructure-Foundations, Observability-Metrics, SRE Observability (FY2022/2023-Q1): LibreNMS seemingly not scraping many devices after migration to netmon1003 - https://phabricator.wikimedia.org/T314972 (CDanis) p:Triage→High
[03:14:23] netops, Infrastructure-Foundations, Observability-Metrics, SRE Observability (FY2022/2023-Q1): LibreNMS seemingly not collecting data for many ports after migration to netmon1003 - https://phabricator.wikimedia.org/T314972 (CDanis)
[03:47:27] netops, Infrastructure-Foundations, Observability-Metrics, SRE, SRE Observability (FY2022/2023-Q1): LibreNMS seemingly not collecting data for many ports after migration to netmon1003 - https://phabricator.wikimedia.org/T314972 (ayounsi) Looks like permission issues: `name=netmon1003 ayounsi@...
[04:18:58] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Rancid unable to login to network devices - https://phabricator.wikimedia.org/T314936 (ayounsi) > @ayounsi do you anticipate any fallout from this? I agree that it's better to check host keys, so +1 as long as: * there is some kind of al...
[04:53:48] netops, Infrastructure-Foundations, Observability-Metrics, SRE, and 2 others: LibreNMS seemingly not collecting data for many ports after migration to netmon1003 - https://phabricator.wikimedia.org/T314972 (andrea.denisse) I think that the owner is override to '`deploy-librenms`' during the [[ ht...
[07:57:24] netops, Infrastructure-Foundations: Telia ulsfo-eqord transport link down - https://phabricator.wikimedia.org/T314978 (ayounsi) p:Triage→High
[07:58:03] netops, Infrastructure-Foundations: Telia ulsfo-eqord transport link down - https://phabricator.wikimedia.org/T314978 (ops-monitoring-bot) ===== Automated diagnostic for Netbox interface ID cr3-ulsfo:xe-0/1/1 --- **Interface cr3-ulsfo:xe-0/1/1** - admin-status: up - ⚠️ oper-status: down - interface-fl...
[08:05:36] netops, Infrastructure-Foundations: Telia ulsfo-eqord transport link down - https://phabricator.wikimedia.org/T314978 (ayounsi) Email sent to Telia's NOC - https://netbox.wikimedia.org/tenancy/contacts/21/
[08:06:25] netops, Infrastructure-Foundations: Telia ulsfo-eqord transport link down - https://phabricator.wikimedia.org/T314978 (ops-monitoring-bot) ===== Automated diagnostic for Netbox interface ID cr3-ulsfo:xe-0/1/1 --- **Interface cr3-ulsfo:xe-0/1/1** - admin-status: up - oper-status: up - interface-flapped...
[08:07:46] netops, Infrastructure-Foundations: Telia ulsfo-eqord transport link down - https://phabricator.wikimedia.org/T314978 (ayounsi) And of course it went back up as I'm sending the email. Also got a quick reply from Telia: > Please be informed that your circuit is affected by a Major Disturbance being track...
[08:34:01] netops, Infrastructure-Foundations, Observability-Metrics, SRE, and 2 others: LibreNMS seemingly not collecting data for many ports after migration to netmon1003 - https://phabricator.wikimedia.org/T314972 (fgiunchedi) My apologies! I ran the quickdatacopy the other day ahead of the failover and...
[09:12:43] netops, Infrastructure-Foundations, Observability-Metrics, SRE, and 2 others: LibreNMS seemingly not collecting data for many ports after migration to netmon1003 - https://phabricator.wikimedia.org/T314972 (fgiunchedi) I looked into why quickdatacopy didn't do the right thing: * the rsync server...
[09:14:41] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Rancid unable to login to network devices - https://phabricator.wikimedia.org/T314936 (fgiunchedi) Agreed on the short term fix to create the `.ssh` directory. However if we were not checking host keys to begin with I think we should keep...
[09:41:03] netops, Infrastructure-Foundations, SRE: Telia ulsfo-eqord transport link down - https://phabricator.wikimedia.org/T314978 (ayounsi) Open→Resolved a:ayounsi
[12:13:14] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Rancid unable to login to network devices - https://phabricator.wikimedia.org/T314936 (cmooney) @fgiunchedi yeah that may be an option. I'm not sure how easy it is to change Rancid to add that to the command when running ssh, but I'm sur...
[12:23:07] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Rancid unable to login to network devices - https://phabricator.wikimedia.org/T314936 (cmooney) Another oddity here with rancid from netmon1003. The permission change has removed the problem for most of our estate (all the Juniper device...
[12:32:38] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Rancid unable to login to network devices - https://phabricator.wikimedia.org/T314936 (cmooney) Logs suggest a timeout: ` scs-oe16-esams.mgmt.esams.wmnet oglogin error: Error: TIMEOUT reached scs-oe16-esams.mgmt.esams.wmnet: missed cmd(s...
[13:01:49] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Rancid unable to login to network devices - https://phabricator.wikimedia.org/T314936 (cmooney) I believe the issue is that the expect script Rancid is running for these is not saying "yes" to accept the host key. This did not happen in...
[14:30:32] netbox, Infrastructure-Foundations: Move AS allocations to Netbox - https://phabricator.wikimedia.org/T310744 (ayounsi) Open→Resolved
[14:30:38] netbox, Infrastructure-Foundations: Move AS allocations to Netbox - https://phabricator.wikimedia.org/T310744 (ayounsi) a:ayounsi
[15:12:05] netbox, Infrastructure-Foundations: Netbox: decom old production infrastructure - https://phabricator.wikimedia.org/T310716 (ayounsi) Open→Resolved a:ayounsi This is done.
[16:22:09] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Rancid on netmon1003 unable to login to network devices - https://phabricator.wikimedia.org/T314936 (cmooney)
[19:44:43] netops, Infrastructure-Foundations, SRE, Discovery-Search (Current work): Possible problem communicating between eqiad elastic hosts in racks F2 and F3 - https://phabricator.wikimedia.org/T315038 (ayounsi)
[19:44:50] netops, Infrastructure-Foundations, SRE, Discovery-Search (Current work): Possible problem communicating between eqiad elastic hosts in racks F2 and F3 - https://phabricator.wikimedia.org/T315038 (ayounsi) p:Triage→High
[20:02:57] netbox, netops, Infrastructure-Foundations, SRE, Patch-For-Review: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (cmooney) The above patch uses the new puppet facts to define vlan sub-interface and bridge relations as described in...
[20:21:19] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 3 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (herron)
[20:46:14] Puppet, Infrastructure-Foundations, SRE, Wikimedia-Mailing-lists: Unable to clone "operations/puppet" repo successfully on Windows - https://phabricator.wikimedia.org/T314698 (Dzahn) @Novem_Linguae While we are still thinking about a better fix, there is at least one work around using WSL on Wind...
[20:47:21] netops, Infrastructure-Foundations, SRE, Discovery-Search (Current work): Possible problem communicating between eqiad elastic hosts in racks F2 and F3 - https://phabricator.wikimedia.org/T315038 (ayounsi) I had a quick look and can't find any smoking gun so far. The issue seems to be related to...
[21:01:43] Puppet, Infrastructure-Foundations, SRE, Wikimedia-Mailing-lists: Unable to clone "operations/puppet" repo successfully on Windows - https://phabricator.wikimedia.org/T314698 (Dzahn) mailman3 upstream docs at https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/rest/docs/templates.htm...
[21:02:12] Puppet, Infrastructure-Foundations, SRE, Wikimedia-Mailing-lists: Unable to clone "operations/puppet" repo successfully on Windows (mailman3 template names use colon in file names) - https://phabricator.wikimedia.org/T314698 (Dzahn)
[21:24:43] netbox, netops, Infrastructure-Foundations, SRE, Patch-For-Review: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (ayounsi) > My goal is to proceed and update the automation to set switch interface access/trunk and allowed vlans onc...
[23:00:40] Puppet, Infrastructure-Foundations, SRE, Wikimedia-Mailing-lists: Unable to clone "operations/puppet" repo successfully on Windows (mailman3 template names use colon in file names) - https://phabricator.wikimedia.org/T314698 (Legoktm)