[01:23:07] Hello world! I am a new contributor with a background in software development. I am diving in on the ops side of things and looking forward to learning and collaborating. Any tips on the best way to hit the ground running will be appreciated.
[01:33:44] Hello everyone, I am very excited to be here. I come from a software development background and am getting started with the "Ops" side of things. I am looking forward to getting started with the foundations team and any tips to hit the ground running will be appreciated.
[03:16:31] (SystemdUnitFailed) firing: (6) generate_os_reports.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:59:19] volans: good morning! how do we delete https://github.com/wikimedia/operations-software-netbox-reports ?
[07:00:43] XioNoX: good morning
[07:00:52] what's its status in gerrit?
[07:01:39] it's still there, so I think we need first to archive/delete on gerrit, see if it gets removed/archived from github too or do it manually
[07:02:05] ok, thx! I'll open a task
[07:03:43] volans: is that something for us or for releng? or someone else?
[07:04:13] I think we could commit on the README to state that the repo is obsolete and superseded by the extras one with a link first
[07:04:26] why?
[07:04:39] I was looking at https://www.mediawiki.org/wiki/Gerrit/Inactive_projects but for intentionally archived repos I see they just point to https://phabricator.wikimedia.org/project/profile/2829/
[07:05:14] so I guess a task in https://phabricator.wikimedia.org/project/board/2829/ maybe?
[07:05:49] most tasks are about extensions, so not sure
[07:06:21] XioNoX: yeah I'd say something like T346176
[07:06:22] T346176: Archive wikimedia/discovery/analytics - https://phabricator.wikimedia.org/T346176
[07:06:49] it looks like they used a template, so far I didn't find a link to use a specific template
[07:08:07] there's https://phabricator.wikimedia.org/maniphest/task/edit/form/33/, but that's very mediawiki extension specific
[07:08:14] yeah
[07:08:31] opened https://phabricator.wikimedia.org/T346600
[07:10:32] someone with access to this repo wouldn't mind removing the "knams" string https://github.com/wikimedia/operations-mediawiki-config/blob/65a8a0ef02746832823a6259e3a5a57068aa9b79/typos#L8 ?
[07:10:38] and probably pmpta too
[07:11:12] and easy review on https://gerrit.wikimedia.org/r/c/operations/dns/+/958390 (just changing comments)
[07:16:31] (SystemdUnitFailed) firing: (6) generate_os_reports.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:18:49] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/958392/, I can merge that once another patch on that repo is done deploying
[07:21:37] thanks
[07:21:38] Now, I'm wondering if we should also add drmrs and eqsin, probably not much relevant here as there are no fqdns of hosts in those POPs in the repo
[07:37:59] netbox, netops, Infrastructure-Foundations, SRE: Netbox Juniper report - https://phabricator.wikimedia.org/T306238 (SLyngshede-WMF) Rereading the answer for Juniper: > For OIDC we’ll need your IDToken which would look like below or the IDP Issuer URL (This URL must be publicly accessible). > S...
[07:40:45] XioNoX: I just commented on https://phabricator.wikimedia.org/T306238#9173518, I think they just need the URL
[07:57:59] SRE-tools, Infrastructure-Foundations, Spicerack: spicerack.phabricator: Don't fail when logging to a restricted task - https://phabricator.wikimedia.org/T335879 (Volans) Yes and no. The wmflib code could be improved to distinguish between a permission error and any other error and raise two differen...
[08:20:08] slyngs: thanks, should I give them the prod url directly?
[08:20:18] I'll CC you in my reply
[08:21:24] Did we hand them credentials as well? If they already have the production credentials I think we should just try to give them: https://idp.wikimedia.org/oidc/oidcAuthorize
[08:21:53] We might still need to tweak some settings in CAS, but that depends on their implementation
[08:23:15] yeah I'm pretty sure I gave them the prod creds
[08:25:03] :-)
[08:25:39] Let's give them the production URL and then see what the logs say, if they can use the URL
[08:30:59] slyngs: sent
[08:31:39] I'm still a little confused as to how this works on the Juniper side of things
[08:35:28] slyngs: don't worry, they're probably confused too
[08:36:36] https://media.tenor.com/6VjHriBVc20AAAAC/laugh-bus.gif
[09:16:28] homer's failure with "Hash-Mode not set to layer2-payload: inet fields can not be configured" are known?
[09:17:51] that's on 2 cloudsw1
[09:22:56] volans: probably related to https://phabricator.wikimedia.org/T339852
[09:23:24] I guess QFX5100 uses a different syntax from the QFX5120... (cc topranks)
[09:23:50] eh, looks like there is already a patch to fix it, thanks :)
[09:24:10] :D
[09:24:13] yeah
[09:24:55] command doesn’t exist for the newer model at all, an unexpected quirk
[09:25:05] +1
[09:25:21] cool, sorry for the noise I’ll merge that now
[09:26:34] no prob, I was just checking if it was known, happy to see it already solved :D
[10:19:37] (SystemdUnitFailed) firing: (7) generate_os_reports.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:46:31] (SystemdUnitFailed) firing: (7) generate_os_reports.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:47:48] SRE-tools, Infrastructure-Foundations, Spicerack: spicerack.phabricator: Don't fail when logging to a restricted task - https://phabricator.wikimedia.org/T335879 (Aklapper) >>! In T335879#9173531, @Volans wrote: > The wmflib code could be improved to distinguish between a permission error and any oth...
[10:51:58] netbox, Infrastructure-Foundations: Should we have two versions of the Juniper QFX5120-48Y in Netbox? - https://phabricator.wikimedia.org/T331519 (cmooney) @ayounsi I was leaning towards having a single device, from a netops perspective they are pretty much the same. In terms of licensing we know everyt...
[10:59:12] SRE-tools, Infrastructure-Foundations, Spicerack: spicerack.phabricator: Don't fail when logging to a restricted task - https://phabricator.wikimedia.org/T335879 (Volans) @Aklapper What I meant is that there is no way to distinguish between the "no access" error and any other error that could be a mi...
[11:15:46] SRE-tools, Infrastructure-Foundations, Spicerack: spicerack.phabricator: Don't fail when logging to a restricted task - https://phabricator.wikimedia.org/T335879 (Aklapper) >>! In T335879#9174123, @Volans wrote: > It's just the message that differ, that is something wmflib should not rely on because...
[11:42:16] CAS-SSO, Data-Platform-SRE, Infrastructure-Foundations: Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (Stevemunene)
[12:21:32] SRE-tools, Cloud-VPS, Infrastructure-Foundations, Spicerack, cloud-services-team: spicerack: sal_logger does not work when running from a laptop - https://phabricator.wikimedia.org/T343336 (fnegri) Open→Resolved a:fnegri This was fixed by @taavi in https://gerrit.wikimedia.org/r/c...
[12:26:43] SRE-tools, Cloud-VPS, Infrastructure-Foundations, Spicerack, cloud-services-team: spicerack: sal_logger does not work when running from CloudVPS instances - https://phabricator.wikimedia.org/T343335 (fnegri) Open→Resolved a:fnegri Similarly to T343336, this was also fixed by @taav...
[13:47:10] netops, Infrastructure-Foundations, Observability-Metrics: cr*-eqsin long poll times from librenms - https://phabricator.wikimedia.org/T346606 (fgiunchedi) + netops for visibility since this can impact network devices
[13:48:19] netops, Infrastructure-Foundations, Observability-Metrics: cr*-eqsin long poll times from librenms - https://phabricator.wikimedia.org/T346606 (ayounsi) We had a quick chat on IRC. the `ports` and `bgp-peers` modules are the ones taking the most time, so no need to focus on the `snmp-max-oids` Libre...
[14:00:22] netops, Infrastructure-Foundations, SRE: scrape ripe atlas data for a few anchors at other large networks - https://phabricator.wikimedia.org/T252890 (CDanis) Open→Declined >>! In T252890#9165519, @ayounsi wrote: > @CDanis Is that still needed now that we have NEL? It would be interesting t...
[14:46:33] (SystemdUnitFailed) firing: (6) generate_os_reports.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:00:29] o/ I think I'm probably in the right place with this.
[15:00:43] I'm having some issues with cergen on puppetmaster1001.
[15:01:04] https://www.irccloud.com/pastebin/e0uH1ClY/trace
[15:08:35] The manifest looks good I think... It's this one `ticket-test.certs.yaml`. Attempted generating/viewing certificate status with other manifests and no luck. :(
[15:15:46] Eerm. Never mind. :'( My mistake... Misread the docs.
[17:16:43] netbox, Infrastructure-Foundations: Netbox: define strategy to track standard server configurations - https://phabricator.wikimedia.org/T284614 (Volans) Open→Resolved We're using the standard configuration for a while now. Boldly resolving, feel free to re-open in case there is anything left.
[18:49:39] (SystemdUnitFailed) firing: (6) generate_os_reports.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:24:37] (SystemdUnitFailed) firing: (7) generate_os_reports.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:24:37] (SystemdUnitFailed) firing: (7) generate_os_reports.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed