[00:18:13] netops, Infrastructure-Foundations: Downgrade pfw1-codfw to Junos 23.4R2-S3 - https://phabricator.wikimedia.org/T393996#10861802 (Dwisehaupt) Just pinging on this. Maintenance week is this week and we are ok for the work to happen when you are ready. Morning to midday UTC is extra good for us as most of...
[02:21:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:40:25] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[02:47:49] FIRING: PuppetZeroResources: Puppet has failed generate resources on ganeti7002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:18:34] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[03:23:34] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[06:19:32] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw: BAD PEM3 on cr2-codfw - https://phabricator.wikimedia.org/T394868#10862103 (ayounsi) @Jhancock.wm did the PSU arrive? Can you set it up asap ? thx
[06:21:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:22:48] netbox, Infrastructure-Foundations, SRE, SRE-Access-Requests: Selena can't see objects in Netbox despite having wmf group membership - https://phabricator.wikimedia.org/T395172#10862105 (SLyngshede-WMF) We are looking into some sort of central sign out, but I can't really say how that would work...
[06:22:57] netbox, Infrastructure-Foundations, SRE, SRE-Access-Requests: Selena can't see objects in Netbox despite having wmf group membership - https://phabricator.wikimedia.org/T395172#10862109 (SLyngshede-WMF) Open→Resolved
[06:40:25] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[06:47:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on ganeti7002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:18:34] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[07:23:34] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[08:20:25] RESOLVED: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[08:21:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:32:55] FIRING: [2x] SystemdUnitFailed: prometheus-ganeti-exporter.service on ganeti7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:47:49] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on ganeti7002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:02:55] RESOLVED: SystemdUnitFailed: prometheus-ganeti-exporter.service on ganeti7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:08:25] FIRING: SystemdUnitFailed: prometheus-ganeti-exporter.service on ganeti7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:03:25] RESOLVED: SystemdUnitFailed: prometheus-ganeti-exporter.service on ganeti7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:18:34] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[11:23:34] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[11:52:25] FIRING: SystemdUnitFailed: netbox_ganeti_magru03_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:02:25] RESOLVED: SystemdUnitFailed: netbox_ganeti_magru03_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:08:26] SRE-tools, Infrastructure-Foundations, observability: Cookbook sre.hosts.remove_downtime does not remove silences - https://phabricator.wikimedia.org/T395032#10863796 (lmata)
[15:18:34] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[15:23:11] CAS-SSO, Infrastructure-Foundations, Phabricator, Patch-For-Review: Phabricator should use IDP for developer account logins - https://phabricator.wikimedia.org/T377061#10864300 (Arendpieter)
[15:23:34] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[15:23:48] CAS-SSO, Infrastructure-Foundations, Phabricator: Phabricator should use IDP for developer account logins - https://phabricator.wikimedia.org/T377061#10864304 (Arendpieter)
[16:10:40] CAS-SSO, Infrastructure-Foundations, Phabricator: Phabricator should use IDP for developer account logins - https://phabricator.wikimedia.org/T377061#10864752 (Aklapper) I need to get back to this but currently using the test instance to test upgrades in {T370266} and {T386558} has priority, sorry :-/
[17:06:55] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10865012 (cmooney) Nokia have been on asking to get the test kit back. Mostly this will fall to DC-Ops but if you can leave m...
[17:31:31] Heyo! I'm wondering why --reason and --task-id aren't included in the -operations output. It might be nice to have it included for visibility
[17:36:45] Is this something that foundations would be interested in pursuing? I could open a task/patch
[17:37:47] Or, at the very least, having a toggleable bool that enables/disables the inclusion of reason/task-id
[17:40:55] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw: BAD PEM3 on cr2-codfw - https://phabricator.wikimedia.org/T394868#10865143 (cmooney) Open→Resolved All is now good with this one, thanks for sorting it @Jhancock.wm ! ` cmooney@re0.cr2-codfw> show chassis environme...
[19:18:34] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[19:23:34] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[19:59:42] brett: makes sense to me, feel free to file a task and we can talk about it
[20:28:39] brett: if you're referring to cookbooks, each cookbook can decide what to include in the log to SAL by defining the runtime_description() method of the runner
[20:30:12] also each cookbook can define what CLI args it wants/needs. Are you referring to a specific context?
[20:38:44] I'm aware of the ability to customize, but I was arguing that it would be a good default to include the reason and task id
[20:38:54] volans: ^
[20:39:31] in a specific context or in general
[20:39:38] in general
[20:39:50] If there is one specified
[20:40:18] there is no strict API for arguments and the definition of arguments and the runtime description live in different classes
[20:41:00] there is currently no way for spicerack to know if and where those arguments are stored in the runner's class
[20:42:11] it would also raise the question of whether it would be a good default; the reason might contain information that should go into an NDA task and not be public on IRC/SAL/twitter
[20:42:45] volans: Are you interested in a bug report?
[20:43:36] if you're instead talking about SREBatchRunnerBase based cookbooks that's totally another story
[20:44:21] I was talking about all cookbooks wherein a task id and/or reason was set
[20:47:37] for the task id it might make sense, for the reason by default I'm not sure it would be a good idea.
[20:48:06] Currently there are 88 cookbooks that define their own runtime description, how to integrate the "default inclusion of the task id" with the customization of those?
[20:48:23] that usually include a target host/hosts, the type of operation, etc...
[20:51:51] That's their problem if they're overriding it, isn't it?
[20:55:23] not really, if not usable by most of the cookbooks what's the benefit
[22:43:25] FIRING: SystemdUnitFailed: update-tails-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:18:34] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[23:23:34] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
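
Note on the 20:28-20:55 cookbook discussion: a rough sketch of how a default "include the task id" behaviour could coexist with cookbooks that customize their runtime description. None of this is spicerack's actual API. The classes `RunnerBase` and `RebootRunner`, the `task_id` attribute convention, the `log_message` helper, the cookbook name `example.reboot`, and the task `T000000` are all hypothetical stand-ins. The sketch assumes runners agree to store the task ID in a known attribute, which is exactly the "strict API" the discussion says does not exist today. The reason is deliberately left out, since it may contain information that should not be public.

```python
from typing import Optional


class RunnerBase:
    """Hypothetical stand-in for a cookbook runner (not spicerack's real class)."""

    # Assumed convention: a runner that accepts a task ID stores it here.
    task_id: Optional[str] = None

    @property
    def runtime_description(self) -> str:
        """Cookbook-specific text for the log line; empty by default."""
        return ""


class RebootRunner(RunnerBase):
    """Hypothetical runner that customizes its description with a target host."""

    def __init__(self, host: str, task_id: Optional[str] = None) -> None:
        self.host = host
        self.task_id = task_id

    @property
    def runtime_description(self) -> str:
        return f"for host {self.host}"


def log_message(name: str, runner: RunnerBase) -> str:
    """Build the log line, appending the task ID only when one was set."""
    parts = [name]
    if runner.runtime_description:
        parts.append(runner.runtime_description)
    if runner.task_id:
        parts.append(f"({runner.task_id})")
    return " ".join(parts)


if __name__ == "__main__":
    print(log_message("example.reboot", RebootRunner("example1001", "T000000")))
    print(log_message("example.reboot", RebootRunner("example1001")))
```

Because the task ID is appended by the caller rather than inside `runtime_description`, a cookbook that overrides the description would keep its custom text and still get the task ID, provided it follows the assumed attribute convention.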