[07:39:39] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322 (ayounsi) Update on the JTAC case, they're still working on it, they tested it without SSL and it worked fine, but: > Hope you are doi...
[08:35:25] netops, Infrastructure-Foundations, SRE, netbox: Netbox Juniper report - https://phabricator.wikimedia.org/T306238 (ayounsi) They replied back saying that the staging env is ready. Following their instructions I get a 401, to be investigated.
[09:26:17] netops, Infrastructure-Foundations, SRE: Put Dell SONiC switches in production - https://phabricator.wikimedia.org/T335028 (ayounsi)
[13:45:42] (SystemdUnitFailed) firing: (2) varnishncsa.service Failed on cp1104:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:46:28] Traffic, SRE, Patch-For-Review: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (Fabfur)
[13:46:45] (SystemdUnitCrashLoop) firing: varnish-frontend.service crashloop on cp1108:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:47:26] ^^ those hosts are currently NOT in production
[13:50:42] (SystemdUnitFailed) firing: (6) haproxy_stek_job.service Failed on cp1104:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:51:45] (SystemdUnitCrashLoop) firing: (3) varnish-frontend.service crashloop on cp1104:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:51:46] (SystemdUnitCrashLoop) firing: varnish-frontend.service crashloop on cp1108:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:13:22] netops, Infrastructure-Foundations, SRE, observability, SRE Observability (FY2023/2024-Q2): librenms.syslog table size - https://phabricator.wikimedia.org/T349362 (lmata)
[14:28:51] Traffic, SRE: Q1:Install cp11[00-15] and rotate into production - https://phabricator.wikimedia.org/T349244 (Fabfur) First host reimage is complete: ` Reimage completed: - cp1100 (**PASS**) - Downtimed on Icinga/Alertmanager - Disabled Puppet - Removed from Puppet and PuppetDB if present and delet...
[15:15:08] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ssingh)
[16:45:27] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns1005.wikimedia.org with OS bookworm
[17:04:34] Traffic, SRE, GitLab (Project Migration), Patch-For-Review: Migrate Traffic repositories from Gerrit to Gitlab - https://phabricator.wikimedia.org/T347623 (CodeReviewBot) brett merged https://gitlab.wikimedia.org/repos/sre/acme-chief/-/merge_requests/4 Implement Gitlab CI and Blubber config
[17:29:03] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns1005.wikimedia.org with OS bookworm completed: - dns1005 (**PASS**) - Downtimed on Icinga/Al...
[19:27:35] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host dns1006.wikimedia.org with OS bookworm
[20:05:00] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host dns1006.wikimedia.org with OS bookworm completed: - dns1006 (**PASS**) - Downtimed on Icinga/Al...
[20:26:39] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (BCornwall)
[22:52:48] Traffic, Commons, Wikimedia-Site-requests, serviceops: Enforce upload rate limits for bots on commons - https://phabricator.wikimedia.org/T248177 (Pppery)