[00:08:25] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:08:25] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:13:19] SRE-tools, Infrastructure-Foundations, Spicerack: Unified pattern for RemoteHosts accessors in Spicerack - https://phabricator.wikimedia.org/T374073 (Scott_French) NEW
[08:25:01] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Q1:codfw:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371434#10120336 (cmooney) >>! In T371434#10119784, @Papaul wrote: > The diagram below will outline the cabling of the new Fundraising n...
[08:26:20] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate servers in codfw rack C1 from asw-c1-codfw to lsw1-c1-codfw - https://phabricator.wikimedia.org/T373095#10120339 (cmooney) Open→Resolved a:cmooney
[08:28:25] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120363 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumbering for host wikikube-w...
[08:28:36] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120364 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wikikube-worker2088.codfw....
[09:00:29] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120464 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by jayme@cumin1002 from mw2434 to wik...
[09:01:32] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120472 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by jayme@cumin1002 from mw2435 to wik...
[09:02:20] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120477 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi...
[09:02:33] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120478 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering f...
[09:03:08] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120480 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi...
[09:03:18] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120481 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering f...
[09:03:48] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120482 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi...
[09:04:09] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120483 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wiki...
[09:04:55] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120484 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi...
[09:05:11] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120485 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wiki...
[09:06:25] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:12:54] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120495 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube...
[09:19:57] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120516 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering f...
[09:39:23] SRE-tools, Infrastructure-Foundations, Spicerack: Unified pattern for RemoteHosts accessors in Spicerack - https://phabricator.wikimedia.org/T374073#10120608 (Volans) Thanks for the task, we'll evaluate the various options and come up with a final proposal.
[09:54:38] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120668 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube-worker2089.codfw.wmne...
[09:59:46] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120679 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering for host wikikube-worke...
[10:03:43] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120684 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by hnowlan@cumin1002 Renumbering for host wikikube...
[10:03:56] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120685 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wikikube-worker2084.codf...
[10:06:25] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:55:37] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10120867 (MatthewVernon) Further Data Persistence nodes (Ceph / Swift) in `C2`: |`C2` | moss-be2003 | needs mainten...
[10:57:54] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikikube-worker2084.codfw.wm...
[10:57:58] !log homer lsw1-b3-codfw* commit
[10:57:58] hnowlan: Not expecting to hear !log here
[10:58:05] oop
[11:00:35] don't mind him :P
[11:01:05] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120886 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by hnowlan@cumin1002 Renumbering for host wikikube-wor...
[11:21:22] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120933 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube-worker2090.codfw.wmne...
[11:22:12] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120940 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering for host wikikube-worke...
[11:25:22] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120948 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by cgoubert@cumin1002 Renumbering for host wikikub...
[11:26:49] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120950 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2029.cod...
[12:54:46] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121332 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2029.codfw.w...
[12:58:42] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121341 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by cgoubert@cumin1002 Renumbering for host wikikube-wo...
[13:06:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on idp2004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[13:11:48] FIRING: [2x] PuppetZeroResources: Puppet has failed generate resources on idp1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[13:17:25] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:19:27] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10121449 (MatthewVernon) There are 4 swift servers in `C4` - ms-be2058 ms-be2064 ms-be2072 ms-be2077 ; they'll need...
[13:24:35] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks C6 & C7 from asw to lsw - https://phabricator.wikimedia.org/T373101#10121464 (MatthewVernon) There are some impact Swift servers: - ms-be2054 and ms-be2078 and thanos-be2003 - these just need a quick c...
[13:33:52] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10121496 (MatthewVernon) These racks have the following Swift/Ceph nodes: - ms-fe2012 moss-fe2002 thanos-fe2003 (ne...
[13:36:00] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate servers in codfw racks D5 & D6 from asw to lsw - https://phabricator.wikimedia.org/T373104#10121520 (MatthewVernon) No affected swift/Ceph nodes in these racks.
[13:36:02] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks D3 & D4 from asw to lsw - https://phabricator.wikimedia.org/T373103#10121503 (MatthewVernon) No Swift/Ceph nodes affected in this one.
[13:40:05] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D7 & D8 from asw to lsw - https://phabricator.wikimedia.org/T373105#10121536 (MatthewVernon) There are these impacted Swift/Ceph nodes: - thanos-be2004 ms-be2056 ms-be2059 ms-be2073 m...
[13:40:41] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10121551 (MatthewVernon)
[13:40:57] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10121554 (MatthewVernon)
[13:41:14] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10121555 (MatthewVernon)
[13:41:35] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: Migrate servers in codfw racks C6 & C7 from asw to lsw - https://phabricator.wikimedia.org/T373101#10121561 (MatthewVernon)
[13:51:48] FIRING: [2x] PuppetZeroResources: Puppet has failed generate resources on idp1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[14:17:25] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:21:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on idp2004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[14:36:50] netops, Infrastructure-Foundations, probenet, SRE, Traffic: improve GeoDNS-to-edge mapping - https://phabricator.wikimedia.org/T316160#10121802 (CDanis)
[14:37:41] netops, Infrastructure-Foundations, probenet, SRE, Traffic: improve GeoDNS-to-edge mapping - https://phabricator.wikimedia.org/T316160#10121808 (CDanis)
[14:47:00] moritzm, elukey: is it ok if I deploy fleet-wide python3-wmflib or are you debdeploying things?
[14:47:32] please go ahead!
[14:48:24] volans: I am debdeploying things but I can stop, please go ahead
[14:48:45] elukey: I can wait for yours to finish, no hurry :)
[14:49:30] nono I was doing some staggered updates, finished
[14:49:50] ok, thx
[14:51:04] SRE-tools, Infrastructure-Foundations, Spicerack: Unified pattern for RemoteHosts accessors in Spicerack - https://phabricator.wikimedia.org/T374073#10121866 (elukey)
[15:00:41] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121908 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by kamila@cumin1002 from mw2420 to wi...
[15:01:49] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121910 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by kamila@cumin1002 from mw2421 to wi...
[15:03:33] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121913 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by kamila@cumin1002 Renumber...
[15:04:04] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121914 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wik...
[15:06:18] moritzm, elukey: all done, thanks!
[15:07:00] ack
[15:07:12] nice :)
[15:10:50] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10121964 (Fabfur) Hosts cp203[5-6] downtimed and depooled
[15:17:24] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121992 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by kamila@cumin1002 Renumber...
[15:17:45] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121994 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wik...
[15:19:25] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:48:46] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122089 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=8726666c-096a-491c-b6d3-edc93e2996f1) set...
[16:19:25] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:31:56] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122267 (MatthewVernon) @cmooney all good to go from a Swift/Ceph perspective, thanks for your patience
[16:37:28] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122281 (cmooney) >>! In T373096#10122267, @MatthewVernon wrote: > @cmooney all good to go from a Swift/Ceph perspe...
[16:39:16] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122286 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=cde90074-86b4-49ac-9878-436a5d041f2b) set...
[16:49:41] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10122317 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikub...
[16:49:43] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10122318 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by kamila@cumin1002 Renumbering...
[16:51:21] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10122337 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikub...
[16:55:24] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10122354 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by kamila@cumin1002 Renumbering...
[16:58:16] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122371 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=07e91a47-4c42-404a-bc7d-ad277bbf3e2b) set...
[17:08:51] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122430 (MatthewVernon) Swift / Ceph back to normal, thanks!
[17:09:21] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122440 (ABran-WMF) kudos @Jhancock.wm! d/p nodes are repooling
[17:09:26] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122429 (cmooney) All links moved and all hosts now responding to ping again. Average interruption in the region o...
[18:08:16] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122695 (cmooney) Open→Resolved a:cmooney
[20:18:22] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: LInk errors from lvs1019 to ssw1-f1-eqiad - https://phabricator.wikimedia.org/T374155 (cmooney) NEW p:Triage→High
[20:37:22] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: LInk errors from lvs1019 to ssw1-f1-eqiad - https://phabricator.wikimedia.org/T374155#10123168 (cmooney) Should mention there is a good case for shutting down PyBal on lvs1019 now, so that no traffic uses this bad link (instead...
[22:05:37] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Codfw row C/D switch installation & configuration - https://phabricator.wikimedia.org/T364095#10123484 (cmooney) Open→Resolved a:cmooney
[22:08:38] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Configure QoS marking and policy across network - https://phabricator.wikimedia.org/T339850#10123487 (cmooney) Open→Resolved
[22:35:33] topranks: where can I find that nice Grafana dashboard re. the lvs1019 link errors task?
[22:36:07] I think this was the one:
[22:36:08] https://grafana.wikimedia.org/d/iUATvNzSz/network-queues
[22:36:08] nowadays (finally), I can see the error graphs in LibreNMS, but... it's LibreNMS
[22:36:33] or maybe this one:
[22:36:35] https://grafana.wikimedia.org/d/d968a627-b6f6-47fc-9316-e058854a4945/network-interface-throughput-gnmi
[22:37:04] https://grafana-rw.wikimedia.org/d/f61a7d56-e132-44dc-b9da-d722b11566cf/network-totals-by-site
[22:37:39] yeah LibreNMS Is great but the RRDs are clunky and UI isn't great, very nice to have some of these stats in Prometheus at last :)
[22:37:50] there are some wrinkles with it but it's working pretty well
[22:38:03] failed to log in as user, specified in auth proxy header
[22:38:07] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Q1:codfw:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371434#10123557 (Papaul)
[22:38:20] grafana-rw doesn't like the recent addition to ldap/nda :P
[22:38:49] Sorry first link was wrong
[22:38:53] the gNMIc stuff is great
[22:38:58] https://grafana.wikimedia.org/d/5p97dAASz/network-interface-queue-and-error-stats
[22:39:13] you can remove the -rw from hostname and still view them
[22:39:38] just not edit... it's too late for me to be working even so I won't look at the ldap/nda stuff right now :P
[22:42:58] the worst of all is that I still cannot view netbox
[22:43:50] http 302 /your_bed
[22:45:03] sorry for bugging you at midnight ;)
[22:50:38] haha no worries, later for you!
[22:51:48] 👋
[22:54:59] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: LInk errors from lvs1019 to ssw1-f1-eqiad - https://phabricator.wikimedia.org/T374155#10123611 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=01519f9e-2903-4b5b-b71f-e25b1467cc00) set by cmooney@cumin1002 for...
[22:55:42] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: LInk errors from lvs1019 to ssw1-f1-eqiad - https://phabricator.wikimedia.org/T374155#10123609 (cmooney) @Jclark-ctr is on site so I will swing the live traffic to lvs1020 so we can investigate.
[22:57:02] Southparkfan: I can confirm 2 things for now:
[22:57:11] - southparkfan is in LDAP group nda
[22:57:28] - grafana-rw has "Require cas-attribute memberOf:cn=nda,ou=groups,dc=wikimedia,dc=org"
[22:57:39] so should be fine
[22:57:41] so it _looks_ like it should
[22:57:45] is it failing?
[22:58:02] could it be Southparkfan vs southparkfan ?
[22:58:11] but didn't grafana had some kind of ldap sync script?
[22:58:14] have*
[22:59:09] https://github.com/wikimedia/operations-puppet/blob/production/modules/grafana/manifests/ldap_sync.pp#L15
[22:59:13] wondering what you see when you go to https://idp.wikimedia.org
[22:59:32] does that tell you you are logged into CAS?
[22:59:33] I'm a member of ou=groups,dc=wikimedia,dc=org
[22:59:39] yep
[23:01:16] feel free to see if my account has been provisioned in Grafana, if not, I'll create a task and bug the SRE clinician / o11y
[23:01:28] one moment
[23:02:02] I started that sync
[23:02:21] but it may have an issue
[23:04:11] trying to figure out where it syncs to
[23:05:38] so.. the sync script has problems. this must be why
[23:05:48] requests.exceptions.HTTPError: 412 Client Error: Precondition Failed for url:
[23:05:57] when it tries to talk to the grafana API
[23:06:11] yea, this is turning into task territory now
[23:07:09] I'll create one for o11y, you can edit it afterwards to describe the sync error
[23:07:30] sounds good, thanks
[23:11:02] I can tell it started failing on August 27. So not today but also not that long ago.
[23:11:06] be back later
[23:11:38] zgrep grafana-ldap-users-sync.service /var/log/syslog*.gz | grep FAILURE
[23:11:42] https://phabricator.wikimedia.org/T374173 here you go
[23:11:46] ty
[23:12:06] just edit the task description
[23:12:17] thanks so much for your help ;)
[23:13:39] yw, commented.
[23:14:56] right, time to sleep
[23:35:03] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176 (Papaul) NEW
[23:37:28] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10123837 (Papaul)
[23:38:31] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10123839 (Papaul)
[23:43:18] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: LInk errors from lvs1019 to ssw1-f1-eqiad - https://phabricator.wikimedia.org/T374155#10123855 (cmooney) Open→Resolved Ok so we replaced the optic on the lvs1019 side, and things seem to be good. Sent a test stream...