[04:52:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[04:57:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[06:44:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[06:49:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[09:25:29] Traffic, [Archived]Wikidata Dev Team, Prod-Kubernetes, SRE, and 5 others: Frequent 500 Errors and Timeouts When Adding Statements to New Item or Lexeme-typed Properties - https://phabricator.wikimedia.org/T374230#10771849 (Silvan_WMDE) @Kirilloparma We have created and merged a patch that will ho...
[10:09:42] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10771974 (ayounsi) Another oddity is that subscribing to `/components/component/transceiver/physical-channels/channel/state` a...
[11:08:46] Traffic, [Archived]Wikidata Dev Team, Prod-Kubernetes, SRE, and 5 others: Frequent 500 Errors and Timeouts When Adding Statements to New Item or Lexeme-typed Properties - https://phabricator.wikimedia.org/T374230#10772066 (Silvan_WMDE) Until then: as noted above, the problem is not actually cause...
[12:40:09] FIRING: [8x] LVSHighCPU: The host lvs5005:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[12:50:09] RESOLVED: [8x] LVSHighCPU: The host lvs5005:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[12:56:25] FIRING: SystemdUnitFailed: wmf_auto_restart_varnish-frontend-hospital.service on cp5029:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:51:25] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_varnish-frontend-hospital.service on cp5029:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:51:31] huh
[13:54:18] ^ this is probably a stale alert since the service has been fine for ~50 mins
[13:54:21] yeah
[13:56:25] FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_varnish-frontend-hospital.service on cp5029:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:11:34] Traffic, [Archived]Wikidata Dev Team, Prod-Kubernetes, SRE, and 5 others: Frequent 500 Errors and Timeouts When Adding Statements to New Item or Lexeme-typed Properties - https://phabricator.wikimedia.org/T374230#10772794 (ArthurPSmith) @Silvan_WMDE Thanks for working on this! I would note that t...
[14:31:52] Traffic, [Archived]Wikidata Dev Team, Prod-Kubernetes, SRE, and 5 others: Frequent 500 Errors and Timeouts When Adding Statements to New Item or Lexeme-typed Properties - https://phabricator.wikimedia.org/T374230#10772840 (Silvan_WMDE) >>! In T374230#10772794, @ArthurPSmith wrote: > Does the fix...
[14:32:01] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: eqiad: determine second frack - https://phabricator.wikimedia.org/T392007#10772843 (ayounsi) p:Triage→Medium
[14:32:17] netops, Hiddenparma, Infrastructure-Foundations: Reduce the steps needed to deploy hiddenparma - https://phabricator.wikimedia.org/T382268#10772846 (CDanis) p:Triage→Low
[14:32:54] Traffic, Infrastructure-Foundations, serviceops: Ownership of the sre.deploy.hiddenparma cookbook - https://phabricator.wikimedia.org/T383809#10772847 (CDanis) p:Triage→Medium
[14:34:14] Traffic, SRE: Console domain and property access request - https://phabricator.wikimedia.org/T381904#10772852 (joanna_borun)
[17:56:25] RESOLVED: SystemdUnitFailed: wmf_auto_restart_varnish-frontend-hospital.service on cp5029:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:05:11] Traffic, Spicerack, SRE-tools: Spicerack's Icinga module should provide a way to skip specific services in sub-optimal but desired state - https://phabricator.wikimedia.org/T392848 (ssingh) NEW
[19:05:32] Traffic, Infrastructure-Foundations, Spicerack, SRE-tools: Spicerack's Icinga module should provide a way to skip specific services in sub-optimal but desired state - https://phabricator.wikimedia.org/T392848#10774076 (ssingh) p:Triage→Low
[19:31:30] Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849 (Jgiannelos) NEW
[19:32:24] Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849#10774111 (Jgiannelos)
[19:39:33] Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849#10774120 (Jgiannelos)
[19:41:37] Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849#10774133 (Jgiannelos)
[19:44:56] Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849#10774139 (Jgiannelos)
[19:58:49] Traffic, DC-Ops, ops-codfw: Q4:rack/setup/install cp40[53-68] - https://phabricator.wikimedia.org/T392851 (RobH) NEW
[20:00:46] Traffic, DC-Ops, ops-codfw: Q4:rack/setup/install cp40[53-68] - https://phabricator.wikimedia.org/T392851#10774204 (RobH) a:ssingh @ssingh, We didn't get racking details on ordering task T389840, so can you populate the racking details on this racking task? Additionally, please update the site....
[20:01:30] Traffic, DC-Ops, ops-codfw: Q4:rack/setup/install cp40[53-68] - https://phabricator.wikimedia.org/T392851#10774214 (RobH)
[21:09:57] Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849#10774560 (Jgiannelos)
[21:20:20] Traffic, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#10774597 (ssingh)
[21:20:33] Traffic, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#10774599 (ssingh) a:ssingh→BCornwall
[21:25:01] Traffic, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#10774617 (ssingh) Thanks @RobH. Task assigned to Traffic and hostnames updated. We will take care of the preseed.yaml bit, thanks for the reminder!
[21:34:57] Traffic, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#10774644 (ssingh) (Scratch that, preseed.yaml is `cp[1-9][0-9][0-9][0-9]` so that's good but we just need to update site.pp)
[21:36:59] Traffic, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#10774654 (BCornwall)
[22:07:39] Traffic, Content-Transform-Team, serviceops: Purging edge caches doesn't work for articles with ":" in their title - https://phabricator.wikimedia.org/T392849#10774819 (cscott) I bet this is the problem: https://wikitech.wikimedia.org/wiki/URL_path_normalization