[06:16:15] netops, Infrastructure-Foundations, SRE, cloud-services-team: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184 (ayounsi)
[09:44:35] Traffic, Commons, SRE: Specific PNG thumbnail of SVG file is outdated / stuck (European caching cluster) - https://phabricator.wikimedia.org/T333042 (MatthewVernon) >>! In T333042#8776941, @Umar wrote: > For more than a month I have not seen new versions of files. > > https://commons.wikimedia.org/w...
[09:46:34] Traffic, Commons, SRE: Specific PNG thumbnail of SVG file is outdated / stuck (European caching cluster) - https://phabricator.wikimedia.org/T333042 (MatthewVernon) >>! In T333042#8764707, @Lionel_Scheepmans wrote: > Hello. I still have a problem with the display of a PDF on this page : https://fr.w...
[10:25:01] Traffic, Commons, MediaWiki-File-management, SRE: PNG thumbnail of Wikimedia Commons SVG file sometimes not updated - https://phabricator.wikimedia.org/T334303 (MatthewVernon) I have cleared out the old thumbnails of this image (so as the CDN expires the ones its cached they should get regenerated).
[10:36:36] netops, Infrastructure-Foundations, SRE: Verify and Configure ECMP operation for EVPN switches - https://phabricator.wikimedia.org/T334658 (cmooney) p:Triage→Medium
[11:07:53] netops, Infrastructure-Foundations, SRE, cloud-services-team (FY2022/2023-Q4): Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (fnegri)
[11:23:58] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Clement_Goubert)
[11:38:18] Traffic, Commons, SRE: Specific PNG thumbnail of SVG file is outdated / stuck (European caching cluster) - https://phabricator.wikimedia.org/T333042 (Ladsgroup)
[11:38:28] Traffic, Commons, MediaWiki-File-management, SRE: PNG thumbnail of Wikimedia Commons SVG file sometimes not updated - https://phabricator.wikimedia.org/T334303 (Ladsgroup)
[13:19:45] (HAProxyRestarted) firing: HAProxy server restarted on cp5022:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=eqsin%20prometheus/ops&var-instance=cp5022&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[13:29:45] (HAProxyRestarted) resolved: HAProxy server restarted on cp5022:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=eqsin%20prometheus/ops&var-instance=cp5022&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[13:47:09] Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: Receive network latency reports from the browsers - https://phabricator.wikimedia.org/T334417 (CDanis)
[14:07:42] (SystemdUnitFailed) firing: ifup@ens13.service Failed on ncredir2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:12:42] (SystemdUnitFailed) resolved: (2) ifup@ens13.service Failed on ncredir2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:35:41] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[15:23:59] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Clement_Goubert)
[15:36:41] Traffic, SRE: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[15:50:52] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[16:18:56] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw:row A/B: rack/cable new switches - https://phabricator.wikimedia.org/T332180 (Papaul)
[16:49:45] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs2008.codfw.wmnet with OS bullseye
[17:30:16] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs2008.codfw.wmnet with OS bullseye completed: - lvs2008 (**PASS**) - Downtimed on Icinga/Aler...
[18:07:09] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[18:16:35] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs2009.codfw.wmnet with OS bullseye
[18:28:45] Traffic, Infrastructure-Foundations, SRE: Receive network latency reports from the browsers - https://phabricator.wikimedia.org/T334417 (JameelKaisar)
[18:32:35] Traffic, SRE, Wikidata, wdwb-tech: Wikidata seems to still be utilizing insecure HTTP URIs - https://phabricator.wikimedia.org/T331356 (BBlack) >>! In T331356#8718619, @MisterSynergy wrote: > Some remarks: > * We should consider these canonical HTTP URIs to be //names// in the first place, which...
[18:55:57] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs2009.codfw.wmnet with OS bullseye completed: - lvs2009 (**PASS**) - Downtimed on Icinga/Aler...
[19:59:11] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[20:50:26] Traffic, SRE, conftool, serviceops: Pybal maintenances break safe-service-restart.py (and thus prevent scap deploys of mediawiki) - https://phabricator.wikimedia.org/T334703 (CDanis)
[21:04:35] Traffic, DC-Ops: Upgrade lvs1013-1016 firmware - https://phabricator.wikimedia.org/T334259 (BCornwall) In progress→Resolved I'm going to go ahead and close this; Unless there's anything that comes up in our testing we're just going to leave the nics where they are.
[21:26:08] Traffic, Data-Services, SRE: 2022-09-04 Scraping from AS714 (Apple) against dumps.wikimedia.org saturating network links - https://phabricator.wikimedia.org/T317001 (BCornwall) Open→Stalled
[21:26:56] Traffic, Data-Services, SRE: 2022-09-04 Scraping from AS714 (Apple) against dumps.wikimedia.org saturating network links - https://phabricator.wikimedia.org/T317001 (BCornwall) @CDanis, thank you for your work on this ticket! Would you agree that it's worth closing this ticket? Is there a desire to f...
[22:33:37] Is there monitoring for "cert expiry" for certs issued by acmechief/LE? and is it not in Icinga?
[22:33:43] I am being asked myself on a code review :)
[22:34:00] because we had some Icinga checks like the custom one for gerrit.. that also did this
[22:35:00] and in new monitoring land that would go away....but I said "well, I think we don't need to monitor every cert from acme_chief for expiry with a separate check in our own services' puppet code" .. but ..
[23:29:06] not the kind of monitoring you are asking about, but LE itself will email you when there's less than (15?) days left for your certificate, and you don't have a newer one for that name