[00:02:34] RESOLVED: DiskSpace: Disk space serpens:9100:/ 5.077% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=serpens - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[03:11:41] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[04:11:41] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[08:19:05] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 2 others: Netbox Cable report - incorrectly parsing Nokia power supplies - https://phabricator.wikimedia.org/T410073#11412010 (ayounsi) Open→Resolved All good ! https://netbox.wikimedia.org/extras/scripts/results/274160/
[10:55:01] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: Transport link saturation not alerting - https://phabricator.wikimedia.org/T409330#11412388 (ayounsi) Open→Resolved Paging alerting added. I won't disable the LibreNMS one for now, but only in the future to make s...
[11:30:25] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad - https://phabricator.wikimedia.org/T411098#11412473 (cmooney) Open→Resolved
[11:43:05] SRE-tools, cloud-services-team, Infrastructure-Foundations, IPv6: Some WMCS clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271139#11412503 (ayounsi) Open→Resolved a:ayounsi All solved.
[11:52:38] SRE-tools, Infrastructure-Foundations, SRE, IPv6: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136#11412520 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff These are all done, the remaining Ganeti nodes w/o AAAA records were...
[12:42:48] SRE-tools, Infrastructure-Foundations, SRE: sre.ganeti.makevm: Create machine types - https://phabricator.wikimedia.org/T344972#11412674 (MoritzMuehlenhoff) a:MoritzMuehlenhoff→None
[13:20:14] CAS-SSO, Gerrit, Infrastructure-Foundations: Use IDP for authentication in Gerrit - https://phabricator.wikimedia.org/T147864#11412908 (Tacsipacsi) Stalled→Open Thanks for the response! So it’s ready to be worked on, even if that work is unlikely to happen in the near future (at least by staf...
[14:06:34] Hey! Just FYI, we're transferring a Blazegraph journal from wdqs2009 to AWS for testing purposes. The file is ~1.3TB. No PII in there. Let me (or gmodena) know if this is an issue.
[14:08:00] gehel: can you rate limit the transfer? The host is 10G, so it could potentially saturate our upstream peering link
[14:08:20] XioNoX: I'll check
[14:08:25] thx
[14:08:39] at what rate should we limit?
[14:08:41] and thanks for asking :)
[14:09:10] gehel: 5Gbps should be good
[14:09:23] if that's ok for you, if you need more we need to look in more detail
[14:10:44] We're at ~250MiB/s, so around 4Gbps. Is this good enough?
[14:11:37] gehel: yup, thx!
[14:11:41] great!
[14:12:50] Let me know if this becomes an issue, or just kill the `/usr/bin/aws s3 sync` process on wdqs2009 if needed. It should be complete in a couple of hours
[14:16:23] more context on T411095 if needed
[14:16:46] gehel: do you have the IPs it's sending traffic towards?
[14:17:26] I can find it!
[14:20:08] Actually, not sure. It's going through webproxy.
[14:21:11] yeah, that's what I don't understand, I'm not seeing an uptick of traffic on the codfw proxy: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&from=now-3h&to=now&timezone=utc&var-server=install2005&var-datasource=000000026&var-cluster=misc&refresh=5m&viewPanel=panel-8
[14:22:21] we're uploading to us-east-1, so probably 3.216.216.90, 3.216.243.220, 3.217.241.85
[14:25:11] from wdqs2009, I see connections to 2620:0:861:2:208:8:8080
[14:31:31] Oh! Wrong proxy configured! We're going through eqiad proxy. Let's fix this real quick!
[14:31:46] XioNoX: thanks for asking! I wouldn't have realised otherwise.
[14:32:28] ah right, that makes more sense: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&from=now-3h&to=now&timezone=utc&var-server=install1005&var-datasource=000000026&var-cluster=misc&refresh=5m&viewPanel=panel-8
[14:32:44] yeah you will get much better perf by using the local proxy
[14:40:18] transfer restarted with the correct proxy and capped at 200MiB/s
[14:40:43] nice!
[14:40:50] should be done in <2h
[15:30:58] SRE-tools, Infrastructure-Foundations: Decide which cookbooks using icinga_hosts.wait_for_optimal() should use skip_acked=True - https://phabricator.wikimedia.org/T330136#11413437 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff I think all the relevant cookbooks have this enabled no...
[16:23:08] netops, Infrastructure-Foundations, SRE, Traffic: Users reporting issues connecting to Gerrit with HTTPS from Orange, FR mobile network (AS 3215) - https://phabricator.wikimedia.org/T411203 (cmooney) NEW p:Triage→Low
[16:27:47] netops, Infrastructure-Foundations, SRE, Traffic: Users reporting issues connecting to Gerrit with HTTPS from Orange, FR mobile network (AS 3215) - https://phabricator.wikimedia.org/T411203#11413739 (cmooney)
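
Note on the wdqs2009 transfer figures (14:06 to 14:40): the rates in that exchange mix MiB/s and Gbps, so a small conversion helper may be useful when checking a cap against a link budget. The sketch below is only an illustration. The function names are hypothetical, and it assumes "MiB" means 2^20 bytes, "Gbps" means 10^9 bits per second, and "~1.3TB" means 1.3 * 10^12 bytes; none of these conventions are confirmed in the log.

```python
MIB = 1024 ** 2  # bytes in one mebibyte (assumed meaning of "MiB")


def mib_per_s_to_gbps(rate_mib_s: float) -> float:
    """Convert a rate in MiB/s to decimal gigabits per second."""
    return rate_mib_s * MIB * 8 / 1e9


def transfer_hours(size_bytes: float, rate_mib_s: float) -> float:
    """Estimate transfer time in hours at a constant rate in MiB/s."""
    return size_bytes / (rate_mib_s * MIB) / 3600


# Assumption: "~1.3TB" read as decimal terabytes.
JOURNAL_BYTES = 1.3e12

if __name__ == "__main__":
    for rate in (250, 200):  # the two rates quoted in the log
        print(
            f"{rate} MiB/s = {mib_per_s_to_gbps(rate):.2f} Gbit/s, "
            f"{transfer_hours(JOURNAL_BYTES, rate):.2f} h for the journal"
        )
```

Under these assumptions, 250 MiB/s works out to roughly 2.1 Gbit/s and 200 MiB/s to roughly 1.7 Gbit/s, so both sit below the 5Gbps ceiling requested at 14:09:10, and the 200MiB/s cap is consistent with the "<2h" estimate for a ~1.3TB file.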