[06:18:33] morning! Correction from what I wrote yesterday: https://phabricator.wikimedia.org/T368238
[06:19:05] it seems that Wikifeeds tlsproxy usage went up since the middle of April, opened a task
[10:09:57] FYI, I'll "depool" (by means of redirecting CNAMEs to the eqiad servers) the codfw LDAP replicas as prep work for moving them to ipip. there should be no issues, but if you see anything strange, please let me know
[11:05:26] I'm running the `sre.dns.netbox` cookbook and I see some unexpected entries related to `ssw1-d1-codfw`, `ssw1-d8-codfw`, and `lsw1-d1-codfw`. Should I proceed, or abort?
[11:12:05] btullis: can you share the diff?
[11:12:53] XioNoX: Sure thing. https://phabricator.wikimedia.org/P65389
[11:13:23] btullis: all good!
[11:13:34] Many thanks.
[12:30:19] topranks: no probs, I just wanted to double check.
[14:33:24] There seems to be a toolforge outage. Can't log in to dev/login.toolforge.org, my tmux session hangs on NFS, my tools hang writing to log files. Started a few minutes ago.
[14:34:08] yeah, SAL is borked too it seems
[14:47:47] CountCount: can you ask in #wikimedia-cloud ?
[14:48:31] cdanis: I asked there and it's back now
[14:48:36] ah okay
[14:48:41] thanks :)
[14:48:43] I forgot to update here sorry :]
[14:48:54] np, I had fallen out of the channel myself so I didn't see
[19:17:31] mutante: want me to merge 'add Daphne Smit to ldap_only users'?
[19:18:19] * andrewbogott does it
[19:27:25] andrewbogott: yes please
[20:23:17] btullis: this is lower priority than dealing with the network flood
[20:23:31] (if it's still happening) but I'm getting a new alert from a toolforge infra service: "pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'an-redacteddb1001.eqiad.wmnet' (timed out)")"
[20:23:47] Seems like that's a new host, or newly renamed host, maybe it's missing a ferm rule?
[20:25:15] It's the replacement for clouddb1021. I wasn't aware that anything in toolforge should be connecting to it yet.
[20:26:19] I removed a stray AAAA record for an-redacteddb1001 earlier, so you may find that it is no longer happening.
[20:26:43] Unless you removed the record in the last 5 minutes, it's still happening
[20:27:06] T368316
[20:27:07] T368316: maintain-dbusers.service failing on cloudcontrol1005 - https://phabricator.wikimedia.org/T368316
[20:27:20] toolforge certainly /was/ connecting to clouddb1021, so maybe this is a search/replace issue?
[20:30:03] the config file in question enumerates profile::mariadb::section_ports
[20:30:55] https://www.irccloud.com/pastebin/uaxrLEdv/
[20:54:45] btullis: networking at the interface between cloud-vps and prod is weird. I think we should just leave this until cathal has a chance to look, probably he's doing something fancy for the other existing clouddb hosts.
[20:55:36] OK, I've rebooted an-redacteddb1001 for good measure. It's still pre-prod.
[22:57:54] grafana question for folks: anyone aware of a way to search the data sources used in grafana dashboards for a specific string? (in my case, a metric label that might be statically configured in many prometheus queries)
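One possible approach to the Grafana question above, as a minimal sketch rather than a known answer from the channel: if the dashboards can be exported as JSON, the query strings can be searched offline. The sketch assumes the usual dashboard JSON layout, where a dashboard has a "panels" list, row panels may nest further "panels", and each panel's "targets" hold the query text (in "expr" for Prometheus). The function names and the example dashboard are hypothetical, and queries stored in template variables are not covered.

```python
from typing import Any, Iterator


def iter_panels(panels: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every panel, descending into row panels that nest their children."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def find_in_dashboard(dashboard: dict[str, Any], needle: str) -> list[tuple[str, str]]:
    """Return (panel title, query text) pairs whose query text contains `needle`."""
    hits = []
    for panel in iter_panels(dashboard.get("panels", [])):
        for target in panel.get("targets", []):
            # Prometheus targets keep the query in "expr"; other data sources
            # use other keys, so every string field of the target is checked.
            for value in target.values():
                if isinstance(value, str) and needle in value:
                    hits.append((panel.get("title", "<untitled>"), value))
    return hits


# Hypothetical dashboard fragment, used only to exercise the function.
example = {
    "title": "demo",
    "panels": [
        {
            "title": "Requests",
            "targets": [
                {"expr": 'sum(rate(http_requests_total{site="eqiad"}[5m]))', "refId": "A"}
            ],
        },
        {
            "title": "Row",
            "type": "row",
            "panels": [
                {
                    "title": "Errors",
                    "targets": [
                        {"expr": 'sum(rate(http_errors_total{site="codfw"}[5m]))', "refId": "A"}
                    ],
                }
            ],
        },
    ],
}

for title, query in find_in_dashboard(example, 'site="codfw"'):
    print(f"{title}: {query}")
```

For real use, each exported file could be loaded with `json.load` and passed to `find_in_dashboard`; a plain substring match is used here because a statically configured label is usually a literal string.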