[08:49:53] !log lucaswerkmeister-wmde@tools-bastion-13 tools.phpunit-results-cache webservice restart # clear cache, maybe it fixes https://gerrit.wikimedia.org/r/1164312 (cc hashar)
[08:49:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.phpunit-results-cache/SAL
[20:43:20] Hey folks! I'm seeing some pretty serious clock skew on one of the instances in account-creation-assistance; it looks like NTP is not working because it's not able to reach any of the internal NTP servers: https://app.warp.dev/block/PmaF4FfaWVYtAJE73bcKfD
[20:44:30] I thought this was maybe due to security groups, but I tried adding rules to allow ingress and egress on UDP port 123 to the internal CIDR and that didn't help either; plus, another instance with the same base security group (and the extra security groups don't do anything with UDP) is able to talk to the NTP servers just fine
[20:45:25] So I'm a little stumped as to what could be the cause. Maybe I've just missed something because dum, but would appreciate any insights/thoughts. The clock drift is causing failures in validating JWTs and OAuth identity tickets from Wikipedia
[20:47:02] (I did just reboot the instance, which improved the situation, but even just minutes after the reboot the clock is behind the RTC by a noticeable amount)
[22:15:44] FastLizard4: hmmm... I see that log showing that it is trying to call the same NTP relays that all Cloud VPS instances should be using.
[22:22:20] Indeed
[22:38:22] Would this be something I should open a Phabricator ticket for? I'm genuinely quite stumped as to what could be causing this
[22:39:28] reduce a test case? tcpdump the NTP?
[22:40:03] FastLizard4: yeah, it is worth a ticket. I'm trying something on accounts-appserver7 -- I added the WMCS managed "default" security group and then `sudo timedatectl set-ntp true` to restart things.
[22:40:10] try a different NTP server instead of official?
[22:41:25] `timedatectl` is still not saying "System clock synchronized: yes" as hoped, but I'm not seeing connection failures in `journalctl -u systemd-timedated.service --no-pager --follow` yet
[22:42:46] also can block NTP in iptables on a host that's not already broken and see what the logs look like
[22:42:50] I've only been seeing the connection failures in the logs for systemd-timesyncd, not timedated
[22:43:55] Just restarted it, still seeing the timeouts alas
[22:44:24] FastLizard4: yeah. I was looking in the wrong place. This is really weird
[22:44:58] like movie plot weird? :)
[22:45:02] https://serverfault.com/a/972336 is the troubleshooting tips I was looking at
[22:47:14] I took the "default" Security Group back off that instance. I think you probably should be using that one to make sure you catch changes to the monitoring servers and such, but I don't want to mess up your opentofu setup.
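
Editor's note: the "reduce a test case" and "try a different NTP server" suggestions above could be approximated with a small SNTP probe run from both the broken and the working instance. The following is only a rough sketch, not something anyone in the channel ran. The function names (`build_request`, `parse_transmit_time`, `query_offset`) are hypothetical, the server hostname is left for the caller to supply, and it assumes IPv4 reachability on UDP port 123. The offset it reports is crude (server transmit time versus the midpoint of local send and receive) and is not what systemd-timesyncd computes.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request() -> bytes:
    """Return a minimal 48-byte SNTP client request.

    First byte 0x23 encodes LI=0, VN=4, Mode=3 (client); the rest is zero.
    """
    return b"\x23" + 47 * b"\x00"


def parse_transmit_time(packet: bytes) -> float:
    """Extract the server transmit timestamp (bytes 40-47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_offset(server: str, timeout: float = 5.0) -> float:
    """Send one request and return a rough clock offset in seconds.

    Raises socket.timeout if no reply arrives, which is the symptom
    under discussion (requests to the NTP servers timing out).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
        received = time.time()
    # Positive result: the server clock is ahead of the local clock.
    return parse_transmit_time(packet) - (sent + received) / 2
```

Calling `query_offset()` with one of the server names from the systemd-timesyncd log on each instance would show whether the timeout also occurs outside timesyncd; a `socket.timeout` on one host but not the other would point at the network path rather than the daemon.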