[00:32:55] RESOLVED: MaxConntrack: Max conntrack at 82.17% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:11:55] FIRING: MaxConntrack: Max conntrack at 81.56% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:16:55] RESOLVED: MaxConntrack: Max conntrack at 80.06% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:17:55] FIRING: MaxConntrack: Max conntrack at 80.97% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:57:55] RESOLVED: MaxConntrack: Max conntrack at 80.26% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[02:35:55] FIRING: MaxConntrack: Max conntrack at 80.39% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[02:54:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:40:55] RESOLVED: MaxConntrack: Max conntrack at 80.87% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[03:47:25] FIRING: MaxConntrack: Max conntrack at 81.02% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[03:57:25] RESOLVED: MaxConntrack: Max conntrack at 81.49% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[06:54:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:13:32] netops, Infrastructure-Foundations, ops-drmrs: cr1-drmrs to asw1-b12-drmrs link down - https://phabricator.wikimedia.org/T389071#10655910 (ayounsi) @RobH make sure to link the inbound shipment to the existing ticket, so remote hands can set it up directly. Let's also use the initial positions : port...
[07:41:00] XioNoX: ganeti5005 is ready for the /e/n/i fix, whenever it works for you
[07:57:02] Puppet, Infrastructure-Foundations, Keyholder, SRE: keyholder-proxy doesn't restart on config change - https://phabricator.wikimedia.org/T374711#10655946 (fgiunchedi) >>! In T374711#10652054, @jhathaway wrote: >>>! In T374711#10650455, @fgiunchedi wrote: >> There's two parts to keyholder, `-proxy...
[07:57:26] moritzm: cool! trying a manual ifup first before the reboot
[07:58:43] sgtm
[07:58:49] alright, perfect
[07:59:11] rebooting
[08:06:45] all fine, then I'll proceed with getting 5006 ready?
[08:07:09] moritzm: +1
[08:07:23] on it
[08:08:13] I'll relocate to a coworking space in the meantime, back in less than 30
[08:08:23] ack
[08:15:58] ganeti5006 is ready whenever you're back
[08:36:33] rebooting 5006
[08:41:02] moritzm: done (waiting for the first puppet run)
[08:45:35] I'm getting 5007 ready
[08:54:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:56:01] XioNoX: 5007 is ready
[09:00:49] rebooting
[09:17:09] moritzm: done
[09:22:16] ack, for 5004 I need to failover the master first, will ping you when it's ready
[09:22:58] thx!
[09:41:41] XioNoX: 5004 is ready
[09:44:51] rebooting
[09:53:20] moritzm: all done! thanks
[09:54:59] ack, great!
[12:11:15] FIRING: ProbeDown: Service idp2004:443 has failed probes (http_idp_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/CAS-SSO#Alerting - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:16:16] RESOLVED: ProbeDown: Service idp2004:443 has failed probes (http_idp_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/CAS-SSO#Alerting - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:18:15] FIRING: [2x] ProbeDown: Service idp2004:443 has failed probes (http_idp_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/CAS-SSO#Alerting - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:23:15] RESOLVED: [2x] ProbeDown: Service idp2004:443 has failed probes (http_idp_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/CAS-SSO#Alerting - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:25:14] moritzm: Is that you restarting Tomcat?
[12:26:48] no, I think it's the issue we've seen before, but since the failover to idp1004 was needed anyway I went ahead with it and will restart tomcat on 2004 next
[12:28:10] Okay, that also fits with the grafana graphs.
[12:28:59] Btw. sorry about the 7.1 upgrade patch, let me know if we need to break it down into smaller patches.
[12:29:31] nah, all good. I'll review these later or tomorrow
[12:35:28] I've restarted Tomcat on idp2004, that alert should be gone for good
[12:37:02] .... well, until next time or hopefully when we upgrade to 7.1
[12:38:48] I'll take that as encouragement to review the 7.1 patch quickly so we can beat it :-)
[12:42:26] Uh, they also fixed OIDC back channel logout support
[13:08:22] topranks: XioNoX: hello! what's the BoF topic today?
[13:13:11] sukhe: nothing that I'm aware of, do you have anything in mind?
[13:13:25] we're currently in a meeting with Nokia so we might be slow to reply
[13:24:50] oh no worries
[13:25:11] nothing for this week but I can share the mini pop draft today and we can discuss that in the next?
[13:43:48] sukhe: sounds good! note that I'm usually off on Thursday/Friday until end of May, so it's more an exception that I'm here today
[13:44:34] XioNoX: ok thanks. I guess I will wait for that session then when you are around since I already have some input from Cathal
[13:50:51] happy to give feedback async if needed too
[13:54:28] yeah, that's the plan too! I am almost done writing so will share shortly
[14:46:31] XioNoX: shared. it's a long read but you can of course just focus on the section(s) you want to
[14:46:48] we have one full quarter to review it, so not urgent :)
[14:53:28] thx!
[15:18:48] sukhe: ah right, 92 pages :)
[15:23:04] :]
[15:40:11] hey, do y'all (Infrastructure Foundations) have a mailing list? Could you DM it to me if it's private? I wanted to ask about level of effort for standing up a mini-Ganeti cluster on DPE cluster
[15:41:34] inflatador: sre-foundations@wikimedia.org
[15:41:48] XioNoX ACK, thanks!
[15:55:22] sukhe: nice work!!
[16:00:23] topranks: thanks, tear it apart :D
[16:15:55] FIRING: MaxConntrack: Max conntrack at 84.83% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[16:20:55] RESOLVED: MaxConntrack: Max conntrack at 81.83% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[16:36:55] FIRING: MaxConntrack: Max conntrack at 84.29% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[16:40:11] CAS-SSO, Phabricator, wikitech.wikimedia.org, LDAP: Password reset not working for uid=maskaret,ou=people,dc=wikimedia,dc=org account - https://phabricator.wikimedia.org/T389496#10657815 (bd808) Ok there is funkiness but it may just be confusion about SUL account vs Developer account naming. @Gry...
[16:40:52] CAS-SSO, Phabricator, wikitech.wikimedia.org, LDAP: Password reset not working as expected for Gryllida's Developer account - https://phabricator.wikimedia.org/T389496#10657818 (bd808)
[16:41:55] RESOLVED: MaxConntrack: Max conntrack at 81.76% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[19:50:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:55:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:12:48] CAS-SSO, Phabricator, wikitech.wikimedia.org, LDAP: Password reset not working as expected for Gryllida's Developer account - https://phabricator.wikimedia.org/T389496#10659986 (bd808) Open→Invalid `lang=irc [18:33] < gry> bd808: thanks, i was able to login as you told me the  w...