[00:02:48] (SystemdUnitFailed) firing: (2) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:05:13] (DiskSpace) resolved: Disk space urldownloader1001:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=urldownloader1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[04:04:17] (SystemdUnitFailed) firing: (2) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:04:17] (SystemdUnitFailed) firing: (2) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:36:59] topranks: when you have a sec can you give https://gerrit.wikimedia.org/r/c/operations/alerts/+/908556 a final pass
[09:37:39] jbond: sure will do
[09:37:44] cheers
[09:38:06] (SystemdUnitFailed) firing: (5) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:42:48] (SystemdUnitFailed) firing: (5) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:29:27] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (Volans)
[10:45:20] netops, DBA, Data-Engineering, Infrastructure-Foundations, and 8 others: eqiad row D switches upgrade - https://phabricator.wikimedia.org/T333377 (BTullis)
[10:52:13] (DiskSpace) firing: Disk space urldownloader1001:9100:/ 5.88% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=urldownloader1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[11:11:23] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[11:15:59] netops, Infrastructure-Foundations, SRE: eqiad/codfw virtual-chassis upgrades - https://phabricator.wikimedia.org/T327248 (ayounsi)
[12:10:40] jbond: https://gerrit.wikimedia.org/r/c/operations/puppet/+/909237
[12:11:26] We can also roll the original patch back ... Again :-)
[12:13:40] slyngs: gave a quick pass and left a comment, going for food now, can take a better look when I'm back if needed
[12:15:41] I'll just fix the comment and merge it... And keep an eye on it
[12:37:13] (DiskSpace) resolved: Disk space urldownloader1001:9100:/ 3.161% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=urldownloader1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[13:00:48] jbond: Just had to move a few lines: https://gerrit.wikimedia.org/r/c/operations/puppet/+/909250
[13:06:17] slyngs: TIL, I didn't realise template was affected by parse order
[13:07:54] Is it just because of the way the template is used? Puppet just builds the variable and "carries" it around, but if you use templates elsewhere it's bundled up and applied
[13:08:34] Anyway it seems to work now, then we just need to see if 48 is too many to keep around
[13:09:21] when generating the template, the template receives all the variables defined in "node scope". I'm guessing it compiles the template as it hits it during parse order, so only the variables defined higher in the file have been added to node scope
[13:09:56] and ack, they compress well so I think we will be fine from that pov
[13:10:52] I just manually compressed the syslog.1 because we were getting close to zero disk space, but I think that confused logrotate, unless it's still working on rotating
[13:11:38] Ah, no, okay, that's fine, syslog just rotates once a day
[13:24:12] yes, only the squid logs rotate hourly
[13:39:17] (SystemdUnitFailed) firing: (4) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:42:48] (SystemdUnitFailed) firing: (4) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:52:25] jobo: I don't think I'll make the meeting. The guy from Eir has just arrived to install a new phone line
[13:52:48] supposed to have been here this morning but what's new
[13:55:06] topranks who uses land line anymore? ;) ok thanks for letting me know.
[13:57:31] jobo: it's for the internet of course :D
[14:04:01] who uses internet anymore... ah no :D
[14:19:17] (SystemdUnitFailed) firing: (4) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:22:48] (SystemdUnitFailed) firing: (4) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:04:56] jbond / moritzm: https://gerrit.wikimedia.org/r/c/operations/puppet/+/909295 yet another attempt at making squid logs behave
[15:28:19] slyngs: I think we should change post_rotate so it takes a list of lines, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/909301
[15:28:25] running pcc now
[15:32:48] (SystemdUnitFailed) firing: (4) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:34:17] (SystemdUnitFailed) firing: (4) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:44:55] jbond: That looks much better
[15:45:27] slyngs: I have merged mine and updated yours, should be good to go now
[15:46:17] You truly are the quickest Puppet-wrangler in the south :-)
[15:46:44] lol :D
[15:47:11] Running puppet on the urldownloader now
[15:47:16] cool
[15:49:02] I'll take a look in 12 minutes when it's supposed to run
[15:49:20] ack sgtm
[16:02:16] jbond: Works :-)
[16:05:43] \o/ WootWoot :)
[16:09:56] Only took like five attempts :-)
[16:19:47] lol :D got there in the end, that's all that matters ;)
[16:23:51] slyngs: I added a saved filter to include or exclude urldownloader hosts from https://logstash.wikimedia.org/app/dashboards#/view/58c908a0-a394-11ec-bf8e-43f1807d5bc2
[16:24:15] Cool, thanks
[19:37:48] (SystemdUnitFailed) firing: (4) krb5-admin-server.service Failed on krb2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:32:48] (SystemdUnitFailed) firing: (5) httpbb_kubernetes_mw-web_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:34:17] (SystemdUnitFailed) firing: (6) httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:24:17] (SystemdUnitFailed) firing: (6) httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:27:48] (SystemdUnitFailed) firing: (6) httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:32:49] (SystemdUnitFailed) firing: (6) httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:34:17] (SystemdUnitFailed) firing: (6) httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed