[01:34:14] (ProbeDown) firing: (2) Service gitlab1003:443 has failed probes (http_gitlab_replica_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:38:15] I suspected and confirmed.. this happens while the daily restore cron runs on the replica ^
[01:38:18] bash /srv/gitlab-backup/gitlab-restore.sh
[01:38:50] so we need to do something about that or we will get a fake alert every day which then recovers a little while later.. but it's not a real problem
[01:39:22] kind of a nice side effect is we see that it _does_ the restore and when it ends and stuff comes back
[01:42:50] ah good find
[01:43:22] this turned into email too but only to the "collab" subteam, fwiw
[01:43:25] where I replied to it
[01:43:56] new blackbox monitoring has the new "receivers" for it
[01:45:10] same stuff for OTRS we agreed to keep off during inspiration week and activate afterwards.. until this all stabilizes
[01:47:42] we need to run the downtime cookbook from the trigger for this or something...
[01:48:30] ideally we don't have to know for how long but we can just actively send "downtime start" and "downtime stop"
[01:48:40] don't know that yet about alertmanager
[01:49:14] (ProbeDown) resolved: (2) Service gitlab1003:443 has failed probes (http_gitlab_replica_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:49:41] double checked. backup stopped and bash /srv/gitlab-backup/gitlab-restore.sh
[01:49:57] ended and that's when https://gitlab-replica.wikimedia.org/explore is up again
[01:50:04] goes afk again
[02:28:53] serviceops, observability, GitLab (Initialization), Patch-For-Review: Define monitoring for gitlab - https://phabricator.wikimedia.org/T275170 (Dzahn) Adding the history of changes that we should have all linked here. Also this is a way to share information with @thcipriani because we have talke...
[02:30:34] serviceops, observability, GitLab (Initialization), Patch-For-Review: Define monitoring for gitlab - https://phabricator.wikimedia.org/T275170 (Dzahn) Optionally we could reopen this ticket for just a short time until we declare it done and link the alert dashboard.
[02:35:56] fixing with https://gerrit.wikimedia.org/r/812427 - don't want to spam you during that week
[02:52:52] serviceops, serviceops-collab: monitoring / VRTS - new blackbox check reports 'ProbeDown' - https://phabricator.wikimedia.org/T312194 (Dzahn) >>! In T312194#8060699, @fgiunchedi wrote: > I have clarified a bit the wording at https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown on what the lab...
[03:04:05] serviceops, serviceops-collab: monitoring / VRTS - new blackbox check reports 'ProbeDown' - https://phabricator.wikimedia.org/T312194 (Dzahn) >>>! In T312194#8057548, @Dzahn wrote: >> I added silences in alerts.wikimedia.org for all of these. silence feels like disabling notifications though. What I real...
[03:05:27] serviceops, serviceops-collab: otrs1001 - ProbeDown - https://phabricator.wikimedia.org/T312609 (Dzahn)
[03:06:30] serviceops, serviceops-collab: monitoring / VRTS - new blackbox check reports 'ProbeDown' - https://phabricator.wikimedia.org/T312194 (Dzahn) P.S. In tickets like this and T312609 it would be nice if we can somehow get the host name / service name in there. Currently the ticket titles are just "Probe Down...
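The "downtime start" / "downtime stop" idea from 01:48 could look roughly like the sketch below. It is not what was deployed; it only assumes a stock Alertmanager exposing its v2 HTTP API. Alertmanager silences need an end time, so the sketch sets a generous upper bound and expires the silence early once the restore finishes. The `ALERTMANAGER_URL` value, the use of an `instance` label as matcher, and all function names are assumptions made for illustration.

```python
import json
import subprocess
import urllib.request
from datetime import datetime, timedelta, timezone

# Hypothetical endpoint, for illustration only.
ALERTMANAGER_URL = "http://alertmanager.example.org:9093"


def build_silence(instance, start, max_duration, created_by="gitlab-restore"):
    """Build an Alertmanager v2 silence body for ProbeDown on one instance.

    Assumption: the alert carries an `instance` label with this value.
    """
    return {
        "matchers": [
            {"name": "alertname", "value": "ProbeDown", "isRegex": False},
            {"name": "instance", "value": instance, "isRegex": False},
        ],
        "startsAt": start.isoformat(),
        # Upper bound only; the silence is expired early in downtime_stop().
        "endsAt": (start + max_duration).isoformat(),
        "createdBy": created_by,
        "comment": "daily restore on replica; expired when the restore ends",
    }


def downtime_start(silence):
    """POST the silence and return the ID Alertmanager assigned to it."""
    req = urllib.request.Request(
        f"{ALERTMANAGER_URL}/api/v2/silences",
        data=json.dumps(silence).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["silenceID"]


def downtime_stop(silence_id):
    """Expire the silence so real failures alert again right away."""
    req = urllib.request.Request(
        f"{ALERTMANAGER_URL}/api/v2/silence/{silence_id}", method="DELETE"
    )
    urllib.request.urlopen(req).close()


def restore_with_downtime(instance, script, max_duration=timedelta(hours=2)):
    """Run the restore script wrapped in downtime start/stop."""
    now = datetime.now(timezone.utc)
    silence_id = downtime_start(build_silence(instance, now, max_duration))
    try:
        subprocess.run(["bash", script], check=True)
    finally:
        # Always lift the silence, even if the restore fails.
        downtime_stop(silence_id)
```

The cron job would then call something like `restore_with_downtime("gitlab1003:443", "/srv/gitlab-backup/gitlab-restore.sh")` instead of running the script directly. Because the silence is expired in a `finally` block, a failed restore stops being hidden as soon as the script exits, which addresses the concern that a silence feels like disabling notifications.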
[03:08:51] serviceops, serviceops-collab, vrts: VRTS monitoring- re-activate new blackbox check (was: 'ProbeDown') - https://phabricator.wikimedia.org/T312194 (Dzahn) p:Triage→Medium
[03:11:28] serviceops, serviceops-collab, vrts, Patch-For-Review: VRTS monitoring- re-activate new blackbox check (was: 'ProbeDown') - https://phabricator.wikimedia.org/T312194 (Dzahn) the alert/check that created this ticket automatically has been disabled currently. We agreed to leave it that way for the...
[03:15:58] serviceops, serviceops-collab, vrts, Patch-For-Review: VRTS monitoring- re-activate new blackbox check (was: 'ProbeDown') - https://phabricator.wikimedia.org/T312194 (Dzahn)
[03:16:05] serviceops, SRE, Znuny, serviceops-collab, Sustainability (Incident Followup): enhance Znuny (otrs) alerting - https://phabricator.wikimedia.org/T303190 (Dzahn)
[03:17:45] serviceops, serviceops-collab, vrts, Patch-For-Review: VRTS monitoring- re-activate new blackbox check (was: 'ProbeDown') - https://phabricator.wikimedia.org/T312194 (Dzahn) not sure if duplicate or child of T303190 but highly related either way, just thought of that one
[03:20:28] serviceops, Znuny, serviceops-collab, vrts, Patch-For-Review: VRTS monitoring- re-activate new blackbox check (was: 'ProbeDown') - https://phabricator.wikimedia.org/T312194 (Dzahn)
[03:23:25] serviceops, Znuny, serviceops-collab, vrts, Patch-For-Review: VRTS monitoring- re-activate new blackbox check (was: 'ProbeDown') - https://phabricator.wikimedia.org/T312194 (Dzahn)
[11:24:57] serviceops, WikimediaDebug, Patch-For-Review, Performance-Team (Radar): Add "php 7.4" option to the Wikimedia Debug extension - https://phabricator.wikimedia.org/T312653 (Mainframe98) a:Mainframe98
[11:26:45] serviceops, WikimediaDebug, Patch-For-Review, Performance-Team (Radar): Add "php 7.4" option to the Wikimedia Debug extension - https://phabricator.wikimedia.org/T312653 (Mainframe98) I'd appreciate another pair of eyes on the patch I uploaded. I cannot get the debug header to actually contain an...
[19:21:22] serviceops, WikimediaDebug, Patch-For-Review, Performance-Team (Radar): Add "php 7.4" option to the Wikimedia Debug extension - https://phabricator.wikimedia.org/T312653 (Krinkle) >>! In T312653#8067203, @Mainframe98 wrote: > I cannot get the debug header to actually contain anything but the debu...