[11:08:33] lunch
[11:36:28] lunch 2
[13:57:06] dcausse: just to check, will you be there for the Mobile search schema meeting in 1.5h? You have not responded to the invite
[13:57:28] gehel: yes, I think I'll make it, will ack
[13:57:34] thanks!
[14:04:03] o/
[14:24:25] The data reloads on wdqs1009 and wdqs2009 might be stuck? Nothing logged in spicerack since 2023-01-08 00:43:18
[14:25:00] looking
[14:25:59] wdqs1009 is importing file:///srv/wdqs/munged/wikidump-000000251.ttl.gz
[14:26:06] can't log in to wdqs2009
[14:26:48] dcausse: thanks, was stracing on 1009
[14:28:00] will check on wdqs2009, I can't get in either
[14:37:12] dunno what to make of wdqs2009 so far. No SSH, but it does ping. mgmt console unresponsive, but NFS traffic is going back and forth between it and clouddumps.
[14:38:04] I vaguely remember seeing a few alerts flapping on the new codfw hosts, ssh was one of them
[14:50:51] I think let's wait until the munge is finished on 1009 and check it again, since NFS traffic still seems to be flowing
[14:51:51] you mean 2009? wdqs1009 is done with the munging as it's importing to Blazegraph now
[14:52:52] dcausse: sorry, just saw the word 'munge'. Should have said wait until the reload is complete on wdqs1009
[14:53:23] ok
[14:58:34] nm, I'm gonna reboot it. No log messages on centrallog for more than 24h
[15:03:51] OK, host is rebooted, SSH back up, logs coming in to centrallog
[15:04:04] just started the reload again
[16:01:28] dcausse, pfischer: triage meeting: https://meet.google.com/eki-rafx-cxi
[17:01:08] hmm, I guess the first thing is to check what all these new airflow complaints are about
[17:10:13] ebernhardson: all refinery jobs stopped during the weekend
[17:10:47] ahh, that would explain it. yea, poking at the failed task instances it seemed like you already cleared it up
[17:10:54] I tried to clear the failed ones that I could identify
[17:11:27] imo the easiest way to find failed task instances is from Browse -> Task Instances: set a filter on state failed and sort by date
[17:11:45] oh
[17:12:16] indeed, that's better than the dag view, esp on hourly jobs
[17:12:50] they just finished the backfill apparently, so things will start moving again hopefully
[17:13:04] I think you only missed one, I saw one in there but everything else is from November or earlier
[17:13:25] did the elastic plugins get shipped while I was out, for the LTR feature collection bug?
[17:13:35] no, I don't think so
[17:13:42] ebernhardson: no, it's on our radar though
[17:13:49] ok
[17:14:02] we can probably finish that in the next day or so
[18:42:21] ryankemper: is there anything urgent we need to discuss in our 1:1? I'm not feeling super well, so could we reschedule for tomorrow?
[18:44:37] lunch, back in ~1h
[18:56:21] gehel: that's fine with me!
[19:01:32] Thanks!
[19:05:47] dinner
[19:09:24] Error: Class 'Onoi\MessageReporter\ObservableMessageReporter' not found :S
[19:14:10] seems others have simply been ignoring CI in REL1_39 to get Cirrus patches through, Antoine poked at it a little in November but it doesn't seem to have gone anywhere
[19:18:53] seems that while onoi/message-reporter is in the master branch, it's not in the REL1_39 vendor repo. Probably just add it?
[19:40:46] back
[19:45:21] ryankemper: let's work on https://phabricator.wikimedia.org/T324247 during our pairing if that is OK
[21:05:31] ^^ New LTR plugin is packaged. It's currently deployed to relforge and cloudelastic is in progress
[21:21:04] quick break, back in ~15
[21:42:04] \o/ thanks
[21:45:06] back
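As an aside to the wdqs1009/wdqs2009 reload thread above (roughly 14:25–15:04): a minimal sketch of one way to eyeball Blazegraph import progress by counting triples on the host's local SPARQL endpoint. The port, namespace, and use of requests are assumptions about a default WDQS setup, not details taken from the log.

```python
# Hedged sketch: rough progress check for a Blazegraph data reload on a wdqs
# host, by counting triples on the local SPARQL endpoint.
# The endpoint URL below is an assumption (default WDQS Blazegraph layout).
import requests

resp = requests.get(
    "http://localhost:9999/bigdata/namespace/wdq/sparql",
    params={"query": "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"},
    headers={"Accept": "application/sparql-results+json"},
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["results"]["bindings"][0]["n"]["value"])
```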
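Related to the Airflow cleanup discussed above (around 17:10–17:13): a rough sketch, assuming Airflow 2.x and a host with access to the metadata DB, of pulling the same failed-task-instance list programmatically that the Browse -> Task Instances view shows when filtered on state failed and sorted by date. The limit of 50 is illustrative.

```python
# Hedged sketch: list recently failed task instances, roughly equivalent to
# Browse -> Task Instances filtered on state=failed and sorted by date.
# Assumes Airflow 2.x; run where the Airflow metadata DB is reachable.
from airflow.models import TaskInstance
from airflow.utils.session import create_session
from airflow.utils.state import State

with create_session() as session:
    failed = (
        session.query(TaskInstance)
        .filter(TaskInstance.state == State.FAILED)
        .order_by(TaskInstance.start_date.desc())
        .limit(50)
        .all()
    )
    for ti in failed:
        print(ti.dag_id, ti.task_id, ti.start_date, ti.state)
```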
Cloudelastic is finished, starting CODFW shortly
[23:01:28] ryankemper: about to head out, are you able to keep an eye on the codfw restarts? It's up in my user's tmux window on cumin1001
[23:01:48] window is called 'codfw-ltr-plugin'
[23:13:06] we went red for a second on chi and omega in codfw, but it recovered
[23:14:55] inflatador: watching now
[23:15:30] ryankemper: ACK, thanks
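For the codfw rolling restarts above: a small sketch, assuming local access to one of the Elasticsearch nodes on port 9200, of polling cluster health so a momentary red status like the one seen on chi and omega is visible as it happens. The port and poll interval are assumptions, not values from the log.

```python
# Hedged sketch: poll Elasticsearch cluster health during a rolling restart
# and print status plus shard movement counters every 10 seconds.
import time
import requests

while True:
    health = requests.get("http://localhost:9200/_cluster/health", timeout=10).json()
    print(
        health["cluster_name"],
        health["status"],
        "relocating:", health["relocating_shards"],
        "unassigned:", health["unassigned_shards"],
    )
    time.sleep(10)
```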