[07:45:48] Analytics: Check home/HDFS leftovers of jmads - https://phabricator.wikimedia.org/T290715 (MoritzMuehlenhoff)
[07:50:46] PROBLEM - Hadoop NodeManager on an-worker1104 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[07:54:34] RECOVERY - Hadoop NodeManager on an-worker1104 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[07:55:21] elaragon: Hi - I see you've restarted your job - let's kill that one and talk a bit before if you may, I have ideas/suggestions
[08:24:13] elukey: Hi! I have a question for you
[08:31:55] bonjour! sure
[08:32:08] joal: sure, thanks
[08:33:36] elukey: do we have access to service-logs older than a day?
[08:34:27] elukey: on an-launcher1002 when I query logs from yesterday, I have as a first line: -- Logs begin at Thu 2021-09-09 11:00:06 UTC, end at Fri 2021-09-10 08:34:09 UTC. --
[08:35:51] elaragon: google meet?
[08:37:10] joal: do you mean journalctl logs?
[08:37:17] what command did you run?
[08:37:18] yes elukey!
[08:37:34] I ran: sudo journalctl -u gobblin-webrequest --since '2021-09-09 09:14'
[08:38:33] /var/log/refinery/gobblin/gobblin-webrequest/webrequest.log
[08:38:34] :)
[08:39:20] journald persists on tmpfs (so not on disk) IIRC and it periodically rotates logs (dropping some of them)
[08:39:34] \o/ thanks a million elukey :)
[08:39:38] the retention is not a lot, so this is why you see limits in journalctl
[08:39:54] but for some of our timers we add an rsyslog rule to dump content to files
[08:40:00] and gobblin has this option enabled
[08:40:12] it is not turned on for all timers
[08:40:26] (as a compromise between space used vs flexibility vs importance etc..)
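The exchange above comes down to comparing the requested --since time with the start of the window that journald still holds. A minimal Python sketch of that check follows. It only understands the header line exactly as quoted at 08:34:27, and the function names journal_window and since_is_covered are hypothetical, not part of any existing tooling.

```python
import re
from datetime import datetime

# Matches only the header format quoted in the chat; other journalctl
# versions or locales may print a different line (assumption).
HEADER_RE = re.compile(
    r"-- Logs begin at \w{3} (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC, "
    r"end at \w{3} (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC\. --"
)


def journal_window(header):
    """Return (begin, end) datetimes parsed from the journalctl header line."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError("unrecognised journalctl header: %r" % header)
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        datetime.strptime(match.group(1), fmt),
        datetime.strptime(match.group(2), fmt),
    )


def since_is_covered(header, since):
    """True if the --since value ('YYYY-MM-DD HH:MM') is inside the retained window."""
    begin, _end = journal_window(header)
    requested = datetime.strptime(since, "%Y-%m-%d %H:%M")
    return requested >= begin


header = (
    "-- Logs begin at Thu 2021-09-09 11:00:06 UTC, "
    "end at Fri 2021-09-10 08:34:09 UTC. --"
)
# The query from the chat asked for 09:14, before the journal begins at 11:00:06.
print(since_is_covered(header, "2021-09-09 09:14"))  # False
```

With the values from the chat the check returns False, which is why the older entries had to be read from /var/log/refinery/gobblin/gobblin-webrequest/webrequest.log instead.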
[08:41:11] joal: if you want a quick way to check `grep -rni gobblin /etc/rsyslog.d/` on an-launcher1002
[08:46:27] joal: Any time you like to sync on the cassandra3 import procedure and questions.
[08:47:25] The import of `local_group_default_T_mediarequest_per_file/data` that I started at 6:20 PM BST is at 90% so we still have a few hours before the next step, unless we want to start more imports in parallel.
[08:48:28] Hi btullis - do you have a minute to batcave?
[08:48:39] Yes, see you there.
[09:02:15] I can see you btullis, but can't hear you
[09:16:16] joal: your wise suggestions worked :)
[09:16:45] \o/!
[09:54:08] Analytics, Data-Engineering: SPIKE - Will Hadoop 3 container support help us for Airflow deployment pipelines? - https://phabricator.wikimedia.org/T288247 (JAllemandou) From scanning quickly through the docs, it seems this would require Docker to be installed on nodemanagers.
[10:52:37] Analytics: Investigate why gobblin pulls webrequest data late - https://phabricator.wikimedia.org/T290723 (JAllemandou)
[10:53:01] Analytics, Analytics-Kanban: Investigate why gobblin pulls webrequest data late - https://phabricator.wikimedia.org/T290723 (JAllemandou) a:JAllemandou
[13:59:45] Analytics, Analytics-Kanban: Investigate why gobblin pulls webrequest data late - https://phabricator.wikimedia.org/T290723 (JAllemandou) Finding: The problem comes from some Gobblin tasks not pulling data! Gobblin distributes the work of pulling data into tasks. In our webrequest setup each task is re...
[14:02:36] (PS1) Joal: Update Gobblin kafka fetch timeout to 5s [analytics/refinery] - https://gerrit.wikimedia.org/r/720317 (https://phabricator.wikimedia.org/T290723)
[14:03:28] Gone for kids, will be late for fun hour (but will join!)
[15:01:18] Analytics, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users group for Abban Dunne - https://phabricator.wikimedia.org/T289775 (akosiaris) Open→Resolved Hi @AbbanWMDE, Change has been merged now that it has been approved. It will take ~30mins to...
[15:06:55] Analytics, Product-Analytics: Internal nbviewer instance for sharing notebooks among 'wmf' and 'nda' members - https://phabricator.wikimedia.org/T290693 (mpopov) @Urbanecm Thank you!!! That works really well as an interim solution and I was able to upload & restrict https://people.wikimedia.org/~bearloga...
[15:09:49] Analytics, Product-Analytics: Internal nbviewer instance for sharing notebooks among 'wmf' and 'nda' members - https://phabricator.wikimedia.org/T290693 (Urbanecm) @mpopov Confirmed, I'm able to access the link through my NDA account, but not via my test account :-). I'm glad it works for you. The Googl...
[15:14:27] (CR) Bstorm: "recheck" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[15:15:53] (CR) Bstorm: "This seems to end up giving a +2 after failing the primary test suite. Just rerunning to confirm. tox-docker fails because of pytest-redis" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[15:16:37] Quarry: Query queued for several days, can't be stopped - https://phabricator.wikimedia.org/T290743 (GoingBatty)
[15:18:25] (CR) Bstorm: "Yeah, that's weird. Isn't the pipeline config just running tox as well? Why does one pass and the other fail?" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[15:27:51] Analytics, Product-Analytics: Internal nbviewer instance for sharing notebooks among 'wmf' and 'nda' members - https://phabricator.wikimedia.org/T290693 (mpopov)
[15:30:50] (CR) Nskaggs: Added minimal page load test for '/' route (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[15:38:55] (PS11) Andrew Bogott: Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558
[15:39:36] (CR) Andrew Bogott: Added minimal page load test for '/' route (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[15:45:50] (CR) Nskaggs: Added test_health.py (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359) (owner: Andrew Bogott)
[15:59:22] have a good weekend folks :)
[16:22:22] (CR) Andrew Bogott: Added test_health.py (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359) (owner: Andrew Bogott)
[16:29:50] (CR) Nskaggs: Added test_health.py (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359) (owner: Andrew Bogott)
[16:30:24] (CR) Nskaggs: [V: +2 C: +2] Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[16:30:57] (CR) jerkins-bot: [V: -1] Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[16:35:07] (CR) Andrew Bogott: "recheck" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[16:38:58] (CR) Andrew Bogott: [C: +2] Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[16:42:21] (Merged) jenkins-bot: Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[16:53:59] (PS3) Andrew Bogott: Added test_health.py [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359)
[17:21:25] (PS4) Andrew Bogott: Added test_health.py [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359)
[17:21:31] (PS1) Andrew Bogott: test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353
[17:23:59] (CR) jerkins-bot: [V: -1] test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353 (owner: Andrew Bogott)
[17:38:40] (CR) Michael DiPietro: add stop status (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/719567 (https://phabricator.wikimedia.org/T289349) (owner: Michael DiPietro)
[18:11:40] Analytics, Readers-Web-Backlog, Patch-For-Review, Product-Analytics (Kanban): Add UniversalLanguageSelector to the allowlist - https://phabricator.wikimedia.org/T287256 (jwang) Open→Resolved @mforns, thank you very much for back porting the data for us. The data looks good.
[18:49:09] (PS1) Bstorm: testing: setup a docker test runner to prevent differences locally [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720375
[18:49:56] (PS2) Andrew Bogott: test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353
[18:52:26] (CR) jerkins-bot: [V: -1] test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353 (owner: Andrew Bogott)
[18:55:42] (CR) Andrew Bogott: [C: +1] testing: setup a docker test runner to prevent differences locally [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720375 (owner: Bstorm)
[18:56:40] (CR) Bstorm: [C: +2] testing: setup a docker test runner to prevent differences locally [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720375 (owner: Bstorm)
[18:59:57] (Merged) jenkins-bot: testing: setup a docker test runner to prevent differences locally [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720375 (owner: Bstorm)
[19:01:21] (PS5) Bstorm: Added test_health.py [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359) (owner: Andrew Bogott)
[19:02:34] (PS3) Bstorm: test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353 (owner: Andrew Bogott)
[19:04:58] (CR) jerkins-bot: [V: -1] test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353 (owner: Andrew Bogott)
[19:06:44] (PS6) Andrew Bogott: Added test_health.py [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359)
[19:06:49] (PS4) Andrew Bogott: test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353
[19:09:13] (CR) jerkins-bot: [V: -1] test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353 (owner: Andrew Bogott)
[19:13:44] (PS1) Bstorm: testing: put back the copy for requirements [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720379
[19:14:46] (CR) Andrew Bogott: [C: +1] testing: put back the copy for requirements [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720379 (owner: Bstorm)
[19:17:20] (CR) Bstorm: [C: +2] testing: put back the copy for requirements [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720379 (owner: Bstorm)
[19:20:57] (Merged) jenkins-bot: testing: put back the copy for requirements [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720379 (owner: Bstorm)
[19:22:25] (PS7) Bstorm: Added test_health.py [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720094 (https://phabricator.wikimedia.org/T210359) (owner: Andrew Bogott)
[19:23:19] (PS5) Bstorm: test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353 (owner: Andrew Bogott)
[19:29:13] (CR) jerkins-bot: [V: -1] test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353 (owner: Andrew Bogott)
[19:34:32] (PS4) Michael DiPietro: add stop status [analytics/quarry/web] - https://gerrit.wikimedia.org/r/719567 (https://phabricator.wikimedia.org/T289349)
[19:37:36] (CR) Michael DiPietro: "While celery doesn't seem to be able to stop a job it might handle an external signal telling it the job was stopped. This would be cleane" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/719567 (https://phabricator.wikimedia.org/T289349) (owner: Michael DiPietro)
[19:37:52] (CR) jerkins-bot: [V: -1] add stop status [analytics/quarry/web] - https://gerrit.wikimedia.org/r/719567 (https://phabricator.wikimedia.org/T289349) (owner: Michael DiPietro)
[20:10:52] (PS6) Andrew Bogott: test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353
[20:14:32] (CR) jerkins-bot: [V: -1] test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353 (owner: Andrew Bogott)
[20:16:21] (PS7) Andrew Bogott: test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353
[20:19:42] (CR) jerkins-bot: [V: -1] test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353 (owner: Andrew Bogott)
[22:19:52] (PS8) Bstorm: test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353 (owner: Andrew Bogott)
[22:20:24] (CR) jerkins-bot: [V: -1] test query routes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/720353 (owner: Andrew Bogott)