[00:25:15] RECOVERY - Check unit status of monitor_refine_eventlogging_analytics on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_analytics https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:54:32] Is the build failing on analytics/report-updater normal? I submitted a doc-only change last week and the build fails
[09:57:34] phuedx: I don't think so, it looks to me like an upstream dependency can't be downloaded. `Downloading Jinja2-2.6.tar.gz (389 kB)` `Command errored out with exit status 1:`
[09:59:57] Not sure why that should fail though. That version appears here: https://pypi.org/project/Jinja2/2.6/
[10:01:05] Did it fail at the same point on both test runs?
[10:09:58] I'm planning to start a rolling reboot of the hadoop workers today, unless anyone has any objections.
[10:24:45] btullis: Yes. Both builds are failing to fetch that dependency
[10:24:54] (CR) Phuedx: "Recheck" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/777838 (https://phabricator.wikimedia.org/T305565) (owner: Phuedx)
[10:25:10] ^ Triple checking
[10:27:37] Oh! It's able to fetch the dependency but the installation is failing:
[10:27:42] > ImportError: cannot import name 'Feature'
[10:53:14] Bumping the minor version to 2.7 allows the tests to run. I'll submit a patch
[10:53:33] phuedx: Great, thanks for that :-)
[10:54:31] It looks like Feature was a deprecated API that was removed, reintroduced (and used), and then removed again: https://github.com/pypa/setuptools/issues/2017#issuecomment-596307305
[11:18:04] Proceeding to run the sre.hadoop.reboot-workers cookbook now.
[11:20:04] (PS3) Phuedx: Remove analytics/limn-multimedia-data repo reference [analytics/reportupdater] - https://gerrit.wikimedia.org/r/777838 (https://phabricator.wikimedia.org/T305565)
[11:20:06] (PS1) Phuedx: Fix build [analytics/reportupdater] - https://gerrit.wikimedia.org/r/779011
[12:13:57] (PS2) Aqu: Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/775255 (https://phabricator.wikimedia.org/T302876)
[12:15:46] (CR) Aqu: [V: +2 C: +2] "Reviewed here: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/41" [analytics/refinery] - https://gerrit.wikimedia.org/r/775255 (https://phabricator.wikimedia.org/T302876) (owner: Aqu)
[12:23:13] (PS3) Aqu: Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/775255 (https://phabricator.wikimedia.org/T302876)
[12:27:01] (PS4) Aqu: Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/775255 (https://phabricator.wikimedia.org/T302876)
[12:28:06] (CR) Aqu: [V: +2 C: +2] Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery] - https://gerrit.wikimedia.org/r/775255 (https://phabricator.wikimedia.org/T302876) (owner: Aqu)
[12:35:00] !log About to deploy refinery/source "Migrate mediarequest hourly from Oozie to Airflow"
[12:35:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:35:48] !log About to deploy analytics/refinery "Migrate mediarequest hourly from Oozie to Airflow" (replace previous msg)
[12:35:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:37:46] o.
[12:37:47] o/
[13:00:28] morning all
[13:02:55] Hiya folks.
[13:05:11] I apologize for any additional refine failures that are caused by the rolling reboot of the hadoop workers.
[13:05:38] looking into some of those now
[13:05:43] any of the readingdepth ones are caused by
[13:06:09] v
[13:06:09] https://meta.wikimedia.org/w/index.php?title=Schema%3AReadingDepth&type=revision&diff=23114987&oldid=18559669
[13:06:19] timo 'deleted' the schema
[13:06:58] Ah, ok.
[13:07:26] https://gerrit.wikimedia.org/r/c/operations/puppet/+/779025
[13:26:18] Data-Engineering, Equity-Landscape: Deploy the GDI Equity Landscape Dashboard - https://phabricator.wikimedia.org/T305468 (EChetty)
[13:26:59] Data-Engineering, Equity-Landscape: Milestone: Dashboard Mockup Complete - https://phabricator.wikimedia.org/T305476 (EChetty)
[13:27:23] Data-Engineering, Equity-Landscape: Milestone: Dashboard Interaction Map Complete - https://phabricator.wikimedia.org/T305477 (EChetty)
[13:27:43] (CR) Ottomata: [WIP] Add flink job reporting webrequest patterns (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/763610 (owner: Joal)
[13:28:01] Data-Engineering, Equity-Landscape: Milestone: Data Visualization Table Views defined - https://phabricator.wikimedia.org/T305478 (EChetty)
[13:28:28] Data-Engineering, Equity-Landscape: Milestone: Dashboard Template Complete - https://phabricator.wikimedia.org/T305479 (EChetty)
[13:28:49] Data-Engineering, Equity-Landscape: Milestone: Publish the Dashboard! - https://phabricator.wikimedia.org/T305481 (EChetty)
[13:29:25] Data-Engineering, Equity-Landscape: Milestone: Create and Publish Data Visualisation Views: - https://phabricator.wikimedia.org/T305480 (EChetty)
[13:30:56] Data-Engineering, Equity-Landscape, Epic: Deploy the GDI Equity Landscape Dashboard - https://phabricator.wikimedia.org/T305468 (EChetty)
[13:32:17] Data-Engineering, Data-Engineering-Kanban, Airflow, Patch-For-Review: Data Quality: Airflow migration - https://phabricator.wikimedia.org/T304884 (EChetty)
[13:33:09] Data-Engineering, Data-Engineering-Kanban, Airflow, Patch-For-Review: Data Quality: Airflow migration - https://phabricator.wikimedia.org/T304884 (EChetty) Open→Resolved
[13:39:00] Analytics-Radar, Data-Engineering, Discovery, Event-Platform: '.event.pageViewId' should be string, '.event.subTest' should be string, '.event.searchSessionId' should be string - https://phabricator.wikimedia.org/T286814 (EChetty) p:Triage→High
[13:40:50] Data-Engineering, Airflow: Add a new member to the airflow-dags repository - https://phabricator.wikimedia.org/T305719 (EChetty)
[13:51:43] btullis: does datahub need ldap write access? if not, could it be migrated to use the ldap-ro.$DC.wikimedia.org ips instead?
[13:55:02] taavi: No it doesn't need write access. It's only authentication. Oh, I see. I didn't realize that there were special IPs for the read-only services.
[13:56:50] Just the ipv4s, right? No ipv6 necessary.
[13:58:40] looks like it doesn't currently have ipv6 endpoints
[13:59:05] those are lvs vips backed up by the ldap-replica* hosts
[14:00:06] Nice. Wasn't aware of that option. Thanks.
[14:03:38] taavi: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/779039
[14:04:30] do you also need to change the actual hostnames somewhere?
[14:05:18] https://usercontent.irccloud-cdn.com/file/bOz7Wnkq/image.png
[14:05:25] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/778345/6/charts/datahub/charts/datahub-frontend/jaas-ldap.conf
[14:05:38] (CR) Mforns: "LGTM! Left 2 minor comments. Otherwise +1!" [analytics/refinery] - https://gerrit.wikimedia.org/r/778550 (https://phabricator.wikimedia.org/T300025) (owner: NOkafor)
[14:05:53] Hostnames are already using ldap-ro.$dc.wikimedia.org so in fact my previous configuration wouldn't have worked. :-)
[14:06:34] ah, +1'd
[14:06:45] (CR) Gehel: [WIP] Add flink job reporting webrequest patterns (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/763610 (owner: Joal)
[14:07:01] Thanks again
[14:08:12] taavi: o/ since you are here: https://phabricator.wikimedia.org/T304433#7819828 :)
[14:18:58] ottomata: ./2022-03-31.log:61:[16:44:41] ottomata: I was probably trying to unbreak something that was broken at the time, phab activity matching the timestamps might give more insight. sorry I don't have more details :/ feel free to hack it as much as you need as long as things don't break
[14:32:17] (PS1) Milimetric: Fix semantic-datepicker dependency [analytics/dashiki] - https://gerrit.wikimedia.org/r/779046
[14:32:40] (CR) Milimetric: [V: +2 C: +2] Fix semantic-datepicker dependency [analytics/dashiki] - https://gerrit.wikimedia.org/r/779046 (owner: Milimetric)
[14:50:45] ok thanks taavi
[14:51:09] Data-Engineering, Data-Engineering-Kanban, Beta-Cluster-Infrastructure, Event-Platform: Upgrade event platform related VMs in deployment-prep to Debian bullseye (or buster) - https://phabricator.wikimedia.org/T304433 (Ottomata) From @Majavah: > ./2022-03-31.log:61:[16:44:41] ottomata: I...
[14:59:23] (PS1) Luke Bowmaker: Image Suggestions feature schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/779052
[15:00:20] (CR) jenkins-bot: [V: -1] Image Suggestions feature schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/779052 (owner: Luke Bowmaker)
[15:01:01] (CR) Luke Bowmaker: "Please review when you get a chance. I hope I did this right...." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/779052 (owner: Luke Bowmaker)
[15:08:56] (CR) Ottomata: "Nice, let's get together and bikeshed some things, maybe with erik and gabriele or other folks too" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/779052 (owner: Luke Bowmaker)
[15:17:22] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Upgrade Turnilo - https://phabricator.wikimedia.org/T301990 (razzi) a:razzi According to @hashar we can get node 12.22.5 by upgrading Debian to version 11 Bullseye (staging and production Turnilo run Debian 10). I'll try upgrading Debia...
[15:52:47] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Patch-For-Review: Define the Helm charts and helmfile deployments for Datahub - https://phabricator.wikimedia.org/T301454 (BTullis) Marking this task as complete. There are still some minor tweaks to the charts to support the deployments, b...
[15:54:17] Data-Engineering-Kanban, Airflow: Fix use of Java LinkedHashMap caching in Spark multi-threaded environment - https://phabricator.wikimedia.org/T305386 (EChetty) p:Triage→High
[16:28:59] About my error on scap deploy on airflow-dags/analytics I run the command manually without problems:
[16:29:00] aqu@an-launcher1002:/srv/deployment/airflow-dags/analytics$ sudo -u analytics /usr/local/bin/kerberos-run-command analytics /usr/lib/airflow/bin/artifact-cache warm /srv/deployment/airflow-dags/analytics/wmf_airflow_common/config/artifact_config.yaml /srv/deployment/airflow-dags/analytics/analytics/config/artifacts.yaml
[16:29:00] Artifact(refinery-job-0.1.23-shaded):
[16:29:00] hdfs:///wmf/cache/artifacts/airflow/org.wikimedia.analytics.refinery.job_refinery-job_jar_shaded_0.1.23 (exists=True)
[16:29:00] https://archiva.wikimedia.org/repository/releases/org/wikimedia/analytics/refinery/job/refinery-job/0.1.23/refinery-job-0.1.23-shaded.jar (exists=True)
[16:29:00] Artifact(refinery-job-0.1.24-shaded):
[16:29:00] hdfs:///wmf/cache/artifacts/airflow/org.wikimedia.analytics.refinery.job_refinery-job_jar_shaded_0.1.24 (exists=True)
[16:29:01] https://archiva.wikimedia.org/repository/releases/org/wikimedia/analytics/refinery/job/refinery-job/0.1.24/refinery-job-0.1.24-shaded.jar (exists=True)
[16:29:01] Artifact(refinery-hive-0.1.25-shaded):
[16:29:02] hdfs:///wmf/cache/artifacts/airflow/org.wikimedia.analytics.refinery.hive_refinery-hive_jar_shaded_0.1.25 (exists=True)
[16:29:02] https://archiva.wikimedia.org/repository/releases/org/wikimedia/analytics/refinery/hive/refinery-hive/0.1.25/refinery-hive-0.1.25-shaded.jar (exists=True)
[16:29:24] I will try to run scap deploy again
[16:33:04] mforns, ottomata: No more errors after manually running `artifact-cache` on an-launcher. Thanks for your help!
[16:34:24] I have a stack trace if you want. But, for me, I think I will let it go till the next time.
[16:38:01] hm okay
[16:38:06] sounds like a bug for sure
[16:38:13] sorry i haven't had time to look yet!
[16:38:20] maybe make a phab task just in case so i can follow up?
[16:39:11] Alright
[16:47:14] Data-Engineering: Crash of artifact-cache in scap deploy context - https://phabricator.wikimedia.org/T305868 (Antoine_Quhen)
[16:47:59] ottomata: https://phabricator.wikimedia.org/T305868
[16:48:32] Data-Engineering: Crash of artifact-cache in scap deploy context - https://phabricator.wikimedia.org/T305868 (Ottomata) a:Ottomata
[16:49:26] Data-Engineering-Kanban, Airflow: Medium Complexity Oozie Migration: mobile_apps-session_metrics - https://phabricator.wikimedia.org/T302874 (EChetty)
[16:49:31] (PS5) Snwachukwu: [WIP] Create a Hive to Graphite job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623)
[16:52:53] (CR) Snwachukwu: "Hi Marcel," [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623) (owner: Snwachukwu)
[17:23:27] (CR) Ottomata: Fixing typo in desktopwebuiactionstracking schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/777876 (https://phabricator.wikimedia.org/T301391) (owner: Jdrewniak)
[17:32:26] (CR) Mforns: "LGTM in general!!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623) (owner: Snwachukwu)
[17:38:10] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Patch-For-Review: Configure LDAP authentication for the DataHub frontend - https://phabricator.wikimedia.org/T301462 (BTullis) >>! In T301462#7840934, @Milimetric wrote: > Superset seems to be doing this through CAS, but maybe you can find...
[17:39:05] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Patch-For-Review: Configure LDAP authentication for the DataHub frontend - https://phabricator.wikimedia.org/T301462 (BTullis) LDAP authentication is now working on the datahub staging deployment. {F35047097,width=80%}
[17:44:52] Data-Engineering, Data-Catalog, Infrastructure-Foundations, CAS-SSO, Epic: Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (BTullis)
[17:53:02] (PS1) Jdlrobson: Fixes typo in schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/779084 (https://phabricator.wikimedia.org/T301391)
[17:53:51] (CR) Jdlrobson: [C: -2] "Can be abandoned. See https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/779084" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/777876 (https://phabricator.wikimedia.org/T301391) (owner: Jdrewniak)
[17:54:07] (CR) Ottomata: [C: +2] "Allowing a schema modification in this case, because 1.2.0 never had real events produced, and this is not a data type change." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/779084 (https://phabricator.wikimedia.org/T301391) (owner: Jdlrobson)
[17:54:50] mforns: responded on https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/43#note_5777
[17:54:52] that okay?
[18:32:41] yess! no problemo
[18:35:36] (CR) AGueyte: [C: +2] Add event_ipinfo_version to ipinfo_interaction schema (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/777816 (https://phabricator.wikimedia.org/T296417) (owner: Tchanders)
[18:36:07] (Merged) jenkins-bot: Add event_ipinfo_version to ipinfo_interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/777816 (https://phabricator.wikimedia.org/T296417) (owner: Tchanders)
[18:36:54] (CR) AGueyte: "Tested with the "Record the access level of the user when logging the 'open_popup' event" patch" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/777816 (https://phabricator.wikimedia.org/T296417) (owner: Tchanders)
[19:22:17] Data-Engineering: Request to add user gmodena to analytics-research-admins group - https://phabricator.wikimedia.org/T305880 (gmodena)
[19:23:33] Data-Engineering: Request to add user gmodena to analytics-research-admins group - https://phabricator.wikimedia.org/T305880 (gmodena)
[19:26:21] Data-Engineering: Request to add user gmodena to analytics-research-admins group - https://phabricator.wikimedia.org/T305880 (Ottomata) Approved, and I'll make it happen :)
[19:48:25] hello! I'm trying to hunt down VMs which don't have https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/236bca04c631da8bebca54baf4c9aaaef011ab4f applied yet, airflow-test-1.analytics.eqiad1.wikimedia.cloud is popping up in the logs and is refusing my root key, could someone please make sure it's running puppet properly?
[19:50:51] ottomata: ^^
[20:00:30] Data-Engineering, LDAP-Access-Requests: Request to add user gmodena to analytics-research-admins group - https://phabricator.wikimedia.org/T305880 (Peachey88)
[20:09:05] taavi: i just deleted the 2 airflow-test instances, haven't used them since last may for some puppet development
[20:11:53] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Pageview definition relies on X-Analytics to determine special pages - https://phabricator.wikimedia.org/T304362 (Milimetric) Indeed, `namespace_id` and `page_id` are not set. The data I'm looking at is like: uri_path: /wiki/Special:Watch...
[20:13:26] works for me. thanks!
[20:55:22] (CR) Mforns: [WIP] Create a Hive to Graphite job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/775376 (https://phabricator.wikimedia.org/T304623) (owner: Snwachukwu)