[07:01:22] Good morning
[07:10:45] bonjour
[07:18:49] !log Rerun cassandra-daily-wf-local_group_default_T_pageviews_per_article_flat-2021-6-11
[07:18:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:08:59] hello team!
[09:09:15] wow :) European-time fdans
[09:09:17] Hi fdans :)
[09:12:33] fdans: hola!!
[09:12:59] o/
[11:55:04] (CR) Fdans: "@awight I see these two code reviews slipped through the cracks, still need a review? Thank you!" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/682747 (https://phabricator.wikimedia.org/T193169) (owner: Awight)
[12:42:13] hellooo teamm!
[12:42:33] Hi mforns
[12:45:27] looking at cassandra alerts
[12:46:30] mforns: I restarted the job this morning
[12:46:45] I forgot to answer the email after the job succeeded - doing it now
[12:50:45] fdans: Thanks for the note! If you have a CEST minute, my greatest wish would actually be some help with https://phabricator.wikimedia.org/T273748#7051951 . The "native hql" patches in other people's reports are a nice-to-have and can be merged by the report owners, at any pace.
[13:38:55] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: LandingPageImpression Event Platform Migration - https://phabricator.wikimedia.org/T282855 (Ottomata)
[13:40:08] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: WMDEBanner* Event Platform Migration - https://phabricator.wikimedia.org/T282562 (Ottomata)
[13:40:42] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: LandingPageImpression Event Platform Migration - https://phabricator.wikimedia.org/T282855 (Ottomata)
[13:42:36] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[13:42:50] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[13:46:25] Analytics, Analytics-Kanban, Event-Platform, Fundraising-Backlog, MW-1.37-notes (1.37.0-wmf.4; 2021-05-04): CentralNoticeBannerHistory and CentralNoticeImpression Event Platform Migration - https://phabricator.wikimedia.org/T271168 (Ottomata) > Did ^ happen? I just looked, php-1.37.0-wmf.9 ha...
[13:47:13] Analytics, Analytics-Kanban, Event-Platform, Fundraising-Backlog, MW-1.37-notes (1.37.0-wmf.4; 2021-05-04): CentralNoticeBannerHistory and CentralNoticeImpression Event Platform Migration - https://phabricator.wikimedia.org/T271168 (Ottomata)
[13:57:35] (PS1) Ottomata: Add centralnoticeimpression and centralnoticebannerhistory legacy schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/699753 (https://phabricator.wikimedia.org/T271168)
[13:58:26] Analytics, Analytics-Kanban, Event-Platform, Fundraising-Backlog, and 2 others: CentralNoticeBannerHistory and CentralNoticeImpression Event Platform Migration - https://phabricator.wikimedia.org/T271168 (Ottomata)
[13:58:51] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[13:59:04] Analytics, Analytics-Kanban, Event-Platform, Fundraising-Backlog, and 2 others: CentralNoticeBannerHistory and CentralNoticeImpression Event Platform Migration - https://phabricator.wikimedia.org/T271168 (Ottomata)
[14:01:47] (CR) Ottomata: [C: +2] Add centralnoticeimpression and centralnoticebannerhistory legacy schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/699753 (https://phabricator.wikimedia.org/T271168) (owner: Ottomata)
[14:02:25] (Merged) jenkins-bot: Add centralnoticeimpression and centralnoticebannerhistory legacy schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/699753 (https://phabricator.wikimedia.org/T271168) (owner: Ottomata)
[14:03:07] (PS1) Gerrit maintenance bot: Add shi.wikipedia to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/699757 (https://phabricator.wikimedia.org/T284885)
[14:05:24] Analytics, Analytics-Kanban, Event-Platform, Fundraising-Backlog, and 2 others: CentralNoticeBannerHistory and CentralNoticeImpression Event Platform Migration - https://phabricator.wikimedia.org/T271168 (Ottomata)
[14:09:40] Good morning!
[14:10:45] morning!
[14:11:00] mforns: o/ you should comment on the metrics platform ticket!
[14:37:53] > Effective June 1, 2021: Phabricator is no longer actively maintained.
[14:37:53] Just noticed this! https://github.com/phacility/phabricator
[14:39:36] hey folks morning, I'd need some time for kubeflow before the SRE meeting, do you mind if I skip the analytics ops sync>
[14:39:39] ?
[14:42:59] Fine by me elukey, an-master os upgrade tomorrow!
[14:43:19] fingers crossed
[14:48:51] Analytics, Analytics-Kanban, Platform Engineering, Research, User-razzi: Create airflow instances for Platform Engineering and Research - https://phabricator.wikimedia.org/T284225 (razzi) Sounds good @Ottomata, creating vms in https://phabricator.wikimedia.org/T284934
[15:21:05] ottomata: 10 mins on Gobblin/puppet?
[15:21:21] k
[15:21:32] back in standup
[15:21:35] b c
[15:21:37] yup
[15:41:10] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, Platform Team Initiatives (Modern Event Platform (TEC2)): Allow disabling/enabling configured streams via wgEventStreams config - https://phabricator.wikimedia.org/T259712 (odimitrijevic) p:Low→High
[15:41:15] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, Platform Team Initiatives (Modern Event Platform (TEC2)): Allow disabling/enabling configured streams via wgEventStreams config - https://phabricator.wikimedia.org/T259712 (Ottomata) a:Ottomata
[15:41:27] Analytics, Analytics-Kanban, Better Use Of Data, Event-Platform, and 2 others: Allow disabling/enabling configured streams via wgEventStreams config - https://phabricator.wikimedia.org/T259712 (Ottomata)
[16:44:14] razzi: qq - I recall that we needed to follow up on Yarn queue status (RUNNING vs STOPPED), to avoid the issue that you fixed the last time that we tried to do the saveNamespace. Is it already live or WIP?
[16:44:41] (I was thinking about outstanding things before tomorrow)
[16:45:22] elukey: ah yes, adding support to the config file to turn everything on and off? Haven't done that yet
[16:45:51] razzi: yep not mandatory for tomorrow, but it may simplify your life :)
[16:45:58] :)
[18:06:37] Analytics, Analytics-Kanban, Event-Platform, Fundraising-Backlog, MW-1.37-notes (1.37.0-wmf.4; 2021-05-04): CentralNoticeBannerHistory and CentralNoticeImpression Event Platform Migration - https://phabricator.wikimedia.org/T271168 (Ottomata)
[18:22:35] joal: do you think we would use airflow for gobblin jobs, rather than systemd timers?
[18:24:14] that might be pretty nice; would give us better visualization of jobs and status
[18:25:47] and if we did that...maybe we don't want to use puppet for this?
[18:25:55] i guess we'd be deploying the gobblin jars with scap like we do now
[18:26:07] common stuff could go in airflow files
[18:26:19] perhaps we could render some common properties files with puppet to DRY up some configs like kafka clusters, etc.
[18:26:20] ?
[18:27:37] Analytics-Clusters, Analytics-Kanban: Remove all debian python-* and other user requested packages installed for analytics clients, use conda instead - https://phabricator.wikimedia.org/T275786 (Ottomata) p:Triage→Medium
[18:43:42] !log remove packages from stat nodes: sudo cumin 'stat*' apt-get -y remove subversion mercurial tofrodos libwww-perl libcgi-pm-perl libjson-perl libtext-csv-xs-perl libproj-dev libboost-regex-dev libboost-system-dev libgoogle-glog-dev libboost-iostreams-dev libgdal-dev
[18:43:42] - T275786
[18:43:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:43:44] T275786: Remove all debian python-* and other user requested packages installed for analytics clients, use conda instead - https://phabricator.wikimedia.org/T275786
[18:45:04] !log remove packages from hadoop common nodes: sudo cumin 'R:Class = profile::analytics::cluster::packages::common' 'apt-get -y remove python3-pandas python3-pycountry python3-numpy python3-tz' - T275786
[18:45:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:47:12] Analytics, Analytics-EventLogging, Better Use Of Data, Event-Platform, and 4 others: KaiOS / Inuka Event Platform client - https://phabricator.wikimedia.org/T273219 (nshahquinn-wmf) @SBisson since the new logging code is already working well in production (T283768), can we close this?
[18:51:07] ottomata: very possible!
[18:52:18] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[18:53:03] ottomata: the only thing that pops into my mind to be checked is logs
[18:53:09] eh?
[18:53:42] ottomata: timers facilitate logs handling in nice ways - I don't know how it's handled in airflow (we'll need not only hadoop logs but also local ones)
[18:55:23] ah, joal yeah they all go in the airflow logs directory, per dag and time schedule
[18:55:27] also viewable in the UI
[18:55:28] which is nice
[18:55:32] can just click on the failed job and view them
[18:55:32] sounds great
[18:55:47] well, why not airflow :)
[18:55:52] hm
[18:56:09] would we want an individual gobblin job per stream?
[18:56:26] i was able to generate airflow tasks using eventstreamconfig
[18:56:30] this feels like too far a stretch IMO --^
[18:56:41] yeah maybe...
[18:56:54] My next thing was: we have no idea of reliability / scalability as of now
[18:57:29] yeah
[18:57:39] Maybe we'll move to that - maybe not now?
[18:57:45] yeah
[18:57:47] Same as for flink
[18:59:41] ok, joal let's try to go for airflow then for gobblin instead of puppet
[19:00:08] Ok - same jobs as the puppet ones, but in airflow
[19:01:52] ottomata: I assume we can get the log4j and analytics_defaults properties files from puppet
[19:02:16] Maybe even the last one (kafka_to_hdfs_hourly)
[19:02:37] ya we can do that, or at least maybe the cluster specific configs
[19:02:40] like kafka brokers, etc.
[19:02:41] right?
[19:03:31] joal maybe we can just render some common properties that can be included and then referenced
[19:03:32] in the job files
[19:04:10] I was thinking: puppet could render infra files (hdfs address, kafka-brokers)
[19:05:25] Then the config files are generated using gobblin, including the puppet for infra?
[19:09:28] In any case ottomata we'll need 2 config files for gobblin (the 'system' one and the 'job' one)
[19:19:06] ottomata: another thing to consider before the airflow choice: alerts
[19:20:50] I'm uncomfortable with the idea of putting gobblin jobs to airflow now - I think it's related to our lack of knowledge of the tools, and the 'critical' aspect of ingestion
[19:22:33] But that's probably me not liking the idea of stepping outside my comfort zone
[19:23:08] When we upgraded the cluster, ingestion was down for several hours, and it caught up without issues
[19:23:31] anyway - sorry for the online rambling
[19:27:52] (sorry in meeting!)
[20:20:14] joal: i don't think there's any difference in how gobblin will run if we use airflow vs puppet + systemd timers
[20:20:34] it'll just be easier to manage, view logs, see things that have gone run, look at individual run logs, etc.
[20:43:33] Analytics, Event-Platform, Product-Analytics: Augment Hive event data with normalized host info from meta.domain - https://phabricator.wikimedia.org/T251320 (Ottomata) @Mholloway this might be a fun one to do.
[20:48:34] joal: also, we wanted to change the way we configure streams for discovery by camus
[20:48:35] https://phabricator.wikimedia.org/T273901#6879350
[20:48:46] https://gerrit.wikimedia.org/r/c/operations/puppet/+/668125/1/modules/profile/manifests/analytics/refinery/job/test/camus.pp
[20:49:17] i held off for gobblin
[20:49:33] we should do it that way instead of using destination_event_service
[21:02:44] Patch for new airflow vms, ottomata if you're around care to review? https://gerrit.wikimedia.org/r/c/operations/puppet/+/699790
[21:03:05] +1 razzi ty!
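
The split floated between 19:04 and 19:09 (puppet renders the infra-level values, and each job keeps its own settings in a separate file) might look roughly like the sketch below. It is only an illustration of the layering idea, not how the team's jobs are configured: the property keys, host names, and helper functions are all hypothetical, and nothing here has been checked against Gobblin's real configuration keys.

```python
def parse_properties(text):
    """Parse simple key=value properties text, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def merge_properties(system, job):
    """Layer job-level settings over system-level ones; job values win on conflict."""
    merged = dict(system)
    merged.update(job)
    return merged


def render_properties(props):
    """Render a dict back to key=value lines, sorted by key for stable output."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


# Hypothetical 'system' file: the infra values a config-management tool could render.
SYSTEM_PROPERTIES = """
# placeholder infra settings, not real hosts or real Gobblin keys
example.kafka.brokers=kafka-example-1001:9092,kafka-example-1002:9092
example.hdfs.uri=hdfs://example-cluster
example.output.base=/example/raw
"""

# Hypothetical 'job' file: per-job settings that reuse and override the shared ones.
JOB_PROPERTIES = """
example.job.name=example_kafka_to_hdfs_hourly
example.topic.include=example.topic
example.output.base=/example/raw/hourly
"""

merged = merge_properties(
    parse_properties(SYSTEM_PROPERTIES),
    parse_properties(JOB_PROPERTIES),
)

if __name__ == "__main__":
    print(render_properties(merged))
```

Keeping the infra values in one shared file means a broker or HDFS address change is made once, while whatever schedules the job (systemd timer or airflow) only needs to know about the small per-job file.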