[01:08:08] elukey: hi, can you /msg chanserv flags #wikimedia-analytics wikibugs +Vv
[01:17:10] elukey: I also pinged you at https://phabricator.wikimedia.org/T283230#7229142 to add more channel ops here and in -ml
[06:24:17] legoktm: hi! Fixed wikibugs, but I thought more people were ops in here (I recall adding Razzi but at this point I missed some step probably)
[06:26:24] legoktm: --^ is it sufficient to +o nicks or should I do something different?
[06:43:58] elukey: nope, that does not persist when you disconnect and join back in, I responded on the task
[06:46:41] majavah: ahhhh nice thanks!
[06:47:11] Thanks majavah :)
[06:50:46] ok deopped the users that I had /mode op before, they should be able to op by themselves now
[06:51:15] btullis: o/ when you are online can you try to op yourself to see if I didn't make mistakes?
[07:41:43] Analytics: [EventGate] Failures when getting stream config from MediaWiki API - https://phabricator.wikimedia.org/T286793 (mforns) Thanks a lot @Mholloway for the clarifications! Then, this volume is not related to the EventGate issues. I wonder where the EventGate request logs are? Will continue looking. Ch...
[09:02:34] Analytics, DBA, Infrastructure-Foundations, SRE, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[09:04:39] elukey: Looks like I am op'd now. Thanks.
[09:04:45] super :)
[13:27:44] ottomata: we discussed some time ago a prometheus pushgateway inside of analytics network. We managed to get around the lack of it back then (we only needed metrics to appear in kibana, that was doable with Graphite), but now we're thinking of having alerting based on our jobs on Hadoop. Since Alert Manager is deeply tied to Prometheus (I think?) it seems to me that a proper way of using it would be to have metrics
[13:27:53] (CR) Ottomata: Don't use Gobblin lock but rather yarn check (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/705970 (https://phabricator.wikimedia.org/T286559) (owner: Joal)
[13:28:15] in Prometheus, hence the pushgateway. Any plans related to that?
[13:35:31] Analytics, Better Use Of Data, Event-Platform, Metrics-Platform, Product-Data-Infrastructure: Client-side error logging should use Elastic Common Schema (ECS) fields when possible - https://phabricator.wikimedia.org/T267602 (Ottomata) I can't recall if we made any real decisions on what to do...
[13:48:11] Analytics, Analytics-Kanban, Event-Platform, Wikidata, and 3 others: Automate event stream ingestion into HDFS for streams that don't use EventGate - https://phabricator.wikimedia.org/T273901 (Ottomata) Yes! We've done this now that we are using Gobblin instead of Camus. Moving this to our Kanb...
[13:51:41] Analytics, Analytics-Kanban: Deprecate profile::analytics::cluster::users - https://phabricator.wikimedia.org/T287063 (Ottomata) a:Ottomata Thanks Luca! I'd like to work on this ASAP, as it kinda blocks {T284225}
[13:52:24] Analytics, Analytics-Kanban, Platform Engineering, Research: Create airflow instances for Platform Engineering and Research - https://phabricator.wikimedia.org/T284225 (Ottomata) Blocked by {T287063}
[13:53:09] (CR) Joal: Don't use Gobblin lock but rather yarn check (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/705970 (https://phabricator.wikimedia.org/T286559) (owner: Joal)
[13:55:28] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Add analytics-presto.eqiad.wmnet CNAME for Presto coordinator failover - https://phabricator.wikimedia.org/T273642 (Ottomata) Interesting! I agree if discovery works (with Kerberos), it'd be easier to failover using confctl rather than DNS chan...
[13:55:52] Analytics: Create aggregate alarms for Hadoop daemons running on worker nodes - https://phabricator.wikimedia.org/T287027 (Ottomata) Go for it!
[13:57:44] zpapierski: yes! https://phabricator.wikimedia.org/T286503
[13:57:54] we haven't started it but need to do it soon (this or next q?)
[13:57:59] ping also joal ^
[13:58:20] There is a push gateway in wmf prod already, i think analytics networks should be able to use it
[13:58:27] but i haven't tried
[14:01:59] ottomata: great! so if I understand what you're saying, we should have the prerequisites already set up, good to hear
[14:04:39] Analytics, Growth-Team, Product-Analytics: Add geolocation information to Growth schemas - https://phabricator.wikimedia.org/T287121 (Ottomata) If you add the `http.client_ip` field to your schemas, EventGate will automatically populate it. If this field exists, the Hive Refine step will then automa...
[14:08:27] yeah, observability set it up, and i think we can use it, just haven't tried yet
[14:08:29] we will try it soon
[14:08:38] Analytics, Analytics-Kanban, Event-Platform, Wikidata, and 3 others: Automate event stream ingestion into HDFS for streams that don't use EventGate - https://phabricator.wikimedia.org/T273901 (Zbyszko) @Ottomata thanks!
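The pushgateway plan discussed above can be sketched roughly as follows. Under that plan, a Hadoop batch job pushes its metrics to a Pushgateway, Prometheus scrapes the gateway, and alerting rules fire through Alertmanager. This is a minimal illustration that assumes the standard Pushgateway HTTP API: `PUT /metrics/job/<job>{/<label>/<value>}` with a Prometheus text-format body. The gateway host, job name, label and metric name are hypothetical placeholders, not WMF configuration. The request is only built, not sent.

```python
"""Sketch: build a Prometheus Pushgateway request for a batch job's metrics.

Hypothetical: gateway URL, job name, labels and metric names are placeholders.
"""
from typing import Optional
import urllib.request
from urllib.parse import quote


def exposition_text(metrics: dict, help_text: Optional[dict] = None) -> str:
    """Render gauges in the Prometheus text exposition format."""
    help_text = help_text or {}
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text.get(name, name)}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The text format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"


def grouping_path(job: str, labels: dict) -> str:
    """Build the /metrics/job/<job>{/<label>/<value>} grouping-key path."""
    parts = ["metrics", "job", job]
    for key, value in labels.items():
        # Values containing '/' would need the gateway's base64 encoding,
        # which this sketch does not implement.
        if "/" in value:
            raise ValueError("label values containing '/' are not supported here")
        parts.extend([key, value])
    return "/" + "/".join(quote(p, safe="") for p in parts)


def build_push_request(gateway: str, job: str, labels: dict,
                       metrics: dict) -> urllib.request.Request:
    """PUT replaces every metric previously pushed under this grouping key."""
    body = exposition_text(metrics).encode("utf-8")
    return urllib.request.Request(
        url=gateway.rstrip("/") + grouping_path(job, labels),
        data=body,
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


if __name__ == "__main__":
    req = build_push_request(
        "http://pushgateway.example.org:9091",  # hypothetical host
        "hypothetical_hadoop_job",
        {"instance": "an-launcher1002"},
        {"job_last_success_timestamp_seconds": 1626998400.0},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode())
    # urllib.request.urlopen(req) would actually send it; omitted on purpose.
```

The sketch uses PUT because a batch job usually wants its latest run to replace the previous one wholesale. POST would only replace metrics with the same names.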
[14:11:02] (CR) Ottomata: Don't use Gobblin lock but rather yarn check (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/705970 (https://phabricator.wikimedia.org/T286559) (owner: Joal)
[14:20:50] (CR) Joal: Don't use Gobblin lock but rather yarn check (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/705970 (https://phabricator.wikimedia.org/T286559) (owner: Joal)
[14:23:09] (CR) Ottomata: Don't use Gobblin lock but rather yarn check (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/705970 (https://phabricator.wikimedia.org/T286559) (owner: Joal)
[14:40:51] Analytics, DBA, Infrastructure-Foundations, SRE, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (ops-monitoring-bot) Icinga downtime set by mmandere@cumin2002 for 1:00:00 4 host(s) and their services with reason: Eqiad row C maintenance ` cp[108...
[14:42:03] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:42:54] Analytics, DBA, Infrastructure-Foundations, SRE, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (Vgutierrez)
[14:49:27] Analytics, DBA, Infrastructure-Foundations, SRE, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (herron)
[14:50:20] Analytics, DBA, Infrastructure-Foundations, SRE, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (ops-monitoring-bot) Icinga downtime set by mmandere@cumin2002 for 1:00:00 1 host(s) and their services with reason: Eqiad row C maintenance ` lvs101...
[14:52:03] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:55:40] Analytics, DBA, Infrastructure-Foundations, SRE, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (Vgutierrez)
[15:04:04] Analytics, DBA, Infrastructure-Foundations, SRE, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (MoritzMuehlenhoff)
[15:07:50] Analytics, DBA, Infrastructure-Foundations, SRE, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (aborrero)
[15:10:22] Analytics, DBA, Infrastructure-Foundations, SRE, netops: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (herron)
[15:22:09] Analytics, DBA, Infrastructure-Foundations, SRE, netops: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (aborrero)
[15:23:06] Analytics, DBA, Infrastructure-Foundations, SRE, netops: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[15:49:09] (PS4) Joal: Load cassandra3 from spark [analytics/refinery/source] - https://gerrit.wikimedia.org/r/686629 (https://phabricator.wikimedia.org/T280649) (owner: Milimetric)
[15:52:03] (PS4) Joal: [WIP] Update to spark-3 and scala-2.12 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897
[15:52:26] (PS5) Joal: [WIP] Update to spark-3 and scala-2.12 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897
[16:19:09] Analytics-Clusters, Analytics-Kanban: Upgrade Matomo to latest upstream - https://phabricator.wikimedia.org/T275144 (BTullis)
[17:02:15] Analytics, DBA, Infrastructure-Foundations, SRE, netops: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney) All went very well with the change, this time I ran rapid ping from the CR to see if any packet loss was observed, and did detect some loss,...
[17:02:31] Analytics, DBA, Infrastructure-Foundations, SRE, netops: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney) Open→Resolved
[17:15:14] (PS1) Joal: [WIP] Add cassandra3 to oozie loading jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/706605 (https://phabricator.wikimedia.org/T280649)
[17:16:17] (PS2) Joal: [WIP] Add cassandra3 to oozie loading jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/706605 (https://phabricator.wikimedia.org/T280649)
[18:07:43] (CR) Joal: Don't use Gobblin lock but rather yarn check (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/705970 (https://phabricator.wikimedia.org/T286559) (owner: Joal)
[18:26:39] (CR) Ottomata: [V: +2 C: +2] "Ok1" [analytics/refinery] - https://gerrit.wikimedia.org/r/705970 (https://phabricator.wikimedia.org/T286559) (owner: Joal)
[18:34:54] (CR) Ottomata: Add Refine transform function to add normalized host (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/705021 (https://phabricator.wikimedia.org/T251320) (owner: Mholloway)
[18:35:23] (CR) Ottomata: Add Refine transform function to add normalized host (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/705021 (https://phabricator.wikimedia.org/T251320) (owner: Mholloway)
[18:38:24] !log deploy refinery to an-launcher1002 for bin/gobblin job lock change
[18:38:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:39:02] Analytics, Analytics-Kanban, Patch-For-Review: Replace Camus by Gobblin - https://phabricator.wikimedia.org/T271232 (Ottomata)
[18:51:10] ottomata: failed :( meh
[18:51:39] oh looking
[18:52:14] java.lang.IllegalArgumentException: Can not create a Path from a null string
[18:52:15] ?
[18:52:24] this is unexpected :S
[18:52:33] the gobblin docs say that not setting it is the way to go
[18:52:46] hmmm
[18:52:58] we shoulda done in test cluster first anyway eh?
[18:52:58] ok
[18:53:24] let's undo the lock dir?
[18:54:05] yeah :)
[18:54:10] sorry for that Andrew :(
[18:54:12] tch
[18:54:18] sending patch
[18:54:18] s'ok
[18:55:02] (PS1) Ottomata: Set gobblin job.lock.dir after all [analytics/refinery] - https://gerrit.wikimedia.org/r/706673 (https://phabricator.wikimedia.org/T271232)
[18:55:42] (CR) Ottomata: [V: +2 C: +2] Set gobblin job.lock.dir after all [analytics/refinery] - https://gerrit.wikimedia.org/r/706673 (https://phabricator.wikimedia.org/T271232) (owner: Ottomata)
[19:09:32] ottomata: fixed - thank you
[19:11:52] gr8 ty
[19:16:38] ottomata: I'm so dumb sorry - job.lock.enabled is the property :S
[19:16:59] Will send a new patch - feel free to test/merge next week without rush
[19:17:18] ? oh you mean to disable it
[19:17:18] k
[19:17:37] yeah - will write my thoughts in an email tomorrow and let you decide when I'm gone :)
[19:20:50] ok
[19:21:57] (PS1) Joal: Add property disabling gobblin lock [analytics/refinery] - https://gerrit.wikimedia.org/r/706696 (https://phabricator.wikimedia.org/T286559)
[19:22:17] ok done for tonight - talk to you tomorrow online team
[19:28:15] (PS2) Ottomata: Add property disabling gobblin lock [analytics/refinery] - https://gerrit.wikimedia.org/r/706696 (https://phabricator.wikimedia.org/T286559) (owner: Joal)
[19:28:52] (CR) Ottomata: Add property disabling gobblin lock (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/706696 (https://phabricator.wikimedia.org/T286559) (owner: Joal)
[19:28:58] l8rs joal!
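The Gobblin lock mix-up above can be captured as a small pre-deploy check. The failure occurred after `job.lock.dir` was unset while locking stayed enabled, and the fix is to set `job.lock.enabled` instead. The sketch below is hypothetical. Its assumption, inferred from this exchange rather than from Gobblin's source, is that locking is on unless `job.lock.enabled` is `false`, and that an enabled lock needs `job.lock.dir`. The properties parser is deliberately minimal: it handles `key=value` lines and comments, but not `:` separators or line continuations.

```python
"""Hypothetical sanity check for Gobblin job lock settings.

Assumption (inferred from the chat, not Gobblin's code): locking is enabled
unless job.lock.enabled=false, and an enabled lock requires job.lock.dir.
"""
from typing import Optional


def parse_properties(text: str) -> dict:
    """Minimal key=value parser: skips blank lines and '#'/'!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def lock_config_problem(props: dict) -> Optional[str]:
    """Describe a lock misconfiguration, or return None if it looks fine."""
    enabled = props.get("job.lock.enabled", "true").lower() != "false"
    if enabled and not props.get("job.lock.dir"):
        return "job lock enabled but job.lock.dir unset (expect a null Path error)"
    return None


if __name__ == "__main__":
    sample = "job.name=hypothetical_ingest_job\n"  # no lock settings at all
    print(lock_config_problem(parse_properties(sample)))
```

A check like this would only flag the configuration. Running the job in a test cluster first, as suggested in the log, remains the real safeguard.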
[20:03:38] Analytics, Analytics-Kanban, Event-Platform, MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), Patch-For-Review: LandingPageImpression Event Platform Migration - https://phabricator.wikimedia.org/T282855 (Ottomata)
[20:03:51] Analytics, Analytics-Kanban, Better Use Of Data, Event-Platform, and 5 others: VirtualPageView Event Platform Migration - https://phabricator.wikimedia.org/T238138 (Ottomata)
[20:03:59] Analytics, Analytics-Kanban, Event-Platform, MW-1.37-notes (1.37.0-wmf.11; 2021-06-21), Patch-For-Review: WMDEBanner* Event Platform Migration - https://phabricator.wikimedia.org/T282562 (Ottomata)
[20:04:07] Analytics, Analytics-Kanban, Event-Platform, Fundraising-Backlog, and 2 others: CentralNoticeBannerHistory and CentralNoticeImpression Event Platform Migration - https://phabricator.wikimedia.org/T271168 (Ottomata)
[20:11:08] Analytics, Event-Platform: EchoMail and EchoInteraction Event Platform Migration - https://phabricator.wikimedia.org/T287210 (Ottomata)
[20:12:11] Analytics, Event-Platform: EchoMail and EchoInteraction Event Platform Migration - https://phabricator.wikimedia.org/T287210 (Ottomata)
[20:12:46] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[20:15:46] Analytics, Event-Platform: EchoMail and EchoInteraction Event Platform Migration - https://phabricator.wikimedia.org/T287210 (Ottomata) @nettrom_WMF @MMiller_WMF Do either EchoMail or EchoInteraction need client_ip and/or geocoded data?
[20:25:15] (PS1) Ottomata: Add legacy/echomail and legacy/echointeraction schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/706742 (https://phabricator.wikimedia.org/T287210)
[20:50:35] Analytics, Event-Platform, Patch-For-Review: EchoMail and EchoInteraction Event Platform Migration - https://phabricator.wikimedia.org/T287210 (Ottomata)
[20:54:00] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (ovasileva)
[21:17:02] Analytics, Product-Analytics, Growth-Team (Current Sprint): Add geolocation information to Growth schemas - https://phabricator.wikimedia.org/T287121 (mewoph)