[04:54:43] (PS1) NOkafor: Fix unique_devices hql and pageview_top_per_country [analytics/refinery] - https://gerrit.wikimedia.org/r/833485
[06:38:39] RECOVERY - SSH on analytics1077.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:56:06] Data-Engineering-Operations, SRE, SRE-Access-Requests: Requesting Kerberos access for alinebruenger and siko - https://phabricator.wikimedia.org/T316766 (Siko_WMDE) Hi @Ottomata, Got the E-Mail! Thank you :-)
[08:55:57] PROBLEM - Check systemd state on stat1005 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service,session-c4122.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:25:13] hi, stat1005 appears to have some issues (looks like overload to me, but might be something else). i nearly can't even ssh in (it takes an eternity).
[09:31:10] a-team ^^ help'd be appreciated
[09:34:28] (also reported by a fellow user in #data-engineering at Slack)
[12:02:25] (CR) Joal: [V: +2 C: +2] "LGTM! Merging" [analytics/refinery] - https://gerrit.wikimedia.org/r/833485 (owner: NOkafor)
[12:18:43] (CR) Joal: "Still some minor demands of change - thanks Xabriel" [analytics/refinery] - https://gerrit.wikimedia.org/r/832336 (https://phabricator.wikimedia.org/T317124) (owner: Xcollazo)
[14:40:10] mforns: you're looking at stat1005, right? Did you check memory/cpu and stuff like that? Looks like some job's killing it
[14:43:09] ok milimetric will look!
[14:45:05] I was just connecting your email about troubleshooting systemd to what was said above, sounds like a runaway job
[15:28:39] Data-Engineering, Event-Platform Value Stream, Wikimedia-production-error: Eventgate error: '' should have required property 'database', '' should have required property 'performer' - https://phabricator.wikimedia.org/T317343 (cjming)
[15:29:30] Data-Engineering, Event-Platform Value Stream, Growth-Team, Product-Analytics, Wikimedia-production-error: Eventgate error: '' should have required property 'database', '' should have required property 'performer' - https://phabricator.wikimedia.org/T317343 (cjming)
[15:32:28] Data-Engineering, Event-Platform Value Stream, Growth-Team, Product-Analytics, Wikimedia-production-error: Eventgate error: '' should have required property 'database', '' should have required property 'performer' - https://phabricator.wikimedia.org/T317343 (cjming)
[15:32:55] Data-Engineering, Event-Platform Value Stream, Growth-Team, Product-Analytics, Wikimedia-production-error: Eventgate error: '' should have required property 'database', '' should have required property 'performer' - https://phabricator.wikimedia.org/T317343 (cjming)
[15:33:34] Data-Engineering, Event-Platform Value Stream, Growth-Team, Product-Analytics, Wikimedia-production-error: Eventgate error: '' should have required property 'database', '' should have required property 'performer' - https://phabricator.wikimedia.org/T317343 (cjming) Looks like these errors ar...
[15:39:53] Data-Engineering, Event-Platform Value Stream, Product-Analytics, Growth-Team (Current Sprint), Wikimedia-production-error: Eventgate error: '' should have required property 'database', '' should have required property 'performer' - https://phabricator.wikimedia.org/T317343 (kostajh)
[15:49:42] heya joal and aqu :] are you available to discuss some tricky Airflow template interpolation stuff before standup? If so, meet us in the standup meeting!
[16:00:46] Hey mforns - sorry I was not there
[16:00:53] mforns: after standup?
[16:06:42] Data-Engineering, Event-Platform Value Stream, Product-Analytics, Growth-Team (Current Sprint), Wikimedia-production-error: Eventgate error: '' should have required property 'database', '' should have required property 'performer' - https://phabricator.wikimedia.org/T317343 (Tgr) a:Tgr Th...
[16:10:05] (PS4) Xcollazo: Move most CREATE statements from hive/ to hql/ [analytics/refinery] - https://gerrit.wikimedia.org/r/832336 (https://phabricator.wikimedia.org/T317124)
[16:14:21] (CR) Xcollazo: Move most CREATE statements from hive/ to hql/ (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/832336 (https://phabricator.wikimedia.org/T317124) (owner: Xcollazo)
[16:16:35] (CR) Joal: [C: +1] "LGTM! Thanks a lot for this Xabriel 😊" [analytics/refinery] - https://gerrit.wikimedia.org/r/832336 (https://phabricator.wikimedia.org/T317124) (owner: Xcollazo)
[16:43:14] Data-Engineering, API Platform, Platform Engineering Roadmap, User-Eevans: AQS 2.0:Wikistats 2 service - https://phabricator.wikimedia.org/T288301 (BPirkle) >>! In T288301#8249130, @Milimetric wrote: > @BPirkle: sorry so late. Only one tiny misunderstanding left, I think. Wikistats 2 pulls data...
[16:46:36] Quarry, cloud-services-team (Kanban): Should quarry use our standard secrets management - https://phabricator.wikimedia.org/T290184 (rook) Should T301469 be successful this will likely be moved to git-crypt at least at first
[17:43:24] mforns, aqu - I'll be late but join!
[17:53:49] ok!
[18:01:13] mforns: no one in either batcave nor in the meeting Sandra sent - are you still there?
[18:01:26] yes, I'll pass you the link
[18:01:32] https://meet.google.com/egs-ggja-xtj
[18:06:02] Data-Engineering, MediaWiki-extensions-EventLogging, Metrics-Platform, Epic: [EPIC] Deprecate EventLogging::logEvent() - https://phabricator.wikimedia.org/T318263 (phuedx)
[18:15:36] Data-Engineering, MediaWiki-extensions-EventLogging, Metrics-Platform, Epic: [EPIC] Deprecate EventLogging::logEvent() - https://phabricator.wikimedia.org/T318263 (phuedx)
[18:56:22] Data-Engineering, MediaWiki-extensions-EventLogging, Metrics-Platform, Epic: [EPIC] Deprecate mw.eventLog.logEvent() - https://phabricator.wikimedia.org/T317874 (phuedx)
[19:11:32] !log kill aarora process 14584 on stat1005 - using 2500% cpu
[19:11:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:13:38] !log Deployed refinery for HQL patch (Njideka)
[19:13:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:16:08] (PS1) Gergő Tisza: Un-require some analytics/mediawiki/accountcreation/block fields [schemas/event/secondary] - https://gerrit.wikimedia.org/r/833845 (https://phabricator.wikimedia.org/T317343)
[19:16:42] (CR) CI reject: [V: -1] Un-require some analytics/mediawiki/accountcreation/block fields [schemas/event/secondary] - https://gerrit.wikimedia.org/r/833845 (https://phabricator.wikimedia.org/T317343) (owner: Gergő Tisza)
[19:18:03] !log kill aarora process 30421 run_embedding_training.sh on stat1005
[19:18:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:21:56] (CR) Kosta Harlan: Un-require some analytics/mediawiki/accountcreation/block fields (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/833845 (https://phabricator.wikimedia.org/T317343) (owner: Gergő Tisza)
[19:24:35] (PS2) Gergő Tisza: Un-require some analytics/mediawiki/accountcreation/block fields [schemas/event/secondary] - https://gerrit.wikimedia.org/r/833845 (https://phabricator.wikimedia.org/T317343)
[19:25:24] !log Kill oozie daily cassandra loading jobs as we move them to airflow
[19:25:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:26:06] Gone for tonight - talk tomorrow folks :)
[19:29:27] (CR) Gergő Tisza: Un-require some analytics/mediawiki/accountcreation/block fields (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/833845 (https://phabricator.wikimedia.org/T317343) (owner: Gergő Tisza)
[19:37:02] hi! we want to push a quick fix for an eventlogging issue (T317343) by marking two fields as not required. That apparently requires going from 2.0.0 -> 3.0.0 instead of 2.1.0. Is there some harm in doing that?
[19:37:03] T317343: Eventgate error: '' should have required property 'database', '' should have required property 'performer' - https://phabricator.wikimedia.org/T317343
[19:38:11] (CR) Kosta Harlan: [C: +1] Un-require some analytics/mediawiki/accountcreation/block fields [schemas/event/secondary] - https://gerrit.wikimedia.org/r/833845 (https://phabricator.wikimedia.org/T317343) (owner: Gergő Tisza)
[19:38:27] kostajh: o/ it is technically an incompatible change, as a consumer would expect a required field to always be present.
[19:38:34] but, it will not break hive.
[19:39:24] ottomata: ack. thanks!
[19:40:03] might be good to add a changelog.md file to that directory, just so other folks can know why things are changed later
[19:40:53] ottomata: is the inverse change OK for hive too? This is meant to be a temporary change.
[19:41:05] (CR) Ottomata: [C: +1] Un-require some analytics/mediawiki/accountcreation/block fields (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/833845 (https://phabricator.wikimedia.org/T317343) (owner: Gergő Tisza)
[19:42:04] tgr_: temporary? i'm pretty sure just changing the schema here won't fix the validation errors, as the schema used for validation is the one the producer sets, not the latest one available
[19:42:41] you'd have to alter the existing schema version and do a redeploy of eventgate to have it work with deployed instrumentation at the same schema version it is producing
[19:42:48] which is annoying
[19:43:40] The schema version is updated in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/833846
[19:43:56] ah okay, so why temporary then?
[19:44:04] The proper fix would be actually sending those required fields, but that's not easy to backport
[19:44:26] ah i see, it's a bigger code change so needs more testing you mean?
[19:45:00] yeah, I plan to do that as a followup and revert the schema change in 4.0.0
[19:45:07] to answer your q directly: i think adding requiredness is also incompatible...but i'm less sure that it should be.
[19:47:20] that's fine as long as it does not break Hive
[19:47:55] it doesn't matter much which part of the version number we change
[19:49:15] (CR) Gergő Tisza: Un-require some analytics/mediawiki/accountcreation/block fields (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/833845 (https://phabricator.wikimedia.org/T317343) (owner: Gergő Tisza)
[19:49:35] aye okay. it shouldn't break Hive. we set all fields as nullable no matter what the event schema says in Hive :)
[19:49:36] https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-spark/src/main/scala/org/wikimedia/analytics/refinery/spark/connectors/DataFrameToHive.scala#L113-L123
[19:50:22] since the hive table is one big merged version of all the event schema versions
[20:08:32] (CR) Kosta Harlan: [C: +2] Un-require some analytics/mediawiki/accountcreation/block fields [schemas/event/secondary] - https://gerrit.wikimedia.org/r/833845 (https://phabricator.wikimedia.org/T317343) (owner: Gergő Tisza)
[20:09:08] (Merged) jenkins-bot: Un-require some analytics/mediawiki/accountcreation/block fields [schemas/event/secondary] - https://gerrit.wikimedia.org/r/833845 (https://phabricator.wikimedia.org/T317343) (owner: Gergő Tisza)
[20:19:55] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:22:01] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:31:17] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:31:29] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:37:29] Data-Engineering, Event-Platform Value Stream, Product-Analytics, Growth-Team (Current Sprint), and 2 others: Eventgate error: '' should have required property 'database', '' should have required property 'performer' - https://phabricator.wikimedia.org/T317343 (Tgr) Fixed & added the stream to th...
[21:46:25] Analytics, API Platform, Platform Engineering Roadmap, User-Eevans: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (apaskulin)
[21:48:15] (PS1) Gergő Tisza: analytics/mediawiki/accountcreation/block: Re-add required flags [schemas/event/secondary] - https://gerrit.wikimedia.org/r/833857 (https://phabricator.wikimedia.org/T317343)
[21:54:49] Analytics, API Platform, Platform Engineering Roadmap, User-Eevans: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (apaskulin)
[22:08:43] Analytics, API Platform, Platform Engineering Roadmap, User-Eevans: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (apaskulin)
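
Note on the 19:37-19:50 exchange about un-requiring fields: the two points made there (a change to the required list counts as incompatible, so 2.0.0 goes to 3.0.0, and Hive is unaffected because every column is nullable in a table merged from all schema versions) can be sketched roughly as below. This is only an illustration, not the actual schema tooling or the DataFrameToHive code. The function names, the toy schemas, the field types, and the "minor bump otherwise" default are all assumptions made for the example.

```python
# Toy stand-ins for two versions of an event schema (hypothetical content).
SCHEMA_V2 = {
    "properties": {
        "database": {"type": "string"},
        "performer": {"type": "object"},
    },
    "required": ["database", "performer"],
}
# Same fields, but nothing is required any more.
SCHEMA_V3 = {"properties": SCHEMA_V2["properties"], "required": []}


def next_version(old_schema, new_schema, current):
    """Hypothetical rule: any change to 'required' is a major bump.

    Assumption: every other change is treated as a minor bump here.
    """
    major, minor, _patch = (int(part) for part in current.split("."))
    old_required = set(old_schema.get("required", []))
    new_required = set(new_schema.get("required", []))
    if old_required != new_required:
        return f"{major + 1}.0.0"
    return f"{major}.{minor + 1}.0"


def merged_nullable_columns(*schemas):
    """Merge the fields of all schema versions into one column map.

    Every column is marked nullable regardless of 'required', which is
    why toggling requiredness does not affect the merged table.
    """
    columns = {}
    for schema in schemas:
        for name, spec in schema.get("properties", {}).items():
            columns.setdefault(name, {"type": spec.get("type"), "nullable": True})
    return columns


if __name__ == "__main__":
    print(next_version(SCHEMA_V2, SCHEMA_V3, "2.0.0"))
    print(merged_nullable_columns(SCHEMA_V2, SCHEMA_V3))
```

Under these assumptions the planned follow-up (re-adding the required flags in 4.0.0) would also count as a major bump, and the merged column map would stay the same either way.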