[06:25:51] ok we have a mess this morning
[06:30:14] Hmm joal anything I can help with? I'm still up for a bit
[06:30:32] Does it have to do with the change I merged? :X
[06:31:17] Hi razzi - The hdfs-cleaner deployed yesterday didn't work as expected and deleted files needed for gobblin to work normally
[06:31:32] razzi: nothing you could have found by looking at code -
[06:31:47] razzi: the code misbehaved for a reason I don't yet understand
[06:32:21] But now we have every single topic pulled from gobblin missing data for hour 02 UTC
[06:32:38] I'm gonna devise a fix plan
[06:37:10] if you need help I am around :)
[06:37:30] Thanks a lot razzi and elukey
[06:37:44] I'm gonna devise the plan, and ask you to review
[06:39:10] Ok thanks elukey I'll be back in ~9 hours after much needed sleep! :)
[06:39:33] ack razzi - have a good night :)
[06:43:44] elukey: if you have a minute - https://etherpad.wikimedia.org/p/analytics-gobblin-mess
[06:46:38] before starting - do we need to stop/cleanup the timer that deleted data as a precaution
[06:46:41] ?
[06:47:16] hm
[06:47:56] probably yes
[06:48:53] that is
[06:48:54] elukey@an-launcher1002:~$ sudo systemctl list-timers | grep hdfs-cleaner-gobblin
[06:48:57] Thu 2021-10-21 23:45:00 UTC 16h left Wed 2021-10-20 23:45:00 UTC 7h ago hdfs-cleaner-gobblin.timer hdfs-cleaner-gobblin.service
[06:49:01] right?
[06:49:17] so there is time, but if we don't remove it, tomorrow you'll have to do the same mess probably :D
[06:49:22] it is yes - it'll not run before tonight - let me send a patch to absent it from puppet
[06:49:53] ok for you elukey --^ ?
[06:50:43] joal: wait 10 sec I have one ready
[06:50:49] ack elukey
[06:54:32] https://gerrit.wikimedia.org/r/c/operations/puppet/+/732610
[06:58:55] ok all cleaned up
[06:59:17] I am probably not super familiar with Gobblin (and don't have a lot of caffeine flowing yet :D)
[06:59:27] but how do you retrieve the deleted state?
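[Editor's note] The `systemctl list-timers` row pasted above is awkward to read at a glance. A small parser (a sketch, assuming the default column layout NEXT / LEFT / LAST / PASSED / UNIT / ACTIVATES, as in the row shown) pulls out the next run time and the unit names:

```python
import re

def parse_timer_line(line):
    """Parse one `systemctl list-timers` row into (next_run, timer_unit, service_unit).

    Assumes the row layout seen above: an absolute NEXT timestamp, a LEFT
    column like "16h left", an absolute LAST timestamp, a PASSED column like
    "7h ago", then the timer unit and the service it activates.
    """
    m = re.match(
        r"(?P<next>\w{3} \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC)\s+"
        r"(?P<left>\S+ \S+)\s+"
        r"(?P<last>\w{3} \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC)\s+"
        r"(?P<passed>\S+ \S+)\s+"
        r"(?P<timer>\S+\.timer)\s+(?P<service>\S+\.service)",
        line,
    )
    if not m:
        raise ValueError(f"unrecognised list-timers row: {line!r}")
    return m.group("next"), m.group("timer"), m.group("service")
```

With the row above, this shows the timer would not fire again for ~16 hours, which is why absenting it from puppet before tonight was enough.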
[06:59:27] ok - I'm not even sure how that thing broke :( I have ideas, but no confirmation
[06:59:59] The cleaner doesn't skip trash - the deleted states are in the trash (I am currently copying them to my folder)
[07:00:00] (if it is too long to explain please proceed, I don't want to derail/slow down)
[07:00:07] ahhhh good
[07:00:11] yes :)
[07:01:16] so you want to test the gobblin jobs with the deleted states in your dir, verify that all is sound, run the jobs and transfer the data to prod correctly
[07:01:22] so that webrequests etc. will be unblocked
[07:02:12] Almost - I wish to test jobs from states in my folder - correct
[07:02:32] Then I wish to run the tested jobs to prod-destination (once asserted correct) - yes
[07:03:13] finally we'll have to manually rerun a bunch of jobs - for webrequest it's easy, they failed on missing data, so a rerun should do - for events we need to re-refine
[07:03:29] +1 then looks sound
[07:03:43] ack elukey thanks
[07:04:00] Will proceed gently, trying not to over-mess :)
[07:04:19] do you need a rubber duck while you do it? Or do you prefer to do it on your own?
[07:04:54] I can always do with some of your help :) But I know you have other things to do :)
[07:25:09] here I am sorry
[07:25:18] I can join if needed :)
[07:26:16] thanks elukey - I'll ping you if I feel alone ;)
[07:26:44] ack
[08:34:49] I am also here now, in case I can help.
[08:36:17] ack btullis thanks a lot
[08:36:50] I'm moving gently - I have managed to get a working webrequest job - will check data somehow and will then unlock the jobs by copying data
[08:37:19] ack
[08:41:08] !log Rerun webrequest-load jobs for hour 2021-10-21T02:00
[08:41:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:44:43] I didn't know that HDFS trash was a thing until now, but I've just read: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster#recover_files_deleted_by_mistake_using_the_hdfs_CLI_rm_command?
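[Editor's note] The recovery above works because `hdfs dfs -rm` (without `-skipTrash`) moves files into a per-user trash directory rather than deleting them. A sketch of the path mapping, assuming the default HDFS trash layout of `/user/<user>/.Trash/Current/<original absolute path>`:

```python
def hdfs_trash_path(original_path, user):
    """Return where a file deleted with `hdfs dfs -rm` (without -skipTrash)
    is expected to land, assuming the default HDFS trash layout:
    /user/<user>/.Trash/Current/<original absolute path>.

    Note: a trash checkpoint later renames Current/ to a timestamped
    directory, so this covers only the recent-deletion case.
    """
    if not original_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    return f"/user/{user}/.Trash/Current{original_path}"
```

Recovery is then an `hdfs dfs -cp` (or `-mv`) from that trash path back to the original location, which is how the gobblin state files were copied out above. (The path argument here is illustrative; the actual gobblin state paths are not shown in the log.)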
[10:09:51] (03Abandoned) 10Jhernandez: POC: Using a to show the dbs [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/668544 (owner: 10Jhernandez)
[10:27:20] Spark 3.2 is out - and there are some super cool improvements :)
[10:35:41] !log Re-refine netflow data after gobblin pulled data fix
[10:35:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:44:30] joal: I want to get turnilo and superset permissions for ejoseph - is there some kind of a ticket template/process for that? I forgot how it looked for me
[10:45:18] zpapierski: IIRC he needs an LDAP account, and the analytics-privatedata-user group
[10:45:58] LDAP he has, so I understand I need to get him the analytics-privatedata-user group - do you remember how it's done?
[10:47:09] zpapierski: Needs to be done through a ticket to SRE IIRC - docs are here: https://wikitech.wikimedia.org/wiki/Analytics/Data_access
[10:47:20] ah, perfect - thanks
[11:26:22] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:51:52] 10Analytics, 10Maps, 10Product-Infrastructure-Team-Backlog (Kanban): Sending events to `maps.tiles_change` stream is failing - https://phabricator.wikimedia.org/T294011 (10Jgiannelos)
[11:52:41] 10Analytics, 10Maps, 10Product-Infrastructure-Team-Backlog (Kanban): Sending events to `maps.tiles_change` stream is failing - https://phabricator.wikimedia.org/T294011 (10Jgiannelos)
[12:39:25] (03CR) 10DCausse: Spark JsonSchemaConverter - additionalProperties with schema is always a MapType (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/629406 (https://phabricator.wikimedia.org/T263466) (owner: 10Ottomata)
[12:51:46] (03PS9) 10DCausse: Add fragment/mediawiki/revision/slot [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/731006 (https://phabricator.wikimedia.org/T293195)
[12:52:26] (03CR) 10jerkins-bot: [V: 04-1] Add fragment/mediawiki/revision/slot [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/731006 (https://phabricator.wikimedia.org/T293195) (owner: 10DCausse)
[12:55:00] (03PS10) 10DCausse: Add fragment/mediawiki/revision/slot [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/731006 (https://phabricator.wikimedia.org/T293195)
[13:00:19] (03CR) 10DCausse: Add fragment/mediawiki/revision/slot (031 comment) [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/731006 (https://phabricator.wikimedia.org/T293195) (owner: 10DCausse)
[13:17:06] PROBLEM - Check unit status of gobblin-event_default on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit gobblin-event_default https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:19:13] joal: anything that I can do to help with this or the previous gobblin issue? --^
[13:19:38] o/
[13:19:47] joal just saw your note, still checking email, etc. let me know if i can help too!
[13:19:52] hm
[13:20:13] Ok I think I know what has happened
[13:20:17] batcave?
[13:21:12] sure
[13:39:08] 10Analytics-Radar, 10Fundraising-Backlog, 10Product-Analytics, 10Wikipedia-iOS-App-Backlog, and 2 others: Understand impact of Apple's Relay Service - https://phabricator.wikimedia.org/T289795 (10TheDJ) >>! In T289795#7382434, @GeneralNotability wrote: > a list of current egress points can be found at http...
[13:39:31] !log btullis@an-launcher1002:~$ sudo systemctl restart gobblin-event_default
[13:39:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:50:14] RECOVERY - Check unit status of gobblin-event_default on an-launcher1002 is OK: OK: Status of the systemd unit gobblin-event_default https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:50:20] 10Analytics, 10Maps, 10Product-Infrastructure-Team-Backlog (Kanban): Sending events to `maps.tiles_change` stream is failing - https://phabricator.wikimedia.org/T294011 (10Ottomata) Ah right, sorry. eventgate-main only requests stream config on startup, so we need to just restart the service...will do...
[13:54:51] 10Analytics, 10Analytics-Kanban: Move the Analytics/DE testing infrastructure to Pontoon - https://phabricator.wikimedia.org/T292388 (10BTullis)
[13:59:04] 10Analytics, 10Analytics-Kanban: Move the Analytics/DE testing infrastructure to Pontoon - https://phabricator.wikimedia.org/T292388 (10BTullis) This request has now been granted. We have a new project on WMCS named `data-engineering`. I have added @Ottomata and @razzi as project admins. @elukey was also an ad...
[14:00:20] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:05:33] !log rerun refine_eventlogging_analytics refine_eventlogging_legacy and refine_event with -ignore-done-flag=true --since=2021-10-21T01:00:00 --until=2021-10-21T04:00:00 for backfill of missing data after gobblin problems
[14:05:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:11:53] I'm a little confused. I'm not sure how I'd set HADOOP_HEAPSIZE when using a library like PyHive. There doesn't seem to be a corresponding configuration option (https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties), but would I just set it as an environment variable, and when a Hive client is started by PyHive it would just use that?
[14:12:44] Hello, is there a good way to get ORES articletopic scores for all articles on a given wiki? I'm thinking about "for each article on the wiki that's in event_sanitized.mediawiki_revision_score, keep only the row with the highest rev_id", but perhaps there's already a way to get exactly that kind of information?
[14:12:58] (obviously, I could query ores.wikimedia.org directly, but for all articles, that'd... take a while)
[14:13:24] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:17:45] urbanecm: good question, I don't know of a dump of ORES scores anywhere, elukey is there such a thing? If there is I'll add a line about it at https://wikitech.wikimedia.org/wiki/ORES
[14:18:37] as far as I know there is no such thing
[14:18:45] there's https://analytics.wikimedia.org/published/datasets/one-off/ores/scores_dumps/damaging_goodfaith_enwiki/, but I have no idea who created that, or how :)
[14:19:03] (and it's a different set of scores, although same table I guess)
[14:19:30] that one was requested as a one-off in a task, but it should contain the scores for the changes done in a certain period of time
[14:19:39] with changes I meant edit
[14:19:41] *edits
[14:20:18] since change prop asks for a score for every edit on all wikis, storing those in kafka
[14:25:38] 10Analytics: [Airflow] Implement DAG that syncs archiva packages to HDFS - https://phabricator.wikimedia.org/T294024 (10mforns)
[14:25:54] 10Analytics: [Airflow] Implement DAG that syncs archiva packages to HDFS - https://phabricator.wikimedia.org/T294024 (10mforns)
[14:25:56] 10Analytics, 10Platform Team Workboards (Image Suggestion API): Airflow collaborations - https://phabricator.wikimedia.org/T282033 (10mforns)
[14:29:07] 10Analytics: [Airflow] Create repository for Airflow DAGs - https://phabricator.wikimedia.org/T294026 (10mforns)
[14:29:20] 10Analytics: [Airflow] Create repository for Airflow DAGs - https://phabricator.wikimedia.org/T294026 (10mforns)
[14:29:22] 10Analytics, 10Platform Team Workboards (Image Suggestion API): Airflow collaborations - https://phabricator.wikimedia.org/T282033 (10mforns)
[14:35:11] 10Analytics, 10Maps, 10Product-Infrastructure-Team-Backlog (Kanban): Sending events to `maps.tiles_change` stream is failing - https://phabricator.wikimedia.org/T294011 (10Jgiannelos) 05Open→03Resolved Thanks, looks like it's working now. I got some canary events and I also published a couple of test events...
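[Editor's note] urbanecm's "keep only the row with the highest rev_id per article" idea amounts to a simple max-by reduction. A sketch with hypothetical field names (`page_id`, `rev_id`, `score` stand in for the relevant columns of event_sanitized.mediawiki_revision_score; in Hive/Spark this would usually be a window function partitioned by page and ordered by rev_id):

```python
def latest_scores(rows):
    """Keep, per page, only the score row with the highest rev_id.

    `rows` is an iterable of dicts with (hypothetical) keys
    'page_id', 'rev_id', 'score'. Returns {page_id: row}.
    """
    latest = {}
    for row in rows:
        page = row["page_id"]
        # A later revision wins; ties cannot occur since rev_ids are unique.
        if page not in latest or row["rev_id"] > latest[page]["rev_id"]:
            latest[page] = row
    return latest
```

This single pass is exactly what a `ROW_NUMBER() OVER (PARTITION BY page_id ORDER BY rev_id DESC) = 1` filter would compute on the cluster, which is the usual way to avoid hitting ores.wikimedia.org once per article.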
[14:35:28] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:36:08] 10Analytics, 10Maps, 10Product-Infrastructure-Team-Backlog (Kanban): Sending events to `maps.tiles_change` stream is failing - https://phabricator.wikimedia.org/T294011 (10Ottomata) Done!
[14:59:53] (03CR) 10Ppchelko: [C: 04-1] "last nitpick." [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/731006 (https://phabricator.wikimedia.org/T293195) (owner: 10DCausse)
[15:02:35] Hi team, good morning from the MIDWEST USA!!
[15:02:39] (crowd cheers)
[15:03:11] so urbanecm I guess we should open up a task to create such a dump maybe, querying the API for *all* articles doesn't seem like a great idea
[15:03:16] morning razzi, lol
[15:03:45] razzi: you wanna help me troubleshoot something?
[15:03:57] I'm down
[15:04:03] ok, to the batcave!
[15:04:04] batcave?
[15:04:05] !!!
[15:04:14] My camera isn't working, gonna reboot real quick
[15:04:16] batcave's busy. to the TARDIS!
[15:09:26] (03CR) 10MNeisler: [C: 03+1] talk_page_event schema (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch)
[15:12:47] joal: all 3 refine reruns succeeded
[15:17:08] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (10BTullis) There are still compactions running on the new cluster, although they have almost completed. We have decided to wai...
[15:20:42] ottomata: we need a package on debian
[15:20:47] on stat boxes
[15:20:52] libsasl
[15:21:53] razzi: https://packages.debian.org/buster/libsasl2-2 ?
[15:22:16] hmm
[15:22:17] razzi: if just stat boxes
[15:22:17] profile::analytics::cluster::packages::statistics
[15:22:20] add to that
[15:22:21] unclear, apparently that's already got it
[15:23:26] we need libsasl2-dev
[15:23:28] ottomata:
[15:23:33] for the headers
[15:23:37] aye
[15:23:41] ok yeah I'll add to that
[15:23:44] coo
[15:23:57] Also installed, at least on stat1004
[15:24:00] https://www.irccloud.com/pastebin/wIzwX3GF/
[15:24:30] btullis: yep, we need dev headers
[15:24:36] libsasl2-dev
[15:24:44] Oops, sorry, I missed the -dev on my command. Apologies
[15:25:05] hm, so why does pip install fail with:
[15:25:08] https://www.irccloud.com/pastebin/EUZymed8/
[15:25:14] Strangely though I see ensure_packages libsasl2-dev
[15:25:17] # For pyhive
[15:25:18] 'libsasl2-dev',
[15:25:19] Ah, no I didn't, I just pasted the wrong thing.
[15:25:22] https://www.irccloud.com/pastebin/B2DmeHI1/
[15:25:35] yeah idk why it isn't finding the header
[15:25:43] right... so why's it failing to pip install sasl or pip install pyhive[hive]
[15:25:48] razzi: you are in anaconda
[15:25:58] aha!
[15:26:14] there is all the mess outlined in https://phabricator.wikimedia.org/T292699
[15:26:31] I think that you'd need to install libsasl via conda
[15:26:38] woah
[15:27:40] mmm should be `conda install -c conda-forge cyrus-sasl `
[15:27:42] razzi: --^
[15:28:19] or not, I don't see the headers in the webpage related to it, maybe there is another
[15:28:49] well let's try it first :)
[15:29:02] elukey: what's the deal with cyrus? I never heard of it
[15:29:25] it is an implementation of sasl IIRC
[15:29:30] cool, trying
[15:29:59] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Patch-For-Review, and 2 others: Migrate analytics cluster alerts from Icinga to AlertManager - https://phabricator.wikimedia.org/T293399 (10BTullis) I have confirmed the presence of the first rule that has been added, by using an SSH tunnel and checkin...
[15:30:00] ah wait it was milimetric asking, sorry :D
[15:30:41] in theory though simply installing will not work miriam
[15:30:43] err milimetric
[15:31:02] right, it didn't :)
[15:31:14] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Patch-For-Review, and 2 others: Migrate analytics cluster alerts from Icinga to AlertManager - https://phabricator.wikimedia.org/T293399 (10BTullis)
[15:31:15] I think you'd also need export CPPFLAGS="${CPPFLAGS} -isystem ${CONDA_PREFIX}/include"
[15:32:07] worked elukey, thank you!
[15:32:15] I'm gonna add a little section to the docs and point to your task
[15:32:26] Who's Miriam? 🙂
[15:33:43] milimetric: nice! In theory when we package the new anaconda-wmf deb we'll get rid of the extra export CPP...
[15:34:44] btullis: Miriam Redi! (ciao Miriam!)
[15:35:33] ciao ciao elukey and btullis!!
[15:35:43] k, added this: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Anaconda#Installing_packages_into_your_user_conda_environment
[15:36:08] Hello. Nice to meet you Miriam.
[16:00:15] 10Analytics, 10Product-Analytics, 10Structured-Data-Backlog: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834 (10JAllemandou) Moving back to incoming as there is demand from @cchen to prioritize.
[16:00:43] (03CR) 10Ottomata: talk_page_event schema (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch)
[16:03:32] mforns: standup?
[16:03:35] razzi: standuo?
[16:03:42] uop!
[16:05:57] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Refactor analytics-meta MariaDB layout to multi instance with failover - https://phabricator.wikimedia.org/T284150 (10Ottomata) a:03Ottomata
[16:24:15] 10Analytics, 10Analytics-Kanban: Check home/HDFS leftovers of mholloway-shell - https://phabricator.wikimedia.org/T291353 (10odimitrijevic) @jlinehan @dr0ptp4kt putting this on your radar again.
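[Editor's note] The fix that unblocked `pip install pyhive[hive]` above was twofold: install the SASL headers into the conda env (`conda install -c conda-forge cyrus-sasl`), then make the compiler look there via the `export CPPFLAGS` line elukey suggested. A sketch of that environment tweak as a helper for driving pip from Python (the path values in the test are made up; only CONDA_PREFIX being set inside an activated env is assumed):

```python
import os

def conda_build_env(environ=None):
    """Return a copy of the environment with CPPFLAGS extended so that
    C extensions built by pip (e.g. the `sasl` dependency of pyhive)
    find headers installed into the active conda environment, mirroring
    `export CPPFLAGS="${CPPFLAGS} -isystem ${CONDA_PREFIX}/include"`.
    """
    env = dict(environ if environ is not None else os.environ)
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("no active conda environment (CONDA_PREFIX unset)")
    extra = f"-isystem {prefix}/include"
    # Preserve any flags the user already exported.
    env["CPPFLAGS"] = (env.get("CPPFLAGS", "") + " " + extra).strip()
    return env
```

Passing this dict as `env=` to a `subprocess.run(["pip", "install", "sasl"], env=...)` call reproduces the manual `export` from the shell session.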
[16:30:24] joal: In answer to your question, yes the server that appears to be very slow compacting *is* one of the two that was restarted recently.
[16:30:27] https://www.irccloud.com/pastebin/8kJoUR5w/
[16:34:04] hm
[16:34:42] btullis: would you mind triple checking this server's log? I wonder if it could be blocked or anything like the other one
[16:37:37] It's still producing log messages, but slowly. No recent stack traces.
[16:37:48] ack btullis thanks
[16:47:58] (03CR) 10DLynch: talk_page_event schema (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch)
[16:50:02] (03CR) 10DLynch: talk_page_event schema (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch)
[17:01:12] (03CR) 10DLynch: talk_page_event schema (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch)
[17:08:57] 10Analytics, 10Product-Analytics, 10Structured-Data-Backlog: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834 (10JAllemandou) p:05Medium→03Triage
[17:09:22] (03PS5) 10DLynch: talk_page_event schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076)
[17:12:23] (03CR) 10Ottomata: talk_page_event schema (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch)
[17:14:38] 10Analytics, 10Analytics-Kanban: HDFS check topology alert is currently broken - https://phabricator.wikimedia.org/T292846 (10BTullis) 05Open→03Resolved
[17:15:39] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Automate kerberos credential creation and management to ease the creation of testing infrastructure - https://phabricator.wikimedia.org/T292389 (10BTullis)
[17:16:16] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Patch-For-Review: Standardize the stats system user uid - https://phabricator.wikimedia.org/T291384 (10Ottomata)
[17:16:29] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Data-Engineering, and 2 others: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (10Ottomata)
[17:16:34] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Data-Engineering, and 3 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (10Ottomata)
[17:16:41] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Patch-For-Review: Refactor analytics-meta MariaDB layout to multi instance with failover - https://phabricator.wikimedia.org/T284150 (10Ottomata)
[17:16:48] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Improve Refine bad data handling - https://phabricator.wikimedia.org/T289003 (10Ottomata)
[17:16:50] 10Analytics, 10Data-Engineering: Create aggregate alarms for Hadoop daemons running on worker nodes - https://phabricator.wikimedia.org/T287027 (10BTullis)
[17:17:03] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Data-Engineering, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (10Ottomata)
[17:17:31] 10Analytics, 10Analytics-Kanban: hdfs directory for analytics-research - https://phabricator.wikimedia.org/T290918 (10Ottomata) 05Open→03Resolved
[17:17:33] 10Analytics, 10Platform Team Workboards (Image Suggestion API): Airflow collaborations - https://phabricator.wikimedia.org/T282033 (10Ottomata)
[17:17:50] 10Analytics, 10Data-Engineering: Create aggregate alarms for Hadoop daemons running on worker nodes - https://phabricator.wikimedia.org/T287027 (10BTullis) This will be done as part of {T293399}
[17:18:05] 10Analytics, 10Platform Team Workboards (Image Suggestion API): Airflow collaborations - https://phabricator.wikimedia.org/T282033 (10Ottomata)
[17:18:07] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: SPIKE - Will Hadoop 3 container support help us for Airflow deployment pipelines? - https://phabricator.wikimedia.org/T288247 (10Ottomata) 05Open→03Resolved
[17:18:32] 10Analytics: [Airflow] Create repository for Airflow DAGs - https://phabricator.wikimedia.org/T294026 (10razzi) My personal opinion is that it should be called airflow-config so that the file directory structure will look like `airflow-config/dags/my_cool_dag` etc.
[17:18:35] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Move the Analytics/DE testing infrastructure to Pontoon - https://phabricator.wikimedia.org/T292388 (10BTullis)
[17:18:43] 10Analytics, 10Analytics-Kanban, 10Discovery-Search, 10Patch-For-Review: Publish both shaded and unshaded artifacts from analytics refinery - https://phabricator.wikimedia.org/T217967 (10Ottomata) 05Open→03Resolved
[17:19:01] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: [Airflow] Create repository for Airflow DAGs - https://phabricator.wikimedia.org/T294026 (10odimitrijevic) p:05Triage→03High a:03mforns
[17:19:31] 10Analytics: [Airflow] Implement DAG that syncs archiva packages to HDFS - https://phabricator.wikimedia.org/T294024 (10odimitrijevic) p:05Triage→03High
[17:20:04] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: High volume mediawiki analytics events camus import is lagging - https://phabricator.wikimedia.org/T233718 (10Ottomata)
[17:20:12] 10Analytics: We should get an alarm for partitions that have no data for topics that have data influx at all times, most of the mediawiki.* - https://phabricator.wikimedia.org/T250699 (10Ottomata) 05Open→03Resolved canary events + monitoring exist.
[17:20:15] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Event-Platform, 10serviceops: Enable envoy tls proxy logging from eventgate - https://phabricator.wikimedia.org/T291856 (10Ottomata) 05Open→03Resolved
[17:20:42] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Patch-For-Review: Refine drops $schema field values - https://phabricator.wikimedia.org/T255818 (10Ottomata)
[17:21:27] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Event-Platform, 10Platform Team Initiatives (Modern Event Platform (TEC2)): Allow disabling/enabling configured streams via wgEventStreams config - https://phabricator.wikimedia.org/T259712 (10Ottomata)
[17:22:12] 10Analytics, 10Analytics-SWAP: Users should be able to read their jupyter instance logs - https://phabricator.wikimedia.org/T198764 (10Ottomata)
[17:22:17] 10Analytics, 10Analytics-Kanban: Check AQS with cassandra (serving + data) - https://phabricator.wikimedia.org/T290068 (10JAllemandou)
[17:22:19] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Jupyter notebook logs should appear in Logstash - https://phabricator.wikimedia.org/T288348 (10Ottomata)
[17:22:30] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Repair and reload all cassandra-2 data tables but the 2 big ones - https://phabricator.wikimedia.org/T291469 (10JAllemandou) 05In progress→03Resolved Resolving!
[17:22:59] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Check AQS with cassandra (serving + data) - https://phabricator.wikimedia.org/T290068 (10JAllemandou)
[17:23:21] 10Analytics-Kanban: Analytics Hardware for Fiscal Year 2020/2021 - https://phabricator.wikimedia.org/T255145 (10Ottomata)
[17:23:24] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Set up an-web1001 and decommission thorium - https://phabricator.wikimedia.org/T285355 (10Ottomata) 05Open→03Resolved
[17:23:45] 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Metrics-Platform, and 2 others: wgEventStreams (EventStreamConfig) should support per wiki overrides - https://phabricator.wikimedia.org/T277193 (10Ottomata) 05Open→03Resolved
[17:25:30] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Patch-For-Review: Users should run explicit commands to materialize schema versions, rather than using magic git hooks - https://phabricator.wikimedia.org/T290074 (10Ottomata)
[17:25:51] 10Analytics, 10SRE, 10SRE Observability (FY2021/2022-Q2): statsd and gunicorn metrics for superset - https://phabricator.wikimedia.org/T293761 (10odimitrijevic) p:05Triage→03Medium
[17:26:56] 10Analytics, 10Data-Engineering: Make it possible to use anaconda + stacked conda envs for Airflow executors - https://phabricator.wikimedia.org/T288271 (10Ottomata)
[17:26:58] 10Analytics: [Airflow] Implement DAG that syncs archiva packages to HDFS - https://phabricator.wikimedia.org/T294024 (10Ottomata)
[17:28:14] 10Analytics, 10Product-Analytics, 10wmfdata-python: Upstream relevant parts of wmfdata-python into refinery - https://phabricator.wikimedia.org/T293700 (10Ottomata) Related: {T286743}
[17:28:24] 10Analytics: Use corosync and pacemaker for presto coordinator active/standby configuration - https://phabricator.wikimedia.org/T287967 (10BTullis)
[17:28:30] joal found the timestamp kafka thing:
[17:28:31] https://phabricator.wikimedia.org/T282887
[17:33:19] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Write document about making Superset fast enough - https://phabricator.wikimedia.org/T294046 (10JAllemandou)
[17:34:09] ottomata: We should try to make this happen --^ !
[17:34:12] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10wmfdata-python: wmfdata-python's Hive query output includes logspam - https://phabricator.wikimedia.org/T275233 (10Milimetric) https://github.com/wikimedia/wmfdata-python/pull/23
[17:36:47] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Write document about making Superset fast enough - https://phabricator.wikimedia.org/T294046 (10JAllemandou) a:03JAllemandou
[17:39:38] 10Analytics: Reduce superset timeouts problem - https://phabricator.wikimedia.org/T294048 (10JAllemandou)
[17:40:18] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Write document about making Superset fast enough - https://phabricator.wikimedia.org/T294046 (10JAllemandou)
[17:40:21] 10Analytics: Reduce superset timeouts problem - https://phabricator.wikimedia.org/T294048 (10JAllemandou)
[17:41:01] 10Analytics: Reduce superset timeouts problem - https://phabricator.wikimedia.org/T294048 (10JAllemandou) p:05Triage→03High
[17:42:57] 10Analytics, 10Analytics-Kanban: Purge gobblin files - https://phabricator.wikimedia.org/T287084 (10JAllemandou) Back to "In Progress" to assess whether the deletion script is stable enough and doesn't break Gobblin on a regular basis.
[17:45:47] 10Analytics: Fix gobblin not writing _IMPORTED flags when runs don't overlap hours - https://phabricator.wikimedia.org/T286343 (10JAllemandou) a:05JAllemandou→03None
[17:58:48] 10Analytics, 10DC-Ops, 10SRE, 10ops-eqiad: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (10Cmjohnson) a:03Jclark-ctr
[19:16:49] (03PS1) 10Ottomata: Fix maps.tiles_change schema required field [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/732767 (https://phabricator.wikimedia.org/T293366)
[19:17:58] (03CR) 10Ottomata: [C: 03+2] Fix maps.tiles_change schema required field [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/732767 (https://phabricator.wikimedia.org/T293366) (owner: 10Ottomata)
[19:20:43] 10Analytics, 10Discovery, 10Event-Platform, 10SRE, 10Platform Team Workboards (Clinic Duty Team): Avoid accepting Kafka messages with whacky timestamps - https://phabricator.wikimedia.org/T282887 (10Ottomata) This happened today, somehow there were recentchange events with timestamps from around 2007 in...
[22:48:36] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (10razzi) Turns out Presto has a built-in query log in their internal table `system.runtime.queries`: {F34705387} Example usage (remember to kinit first): `razzi@stat1005:/srv/home/ra...
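[Editor's note] The T269832 comment above points at Presto's built-in `system.runtime.queries` table. A small helper composing a query against it (a sketch: the column names `query_id`, `state`, `created`, `query` follow the Presto docs for this table, but verify them against the deployed Presto version before relying on them):

```python
def recent_queries_sql(limit=20):
    """Compose a query against Presto's built-in system.runtime.queries
    table, which records queries the coordinator has seen, most recent
    first when ordered by the `created` timestamp column.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        "SELECT query_id, state, created, query "
        "FROM system.runtime.queries "
        f"ORDER BY created DESC LIMIT {limit}"
    )
```

The resulting string can be run through whatever Presto client is at hand (e.g. the presto CLI on a stat box, after kinit as the task comment notes).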