[01:15:28] Analytics, Analytics-Kanban, Product-Analytics, wmfdata-python: wmfdata-python's Hive query output includes logspam - https://phabricator.wikimedia.org/T275233 (Milimetric) Perfect, thank you for the guidance. I'm reading the docs on the two projects and agree both would work, Impyla seems to ha...
[01:24:16] Analytics, Analytics-Kanban, Product-Analytics, wmfdata-python: wmfdata-python's Hive query output includes logspam - https://phabricator.wikimedia.org/T275233 (nshahquinn-wmf) Sounds good!
[02:05:46] (CR) Neil P. Quinn-WMF: "Congratulations on the adding the first new-style schema! 😊" [analytics/refinery] - https://gerrit.wikimedia.org/r/716339 (https://phabricator.wikimedia.org/T281511) (owner: MNeisler)
[02:06:26] (PS2) Andrew Bogott: Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558
[02:07:03] (CR) jerkins-bot: [V: -1] Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[02:14:09] (PS3) Andrew Bogott: Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558
[02:14:47] (CR) jerkins-bot: [V: -1] Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[02:14:53] (PS4) Andrew Bogott: Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558
[02:14:59] (PS1) Andrew Bogott: run test_output.py through Black [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716747
[02:15:39] (CR) jerkins-bot: [V: -1] run test_output.py through Black [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716747 (owner: Andrew Bogott)
[02:15:42] (CR) jerkins-bot: [V: -1] Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[02:17:18] (PS2) Andrew Bogott: run test_output.py through Black [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716747
[02:17:20] (PS5) Andrew Bogott: Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558
[02:18:48] (CR) jerkins-bot: [V: -1] Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[02:19:08] (CR) Andrew Bogott: [C: +2] run test_output.py through Black [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716747 (owner: Andrew Bogott)
[02:20:05] (Merged) jenkins-bot: run test_output.py through Black [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716747 (owner: Andrew Bogott)
[02:38:31] (PS6) Andrew Bogott: Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558
[02:39:07] (CR) jerkins-bot: [V: -1] Added minimal page load test for '/' route [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716558 (owner: Andrew Bogott)
[02:53:32] Quarry, cloud-services-team (Kanban): Quarry is degraded/partially inaccessible - https://phabricator.wikimedia.org/T290291 (Chlod)
[03:01:03] (PS1) Andrew Bogott: query.py: fix a couple of url_for calls [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716793
[03:02:15] (CR) Andrew Bogott: [C: +2] query.py: fix a couple of url_for calls [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716793 (owner: Andrew Bogott)
[03:02:55] (Merged) jenkins-bot: query.py: fix a couple of url_for calls [analytics/quarry/web] - https://gerrit.wikimedia.org/r/716793 (owner: Andrew Bogott)
[03:05:51] Quarry, cloud-services-team (Kanban): Quarry is degraded/partially inaccessible - https://phabricator.wikimedia.org/T290291 (Chlod) a:Andrew
[03:53:52] Analytics, Data-Engineering, Event-Platform: Discussion of Event Driven Systems - https://phabricator.wikimedia.org/T290203 (Milimetric) >>! In T290203#7329419, @daniel wrote: > I made this doodle of an "event driven mediawiki" architecture a while ago. I had forgotten about this, but listening the "...
[04:18:28] RECOVERY - Check unit status of monitor_refine_event_sanitized_analytics_delayed on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_event_sanitized_analytics_delayed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:31:20] PROBLEM - Check unit status of monitor_refine_event_sanitized_analytics_delayed on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_event_sanitized_analytics_delayed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:33:06] Analytics-Radar, Product-Analytics (Kanban): [REQUEST] Investigate decrease in New Registered Users - https://phabricator.wikimedia.org/T289799 (Tgr) In theory an account gets autocreated for all new users on metawiki and loginwiki (although this is also something that could break in theory, but stewards...
[06:43:32] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 4 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (awight) From what I can see, support for producing Schema:EditConflict wa...
[07:29:56] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 4 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (awight) I'd like to see some discussion about the `$wgPingback` defaults....
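Aside on the wmfdata-python discussion at [01:15:28] above (T275233): a minimal sketch of what running a Hive query through Impyla might look like, assuming a Kerberized HiveServer2 endpoint; the host name and the query are placeholders, not the actual cluster configuration, and this is not necessarily the approach wmfdata-python will adopt.

```python
from impala.dbapi import connect
from impala.util import as_pandas

# Placeholder HiveServer2 coordinates; auth_mechanism="GSSAPI" assumes Kerberos,
# and kerberos_service_name="hive" targets HiveServer2 rather than Impala.
conn = connect(
    host="hive-server.example.wmnet",
    port=10000,
    auth_mechanism="GSSAPI",
    kerberos_service_name="hive",
)
cursor = conn.cursor()

# Going through a DB-API cursor returns result rows directly, instead of
# scraping CLI output where log lines and query results get mixed together.
cursor.execute("SELECT year, month, day, COUNT(*) AS rows FROM example_db.example_table GROUP BY year, month, day LIMIT 10")
df = as_pandas(cursor)
print(df)
```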
[08:21:01] hnowlan: good morning - I have a request to deploy for aqs new hosts (now that you've merged the scap deploy-list change) - I know it's Friday, and I'd rather have your validation and monitoring - Please :)
[08:29:32] Analytics, Analytics-EventLogging, Event-Platform, Wikidata, and 3 others: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy - https://phabricator.wikimedia.org/T290303 (Michael)
[08:31:14] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 4 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (Michael)
[08:31:48] Analytics, Analytics-EventLogging, Event-Platform, Wikidata, and 3 others: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy - https://phabricator.wikimedia.org/T290303 (Michael)
[08:36:34] Analytics, Analytics-EventLogging, Event-Platform, Wikidata, and 3 others: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy - https://phabricator.wikimedia.org/T290303 (Michael)
[08:39:46] Analytics, Analytics-EventLogging, Event-Platform, Wikidata, and 3 others: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy - https://phabricator.wikimedia.org/T290303 (awight)
[08:40:10] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 4 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (awight)
[08:41:06] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 4 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (awight)
[08:49:03] I'm running into this new error when running the legacy event schema migration script: > Error: No resolver found for key client_dt.
[08:51:15] joal: if you need a hand I can follow the deployment as well (in theory it should be fine for the new hosts, no impact to the current cluster)
[08:51:39] Hi elukey - thanks a lot for offering :)
[08:52:20] awight_: I'm sorry I have no idea on how to help on this - I think it'd be better to wait for Andrew (he works today I think)
[08:52:41] elukey: Then with your approval, I'll go and deploy aqs code on new hosts
[08:53:15] joal: +1 from me, I guess that they are in a separate scap env right?
[08:53:59] ah no all in the same https://gerrit.wikimedia.org/r/c/analytics/aqs/deploy/+/715995/1/scap/aqs-prod
[08:54:14] it should be safe as long as we're deploying *only* to them :)
[08:54:20] but we can limit the hosts scap looks at
[08:54:27] nope they're not elukey - I need to rely on -limit
[08:54:51] hnowlan: o/ I'll leave it to you sorryyyy I thought you were afk :)
[08:55:19] scap deploy -l aqs1010.eqiad.wmnet aqs1011.eqiad.wmnet aqs1012.eqiad.wmnet ...
[08:55:22] hnowlan: --^
[08:55:29] ?
[08:55:34] didn't know you could pass a list
[08:56:02] but if so then it seems good, maybe start with only one to see if anything explodes
[08:56:09] actually maybe it's better with: scap deploy -l aqs101[012345].eqiad.wmnet
[08:56:51] elukey: aqs1010 being canary, I'll go with the above, check on aqs1010 when ready (if working), and then stop or proceed for the rest
[08:57:01] hnowlan (as well) --^
[08:59:06] joal: does it work even with -l ?
[08:59:17] hm - good question!
[08:59:24] I thought that the canary thing was only for a "Regular" scap deploy
[08:59:29] let's force a manual single deploy to aqs1010 then :)
[08:59:49] I think -l takes a regex
[08:59:54] this is why I was wondering about a separate scap env (like we have in refinery for hadoop-test etc..)
[09:00:26] yeah maybe a separate env makes sense even for the short term
[09:00:47] at the end we drop the old one and that's it
[09:01:14] do you folks prefer me to wait and do it when the new env is ready?
[09:01:46] joal: that'd probably be best, writing the change now
[09:01:53] hnowlan: <3
[09:02:01] ack hnowlan - thanks a lot for following up :)
[09:02:11] thanks elukey as well <3
[09:05:21] (PS1) Joal: Fix mediarequest top cassandra3 loading jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/717174 (https://phabricator.wikimedia.org/T290068)
[09:07:09] (CR) Joal: [V: +2 C: +2] "Self merging for hotfix" [analytics/refinery] - https://gerrit.wikimedia.org/r/717174 (https://phabricator.wikimedia.org/T290068) (owner: Joal)
[09:07:47] !log Deploying refinery to hotfix mediarequest cassandra3 loading jobs
[09:07:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:12:05] (PS1) Hnowlan: Move new aqs hosts to aqs-next for test deploys [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/717181
[09:12:22] !log Rerun mediawiki-history-denormalize-wf-2021-08 after failure
[09:12:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:14:13] (CR) Joal: [C: +1] "LGTM but I don't know scap config syntax, so someone else should review :)" [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/717181 (owner: Hnowlan)
[09:18:21] (PS2) Jgiannelos: Map tile state change event schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/716219 (https://phabricator.wikimedia.org/T289771)
[09:19:39] (CR) Jgiannelos: Map tile state change event schema (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/716219 (https://phabricator.wikimedia.org/T289771) (owner: Jgiannelos)
[09:20:36] (CR) Jgiannelos: Map tile state change event schema (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/716219 (https://phabricator.wikimedia.org/T289771) (owner: Jgiannelos)
[09:38:17] (CR) Elukey: [C: +1] "LGTM!" [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/717181 (owner: Hnowlan)
[09:44:15] (CR) Hnowlan: [V: +2 C: +2] Move new aqs hosts to aqs-next for test deploys [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/717181 (owner: Hnowlan)
[09:45:54] !log Kill-restart mediarequest-top cassandra loading jobs after deploy
[09:45:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:53:45] hnowlan: I assume you've merged the patch and that means I can now deploy using the new environment?
[09:56:09] joal: it only needs another git pull on deploy1002 and then you are set
[09:56:14] scap deploy -e aqs-next
[09:56:47] ok I'll test that - I'm always in favor of getting a +1 from an SRE before deploying :)
[09:56:50] thanks elukey
[09:57:23] !log Deploy AQS on new AQS servers
[09:57:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:58:14] oh heh I am deploying it right now :)
[09:58:14] Ah! actually hnowlan is already doing it :) thanks a lot hnowlan :)
[09:59:20] ahh deploy is failing possibly because of empty data
[09:59:22] Check 'endpoints' failed: /analytics.wikimedia.org/v1/pageviews/aggregate/{project}/{access}/{agent}/{granularity}/{start}/{end} (Get aggregate page views) is CRITICAL: Test Get aggregate page views returned the unexpected status 404 (expecting: 200); /analytics.wikimedia.org/v1/pageviews/top/{project}/{access}/{year}/{month}/{day} (Get top page views) is CRITICAL: Test Get top page views
[09:59:28] returned the unexpected status 404 (expecting: 200);
[09:59:48] hnowlan: I think we miss the test data in the table - let me fix that
[10:00:37] ahh ok
[10:00:54] hnowlan: done - can you please retry?
[10:01:39] joal: ack, doing it now
[10:02:56] joal: just one failure now (only on non-canary host interestingly): "Check 'endpoints' failed: /analytics.wikimedia.org/v1/mediarequests/per-file/{referer}/{agent}/{file_path}/{granularity}/{start}/{end} (Get per file requests) is CRITICAL: Test Get per file requests returned the unexpected status 404 (expecting: 200)"
[10:03:48] hm - can be related to non-deterministic results from cassandra
[10:04:22] I've been hitting this every now and then during my tests as the cluster is a bit under pressure (meaning, no result given by table while result exists)
[10:04:30] aha
[10:04:45] trying again
[10:05:13] deploy was successful on the same host but failed on another
[10:05:42] right - I'm gonna stop my test, it should make the cluster a lot more stable
[10:06:25] there still is some read pressure AFAICS, but it's not me this time :)
[10:07:28] heh
[10:08:39] Ah no my bad - aqs1010 is fine - the old cluster receives pressure
[10:19:25] hnowlan: shall I try anew?
[10:20:17] joal: hold for a sec, trying to debug what's going on atm
[10:20:20] deploys are still failing
[10:20:24] ack hnowlan
[10:20:27] but usually for a single host
[10:20:34] hnowlan: let me know if there is anything I can help with
[10:20:41] I assume this isn't an old pattern :)
[10:21:01] hnowlan: it has not happened for us this way before, I don't think
[10:25:00] In a more confusing development, none of the new hosts are logging to logstash
[10:27:15] hnowlan: I found AQS logs in logstash!
[10:29:18] joal: Oh? For the new hosts?
[10:29:34] yessir!
[10:30:05] hnowlan: with filter service.type = aqs
[10:30:23] I see the warn events of the restart after deploy
[10:34:59] hnowlan: it seems that the test data I inserted has not been written - when querying manually for it I get no result
[10:35:16] That's weird, cause I definitely imported the whole lot
[10:35:57] hnowlan: do you wish I insert it anew?
[10:36:17] (PS3) Jgiannelos: Map tile state change event schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/716219 (https://phabricator.wikimedia.org/T289771)
[10:37:26] joal: please do
[10:37:43] joal: I have manually deployed the latest aqs on all hosts but ^that sounds very worrying
[10:38:49] done hnowlan - data is readable now
[10:39:08] It's as if my insertion had worked for most but not all rows :(
[10:41:55] hnowlan: I confirm your deploy has worked on aqs1010 - my problem there is fixed
[10:43:40] joal: cool - all hosts should be running the same version now
[10:44:07] thanks a lot for making this happen - do you wish I try a deploy to see if my data insertion has fixed the deploy issue?
[10:44:20] what consistency are you writing the example data with?
[10:44:30] default, meaning local I assume
[10:44:39] joal: couldn't hurt to try again
[10:44:47] hnowlan: trying now
[10:46:13] wow - interesting - when there is an -e specified in scap but the env doesn't exist, scap defaults to the default one - I think it should error
[10:46:24] eeek
[10:46:25] yeah
[10:47:18] hnowlan: deploy successful - man - what a mess - sorry for that :S
[10:47:57] Well that's at least some comfort :)
[10:48:21] but the fact the data write might not have worked the first time is real worrying
[10:50:02] agreed hnowlan - particularly without any type of error
[10:50:31] hnowlan: maybe asking for consistency quorum would be safer when writing
[10:50:39] It'd cost more, but it'd be safer
[10:51:55] I'm worried if the lack of error means some larger hidden issue with the new cluster - it seems a bit overly paranoid but those writes *should* have propagated quite quickly
[10:52:02] with local consistency
[10:52:16] yes - agreed
[10:52:32] not like that cluster is busy
[10:53:09] hnowlan: it was at the time I wrote the data - I was also doing some querying at the same time (nothing huge though)
[10:53:28] even so, it should be well able to handle both...
[10:53:36] I'll see if eric has any insight on potential risks/verifications
[10:54:07] ack
[10:54:15] thank you for the help hnowlan :)
[10:54:17] is there an easy way to sample stuff from the hourly jobs so that we could maybe check every instance soon after import for the presence of the data?
[10:54:23] no worries, hope we can get this right! :)
[10:54:37] hnowlan: I have some script doing exactly that
[10:55:53] ah nice!
[10:58:53] has that script seen any inconsistency so far?
[11:01:37] hnowlan: I test 10M rows on pageview_per_articles and mediarequest_per_file --> ~10 inconsistencies from old cluster (no res in old) with pageviews, none in mediarequest
[11:02:11] And I triple checked manually on old cluster the inconsistent rows --> manual query was returning data
[11:02:21] I think we're ok
[11:03:16] Now I'm experiencing an interesting one - I have reloaded some error-data, and cassandra still sees the old one
[11:07:48] I wonder how long a write will take when consistency is set to ALL
[11:12:50] I don't know
[11:29:32] actually my last concern was not true - there was yet another issue at load time, cassandra was doing its job correctly - trying to fix
[11:39:02] I finally nailed it :)
[11:42:22] (PS1) Joal: Fix mediarequest top cassandra3 loading jobs fix [analytics/refinery] - https://gerrit.wikimedia.org/r/717291 (https://phabricator.wikimedia.org/T290068)
[11:42:39] (CR) Joal: [V: +2 C: +2] "Merging for hotfix" [analytics/refinery] - https://gerrit.wikimedia.org/r/717291 (https://phabricator.wikimedia.org/T290068) (owner: Joal)
[11:43:18] !log Deploying refinery to hotfix mediarequest cassandra3 loading jobs (second)
[11:43:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:14:18] !log Kill-restart mediarequest-top cassandra loading jobs after deploy (bis)
[13:19:46] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 4 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (Ottomata)
[13:20:29] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 4 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (Ottomata)
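Aside on the write-consistency question discussed above (around [10:44:20]–[10:52:02]): with the DataStax Python driver, the consistency level can be set per statement, so test rows could be written at QUORUM and read back before the endpoint checks run, instead of relying on the default. A minimal sketch; the contact points are the new AQS hosts mentioned above, but the keyspace, table, and columns are invented for illustration and do not match the real AQS schema or auth settings.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Keyspace and table are placeholders, not the real AQS schema.
cluster = Cluster(["aqs1010.eqiad.wmnet", "aqs1011.eqiad.wmnet"])
session = cluster.connect("aqs_test")

# Write the test row at QUORUM so a majority of replicas must acknowledge it,
# rather than relying on the driver default (LOCAL_ONE unless overridden).
# ConsistencyLevel.ALL is stricter still, at the cost of latency and availability.
insert = SimpleStatement(
    "INSERT INTO pageviews_test (project, article, dt, views) VALUES (%s, %s, %s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(insert, ("en.wikipedia", "Test_article", "2021090100", 1))

# Read the row back, also at QUORUM, to confirm the write actually propagated.
select = SimpleStatement(
    "SELECT views FROM pageviews_test WHERE project = %s AND article = %s AND dt = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(select, ("en.wikipedia", "Test_article", "2021090100")).one()
print("test row present:", row is not None)
```

The same read-back pattern, pointed at both the old and new clusters, is roughly what the sampling script described above ("I test 10M rows on pageview_per_articles and mediarequest_per_file") would do at larger scale.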
[13:21:45] Quarry: Pressing the Stop button in Quarry results in a 500 error - https://phabricator.wikimedia.org/T290146 (mdipietro) This may be related to fawiki_p, seems to leave jobs queue or running even when they are short. If the job is queued the stop function will likely fail, as it won't find a job to stop. We...
[13:37:08] (CR) Ottomata: "Great, a couple of nits, but LGTM otherwise!" [schemas/event/primary] - https://gerrit.wikimedia.org/r/716219 (https://phabricator.wikimedia.org/T289771) (owner: Jgiannelos)
[13:43:04] Quarry, cloud-services-team (Kanban): Quarry is degraded/partially inaccessible - https://phabricator.wikimedia.org/T290291 (Andrew) This is probably fixed by https://gerrit.wikimedia.org/r/c/analytics/quarry/web/+/716793 but it might be only partial -- can you retest?
[13:55:36] (CR) Mholloway: Map tile state change event schema (2 comments) [schemas/event/primary] - https://gerrit.wikimedia.org/r/716219 (https://phabricator.wikimedia.org/T289771) (owner: Jgiannelos)
[13:58:36] Quarry: quarry giving deprecated notices - https://phabricator.wikimedia.org/T289871 (mdipietro) Open→Resolved
[13:59:22] Quarry: quarry giving deprecated notices - https://phabricator.wikimedia.org/T289871 (mdipietro) The notice appears to have vanished with a separate commit. Closing. Reopen if notice reappears.
[13:59:34] Quarry: quarry giving deprecated notices - https://phabricator.wikimedia.org/T289871 (mdipietro) a:mdipietro
[14:04:17] (CR) Mholloway: Map tile state change event schema (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/716219 (https://phabricator.wikimedia.org/T289771) (owner: Jgiannelos)
[14:08:26] (CR) Mholloway: Map tile state change event schema (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/716219 (https://phabricator.wikimedia.org/T289771) (owner: Jgiannelos)
[14:10:16] Quarry, cloud-services-team (Kanban): Quarry is degraded/partially inaccessible - https://phabricator.wikimedia.org/T290291 (Chlod) I tried executing four queries to test the changes ([[ https://quarry.wmcloud.org/query/56472 | 56472 ]], [[ https://quarry.wmcloud.org/query/48083 | 48083 ]], [[ https://qu...
[14:54:32] Quarry, cloud-services-team (Kanban): Quarry is degraded/partially inaccessible - https://phabricator.wikimedia.org/T290291 (Andrew) I'm sorry this is misbehaving. I just tried re-running one of your queries and it worked: https://quarry.wmcloud.org/query/58317 This has me still thinking that this is...
[15:02:44] Quarry: celery version six preparation - https://phabricator.wikimedia.org/T290328 (mdipietro)
[15:22:05] Quarry, cloud-services-team (Kanban): Quarry is degraded/partially inaccessible - https://phabricator.wikimedia.org/T290291 (Andrew) Current theory is that this is related to database timeouts, which we adjust shortly
[15:22:41] (PS1) Michael DiPietro: update config to match for celery 6 [analytics/quarry/web] - https://gerrit.wikimedia.org/r/717443 (https://phabricator.wikimedia.org/T290328)
[16:30:10] Hi ottomata: I have not rerun the failed refine job :)
[17:15:11] joal: huh the data looks present and fine
[17:15:24] weird ottomata
[17:15:37] i reran but it said no data needed refinement
[17:15:49] mforns: see email i just sent about monitor refine sanitize
[17:16:03] the reason for the alerts is that delayed is backfilling https://gerrit.wikimedia.org/r/c/analytics/refinery/+/713570
[17:35:17] Quarry: Pressing the Stop button in Quarry results in a 500 error - https://phabricator.wikimedia.org/T290146 (Huji) I did not understand about half of what you said! You are clearly the expert, so I defer to you on how to handle this.
[17:43:05] Quarry, cloud-services-team (Kanban): Quarry is degraded/partially inaccessible - https://phabricator.wikimedia.org/T290291 (Andrew) Hello again @Chlod . We've adjusted some timeouts which were probably the cause of the queued/running-forever issue. Since those are orphaned queries now they will probabl...
[17:56:52] ottomata: thanks for launching the backfilling!
[17:57:13] ottomata: still it doesn't make sense to me that it was failing, though...
[17:57:59] ottomata and joal: it was me who re-ran the refine job, sorry for not remembering to respond to the email
[17:58:36] looking into the mediawiki-history-denormalized error
[18:45:03] oof, IRC had signed me out and I didn't notice
[18:45:17] did anyone look at mw history? It failed again after jo reran it
[18:45:50] (CR) Mforns: "@Neil, thanks for your privacy analysis! :]" [analytics/refinery] - https://gerrit.wikimedia.org/r/716339 (https://phabricator.wikimedia.org/T281511) (owner: MNeisler)
[18:49:27] mforns: oh! you just reran it (mw history). Jo had rerun it this morning, did you find out what was wrong?
[18:49:54] oh! milimetric... no, I thought it was the first time...
[18:50:02] sorry
[18:50:12] I was just searching and it said "JA018" means "output directory exists" so I was thinking maybe he forgot to delete the failed output and it failed when rerunning
[18:50:17] np at all, glad I caught you
[18:50:27] ok, so do you have the logs from this last failure?
[18:50:53] not yet
[18:51:39] bc real quick?
[18:52:49] omw milimetric
[19:00:49] Quarry: Pressing the Stop button in Quarry results in a 500 error - https://phabricator.wikimedia.org/T290146 (MarioGom) I'm experiencing the same issue with enwiki_p. I have one job stuck in "running", another stuck in "queued", and stop button gives error 500 for both of them.
[19:22:12] Quarry, cloud-services-team (Kanban): Quarry is degraded/partially inaccessible - https://phabricator.wikimedia.org/T290291 (Bstorm)
[19:39:22] Analytics, Event-Platform, Metrics-Platform, Patch-For-Review: wgEventStreams (EventStreamConfig) should support per wiki overrides - https://phabricator.wikimedia.org/T277193 (Mholloway) Here's a pleasant PHP surprise: associative arrays in PHP //[[ https://www.php.net/manual/en/language.types.a...
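Aside on the EventStreamConfig ordering point just above (T277193): PHP associative arrays preserve insertion order, which is what makes "later, more specific entries override earlier ones" workable for per-wiki overrides. The same idea illustrated in Python, whose dicts are also insertion-ordered; the stream names, the `@wiki` key convention, and the settings shown are invented for illustration and are not the real wgEventStreams shape.

```python
# Hypothetical stream config: a default entry first, then a per-wiki override.
stream_config = {
    "mediawiki.edit_attempt": {"sample_rate": 0.01},        # default for all wikis
    "mediawiki.edit_attempt@cswiki": {"sample_rate": 1.0},  # override for cswiki only
}

def settings_for(stream: str, wiki: str) -> dict:
    """Merge matching entries in declaration order, so later entries win."""
    merged: dict = {}
    for key, settings in stream_config.items():  # iterates in insertion order
        name, _, wiki_suffix = key.partition("@")
        if name == stream and wiki_suffix in ("", wiki):
            merged.update(settings)
    return merged

print(settings_for("mediawiki.edit_attempt", "cswiki"))  # {'sample_rate': 1.0}
print(settings_for("mediawiki.edit_attempt", "enwiki"))  # {'sample_rate': 0.01}
```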
[20:33:32] Quarry: Close quarry db autocompletion on tab - https://phabricator.wikimedia.org/T289872 (mdipietro) a:mdipietro
[20:37:30] Analytics, Event-Platform, Metrics-Platform, Patch-For-Review: wgEventStreams (EventStreamConfig) should support per wiki overrides - https://phabricator.wikimedia.org/T277193 (Mholloway) To expand on my dislike of the reliance on ordering: Before I realized that PHP associative arrays were order...
[20:39:55] (PS1) Michael DiPietro: close quarry db dropdown on tab [analytics/quarry/web] - https://gerrit.wikimedia.org/r/717592 (https://phabricator.wikimedia.org/T289872)
[20:42:21] Analytics, Event-Platform, Metrics-Platform, Patch-For-Review: wgEventStreams (EventStreamConfig) should support per wiki overrides - https://phabricator.wikimedia.org/T277193 (Ottomata) Nice!
[20:46:57] Quarry: Pressing the Stop button in Quarry results in a 500 error - https://phabricator.wikimedia.org/T290146 (mdipietro) I suspect this is a combination of a new problem and an old problem. The new problem is that the stop function doesn't consider a job in the "queued" status, it needs different logic than...
[20:54:06] (CR) Andrew Bogott: [C: +1] close quarry db dropdown on tab [analytics/quarry/web] - https://gerrit.wikimedia.org/r/717592 (https://phabricator.wikimedia.org/T289872) (owner: Michael DiPietro)
[20:59:31] (CR) Bstorm: [C: +1] "Seems to work great locally." [analytics/quarry/web] - https://gerrit.wikimedia.org/r/717443 (https://phabricator.wikimedia.org/T290328) (owner: Michael DiPietro)
[21:09:36] Analytics, Data-Engineering, FR-Tech-Analytics, Privacy Engineering: event.WikipediaPortal referer modification - https://phabricator.wikimedia.org/T279952 (sguebo_WMF) >>! In T279952#7329494, @mforns wrote: > @sguebo_WMF & @EYener, we discussed this task and will go ahead and implement this feat...
[21:40:16] (PS1) Clare Ming: POC: add new stream for VectorPrefDiffInstrumentation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/717622 (https://phabricator.wikimedia.org/T289622)
[21:41:22] (CR) jerkins-bot: [V: -1] POC: add new stream for VectorPrefDiffInstrumentation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/717622 (https://phabricator.wikimedia.org/T289622) (owner: Clare Ming)
[22:08:10] (PS2) Clare Ming: POC: add new stream for VectorPrefDiffInstrumentation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/717622 (https://phabricator.wikimedia.org/T289622)
[22:08:19] (CR) jerkins-bot: [V: -1] POC: add new stream for VectorPrefDiffInstrumentation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/717622 (https://phabricator.wikimedia.org/T289622) (owner: Clare Ming)
[22:13:22] (PS3) Clare Ming: POC: add new stream for VectorPrefDiffInstrumentation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/717622 (https://phabricator.wikimedia.org/T289622)
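Aside on the stop-button discussion above (T290146, [20:46:57]): one way to handle both states is to branch on the recorded status: a job that is still queued can be revoked before it ever runs, while a running job needs `revoke(terminate=True)` so the worker process is actually signalled. A rough sketch using Celery's control API with invented names (the broker URL, status constants, and `query_run` object are hypothetical); this is not the actual Quarry code.

```python
from celery import Celery

app = Celery("quarry", broker="redis://localhost:6379/0")  # placeholder broker URL

# Hypothetical status values; Quarry's real model and constants differ.
STATUS_QUEUED, STATUS_RUNNING = "queued", "running"

def stop_query_run(query_run) -> str:
    """Stop a run whether it is still waiting in the queue or already executing."""
    if query_run.status == STATUS_QUEUED:
        # Revoking a queued task tells workers to discard it when they pick it up;
        # there is no live process to terminate yet.
        app.control.revoke(query_run.task_id)
        return "revoked before start"
    if query_run.status == STATUS_RUNNING:
        # terminate=True additionally signals the worker process currently
        # executing the task.
        app.control.revoke(query_run.task_id, terminate=True)
        return "terminated"
    # Anything else (complete, failed, already stopped) should not 500 — just no-op.
    return "nothing to stop"
```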