[00:02:39] Data-Engineering, Data Products: project-title-country missing US data in recent data, and double quote escaping - https://phabricator.wikimedia.org/T341139#9724846 (nshahquinn-wmf) Open→Resolved
[07:38:34] Data-Engineering, Data Products, Structured-Data-Backlog: DagProperties don't automatically update Airflow variables - https://phabricator.wikimedia.org/T348963#9725222 (mforns) I think the solution I propose above does not work, I think this assumption is wrong: > If there's a difference between a c...
[07:54:43] Data-Engineering, Data Products, Structured-Data-Backlog: DagProperties don't automatically update Airflow variables - https://phabricator.wikimedia.org/T348963#9725258 (mforns) I think the checksum might need to contain the state of both the default values coming from the code AND the override value...
[08:03:05] (KafkaReplicationFactorTooLow) firing: ...
[08:03:05] Kafka topic codfw.mediawiki.ip_reputation.score replication factor is too low on jumbo-eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Increase_a_topic's_replication_factor - https://grafana.wikimedia.org/d/000000234/kafka-by-topic?var-kafka_cluster=jumbo-eqiad&var-kafka_broker=All&var-topic=codfw.mediawiki.ip_reputation.score&viewPanel=40 - https://alerts.wikimedia.org/?q=alertname%3DKafkaReplicationFactorTooLow
[08:07:03] (PS1) Mforns: Commons Impact Metrics queries - Correct order of insert [analytics/refinery] - https://gerrit.wikimedia.org/r/1021365 (https://phabricator.wikimedia.org/T358699)
[08:08:05] (KafkaReplicationFactorTooLow) resolved: ...
[08:08:05] Kafka topic codfw.mediawiki.ip_reputation.score replication factor is too low on jumbo-eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Increase_a_topic's_replication_factor - https://grafana.wikimedia.org/d/000000234/kafka-by-topic?var-kafka_cluster=jumbo-eqiad&var-kafka_broker=All&var-topic=codfw.mediawiki.ip_reputation.score&viewPanel=40 - https://alerts.wikimedia.org/?q=alertname%3DKafkaReplicationFactorTooLow
[08:08:07] (CR) Mforns: [V:+2 C:+2] "I forgot to push this changes after testing." [analytics/refinery] - https://gerrit.wikimedia.org/r/1021365 (https://phabricator.wikimedia.org/T358699) (owner: Mforns)
[08:10:13] !log starting refinery deployment for commons impact metrics changes (0.2.36)
[08:10:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:30:28] !log finished refinery deployment for commons impact metrics changes (0.2.36)
[09:30:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:05:44] (CR) Phuedx: [C:+2] "Thanks!" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1020222 (owner: Kai Nissen (WMDE))
[10:06:30] (Merged) jenkins-bot: Fix typo in property attribute [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1020222 (owner: Kai Nissen (WMDE))
[11:04:46] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9725885 (gmodena) >>! In T351117#9688466, @gmodena wrote: > Next steps: now that we are starting to collect more logs, we c...
[11:23:14] Data-Engineering: Define a strategy to deal with xml-dumps huge files - https://phabricator.wikimedia.org/T362870 (JAllemandou) NEW
[11:30:30] Data-Engineering, MediaWiki-Engineering, serviceops, WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#9725953 (Ottomata) > search index not getting updated in 0.001% of edits Search is probabl...
[11:30:56] Data-Engineering: Define a strategy to deal with xml-dumps huge files on the datalake - https://phabricator.wikimedia.org/T362870#9725975 (JAllemandou)
[11:35:41] btullis: Hello! would be ok to review/apply my patch for yarn queues?
[11:37:27] joal: Yes, on it now.
[11:37:37] btullis: thank you so much <3
[11:41:33] !log adding new 'launchers' yarn queue and renaming 'fifo' to 'gpus' for T361499
[11:41:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:41:36] T361499: [Maintenance] Resolve long launch times for canary events on Airflow (30mins in total) - https://phabricator.wikimedia.org/T361499
[12:07:30] joal: that's deployed now, but I'll restart the resourcemanagers after lunch. Feel free to do it yourself, if you prefer.
[12:23:13] Data-Engineering, MediaWiki-Engineering, serviceops, WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#9726100 (Ladsgroup) >>! In T249745#9725953, @Ottomata wrote: >> search index not getting up...
[12:29:48] Data-Engineering, Data-Platform-SRE: Package request: install elixir and erlang-otp to the analytics clients - https://phabricator.wikimedia.org/T362678#9726116 (awight) Open→Resolved a:awight @BTullis Thanks for highlighting this possibility! I tried the Conda environment as you suggest...
[12:34:26] Analytics, Data-Engineering, DBA, Event-Platform: Eventually-Consistent MediaWiki state change events | MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242#9726131 (Ottomata) There is a lil discussion about this topic in {T249745}. Moving that discussion to here. @la...
[12:34:47] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9726133 (Fabfur) I agree with @gmodena on all topics, more specifically: * About the `sequence` issue, that's the most pla...
[12:36:13] Data-Engineering, MediaWiki-Engineering, serviceops, WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#9726141 (Ottomata) Replied at T120242#9726131
[12:39:43] (CR) Ottomata: [C:+1] "Retroactive +1, thanks!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1003745 (https://phabricator.wikimedia.org/T356363) (owner: Joal)
[12:39:45] !log refreshing yarn queues after config change (joal@an-launcher1002:~$ sudo -u yarn yarn rmadmin -refreshQueues)
[12:40:58] Arf, I can't do that
[12:41:48] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9726146 (Ottomata) > We could append (or prepend) other information pieces to the sequence number (like the haproxy process...
[12:42:06] Since a queue is deleted, we need a restart of the resource manager - I'll wait for you btullis, I don't have the right to do that :)
[12:55:57] Data-Engineering: Define a strategy to deal with xml-dumps huge files on the datalake - https://phabricator.wikimedia.org/T362870#9726232 (JAllemandou) Some of the big files listed above are due to the dumps job not splitting files (for `cawiki` and `cswiki` for instance). For the rest, big files come from b...
[13:12:25] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9726287 (gmodena) > About the sequence issue, that's the most plausible hypotheses. We could append (or prepend) other info...
[13:46:33] Data-Engineering (Q4 2024 April 1st - June 30th), Patch-For-Review: Add host level instrumentation on webrequest - https://phabricator.wikimedia.org/T362785#9726510 (gmodena)
[13:46:54] Data-Engineering (Q4 2024 April 1st - June 30th), Patch-For-Review: Add instrumentation for actor signatures - https://phabricator.wikimedia.org/T362783#9726512 (gmodena)
[13:47:35] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9726516 (Fabfur) >>! In T351117#9726287, @gmodena wrote: >> About the sequence issue, that's the most plausible hypotheses....
[13:59:16] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9726548 (JAllemandou) I think @Ottomata 's idea is good: having another column makes it easy to keep the "monotonic" values...
[14:02:22] btullis: Ping in case you have not seen my previous message :)
[14:07:28] !log restarted the hadoop-yarn-resourcemanager.service on an-master100[3-4] to pick up new queue settings for T361499
[14:07:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:07:31] T361499: [Maintenance] Resolve long launch times for canary events on Airflow (30mins in total) - https://phabricator.wikimedia.org/T361499
[14:08:14] joal: That's restarted now. :-)
[14:13:10] Quarry: [bug] Internal Server Error when trying to Stop Query - https://phabricator.wikimedia.org/T362891 (Ahecht) NEW
[14:45:55] Quarry: [bug] Internal Server Error when trying to Stop Query - https://phabricator.wikimedia.org/T362891#9726771 (SD0001) →Duplicate dup:T362213
[14:46:16] Quarry: Error 500 when clicking "stop query" - https://phabricator.wikimedia.org/T362213#9726773 (SD0001)
[15:15:03] yarn.wikimedia.org gives me a broken redirect to http://an-master1004.eqiad.wmnet:8088/, is that expected/known?
[15:15:47] Ah, sorry. It's failed over to the standby. I'll fail it back to the primary now.
[15:16:18] ah, that explains
[15:16:27] aqs cassandra instances in codfw are using PKI certs now! Will wait until next week before proceeding with eqiad, if you see anything weird lemme know
[15:16:48] https://www.irccloud.com/pastebin/mUcReMMk/
[15:16:57] moritzm: That should work again now.
[15:19:00] confirmed, thanks
[15:37:19] Data-Engineering, MediaWiki-Engineering, serviceops, WMF-JobQueue, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745#9726999 (akosiaris) >>! In T249745#9723704, @Ottomata wrote: >>> For replicating state chan...
[15:37:20] Analytics, Data-Engineering, DBA, Event-Platform: Eventually-Consistent MediaWiki state change events | MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242#9726998 (akosiaris) Commenting here as well at the request of @ottomata in T249745#9725953 In what apparently is...
[16:03:27] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9727085 (xcollazo) >>! In T351117#9726548, @JAllemandou wrote: > I think @Ottomata 's idea is good: having another column m...
[16:04:48] Analytics-Radar, Data-Engineering-Icebox, observability, Observability-Logging, and 2 others: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856#9727092 (bd808)
[16:36:13] moritzm: Just for reference, this is the ticket about that behaviour. T331448
[16:36:20] T331448: Make YARN web interface work with both primary and standby resourcemanager - https://phabricator.wikimedia.org/T331448
[16:57:52] !log switching matomo service from matomo1002 to matomo1003
[16:57:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:08:59] cheers
[17:37:26] !log Deploy airflow for canary-event scaling
[17:37:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:42:43] !log Rerun canary-events on previous hour to test patch
[17:42:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:57:33] Data-Engineering (Q4 2024 April 1st - June 30th), Patch-For-Review: [Maintenance] Resolve long launch times for canary events on Airflow (30mins in total) - https://phabricator.wikimedia.org/T361499#9727650 (CodeReviewBot) joal opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/mer...
[18:08:26] Data-Engineering (Q4 2024 April 1st - June 30th), Patch-For-Review: [Maintenance] Resolve long launch times for canary events on Airflow (30mins in total) - https://phabricator.wikimedia.org/T361499#9727711 (CodeReviewBot) amastilovic merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dag...
[18:10:03] !log Re-deploy airflow for canary-event scaling
[18:10:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:10:10] !log Rerun canary-events on previous hour to test patch
[18:10:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:21:26] Analytics, Data-Engineering-Icebox, Metrics Platform Backlog, Product-Analytics: Schema repository structure, naming - https://phabricator.wikimedia.org/T269936#9727754 (Ottomata) Open→Resolved Being bold.
[18:24:44] Data-Engineering (Q4 2024 April 1st - June 30th), Patch-For-Review: [Maintenance] Resolve long launch times for canary events on Airflow (30mins in total) - https://phabricator.wikimedia.org/T361499#9727783 (JAllemandou) Global execution times have been divided by 3 (10mins for 170 jobs). We are using a...
[18:27:46] Quarry, ChangeProp, collaboration-services, Infrastructure-Foundations, and 10 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9727771 (brennen)
[21:12:44] (PS5) Mforns: Add queries to format commons impact metrics data as dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/1019845 (https://phabricator.wikimedia.org/T358701)
[21:14:28] (CR) Mforns: "I've fixed some typos and tested all queries." [analytics/refinery] - https://gerrit.wikimedia.org/r/1019845 (https://phabricator.wikimedia.org/T358701) (owner: Mforns)
[21:14:43] (PS6) Mforns: Add queries to format commons impact metrics data as dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/1019845 (https://phabricator.wikimedia.org/T358701)
[23:56:03] Data-Engineering, Movement-Insights, Temporary accounts: Clarify analytics and metrics definitions around anonymous and temporary editors - https://phabricator.wikimedia.org/T332205#9728461 (nshahquinn-wmf) Open→Resolved This was actually done a while ago.