[10:11:02] <wikibugs>	 Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) All compactions from the 7th snapshot loading operation have completed successfully. S...
[10:19:56] <wikibugs>	 Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, and 3 others: Migrate analytics cluster alerts from Icinga to AlertManager - https://phabricator.wikimedia.org/T293399 (BTullis)
[11:45:32] <wikibugs>	 Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, and 3 others: Migrate analytics cluster alerts from Icinga to AlertManager - https://phabricator.wikimedia.org/T293399 (BTullis)
[14:30:07] <ottomata>	 btullis: o/
[14:30:12] <ottomata>	 if we had 2 mariadb masters on one host, can you think of a need to do a master failover for only one of the instances?
[14:30:12] <ottomata>	 i had assumed this would be nice to be able to do...but now i can't think of a case where that would actually be useful
[14:32:04] <btullis>	 What about if you had changed a setting in `/etc/mysql/my.cnf` for only one of the two instances? You could fail only this instance over to the replica, restart the service, then fail it back?
[14:32:35] <ottomata>	 each instance has its own  conf file
[14:32:38] <ottomata>	 but hm
[14:33:33] <ottomata>	 i think the failover isn't so smooth though, you'd have to do a restart of mariadb to do the failover
[14:33:39] <ottomata>	 because you need to switch on/off read_only mode
[14:33:44] <btullis>	 Yes, I just spotted that, I should have just said `my.cnf`
[14:34:03] <ottomata>	 and if we have to do that anyway, we might as well just restart the master
[14:34:22] <btullis>	 Yeah, understood. I'm not really familiar with mariaDB failovers here yet. 
[14:34:23] <ottomata>	 oh
[14:34:24] <ottomata>	 SET GLOBAL read_only = 1;
[14:34:25] <ottomata>	 no i guess not
[14:34:34] <ottomata>	 we could do it manually without a restart
[14:38:45] <btullis>	 I've typically done live failovers of master->replica and back with Percona Resource Manager in the past: https://github.com/Percona-Lab/pacemaker-replication-agents/blob/master/doc/PRM-setup-guide.rst
[14:40:17] <btullis>	 ...but this just automates exactly those steps. SET GLOBAL read_only = 1 on master, wait for replication lag=0, CHANGE MASTER to ..., CHANGE SLAVE TO ... , move VIP address to secondary host, SET GLOBAL read_only = 0 on replica
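The manual sequence listed above can be sketched as an ordered plan. This is only an illustration under stated assumptions, not the procedure used on these hosts: the hostnames, the `failover_plan` helper, the "proxy" step, and the use of MariaDB GTID replication (`MASTER_USE_GTID=slave_pos`) are all hypothetical choices made for the example, and replication credentials are assumed to be configured already.

```python
from typing import List, Tuple

# (where the step runs, SQL statement or manual action)
Step = Tuple[str, str]


def failover_plan(old_master: str, new_master: str) -> List[Step]:
    """Return the ordered steps for a planned master -> replica switchover.

    Hypothetical helper: it only builds the plan, it does not connect
    to any database or execute anything.
    """
    return [
        # 1. Stop accepting writes on the current master.
        (old_master, "SET GLOBAL read_only = 1;"),
        # 2. Wait until the replica has caught up (replication lag = 0).
        (new_master, "SHOW SLAVE STATUS;  -- repeat until Seconds_Behind_Master = 0"),
        # 3. Detach the replica so that it can take over as master.
        (new_master, "STOP SLAVE;"),
        (new_master, "RESET SLAVE ALL;"),
        # 4. Make the old master replicate from the new one.
        #    Assumes GTID replication; credentials are omitted here.
        (old_master,
         f"CHANGE MASTER TO MASTER_HOST='{new_master}', MASTER_USE_GTID=slave_pos;"),
        (old_master, "START SLAVE;"),
        # 5. Redirect clients (VIP or proxy). Not a SQL step.
        ("proxy", f"route writes to {new_master}"),
        # 6. Open the new master for writes.
        (new_master, "SET GLOBAL read_only = 0;"),
    ]


if __name__ == "__main__":
    # Placeholder hostnames, not real servers.
    for host, action in failover_plan("db-a.example", "db-b.example"):
        print(f"{host}: {action}")
```

Keeping the plan as plain data makes the ordering easy to review before anything is run: writes are blocked first and only re-enabled as the very last step.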
[14:42:24] <btullis>	 I'm aware of orchestrator: https://wikitech.wikimedia.org/wiki/Orchestrator and https://orchestrator.wikimedia.org/web/clusters - Might that be useful in this setup?
[14:46:46] <wikibugs>	 Analytics-Radar, Product-Analytics: Do the messages left for unregistered or logged-out IP editors get read by those editors? - https://phabricator.wikimedia.org/T291297 (Dbrant)
[14:46:48] <wikibugs>	 Analytics-Radar, Product-Analytics (Kanban), Wikipedia-Android-App-Backlog (Android Release FY2021-22): What percentage of app editors are IP editors? - https://phabricator.wikimedia.org/T291866 (Dbrant) Open→Resolved
[14:50:03] <btullis>	 Here is a description of how orchestrator does managed failover:
[14:50:03] <btullis>	 https://github.com/openark/orchestrator/blob/master/docs/topology-recovery.md#graceful-master-promotion
[14:50:38] <ottomata>	 btullis:  it might be
[14:50:43] <ottomata>	 but i was trying to do multi instance
[14:50:45] <ottomata>	 masters
[14:50:49] <ottomata>	 so 2 master instances on the same host
[14:51:02] <ottomata>	 and, while possible, data-persistence is very resistant to it, because they don't do it
[14:51:06] <btullis>	 and I think that in our case we don't use virtual IPs, but we do use HAProxy to determine which out of a given set of servers is the writeable master: https://wikitech.wikimedia.org/wiki/HAProxy
[14:51:20] <ottomata>	 i almost gave up last week, then got it mostly working friday, but after a discussion today am going to give up again
[14:51:36] <ottomata>	 the reasons why we might failover a single instance instead of all instances on a host seem very rare
[14:51:54] <ottomata>	 and, while i think the puppet would be much better and cleaner if it were written to be host-agnostic
[14:52:12] <ottomata>	 fighting with it (and data-persistence) is seeming maybe not worth it
[14:53:04] <btullis>	 OK, I see. I missed that bit. I knew that you were working on multi-instance, but didn't pick up that you'd restarted working on a similar kind of method.
[14:54:01] <ottomata>	 convo mostly happening in #wikimedia-data-persistence
[14:54:37] <btullis>	 What are our two master instances that we have? I thought we were just moving the single master->replica DB from an-coord100[12] 
[14:55:18] * btullis reading scrollback
[14:55:20] <ottomata>	 https://gerrit.wikimedia.org/r/c/operations/puppet/+/735688
[14:55:34] <ottomata>	 https://phabricator.wikimedia.org/T284150
[14:55:45] <ottomata>	 oh you aren't subscribed to that one
[14:55:46] <ottomata>	 adding you
[14:57:06] <btullis>	 Ah, it was this statement that I had missed previously:
[14:57:06] <btullis>	 > Each database, or at least database class (e.g. maybe all airflow databases on the same instance?), get their own MariaDB instance.
[14:58:16] <btullis>	 Then this:
[14:58:16] <btullis>	 > Instead, I'm leaning towards 2 instances, one for important data-metadata like hive and druid, and one for more user-facing stuff, like superset and airflow.
[15:04:47] <ottomata>	 a-team standup
[15:05:07] <milimetric>	 (reminder I'm in that business school event thing this morning, I'll be missing all meetings)
[15:06:38] <wikibugs>	 Analytics, Data-Engineering, Product-Analytics, Structured-Data-Backlog, and 2 others: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834 (Gehel) p:Triage→High
[15:07:54] <wikibugs>	 (PS2) MNeisler: Add the SearchSatisfaction legacy schema to the allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/715055 (https://phabricator.wikimedia.org/T274607)
[15:11:12] <wikibugs>	 Analytics, Data-Engineering, Data-Engineering-Kanban, Wikidata, and 3 others: Events missing from event.rdf_streaming_updater_fetch_failure but present in /wmf/data/raw/event/eqiad.rdf-streaming-updater.fetch-failure - https://phabricator.wikimedia.org/T294361 (Gehel)
[15:14:16] <addshore>	 joal: ottomata thanks for the support friday etc too, this is the tool you helped me breathe some life into to be presented at wikidata con https://wmde.github.io/wikidata-map/dist/index.html
[15:15:29] <addshore>	 You can get an introduction to the data I was throwing around at https://github.com/wmde/wikidata-map/blob/master/docs/data/DATA.md 
[15:20:43] <ottomata>	 addshore:  cool!  
[15:21:32] <addshore>	 It could be cool to try and "productionize" the extraction of coordinates from the wikidata dumps each time they are loaded, and also backfill since 2013 :D but that's for another day
[15:24:04] <wikibugs>	 (PS1) Clare Ming: Update web_ab_test_enrollment group property [schemas/event/secondary] - https://gerrit.wikimedia.org/r/735993 (https://phabricator.wikimedia.org/T292587)
[15:26:19] <wikibugs>	 Analytics, Data-Engineering, Event-Platform, Internet-Archive, The-Wikipedia-Library: Store page-links-change data in a database table and make available through a Special page - https://phabricator.wikimedia.org/T221397 (odimitrijevic)
[15:26:20] <btullis>	 ottomata: I'm now totally following the idea about getting multi-master, multi-instance working with neat and tidy puppet. I like the idea.
[15:27:24] <wikibugs>	 Analytics, Data-Engineering, Event-Platform, Internet-Archive, The-Wikipedia-Library: Page-links-change stream doesn't capture duplicated links - https://phabricator.wikimedia.org/T216492 (odimitrijevic)
[15:27:37] <btullis>	 I just don't quite understand yet what the perceived benefits would be from splitting up the DBs into 'important' and 'user-facing'.
[15:28:43] <btullis>	 I think that the benefits from making the HA and failover systems smoother would be great, but I'm not sure why we would want more instances per se.
[15:33:12] <wikibugs>	 Analytics, Data-Engineering, Event-Platform, Internet-Archive, and 3 others: page-links-change stream is assigning template propagation events to the wrong edits - https://phabricator.wikimedia.org/T216504 (Ottomata)
[15:42:15] <wikibugs>	 Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (odimitrijevic)
[15:42:18] <ottomata>	 btullis: i think you are right
[15:42:30] <ottomata>	 the benefits are few, esp given the effort
[15:42:44] <ottomata>	 i think we naively wanted to do that since it seems like the cleaner thing to do
[15:43:32] <wikibugs>	 Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (odimitrijevic)
[15:47:12] <wikibugs>	 Analytics, Data-Engineering, Data-Engineering-Kanban, Privacy Engineering: Implement Data Governance Tool - https://phabricator.wikimedia.org/T272060 (odimitrijevic)
[15:53:38] <wikibugs>	 Analytics, Data-Engineering, Data-Engineering-Kanban, Privacy Engineering: Implement Data Governance Tool - https://phabricator.wikimedia.org/T272060 (odimitrijevic)
[15:58:36] <wikibugs>	 Analytics-Radar, Data-Engineering, Event-Platform, Internet-Archive, The-Wikipedia-Library: Store page-links-change data in a database table and make available through a Special page - https://phabricator.wikimedia.org/T221397 (odimitrijevic)
[16:02:29] <wikibugs>	 Analytics, Data-Engineering, Data-Engineering-Kanban, Event-Platform, and 4 others: page-links-change stream is assigning template propagation events to the wrong edits - https://phabricator.wikimedia.org/T216504 (mforns)
[16:03:13] <wikibugs>	 Analytics-Radar, Data-Engineering, Event-Platform, Internet-Archive, The-Wikipedia-Library: Store page-links-change data in a database table and make available through a Special page - https://phabricator.wikimedia.org/T221397 (Ottomata) If you get access to the Analytics Hadoop cluster, you...
[16:03:21] <wikibugs>	 Analytics-Radar, Data-Engineering, Event-Platform, Internet-Archive, The-Wikipedia-Library: Store page-links-change data in a database table and make available through a Special page - https://phabricator.wikimedia.org/T221397 (Ottomata) I think it's unlikely to get this feature implemented i...
[16:08:44] <wikibugs>	 Analytics, Data-Engineering, Data-Engineering-Kanban: Data structuring guidance request - https://phabricator.wikimedia.org/T287402 (odimitrijevic) a:mforns @JAnstee_WMF can you please set up a meeting with @mforns @EChetty  to discuss and tag this ticket in the agenda. Ideally any documentation/...
[16:13:44] <wikibugs>	 Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Epic: Analytics Presto improvements - https://phabricator.wikimedia.org/T266639 (mforns)
[16:17:03] <wikibugs>	 Analytics, Analytics-Jupyter, Data-Engineering, Product-Analytics: conda list does not show all packages in environment - https://phabricator.wikimedia.org/T294368 (odimitrijevic) p:Triage→Lowest
[16:17:29] <wikibugs>	 Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Epic: Analytics Presto improvements - https://phabricator.wikimedia.org/T266639 (mforns)
[16:18:09] <wikibugs>	 Analytics, Patch-For-Review: Presto should warn or prevent users from querying without Hive partition predicates - https://phabricator.wikimedia.org/T273004 (mforns)
[16:18:19] <wikibugs>	 Analytics, Patch-For-Review: Decide whether to migrate from Presto to Trino - https://phabricator.wikimedia.org/T266640 (mforns)
[16:18:37] <wikibugs>	 Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Deploy an-test-coord1002 to facilitate failover testing of analytics coordinator role - https://phabricator.wikimedia.org/T287864 (odimitrijevic)
[16:34:02] <wikibugs>	 Analytics, Data-Engineering, Product-Analytics, Epic: Reconstruct Hive & Hadoop permissions for shared database - https://phabricator.wikimedia.org/T288983 (odimitrijevic) Review a potential solution as part of https://phabricator.wikimedia.org/T288255
[16:36:50] <wikibugs>	 Analytics, Analytics-Jupyter, Data-Engineering, Data-Engineering-Kanban: Autocomplete is very slow (unusable) in Newpyter - https://phabricator.wikimedia.org/T290008 (odimitrijevic) p:Triage→Medium
[16:37:42] <wikibugs>	 Analytics, Analytics-Jupyter, Data-Engineering, Data-Engineering-Kanban: Autocomplete is very slow (unusable) in Newpyter - https://phabricator.wikimedia.org/T290008 (odimitrijevic)
[16:41:29] <wikibugs>	 Analytics-Clusters, Analytics-Kanban, Cassandra, Data-Engineering, and 2 others: Set up a testing environment for the AQS Cassandra 3 migration - https://phabricator.wikimedia.org/T257572 (odimitrijevic) a:razzi→None @BTullis Is this task still relevant?
[16:44:19] <wikibugs>	 Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: dbstore1007 is swapping heavilly, potentially soon killing mysql services due to OOM error - https://phabricator.wikimedia.org/T290841 (odimitrijevic) a:razzi→None
[16:46:13] <wikibugs>	 Analytics, Data-Engineering, Epic: Upgrade analytics-hadoop to Spark 3 + scala 2.12 - https://phabricator.wikimedia.org/T291464 (odimitrijevic)
[16:55:12] <wikibugs>	 (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/715055 (https://phabricator.wikimedia.org/T274607) (owner: MNeisler)
[16:58:07] <wikibugs>	 Analytics, Data-Engineering, Product-Analytics, Epic: Reconstruct Hive & Hadoop permissions for shared database - https://phabricator.wikimedia.org/T288983 (Mayakp.wiki) @odimitrijevic : T288255 seems to be a restricted task. Can you pls give me view access ? Thanks!
[17:29:54] <ottomata>	 dcausse: FYI, eventgate-main deployed with revision-create slot change
[17:39:32] <wikibugs>	 (PS1) MNeisler: Add discussiontools_subscription query to sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/736021 (https://phabricator.wikimedia.org/T290516)
[21:24:35] <wikibugs>	 Analytics, Platform Engineering: Replace Airflow's HDFS client (snakebite) with pyarrow - https://phabricator.wikimedia.org/T284566 (Harej)
[22:13:35] <wikibugs>	 (PS1) MewOphaswongse: Add an image: update schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/736070 (https://phabricator.wikimedia.org/T294659)
[22:42:18] <wikibugs>	 Analytics, Data-Engineering, Data-Engineering-Kanban: Data structuring guidance request - https://phabricator.wikimedia.org/T287402 (JAnstee_WMF) Set us up a couple weeks out!