[00:49:09] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[00:50:57] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[02:15:55] PROBLEM - MariaDB sustained replica lag on pc2007 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[02:17:41] RECOVERY - MariaDB sustained replica lag on pc2007 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[02:31:03] PROBLEM - MariaDB sustained replica lag on db1118 is CRITICAL: 4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1118&var-port=9104
[02:34:41] RECOVERY - MariaDB sustained replica lag on db1118 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1118&var-port=9104
[02:45:29] PROBLEM - MariaDB sustained replica lag on db1118 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1118&var-port=9104
[02:50:53] RECOVERY - MariaDB sustained replica lag on db1118 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1118&var-port=9104
[03:08:59] PROBLEM - MariaDB sustained replica lag on db1118 is CRITICAL: 3 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1118&var-port=9104
[03:16:11] RECOVERY - MariaDB sustained replica lag on db1118 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1118&var-port=9104
[04:13:01] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[04:16:37] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[05:06:16] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui)
[05:06:25] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui)
[05:06:38] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui) Open→Resolved All done
[05:06:52] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui) Open→Resolved All done
[05:13:36] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[05:13:50] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[05:14:00] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[05:14:35] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui) Open→Stalled All done, pending master swap or master DC switch to complete all the masters that are pending
[05:14:54] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui) Open→Stalled All done, pending master swap or master DC switch to complete all the masters that are pending
[05:15:02] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui) Open→Stalled All done, pending master swap or master DC switch to complete all the masters that are pending
[05:15:08] Blocked-on-schema-change, DBA: Schema change for adding oi_timestamp on oldimage table - https://phabricator.wikimedia.org/T284221 (Marostegui) p:Triage→Medium a:Marostegui
[05:17:28] Blocked-on-schema-change, DBA: Schema change for adding oi_timestamp on oldimage table - https://phabricator.wikimedia.org/T284221 (Marostegui)
[05:22:15] Blocked-on-schema-change, DBA: Schema change for adding oi_timestamp on oldimage table - https://phabricator.wikimedia.org/T284221 (Marostegui) Deployed on s6 codfw and on db1096:3316. Let's wait a few days to make sure there're no regressions on the optimizer decisions when querying this table. s6 eqia...
[05:22:32] Blocked-on-schema-change, DBA: Schema change for adding oi_timestamp on oldimage table - https://phabricator.wikimedia.org/T284221 (Marostegui)
[05:34:51] DBA, Orchestrator, User-Kormat: orchestrator: Add service monitoring - https://phabricator.wikimedia.org/T266338 (Marostegui) Open→Resolved a:Dzahn Closing this as it is done we just keep notifications disabled for now. Thanks Daniel
[05:34:53] DBA, Orchestrator, Patch-For-Review, User-Kormat: orchestrator: Puppetize - https://phabricator.wikimedia.org/T265990 (Marostegui)
[05:36:52] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[05:39:38] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[06:07:47] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui) Stalled→Open
[06:15:12] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 17.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[06:18:22] RECOVERY - MariaDB sustained replica lag on db2132 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[06:50:25] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui) Starting to upgrade the 10.4 hosts to the latest 10.4.19 uploaded
[07:08:07] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui) @jcrespo let me know if you want to upgrade db1150 to 10.4.19 or you want me to do it.
[07:32:46] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (jcrespo) >>! In T283235#7133534, @Marostegui wrote: > @jcrespo let me know if you want to upgrade db1150 to 10.4.19 or you want me to do it. I can do it quickly.
[07:33:11] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui) Cool thanks!
[07:50:31] there is 2 hosts with the prometheus exporter down: db1124 and db1096
[07:52:11] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (jcrespo) >>! In T283235#7133581, @Marostegui wrote: > Cool thanks! ` MariaDB read only s4 Version 10.4.19-MariaDB, Uptime 188s, read_only: True, event_scheduler: True, 215.45 QPS, connection...
[08:10:46] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (jcrespo)
[08:59:10] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[08:59:13] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui)
[08:59:15] DBA, Patch-For-Review: Upgrade s3 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283131 (Marostegui)
[08:59:26] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[08:59:28] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui)
[08:59:30] DBA, Patch-For-Review: Upgrade s3 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283131 (Marostegui)
[08:59:45] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[08:59:49] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui)
[08:59:53] DBA, Patch-For-Review: Upgrade s3 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283131 (Marostegui)
[09:43:48] DBA, Patch-For-Review: Add *_direct_link to imagelinks and templatelinks - https://phabricator.wikimedia.org/T278236 (Marostegui) Open→Declined I am going to close this per: T278236#6940822 In summary, *link tables do require lots of work pre work before we can start adding more things to them....
[09:47:11] kormat: is pc1008 depooled? can the kernel be upgraded? https://phabricator.wikimedia.org/T273281
[09:49:41] marostegui: it is. i've not heard from krinkle, so i don't know for sure if he's running the purge against it or not
[09:49:44] but it looks quiet
[09:49:59] ah ok! no worries, it can wait until he said we can do the optimize
[10:21:24] one thing I would like to explore at some time is https://mariadb.com/kb/en/system-versioned-tables/ as if that could help point in time recovery (how performant it is, how much space it takes, how reliable it is, etc.)
[10:23:28] yeah, it would be interesting to know the storage requirements
[10:25:34] DBA, Patch-For-Review: Upgrade s3 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283131 (Kormat) >>! In T283131#7131272, @Marostegui wrote: > s6 hasn't given any issues, so maybe we can start working on this next week (after 3 weeks since we switched s6) and attempt to do the swit...
[10:28:13] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[10:29:19] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[10:52:35] DBA, Orchestrator: orchestrator: Upgrade to v3.2.4 (ish) - https://phabricator.wikimedia.org/T275784 (Marostegui) https://github.com/openark/orchestrator/releases/tag/v3.2.5 This version does include our patch. I think we've never upgraded orchestrator since it was installed, it would be a good practice...
[10:52:48] DBA, Orchestrator: orchestrator: Upgrade to v3.2.4 (ish) - https://phabricator.wikimedia.org/T275784 (Marostegui) For the record: ` Changes since https://github.com/openark/orchestrator/releases/tag/v3.2.4: v3.2.4...v3.2.5 Notable: Introducing RecoverNonWriteableMaster flag #1332 Drop fixed list of cip...
[11:19:44] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui)
[11:20:01] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui) Open→Resolved All done
[12:30:54] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui) I am starting to upgrade all eqiad hosts, including clouddb* replicas
[12:37:56] DBA: Re-image (rename) dbstore1006 into db1125 - https://phabricator.wikimedia.org/T284128 (Marostegui) Updated: https://wikitech.wikimedia.org/w/index.php?title=MariaDB&type=revision&diff=1914322&oldid=1912301
[13:41:00] DBA, MediaWiki-Parser, Performance-Team, Parsoid (Tracking), Patch-For-Review: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Kormat) `15:39:55 kormat: it's running now, tee'ed to /home/krinkle/purge_parsercache_now_...
[14:42:21] DBA, DiscussionTools, Performance-Team, Editing-team (FY2020-21 Kanban Board), and 2 others: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (Krinkle) Verified on mwdebug that: * Before this, articles on nlwiki have 21 days expiry (purge...
[15:00:43] DBA, Data-Persistence-Backup, ops-codfw: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (Papaul) Dear Mr Papaul Tshibamba, Thank you for contacting Hewlett Packard Enterprise for your service needs. This email confirms that your request for...
[15:15:15] DBA, Data-Persistence-Backup, ops-codfw: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (Papaul) I will be receiving the CPU on Monday
[16:37:17] DBA, Data-Persistence-Backup, ops-codfw: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (jcrespo) Thank you!
[17:22:44] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Sergey.Trofimovsky.SF) On a different note, I'd like to remind about a separate backup set for Gitlab configuration backup, which we discussed earlier...
[18:13:07] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Dzahn) Alright, thanks! That's not even on a different note but exactly the kind of feedback needed. So we'll have to add /etc/gitlab/config_backup to...
[19:36:27] Data-Persistence-Backup, database-backups, Patch-For-Review, cloud-services-team (Kanban): Use mariabackup instead of xtrabackup for galera backups? (Or possibly for all maria backups?) - https://phabricator.wikimedia.org/T284157 (Andrew) After considerable code-diving, we have determined that w...
[20:09:31] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Dzahn) I added `/etc/gitlab/config_backup` to the suggested backup::set, so it would consist of /srv/gitlab-backup and /etc/gitlab/config_backup as par...
[20:49:33] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Sergey.Trofimovsky.SF) It may contain sensitive data, particularly `/etc/gitlab/gitlab-secrets.json`. If you guys believe a single backup set is safe e...
[23:57:22] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Dzahn) I think all backups are treated as if they contain secrets, it's not like they are made public. I am not aware of 2 levels of security when it c...