[00:07:31] (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (matomo1002:9104) - TODO - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[01:20:32] (MysqlReplicationLag) firing: MySQL instance db1145:13313 has too large replication lag (1h 2m 45s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1145&var-port=13313 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[01:25:32] (MysqlReplicationLag) resolved: MySQL instance db1145:13313 has too large replication lag (1h 2m 45s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1145&var-port=13313 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[03:42:32] (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (db1117:13323) - TODO - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[04:00:32] (MysqlReplicationLag) firing: MySQL instance db2139:13313 has too large replication lag (50m 58s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2139&var-port=13313 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[04:05:32] (MysqlReplicationLag) resolved: MySQL instance db2139:13313 has too large replication lag (50m 58s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2139&var-port=13313 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[04:07:31] (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (matomo1002:9104) - TODO - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[04:59:32] (MysqlReplicationLag) firing: MySQL instance db1139:13311 has too large replication lag (1h 40m 41s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1139&var-port=13311 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[05:09:32] (MysqlReplicationLag) resolved: MySQL instance db1139:13311 has too large replication lag (11m 45s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1139&var-port=13311 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[05:42:17] (PrometheusMysqldExporterFailed) resolved: Prometheus-mysqld-exporter failed (db1117:13323) - TODO - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[08:07:31] (PrometheusMysqldExporterFailed) firing: Prometheus-mysqld-exporter failed (matomo1002:9104) - TODO - https://grafana.wikimedia.org/d/000000278/mysql-aggregated - https://alerts.wikimedia.org/?q=alertname%3DPrometheusMysqldExporterFailed
[08:28:34] Hi all! we have a database (m5-master.eqiad.wmnet host, labsdbaccounts database) where we found out that adding a row with 'n' collides with an existing one with 'ñ', so we think it's a collation issue. The question is, do you think that's the case? (see T318047) and if so, what's the process to change the collation in there?
[08:28:35] T318047: Newly created Toolforge tools unable to connect to MariaDB databases - https://phabricator.wikimedia.org/T318047
[08:42:32] tested locally, and using a case insensitive collation seems to do the trick :), would appreciate your input though, as I might be missing some side effects (or there might be a better option!)
[09:03:24] dcaro: we usually use binary charset for production which automatically solves this
[09:04:05] varchar can cause headaches, you can also simply use varbinary() instead (without needing to change collation)
[09:08:20] Amir1: that sounds interesting :), a superficial read of the docs seems to point that varbinary might be easier than just binary. How would I go about applying those changes? (/me is not familiar with db changes processes)
[09:08:25] I'd say that depends what's your input and which charset you need to support. Wikis use binary also because back in the days unicode support in mysql was horrible
[09:09:17] currently supported ones in mariadb for reference: https://mariadb.com/kb/en/supported-character-sets-and-collations/
[09:09:25] dcaro: alter table for changing to varbinary
[09:09:47] https://www.w3schools.com/SQl/sql_alter.asp
[09:11:57] sorry, I was asking about how to apply those changes, if there's a process/puppet manifest/etc. more than what sql command to apply xd
[09:13:37] about charsets, I think utf8 should be good for it (what we currently use), well, with the comparison working of course
[09:37:13] I'm guessing that this would be the process? https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change
[09:40:06] dcaro: that's for production dbs, for toollabs dbs, just run it with replication on master
[09:40:19] it'll choke the replication for a bit but that's fine for wmcs dbs
[09:40:47] Amir1: 👍
[09:42:14] just making sure, 'with replication on master' means just running on master not disabling the replication right? (no extra actions/commands needed)
[10:32:01] dcaro: yup
[11:06:22] the replag errors are quite noisy, let me see if I can make them happen only on core dbs
[11:12:33] https://gerrit.wikimedia.org/r/c/operations/alerts/+/835117
[11:12:37] that was rather easy
[11:14:14] marostegui: maybe I'm missing something but I think db1189 went down again (unless you shut it down for socket replacement)
[11:23:11] let me see
[11:23:28] it did
[11:23:29] [11:23:20] marostegui@db1189:~$ w
[11:23:29] 11:23:22 up 4:05, 1 user, load average: 0.02, 0.05, 0.04
[11:23:33] reporting it
[11:25:16] I think we should repurpose this host to be a mw appserver, making it a serviceops problem
[11:25:33] XD
[12:42:04] Is it spontaneously kernel panicking? Might be worth netconsole if so?
[12:53:53] no, it is the memory dimma
[12:53:55] dimms
[13:07:47] can we replace them? :)
[13:14:46] that's what we are doing ;)
[13:25:51] * Emperor will stop teaching you to suck eggs :)
[13:27:03] Emperor: So the thing is that we already replaced the failed DIMM, but the host crashed again, John switched them as we normally do that to see if it is the slot or the module, and the host crashed again with a different slot, so I reckon we just need a mainboard replacement
[13:33:07] urandom: you around for weekly meeting?
[16:13:32] first s1 templatelinks drop is done, 100GB https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=db1099&var-datasource=thanos&var-cluster=wmcs&from=1664173726700&to=1664208783488&viewPanel=28