[02:56:13] I'm trying to test php-wmerrors on an appserver in beta, on the assumption that there must be a way to get hold of the syslog()'ed result even when logstash isn't working there.
[02:56:32] The two places I checked are /var/log/syslog and journalctl -u php7.2-fpm
[02:56:43] neither seems to show the @cee json lines emitted by php though
[02:58:17] while the MW runtime normally logs via the rsyslog port, these fatals go via syslog(), but I'm not sure if that's wired up to go directly to rsyslog somehow (maybe php-fpm is told to route them there?) or whether rsyslog is meant to scrape it from syslog. I thought the latter.
[02:58:23] I do see several of these:
[02:58:24] > deployment-mediawiki11 rsyslogd: could not load module 'lmnsd_tcp', errors: trying to load module /usr/lib/x86_64-linux-gnu/rsyslog/lmnsd_tcp.so: /usr/lib/x86_64-linux-gnu/rsyslog/lmnsd_tcp.so: cannot open shared object file: No such file or directory [v8.1901.0 try https://www.rsyslog.com/e/2066 ]
[06:16:55] <_joe_> ok that is bad
[06:17:55] <_joe_> but the messages don't necessarily end up in /var/log/syslog or elsewhere on the system (surely *not* in journald) if you've configured rsyslog to send them to a remote location only
[06:18:03] <_joe_> which I think is the case for mediawiki logs
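A remote-only rsyslog setup like the one _joe_ describes would look roughly like this; the selector, target host, and port below are illustrative guesses, not the actual WMF configuration:

```
# hypothetical /etc/rsyslog.d/30-remote.conf -- a sketch, not the WMF config
# "@@" forwards over TCP ("@" would be UDP)
*.* @@centrallog1001.eqiad.wmnet:10514
# stop processing these messages so no later rule writes a local copy,
# which is why nothing shows up in /var/log/syslog
& stop
```

Under a config like this, if forwarding is the only action and its module fails to load (as with the lmnsd_tcp error above), the messages may never be written anywhere locally at all.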
[08:26:50] akosiaris: moritzm: hi, I found our Bullseye docker image lacks the security and updates suites, so I crafted a change for that: https://gerrit.wikimedia.org/r/c/operations/puppet/+/720241
[08:27:43] I noticed that after finding NodeJS got upgraded from 12.21 to 12.22 via debian-security
[11:21:20] jbond: puppet-merge seems blocked by you on puppetmaster1001
[11:24:28] arturo: sorry, a vps host went down -- you happy for me to merge yours?
[11:24:43] jbond: yeah
[11:24:51] merged
[11:24:53] there should also be a patch for labs/private
[11:25:12] I got the labs/private one merged, nevermind
[11:25:18] ack cool
[12:43:23] I think the labs/private.git gerrit config should be changed to fast-forward only, to avoid polluting the history with merge commits
[12:47:56] arturo: I made a mistake this morning, I realized it only when it was too late, sorry for that
[12:48:26] elukey: not your problem! there are many other merge commits. I think it is a configuration issue
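The fast-forward-only setting arturo suggests lives in the repository's refs/meta/config branch; a sketch of what the project.config change might look like (the value name follows Gerrit's documented submit types):

```
# labs/private.git, refs/meta/config: project.config (sketch)
[submit]
	action = fast forward only
```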
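For the Bullseye image change mentioned at 08:26, the missing suites would look roughly like the lines below. One detail worth noting: with Bullseye the security suite moved to debian-security/bullseye-security (Buster still used buster/updates). The mirror hostnames here are illustrative; the Gerrit change linked above has the authoritative form:

```
# sketch of the two suites the image was missing (mirror hosts are guesses)
deb http://security.debian.org/debian-security bullseye-security main
deb http://mirrors.wikimedia.org/debian bullseye-updates main
```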
[15:12:14] mutante: did you back up the mwmaint homedirs before the reimage? dcausse might have lost stuff (they're seeing if they saved it elsewhere)
[15:13:00] we might have a copy in the backups, I can ask Jaime tomorrow, or better, if you or they can create a task, we could double-check with Jaime
[15:13:44] dcausse: ^
[15:13:49] thanks :)
[15:14:06] looks like we do back up homes on the mwmaints: https://github.com/wikimedia/puppet/blob/b4a04bfed966b24881d5b187e64fe43e3f492f65/modules/profile/manifests/mediawiki/maintenance.pp#L89
[15:14:15] I'll check and create a task if I really need to restore something
[15:16:56] majavah: yeah, the key is to know when we need to restore things from
[15:17:08] "when" meaning the date of the files dcausse might be looking for
[15:23:44] I am trying to use pcc for a puppet change (that used to compile fine) and I am getting a weird realm.pp error
[15:23:51] https://puppet-compiler.wmflabs.org/compiler1003/31166/contint1001.wikimedia.org/change.contint1001.wikimedia.org.err
[15:27:09] it seems like a weirdness with facts on the compiler hosts, but I'm not sure
[15:30:59] marostegui, dcausse: yeah, we had to restore something on the other mwmaint reimage too, and we just used the reimage time -- in this case 2021-09-16 14:35, according to SAL
[15:31:50] rzl: Sure, if we can get that on the task, it will be helpful as a data point for where to restore from
[15:32:04] <_joe_> yes, from that error message it looks like the facts don't have 'numa' at all
[15:32:26] <_joe_> but ssh contint1001.wikimedia.org sudo facter -p numa.nodes says otherwise
[15:36:11] marostegui: 👍
[15:40:37] <_joe_> elukey: the facts for contint1001 do have numa/nodes on compiler1003
[15:41:42] <_joe_> but yes, every compilation fails the same way
[15:42:17] <_joe_> jbond: did anything change on the compilers today?
[15:44:30] _joe_: not afaik, just reading backlog
[15:45:51] <_joe_> so any compilation running on compiler1003 (AFAICT) will result in something like https://puppet-compiler.wmflabs.org/compiler1003/31171/deploy1002.eqiad.wmnet/prod.deploy1002.eqiad.wmnet.err
[15:47:43] <_joe_> so unless puppet is failing everywhere in production, it looks like that warning indicates facts are not correctly read
[15:47:50] ahh crap i know what this is
[15:48:06] <_joe_> oh please do tell :)
[15:48:42] I have updated pcc's facts just in case (I knew it wasn't the reason, but it was ok for me as a test)
[15:48:45] so to fix some issues with puppetdb we started filtering facts. the facts are still available at catalog compilation time, so they don't affect production compilation. It just means that the facts are not available in puppetdb, so they can't be used in things like cumin queries, puppetboard etc.
[15:49:08] however the script which generates the facts used by the compilers asks puppetdb for them, so it will see the filtered set
[15:49:11] <_joe_> oh crap
[15:49:13] <_joe_> right
[15:49:28] <_joe_> so we're now importing the facts *from puppetdb*?
[15:49:46] <_joe_> because I can tell you that on disk the facts file for contint1001 still has the numa data
[15:50:06] <_joe_> jbond: also why did this break only today?
[15:50:11] yes, the stuff on disk and what is actually used by production is all unaffected
[15:50:20] <_joe_> no I mean on the compiler
[15:50:37] _joe_: possibly someone did a facts import today, and before that it was using whatever file was last imported
[15:50:41] <_joe_> under /var/lib/catalog-differ/puppet/facts
[15:50:59] <_joe_> the file for contint1001 has the "numa" stanza
[15:51:42] <_joe_> /var/lib/catalog-differ/puppet/yaml/facts/contint1001.wikimedia.org.yaml specifically
[15:51:58] there are two files for that node, and /var/lib/catalog-differ/puppet/yaml/puppetmaster2001.codfw.wmnet/facts/deploy1002.eqiad.wmnet.yaml doesn't have them
[15:52:15] <_joe_> ohhh ok, we've changed the structure
[15:52:22] oh sorry, i was checking deploy1002
[15:52:25] <_joe_> but never removed the old directory :D
[15:53:01] <_joe_> and yes, indeed the fact is gone
[15:53:17] i think it searches for any matching file under /var/lib/catalog-differ/puppet/yaml/; would have to look to see how it decides when there are multiple
[15:53:42] <_joe_> elukey: you broke the compiler
[15:54:57] anyway, it turned out that dependencies had a much bigger impact on puppetdb performance than the fact set, so we could probably add numa back in
[15:55:24] _joe_ nono, the facts update was run after it broke
[15:56:06] <_joe_> jbond: do you concur we can blame elukey though?
[15:56:07] no idea if somebody else ran it before me
[15:56:15] <_joe_> jbond: yeah makes sense :)
[15:56:24] no, jbond is a nice person :D
[15:56:37] * jbond deletes his last response
[15:56:42] <_joe_> LOL
[15:56:44] of course not _joe_ im too nice :P
[15:57:00] ahahahahah
[15:57:02] <_joe_> wink wink
[15:57:16] well played you two
[15:57:25] I am lucky that kormat is not seeing this
[15:57:33] anyway, as this is getting to the end of the day i would prefer to fix it tomorrow unless it's blocking?
[15:57:45] if it is though i can do it now
[15:57:50] +1 nothing blocked, I was re-running a pcc
[15:57:54] <_joe_> I mean, the compiler is broken, I'd ask people who have a full day ahead of them
[15:58:08] <_joe_> or at least let's open a task with the details so people know what's broken
[15:58:44] oh you're right, i can do it now, planned to be around for another hour anyway
[15:59:01] jbond: if I can help I'm around too
[15:59:08] ack thanks volans
[16:18:00] _joe_: elukey: have deployed the change, will update the facts in 30 minutes once puppet has run almost everywhere
[16:18:21] nice thanks a lot!
[16:18:21] <_joe_> <3
[16:18:35] <_joe_> elukey: now you only need to reimport the facts in ~1 hour
[16:21:27] yes yes
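Per jbond's explanation above, the compiler's fact-export script reads facts out of puppetdb, where they are now filtered. A quick way to check whether a given fact survived the filter is PuppetDB's v4 query API; the localhost:8080 endpoint below is an assumption about where puppetdb listens:

```
# list the "numa" fact for contint1001 straight from puppetdb;
# an empty [] in the response means the fact was filtered out
# (host/port are assumptions; /pdb/query/v4 is the standard API prefix)
curl -sG 'http://localhost:8080/pdb/query/v4/nodes/contint1001.wikimedia.org/facts' \
     --data-urlencode 'query=["=", "name", "numa"]'
```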
[16:31:11] RhinosF1: it's in Bacula if needed
[16:31:30] mutante: ty
[16:31:32] can restore it later today
[16:47:50] marostegui: regarding Tendril / show processlist - it seems that today it is able to group queries such that it ignores values (e.g. foo=X instead of foo=123) and ignores the already-unique comment we put in the sql text with the request username/ip address.
[16:48:28] are one or both of those things we do somewhere in mariadb and/or in tendril? I couldn't find any normalizing code in tendril, so I'm guessing it comes out this way straight from processlist.
[16:48:56] but I cringe at the thought that maybe we hacked mariadb to regex-change part of a sql text comment out of processlist.
[17:34:05] dcausse: I heard you might be missing some files from mwmaint2002. If that is the case I can restore them from bacula for you. I just won't be here for a couple hours, but I will read messages later today, just let me know. cya
[17:37:16] mutante: hey! no worries, I think I can survive without them, I've found one of the scripts I needed elsewhere :)
[17:40:18] dcausse: ok, cool, if you change your mind, it's not a big deal
[17:41:23] usually I have always copied all the files, but then we end up with a growing pile of older and older data, so I figured this time the Bacula option is enough
[22:17:50] elukey: _joe_: (for tomorrow) looks like PCC is still broken, is there still something left to be done there (maybe reimporting the facts)?
[22:20:50] ryankemper: https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Updating_nodes might be useful if facts are stale, but I'm just dropping drive-by links at you. :)
[22:21:48] bd808: thanks, those are my favorite types of links :P
[22:22:47] I think I likely lack project membership, based on my inability to ssh into `compiler1001.puppet-diffs.eqiad.wmflabs`
[22:24:28] Current admins at https://openstack-browser.toolforge.org/project/puppet-diffs can add you to the project. Technically I can too, if you can't find anyone active to help you out.
[22:25:33] If our openstack stuff allowed it, I feel like the admin list there should == the "ops" ldap group
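Whatever layer does it (Tendril or MariaDB), the grouping mutante describes at 16:47 amounts to stripping the SQL comment and replacing literals with placeholders. A rough shell illustration of that kind of normalization -- not Tendril's actual code:

```
# not tendril's real code -- just what "ignore values and the comment" means:
# drop /* ... */ comments, then replace quoted strings and bare numbers with ?
echo "SELECT /* u:Foo 10.0.0.1 */ * FROM page WHERE page_id = 12345 AND title = 'X'" |
  sed -E -e 's#/\*[^*]*\*/ ?##g' \
         -e "s/'[^']*'/?/g" \
         -e 's/\b[0-9]+\b/?/g'
# -> SELECT * FROM page WHERE page_id = ? AND title = ?
```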
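If dcausse did need the files back, the restore mutante offers at 17:34 would run through bconsole along these lines; the client name, restore path, and exact prompts here are assumptions about this Bacula setup, not a recorded session:

```
$ sudo bconsole
* restore client=mwmaint2002.codfw.wmnet-fd where=/var/tmp/restore select
# pick the backup taken just before the reimage (per SAL, 2021-09-16 14:35 was
# the reference date used for the other mwmaint restore), then in the file
# browser: cd /home/dcausse ; mark * ; done
```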