[03:42:20] <ryankemper>	 grr, the systemd timer for the sane-itizer is called `mediawiki_job_cirrus_sanitize_jobs.timer`
[03:42:33] * ryankemper was confused why `sudo systemctl list-timers | grep sane` returned nothing
[04:28:46] <ryankemper>	 I fully disabled the sane-itizer. While we're at it here's a patch to fix the name: https://gerrit.wikimedia.org/r/740711
[05:45:25] <ryankemper>	 Codfw rolling restart's complete so we should be ready to kick off the index restoration whenever
[09:02:21] <gehel>	 ejoseph: meeting? https://meet.google.com/ukb-kgxq-gvq
[09:57:13] <zpapierski>	 meal break
[09:58:55] <ejoseph>	 [INFO] [WARNING] The requested profile "pom.xml" could not be activated because it does not exist.
[09:58:55] <ejoseph>	 [INFO] [ERROR] Failed to execute goal org.sonatype.plugins:nexus-staging-maven-plugin:1.6.8:deploy (injected-nexus-deploy) on project extra-analysis: Execution injected-nexus-deploy of goal org.sonatype.plugins:nexus-staging-maven-plugin:1.6.8:deploy failed: Cannot decipher credentials to be used with Nexus! org.sonatype.plexus.components.sec.dispatcher.shaded.SecDispatcherException: java.io.FileNotFoundException: 
[09:58:55] <ejoseph>	 /Users/iemarjay/.m2/settings-security.xml (No such file or directory) -> [Help 1]
[09:58:55] <ejoseph>	 [INFO] [ERROR] 
[09:58:55] <ejoseph>	 [INFO] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[09:58:55] <ejoseph>	 [INFO] [ERROR] Re-run Maven using the -X switch to enable full debug logging.
[09:58:55] <ejoseph>	 [INFO] [ERROR] 
[09:58:56] <ejoseph>	 [INFO] [ERROR] For more information about the errors and possible solutions, please read the following articles:
[09:58:56] <ejoseph>	 [INFO] [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[09:59:21] <ejoseph>	 when i run ./mvnw -B -P deploy-central release:perform
[10:01:42] <dcausse>	 ejoseph: seems to be related to how you stored you gpg key passphrase I guess
[10:01:52] <dcausse>	 perhaps related: https://stackoverflow.com/questions/33339611/where-to-save-the-setting-security-xml-file-to-use-maven-encryption
[10:02:10] <dcausse>	 I did not set this on my side so can't really tel
[10:02:52] <dcausse>	 sorry perhaps not the gpg passphrase but the ossrh repo password?
[10:06:02] <dcausse>	 "The requested profile "pom.xml" could not be activated because it does not exist." is very strange
[10:06:27] <dcausse>	 I wonder how it could have inferred "pom.xml" as a profile name
[10:15:08] <ejoseph>	 That's where i am confused
[10:15:23] <ejoseph>	 Fixed the master password though
[10:15:40] <ejoseph>	 I want retry
[10:17:23] <dcausse>	 you can retry the "release:perform", I don't think it had uploaded anything yet
[10:47:18] <gehel>	 sorry I'm late to the party :/ E_TOO_MANY_MEETINGS (but they were interesting)
[10:47:53] <gehel>	 the pom.xml profile issue is a known problem in the Maven release process. It's only a warning and has no serious consequences.
[10:48:23] <gehel>	 the master password issue seems to be the real problem. Did a retry work this time?
[11:17:28] <zpapierski>	 dcausse: I went through https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/739105/2 , but I'm a bit lost ont the buffered input messages in Kafka stream consumer and removal of the partial message test (which I'm guessing are connected) - I don't remember exactly when it was used and why is it removed now, can you shed some light on this?
[11:24:02] <dcausse>	 zpapierski: since messages are splitted into chunks at the producer level (to fit the kafka max message limit) we need to reconstruct them
[11:24:49] <dcausse>	 the previous approach was to still allow partially reconstructed message if they ended up filling the whole buffer on the consumer 
[11:25:21] <dcausse>	 applying reconciliation we can't really process partial messages since that would mean restoring a "partial entity"
[11:25:55] <dcausse>	 so we drop support for partially reconstructed messages
[11:26:21] <zpapierski>	 so no limit on a buffer?
[11:27:08] <dcausse>	 no
[11:27:19] <dcausse>	 we could add a limit but we would need to fail
[11:29:02] <kostajh>	 hi! we're scheduling a sendResetWeightedTags job our code (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/737916/8/includes/NewcomerTasks/AddImage/AddImageSubmissionHandler.php#104). Is there an easy way to know the median time for these updates to be integrated into the search index? In this particular case, it's to reset the recommendation.image flag
[11:33:05] <dcausse>	 kostajh: they go to the cirrusSearchElasticaWrite topic (https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?orgId=1&var-dc=eqiad%20prometheus%2Fk8s&var-job=cirrusSearchElasticaWrite)
[11:33:47] <dcausse>	 I'd say 1min latency on average but can peak higher depending on the backlog
[11:34:52] <kostajh>	 ah, OK. For some reason I thought it was significantly longer to get the search index updated (like, hours). Maybe that was the case with recommendation.link at some point? or am I misremembering?
[11:35:01] <kostajh>	 (thanks for the link + info)
[11:35:04] <dcausse>	 the graph has flattened since yesterday it's because we disable some maintenance process that we might re-enable soon
[11:35:33] <dcausse>	 reseting is quick, pushing new data is slower as it is a batch process scheduled hourly
[11:36:40] <kostajh>	 ah, ok
[11:37:43] <kostajh>	 thanks
[11:38:02] <dcausse>	 yw!
[11:39:06] <dcausse>	 lunch
[11:46:25] <ejoseph>	 Failed to execute goal org.sonatype.plugins:nexus-staging-maven-plugin:1.6.8:deploy (injected-nexus-deploy) on project extra-analysis: Execution injected-nexus-deploy of goal org.sonatype.plugins:nexus-staging-maven-plugin:1.6.8:deploy failed: Nexus connection problem to URL [https://oss.sonatype.org/ ]: 401 - Unauthorized -> [Help 1]
[11:46:47] <ejoseph>	 that's the error i get now
[11:47:07] <zpapierski>	 ejoseph: how are you generally doing with stuff - want to continue on java training/plugins/needlessly detailed description of CPU memory/op caches? (on that last one I highly recommend https://en.wikichip.org/wiki/WikiChip - it's a treasure throve)
[11:48:24] <ejoseph>	 Let's continue with the plugin
[11:48:48] <zpapierski>	 (funnily enough, that page is missing an actual page on 8086, weird)
[11:49:01] <zpapierski>	 ok, I'll grab some water, you provide me a link
[12:54:27] <gehel>	 ejoseph: when you are reporting errors, could you provide a full log? Or at least some context around the error? You can upload snipets to Phabricator Paste: https://phabricator.wikimedia.org/paste/edit/form/14/
[12:55:26] <gehel>	 In this case, it looks like you might not have access to org.wikimedia on sonatype, which is strange since https://issues.sonatype.org/browse/OSSRH-75104 has been processed
[12:56:10] <gehel>	 ejoseph: can you login to https://oss.sonatype.org/ ?
[12:56:18] <gehel>	 with your sonatype credentials?
[13:00:48] <ejoseph>	 ok
[13:03:06] <ejoseph>	 https://www.irccloud.com/pastebin/j1ZJ4odI/
[13:03:38] <ejoseph>	 gehel: i just logged in
[13:04:20] <gehel>	 ok, so the credentials work, but seem to not give you the right access
[13:07:33] <zpapierski>	 lunch
[13:10:31] <gehel>	 ejoseph: can you try adding `-e -X` to get the full errors and debug logs?
[13:11:17] <ejoseph>	 ok
[13:13:30] <ejoseph>	 https://usercontent.irccloud-cdn.com/file/zatmPvHz/mvnw.log
[13:13:45] <ejoseph>	 couldn't send snippet 
[13:19:58] <gehel>	 I'm trying to find out if there is a way to list permissions on oss.sonatype.org
[13:24:13] <ejoseph>	 ok
[13:43:34] <ejoseph>	 launch
[13:46:20] <dcausse>	 when I log into https://oss.sonatype.org/ and select "Staging profiles" I see "org.wikimedia"
[13:47:10] <dcausse>	 not sure we have perms to list other users permissions on this group
[13:50:23] <gehel>	 dcausse: Oh, that's exactly what I was looking for
[13:50:40] <gehel>	 ejoseph: can you confirm (or not) what staging profiles you have?
[15:58:57] <ebernhardson>	 \o
[16:03:05] <zpapierski>	 o/
[16:13:40] <dcausse>	 o/
[16:15:43] * ebernhardson wonders if he can con bryan into running a pypi repo :)
[16:15:57] * ebernhardson should probably meet them first
[16:16:50] * zpapierski reminds ebernhardson that Bryan is on this channel and can see everything he writes
[16:16:54] * zpapierski :P
[16:17:21] <ebernhardson>	 good, it's not supposed to be a secret :)
[16:17:29] <zpapierski>	 I know :)
[16:17:43] <zpapierski>	 we're open about our cons!
[16:23:14] <zpapierski>	 and its Brian, sorry
[16:27:59] <zpapierski>	 ejoseph: still here and in need of assistance?
[16:38:30] * ebernhardson wonders if there is some "trivial" pypi repo that can serve it's files out of S3 (via swift's s3 compat)
[16:42:14] <gehel>	 let's migrate from Archiva to Nexus! And get support for all those nice repo formats out of the box.
[16:42:47] <ebernhardson>	 hmm, maybe? That sounds like more work, it has use cases to not break :P
[16:43:00] <ebernhardson>	 but not too many people use archiva, and it's not great software
[16:43:40] <gehel>	 honestly, I'm not sure that Nexus is a ton better than Archiva for our use case
[16:44:11] <zpapierski>	 it's most definitely more well-known
[16:45:51] <zpapierski>	 why we went for archiva instead of nexus or artifactory?
[16:46:30] <majavah>	 https://wikitech.wikimedia.org/wiki/Archiva says "Archiva was chosen over other Maven Repository Managers because it supports all of the features we need, and is 100% open source"
[16:46:30] <gehel>	 not sure. long time ago (before I joined) and whatever constraints we had at that time probably don't apply anymore
[16:46:35] <ebernhardson>	 unknown, archiva's been there since the first day i did java here :)
[16:46:56] <zpapierski>	 majavah: thanks for the link!
[16:49:27] <zpapierski>	 comparision link is probably long dead, but I'm guessing this being and Apache project, and not an open source released on open core principles mattered
[16:50:09] <zpapierski>	 still, having a pypi (and when if we want to be cool, RubyGems) support would be super nice 
[16:51:59] <gehel>	 We could (should?) go the other way around, and only depend on Maven Central / Pypi / ... ?
[16:52:54] <zpapierski>	 including are custom built blazegraph packages?
[16:53:18] <zpapierski>	 do we actually pushed them to MC?
[16:53:24] <gehel>	 yeah, that might be a stretch. Or we could just change the groupId
[16:53:25] <ebernhardson>	 i know lots of people publish garbage software to public repos on a daily basis, but i wanted to keep my garbage libraries separate :P
[16:53:33] <ryankemper>	 :P
[16:53:39] <ebernhardson>	 i feel like it has to actually be useful to publish to a real repo :P
[16:53:59] <ryankemper>	 If everyone had that attitude, we'd never have systemd! /s
[16:54:23] <ebernhardson>	 or i guess in a more nice fashion, when i make custom wmf hacks for software only we run, i don't feel bad because while it's open source we aren't publishing it for other people to run :)
[16:54:56] <zpapierski>	 ryankemper: on the other hand, If everyone had that attitude, we'd never have systemd! 
[16:55:12] * gehel is thinking that we should externalize that problem. Managing Maven repo does not seem like a good use of our time
[16:55:59] <ebernhardson>	 on the one hand, yes. But on the other hand i've been asking now and again for a pypi repo for 4+ years :P
[16:56:31] <ebernhardson>	 experience says the better way is start running it, get other people to use it, then complain about maintenance burden until we pass it off to a finally identified long term owner
[16:56:39] <ebernhardson>	 but, not working with archiva :P
[16:56:45] <gehel>	 :)
[16:56:49] <zpapierski>	 I'm with ebernhardson here - it doesn't seem to be possible to get anyone else interested :)
[16:58:34] <zpapierski>	 dcausse: I still need to go through to understand last bit of a code (MultiSync related changes), but I promise I'll finally reply on that CR tomorrow, sorry it takes so long
[16:58:50] <dcausse>	 zpapierski: np!
[16:58:52] <gehel>	 dinner time, back later
[16:59:31] <zpapierski>	 japanese time, won't be back  later
[18:01:04] <ebernhardson>	 hmm, i wonder if bulk daemons should be writing to cloudelastic too. We haven't in the past, but if we are going to ask teams to use cloudelastic for testing with prod data i guess we need the rest of the prod data there
[18:02:06] <ebernhardson>	 the annoying thing is we set it up as a consumer per cluster, so we would need a search-loader instance for cloudelastic, or rework puppet to install multiple consumers per instance
[18:10:35] <dcausse>	 can cloudelastic read kafka-main?
[18:10:53] <ebernhardson>	 we don't run the daemons on elastic instances anymore, we use ganetti
[18:11:10] <ebernhardson>	 so a ganetti instance would have to talk to kafka and cloudelastic, should be fine
[18:11:16] <dcausse>	 oh indeed forgot I about that
[18:11:55] <ebernhardson>	 can we run containers in prod yet? I suppose not necessary but the daemon doesn't need a whole instance, and for some reason i imagine deploying a container is easier than spinning a ganetti instance
[18:12:34] <dcausse>	 you mean k8s containers?
[18:13:06] <ebernhardson>	 well, no because k8s is complicated and seems harder than ganetti :P  but i guess thats what we have
[18:13:09] <ebernhardson>	 so i imagine wrong :P
[18:15:18] <dcausse>	 k8s seems harder because we've already done the puppet work for genetti I guess
[19:02:57] <ryankemper>	 we're presumably going to be moving more and more to k8s in the long run anyway
[19:03:50] <ryankemper>	 so if we're going to get the bulk daemons working for cloudelastic i'd lean towards figuring it out in k8s
[19:23:05] <ottomata>	 +1 ^ :)
[19:32:02] <gehel>	 ebernhardson: meeting? https://meet.google.com/vpb-qcut-qbb
[19:34:30] <dcausse>	 dinner
[19:41:28] <ebernhardson>	 doh! sec
[20:04:46] <ryankemper>	 ebernhardson: we should probably push whatever buttons are necessary to start the codfw->swift index upload
[20:04:47] <ebernhardson>	 hmm, something like 10% of pages sent from imagerec to codfw failed as missing
[20:04:55] <ebernhardson>	 ryankemper: it's running now
[20:05:10] <ryankemper>	 oh, nice
[20:05:24] * ryankemper should have checked SAL
[20:05:29] <ebernhardson>	 sadly the only status report we get is a per-shard finished marker, and none have finished yet :P but it's been running 2.4 hours
[20:07:13] <ryankemper>	 where can I see that status from?
[20:09:28] <ebernhardson>	 actually 10% above is wrong metric. 10% of updated pages did fail as missing, but updated pages was the wrong total metric. Lots of pages were nop's because the recommendation is unchanged.  Overall for codfw 1.56M nop's, 75k pages updated, 4k pages missing
[20:09:50] <ebernhardson>	 i guess that's fine and 4k out of 1.5M doesn't need to be looked at much
[20:57:04] <ebernhardson>	 meh, did a test run of dropping the old image recommendations, it seems to be ignoring the ignore list and trying to delete everything :( always fun :P
[21:18:42] <ebernhardson>	 just missing an int conversion, now it only wants to do 80k pages per cluster instead of 1M, letting it run. 
[22:35:26] <ebernhardson>	 still no shards complete for codfw -> swift snapshot, still seeing the elevated network trafic though so it's probably running
[22:48:44] <ebernhardson>	 i was hoping to start the catchup routine today, but it looks like with the current throttling not going to make it in time. Will probable be able to start the restore and setup the aliases though