[12:25:55] volans, Amir1 retrying IPIP encapsulation on ncredir@ulsfo
[12:27:15] vgutierrez: ack, sending all pages to you then :D
[12:27:29] https://www.irccloud.com/pastebin/0tDRFa9r/
[12:27:33] seems to be happy :)
[12:27:48] yay
[12:27:48] port 443 too
[12:27:55] 🍾
[12:29:15] awesome
[12:32:28] https://grafana.wikimedia.org/goto/YvrDD0NSk?orgId=1 --> ipip-multiqueue-optimizer metrics are being scraped as expected
[12:32:34] I'll create a dashboard after lunch
[12:32:39] tcp_mss_clamper aren't there for some reason
[12:32:45] ok
[12:33:31] tcp_mss_clamper_packets_total{interface="ens13",state="clamped"} 2082
[12:33:51] but curl seems to report them properly
[12:45:43] Folks, I made an error in Netbox a moment ago - I am currently in the process of restoring from DB. Can I ask anyone doing work that needs to modify Netbox to pause for the next few mins?
[12:48:29] ok
[13:32:29] Netbox should now be back up and working - apologies for the interruption
[14:42:34] jbond: https://gerrit.wikimedia.org/r/c/operations/puppet/+/765257 broke Puppet on some Cloud VPS VMs, fix is https://gerrit.wikimedia.org/r/c/operations/puppet/+/978611/
[14:49:52] taavi: sorry about that, thanks
[16:30:43] https://labs.ripe.net/author/emileaben/does-the-internet-route-around-damage-edition-2023/
[16:52:26] if anyone has time for a quick +1, this just moves hosts out of insetup: https://gerrit.wikimedia.org/r/c/operations/puppet/+/978634
[16:54:15] inflatador: Done.
[16:54:42] btullis: excellent! Once again I am in your debt ;)
[16:59:51] A pleasure.
[17:16:01] cdanis: that's a good read, thanks :)
[17:48:06] godog: If I understood correctly, we do still retain Prometheus metrics in Thanos for more than 1 year, but only at longer intervals (e.g. 5min, 1h, instead of 1m). How do I query these in Grafana?
I can't seem to make them show up in https://grafana-rw.wikimedia.org/d/000000066/resourceloader?orgId=1&from=1663232400000&to=1673643600000&viewPanel=39&forceLogin&editPanel=39
[17:48:24] the query uses irate[5m] currently, although if I change it to 1h it similarly ends 1 year ago
[17:48:35] ref https://wikitech.wikimedia.org/wiki/Prometheus#FAQ
[17:52:00] My understanding of Prometheus/Thanos is that I wouldn't need to change the query in this way, but it's something random I tried in case it mattered.
[18:50:16] Krinkle: I suspect we shortened the retention in https://phabricator.wikimedia.org/T311690 - profile::thanos::retention::raw in Puppet looks to be set to 54w, which lines up reasonably with where your query cuts off.
[18:51:32] hmm, the wiki page clearly says that's only for 1-minute retention though, so I dunno
[19:03:58] yeah... there are 3 separate retentions
[19:04:07] but it seems Thanos only serves the raw one, best I can tell
[19:26:39] Krinkle: yeah, will be good to sync up w/ godog about this. FWIW downsampled metrics can be seen using thanos.wm.o, and I know the Grafana datasource can be configured with the max_source_resolution param to get at the downsampled metrics too. There's also a Thanos query option to enable auto-downsampling. In any event I added an "Experimental Thanos Downsample" datasource for the time being, but would be good to sync up on it
[19:43:25] Hey there, is there a way to find individual Envoy logs? For T352328 we're trying to debug why Envoy doesn't like our new image version, but I can't find any errors in https://grafana.wikimedia.org/d/b1jttnFMz/envoy-telemetry-k8s for us (wikifunctions) in eqiad staging (and we're not defined for codfw staging?).
[19:43:26] T352328: Cannot deploy new Evaluator images to prod; calls fail with `upstream connect error or disconnect/reset before headers. reset reason: connection termination` - https://phabricator.wikimedia.org/T352328
[19:49:59] James_F: I do see metrics for function-orchestrator and function-evaluator in eqiad/k8s-staging on that dashboard.
[19:57:59] James_F: https://logstash.wikimedia.org/goto/73e57e05a25f033dc005c321929b8846 the EPIPE from orchestrator seems related?
[20:12:20] taavi: Interesting, yeah, maybe?
[21:03:50] hi folks, including on-callers: Traffic has made some changes to the DNS hosts today, summarized in https://phabricator.wikimedia.org/T347054#9368653
[21:04:20] no action is required from your side, but if you see any issues, please let us know. Such an issue can be: failure running authdns-update, DNS failures when running a cookbook
[21:04:21] sukhe: coooool
[21:18:42] we will send out a detailed email when the changes are complete so people not on IRC are also aware. thanks!
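[Editor's note] The Thanos downsampling exchange above ([17:48:24] through [19:26:39]) can be illustrated with a small sketch. Thanos extends the Prometheus HTTP API with a `max_source_resolution` parameter on `query_range`, which controls whether downsampled blocks (5m, 1h) may be read; this also shows why `irate[5m]` goes blank past raw retention, since a 5m window can never contain two samples of a 1h-resolution series. The base URL and metric name below are placeholders, not the actual production endpoints:

```python
from urllib.parse import urlencode

def thanos_range_query_url(base, query, start, end, step,
                           max_source_resolution="auto"):
    # Thanos's /api/v1/query_range accepts an extra max_source_resolution
    # parameter selecting which blocks may be read: "0s" (raw only),
    # "5m", "1h", or "auto" (derived from step).
    params = urlencode({
        "query": query,
        "start": start,   # unix seconds
        "end": end,       # unix seconds
        "step": step,     # seconds between evaluation points
        "max_source_resolution": max_source_resolution,
    })
    return f"{base}/api/v1/query_range?{params}"

# Placeholder host and metric. Note the rate() window is widened to 2h
# so it always spans at least two samples of the 1h downsampled series;
# irate[5m] cannot, which matches the panel cutting off at raw retention.
url = thanos_range_query_url(
    "https://thanos.example.org",
    "sum(rate(some_requests_total[2h]))",
    1663232400, 1673643600, 3600,
    max_source_resolution="1h",
)
print(url)
```

In Grafana the equivalent is what was done above: a separate Prometheus datasource pointing at the Thanos querier with `max_source_resolution` set (or auto-downsampling enabled on the querier itself).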