[13:43:04] anyone want to stamp https://gerrit.wikimedia.org/r/c/operations/puppet/+/1064028 ?
[13:44:31] cdanis: sure...
[13:46:06] thanks jesse :)
[14:23:52] Hey folks! My plan of impersonating citoid but with actual knowledge of response headers has been foiled once again – it seems like I can't reach url-downloader from toolforge as I had initially hoped. We've now ruled out using curl from the cluster, proxying from the cluster, using toolforge, and so on. Is there a next option?
[14:25:15] Although honestly, the more I poke at the sites that are giving us trouble, the more I suspect that most people aren't using IP-based blocks but are instead detecting requests that don't send a browser User-Agent, and that there's less merit to this approach than I initially thought
[14:26:33] zip: not sure of the context here, but using IP-based blocks has limitations, so the smart folks would probably not rely on it
[14:26:35] Is there some other path, such as VPN access or getting my home IP into an access whitelist, that would be open to me as an editing team engineer?
[14:26:40] yeah, that's what I figured
[14:27:15] realistically it sounds like I need to build better observability into the tool itself
[14:27:22] but in our case the number of IP blocks we have is small and easy to discover, so it could possibly be quite effective
[14:27:45] zip: I would be happy to help with OpenTelemetry integration for traces, if that's something you're thinking about
[14:27:45] I saw we have a good number of /23 and /24 blocks!
[14:27:59] cdanis: that would be very helpful!
[14:30:02] sounds like I actually have to eat my vegetables and make our tools pleasant to debug
[14:30:57] I don't know enough of the context to help you with your original question, but more vegetables also sounds good
[14:31:40] in terms of reachability to url-downloader, I presume this is to squid on port 8080?
[14:31:48] I believe so
[14:32:03] well, to summarise, we have quite a portion of spaghetti here to deal with
[14:32:05] it has a public IP so the traffic probably gets there from toolforge, but there are rules on the box allowing only our prod IP ranges to talk to it
[14:32:29] aye, my hope had been that toolforge might count as "prod"
[14:32:47] no, we have fairly strict rules between toolforge/cloud services/rest of prod.
[14:33:10] however, as it has a public IP, I think the rules blocking it are the url-downloader VM ones
[14:33:34] I'm not sure who manages that service to say whether access from any given place is ok or not
[14:33:51] oh, it's the Infrastructure Foundations team :P
[14:34:03] (i.e. me)
[14:34:35] temporary permission for some troubleshooting sounds okay to me
[14:34:59] (preferably on the order of hours, so we don't forget)
[14:35:13] it's just a basic squid proxy, right? is it that you are doing experiments and want to make requests through it so those requests hit the outside with the same IPs as other things we have in production?
[14:35:23] so, we have a plugin, Citoid, that allows a user to paste a URI (or I think ISBNs and such) into a box and get a nicely formatted citation back out of it. It talks to a backend, citoid-server, that has some basic rules for generating citations but usually delegates to our instance of zotero-server, which is basically a server-ified backend to a desktop app usually used by academics to generate citations
[14:35:54] this tool has hundreds of little snippets of javascript to generate citations from all sorts of providers, so it would be a major pain to replace it.
it's also entirely unobservable
[14:36:02] cool - I've actually used it, it's quite nice
[14:36:07] wasn't really sure how it worked :)
[14:36:28] I believe (but am not yet sure how to confirm) that they all go via url-downloader to access the interwebs
[14:36:41] they would have to
[14:37:01] topranks: that's correct, although I'm hoping to make debugging specific sites in general easier rather than just doing one-offs
[14:37:26] local experimentation revealed that a lot of our issues were just bouncing off cloudflare, and we've got a (still in-flight, I think) request in to be recognised as an acceptable bot
[14:37:36] so - you're running another instance of citoid to do some debugging/development?
[14:37:47] yup
[14:38:19] can you change the proxy it's configured to use? there may be another one available in cloud realm
[14:38:40] that and `chromium --guest --user-agent=citoid https://...`
[14:39:51] Theoretically, although I'm not sure if that's much use.
[14:39:59] zip: despite being spaghetti, if zotero has reasonable call stacks, otel auto-instrumentation might help you figure things out
[14:40:31] aha, I'm writing that one down, thank you
[14:41:12] citoid, as it goes, does not have any plugins for doing instrumentation but _does_ have some custom code that at least manages to pick up (or generate) a trace-id and hand it downstream, and I believe put it in logs
[14:41:42] that's a good start
[14:42:06] https://opentelemetry.io/docs/zero-code/js/
[14:42:26] and for local dev, the all-in-one jaeger quickstart is good: https://www.jaegertracing.io/docs/1.60/getting-started/
[14:42:57] thanks!
[14:43:25] np! happy to help more
[14:43:32] I've already found the opensearch dashboard, so I think I've found where it comes out the other end, unless we've got more overt tracing tools?
[14:43:41] we have trace.wikimedia.org :)
[14:43:53] logs will come out in opensearch
[14:44:08] (technically traces live in the opensearch index as well, but they aren't much to look at, there)
[14:44:33] I did not discover that!
[14:44:44] I should have asked in here weeks ago
[14:44:49] it's pretty new. I keep being about 1-2 weeks away from sending an announcement
[14:46:35] neat!
[14:46:59] though this doesn't seem to need much authentication?
[14:47:27] it uses our dev account SSO and restricts to wmf or nda
[14:47:29] https://trace.wikimedia.org/trace/8f75bec12f2a718d8f639aecda48fa50
[14:47:41] sample 'interesting' trace
[14:48:29] ah, I was just silently already authenticated
[14:49:17] time to pull my copy of Observability Engineering off the shelf, eh
[14:53:36] it looks like while we are logging a request_id field, it's not necessarily getting associated properly in the logs, so I expect that fixing that up is my next order of business
[14:53:42] s/logs/trace
[14:55:15] for traces you'll also need to propagate along `traceparent` and `tracestate`, in addition to `x-request-id`
[14:56:00] aha
[14:56:45] thank you for your help today!
[14:57:55] np!
[14:58:08] the zero-code/auto-instrumentation stuff I mentioned earlier will also attempt to do that for you
[14:58:37] as long as you're using, like, fairly common Node.js http server and client libraries, and not doing anything too weird with incoming requests like handing them off to threadpools or something
[14:59:27] seems like there's a lot of reading in my future
[14:59:34] yeah, I expect that's the case
[15:00:03] do the 'quick start' pieces on both the things I linked before you do anything else ;)
[15:00:56] sure
[15:01:41] the other Zotero problem is that it just dumps raw tracebacks into the log, so we don't ingest any of it
[15:01:59] I'll be glad to have that nonsense fixed
[15:02:51] yes please <3
[15:23:15] Is there an HTTP/2-compatible equivalent of Apache Bench available in prod?
Trying to get some real numbers for our k8s service changes, but we just get `426 Upgrade Required` responses.
[15:25:04] James_F: `wrk` perhaps?
[15:25:21] cdanis: Just found that we have siege; if that doesn't work I'll try wrk, thanks!
[15:25:43] cool, I just learned about that one
[15:32:48] Seems that using POSTs is hard with either. Sigh. Something to investigate post-meetings.
[15:47:16] James_F: looks like our version of wrk supports `-s` for Lua scripting, and this should get you started: https://stackoverflow.com/questions/15261612/post-request-with-wrk
[15:47:27] Oh, fancy.
[15:47:28] Thanks.
[15:51:54] https://github.com/giltene/wrk2 might be worth checking out too
[15:52:12] brett: yeah, I was limiting myself to tools I saw already installed on deploy1002 :)
[15:52:17] aha
[15:52:19] er, 1003
[20:31:12] !log imported php8.1_8.1.29-1+wmf11u1 into component/php81 - T372507
[20:31:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:31:15] T372507: Prepare WMF PHP 8.1 packages for Bullseye - https://phabricator.wikimedia.org/T372507
[21:07:57] Megacli! I thought I left you far behind ;)
[21:08:16] * inflatador is just realizing some of the stat hosts have HW RAID
[21:27:36] megacli may haunt SREs (and man page writers) forever ;)
[21:36:02] I'm being transported back to the fascinating world of battery learning cycles!
[21:40:55] I've always been tempted to construct a haunted SRE VM for halloween; megacli would definitely be included!!
[21:45:22] Now that is a holiday tradition I can get behind
[21:50:01] I think I've been pretty lucky to have had very minimal direct contact with hardware raid controllers in my career
[21:50:16] on the other hand, I've had above-average distributed filesystem exposure
[21:51:09] you get your scars one way or the other ;)
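(Editor's note: the Stack Overflow link in the wrk discussion above boils down to a short script passed via `-s`. wrk scripts are necessarily Lua; this is an untested sketch, and the body/endpoint are made-up placeholders, not the actual k8s service under test.)

```lua
-- post.lua: sketch of a wrk -s script for POST load testing.
-- wrk reads these globals to shape each request it issues.
wrk.method = "POST"
wrk.body   = '{"example": "placeholder payload"}'
wrk.headers["Content-Type"] = "application/json"
```

Invoked as something like `wrk -t2 -c16 -d30s -s post.lua https://service.example/endpoint` (threads, connections, duration, and URL are placeholders to adjust).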