[06:45:45] hello folks, I've just run `apt-get clean` on cp5012 since the root partition was at 94%
[06:46:02] now we are at 76%, not sure if any follow up is needed (the partition is very tiny)
[07:05:58] thanks elukey
[07:07:17] well the root partition is 10G, not sure if that qualifies as "very tiny" :)
[07:08:59] ema: when you work on analytics systems it probably is :)
[07:09:45] this is an easy way to mock people working on challenging production environments :D
[07:10:30] ema: I don't want to start a religious war about partition sizes, in my experience at WMF it takes no time to fill 10G :D
[07:17:09] elukey: for sure, I agree that they should be bigger (and indeed on other cache PoPs they are). The minimalist in me just objected to the "very tiny" part :)
[07:19:30] see https://wikitech.wikimedia.org/wiki/Traffic_cache_hardware - eqsin and ulsfo have the legacy 'L' configuration, just an 800G disk for both the OS and the persistent cache
[07:20:01] of which ~10G are used for the OS and the rest for the on-disk ats-be cache
[07:21:18] in other DCs instead we have 200+G disks for the OS, plus a separate large NVMe (1.6T) for ATS
[07:25:42] yep yep
[07:35:13] interestingly there's quite some variance on how much space is occupied on different eqsin hosts
[07:35:26] we go from 65% to 90%:
[07:35:33] cp5001.eqsin.wmnet: /dev/md0 ext4 9.1G 5.6G 3.1G 65% /
[07:35:39] cp5011.eqsin.wmnet: /dev/md0 ext4 9.1G 7.8G 926M 90% /
[07:36:02] hmm that's interesting
[07:36:10] considering that our production load shouldn't touch /
[07:36:23] and I see that, for instance, cp5011 has 1.6G of logs, while cp5001 only ~300M
[07:36:35] upload VS text?
[07:37:47] yeah, there are many ats-tls 'HTTP/2 connection error' logs under /var/log/messages.1 on cp5011, the file is 359M
[07:37:59] worth looking at :)
[07:38:56] headers compression error
[07:39:21] at quite some rate
[08:44:27] Traffic: Low root disk space on multiple eqsin cp nodes - https://phabricator.wikimedia.org/T290305 (ema)
[08:44:36] Traffic: Low root disk space on multiple eqsin cp nodes - https://phabricator.wikimedia.org/T290305 (ema) p:Triage→High
[15:48:58] Traffic, Wikidata-Query-Service: 502 Bad Gateway on WDQS - https://phabricator.wikimedia.org/T290330 (Bugreporter) p:Triage→Unbreak!
[15:50:04] Traffic, Wikidata-Query-Service: 502 Bad Gateway on WDQS - https://phabricator.wikimedia.org/T290330 (Bugreporter)
[15:50:06] Traffic, Wikidata-Query-Service: 502 Bad Gateway on WDQS - https://phabricator.wikimedia.org/T290330 (Ladsgroup) p:Unbreak!→Triage Your port is wrong. Right? The URL works for me.
[15:51:05] Traffic, Wikidata-Query-Service: 502 Bad Gateway on WDQS - https://phabricator.wikimedia.org/T290330 (Ladsgroup) ` amsa@C382:~$ curl --resolve query.wikidata.org:443 "https://query.wikidata.org/sparql?query=prefix%20schema:%20%3Chttp://schema.org/%3E%20SELECT%20*%20WHERE%20%7B%3Chttp://www.wikidata.org%3...
[15:51:51] Traffic, Wikidata-Query-Service: 502 Bad Gateway on WDQS - https://phabricator.wikimedia.org/T290330 (Bugreporter) I can confirm via simply viewing https://query.wikidata.org/sparql?query=prefix%20schema:%20%3Chttp://schema.org/%3E%20SELECT%20*%20WHERE%20%7B%3Chttp://www.wikidata.org%3E%20schema:dateModi...
[15:52:23] Traffic, Wikidata-Query-Service: 502 Bad Gateway on WDQS - https://phabricator.wikimedia.org/T290330 (Bugreporter) >>! In T290330#7331145, @Ladsgroup wrote: > ` > amsa@C382:~$ curl --resolve query.wikidata.org:443 "https://query.wikidata.org/sparql?query=prefix%20schema:%20%3Chttp://schema.org/%3E%20SELE...
[15:54:43] Traffic, Wikidata-Query-Service: 502 Bad Gateway on WDQS - https://phabricator.wikimedia.org/T290330 (Bugreporter) Note there is no problem just accessing https://query.wikidata.org/, it only returns 502 when a query is executed.
[15:58:03] Traffic, Wikidata-Query-Service: 502 Bad Gateway on WDQS on ulsfo - https://phabricator.wikimedia.org/T290330 (Ladsgroup)
[15:58:41] someone is saying wdqs is not accessible through ulsfo ^ bblack
[15:58:43] Traffic, Wikidata-Query-Service: 502 Bad Gateway on WDQS on ulsfo - https://phabricator.wikimedia.org/T290330 (Gehel)
[15:59:34] Amir1: dunno if traffic related, we're getting some backend alerts regarding wdqs on -operations
[15:59:40] hmm, it seems it's codfw only
[15:59:46] vgutierrez: noted, sorry.
[16:00:43] vgutierrez, Amir1 we're sending an email right now to the wikidata ML. We do have instabilities in WDQS codfw, and the root cause isn't clear yet.
[16:00:58] thx gehel
[16:01:04] thanks!
[19:42:54] Traffic, Platform Engineering, Wikimedia-production-error: Wikimedia\Assert\PostconditionException: Postcondition failed: makeTitleSafe() should always return a Title for the text returned by getRootText(). - https://phabricator.wikimedia.org/T290194 (Jdlrobson) p:Medium→Low ? Looking closel...