[13:53:32] What would you tell a new developer in the Wikimedia ecosystem (let's say a Python coder) to get them up to speed quickly? In the age of AI, how might this help create a "skill file" to give agentic coding systems better knowledge for their coding tasks?
[13:53:34] More context: I was experimenting with AI to "vibe code" a Python script and was seeing the agentic coder fail badly at parsing wiki markup using its own regex patterns. I let it struggle a while before introducing mwparserfromhell, and then it was "happy" and could progress much faster.
[13:53:35] That got me thinking: a list of skills/hints/tricks would be good not just for humans wanting to do more tool-building, but as a shared skills file we could help maintain for agents in the future. Some ideas that come to mind, but I would appreciate more insights:
[13:53:37] 1. Use mwparserfromhell
[13:53:38] 2. Templates tend to have a lot of aliases, so you should probably check for all their variants
[13:53:40] 3. Renaming an article can have a dramatic effect on measuring article traffic/metrics, so take that into account
[13:53:41] 4. We have a lot of useful machine learning models in the Wikimedia Lift Wing API related to article quality and topics
[13:53:43] 5. Quarry and the database replicas exist; they are fast and don't require a lot of scraping/processing
[13:53:44] 6. The Wikimedia APIs will throttle you, so you should rate limit yourself with
[15:42:04] 7. use https://codesearch.wmcloud.org/ (re @fuzheado: What would you tell a new developer in the Wikimedia ecosystem (let's say a Python coder) to get them up to speed quickly? In t...)
[15:45:26] Wow! Very interesting! I did not know this existed (re @IsmaelOlea: 7. use https://codesearch.wmcloud.org/)
[15:45:48] and follow UA policy (re @fuzheado: What would you tell a new developer in the Wikimedia ecosystem (let's say a Python coder) to get them up to speed quickly? In t...)
[15:46:06] Does it use classical IR techniques or modern DL? (re @IsmaelOlea: 7. use https://codesearch.wmcloud.org/)
[15:46:41] it uses hound. idk your abbreviations. (re @super_nabla: Does it use classical IR techniques or modern DL? I think it is just IR.)
[15:47:23] IR = Information retrieval: pattern matching, regex, BM25, TF-IDF. In practice surface-level search
[15:47:23] DL = deep learning (re @jeremy_b: it uses hound. idk your abbreviations.)
[15:47:58] IR then
[15:48:22] I've been quite impressed that the coding agents have been responsibly adding a good UA string, but I agree that's a good thing to codify as a skill (re @jeremy_b: and follow UA policy)
[15:49:36] codesearch is not the most reliable service we have; see also T241033 (re @super_nabla: IR = Information retrieval: pattern matching, regex, BM25, TF-IDF. In practice surface-level search
[15:49:37] DL = deep learning)
[15:50:02] Before AI you already had two groups of people: people who know how to search and people who know _how_ to search
[15:50:39] people who know how to write keywords
[15:50:51] Or use quotes ;)
[22:24:56] A possible other "skill" - consult [[en:Wikipedia:Reliable sources/Perennial sources]] when helping to research an article in order to avoid problematic citations (re @fuzheado: What would you tell a new developer in the Wikimedia ecosystem (let's say a Python coder) to get them up to speed quickly? In t...)
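
For a skills file, ideas 1 and 2 from the list (use mwparserfromhell, and check template aliases) could be sketched roughly as below. This is only an illustration under stated assumptions: the helper names `normalize_template_name`, `is_alias` and `find_templates` are hypothetical, the alias list is supplied by the caller (in practice it would come from the template's redirects), and the normalization assumes a wiki where the first letter of a title is case-insensitive, as on English Wikipedia. mwparserfromhell is a third-party package and is imported only inside the function that needs it.

```python
from typing import Iterable, List


def normalize_template_name(name: str) -> str:
    """Normalize a template name for comparison.

    Assumes Wikipedia-style title rules: underscores are equivalent to
    spaces, surrounding and repeated whitespace is ignored, and only the
    first letter is case-insensitive.
    """
    cleaned = " ".join(name.replace("_", " ").split())
    return cleaned[:1].upper() + cleaned[1:]


def is_alias(name: str, aliases: Iterable[str]) -> bool:
    """Return True if `name` matches any of the caller-supplied aliases."""
    wanted = {normalize_template_name(alias) for alias in aliases}
    return normalize_template_name(name) in wanted


def find_templates(wikitext: str, aliases: Iterable[str]) -> List[object]:
    """Return the templates in `wikitext` whose name matches an alias.

    Parsing is left to mwparserfromhell instead of hand-written regexes.
    """
    import mwparserfromhell  # third-party: pip install mwparserfromhell

    alias_list = list(aliases)
    code = mwparserfromhell.parse(wikitext)
    return [
        template
        for template in code.filter_templates()
        if is_alias(str(template.name), alias_list)
    ]
```

Example use, with an illustrative (not exhaustive) alias list: `find_templates(text, ["Citation needed", "Cn", "Fact"])`. mwparserfromhell also provides a `matches()` method on template names, which may be enough for simple cases.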