Background
A Raspberry Pi Pico (RP2040) had been lying around the office for the last 8 months. During a recent hackathon, I decided to whip it out for a weekend project and thought to myself: I bet I can make a language model run on this.
What I did:
- Tested different hyper-parameters and parameter counts (1-28k) on the Pico itself (rough parameter math sketched after this list)
- Trained the most promising configurations on the TinyStories dataset
- Project status: due to memory fragmentation on the RP2040, I could not fit a vocabulary of more than 256 tokens in SRAM → we are moving on to the Raspberry Pi Zero 2 W for more flexibility
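For a sense of scale, here is a rough sketch of where a parameter budget in this range goes, assuming a plain decoder-only transformer with tied embeddings. The struct fields and the dim/layer values are illustrative, not Starmind-Pico's actual configuration:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical model configuration -- field names are illustrative,
 * not taken from the Starmind-Pico source. */
typedef struct {
    int vocab_size;  /* 256 turned out to be the practical SRAM ceiling */
    int dim;         /* embedding / hidden dimension */
    int n_layers;    /* transformer blocks */
} ModelConfig;

/* Rough parameter count for a tiny decoder-only transformer with tied
 * input/output embeddings: one embedding table plus, per layer,
 * attention (~4 * dim^2) and a 4x-wide feed-forward (~8 * dim^2). */
static uint32_t param_count(const ModelConfig *c) {
    uint32_t embed = (uint32_t)c->vocab_size * c->dim;
    uint32_t per_layer = 12u * (uint32_t)c->dim * c->dim;
    return embed + (uint32_t)c->n_layers * per_layer;
}

int main(void) {
    ModelConfig c = { .vocab_size = 256, .dim = 16, .n_layers = 2 };
    /* 256*16 + 2*12*16*16 = 4096 + 6144 = 10240 params,
     * i.e. ~40 KB as float32 -- already a large slice of the
     * RP2040's 264 KB SRAM before counting activations. */
    printf("params: %lu\n", (unsigned long)param_count(&c));
    return 0;
}
```

At float32, that 10,240-parameter point is about 40 KB of weights before any activations or tokenizer tables, which lines up with the ~10K-parameter ceiling noted in the takeaways below.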
Key takeaways
From the Starmind-Pico development log, here are the key learnings and insights:
Critical Technical Learnings
Memory Management is the Primary Constraint
- Memory fragmentation is more limiting than total memory on the RP2040 (see the arena sketch after this list)
- Safe vocabulary limit: 256 tokens maximum - attempts at 512+ tokens failed
- Practical parameter limit: ~10K parameters for reliable operation
- With larger model dimensions, memory allocation fails before the model has even finished loading
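One common way to work around fragmentation, offered here as a general embedded pattern rather than what Starmind-Pico actually does: reserve a single static arena at link time and carve weight buffers out of it with a bump allocator, so repeated small heap allocations can never fragment the space:

```c
#include <stddef.h>
#include <stdint.h>

/* Single static arena, reserved at link time. The size is
 * illustrative -- the RP2040's 264 KB of SRAM is shared with the
 * stack, heap, and activation buffers. */
#define ARENA_SIZE (96 * 1024)
static uint8_t arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Bump allocator: hand out aligned slices of the arena. Returns NULL
 * when the budget is exhausted. There is no free(), which is fine for
 * weights that live for the program's whole lifetime. */
static void *arena_alloc(size_t n) {
    size_t aligned = (arena_used + 3u) & ~(size_t)3u; /* 4-byte align */
    if (aligned + n > ARENA_SIZE) return NULL;
    arena_used = aligned + n;
    return &arena[aligned];
}

/* Example: allocate an embedding table for vocab x dim floats. */
int load_embeddings(float **out, int vocab, int dim) {
    *out = arena_alloc((size_t)vocab * dim * sizeof(float));
    return *out != NULL; /* 0 on failure: vocab too large for SRAM */
}
```

With every buffer coming from one contiguous block, an oversized vocabulary fails fast at load time as an explicit budget overrun, instead of dying later on a fragmented heap.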
Architecture Impact Hierarchy (Most to Least Critical)
- Dimension Size: 40-50% speed loss per doubling - the ultimate performance killer (see the timing sketch below)
- Layer Depth: 25-40% speed loss per additional layer
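Numbers like these can be sanity-checked on-device. Below is a minimal timing harness using the pico-sdk's `stdio_init_all()` and `time_us_64()`; the bare matvec is a stand-in for the model's forward pass (whose per-layer cost grows roughly with dim²), not the project's actual benchmark. A pure matvec would slow down ~4x per doubling of dim, so the milder 40-50% loss reported above presumably reflects fixed per-token overheads diluting the dim² term.

```c
#include <stdio.h>
#include <stdint.h>
#include "pico/stdlib.h"

#define MAX_DIM 64
static float w[MAX_DIM * MAX_DIM];
static float x[MAX_DIM];
static volatile float sink; /* keeps the compiler from dropping the loop */

/* One dense layer's core op: y = W * x, summed to a scalar here for
 * simplicity. This matvec dominates a tiny transformer's forward pass,
 * and its cost grows with dim^2. */
static float matvec_row_sum(int dim) {
    float total = 0.0f;
    for (int i = 0; i < dim; i++) {
        float acc = 0.0f;
        for (int j = 0; j < dim; j++) acc += w[i * MAX_DIM + j] * x[j];
        total += acc;
    }
    return total;
}

int main(void) {
    stdio_init_all();
    /* Sweep the dimension and report microseconds per matvec, so
     * different configs can be compared on the same silicon. */
    for (int dim = 8; dim <= MAX_DIM; dim *= 2) {
        const int reps = 1000;
        uint64_t t0 = time_us_64();
        for (int r = 0; r < reps; r++) sink = matvec_row_sum(dim);
        uint64_t t1 = time_us_64();
        printf("dim=%3d: %.1f us per matvec\n",
               dim, (float)(t1 - t0) / reps);
    }
    return 0;
}
```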