Background
A Raspberry Pi Pico (RP2040) had been lying around the office for the last 8 months. During a recent hackathon, I decided to whip it out for a weekend project and thought to myself: I bet I can make a language model run on this.
What I did:
- Tested different hyper-parameters and parameter counts (1-28k) on the Pico itself (rough parameter math sketched after this list)
- Trained the most promising configurations on the TinyStories dataset
- Project status: due to memory fragmentation on the RP2040, I could not fit a vocabulary of more than 256 tokens in SRAM → we are moving on to the Raspberry Pi Zero 2 W for more flexibility
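For a sense of scale, here is a rough sketch of where a parameter budget in this range goes, assuming a plain decoder-only transformer with tied embeddings. The struct fields and the dim/layer values are illustrative, not Starmind-Pico's actual configuration:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical model configuration -- field names are illustrative,
 * not taken from the Starmind-Pico source. */
typedef struct {
    int vocab_size;  /* 256 turned out to be the practical SRAM ceiling */
    int dim;         /* embedding / hidden dimension */
    int n_layers;    /* transformer blocks */
} ModelConfig;

/* Rough parameter count for a tiny decoder-only transformer with tied
 * input/output embeddings: one embedding table plus, per layer,
 * attention (~4 * dim^2) and a 4x-wide feed-forward (~8 * dim^2). */
static uint32_t param_count(const ModelConfig *c) {
    uint32_t embed = (uint32_t)c->vocab_size * c->dim;
    uint32_t per_layer = 12u * (uint32_t)c->dim * c->dim;
    return embed + (uint32_t)c->n_layers * per_layer;
}

int main(void) {
    ModelConfig c = { .vocab_size = 256, .dim = 16, .n_layers = 2 };
    /* 256*16 + 2*12*16*16 = 4096 + 6144 = 10240 params,
     * i.e. ~40 KB as float32 -- already a large slice of the
     * RP2040's 264 KB SRAM before counting activations. */
    printf("params: %lu\n", (unsigned long)param_count(&c));
    return 0;
}
```

At float32, that 10,240-parameter point is about 40 KB of weights before any activations or tokenizer tables, which lines up with the ~10K-parameter ceiling noted in the takeaways below.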
Key takeaways
From the Starmind-Pico development log, here are the key learnings and insights:
Critical Technical Learnings
Memory Management is the Primary Constraint
- Memory fragmentation is more limiting than total memory on the RP2040 (see the arena sketch after this list)
- Safe vocabulary limit: 256 tokens maximum - attempts at 512+ tokens failed
- Practical parameter limit: ~10K parameters for reliable operation
- With larger model dimensions, memory allocation fails before the model has even finished loading
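One common way to work around fragmentation, offered here as a general embedded pattern rather than what Starmind-Pico actually does: reserve a single static arena at link time and carve weight buffers out of it with a bump allocator, so repeated small heap allocations can never fragment the space:

```c
#include <stddef.h>
#include <stdint.h>

/* Single static arena, reserved at link time. The size is
 * illustrative -- the RP2040's 264 KB of SRAM is shared with the
 * stack, heap, and activation buffers. */
#define ARENA_SIZE (96 * 1024)
static uint8_t arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Bump allocator: hand out aligned slices of the arena. Returns NULL
 * when the budget is exhausted. There is no free(), which is fine for
 * weights that live for the program's whole lifetime. */
static void *arena_alloc(size_t n) {
    size_t aligned = (arena_used + 3u) & ~(size_t)3u; /* 4-byte align */
    if (aligned + n > ARENA_SIZE) return NULL;
    arena_used = aligned + n;
    return &arena[aligned];
}

/* Example: allocate an embedding table for vocab x dim floats. */
int load_embeddings(float **out, int vocab, int dim) {
    *out = arena_alloc((size_t)vocab * dim * sizeof(float));
    return *out != NULL; /* 0 on failure: vocab too large for SRAM */
}
```

With every buffer coming from one contiguous block, an oversized vocabulary fails fast at load time as an explicit budget overrun, instead of dying later on a fragmented heap.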
Architecture Impact Hierarchy (Most to Least Critical)
- Dimension Size: 40-50% speed loss per doubling - the ultimate performance killer (see the timing sketch below)
- Layer Depth: 25-40% speed loss per additional layer
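Numbers like these can be sanity-checked on-device. Below is a minimal timing harness using the pico-sdk's `stdio_init_all()` and `time_us_64()`; the bare matvec is a stand-in for the model's forward pass (whose per-layer cost grows roughly with dim²), not the project's actual benchmark. A pure matvec would slow down ~4x per doubling of dim, so the milder 40-50% loss reported above presumably reflects fixed per-token overheads diluting the dim² term.

```c
#include <stdio.h>
#include <stdint.h>
#include "pico/stdlib.h"

#define MAX_DIM 64
static float w[MAX_DIM * MAX_DIM];
static float x[MAX_DIM];
static volatile float sink; /* keeps the compiler from dropping the loop */

/* One dense layer's core op: y = W * x, summed to a scalar here for
 * simplicity. This matvec dominates a tiny transformer's forward pass,
 * and its cost grows with dim^2. */
static float matvec_row_sum(int dim) {
    float total = 0.0f;
    for (int i = 0; i < dim; i++) {
        float acc = 0.0f;
        for (int j = 0; j < dim; j++) acc += w[i * MAX_DIM + j] * x[j];
        total += acc;
    }
    return total;
}

int main(void) {
    stdio_init_all();
    /* Sweep the dimension and report microseconds per matvec, so
     * different configs can be compared on the same silicon. */
    for (int dim = 8; dim <= MAX_DIM; dim *= 2) {
        const int reps = 1000;
        uint64_t t0 = time_us_64();
        for (int r = 0; r < reps; r++) sink = matvec_row_sum(dim);
        uint64_t t1 = time_us_64();
        printf("dim=%3d: %.1f us per matvec\n",
               dim, (float)(t1 - t0) / reps);
    }
    return 0;
}
```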