Needle is a 26M-parameter model that distills Gemini's tool-calling behavior into a lightweight, attention-based architecture. Built from simple attention networks and gating, it runs function calling on consumer devices at 6000 tokens per second for prefill and 1200 tokens per second for decode.
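To put those throughput figures in perspective, here is a rough back-of-the-envelope latency estimate. It is only a sketch: the function name and the example token counts (600 prompt tokens, 60 output tokens) are assumptions made for illustration, and the estimate ignores model loading, tokenization, and any device-specific overhead.

```python
def estimate_latency_s(prompt_tokens, output_tokens,
                       prefill_tps=6000.0, decode_tps=1200.0):
    """Rough end-to-end latency in seconds from the quoted throughput figures.

    prefill_tps and decode_tps default to the numbers quoted above;
    the token counts passed in are whatever the caller assumes.
    """
    prefill_s = prompt_tokens / prefill_tps   # time to read the prompt and tool list
    decode_s = output_tokens / decode_tps     # time to emit the function call
    return prefill_s + decode_s


# Hypothetical request: 600 tokens of tool definitions plus query, 60-token call.
latency = estimate_latency_s(prompt_tokens=600, output_tokens=60)
print(f"{latency * 1000:.0f} ms")
```

Under these assumed sizes, prefill takes about 0.1 s and decode about 0.05 s, so the call would come back in roughly 150 ms.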
Overview
Needle is an experimental run at single-shot function calling, aimed at redefining tiny AI for consumer devices. The model is trained on a dataset of single-shot function calls and can be finetuned locally on a Mac or PC.
Architecture
The Needle model consists of a Simple Attention Network with 12 encoder layers and 8 decoder layers. The model uses cross-attention and gated residual connections to achieve efficient function calling. The entire model is just attention and gating, with no MLPs anywhere.
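The idea of a block that is "just attention and gating" can be sketched in a few lines of NumPy. This is not Needle's actual implementation: the gate form (`x + sigmoid(g) * update`), the single attention head, the dimensions, and every name below are assumptions chosen for illustration, and the sketch leaves out normalization, causal masking, and the stacking of 12 encoder and 8 decoder layers.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q_in, kv_in, wq, wk, wv):
    """Single-head scaled dot-product attention.

    q_in: (Tq, d), kv_in: (Tk, d). Passing the same array for both gives
    self-attention; passing encoder output as kv_in gives cross-attention.
    """
    q, k, v = q_in @ wq, kv_in @ wk, kv_in @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def gated_residual(x, update, gate_logit):
    """Residual connection scaled by a sigmoid gate (assumed form)."""
    gate = 1.0 / (1.0 + np.exp(-gate_logit))
    return x + gate * update


rng = np.random.default_rng(0)
d = 16  # hypothetical model width


def init_qkv():
    # Three random (d, d) projection matrices standing in for learned weights.
    return [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]


tools_and_query = rng.normal(size=(10, d))  # stand-in for embedded encoder input
partial_call = rng.normal(size=(4, d))      # stand-in for embedded decoder input

# Encoder-style step: self-attention followed by a gated residual, no MLP.
enc = gated_residual(tools_and_query,
                     attention(tools_and_query, tools_and_query, *init_qkv()),
                     gate_logit=0.0)

# Decoder-style step: cross-attention over the encoder output, again gated.
dec = gated_residual(partial_call,
                     attention(partial_call, enc, *init_qkv()),
                     gate_logit=0.0)

print(enc.shape, dec.shape)
```

With the gate logit pushed strongly negative the block passes its input through almost unchanged, which is one reason gated residuals are a common way to control how much each layer contributes.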
Usage
Needle handles single-shot function calling and can be finetuned on a user's own tools. It can be run via a web UI or through the command line. The usage examples provided in the source code cover generating data via Gemini, training, evaluating, and bundling results.
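When wiring a single-shot function caller into an application, it usually helps to validate the model's output before executing anything. The sketch below assumes the model emits one JSON object with `name` and `arguments` fields; that output format, the `set_timer` tool, and the `parse_call` helper are all hypothetical and are not taken from Needle's source code.

```python
import json

# Hypothetical tool registry: argument name -> expected Python type.
TOOLS = {
    "set_timer": {"required": {"minutes": int}, "optional": {"label": str}},
}


def parse_call(raw, tools=TOOLS):
    """Parse one model output into (name, args), or raise ValueError."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("expected a JSON object")

    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in tools:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    spec = tools[name]
    allowed = {**spec["required"], **spec["optional"]}
    missing = set(spec["required"]) - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    for key, value in args.items():
        if key not in allowed:
            raise ValueError(f"unexpected argument: {key!r}")
        if not isinstance(value, allowed[key]):
            raise ValueError(f"wrong type for {key!r}")
    return name, args


print(parse_call('{"name": "set_timer", "arguments": {"minutes": 5}}'))
# prints ('set_timer', {'minutes': 5})
```

Rejecting malformed or unknown calls at this boundary keeps a small model's occasional misfires from reaching real device actions.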
Tradeoffs
While Needle beats other models such as FunctionGemma-270m, Qwen-0.6B, Granite-350m, and LFM2.5-350m on single-shot function calling for personal AI, those models have broader scope and more capacity, and they excel in conversational settings. Small models can also be finicky.
When to Use It
Needle is suitable for consumer devices such as phones, watches, and glasses, where a lightweight AI model is required and the task is a single-shot function call, optionally against a user's own tools after finetuning.
Bottom Line
Needle is a promising solution, with limitations, for agentic experiences on budget phones and wearables. It targets single-shot function calling on consumer devices, can be finetuned locally on a Mac or PC on a user's own tools, and can be used via a web UI or through the command line.