
Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

Needle is a 26M-parameter model that distills Gemini tool calling into a lightweight architecture built on simple attention networks and gating, targeting efficient function calling on consumer devices. By forgoing massive models and reasoning-heavy designs, Needle runs at 6000 tokens per second on prefill and 1200 tokens per second on decode, which makes it a candidate for agentic experiences on budget phones and wearables.
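To put the throughput figures in perspective, here is a rough back-of-the-envelope latency estimate. It assumes both quoted rates hold constant for the whole request and ignores tokenization and startup overhead; the 300-token prompt and 30-token call are hypothetical sizes, not measurements from the source.

```python
PREFILL_TOKENS_PER_SEC = 6000  # prefill throughput quoted above
DECODE_TOKENS_PER_SEC = 1200   # decode throughput quoted above


def estimate_latency_ms(prompt_tokens: int, output_tokens: int) -> float:
    """Rough end-to-end latency, assuming both rates stay constant."""
    prefill_s = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode_s = output_tokens / DECODE_TOKENS_PER_SEC
    return 1000.0 * (prefill_s + decode_s)


# Hypothetical request: 300 tokens of prompt plus tool schemas, 30 tokens of call.
latency = estimate_latency_ms(300, 30)
print(f"{latency:.0f} ms")  # 75 ms
```

Under these assumptions, prefill accounts for 50 ms and decode for 25 ms, so the number of generated tokens affects latency five times more per token than the prompt length does.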


Overview

Needle is an experimental run at single-shot function calling, aimed at redefining tiny AI for consumer devices. The model is trained on a dataset of single-shot function calls and can be finetuned locally on a Mac or PC.

Architecture

The Needle model consists of a Simple Attention Network with 12 encoder layers and 8 decoder layers. The model uses cross-attention and gated residual connections to achieve efficient function calling. The entire model is just attention and gating, with no MLPs anywhere.
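As an illustration of what an "attention and gating only" decoder block might look like, here is a minimal pure-Python sketch. It is not Needle's actual implementation: the sigmoid per-feature gate, the single attention head, and the omission of learned projections, normalization, and causal masking are all simplifying assumptions, and every function name is hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(queries, keys_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, values)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_residual(x, update, gate_logits):
    """x + sigmoid(gate) * update, one gate per feature (an assumed form)."""
    gates = [sigmoid(g) for g in gate_logits]
    return [[xi + g * ui for xi, ui, g in zip(x_row, u_row, gates)]
            for x_row, u_row in zip(x, update)]


def decoder_block(x, memory, self_gate, cross_gate):
    """Self-attention, then cross-attention over encoder output; no MLP."""
    x = gated_residual(x, attention(x, x, x), self_gate)
    return gated_residual(x, attention(x, memory, memory), cross_gate)


random.seed(0)
dim = 4
memory = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]  # encoder output
x = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]       # decoder states
out = decoder_block(x, memory, [0.0] * dim, [0.0] * dim)
print(len(out), len(out[0]))  # 3 4
```

In this sketch the gates control how much of each attention update is mixed into the residual stream: strongly negative gate logits leave the input almost unchanged, while large positive ones approach a plain residual connection.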

Usage

Needle can be used for single-shot function calling, and can be finetuned on a user's own tools. The model can be used via a web UI or through the command line. The usage examples provided in the source code include generating data via Gemini, training, evaluating, and bundling results.
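To make "single-shot function calling" concrete, the sketch below validates one model-emitted call against a small tool registry before dispatching it. The JSON shape (`name` plus `arguments`), the `TOOLS` registry, and `parse_call` are all hypothetical; the source does not specify Needle's output format, so treat this as one plausible wrapper rather than the project's API.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS = {
    "set_timer": {"minutes": int},
    "send_message": {"recipient": str, "body": str},
}


def parse_call(raw: str) -> tuple[str, dict]:
    """Parse and validate one function call emitted as JSON text."""
    call = json.loads(raw)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    schema = TOOLS[name]
    missing = set(schema) - set(args)
    extra = set(args) - set(schema)
    if missing or extra:
        raise ValueError(
            f"bad arguments for {name}: missing={sorted(missing)} extra={sorted(extra)}"
        )
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise ValueError(f"{key} should be {expected.__name__}")
    return name, args


# Assumed model output for a request like "set a timer for ten minutes".
name, args = parse_call('{"name": "set_timer", "arguments": {"minutes": 10}}')
print(name, args)  # set_timer {'minutes': 10}
```

Since small models can be finicky, a strict validation step like this is a cheap way to reject malformed or hallucinated calls before they reach device APIs.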

Tradeoffs

Needle beats FunctionGemma-270m, Qwen-0.6B, Granite-350m, and LFM2.5-350m on single-shot function calling for personal AI. Those models, however, have more scope and capacity and excel in conversational settings. Small models can also be finicky.

When to Use It

Needle suits consumer devices such as phones, watches, and glasses, where a lightweight model is required and the task is single-shot function calling, optionally after finetuning on a user's own tools.

Bottom Line

Needle is a promising solution for agentic experiences on budget phones and wearables. It handles single-shot function calling, can be finetuned locally on a Mac or PC with a user's own tools, and can be used via a web UI or through the command line. It has its limitations, but it remains a lightweight option for consumer devices.
