
"The easy SaaS wave is over;
The next phase of wealth creation belongs to those with fundamental, systems-level AI expertise."

Merging American and Chinese AI models into one,
we created Rubin:
strong performance with efficient inference and training,
outpacing an earlier vanilla Transformer model by 52%.
Rubin Squirrel 0.5B
A lightweight 500 million parameter model built for research and education. Ideal for learning the fundamentals of training, fine-tuning, and deployment, it provides fast iteration speed and serves as a perfect entry point for exploring transformer-based architectures without heavy compute requirements.
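For orientation, here is a minimal sketch of loading a small causal LM with the Hugging Face transformers API. The repository id used below is a placeholder, not a confirmed release name; check brainoidlabs' release page for the actual one.

```python
# Minimal sketch: loading and sampling from a small causal LM with
# Hugging Face transformers. The model id is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "brainoidlabs/rubin-squirrel-0.5b"  # placeholder id, not confirmed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The transformer architecture", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```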
Rubin Omni 2B
A medium-scale foundation model trained with pretraining, supervised fine-tuning (SFT), and GRPO (Group Relative Policy Optimization). Designed for practical applications, it offers strong generalization while remaining efficient, making it a balanced choice for academic and applied research.
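To illustrate what GRPO changes relative to PPO-style methods, here is a minimal sketch of its group-relative advantage computation: each prompt gets a group of sampled completions, and each completion's reward is normalized against the group's mean and standard deviation, so no learned value network is needed. This shows the general technique, not Rubin's training code.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: each completion in a group is scored
    against the group mean and std, removing the need for a critic.

    rewards: [num_prompts, group_size] scalar rewards per completion.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: one prompt with a group of 4 sampled completions.
r = torch.tensor([[0.1, 0.9, 0.4, 0.6]])
print(grpo_advantages(r))
```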
Rubin Omega 8B
A large-scale model featuring Latent Head Attention, enabling improved representation learning and context management. This scaled version is optimized for both reasoning and generation tasks, delivering robust performance in complex workloads while remaining manageable on modern GPU clusters.
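brainoidlabs has not published the exact Latent Head Attention formulation, so the sketch below shows only the general idea such designs usually share (as in DeepSeek's MLA): keys and values are compressed through a small shared latent, and the cache stores that latent rather than full per-head K/V. All module and dimension names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LatentKVCompression(nn.Module):
    """Hypothetical sketch of latent attention: K/V are compressed into a
    small shared latent, then expanded per head, shrinking the KV cache.
    The exact Rubin LHA formulation is not published."""
    def __init__(self, d_model: int, d_latent: int, n_heads: int, d_head: int):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)  # compress to latent
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.n_heads, self.d_head = n_heads, d_head

    def forward(self, x: torch.Tensor):
        b, t, _ = x.shape
        latent = self.down(x)  # [b, t, d_latent] -- this is what gets cached
        k = self.up_k(latent).view(b, t, self.n_heads, self.d_head)
        v = self.up_v(latent).view(b, t, self.n_heads, self.d_head)
        return k, v, latent
```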
Rubin Dragon 24B [8x 3B]
Our flagship model built with a Mixture of Experts (MoE) architecture combined with DeepSeek's MLA and DSA, leading to more coherent, efficient, and explainable outputs. It scales to enterprise-grade workloads, supporting advanced research, production deployments, and cutting-edge applications.
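For readers new to MoE, a minimal top-k routing layer looks roughly like the sketch below: a linear router scores the experts per token, the top k are run, and their outputs are mixed by the softmaxed router weights. The expert count mirrors the 8x3B shape above, but everything else (hidden sizes, k=2) is an illustrative assumption, not Rubin's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k mixture-of-experts layer: a router picks k experts
    per token and combines their outputs by the routing weights."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.router(x)                     # [b, t, n_experts]
        weights, idx = logits.topk(self.k, dim=-1)  # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e          # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```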
| Specs | Rubin 500M | Rubin 2B | Rubin 8B | Rubin 24B |
|---|---|---|---|---|
| Parameters | 500M | 2B | 8B | 24B |
| Training tokens | 26B | 800B | 1.6T | 2.7T |
| Training code | ✔ | ✔ | ✔ | ✔ |
| Inference code | ✔ | ✔ | ✔ | ✔ |
| Inference mode | KV cache | KV cache + sink | KV cache | KV cache |
| Optimizer | Muon with AdamW | Muon with AdamW | Muon with AdamW | Custom Muon with AdamW |
| LR scheduler | Trapezoidal with cosine annealing | Trapezoidal | Warmup with cosine annealing | Cosine annealing |
| Attention | Grouped Query Attention | Grouped Query Attention | Grouped Query Attention with LHA | Sliding Window with MoE & LHA |
| Positional encoding | Standard RoPE | YaRN RoPE | YaRN RoPE | YaRN RoPE |
| Normalisation | RMSNorm | RMSNorm | RMSNorm | RMSNorm |
| Model weights | Available | Available | To be released soon | To be released soon |
| GPUs used | 1x H100 SXM | 8x H200 SXM | 4x B200 | 32x B200 |
| End-to-end training | ✔ | ✔ | ✔ | ✔ |
| Purpose | Research | Production | Production | Production |
| Local inference | ✔ | ✔ | — | — |
| Model type | Base model | Base model + SFT + GRPO | Instruction model [SFT] | To be released soon |
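The table lists trapezoidal and cosine-annealing LR schedules; since the exact hyperparameters are not published, the sketch below shows one common way to combine them: linear warmup into a long plateau that ends in a cosine tail. The warmup and decay fractions are illustrative assumptions, not Rubin's actual settings.

```python
import math

def trapezoidal_lr(step: int, total: int, peak_lr: float,
                   warmup_frac: float = 0.1, decay_frac: float = 0.2) -> float:
    """Assumed shape of a trapezoidal schedule with a cosine tail:
    linear warmup, a flat plateau at peak_lr, then cosine annealing to zero."""
    warmup = int(total * warmup_frac)
    decay_start = total - int(total * decay_frac)
    if step < warmup:            # linear ramp up
        return peak_lr * step / max(warmup, 1)
    if step < decay_start:       # plateau
        return peak_lr
    # cosine anneal over the final segment
    progress = (step - decay_start) / max(total - decay_start, 1)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```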

Welcome to the age of intelligence;
your ability to create intelligent digital beings starts with brainoidlabs.