Experiment - Case Study
LiteSpark is a privacy-first AI workspace that runs entirely in your browser. By leveraging WebGPU and WASM, it can run LLMs completely offline with no installation or setup. Just load the page, load a model, and go.
This was a project I started in the hope of building a complete local AI toolkit in the browser that can run offline, with zero installations. You cache some models once and then run them completely locally. Little did I know just how much pain this would be.
| Layer | Technology | Purpose |
|---|---|---|
| Runtime/Build | Bun | High-speed JavaScript toolkit for dependency management and bundling. |
| Local Inference | Transformers.js | Runs AI models in the browser via WebGPU and WASM. |
| Database | PGlite | A full Postgres database running entirely in the browser for local persistence. Vector search via pgvector is also planned. |
| Frontend | React 19 | Utilizing the latest React features with TanStack Router for SPA navigation. |
The initial plan was to make llama.cpp bindings work in the browser. This sounded like a great way to go because llama.cpp is the most popular inference engine out there, and GGUF models are super portable and easy to quantize, so they run with less VRAM (super important for web browsers). I found the perfect project for this: Wllama.
However, at the time Wllama did not support WebGPU, which means your models would run entirely on the CPU and system RAM. CPUs are not designed for highly parallel computations (the matrix operations of ML models) the way GPUs are. So this was a deal breaker.
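To make the GPU-versus-CPU concern concrete, here is a minimal sketch of how a page could check for WebGPU before deciding where to run a model. `navigator.gpu.requestAdapter()` is the standard WebGPU entry point; the `pickBackend` helper and the `NavigatorLike` type are hypothetical names introduced only for this illustration, and the `"webgpu"` / `"wasm"` labels are assumed backend names rather than anything taken from LiteSpark's code.

```typescript
// Minimal shape of the parts of `navigator` this sketch touches (hypothetical type).
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Backend = "webgpu" | "wasm";

// Returns "webgpu" only when the browser exposes navigator.gpu AND
// actually hands back an adapter; otherwise falls back to CPU-bound WASM.
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return "wasm";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Some browsers expose the API but reject on unsupported hardware.
    return "wasm";
  }
}
```

In the browser this would be called with the global `navigator` object (cast to `NavigatorLike` if your TypeScript DOM typings do not include `gpu`). An engine without WebGPU support would effectively always land on the `"wasm"` branch.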
Google recently rebranded TensorFlow Lite as LiteRT and, with the launch of the Gemma 4 model family, made a huge push towards edge LM inference. While TensorFlow was never a first choice for LMs compared to PyTorch, it was somewhat okay.
The problem here was that TensorFlow.js was being discontinued and the JS bindings for LiteRT were not out yet. So that was off the table.
The Open Neural Network Exchange (ONNX) is an open-source AI ecosystem, backed by technology companies and research organizations, that defines an open standard format for representing machine learning models.
Transformers.js has built-in support for ONNX model inference. This seemed like a huge advantage, as ONNX works not only for LLMs but also for CNNs and other niche vision, audio, classification, and prediction models. This meant I could integrate a whole ecosystem of models to create a comprehensive tool that can do more than just chat with LLMs: things like voice support, live vision, and even agents.
I quickly realised that Transformers.js with ONNX models is not super straightforward. Each model has a slightly different implementation. So I decided on an adapter factory pattern, where each model family gets a custom adapter that papers over the differences and exposes a unified, common model interface. But this was way more challenging than I expected.
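The adapter factory idea can be sketched roughly as follows. This is only an illustration of the pattern, not LiteSpark's actual code: `ModelAdapter`, `AdapterFactory`, `ChatMessage`, and `EchoAdapter` are all hypothetical names, and the stub adapter stands in for a real one that would wrap Transformers.js calls for a specific model family.

```typescript
// Hypothetical unified interface; none of these names come from Transformers.js.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ModelAdapter {
  readonly family: string;
  load(modelId: string): Promise<void>;
  generate(messages: ChatMessage[]): Promise<string>;
}

type AdapterCtor = () => ModelAdapter;

// Registry that maps a model family name to a function building its adapter.
class AdapterFactory {
  private registry = new Map<string, AdapterCtor>();

  register(family: string, create: AdapterCtor): void {
    this.registry.set(family.toLowerCase(), create);
  }

  create(family: string): ModelAdapter {
    const make = this.registry.get(family.toLowerCase());
    if (!make) throw new Error(`No adapter registered for family "${family}"`);
    return make();
  }
}

// Stub adapter for illustration: a real one would hide the family-specific
// loader class, processor, and file layout behind the same two methods.
class EchoAdapter implements ModelAdapter {
  readonly family = "echo";
  private loaded = false;

  async load(_modelId: string): Promise<void> {
    this.loaded = true;
  }

  async generate(messages: ChatMessage[]): Promise<string> {
    if (!this.loaded) throw new Error("Model not loaded");
    const last = messages[messages.length - 1];
    return last ? last.content : "";
  }
}

const factory = new AdapterFactory();
factory.register("echo", () => new EchoAdapter());
```

The UI would then only ever talk to `ModelAdapter`, and supporting a new model family means registering one more adapter rather than touching the chat code.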
The biggest problem here is that each model has a very different implementation of core model functionality.
vision_encoder_q4.onnx vs vision_q4.onnx
pre_processor vs processor
vision_encoder in q4 but text_decoder in fp16
This is by far the worst problem
Gemma4ForConditionalGeneration //Gemma4
MultiModalityCausalLM //Janus
AutoModelForCausalLM //Llama - LFM
Qwen3_5ForConditionalGeneration //Qwen3.5
AutoModelForImageTextToText // Gemma 3
Note that these are just the model loaders, and they are already vastly different.
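One way an adapter layer might cope with this is a plain lookup table from model family to loader name. The sketch below uses the class names listed above as strings only; whether each one is actually exported by a given Transformers.js version is not something this example checks. The family keys, the `LOADER_BY_FAMILY` map, and the `loaderNameFor` helper are hypothetical, and treating Llama and LFM as two families sharing `AutoModelForCausalLM` is an assumed reading of the list.

```typescript
// Family -> loader class name, taken from the list above (kept as strings on purpose).
const LOADER_BY_FAMILY = new Map<string, string>([
  ["gemma4", "Gemma4ForConditionalGeneration"],
  ["janus", "MultiModalityCausalLM"],
  ["llama", "AutoModelForCausalLM"],
  ["lfm", "AutoModelForCausalLM"],
  ["qwen3.5", "Qwen3_5ForConditionalGeneration"],
  ["gemma3", "AutoModelForImageTextToText"],
]);

// Resolves the loader name for a family, failing loudly on anything unknown
// so a missing adapter is caught at load time instead of mid-generation.
function loaderNameFor(family: string): string {
  const name = LOADER_BY_FAMILY.get(family.toLowerCase());
  if (name === undefined) {
    throw new Error(`Unknown model family: ${family}`);
  }
  return name;
}
```

Each family's adapter could then resolve its loader through this table instead of hard-coding class names throughout the app.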
Each of the models also has a very different implementation of image and multimodal handling.
And some of these are only available in different versions of Transformers.js.