About Me

I’m a cloud engineer and developer passionate about building scalable cloud and AI infrastructure. I enjoy working with modern technologies to create efficient, production-ready solutions.

This blog is where I share my experience and lessons learned in software architecture, distributed systems, and engineering leadership.

🕹️ I learned to code on a Commodore and have basically been hitting RUN ever since.

Languages / Tools Used

Programming Languages:

Development Tools:

Security & code intelligence:

Frameworks & Libraries:

Vector search & databases:

LLM inference engines:

Services Used

Cloud Platforms:

APIs & Integrations:

Social Media APIs:

Notable Projects

MARS - GPU-resident multimodal memory substrate for real-time embodied AI. Episode-scoped retrieval as a CUDA kernel-level primitive: 197 µs p99 at N=1M with perfect cross-modal hit@15, 33× faster than FAISS-Flat-GPU on the same hardware. Companion paper: MARS: Episode-Scoped GPU Retrieval for Real-Time Embodied AI.
Cognitora inference - Open-source, datacenter-scale LLM orchestration above vLLM, SGLang, TensorRT-LLM, and llama.cpp. Features KV-aware routing, prefill/decode disaggregation, and a multi-tier KV cache. Static Rust binaries for bare metal, Kubernetes, or cloud.
VittoriaDB - Zero-configuration embedded vector database with HNSW indexing, ACID storage, and REST API. Single Go binary for local AI development.
DistX - High-performance vector database written in Rust. Features HNSW indexing with SIMD optimizations, Qdrant-compatible REST API, and gRPC support.

Connect

Feel free to reach out or follow my work:

Website: fratepietro.com · antonello.dev
GitHub: @antonellof
Credly: My Certifications
Databricks: My Credentials
X.com: ☁️ Hack the Cloud