About Me
I’m a cloud engineer and developer passionate about building scalable cloud and AI infrastructure. I enjoy working with modern technologies to create efficient, production-ready solutions.
This blog is where I share my experiences and learnings in software architecture, distributed systems, and engineering leadership.
🕹️ Learned to code on a Commodore, and I’ve basically been hitting RUN ever since.
Languages / Tools Used
Programming Languages:
Development Tools:
Security & code intelligence:
Frameworks & Libraries:
Vector search & databases:
LLM inference engines:
Services Used
Cloud Platforms:
APIs & Integrations:
Social Media APIs:
Notable Projects
- MARS - GPU-resident multimodal memory substrate for real-time embodied AI. Episode-scoped retrieval as a CUDA kernel-level primitive: 197 µs p99 at N=1M with perfect cross-modal hit@15, 33× faster than FAISS-Flat-GPU on the same hardware. Companion paper: MARS: Episode-Scoped GPU Retrieval for Real-Time Embodied AI.
- Cognitora inference - Open-source, datacenter-scale LLM orchestration above vLLM, SGLang, TensorRT-LLM, and llama.cpp: KV-aware routing, prefill/decode disaggregation, multi-tier KV cache, static Rust binaries for bare metal, Kubernetes, or cloud.
- VittoriaDB - Zero-configuration embedded vector database with HNSW indexing, ACID storage, and REST API. Single Go binary for local AI development.
- DistX - High-performance vector database written in Rust. Features HNSW indexing with SIMD optimizations, Qdrant-compatible REST API, and gRPC support.
Connect
Feel free to reach out or follow my work:
- Website: fratepietro.com · antonello.dev
- GitHub: @antonellof
- Credly: My Certifications
- Databricks: My Credentials
- X.com: ☁️ Hack the Cloud