

Welcome, friend
Hey there! :wave: I’m Ahmad, a Software Engineer with a background in Machine Learning, currently focused on Gen. AI and Large Language Models. My academic background includes dual BAs in Computer Science and Data Science, and my professional journey has taken me through innovative environments. I’m a builder at heart, whether it’s hacking an MVP together over the weekend, designing large-scale distributed systems, setting up complex home labs and network architecture, solving intricate data puzzles, or exploring the frontiers of 3D printing. When I’m not coding or tinkering with tech, you’ll find me reading a book, lifting at the gym, or drinking a cup of coffee while contemplating life and things.
Why am I here? To share this exhilarating journey of Code & Steel with you. Whether you’re here to talk Gen. AI/LLMs and coding, explore my hardware setups, discuss the intricacies of machine learning, or share a laugh over the latest DIY disaster, I hope you leave having learned something new.
Feel free to check out my about page, read my blogposts, and get in touch via any of the social links provided below. Welcome to my world of code, creation, and continuous learning.
Featured
- Stop Wasting Multi-GPUs Setup—Use vLLM or ExLlamaV2 for Tensor Parallelism
  Posted on · 7 minutes · Use vLLM or ExLlamaV2 for Tensor Parallelism
Context: Yesterday, I watched @ThePrimeagen’s live stream (love his streams, by the way) where he was stress testing his new Green Tinybox—a 6x RTX 4090 build. His plan was to get the LLM to send and receive concurrent messages that respond to each other, increasing the number and frequency of those messages over time as a way to stress test the GPUs, and he was using llama.cpp for inference. The llama.cpp part caught my attention: with such a powerful setup, llama.cpp is pretty much a system crippler. Around the 26-minute mark of his stream, I commented on that, and after some back-and-forth, I figured it was best not to hijack his stream and just write this blogpost instead.
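To give a flavor of the post's takeaway: vLLM exposes tensor parallelism through a single flag on its OpenAI-compatible server. A minimal sketch, assuming a 6-GPU box like the one above (the model name and GPU count here are placeholders, not commands from the stream):

```shell
# Serve a model with vLLM, sharding its weights across all 6 GPUs
# via tensor parallelism (model name is a placeholder).
vllm serve meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 6
```

With weights sharded across the GPUs, every card participates in each forward pass instead of sitting mostly idle.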
- Antifragile AI: Harnessing Uncertainty for a Resilient Future
  Posted on · 5 minutes · The Evolution from Traditional Software to AI Agentic Systems
- Serving AI From The Basement — Part II: Agents, MoEs, Inference & More
  Posted on · Last Edited · 14 minutes · Unpacking SWE Agentic Framework, MoEs, Batch Inference, and More
For about 3 weeks now, I have been working on a multi-agent system that simulates a team of Software Engineers. The system assigns projects, creates teams and adds members to them based on areas of expertise and need, and asks team members to build features, assign story points, hold pair programming sessions together, etc. It started mainly for fun and exploration; however, last week the following paper was released: Agents in Software Engineering.