Osman's Odyssey: Byte & Build
Chronicles of a Perpetual Learner

LLMs

  • So You Want to Learn LLMs? Here’s the Roadmap

    7 minute read

    A Real-World, No-Bloat Guide to Building, Training, and Shipping LLMs

    Welcome to the “how do I actually learn how LLMs work” guide. If you’ve got a CS background and you’re tired of the endless machine learning prerequisites, this is for you. I built this with past me in mind; I wish I’d had it all drawn out like this. This roadmap should leave you comfortable building, training, and researching LLMs.

    The links at the end let you go as deep as you want. If you’re stuck, rewatch or reread. If you already know something, skip ahead. The phases are your guardrails, not handcuffs. By the end, you’ll have actually built the skills. Every resource, every project, every link is there for a reason. Use it, adapt it, and make it your own. I hope you don’t just use this as a collection of bookmarks.

    Remember, you can always use DeepResearch when you’re stuck, need something broken down to first principles, want material tailored to your level, need to identify gaps, or just want to explore deeper.

    This is blogpost #4 in my 101 Days of Blogging. If it sparks anything (ideas, questions, or critique), my DMs are open. Hope it gives you something useful to walk away with.

    The short version: the approach here is simple.

    Learn by Layering: Build Intuition ➡️ Strengthen Theory ➡️ More Hands-on ➡️ Paper Deep Dives ➡️ Build Something Real.

    You’re going to use four kinds of resources:

  • Ultimate DeepResearch Prompt Builder—Template, Workflow, Pro Tips

    15 minute read

    The Exact Prompt Engineering System Powering My DeepResearch Workflow

    TL;DR: I feed the template below into Gemini 2.5 Pro to build the DeepResearch prompt, then I run DeepResearch with the output. You’ll find more context further down, but the main idea is simple: just drop your core ideas between TOPIC BEGINS HERE and TOPIC ENDS HERE. The rest builds itself.
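
    To give a feel for the shape of that workflow, here’s a minimal sketch; the META_PROMPT wording and the build_deepresearch_prompt helper are my illustrative stand-ins, not the actual template from the post:

    ```python
    # Step 1: build a DeepResearch prompt from your raw topic via a meta-prompt.
    # (Illustrative scaffold; the real template lives in the post.)
    META_PROMPT = """\
    You are a research-prompt engineer. Turn the topic below into a
    comprehensive DeepResearch prompt: scope, key questions, constraints,
    required sources, and desired output format.

    TOPIC BEGINS HERE
    {topic}
    TOPIC ENDS HERE
    """

    def build_deepresearch_prompt(topic: str) -> str:
        """Paste the returned string into Gemini 2.5 Pro; its output is the
        prompt you then run DeepResearch with (step 2)."""
        return META_PROMPT.format(topic=topic.strip())

    if __name__ == "__main__":
        print(build_deepresearch_prompt(
            "KTransformers vs llama.cpp for running huge MoE models locally"
        ))
    ```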

    Google just doesn’t cut it anymore: I’m the guy who wired a mini data center into his basement. When you’ve got almost three dozen GPUs humming at 3 a.m. and a brain that treats half-baked ideas like Pokémon (you gotta catch ’em all), shallow Googling just doesn’t cut it. I needed a research system that could keep up with the chaos in my head, force clarity, and let me ship faster than my cats can yank the UPS cable while livestreaming (true story).

    DeepResearch and this framework turn my chaotic, tangled thoughts into informative, in-depth, comprehensive reports. I also use it to learn anything new, and it’s wired into the loop of how I code with agents. Today, I’m sharing this workflow with you.

    This is blogpost #2 in my 101 Days of Blogging. If it sparks anything (ideas, questions, or critique), my DMs are open. Hope it gives you something useful to walk away with.

    I believe that if you genuinely want to move the needle, whether you’re an indie builder, a founder hunting for market clarity, or just someone tired of getting subpar answers, you need a proper system. Something structured that transforms vague curiosities into pinpoint insights, ruthlessly forces clarity, prevents endless rabbit holes, and delivers actual value (think high signal, zero noise). To me, that’s DeepResearch.

    Before DeepResearch, I’d “just check one thing” and suddenly find myself 138 tabs deep in outdated blogposts and conflicting info. DeepResearch fixed that, but only because I learned how to use it. There’s a method to the clarity.

    DeepResearch, among other things, can be:

    Traditional research often ends up vague and ambiguous, and misses key insights entirely. So I built, and obsessively iterated on, a prompt framework designed explicitly to fix these problems. This approach guides both me and the AI to:

    This became the Ultimate DeepResearch Prompt Builder Template, the backbone of every serious piece of AI-driven research I run.

  • Just Like GPUs, We Need To Be Stress Tested: 101 Days of Blogging

    4 minute read

    101 Days of Technical Blogging, Consistency, and Self-Experimentation

    Writing is how we come to understand ourselves, a gift to our future selves, a record of what once mattered. It grounds our thoughts and gives them shape.

    This one is for me. I hope you enjoy it too.

    The past few months have given me a lot to think about. Life can happen to you out of nowhere, faster than a finger snap, and you’ve only got yourself, mostly, to keep it together.

    In life, you’re either getting smarter or dumber. Stronger or weaker. More efficient or completely helpless. More self-reliant or more dependent. The latter is becoming exponentially easier, and the trend will only accelerate in the years ahead.

    “I want to live happily in a world I don’t understand.” ― Nassim Nicholas Taleb, Antifragile: Things That Gain From Disorder

    Don’t be that guy.

    Being prepared is fundamental to your survival, but not only that… Being prepared is our only duty in life: to ourselves, to our loved ones, and to everything we care about. So, I am no longer taking time for granted, and I will always be prepared.

    Actions-per-minute matter. A lot. We’re entering an era where productivity multipliers, across the board, are approaching infinity. That has to be harnessed, deliberately and fast. Or else…

    So, I’ve made a decision: I’m going to stress-test myself, across the board, for an extended period of time. No more skipped workouts. No more postponed plans. No more dragging out already-soft deadlines. I have to show up. Fully. For all of it.

  • Private Screenshot Organizer with LMStudio (Runs Fully Local)

    13 minute read

    Organize Screenshots with Local Multimodal LLMs, No Cloud Needed

    I run an AI screenshot organizer locally on my PC. I don’t want to send my screenshots anywhere on the internet; my data is mine, and sending it to any proprietary model means giving away the rights to it. So I have a local VLM pipeline that organizes all my screenshots. It was previously powered by my 14x RTX 3090s Basement AI Server, and it now runs directly on my PC with the LMStudio SDK, occupying less than 6GB of GPU VRAM.

    Recently, LMStudio released their Python and JavaScript SDKs. LMStudio is my go-to LLM desktop application, especially for models running directly on my PC rather than my AI cluster. I had been meaning to give their Python SDK a try with a small project, and the release of Gemma 3’s new 4-bit quantization made me pull the trigger.

    Given that Gemma 3 is a multimodal model that accepts both image and text as input (4B, 12B, and 27B; the 1B is text-only), and the wild size (and performance) that the QAT quantization brings it down to, I decided to rewrite my screenshot organizer to run directly on my PC.
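
    As a rough sanity check on that footprint (my back-of-the-envelope arithmetic, not figures from the post):

    ```python
    # Rough weight footprint at 4-bit quantization (~0.5 bytes per parameter).
    for billions in (4, 12):
        gib = billions * 1e9 * 0.5 / 2**30
        print(f"Gemma 3 {billions}B @ 4-bit: ~{gib:.1f} GiB of weights")
    # ~1.9 GiB and ~5.6 GiB respectively; KV cache, activations, and the
    # vision tower add overhead on top, so the smaller variants fit
    # comfortably under the 6GB of VRAM mentioned above.
    ```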

    This article starts off slow, but it ramps up and gets way more interesting as we go. If you’d rather jump straight into the action, feel free to skip ahead to the Prerequisites section. And yep, this works on pretty much any image, not just screenshots.

    My Screenshots Folder with 875 Screenshots

    I hate a desktop littered with screenshots named Screenshot 2024-05-15 at 11.23.45 AM.png, Screen Shot 2024-05-16 at 9.01.12.png, or, even worse, Untitled.png. The screenshots folder used to be where things went to die unless I used them right away. And then, sometimes, I’d find myself wondering about that one screenshot from 4 months ago!

    When Qwen2-VL came out last year, I built an asynchronous pipeline that ran on my AI cluster to automatically rename, categorize, and organize my screenshots based on their content. Given my atypical use of my AI cluster, that pipeline didn’t run frequently, and I would have much preferred to run it from my PC directly; but I also didn’t want to replicate the cluster’s complex software configuration on my PC. (You can learn more about how I use my AI cluster in this blogpost.) LMStudio simplifies all of this on my PC, a one-stop shop for AI models kind of thing, and I already have enough headaches without adding more. So, ultimately, I ran the pipeline from my AI cluster every few weeks, whenever the screenshot mess bothered me enough to go remember how to get it up and running.

    In this post, we’ll build a practical screenshot organizer step-by-step, and in parallel we’ll get introduced to the core functionalities of the lmstudio-python library.

    We’ll create a Python script that:
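
    As a taste of where that lands, here’s a minimal sketch built on the documented lmstudio-python image-input API; the model key, prompt, and folder path are my illustrative assumptions, not the post’s final script:

    ```python
    from pathlib import Path

    import lmstudio as lms  # pip install lmstudio

    SCREENSHOTS = Path.home() / "Screenshots"  # adjust to your folder

    # Any vision-capable model loaded in LMStudio works; this key is illustrative.
    model = lms.llm("gemma-3-4b-it-qat")

    for shot in sorted(SCREENSHOTS.glob("*.png")):
        image = lms.prepare_image(str(shot))  # hand the image to the SDK
        chat = lms.Chat()
        chat.add_user_message(
            "Suggest a short, filesystem-safe name (lowercase, hyphens, "
            "no extension) describing this screenshot's content.",
            images=[image],
        )
        result = model.respond(chat)
        # Crude cleanup: take the first whitespace-delimited token as the name.
        new_name = result.content.strip().split()[0] + shot.suffix
        shot.rename(shot.with_name(new_name))
        print(f"{shot.name} -> {new_name}")
    ```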

  • DeepSeek R-1 671B on 14x RTX 3090s: Real-World Results & Key Takeaways

    2 minute read

    How KTransformers Dominated llama.cpp in Real-World Inference

    Hello! If you don’t know me, I’m the guy with the 14x RTX 3090s Basement AI Server. Earlier this week, alongside @OsMo999 as my guest commentator, I livestreamed running DeepSeek R-1 671B-q4 using KTransformers on my AI server, and below are the key takeaways from that 6-hour session.

    I was inspired by an announcement from the KTransformers Team showcasing optimizations for running DeepSeek R-1 671B, 4-bit quantized, by offloading it to the CPU. Instead of just benchmarking this on my own, I livestreamed everything from A to Z. During the livestream, I went beyond the initial plan and dove into how I use my AI server with vLLM, ExLlamaV2, and llama.cpp, and ran some comparisons.
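
    For context on why CPU offloading matters here (my back-of-the-envelope math, not numbers from the stream):

    ```python
    # Rough weight footprint of DeepSeek R-1 671B at 4-bit (~0.5 bytes/param).
    params = 671e9
    gib = params * 0.5 / 2**30
    print(f"~{gib:.0f} GiB of weights")  # ~312 GiB

    # 14x RTX 3090s provide 14 * 24 = 336 GB of VRAM, leaving little headroom
    # for KV cache and activations, hence KTransformers' approach of keeping
    # the MoE expert weights in CPU RAM.
    ```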

    KTransformers boosted prompt eval speeds by ~15x over llama.cpp! This matched the benchmarks they published in their release docs. Here’s the eval data for the run at the 1:39:59 mark of the stream:

    Prompt Evaluation

    Generation Evaluation

    Funny enough, last week I wrote a blog post saying not to use llama.cpp for multi-GPU setups… and then I ended up livestreaming it this week!

    I plan to stream regularly, so let me know what you’d like to see next! Maybe you’d even like to join as a guest? 😎

    I have previously written a lot of in-depth blogposts on LLMs and AI, but never livestreamed on my own, so I’d love to hear your feedback! Drop your thoughts, ideas, or even suggestions for future AI server experiments and livestreams.

    Find me below: