<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Python on Savannah Ostrowski — Python Steering Council, CPython RM</title>
    <link>/tags/python/</link>
    <description>Recent content in Python on Savannah Ostrowski — Python Steering Council, CPython RM</description>
    <image>
      <title>Savannah Ostrowski — Python Steering Council, CPython RM</title>
      <url>/img/social-share.png</url>
      <link>/img/social-share.png</link>
    </image>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Tue, 27 Jan 2026 00:00:00 +0000</lastBuildDate><atom:link href="/tags/python/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>I run a server farm in my closet (and you can too!)</title>
      <link>/posts/i-run-a-server-farm-in-my-closet/</link>
      <pubDate>Tue, 27 Jan 2026 00:00:00 +0000</pubDate>
      
      <guid>/posts/i-run-a-server-farm-in-my-closet/</guid>
      <description>One woman&amp;#39;s quest to answer the question: does JIT go brrr?</description>
<content:encoded><![CDATA[<p>So recently, I was nerd-sniped into building a performance dashboard for the JIT by <a href="https://snarky.ca/">Brett</a> on Bluesky.</p>
<p><img loading="lazy" src="images/nerdsnipe.png" alt="Brett asking &ldquo;Is there a site anywhere tracking JIT perf and the 5%/10% targets for 3.15/3.16? I&rsquo;m curious how much farther you have before hitting those targets.&rdquo; on Bluesky."  />
</p>
<p>It&rsquo;s a seemingly basic question that we did not have an answer to, up until very recently.</p>
<p>So, I did what any sane person would do! Naturally, I set up four different runners across different architectures and operating systems, and created a website to answer the most important question: <a href="https://doesjitgobrrr.com">does JIT go brrr?</a></p>
<h2 id="building-a-server-farm-accidentally">Building a server farm (accidentally)</h2>
<p>I didn&rsquo;t <em>really</em> plan to scope creep things this much. I started with just a single Raspberry Pi running the benchmark suite nightly, around the same time I set up my RPi buildbot. But, one thing led to another, and all of a sudden I&rsquo;m running a small server farm in my closet (which isn&rsquo;t totally off brand for me as I have been known to <a href="https://savannah.dev/posts/raspberry-pi-cluster/">roll my own infra</a> from time to time!).</p>
<p>In any case, I was also motivated to build a reliable pipeline for collecting benchmark stats on consumer-grade hardware. We have access to beefy, server-grade machines via corporate sponsors, but those aren&rsquo;t necessarily representative of a basic laptop or a Raspberry Pi, and I figured it could be pretty interesting to see how things were going there. So I took inventory of the random machines lying around the house, and ended up with this suite of machines:</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Runner</th>
          <th style="text-align: left">Hardware</th>
          <th style="text-align: left">OS</th>
          <th style="text-align: left">Architecture</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left">blueberry</td>
          <td style="text-align: left">Raspberry Pi 5</td>
          <td style="text-align: left">Linux</td>
          <td style="text-align: left">aarch64</td>
      </tr>
      <tr>
          <td style="text-align: left">ripley</td>
          <td style="text-align: left">Intel i5</td>
          <td style="text-align: left">Ubuntu 24.04</td>
          <td style="text-align: left">x86_64</td>
      </tr>
      <tr>
          <td style="text-align: left">jones</td>
          <td style="text-align: left">M3 Pro MacBook</td>
          <td style="text-align: left">macOS</td>
          <td style="text-align: left">aarch64</td>
      </tr>
      <tr>
          <td style="text-align: left">prometheus</td>
          <td style="text-align: left">AMD Ryzen 5 3600X</td>
          <td style="text-align: left">Windows 11</td>
          <td style="text-align: left">x86_64</td>
      </tr>
  </tbody>
</table>
<p>…and before you ask, yeah, the majority of these machines have names related to the Alien franchise (I&rsquo;m a big fan!). Blueberry is just blueberry, because Pi 🥧!</p>
<h2 id="how-the-benchmarking-stack-works">How the benchmarking stack works</h2>
<p>Okay, so that&rsquo;s roughly the hardware setup but before diving into the software side of things, it helps to understand how the pieces fit together. Thankfully, this is where I could lean on existing projects like <a href="https://github.com/faster-cpython/bench_runner">bench_runner</a>.</p>
<p>bench_runner is the framework that ties this all together. I&rsquo;m not going to delve too deep into the setup here since it&rsquo;s well documented in the <a href="https://github.com/faster-cpython/bench_runner/blob/main/README.md">repo&rsquo;s README</a>, but it&rsquo;s what actually handles running the <a href="https://github.com/python/pyperformance">pyperformance</a> benchmark suite on self-hosted GitHub Actions runners: it calculates the improvements, handles the options for enabling optimization flags, and generates comparison plots and result tables.</p>
<p>For context, pyperformance is a benchmark suite focused on real-world workloads rather than synthetic ones. This matters because we want to measure how actual Python code performs, not just micro-optimizations that don&rsquo;t translate to real applications.</p>
<p>bench_runner also expects you to have configured a results repository, in my case: <a href="https://github.com/savannahostrowski/pyperf_bench">pyperf_bench</a>. This is where all the results, plots, stats, etc. get stored after each run. Workflows are triggered from this repo, a self-hosted runner picks them up and uses bench_runner to build CPython and run benchmarks, and then the results are committed back.</p>
<h2 id="setting-up-self-hosted-runners">Setting up self-hosted runners</h2>
<p>The actual runner setup was surprisingly straightforward — GitHub&rsquo;s <a href="https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners">self-hosted runner documentation</a> walks you through it. You&rsquo;re basically downloading a runner application, configuring it with a token from your repo, and running it as a service. The runner then polls GitHub for jobs that match its labels.</p>
<p>The trickier part is getting the benchmarking environment right. Each machine needs:</p>
<ul>
<li><strong>Build dependencies</strong> for compiling CPython from source (gcc/clang, make, libssl-dev, etc.)</li>
<li><strong>LLVM</strong> installed (the JIT needs it for compilation)</li>
<li><strong>bench_runner</strong> and its dependencies</li>
<li>A system relatively free from background process noise…</li>
</ul>
<p>That last point is where things get fun. Benchmarking is all about reducing variance, and consumer hardware loves variance. You really want stable, reproducible results, which means minimizing anything that could interfere with CPU cycles while benchmarks are running.</p>
<p>Each machine had its own quirks (non-exhaustive):</p>
<p>On the Pi, thermal throttling meant I needed a better cooler. The Pi 5 starts throttling at 80°C and gets more aggressive at 85°C—when that happens, the CPU drops from 2.4GHz down to 1.5GHz, which would absolutely tank benchmark consistency. I also had to fiddle with IRQ affinity permissions since I wanted to pin benchmark processes to specific CPU cores to reduce noise.</p>
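<p>For the curious, checking for throttling on a Pi is pretty easy: the firmware exposes a throttle bitmask via <code>vcgencmd get_throttled</code>, and decoding it is just bit-twiddling. Here&rsquo;s a small sketch (the flag positions are the documented firmware bits; the sample value is made up for illustration):</p>

```python
# Decode the bitmask reported by `vcgencmd get_throttled` on a Raspberry Pi.
# On a real Pi you'd read it via subprocess; here we parse a sample string.
FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}

def decode_throttled(raw: str) -> list[str]:
    value = int(raw.partition("=")[2], 16)  # e.g. "throttled=0x50000" -> 0x50000
    return [msg for bit, msg in FLAGS.items() if value & (1 << bit)]

# 0x50000 sets bits 16 and 18: trouble has happened since boot
print(decode_throttled("throttled=0x50000"))
```

<p>Anything nonzero in the low bits while a benchmark is running means that night&rsquo;s numbers are suspect.</p>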
<p>Windows had a parade of permission errors…and literally everything on Windows works differently than I’d expect, so the debugging was pretty unfun.</p>
<p><img loading="lazy" src="images/windows-debugging.jpg" alt="A meme of SpongeBob sitting smiling and dissociating with an overlay of him screaming. The text reads &ldquo;Debugging anything Windows as a UNIX dev&rdquo;."  />
</p>
<p>And macOS? <a href="https://bsky.app/profile/savannah.dev/post/3m6e5s6ofuk2o">aargh, mediaanalysisd</a>. If you ever decommission a MacBook and decide to use it as a GitHub Actions runner, make sure you&rsquo;ve completely turned off iCloud syncing. Otherwise this friendly background daemon will come along scanning photos for AI features like face recognition and object detection, eating CPU cycles for hours and completely wrecking your benchmark stability.</p>
<p>All this to say, it&rsquo;s not too tricky, but expect some friction and debugging if you try to set one up yourself!</p>
<h2 id="the-dashboard-pipeline">The dashboard pipeline</h2>
<p>Here&rsquo;s what happens every night:</p>
<ul>
<li>Each runner runs the benchmark suite twice (once for the interpreter, once with the JIT enabled) via a GitHub Action on a cron schedule</li>
<li>The results are compared per machine to derive the speedup</li>
<li>After all runs finish, I trigger a dashboard refresh which runs a script on another Raspberry Pi cluster (since I also self-host everything 🤠). The script pulls any new results, computes the geometric mean using pyperf, and loads everything into the database.</li>
<li>Results appear on <a href="https://doesjitgobrrr.com">doesjitgobrrr.com</a> shortly after!</li>
</ul>
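<p>The &ldquo;derive the speedup&rdquo; step boils down to taking a geometric mean over per-benchmark ratios, which is the right average for ratios (a 2x speedup and a 2x slowdown should cancel out to 1.0). A minimal sketch with made-up numbers (the real pipeline computes this from pyperformance result files via pyperf):</p>

```python
from statistics import geometric_mean

# Hypothetical per-benchmark speedups for one runner,
# where each ratio = interpreter time / JIT time
speedups = [1.12, 0.98, 1.05, 1.21, 1.03]

overall = geometric_mean(speedups)
print(f"JIT is {100 * (overall - 1):+.1f}% vs. the interpreter")  # roughly +7.5% here
```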
<h2 id="so-what-have-we-learned">So, what have we learned?</h2>
<p>If you recall from various PyCon talks, Brandt jokingly said the JIT was &ldquo;0% faster/slower&rdquo; for quite a while. However, things are actually coming together now! Results depend heavily on operating system and hardware specs, but on runners sponsored by Meta, preliminary results show the 3.15 JIT is about <a href="https://docs.python.org/3.15/whatsnew/3.15.html#whatsnew315-jit">4-5% faster</a> than the tail-calling interpreter on x86-64 Linux, and 7-8% faster on AArch64 macOS.</p>
<p>My own server farm shows even more interesting results! On the <a href="https://doesjitgobrrr.com/?goals=10">nightly runs</a> funded by yours truly, we&rsquo;re regularly seeing:</p>
<ul>
<li>~2% speedups on my Raspberry Pi 5 (blueberry) — aarch64</li>
<li>~5% speedups on my i5 running Ubuntu 24.04 (ripley) — x86_64</li>
<li>~8% speedups on my M3 Pro (jones) — aarch64</li>
<li>~15% speedups on my AMD Ryzen 5 3600X running Windows 11 (prometheus) — x86_64</li>
</ul>
<p>Beyond tracking nightly progress, the dashboard has also helped us catch performance regressions from PRs the same day they land, which feels awesome! The power of data!</p>
<p><img loading="lazy" src="images/does-jit-go-brrr.png" alt="A chart from doesjitgobrrr.com showing JIT speedups across different machines, with the highest being around 15% on Windows and the lowest around 2% on a Raspberry Pi 5."  />
</p>
<p>…and to be honest, that&rsquo;s pretty awesome if you ask me. Yes, we still have work ahead of us, but man, I&rsquo;m super proud of all the work we&rsquo;ve done over the past few months that has gotten us this far. The JIT has also become pretty community-driven at this point, and we&rsquo;re actively building up new contributors! Super exciting stuff!</p>
<h2 id="reading-materials-and-links">Reading materials and links</h2>
<ul>
<li><a href="https://doesjitgobrrr.com/">The unofficial JIT performance dashboard — does JIT go brrr?</a></li>
<li><a href="https://github.com/savannahostrowski/pyperf_bench">pyperf_bench — where all my data lives from our runs</a></li>
<li><a href="https://github.com/savannahostrowski/doesjitgobrrr">doesjitgobrrr on GitHub</a></li>
<li><a href="https://github.com/faster-cpython/bench_runner">bench_runner — for running pyperformance on GitHub Actions runners</a></li>
<li><a href="https://github.com/python/pyperformance">pyperformance — the benchmark suite</a></li>
<li><a href="https://pyperf.readthedocs.io/en/latest/">pyperf — toolkit for Python benchmarking</a></li>
<li><a href="https://peps.python.org/pep-0744/">PEP 744 — JIT Compilation</a></li>
</ul>
]]></content:encoded>
    </item>
    
    <item>
      <title>The coolest feature in Python 3.14</title>
      <link>/posts/the-coolest-feature-in-314/</link>
      <pubDate>Fri, 09 Jan 2026 00:00:00 +0000</pubDate>
      
      <guid>/posts/the-coolest-feature-in-314/</guid>
      <description>…can be used to build a zero-preparation remote debugger for Python applications running in Kubernetes and Docker containers?</description>
      <content:encoded><![CDATA[<p>As you may already know, I love containers. I think they are truly a magical and amazing piece of technology because of the freedom and ease of use they afford. However, using containers (or Kubernetes) can add complexity to your workflow. When we stick our software in a container, we naturally add a bit of a barrier between our local development environment and the software we’re building, unless of course you’re using a Dev Container in which case things get pretty meta…but I digress.</p>
<p>One particular area where using containers or Kubernetes adds friction is in debugging workflows. That FastAPI app that you’re running with Docker Compose or Kubernetes? Yeah, the debugger isn’t going to just magically connect after configuring your <code>launch.json</code> file and running it in VS Code anymore. Maybe you’ve ended up setting up some kind of sidecar service to run your debugging in your cluster. Or, maybe you’ve added <a href="https://github.com/microsoft/debugpy">debugpy</a> to your code, configured it to wait for connections, restarted your pod, set up port-forwarding, and <em>hoped</em> your app is still in the right state by the time you connect. Oh yeah, and throw a <code>uvicorn --reload</code> in there and…aaaaaaa. Setting all of that up is a true headache and honestly, pretty peripheral to the task at hand! You were just trying to debug that app you’re building anyway!</p>
<p>So, I built a tool that I&rsquo;m calling <a href="https://github.com/savannahostrowski/debugwand"><code>debugwand</code></a> - a zero-preparation remote debugger for Python applications running in Kubernetes clusters or Docker containers. With <code>debugwand</code>, there&rsquo;s no sidecar pod, no application code changes, and virtually no setup required (okay, just a teeny tiny bit&hellip;it&rsquo;s just configuration!).</p>
<p><img loading="lazy" src="images/danger.jpg" alt="A meme of Ralph Wiggum saying (chuckles) I am in danger"  />
</p>
<blockquote>
<p><em>Containers hate to see me comin'</em></p>
</blockquote>
<p>How, you might ask? This brings me to perhaps the single coolest feature in Python 3.14 that makes this all possible (okay, okay…I’m being dramatic – this is like picking a favourite child!): <a href="https://docs.python.org/3/library/sys.html#sys.remote_exec"><code>sys.remote_exec()</code></a>. With <code>sys.remote_exec()</code>, we can execute a Python script <em>inside another running process</em> all without restarting it. The script runs with full access to the target&rsquo;s memory, modules, and state, which is exactly what you need to start a debug server on-demand.</p>
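<p>To make that concrete, here&rsquo;s a minimal, self-contained sketch of the mechanism (not debugwand itself): spawn a long-running Python process, then use <code>sys.remote_exec()</code> to run a script inside it that writes a marker file. It needs Python 3.14+ plus ptrace permissions, so it reports gracefully where those are missing:</p>

```python
import os
import subprocess
import sys
import tempfile
import time

def demo() -> str:
    if not hasattr(sys, "remote_exec"):
        return "needs Python 3.14+"
    workdir = tempfile.mkdtemp()
    marker = os.path.join(workdir, "marker.txt")
    payload = os.path.join(workdir, "payload.py")
    with open(payload, "w") as f:
        # The injected script runs inside the target, with access to its state
        f.write(f"open({marker!r}, 'w').write('hello from outside')\n")
    # A stand-in for a real app: a process busy running Python bytecode
    target = subprocess.Popen(
        [sys.executable, "-c", "import time\nwhile True: time.sleep(0.05)"]
    )
    try:
        time.sleep(1.0)  # let the target interpreter finish starting up
        sys.remote_exec(target.pid, payload)
        for _ in range(100):  # wait up to ~5s for the injected script to run
            if os.path.exists(marker):
                with open(marker) as f:
                    return f.read()
            time.sleep(0.05)
        return "injection did not run"
    except (OSError, RuntimeError):
        return "injection unavailable here"
    finally:
        target.kill()

result = demo()
print(result)
```

<p>debugwand&rsquo;s payload does the same dance, except the injected script starts a debugpy server instead of writing a file.</p>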
<h2 id="so-what-does-debugwand-do">So what does debugwand do?</h2>
<p>When you run <code>wand debug</code>, the CLI goes out and:</p>
<ol>
<li>
<p><strong>Finds your target</strong>  - For Kubernetes, it discovers pods for your service (and handles Knative, because of course you&rsquo;re using Knative). For Docker, you just point it at a container.</p>
</li>
<li>
<p><strong>Picks a process</strong> - It finds all the Python processes running in the pod/container, along with CPU and memory stats, and (tries to) choose the right one. Running <code>uvicorn --reload</code>? It&rsquo;ll detect that automatically too (you can always override the choice if you want to debug a different process).</p>
</li>
<li>
<p><strong>Injects <a href="https://github.com/microsoft/debugpy"><code>debugpy</code></a></strong> - Here&rsquo;s where <code>sys.remote_exec()</code> does its thing. <code>debugwand</code> writes a small script that starts a <code>debugpy</code> server and injects it into your running process. No restart. No code changes. Your app keeps serving requests like nothing happened.</p>
</li>
<li>
<p><strong>Sets up the connection</strong> - For Kubernetes, it handles port-forwarding automatically. For Docker, you just need to expose the port when you start the container.</p>
</li>
<li>
<p><strong>You connect your editor</strong> - Point VS Code (or nvim, or whatever <a href="https://microsoft.github.io/debug-adapter-protocol/">DAP</a> client you like) at <code>localhost:5679</code> (you can choose whatever port you’d like as well) and you&rsquo;re debugging. Set breakpoints, inspect variables, step through code - the works.</p>
</li>
</ol>
<p><img loading="lazy" src="images/debugwand.png" alt="Output from debugwand showing it selecting a worker process, injecting debugpy, setting up port forwarding, and handling reload mode"  />
</p>
<p>The whole flow should only take a few minutes to configure, and then you should be good to go! Pretty neat, huh?</p>
<h2 id="you-can-try-it-out">You can try it out</h2>
<blockquote>
<p>A quick note: This tool is experimental and intended for local development. Enabling <code>SYS_PTRACE</code> capabilities has security implications you&rsquo;ll want to consider before using this anywhere near production (don’t do that!).</p>
</blockquote>
<p>While I built this primarily out of curiosity and for our workflow at work, you’re welcome to give it a shot! If you&rsquo;re running Python 3.14, install it with <code>uv tool install debugwand</code> (see PyPI <a href="https://pypi.org/project/debugwand/">here</a>).</p>
<p>After that, you’ll need to:</p>
<ul>
<li>Ensure your cluster/container has <a href="https://github.com/savannahostrowski/debugwand/blob/main/docs/troubleshooting.md#permission-denied--cap_sys_ptrace">SYS_PTRACE kernel capabilities enabled</a>. Remember! Local development only, please!</li>
<li>Make sure you’ve port forwarded 5679 (or whatever port you want to use)</li>
<li>Set up your <a href="https://github.com/savannahostrowski/debugwand#connect-your-editor">launch.json</a> if you’re using VS Code.</li>
</ul>
<p>Then it’s just:</p>
<pre tabindex="0"><code># Kubernetes
wand debug -n my-namespace -s my-service

# Docker 
wand debug --container my-container
</code></pre><p>And then you’re off to the races! Just start your debugger and set some breakpoints!</p>
<h2 id="reading-materials-and-links">Reading materials and links</h2>
<ul>
<li><a href="https://docs.python.org/3.14/howto/remote_debugging.html#remote-debugging">Remote debugging attachment protocol</a></li>
<li><a href="https://github.com/savannahostrowski/debugwand"><code>debugwand</code> on GitHub</a></li>
<li><a href="https://pypi.org/project/debugwand/"><code>debugwand</code> on PyPI</a></li>
</ul>
]]></content:encoded>
    </item>
    
    <item>
      <title>I&#39;m on the Python Steering Council</title>
      <link>/posts/im-on-the-steering-council/</link>
      <pubDate>Wed, 17 Dec 2025 00:00:00 +0000</pubDate>
      
      <guid>/posts/im-on-the-steering-council/</guid>
      <description>I ran for a seat on the Steering Council for the 2026 term, and I was elected.</description>
      <content:encoded><![CDATA[<p>The title says it all here. I ran for a seat on the Steering Council for the 2026 term, and <a href="https://peps.python.org/pep-8107/#results">I was elected</a>…which feels pretty wild, if I&rsquo;m being entirely honest. Kind of wild in the same way that <a href="https://discuss.python.org/t/vote-to-promote-savannah-ostrowski/70302/7">becoming a core developer</a> felt wild, but even more so, since this means that the core team not only trusts me with the big shiny green merge button but also to help make good decisions about the technical direction of the language. A language used by millions of developers all around the world to build all kinds of applications, from web apps to data science analyses to machine learning models and everything in between. A language that I love and is so tremendously important to me.</p>
<p>Python was my first programming language, learned because a mentor at my first internship suggested I could script away all the button clicks I was doing in ArcGIS.</p>
<p>Python was my first and second job out of college, where I automated drone flights to try to shoot tree seeds out of pneumatic tubes for reforestation (the startup has since pivoted!) and built a data pipeline for processing geospatial data in a mapping application used by millions on their phones.</p>
<p>Python was the epicenter of my career pivot to product management, where I learned that I love working on developer tools.</p>
<p>Python was where I met my now husband, the coolest and smartest person I know.</p>
<p>Python was where I found my confidence that it was not just okay that I don&rsquo;t have a degree in engineering but actually welcomed and cool that I didn&rsquo;t.</p>
<p>Python was where I found my community and made friends that live all over the world. Under any other circumstances, I may never have met these people but I&rsquo;m so lucky that Python gave me this opportunity.</p>
<p>I guess I&rsquo;m saying all this because maybe when people hear me say that Python is really important to me, they think I mean it purely from a technical perspective. While it&rsquo;s true that I think Python is an amazing programming language, it&rsquo;s really a lot more than that to me. There&rsquo;s this swirly, amazing mess of social, technical and emotional factors that keep me coming back to Python.</p>
<p>Running for a seat on the Steering Council is honestly not something I even imagined myself doing. Earlier this year in <a href="https://www.youtube.com/watch?v=WGXXxGLBVF4">my EuroPython keynote</a>, I spent 45 minutes telling the audience that I had impostor syndrome. Okay, it wasn&rsquo;t quite that simple but really the talk was about coming from a non-traditional background and figuring out how to contribute to CPython. During the talk, I had a couple calls to action for attendees. The first was to find your community and to surround yourself with people who encourage you to do things outside your comfort zone. In this case, I am forever grateful to Pablo for nudging me over text three days before the nomination deadline with a &ldquo;hey, crazy idea, what if you ran for the steering council?&rdquo;. The second call to action was something I try to apply to many things in my life, which is to “do it, scared”. Running for a seat on the Steering Council was definitely a &ldquo;doing it scared&rdquo; moment for me.</p>
<p>All this is to say that I&rsquo;m really excited to have this opportunity over the next year or so. I laid all of this out in more detail in <a href="https://discuss.python.org/t/steering-council-nomination-savannah-ostrowski-2026-term/104989/2">my nomination statement</a> but thematically, my goals right now are: 1) ensure that all the right bits come together to make Python 3.15 the fastest Python ever (and lay the groundwork for future versions to be even faster), and 2) support making CPython and related projects under <a href="https://github.com/python">python/</a> the most welcoming and healthy projects in open source. That involves supporting today&rsquo;s contributors by making things like the PEP process less arduous, and extends to new contributors by understanding and addressing barriers to entry. In my mind, both of these areas are absolutely table stakes for making sure Python&rsquo;s future is as bright as can be.</p>
<p>Anyway, I just wanted to write this all down because I have feelings™️, and I&rsquo;m sure many, many other things will pop up. I know I&rsquo;ll learn a lot from my fellow Steering Council members, the core team, and the community.</p>
<p>Thanks for reading. I love Python! Python is forever!</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>What the heck is a trampoline, anyway?</title>
      <link>/posts/what-the-heck-is-a-trampoline-anyway/</link>
      <pubDate>Tue, 04 Nov 2025 00:00:00 +0000</pubDate>
      
      <guid>/posts/what-the-heck-is-a-trampoline-anyway/</guid>
      <description>You don&amp;#39;t have to be a compiler engineer to understand that trampolines are small hops to far destinations.</description>
      <content:encoded><![CDATA[<blockquote>
<p>This is a post in a series around making CPython internals more approachable. If I missed something or you’d like to request a topic, feel free to drop me a line via <a href="mailto:savannah@python.org">email</a>. You can also read other posts in the series <a href="https://savannah.dev/tags/you-dont-have-to-be-a-compiler-engineer/">here</a>.</p>
</blockquote>
<p>So, this post is going to be <del>potentially</del> probably very niche, but I just spent <em>a lot</em> of time going down a rabbit hole working on <a href="https://github.com/python/cpython/pull/140329">CPython&rsquo;s JIT LLVM 20 upgrade</a>, and I learned a thing or two and figured maybe other folks might be interested as well. I am also a firm believer in teaching to solidify your learning, so I decided to write this post. It&rsquo;ll be one part diary entry to commemorate a sort of wild debugging chain at the Paris airport (twice!!), and one part educational!</p>
<p>Cool, cool…alright, so I&rsquo;ve worked on our LLVM upgrades for the JIT for our last three version bumps. When you think about updating a dependency, you might think, &ldquo;go into some manifest file, change a 1 to a 2, and boom, you&rsquo;re done.&rdquo; However, it&rsquo;s become somewhat of a running joke amongst the team that I take on this seemingly trivial task and then end up opening a can of worms because there is almost always some catch or some weird thing that breaks.</p>
<p>This time around…there was a new worm in the can…and it was called a trampoline. But before we get into that, I want to walk you through how I stumbled upon doing this work and why it was necessary for this version of LLVM.</p>
<h2 id="but-first-more-about-stencils">But first, more about stencils</h2>
<p>If you recall from <a href="https://savannah.dev/posts/how-your-code-runs-in-a-jit-build/">this post</a>, before Python even runs with the JIT, we generate stencil files at build time using LLVM. These stencil files are really just a very long list of C functions (one per bytecode) compiled into machine code templates. The stencil file contains the raw machine code bytes for each operation, relocation information (where we need to patch things at runtime), which symbols (external functions) each stencil needs, and which stencils need trampolines.</p>
<p>Now, you might be asking, &ldquo;Savannah, I want to see a stencil file,&rdquo; and then I&rsquo;d tell you that you&rsquo;ll have to <a href="https://github.com/python/cpython/blob/main/Tools/jit/README.md">build Python with the JIT enabled</a>. Or, if you&rsquo;re less inclined, you can see <a href="https://gist.github.com/savannahostrowski/a37e1c407d8e3b3c2571bf7d24eaeb7a">this stencil for the LOAD_FAST instruction</a>. Gibberish? That&rsquo;s okay. The important part is that at runtime, we compile traces for a sequence of instructions that represent your program. For each operation, we get its precompiled stencil, copy it into executable memory, patch in the actual addresses we need, and then combine everything together into runnable machine code.</p>
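<p>As a toy illustration of that patching step (the real JIT writes into executable memory with helpers like <code>patch_32r</code>; the bytes below are a hand-made nop/call/ret sequence, not a real stencil):</p>

```python
# Copy a machine-code template and fill a 32-bit little-endian "hole"
# at a known offset, the way the JIT patches relocations at runtime.
def patch_32(code: bytearray, offset: int, value: int) -> None:
    code[offset:offset + 4] = value.to_bytes(4, "little", signed=True)

# nop; nop; call rel32 (displacement not yet known); ret
template = bytearray(b"\x90\x90\xe8\x00\x00\x00\x00\xc3")
patch_32(template, 3, 0x1234)  # patch in the call's displacement
print(template.hex())
```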
<p>Now, when we generate these stencils, we use a bunch of compiler flags to tell LLVM how we want the stencils generated. You don&rsquo;t need to care too much about the specific flags or what they all mean at this point, but there are a lot of them. In a way, it&rsquo;s kind of like passing some kind of special incantation to the compiler…or at least it feels like that when things are broken and you are hitting random assertion errors in our instructions like I was.</p>
<h2 id="alright-now-lets-talk-about-upgrading-to-llvm-20">Alright, now let&rsquo;s talk about upgrading to LLVM 20!</h2>
<p>So this time, when I bumped the version, everything worked, except on <a href="https://github.com/savannahostrowski/cpython/actions/runs/18438327725/job/52537027963?pr=10#step:5:1584">x86_64 Darwin debug builds</a>!</p>
<p>Curious, right…well, there are a couple of reasons for this. For one, we were previously using a compiler flag (<code>-fno-plt</code>), which I discovered through some trial and error (and compiler warnings) isn&rsquo;t supported on macOS in LLVM 20, so that had to go. Second, our debug builds are not optimized, so code can be naturally further apart in memory than in release builds. Without optimizations, the compiler doesn&rsquo;t pack things tightly, which means our generated machine code and the runtime symbols it needs to call can end up separated by more than 2GB in memory. So basically, when I hit the assertion error saying that <code>patch_32r</code> for x86_64 was more than 2GB away, I had two options: 1) the simpler option - find the right combination of compiler flags to make it work (<code>-mcmodel=medium,large</code>; <code>-fno-pic</code> etc.) or 2) implement a trampoline (we will get back to this in a second).</p>
<p>So naturally, I tried the simplest option first. I started down this rabbit hole in earnest during a layover at 8 am in the Paris airport on my way to Spain for a team offsite, delirious and running on about 45 minutes of sleep.</p>
<p>Unfortunately, that didn&rsquo;t work!</p>
<figure>
    <img loading="lazy" src="images/tired-dw.jpg"
         alt="A meme of DW from Arthur looking extremely tired while smiling" width="600"/> 
</figure>

<blockquote>
<p><em>Live footage of me passing <code>-mcmodel=large</code> into the compiler for the fifth time, hoping it fixes all my problems</em></p>
</blockquote>
<p>So, I gave up…not really…But I did take a week break for the offsite and PyCon Spain.</p>
<p>However, on the way home from Spain, I had six whole hours in the Paris airport during my layover to get back at it. Thinking more about this problem, I remembered that <a href="https://github.com/python/cpython/pull/123872">Diego had implemented trampolines for aarch64</a> a while back. I decided to give up on compiler flags and really go down the rabbit hole. What ensued next was a long while of reading <a href="https://wiki.osdev.org/X86-64_Instruction_Encoding">x86_64 instruction encoding</a> to figure out how I could maybe handwrite x86-64 machine code byte-by-byte…a dark, dark place.</p>
<h2 id="okay-so-what-the-heck-is-a-trampoline">Okay, so what the heck is a trampoline…</h2>
<p>Okay, she&rsquo;s said the title of the blog post. We must be getting close! Yes, okay…so this is what a trampoline is - it&rsquo;s a small piece of code that acts as a bridge to another place in memory, some place we cannot directly reach<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>. In this case, we need to access some symbol that&rsquo;s more than 2GB away from where we currently are in memory.</p>
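<p>That 2GB limit isn&rsquo;t arbitrary: an x86-64 <code>call</code> encodes its target as a signed 32-bit displacement measured from the end of the instruction. Here&rsquo;s a toy Python version of the reachability check (mirroring the <code>value - 4 - location</code> math in the C code):</p>

```python
def reachable_with_rel32(location: int, target: int) -> bool:
    # rel32 displacements are relative to the end of the 4-byte field
    displacement = target - (location + 4)
    return -(1 << 31) <= displacement < (1 << 31)

print(reachable_with_rel32(0x1000, 0x2000))              # a nearby symbol: True
print(reachable_with_rel32(0x1000, 0x1000 + (1 << 33)))  # 8GB away: False
```

<p>When the check fails, we need a detour: the trampoline.</p>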
<p>In the stencil snippet I linked to above, you&rsquo;ll notice some lines that say <code>patch_x86_64_trampoline</code>. Let&rsquo;s look at the actual patch instruction for the trampoline and walk through it.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">// Generate and patch x86_64 trampolines.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">void</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">patch_x86_64_trampoline</span>(<span style="color:#66d9ef">unsigned</span> <span style="color:#66d9ef">char</span> <span style="color:#f92672">*</span>location, <span style="color:#66d9ef">int</span> ordinal, jit_state <span style="color:#f92672">*</span>state)
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">uint64_t</span> value <span style="color:#f92672">=</span> (<span style="color:#66d9ef">uintptr_t</span>)symbols_map[ordinal];
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">int64_t</span> range <span style="color:#f92672">=</span> (<span style="color:#66d9ef">int64_t</span>)value <span style="color:#f92672">-</span> <span style="color:#ae81ff">4</span> <span style="color:#f92672">-</span> (<span style="color:#66d9ef">int64_t</span>)location;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">// If we are in range of 32 signed bits, we can patch directly
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    <span style="color:#66d9ef">if</span> (range <span style="color:#f92672">&gt;=</span> <span style="color:#f92672">-</span>(<span style="color:#ae81ff">1LL</span> <span style="color:#f92672">&lt;&lt;</span> <span style="color:#ae81ff">31</span>) <span style="color:#f92672">&amp;&amp;</span> range <span style="color:#f92672">&lt;</span> (<span style="color:#ae81ff">1LL</span> <span style="color:#f92672">&lt;&lt;</span> <span style="color:#ae81ff">31</span>)) {
</span></span><span style="display:flex;"><span>        <span style="color:#a6e22e">patch_32r</span>(location, value <span style="color:#f92672">-</span> <span style="color:#ae81ff">4</span>);
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span>;
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">// Out of range - need a trampoline
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    <span style="color:#66d9ef">unsigned</span> <span style="color:#66d9ef">char</span> <span style="color:#f92672">*</span>trampoline <span style="color:#f92672">=</span> <span style="color:#a6e22e">get_trampoline_slot</span>(ordinal, state);
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">/* Generate the trampoline (14 bytes, padded to 16):
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">       0: ff 25 00 00 00 00    jmp *(%rip)
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">       6: XX XX XX XX XX XX XX XX   (64-bit target address)
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">       Reference: https://wiki.osdev.org/X86-64_Instruction_Encoding#FF (JMP r/m64)
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">    */</span>
</span></span><span style="display:flex;"><span>    trampoline[<span style="color:#ae81ff">0</span>] <span style="color:#f92672">=</span> <span style="color:#ae81ff">0xFF</span>;
</span></span><span style="display:flex;"><span>    trampoline[<span style="color:#ae81ff">1</span>] <span style="color:#f92672">=</span> <span style="color:#ae81ff">0x25</span>;
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">memset</span>(trampoline <span style="color:#f92672">+</span> <span style="color:#ae81ff">2</span>, <span style="color:#ae81ff">0</span>, <span style="color:#ae81ff">4</span>);
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">memcpy</span>(trampoline <span style="color:#f92672">+</span> <span style="color:#ae81ff">6</span>, <span style="color:#f92672">&amp;</span>value, <span style="color:#ae81ff">8</span>);
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">// Patch the call site to call the trampoline instead
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    <span style="color:#a6e22e">patch_32r</span>(location, (<span style="color:#66d9ef">uintptr_t</span>)trampoline <span style="color:#f92672">-</span> <span style="color:#ae81ff">4</span>);
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Here&rsquo;s what&rsquo;s happening:</p>
<p>First, we get the <code>value</code> - this is the memory address of the actual symbol we need to jump to. The <code>ordinal</code> is just an index that identifies which symbol we&rsquo;re dealing with (like &ldquo;symbol #5&rdquo; or &ldquo;symbol #37&rdquo;). A symbol is an external function that the JIT-compiled code needs to call at runtime, things like <code>PyObject_GetAttr</code> (to get an attribute from a Python object) or <code>PyDict_GetItem</code> (to get an item from a dictionary). Every external function is assigned an ordinal number.</p>
<p>Then we calculate the <code>range</code>, which is the distance in memory between our current location (<code>location</code>) and where we want to reach (<code>value</code>); the extra <code>- 4</code> is there because the 32-bit displacement is measured from the end of the 4-byte field being patched.
Next, we check if the symbol is reachable - in this case, within ±2GB, i.e., whether the displacement fits in a signed 32-bit integer (this is the assertion that failed in the first place!). If it&rsquo;s reachable, we just use the standard patch function, <code>patch_32r</code>, and we&rsquo;re done.</p>
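<p>To make the reachability check concrete, here&rsquo;s a tiny Python sketch of the same &ldquo;does the displacement fit in a signed 32-bit integer?&rdquo; test (the function name is mine, not CPython&rsquo;s):</p>

```python
def fits_rel32(location: int, value: int) -> bool:
    # The rel32 displacement is measured from the end of the
    # 4-byte field being patched, hence the extra -4.
    disp = value - 4 - location
    return -(1 << 31) <= disp < (1 << 31)

# A nearby target can be patched directly...
print(fits_rel32(0x1000, 0x2000))              # True
# ...but one more than 2GB away needs a trampoline.
print(fits_rel32(0x1000, 0x1000 + (1 << 32)))  # False
```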
<p>If not, we need a trampoline! We call <code>get_trampoline_slot</code> to get a memory location for our trampoline.</p>
<p>Then we generate the trampoline itself; this is where I had to read some wild x86-64 instruction encoding docs to figure out the machine code for a jump. The trampoline is just 14 bytes (padded to 16): the first 6 bytes are the indirect jump instruction (<code>jmp *(%rip)</code>, encoded as <code>ff 25 00 00 00 00</code>), and the next 8 bytes are the 64-bit address we&rsquo;re jumping to. Because <code>%rip</code> points at the next instruction by the time the jump executes, <code>jmp *(%rip)</code> means &ldquo;jump to the address stored immediately after this instruction&rdquo;.</p>
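<p>If it helps to see those bytes laid out, here&rsquo;s a Python sketch that assembles the same 14-byte trampoline (an illustration of the layout, not the actual CPython implementation):</p>

```python
import struct

def make_trampoline(target: int) -> bytes:
    # ff 25 00 00 00 00 = jmp *(%rip): jump to the 64-bit address
    # stored immediately after this instruction.
    jmp_rip = bytes([0xFF, 0x25, 0x00, 0x00, 0x00, 0x00])
    # x86-64 is little-endian, so pack the target with "<Q".
    return jmp_rip + struct.pack("<Q", target)

tramp = make_trampoline(0x7FFF_DEAD_BEEF)
print(len(tramp))       # 14
print(tramp[:6].hex())  # ff2500000000
```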
<p>Finally, we patch the original call site to point to the trampoline instead of trying to reach the far-away symbol directly.</p>
<p>Let&rsquo;s go down the rabbit hole a bit more and dissect the <code>get_trampoline_slot</code> function. This is sort of clever, so bear with me.</p>
<p>The problem we&rsquo;re solving: we have a pool of trampolines (basically an array of memory slots), and we need to figure out &ldquo;for symbol X, which slot in the pool should I use?&rdquo; However, not every symbol needs a trampoline. Remember, we only need trampolines when the symbol is too far to reach.
So we can&rsquo;t just use the symbol&rsquo;s ordinal as the array index, because that would waste a ton of memory. If we have 100 symbols but only 5 need trampolines, we don&rsquo;t want to allocate 100 trampoline slots!</p>
<p>The solution is to use a bitmask to track which symbols need trampolines and calculate their slot positions on the fly. Bitmasks kind of break my brain a bit, so I&rsquo;ll try to break this down for folks who feel similarly 😭</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-c" data-lang="c"><span style="display:flex;"><span><span style="color:#75715e">// Get the trampoline memory location for a given symbol ordinal.
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">static</span> <span style="color:#66d9ef">unsigned</span> <span style="color:#66d9ef">char</span> <span style="color:#f92672">*</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">get_trampoline_slot</span>(<span style="color:#66d9ef">int</span> ordinal, jit_state <span style="color:#f92672">*</span>state)
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">const</span> <span style="color:#66d9ef">uint32_t</span> symbol_mask <span style="color:#f92672">=</span> <span style="color:#ae81ff">1</span> <span style="color:#f92672">&lt;&lt;</span> (ordinal <span style="color:#f92672">%</span> <span style="color:#ae81ff">32</span>);
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">const</span> <span style="color:#66d9ef">uint32_t</span> trampoline_mask <span style="color:#f92672">=</span> state<span style="color:#f92672">-&gt;</span>trampolines.mask[ordinal <span style="color:#f92672">/</span> <span style="color:#ae81ff">32</span>];
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">assert</span>(symbol_mask <span style="color:#f92672">&amp;</span> trampoline_mask);
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">// Count the number of set bits in the trampoline mask lower than ordinal
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    <span style="color:#66d9ef">int</span> index <span style="color:#f92672">=</span> <span style="color:#a6e22e">_Py_popcount32</span>(trampoline_mask <span style="color:#f92672">&amp;</span> (symbol_mask <span style="color:#f92672">-</span> <span style="color:#ae81ff">1</span>));
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> (<span style="color:#66d9ef">int</span> i <span style="color:#f92672">=</span> <span style="color:#ae81ff">0</span>; i <span style="color:#f92672">&lt;</span> ordinal <span style="color:#f92672">/</span> <span style="color:#ae81ff">32</span>; i<span style="color:#f92672">++</span>) {
</span></span><span style="display:flex;"><span>        index <span style="color:#f92672">+=</span> <span style="color:#a6e22e">_Py_popcount32</span>(state<span style="color:#f92672">-&gt;</span>trampolines.mask[i]);
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">unsigned</span> <span style="color:#66d9ef">char</span> <span style="color:#f92672">*</span>trampoline <span style="color:#f92672">=</span> state<span style="color:#f92672">-&gt;</span>trampolines.mem <span style="color:#f92672">+</span> index <span style="color:#f92672">*</span> TRAMPOLINE_SIZE;
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">assert</span>((<span style="color:#66d9ef">size_t</span>)(index <span style="color:#f92672">+</span> <span style="color:#ae81ff">1</span>) <span style="color:#f92672">*</span> TRAMPOLINE_SIZE <span style="color:#f92672">&lt;=</span> state<span style="color:#f92672">-&gt;</span>trampolines.size);
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> trampoline;
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Here&rsquo;s the key idea: we need to answer &ldquo;how many trampolines come before this symbol?&rdquo; The answer tells us which slot in our array to use.
We store our bitmask as an array of 32-bit integers, where each bit represents whether a symbol needs a trampoline. Let&rsquo;s walk through an example. Say we have symbols 3, 7, 15, 37, and 42 that need trampolines (the only bits set to 1 in our bitmask). If we&rsquo;re looking for symbol 37&rsquo;s slot:</p>
<p><strong>Step 1: Find which bit represents our symbol</strong><br>
Symbol 37 is bit 5 in the second 32-bit chunk (37 % 32 = 5). We create a mask with just that bit set: <code>1 &lt;&lt; 5</code>.</p>
<p><strong>Step 2: Count all trampolines in earlier chunks</strong><br>
Symbol 37 is in chunk 1 (37 / 32 = 1, integer division), so we need to count all set bits in chunk 0. That&rsquo;s symbols 3, 7, and 15 - giving us 3 trampolines. We use <code>_Py_popcount32</code>, which counts set bits in a 32-bit integer.</p>
<p><strong>Step 3: Count trampolines before us in our own chunk</strong><br>
We use <code>(symbol_mask - 1)</code> to create a mask of all bits lower than ours (for bit 5, this gives us bits 0-4), then AND it with the trampoline mask to see which of those lower bits are actually set, then count them. In our example, there are no symbols between 32 and 37 that need trampolines, so this adds 0.</p>
<p><strong>Step 4: Calculate the final position</strong><br>
Symbol 37 gets index 3 (the 4th slot, since we&rsquo;re 0-indexed). We take our base memory address (<code>state-&gt;trampolines.mem</code>) and add <code>index * TRAMPOLINE_SIZE</code> to get to the right slot.</p>
<p>Break your brain a bit? Same! However, the beauty of this approach is that our trampoline pool is densely packed. We only allocate as many slots as we actually need, but we can still quickly calculate where any symbol&rsquo;s trampoline lives.</p>
<h2 id="putting-it-all-together">Putting it all together</h2>
<p>So, TL;DR, this is what’s happening:</p>
<pre tabindex="0"><code>Original code:     [call instruction] --X--&gt; [target] (too far, &gt;2GB away)

With trampoline:   [call instruction] -----&gt; [trampoline] -----&gt; [target]
                                             |                    ^
                                             | jmp *(%rip)        |
                                             | [64-bit address] --+
</code></pre><p>The trampoline is just a tiny piece of code that lives close enough to our call site (within 2GB) that we can reach it with a normal 32-bit relative jump, and it contains an instruction that can jump to the full 64-bit address of our actual target.</p>
<h2 id="and-thats-that">And that&rsquo;s that!</h2>
<p>If you made it this far, congrats! You now know way more about trampolines than you probably ever wanted to, and hey! You didn&rsquo;t need to be a compiler engineer to understand it (hopefully!).</p>
<blockquote>
<p>If you enjoyed this post, please consider sharing it with anyone you think might find it interesting. If you have any questions or feedback, feel free to reach out to me via <a href="mailto:savannah@python.org">email</a>.</p>
</blockquote>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p><strong>A note on terminology</strong>: If you&rsquo;re familiar with programming language literature, you might know &ldquo;trampoline&rdquo; from its classic definition in continuation-passing style: where a function returns another function to its caller rather than calling it directly, enabling tail-call optimization without stack growth. In CPython&rsquo;s JIT context, however, we use &ldquo;trampoline&rdquo; to refer to a small piece of machine code that serves as an intermediary jump point to reach distant memory addresses that can&rsquo;t be accessed directly due to instruction set limitations.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded>
    </item>
    
    <item>
      <title>A new chapter</title>
      <link>/posts/a-new-chapter/</link>
      <pubDate>Fri, 03 Oct 2025 00:00:00 +0000</pubDate>
      
      <guid>/posts/a-new-chapter/</guid>
      <description>how i&amp;#39;m feeling now</description>
      <content:encoded><![CDATA[<blockquote>
<p><strong>TL;DR I&rsquo;m joining the engineering team at FastAPI Labs ⚡ and closing this chapter of my career in product management. Read on for more!</strong></p>
</blockquote>
<h2 id="building-for-builders">Building for builders</h2>
<p>A little over five years ago, I shared on social media that I was transitioning from engineering to product management. Since then, I&rsquo;ve had the privilege of working on developer tools at Microsoft, Docker, and most recently at Snowflake.</p>
<p>What I have loved the most about working in developer tools product management is that I got to stay technical and in the weeds, all while deeply focusing on user experience. I was able to participate in engineering conversations about design and implementation, even though I wasn&rsquo;t writing production code, because lower-level design choices directly impacted the end-user experience. Despite having a recurring existential crisis that I missed working as an engineer full-time, I continued to work as a product manager because I loved it and was good at it (and my open source work scratched the true engineering itch whenever I needed it).</p>
<p>I&rsquo;ve learned a great deal by working as a product manager, and I genuinely believe that I&rsquo;ve become a better engineer through this experience than I would have been if I had stayed an engineer in name only. There&rsquo;s also a long series of personal and professional events that followed the choice to try product management, for which I am eternally grateful. If I could go back in time, I would choose this path a thousand times over.</p>
<p>However, as time has gone on, I&rsquo;ve felt myself getting further from the parts of the role that gave me the most energy, which, as it turns out, are the engineering bits. I also always said that the title was irrelevant to me. Of course, levels (e.g., senior, staff, principal) do matter, as they help garner a certain level of respect, but titles like software engineer or product manager never held much significance for me.</p>
<p>What I know is that I want to build things - specifically, tools and experiences for builders. For me, that has always meant two things: a deep commitment to developer experience and a connection to Python and open source. The products and projects I am proudest of are those that have genuinely made developers&rsquo; lives easier, removed friction, and allowed people to spend more time creating (or, frankly, just given them the agency to choose what they want to spend their time on). When we&rsquo;re focused on scaling quickly or signing the biggest deal, it&rsquo;s easy to lose sight of what delightful looks like. However, when we ignore too many papercuts, we create an open wound. We make our products unusable and fail to solve the problem in front of us.</p>
<p>What&rsquo;s important to me is remembering and emphasizing the humanity of what we&rsquo;re building. I want to focus on solving problems for people. I want to spend the time talking to developers about how their workflow sucks, I want to fix that tiny bug that will make them more productive, and I want to deeply understand user workflows and tooling to the point where I can anticipate what better looks like before they might be able to articulate it themselves.</p>
<p>I want to take the &ldquo;less scalable&rdquo; path if it means that I can solve the problem correctly the first time. I sincerely believe that if you build the right experience and truly solve the problem, the people (and the money) will come.</p>
<p>One key problem that still has not been truly solved for Python developers is the journey from local development environment to a deployed application (with all the bells and whistles). Developers will spend hours setting up everything locally, only to face infrastructure, cloud jargon, and difficult decisions. This is a problem I&rsquo;m all too familiar with, as both an engineer and someone who worked on tooling for Azure. This problem persists, especially in an era where an increasing number of new developers are choosing Python as their language of choice (just look at the <a href="https://lp.jetbrains.com/python-developers-survey-2024/">latest JetBrains Python Developer Survey</a>). I believe there&rsquo;s still a significant developer experience pain point here.</p>
<p>And so, this all leads me to the title of this post: in a couple days, I&rsquo;m starting a new chapter.</p>
<p><strong>On October 6th, I&rsquo;m joining the engineering team at FastAPI Labs to help build the future of deploying and managing FastAPI applications 🚀.</strong></p>
<h2 id="so-why-fastapi-labs-why-now">So why FastAPI Labs? Why now?</h2>
<p>FastAPI is now the <a href="https://www.linkedin.com/posts/fastapi_fastapi-is-now-officially-the-most-used-web-activity-7363884230432501760-g4ee?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAACZXt-0BnnBHfjfhKsX9ZzdaOv1_NmfqF00">most popular Python web framework</a>, so fixing this problem is both timely and impactful. More than that, I believe that FastAPI Labs is set up to build this the right way from the start, both in terms of working toward the right solution to the problem and the company ethos. You can read more about the FastAPI Cloud announcement <a href="https://fastapicloud.com/blog/fastapi-cloud-by-the-same-team-behind-fastapi/">here</a>, or better yet, <a href="https://fastapicloud.com/">join the waiting list</a>.</p>
<p>I first met <a href="https://github.com/tiangolo">Sebastián Ramírez (aka tiangolo)</a> on Twitter about five years ago when I was working on the Pylance language server. Even then, what stood out to me was his attention to detail, his relentless focus on developer experience, and his understanding of the importance of open source in our community. It was always clear in his work that he didn&rsquo;t just care about making something work but about making it delightful. So when Sebastián approached me about joining FastAPI Labs, it was a no-brainer. The chance to work alongside someone whose values resonate so profoundly with my own, on a problem that matters, at the right time, was an easy yes. I am absolutely stoked to be joining the team at such an exciting time!</p>
<p>I&rsquo;m also grateful that part of my role includes dedicated time to contribute to open source. I&rsquo;ll finally have the space to focus more deeply on my work on CPython, preparing for my role as <a href="https://discuss.python.org/t/welcome-the-3-16-and-3-17-release-manager-savannah-bailey/100163">Release Manager for Python 3.16/3.17</a>, advancing CPython performance work (e.g., ensuring FastAPI runs smoothly on JIT/free-threaded builds), and continuing to support FastAPI and other related projects in the ecosystem. I feel tremendously fortunate to have the opportunity to work at a company that not only understands my open source work but also celebrates it as core to its own mission. This is something I never really thought I&rsquo;d have the opportunity to do, so I&rsquo;m incredibly excited.</p>
<p>So, that&rsquo;s it. With one chapter ending, another one begins.</p>
<p><strong>Savannah 🤝🏻 FastAPI Labs</strong></p>
<blockquote>
<p>&hellip;and yes, in case you were wondering, the subheader of this blog post is indeed a <a href="https://open.spotify.com/album/3a9qH2VEsSiOZvMrjaS0Nu?si=kb4kb70JQMeFxre62IfsZA">Charli xcx reference</a>. Always.</p>
</blockquote>
]]></content:encoded>
    </item>
    
    <item>
      <title>How JIT builds of CPython actually work</title>
      <link>/posts/how-your-code-runs-in-a-jit-build/</link>
      <pubDate>Sun, 27 Jul 2025 00:00:00 +0000</pubDate>
      
      <guid>/posts/how-your-code-runs-in-a-jit-build/</guid>
      <description>You don&amp;#39;t have to be a compiler engineer to understand how your code runs in a JIT build of CPython</description>
      <content:encoded><![CDATA[<blockquote>
<p>This is a post in a series around making CPython internals more approachable. If I missed something or you’d like to request a topic, feel free to drop me a line via <a href="mailto:savannah@python.org">email</a>. You can also read other posts in the series <a href="https://savannah.dev/tags/you-dont-have-to-be-a-compiler-engineer/">here</a>.</p>
</blockquote>
<p>Ever wonder what really happens under the hood when you run your Python code? If you&rsquo;re using a JIT build of CPython, the answer may involve a few more steps than you&rsquo;d expect, but thankfully, you don&rsquo;t have to be a compiler engineer to understand it.</p>
<p>Before I get into it, I want to shamelessly plug that you can help us test JIT builds of CPython pretty easily as of Python 3.14! You can now get official Python builds from <a href="https://www.python.org/downloads/">python.org</a> for both Windows and macOS that include <a href="https://docs.python.org/3.14/whatsnew/3.14.html#binary-releases-for-the-experimental-just-in-time-compiler">CPython’s experimental just-in-time (JIT) compiler built in but off by default</a>. While the JIT builds are not (yet) recommended for production use, you can enable the JIT using the <code>PYTHON_JIT=1</code> environment variable. We’d love to hear about your experience using Python with the JIT - the good, the bad, the ugly!</p>
<p>Alright, let&rsquo;s get after it.</p>
<h2 id="what-happens-when-you-execute-your-code-a-brief-overview-of-cpythons-interpreter">What happens when you execute your code: A brief overview of CPython’s interpreter</h2>
<p>&hellip;Well, before we get into what happens in a JIT build of Python, we should briefly talk about what happens in a &ldquo;regular&rdquo; build for anyone who isn&rsquo;t familiar with how the interpreter works, as this lays the foundation for the JIT builds later on. I think the best way to talk about this is by example, so let&rsquo;s consider this very basic function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">abs</span>(a: int, b: int) <span style="color:#f92672">-&gt;</span> int:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> a <span style="color:#f92672">&gt;</span> b:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> a <span style="color:#f92672">-</span> b
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> b <span style="color:#f92672">-</span> a
</span></span></code></pre></div><p>So, let&rsquo;s say you execute this with your local version of Python and boom, the code is running. But what actually happens here?</p>
<h3 id="first-your-code-is-broken-down-into-tokens">First, your code is broken down into tokens</h3>
<p>When you run this code, the first thing that happens is that Python breaks it down into tokens. Tokens are the smallest units of meaning in your code, like keywords, identifiers, literals, and operators. For example, in our function, <code>def</code>, <code>abs</code>, <code>(</code>, <code>a</code>, <code>b</code>, <code>if</code>, <code>&gt;</code>, <code>return</code>, and so on are all tokens. This process is known as lexical analysis or tokenization.
You can see what the tokens for our function look like using the <code>tokenize</code> module in the standard library. Here’s how you can do that:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> tokenize
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> io <span style="color:#f92672">import</span> BytesIO
</span></span><span style="display:flex;"><span>source <span style="color:#f92672">=</span> <span style="color:#e6db74">b</span><span style="color:#e6db74">&#34;&#34;&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">def abs(a: int, b: int) -&gt; int:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    if a &gt; b:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        return a - b
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    return b - a
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>tokens <span style="color:#f92672">=</span> tokenize<span style="color:#f92672">.</span>tokenize(BytesIO(source)<span style="color:#f92672">.</span>readline)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">for</span> token <span style="color:#f92672">in</span> tokens:
</span></span><span style="display:flex;"><span>    print(token)
</span></span></code></pre></div><p>This will output a list of tokens that look something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-plaintext" data-lang="plaintext"><span style="display:flex;"><span>TokenInfo(type=65 (ENCODING), string=&#39;utf-8&#39;, start=(0, 0), end=(0, 0), line=&#39;&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=63 (NL), string=&#39;\n&#39;, start=(1, 0), end=(1, 1), line=&#39;\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;def&#39;, start=(2, 0), end=(2, 3), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;abs&#39;, start=(2, 4), end=(2, 7), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=55 (OP), string=&#39;(&#39;, start=(2, 7), end=(2, 8), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;a&#39;, start=(2, 8), end=(2, 9), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=55 (OP), string=&#39;:&#39;, start=(2, 9), end=(2, 10), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;int&#39;, start=(2, 11), end=(2, 14), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=55 (OP), string=&#39;,&#39;, start=(2, 14), end=(2, 15), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;b&#39;, start=(2, 16), end=(2, 17), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=55 (OP), string=&#39;:&#39;, start=(2, 17), end=(2, 18), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;int&#39;, start=(2, 19), end=(2, 22), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=55 (OP), string=&#39;)&#39;, start=(2, 22), end=(2, 23), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=55 (OP), string=&#39;-&gt;&#39;, start=(2, 24), end=(2, 26), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;int&#39;, start=(2, 27), end=(2, 30), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=55 (OP), string=&#39;:&#39;, start=(2, 30), end=(2, 31), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=4 (NEWLINE), string=&#39;\n&#39;, start=(2, 31), end=(2, 32), line=&#39;def abs(a: int, b: int) -&gt; int:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=5 (INDENT), string=&#39;    &#39;, start=(3, 0), end=(3, 4), line=&#39;    if a &gt; b:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;if&#39;, start=(3, 4), end=(3, 6), line=&#39;    if a &gt; b:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;a&#39;, start=(3, 7), end=(3, 8), line=&#39;    if a &gt; b:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=55 (OP), string=&#39;&gt;&#39;, start=(3, 9), end=(3, 10), line=&#39;    if a &gt; b:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;b&#39;, start=(3, 11), end=(3, 12), line=&#39;    if a &gt; b:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=55 (OP), string=&#39;:&#39;, start=(3, 12), end=(3, 13), line=&#39;    if a &gt; b:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=4 (NEWLINE), string=&#39;\n&#39;, start=(3, 13), end=(3, 14), line=&#39;    if a &gt; b:\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=5 (INDENT), string=&#39;        &#39;, start=(4, 0), end=(4, 8), line=&#39;        return a - b\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;return&#39;, start=(4, 8), end=(4, 14), line=&#39;        return a - b\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;a&#39;, start=(4, 15), end=(4, 16), line=&#39;        return a - b\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=55 (OP), string=&#39;-&#39;, start=(4, 17), end=(4, 18), line=&#39;        return a - b\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;b&#39;, start=(4, 19), end=(4, 20), line=&#39;        return a - b\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=4 (NEWLINE), string=&#39;\n&#39;, start=(4, 20), end=(4, 21), line=&#39;        return a - b\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=6 (DEDENT), string=&#39;&#39;, start=(5, 4), end=(5, 4), line=&#39;    return b - a\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;return&#39;, start=(5, 4), end=(5, 10), line=&#39;    return b - a\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;b&#39;, start=(5, 11), end=(5, 12), line=&#39;    return b - a\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=55 (OP), string=&#39;-&#39;, start=(5, 13), end=(5, 14), line=&#39;    return b - a\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=1 (NAME), string=&#39;a&#39;, start=(5, 15), end=(5, 16), line=&#39;    return b - a\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=4 (NEWLINE), string=&#39;\n&#39;, start=(5, 16), end=(5, 17), line=&#39;    return b - a\n&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=6 (DEDENT), string=&#39;&#39;, start=(6, 0), end=(6, 0), line=&#39;&#39;)
</span></span><span style="display:flex;"><span>TokenInfo(type=0 (ENDMARKER), string=&#39;&#39;, start=(6, 0), end=(6, 0), line=&#39;&#39;)
</span></span></code></pre></div><p>Yeah, that&rsquo;s&hellip;a lot, but don&rsquo;t worry, you don&rsquo;t have to memorize it or anything. The key takeaway here is that Python has broken your code down into its smallest meaningful parts, which will be used in the next steps of execution.</p>
<h3 id="next-your-code-is-parsed">Next, your code is parsed</h3>
<p>Next, these tokens are combined to form a structure called an abstract syntax tree (AST), a tree-based representation of your code that captures its hierarchical structure and shows how the different parts relate to each other. For example, in our function, the AST would show that <code>abs</code> is a function definition, <code>a</code> and <code>b</code> are parameters, and the <code>if</code> statement is a conditional that leads to different return statements.</p>
<p>It&rsquo;s also at this stage that Python checks for syntax errors. If there are any, it raises a <code>SyntaxError</code> and stops execution.</p>
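<p>You can see this stage fail in isolation; here&rsquo;s a minimal sketch using <code>ast.parse</code> on deliberately malformed source (the function name <code>broken</code> is just for illustration):</p>

```python
import ast

# Parsing happens before any code runs, so the error surfaces immediately
try:
    ast.parse("def broken(:\n    pass")
except SyntaxError as exc:
    print("caught:", exc.msg)
```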
<p>We can see what the AST for our simple function above looks like using the <a href="https://docs.python.org/3/library/ast.html"><code>ast</code> module</a> in the standard library. The code looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> ast
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">def abs(a: int, b: int) -&gt; int:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    if a &gt; b:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        return a - b
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    return b - a
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">abs(5, 3)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>tree <span style="color:#f92672">=</span> ast<span style="color:#f92672">.</span>parse(source)
</span></span><span style="display:flex;"><span>print(ast<span style="color:#f92672">.</span>dump(tree, indent<span style="color:#f92672">=</span><span style="color:#ae81ff">4</span>))
</span></span></code></pre></div><p>&hellip;and this would return a tree like so:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-plaintext" data-lang="plaintext"><span style="display:flex;"><span>Module(
</span></span><span style="display:flex;"><span>    body=[
</span></span><span style="display:flex;"><span>        FunctionDef(
</span></span><span style="display:flex;"><span>            name=&#39;abs&#39;,
</span></span><span style="display:flex;"><span>            args=arguments(
</span></span><span style="display:flex;"><span>                args=[
</span></span><span style="display:flex;"><span>                    arg(
</span></span><span style="display:flex;"><span>                        arg=&#39;a&#39;,
</span></span><span style="display:flex;"><span>                        annotation=Name(id=&#39;int&#39;)),
</span></span><span style="display:flex;"><span>                    arg(
</span></span><span style="display:flex;"><span>                        arg=&#39;b&#39;,
</span></span><span style="display:flex;"><span>                        annotation=Name(id=&#39;int&#39;))]),
</span></span><span style="display:flex;"><span>            body=[
</span></span><span style="display:flex;"><span>                If(
</span></span><span style="display:flex;"><span>                    test=Compare(
</span></span><span style="display:flex;"><span>                        left=Name(id=&#39;a&#39;),
</span></span><span style="display:flex;"><span>                        ops=[
</span></span><span style="display:flex;"><span>                            Gt()],
</span></span><span style="display:flex;"><span>                        comparators=[
</span></span><span style="display:flex;"><span>                            Name(id=&#39;b&#39;)]),
</span></span><span style="display:flex;"><span>                    body=[
</span></span><span style="display:flex;"><span>                        Return(
</span></span><span style="display:flex;"><span>                            value=BinOp(
</span></span><span style="display:flex;"><span>                                left=Name(id=&#39;a&#39;),
</span></span><span style="display:flex;"><span>                                op=Sub(),
</span></span><span style="display:flex;"><span>                                right=Name(id=&#39;b&#39;)))]),
</span></span><span style="display:flex;"><span>                Return(
</span></span><span style="display:flex;"><span>                    value=BinOp(
</span></span><span style="display:flex;"><span>                        left=Name(id=&#39;b&#39;),
</span></span><span style="display:flex;"><span>                        op=Sub(),
</span></span><span style="display:flex;"><span>                        right=Name(id=&#39;a&#39;)))],
</span></span><span style="display:flex;"><span>            returns=Name(id=&#39;int&#39;)),
</span></span><span style="display:flex;"><span>        Expr(
</span></span><span style="display:flex;"><span>            value=Call(
</span></span><span style="display:flex;"><span>                func=Name(id=&#39;abs&#39;),
</span></span><span style="display:flex;"><span>                args=[
</span></span><span style="display:flex;"><span>                    Constant(value=5),
</span></span><span style="display:flex;"><span>                    Constant(value=3)]))])
</span></span></code></pre></div><p>Again, this is a lot of information for such a short function, but what you should really glean from it is that every variable, statement, function, and constant, along with its relationships, is represented in this tree.</p>
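<p>Because the AST is just a tree of node objects, you can also inspect it programmatically; here&rsquo;s a small sketch that uses <code>ast.walk</code> to count node types in the same function:</p>

```python
import ast
from collections import Counter

source = """
def abs(a: int, b: int) -> int:
    if a > b:
        return a - b
    return b - a
"""

# ast.walk yields every node in the tree, in no particular order
tree = ast.parse(source)
counts = Counter(type(node).__name__ for node in ast.walk(tree))
print(counts["Return"], counts["BinOp"])  # 2 2: two returns, two subtractions
```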
<h3 id="then-we-compile-to-bytecode">Then, we compile to bytecode</h3>
<p>Next, Python compiles that AST down into bytecode, which is really a lower-level, platform-independent representation of your code. This is what the CPython interpreter actually executes.</p>
<p>Just like with the AST, you can see the bytecode representation of the same function we looked at earlier using the <a href="https://docs.python.org/3/library/dis.html"><code>dis</code> module</a> (aka the disassembly module) in the standard library.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> dis
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">abs</span>(a: int, b: int) <span style="color:#f92672">-&gt;</span> int:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> a <span style="color:#f92672">&gt;</span> b:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> a <span style="color:#f92672">-</span> b
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> b <span style="color:#f92672">-</span> a
</span></span><span style="display:flex;"><span>dis<span style="color:#f92672">.</span>dis(abs)
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-plaintext" data-lang="plaintext"><span style="display:flex;"><span>  3           RESUME                   0
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  4           LOAD_FAST_BORROW_LOAD_FAST_BORROW 1 (a, b)
</span></span><span style="display:flex;"><span>              COMPARE_OP             148 (bool(&gt;))
</span></span><span style="display:flex;"><span>              POP_JUMP_IF_FALSE        9 (to L1)
</span></span><span style="display:flex;"><span>              NOT_TAKEN
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  5           LOAD_FAST_BORROW_LOAD_FAST_BORROW 1 (a, b)
</span></span><span style="display:flex;"><span>              BINARY_OP               10 (-)
</span></span><span style="display:flex;"><span>              RETURN_VALUE
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  7   L1:     LOAD_FAST_BORROW_LOAD_FAST_BORROW 16 (b, a)
</span></span><span style="display:flex;"><span>              BINARY_OP               10 (-)
</span></span><span style="display:flex;"><span>              RETURN_VALUE
</span></span></code></pre></div><p>This might look intimidating, but it&rsquo;s just a lower-level form of your original code. Here&rsquo;s a quick mapping, removing some instructions for brevity:</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Bytecode Instruction</th>
          <th style="text-align: left">Original Code</th>
          <th style="text-align: left">Explanation</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><code>LOAD_FAST_BORROW_LOAD_FAST_BORROW 1 (a, b)</code></td>
          <td style="text-align: left"><code>a &gt; b</code></td>
          <td style="text-align: left">Load <code>a</code> and load <code>b</code></td>
      </tr>
      <tr>
          <td style="text-align: left"><code>COMPARE_OP 148 (bool(&gt;))</code></td>
          <td style="text-align: left"><code>a &gt; b</code></td>
          <td style="text-align: left">Compare <code>a</code> and <code>b</code> using the <code>&gt;</code> operator</td>
      </tr>
      <tr>
          <td style="text-align: left"><code>POP_JUMP_IF_FALSE 9 (to L1)</code></td>
          <td style="text-align: left"><code>if a &gt; b:</code></td>
          <td style="text-align: left">Jump to else clause if condition is false</td>
      </tr>
      <tr>
          <td style="text-align: left"><code>LOAD_FAST_BORROW_LOAD_FAST_BORROW 1 (a, b)</code></td>
          <td style="text-align: left"><code>return a - b</code></td>
          <td style="text-align: left">Load <code>a</code> and  load <code>b</code> again for subtraction</td>
      </tr>
      <tr>
          <td style="text-align: left"><code>BINARY_OP 10 (-)</code></td>
          <td style="text-align: left"><code>a - b</code></td>
          <td style="text-align: left">Subtract <code>b</code> from <code>a</code></td>
      </tr>
      <tr>
          <td style="text-align: left"><code>RETURN_VALUE</code></td>
          <td style="text-align: left"><code>return a - b</code></td>
          <td style="text-align: left">Return the result of the subtraction</td>
      </tr>
      <tr>
          <td style="text-align: left"><code>LOAD_FAST_BORROW_LOAD_FAST_BORROW 16 (b, a)</code> (at label <code>L1</code>)</td>
          <td style="text-align: left"><code>return b - a</code></td>
          <td style="text-align: left">Load <code>b</code> and load <code>a</code> for the else clause</td>
      </tr>
      <tr>
          <td style="text-align: left"><code>BINARY_OP 10 (-)</code></td>
          <td style="text-align: left"><code>b - a</code></td>
          <td style="text-align: left">Subtract <code>a</code> from <code>b</code></td>
      </tr>
      <tr>
          <td style="text-align: left"><code>RETURN_VALUE</code></td>
          <td style="text-align: left"><code>return b - a</code></td>
          <td style="text-align: left">Return the result of the subtraction</td>
      </tr>
  </tbody>
</table>
<p>This shows how Python breaks your logic into a series of simple instructions. Each instruction is a single operation that the interpreter can execute. For example, <code>LOAD_FAST_BORROW_LOAD_FAST_BORROW</code> loads the values of <code>a</code> and <code>b</code>, <code>COMPARE_OP</code> compares them, and <code>BINARY_OP</code> performs the subtraction.</p>
<p>CPython runs your code using a bytecode interpreter: an internal evaluation loop that executes one bytecode instruction at a time, dispatching each to the appropriate C function that handles it. This loop sits at the heart of the Python Virtual Machine (PVM), which maintains the call stack and handles memory management, exception handling, and more.</p>
<p>There&rsquo;s more to say here about the Global Interpreter Lock, garbage collection, etc. but I&rsquo;m going to save that for another post. The key takeaway here is that the PVM executes these bytecode instructions in a loop, processing each instruction in sequence until it reaches the end of the function or encounters a return statement.</p>
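<p>That loop can be sketched in miniature. This toy version is nothing like CPython&rsquo;s actual implementation (which lives in C, in <code>Python/ceval.c</code>), but it shows the shape of dispatch: pop operands off a stack, do the work, push the result:</p>

```python
# A toy stack-based dispatch loop, in the spirit of CPython's evaluation loop
def run(instructions):
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "BINARY_SUBTRACT":
            right, left = stack.pop(), stack.pop()
            stack.append(left - right)
        elif opname == "RETURN_VALUE":
            return stack.pop()

# Roughly what "return 5 - 3" would dispatch
print(run([
    ("LOAD_CONST", 5),
    ("LOAD_CONST", 3),
    ("BINARY_SUBTRACT", None),
    ("RETURN_VALUE", None),
]))  # → 2
```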
<p>For all intents and purposes, your code is now running in the Python interpreter. This is how Python executes your code in a regular build of CPython.</p>
<h3 id="but-wait-theres-more-the-specializing-adaptive-interpreter">But, wait, there&rsquo;s more: The Specializing Adaptive Interpreter</h3>
<p>Since Python 3.11, we&rsquo;ve had something called the <a href="https://peps.python.org/pep-0659/">Specializing Adaptive Interpreter</a> in CPython (a significant contributor to why Python 3.11 was about 25% faster than Python 3.10 for most workloads). We won&rsquo;t get too deep into it in this blog post, but in essence, the idea is that once a bytecode instruction has been executed enough times in a code path, the interpreter can &ldquo;specialize&rdquo; it based on the types and values seen at runtime.</p>
<p>For example, let’s consider the <code>BINARY_OP</code> instruction in our bytecode. If the interpreter sees you&rsquo;re doing a lot of integer subtraction, it might optimize that instruction internally by installing a fast path for integers. This means that while the bytecode still says <code>BINARY_OP</code>, the interpreter skips type checks and uses a specialized implementation for integer subtraction behind the scenes, making it significantly faster, even without the JIT compiler.</p>
<h2 id="okay-so-what-happens-in-jit-builds">Okay, so what happens in JIT builds?</h2>
<p>Right, right. Okay, so now that we understand how the interpreter works, we can talk about what happens when you run your code in a JIT build of CPython.</p>
<h3 id="enter-the-micro-instruction-uops-interpreter">Enter the micro-instruction (uops) interpreter</h3>
<p>So, a regular build of CPython is already doing some smart things to optimize your bytecode as your code runs. But what if we could do even better? What if we could take those bytecode instructions and turn them into something even more efficient? This is where the micro-instruction interpreter comes in.</p>
<p>So once your code has &ldquo;warmed up&rdquo; or been executed enough times, we can start to optimize it even further. What&rsquo;s really neat is that the specializing adaptive interpreter actually provides us with a lot of profiling information about the code being executed, which helps with all of this. With the micro-op interpreter, we break each bytecode instruction in the code path down into even smaller, more specialized instructions called micro-operations, or uops, which are designed to be executed much faster than the original bytecode instructions. The mapping from bytecode instructions to uops happens automatically thanks to some domain-specific language (DSL) infrastructure that was introduced in Python 3.12; it&rsquo;s effectively a table lookup that says this bytecode instruction maps to these uops. Once we have these uops, we can even start to optimize them further by removing unnecessary checks and operations (&hellip;again, a topic for another post).</p>
<p>I&rsquo;d be remiss if I didn’t mention that the micro-op interpreter is a separate interpreter from the regular bytecode one. In a JIT build of CPython, both interpreters are available, and once a function becomes &ldquo;hot,&rdquo; execution can switch from bytecode to uops. That might sound like a big performance win, but not quite yet. In fact, things often get slower at this stage. The micro-op interpreter introduces overhead by breaking each bytecode instruction into smaller, more granular uops and dispatching more instructions overall. It’s a trade-off: we’re doing extra work now to prepare for the real speedup that comes next, when the JIT compiler steps in to generate optimized machine code and (hopefully) recover that lost performance and then some.</p>
<blockquote>
<p>When you build Python with <code>--enable-experimental-jit</code> or set <code>PYTHON_JIT=1</code> in Python 3.14 builds, you&rsquo;re not just enabling the JIT itself, but the micro-op interpreter as well.</p>
</blockquote>
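<p>If you want to check whether the JIT is actually on at runtime, Python 3.14 grew a small (private, subject-to-change) introspection interface, <code>sys._jit</code>. Here&rsquo;s a sketch that degrades gracefully on older builds:</p>

```python
import sys

# sys._jit only exists on Python 3.14+; older builds simply won't have it
jit = getattr(sys, "_jit", None)
if jit is None:
    print("this build has no sys._jit introspection API")
else:
    print("JIT available in this build:", jit.is_available())
    print("JIT enabled for this process:", jit.is_enabled())
```

<p>Running the same script with <code>PYTHON_JIT=1</code> and <code>PYTHON_JIT=0</code> is an easy way to confirm which mode you&rsquo;re actually benchmarking.</p>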
<h3 id="jit-compilation">JIT Compilation</h3>
<p>Alright, we&rsquo;ve finally made it! Let&rsquo;s talk about the JIT.</p>
<p>First off, let&rsquo;s talk about what a JIT compiler is, in case you&rsquo;re not already familiar. A JIT (Just-In-Time) compiler is a type of compiler that translates code into machine code at runtime, rather than before execution.</p>
<p>In the context of CPython, our JIT compiler uses a technique called copy-and-patch. This technique is covered in <a href="https://fredrikbk.com/publications/copy-and-patch.pdf">this paper</a> but don&rsquo;t worry, we don&rsquo;t need to get too academic here. Basically, what happens is as follows:</p>
<ol>
<li>When CPython is built, we use LLVM to generate precompiled stencil files for your specific platform and architecture. These stencil files contain templates for how to translate the micro-ops we talked about earlier into machine code.</li>
<li>When your code is executed, the JIT compiler monitors the execution and identifies &ldquo;hot&rdquo; traces—sections of code that are executed frequently.</li>
<li>When a hot trace is detected, the JIT compiler takes the relevant micro-ops, which are the smaller, specialized instructions we covered earlier, and uses the precompiled stencil templates to generate native machine code.
<ul>
<li>The JIT compiler fills in the placeholders in the stencil templates with the actual values needed for your code, such as addresses of variables, constants, and cached results (&ldquo;patching&rdquo; up the code).</li>
<li>The patched stencils are then linked together to form a trace: a sequence of micro-ops that can now be executed as native machine code.</li>
<li>Finally, the JIT compiler executes this native machine code directly instead of interpreting.</li>
</ul>
</li>
</ol>
<blockquote>
<p>Now, the elephant in the room here is that the JIT does not (yet!) make Python a whole lot faster. In most cases, the JIT builds range from slower to about the same performance as the non-JIT build of Python. As of 3.14, the JIT is faster in select benchmarks but we have a ways to go still. Ken Jin has a great <a href="https://fidget-spinner.github.io/posts/jit-reflections.html">blog post</a> that goes into more detail about the performance of the JIT builds in Python 3.14 (among other reflections) if you&rsquo;re interested.</p>
</blockquote>
<h2 id="putting-it-all-together">Putting it all together</h2>
<p>So, to summarize, when you run your code in a JIT build of CPython, the following happens:</p>
<ol>
<li>Your code is tokenized, parsed, and compiled into bytecode as usual.</li>
<li>The bytecode is executed by the regular bytecode interpreter, which may specialize some instructions based on runtime profiling.</li>
<li>If the code is executed enough times, the micro-op interpreter kicks in, breaking down the bytecode instructions into smaller, more specialized uops.</li>
<li>The JIT compiler then compiles these uops into native machine code using precompiled stencil templates, optimizing the execution of your code.</li>
<li>The native machine code is executed directly by the CPU, bypassing the bytecode interpreter and micro-op interpreter.</li>
</ol>
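<p>The first steps of that list are all observable from pure Python; here&rsquo;s a sketch that runs the same source through each stage we&rsquo;ve covered (the uop and JIT stages are interpreter internals, so they don&rsquo;t appear):</p>

```python
import ast
import dis
import io
import tokenize

source = "def abs(a: int, b: int) -> int:\n    if a > b:\n        return a - b\n    return b - a\n"

# Stage 1: tokenize the source into its smallest meaningful parts
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
print(len(tokens), "tokens")

# Stage 2: parse the tokens into an AST (this is where SyntaxError is raised)
tree = ast.parse(source)
print(type(tree.body[0]).__name__)  # FunctionDef

# Stage 3: compile the AST into bytecode and disassemble the function
module_code = compile(tree, "example", "exec")
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))
dis.dis(func_code)
```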
<p>&hellip;and that&rsquo;s it! You now have an understanding of how your code runs in a JIT build of CPython and you didn&rsquo;t have to be a compiler engineer to understand it!</p>
<h2 id="suggested-readings--videos">Suggested readings &amp; videos</h2>
<p>Some great talks, blog posts, and other resources from folks working on Python:</p>
<ul>
<li>Maybe watch one of Brandt&rsquo;s talks on this topic:
<ul>
<li><a href="https://www.youtube.com/watch?v=kMO3Ju0QCDo">Building a JIT compiler for CPython</a></li>
<li><a href="https://www.youtube.com/watch?v=NE-Oq8I3X_w">What they don&rsquo;t tell you about building a JIT compiler for CPython</a></li>
</ul>
</li>
<li>Diego&rsquo;s <a href="https://www.youtube.com/watch?v=5si4zkAngpA">EuroPython 2025 talk</a></li>
<li>ICYMI earlier in the post, Ken Jin&rsquo;s <a href="https://fidget-spinner.github.io/posts/jit-reflections.html">Reflections on 2 years of CPython’s JIT Compiler: The good, the bad, the ugly</a> is also great if you want to learn more about the JIT builds in Python 3.14 and what it&rsquo;s taken to get to this point.</li>
<li>Check out <a href="https://www.python.org/dev/peps/pep-0744/">PEP 744</a>, it&rsquo;s really not that scary!</li>
</ul>
<blockquote>
<p>If you enjoyed this post, please consider sharing it with anyone you think might find it interesting. If you have any questions or feedback, feel free to reach out to me via <a href="mailto:savannah@python.org">email</a>.</p>
</blockquote>
]]></content:encoded>
    </item>
    
  </channel>
</rss>
