plasma-umass/scalene

# Scalene Development Guide

## Project Overview

Scalene is a high-performance CPU, GPU, and memory profiler for Python with AI-powered optimization proposals. It runs significantly faster than other Python profilers while providing detailed performance information. See the paper `docs/osdi23-berger.pdf` for technical details on Scalene's design.

**Key features:**
- CPU, GPU (NVIDIA/Apple), and memory profiling
- AI-powered optimization suggestions (OpenAI, Anthropic, Azure, Amazon Bedrock, Gemini, Ollama)
- Web-based GUI and CLI interfaces
- Jupyter notebook support via magic commands (`%scrun`, `%%scalene`)
- Line-by-line profiling with low overhead
- Separates Python time from native/C time

**Platform support:** Linux, macOS, WSL 2 (full support); Windows (partial support)

## Build & Test Commands

```bash
# Install in development mode
pip install -e .

# Run all tests
python3 -m pytest tests/

# Run tests for a specific Python version
python3.X -m pytest tests/

# Run linters
mypy scalene
ruff check scalene

# Run a single test file
python3 -m pytest tests/test_coverup_83.py -v
```

## Project Structure

### Core Profiler Components (`scalene/`)

- **`scalene_profiler.py`** - Main profiler class (`Scalene`). Entry point for profiling. Uses signal-based sampling for CPU profiling. Coordinates all profiling subsystems.
- **`scalene_statistics.py`** - `ScaleneStatistics` class. Collects and aggregates profiling data. Key types: `ProfilingSample`, `MemcpyProfilingSample`. Uses `RunningStats` for statistical aggregation.
- **`scalene_output.py`** - Profile output formatting for CLI/HTML
- **`scalene_json.py`** - `ScaleneJSON` class for JSON output format
- **`scalene_analysis.py`** - Profile analysis logic

### Entry Points

- **`__main__.py`** - Entry point for `python -m scalene`
- **`profile.py`** - Entry point for `--on`/`--off` control of background profiling

### Configuration & Arguments

- **`scalene_config.py`** - Version info (`scalene_version`, `scalene_date`) and constants:
  - `SCALENE_PORT = 11235` - Default port for web UI
  - `NEWLINE_TRIGGER_LENGTH` - Must match `src/include/sampleheap.hpp`
- **`scalene_arguments.py`** - `ScaleneArguments` class (extends `argparse.Namespace`) with all profiler options and their defaults defined in `ScaleneArgumentsDict`
- **`scalene_parseargs.py`** - `ScaleneParseArgs.parse_args()` builds the argument parser. `RichArgParser` provides colored help output (uses Rich on Python < 3.14, native argparse colors on 3.14+)

### Signal Handling

- **`scalene_signals.py`** - Signal definitions for CPU sampling
- **`scalene_signal_manager.py`** - Manages signal handlers
- **`scalene_sigqueue.py`** - Signal queue management
- **`scalene_client_timer.py`** - Timer for periodic profiling

### GPU Support

- **`scalene_nvidia_gpu.py`** - NVIDIA GPU profiling via `pynvml`
- **`scalene_apple_gpu.py`** - Apple GPU profiling (Metal)
- **`scalene_accelerator.py`** - Generic accelerator interface
- **`scalene_neuron.py`** - AWS Neuron support

### Memory Profiling

- **`scalene_memory_profiler.py`** - Memory profiling logic
- **`scalene_leak_analysis.py`** - Memory leak detection (experimental, `--memory-leak-detector`)
- **`scalene_mapfile.py`** - `ScaleneMapFile` for memory-mapped communication with native extension
- **`scalene_preload.py`** - Sets up `LD_PRELOAD`/`DYLD_INSERT_LIBRARIES` for native memory tracking

### Jupyter Integration

- **`scalene_magics.py`** - Jupyter magic commands (`%scrun` for line mode, `%%scalene` for cell mode)
- **`scalene_jupyter.py`** - Jupyter notebook support utilities

### Replacement Modules (`replacement_*.py`)

These modules monkey-patch standard library functions to capture profiling data during blocking operations:
- **`replacement_fork.py`** - Tracks `os.fork()`
- **`replacement_exit.py`** - Tracks `sys.exit()`
- **`replacement_lock.py`**, **`replacement_mp_lock.py`**, **`replacement_sem_lock.py`** - Lock acquisition timing
- **`replacement_thread_join.py`**, **`replacement_pjoin.py`** - Thread/process join timing
- **`replacement_signal_fns.py`** - Signal function replacements
- **`replacement_poll_selector.py`** - I/O polling timing
- **`replacement_get_context.py`** - Multiprocessing context

### Utilities

- **`runningstats.py`** - `RunningStats` class for online statistical calculations (mean, variance)
- **`scalene_funcutils.py`** - Function utilities
- **`scalene_utility.py`** - General utilities
- **`sparkline.py`** - Sparkline generation for memory visualization
- **`syntaxline.py`** - Syntax-highlighted source code lines
- **`adaptive.py`** - Adaptive sampling logic
- **`time_info.py`** - Time measurement utilities
- **`sorted_reservoir.py`** - Reservoir sampling for bounded-size sample collection
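
For orientation, online mean and variance of the kind `runningstats.py` is described as providing are commonly computed with Welford's algorithm. The sketch below is a generic illustration under that assumption, not the actual `RunningStats` code; the class name `OnlineStats` and its methods are hypothetical.

```python
class OnlineStats:
    """Running mean/variance via Welford's algorithm (illustrative only)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def push(self, x: float) -> None:
        """Fold one observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self) -> float:
        """Sample variance; defined as 0.0 for fewer than two values."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = OnlineStats()
for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.push(value)
```

The appeal of this form for a profiler is that no samples need to be stored: each update is constant time and constant memory.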

### GUI (`scalene/scalene-gui/`)

Web-based GUI built with TypeScript, bundled with esbuild.

**Core Files:**
- **`index.html.template`** - Jinja2 template for main GUI page (rendered by `scalene_utility.py`)
- **`scalene-gui.ts`** - Main TypeScript entry point, UI event handlers, initialization
- **`scalene-gui-bundle.js`** - Bundled JavaScript output (generated, do not edit directly)

**AI Provider Modules:**
- **`openai.ts`** - OpenAI API integration (`sendPromptToOpenAI`, `fetchOpenAIModels`)
- **`anthropic.ts`** - Anthropic Claude API integration
- **`gemini.ts`** - Google Gemini API integration (`sendPromptToGemini`, `fetchGeminiModels`)
- **`optimizations.ts`** - Provider dispatch logic, prompt generation
- **`persistence.ts`** - localStorage persistence with environment variable fallbacks

**Support Files:**
- **`launchbrowser.py`** - Opens browser to GUI (default port 11235)
- **`find_browser.py`** - Cross-platform browser detection

**Vendored Assets (for offline support):**
- **`jquery-3.6.0.slim.min.js`** - jQuery (vendored locally, not loaded from CDN)
- **`bootstrap.min.css`** - Bootstrap 5.1.3 CSS
- **`bootstrap.bundle.min.js`** - Bootstrap 5.1.3 JS with Popper
- **`prism.css`** - Syntax highlighting styles
- **`favicon.ico`** - Scalene favicon
- **`scalene-image.png`** - Scalene logo

These assets are copied to a temp directory when serving via HTTP, enabling the GUI to work in air-gapped/offline environments.

**Building the GUI:**
```bash
npm --prefix scalene/scalene-gui run build
```
The `build` script in `scalene/scalene-gui/package.json` invokes esbuild
with `--minify --sourcemap --target=es2020`. The minified bundle is what
gets checked in (≈1.1 MB vs ≈2.5 MB unminified). A `build:dev` variant
(no minification) is available for debugging.

### Native Extensions (`src/`)

C++ code for low-overhead memory allocation tracking:

**Headers (`src/include/`):**
- **`sampleheap.hpp`** - Sampling heap allocator. Key constant `NEWLINE` must match `NEWLINE_TRIGGER_LENGTH` in `scalene/scalene_config.py`.
- **`memcpysampler.hpp`** - Intercepts `memcpy` to track copy volume
- **`pywhere.hpp`** - Tracks Python file/line info for allocations
- **`samplefile.hpp`** - File-based communication with Python
- **`sampler.hpp`**, **`poissonsampler.hpp`**, **`thresholdsampler.hpp`** - Sampling strategies
- **`scaleneheader.hpp`** - Common header definitions

**Sources (`src/source/`):**
- **`libscalene.cpp`** - Main native library (loaded via `LD_PRELOAD`)
- **`pywhere.cpp`** - Python location tracking implementation
- **`get_line_atomic.cpp`** - Atomic line number access
- **`traceconfig.cpp`** - Trace configuration

### Vendor Libraries (`vendor/`)

- **`Heap-Layers/`** - Memory allocator infrastructure (by Emery Berger)
- **`printf/`** - Async-signal-safe printf implementation

## Key Patterns

### Python Version Compatibility

The codebase supports Python 3.8-3.14. Version-specific code uses:

```python
import sys

if sys.version_info >= (3, 14):
    ...  # Python 3.14+ specific code
else:
    ...  # Older Python versions

**Type Annotation Compatibility (Python 3.8/3.9):**
- **Do NOT use `X | Y` union syntax** in runtime-evaluated annotations (PEP 604 requires Python 3.10+). Use `Optional[X]` or `Union[X, Y]` from `typing` instead.
- **Do NOT use `list[X]`, `dict[K, V]`, `tuple[X, ...]`** in runtime-evaluated annotations (PEP 585 lowercase generics require Python 3.9+). Use `List`, `Dict`, `Tuple` from `typing` for 3.8 support.
- Adding `from __future__ import annotations` makes all annotations strings (not evaluated at runtime), which allows modern syntax on older Python. However, this can break code that inspects annotations at runtime (e.g., dataclasses, pydantic).
- The safest approach for this codebase: use `typing.Optional`, `typing.Union`, `typing.List`, `typing.Tuple`, `typing.Dict` in all annotation positions that are evaluated at runtime (function signatures, variable annotations outside `if TYPE_CHECKING` blocks).
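
As a small illustration of the rules above, the following hypothetical helper (`top_lines` is not part of Scalene) uses only `typing` generics, so its annotations evaluate at runtime on every supported interpreter:

```python
from typing import Dict, List, Optional, Tuple


def top_lines(
    counts: Dict[int, int],
    limit: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (line, count) pairs sorted by descending count."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked if limit is None else ranked[:limit]
```

Writing the same signature as `dict[int, int]` or `int | None` would fail at function definition time on the older interpreters named above unless annotations are deferred.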

**Python 3.13 Changes (`dis` module):**
- `dis.Instruction.starts_line` changed from `int | None` (line number) to `bool`
- New `dis.Instruction.line_number` attribute (`int | None`) added for the actual line number
- On Python < 3.13, `starts_line` is only set on the **first** instruction of each source line; use a line-tracking loop to propagate line numbers to subsequent instructions
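
A line-tracking loop of the kind described above might look like the following. This is a minimal sketch, not Scalene's code; `instructions_with_lines` and `sample` are hypothetical names.

```python
import dis
import sys
from types import CodeType
from typing import Iterator, Optional, Tuple


def instructions_with_lines(
    code: CodeType,
) -> Iterator[Tuple[dis.Instruction, Optional[int]]]:
    """Yield (instruction, line) pairs with a line attached to every instruction."""
    current: Optional[int] = None
    for instr in dis.get_instructions(code):
        if sys.version_info >= (3, 13):
            # 3.13+: line_number carries the line directly (may be None).
            current = instr.line_number
        elif instr.starts_line is not None:
            # < 3.13: starts_line is set only on the first instruction of a line,
            # so remember it and reuse it for the instructions that follow.
            current = instr.starts_line
        yield instr, current


def sample(x: int) -> int:
    y = x + 1
    return y * 2


lines_seen = {
    line for _, line in instructions_with_lines(sample.__code__) if line is not None
}
```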

**Bytecode/Opcode Compatibility (`dis` module):**
- **Never match specific opcode names** (e.g., `JUMP_BACKWARD`, `JUMP_ABSOLUTE`, `POP_JUMP_IF_TRUE`). Opcode names change across Python versions: for example, Python 3.10 `while` loops use `POP_JUMP_IF_TRUE` for backward jumps, Python 3.11+ uses `JUMP_BACKWARD`, and `JUMP_ABSOLUTE` was removed in 3.11.
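- **Always use abstract `dis` module categories** when possible: `dis.hasjabs` (absolute jump opcodes), `dis.hasjrel` (relative jump opcodes), `dis.hasconst`, `dis.hasname`, etc. These are maintained by CPython and work across all versions. Note that Python 3.13 adds `dis.hasjump` and documents `dis.hasjabs` and `dis.hasjrel` as deprecated; both remain available.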
- For call detection, matching `opname.startswith("CALL")` is acceptable since that prefix has been stable, but prefer opcode integer sets over name strings for hot paths.
- When checking jump direction (forward vs backward), use `instr.argval` (which `dis` resolves to an absolute offset) and compare against `instr.offset`, rather than relying on opcode names to imply direction.
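
The jump-direction rule above can be sketched without naming a single opcode. This is an illustrative example, not code from the repository; `backward_jumps`, `countdown`, and `straight_line` are hypothetical names.

```python
import dis
from types import CodeType
from typing import List

# Opcode integers that carry a jump target, taken from dis rather than from names.
JUMP_OPCODES = frozenset(dis.hasjrel) | frozenset(dis.hasjabs)


def backward_jumps(code: CodeType) -> List[dis.Instruction]:
    """Return jump instructions whose resolved target precedes their own offset."""
    return [
        instr
        for instr in dis.get_instructions(code)
        if instr.opcode in JUMP_OPCODES
        and isinstance(instr.argval, int)
        and instr.argval < instr.offset
    ]


def countdown(n: int) -> int:
    while n > 0:
        n -= 1
    return n


def straight_line(x: int) -> int:
    return x + 1
```

Because the check relies on `argval` and `offset` only, the same function finds the loop's back edge whether the interpreter emits an absolute or a relative jump for it.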

**Python 3.14 Changes:**
- `argparse` now has built-in colored help output (`color=True` parameter)
- `RichArgParser` uses Rich for colors on Python < 3.14, native argparse colors on 3.14+

### Argument Parsing (`scalene_parseargs.py`)

```python
import argparse
import sys


class RichArgParser(argparse.ArgumentParser):
    """ArgumentParser that uses Rich for colored output on Python < 3.14."""

    def __init__(self, *args, **kwargs):
        if sys.version_info < (3, 14):
            from rich.console import Console
            self._console = Console()
        else:
            self._console = None
        super().__init__(*args, **kwargs)
```

The `_colorize_help_for_rich()` function applies Python 3.14-style colors using Rich markup:
- `usage:` and `options:` → bold blue
- Program name → bold magenta
- Long options (`--foo`) → bold cyan
- Short options (`-h`) → bold green
- Metavars (`FOO`) → bold yellow

### GUI Patterns

**Preventing Browser Password Prompts:**
Use `autocomplete="one-time-code"` on password/API key inputs to prevent browsers from offering to save them:
```html
<input type="password" id="api-key" autocomplete="one-time-code">
```

**Show/Hide Password Toggle:**
```typescript
function togglePassword(inputId: string, button: HTMLButtonElement): void {
  const input = document.getElementById(inputId) as HTMLInputElement;
  if (input.type === "password") {
    input.type = "text";
    button.textContent = "Hide";
  } else {
    input.type = "password";
    button.textContent = "Show";
  }
}
```

**Provider Field Visibility:**
Use CSS classes to show/hide provider-specific fields:
```typescript
function toggleServiceFields(): void {
  const service = (document.getElementById("service") as HTMLSelectElement).value;
  // Hide all provider sections
  document.querySelectorAll(".provider-section").forEach((el) => {
    (el as HTMLElement).style.display = "none";
  });
  // Show selected provider section
  const section = document.querySelector(`.${service}-fields`);
  if (section) (section as HTMLElement).style.display = "block";
}
```

**Persistent Form Elements:**
Add class `persistent` to inputs that should be saved/restored from localStorage:
```html
<input type="text" id="api-key" class="persistent">
```
The `persistence.ts` module handles save/restore automatically.

**Standalone HTML Generation:**
The `generate_html()` function in `scalene_utility.py` supports a `standalone` parameter:
- When `standalone=False` (default): Assets are referenced as local files (e.g., `<script src="jquery-3.6.0.slim.min.js">`)
- When `standalone=True`: All assets are embedded inline (JS/CSS as text, images as base64)

The Jinja2 template uses conditionals:
```html
{% if standalone %}
<script>{{ jquery_js }}</script>
<style>{{ bootstrap_css }}</style>
{% else %}
<script src="jquery-3.6.0.slim.min.js"></script>
<link href="bootstrap.min.css" rel="stylesheet">
{% endif %}
```
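
On the Python side, embedding an image for `standalone=True` amounts to building a base64 data URI. The following is a minimal sketch with hypothetical helper names (`to_data_uri`, `image_src`); it is not the code in `scalene_utility.py`.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data: URI suitable for an src or href attribute."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_src(filename: str, data: bytes, standalone: bool) -> str:
    """Return an inline data URI when standalone, else the local filename."""
    return to_data_uri(data, "image/png") if standalone else filename
```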

### Module Imports

When importing submodules, be explicit:

```python
# Correct - mypy can verify this
import importlib.util
importlib.util.find_spec(mod_name)

# Wrong - mypy error: Module has no attribute "util"
import importlib
importlib.util.find_spec(mod_name)
```

## Testing

### Test Files (`tests/`)

- **`test_coverup_*.py`** - Auto-generated coverage tests
- **`test_runningstats.py`** - Statistics tests (requires `hypothesis`)
- **`test_scalene_json.py`** - JSON output tests (requires `hypothesis`)
- **`test_nested_package_relative_import.py`** - Import handling tests

### Test Dependencies

```bash
pip install pytest pytest-asyncio hypothesis
```

### Running Tests Across Python Versions

```bash
for v in 3.9 3.10 3.11 3.12 3.13 3.14; do
    python$v -m pytest tests/test_coverup_83.py -v
done
```

### Flaky Smoketests

The smoketests in `test/` can be flaky due to timing/sampling issues inherent to profiling:

- **"No non-zero lines in X"** - The profiler didn't collect enough samples. This happens when the test runs too quickly or signal delivery timing varies.
- **"Expected function 'X' not returned"** - A function wasn't sampled. Common with short-running functions.

These failures are usually timing-related and pass on re-run. They're more common on CI due to variable machine load.

### Port Binding in Tests

When testing port availability, never use hardcoded ports - they may already be in use on CI runners:

```python
# Bad - port 49200 might be in use
port = 49200
sock.bind(("", port))

# Good - find an available port first
port = find_available_port(49200, 49300)
if port is None:
    return  # Skip test if no ports available
sock.bind(("", port))
```
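
The snippet above assumes a `find_available_port` helper. One possible shape for it, written as a sketch rather than the repository's implementation (the inclusive upper bound is an assumption):

```python
import socket
from typing import Optional


def find_available_port(start: int, end: int) -> Optional[int]:
    """Return the first port in [start, end] that can be bound, or None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind(("", port))
            except OSError:
                continue  # port is in use or not permitted; try the next one
            return port
    return None


port = find_available_port(49200, 49300)
```

The probe socket is closed before the port is returned, so another process could still claim the port before the test binds it. Tests should treat a later bind failure as a reason to skip, not as a bug.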

## CI/CD (`.github/workflows/`)

- **`run-linters.yml`** - Runs mypy and ruff on Python 3.9-3.14
- **`tests.yml`** - Runs pytest on Python 3.9-3.14
- **`build-and-upload.yml`** - Build and publish to PyPI

## Common Tasks

### Adding a New CLI Option

1. Add default value in `scalene_arguments.py`:
   ```python
   class ScaleneArgumentsDict(TypedDict, total=False):
       my_option: bool
   ```

2. Add argument in `scalene_parseargs.py`:
   ```python
   parser.add_argument(
       "--my-option",
       dest="my_option",
       action="store_true",
       default=defaults.my_option,
       help="Description of option",
   )
   ```
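
To check the wiring in isolation, the same `add_argument` call can be exercised against a stand-in parser. In this sketch, `defaults` is a plain `argparse.Namespace` substituted for Scalene's real defaults object, and the `demo` program name is hypothetical.

```python
import argparse

# Stand-in for the defaults object referenced above; the real one lives in Scalene.
defaults = argparse.Namespace(my_option=False)

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument(
    "--my-option",
    dest="my_option",
    action="store_true",
    default=defaults.my_option,
    help="Description of option",
)

enabled = parser.parse_args(["--my-option"])
disabled = parser.parse_args([])
```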

### Adding a New AI Provider

1. **Create provider module** (`scalene/scalene-gui/newprovider.ts`):
   ```typescript
   export async function sendPromptToNewProvider(
     prompt: string,
     apiKey: string
   ): Promise<string> {
     // API call implementation
   }

   export async function fetchNewProviderModels(apiKey: string): Promise<string[]> {
     // Optional: fetch available models from API
   }
   ```

2. **Update `optimizations.ts`**:
   - Import the new module
   - Add case in `sendPromptToService()` switch statement

3. **Update `index.html.template`**:
   - Add option to `#service` select dropdown
   - Add provider section with API key input, model selector, etc.
   - Add CSS for `.newprovider-fields` visibility

4. **Update `scalene-gui.ts`**:
   - Add provider to `toggleServiceFields()` function
   - Add refresh handler if dynamic model fetching is supported
   - Update `getDefaultProvider()` if env var support is needed

5. **Update `persistence.ts`** (for env var support):
   - Add mapping in `envKeyMap` for new fields

6. **Update `scalene_utility.py`**:
   - Read environment variable in `api_keys` dict
   - Pass to template rendering

7. **Rebuild the bundle**:
   ```bash
   npm --prefix scalene/scalene-gui run build
   ```

### Environment Variable API Keys

The GUI supports prepopulating API keys from environment variables:

| Element ID | Environment Variable | Provider |
|------------|---------------------|----------|
| `api-key` | `OPENAI_API_KEY` | OpenAI |
| `anthropic-api-key` | `ANTHROPIC_API_KEY` | Anthropic |
| `gemini-api-key` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | Gemini |
| `azure-api-key` | `AZURE_OPENAI_API_KEY` | Azure OpenAI |
| `azure-api-url` | `AZURE_OPENAI_ENDPOINT` | Azure OpenAI |
| `aws-access-key` | `AWS_ACCESS_KEY_ID` | Amazon Bedrock |
| `aws-secret-key` | `AWS_SECRET_ACCESS_KEY` | Amazon Bedrock |
| `aws-region` | `AWS_DEFAULT_REGION` or `AWS_REGION` | Amazon Bedrock |

**Flow:**
1. `scalene_utility.py` reads env vars and passes to Jinja2 template
2. Template injects `envApiKeys` JavaScript object into page
3. `persistence.ts` uses env vars as fallbacks when localStorage is empty
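
Step 1 of this flow could be sketched as below. The mapping is copied from the table; the function name `collect_env_api_keys`, the returned dict shape, and the rule that the first listed variable wins are assumptions, not the actual `scalene_utility.py` logic.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Element ID -> environment variables to try, in the order listed in the table.
ENV_SOURCES: Dict[str, Tuple[str, ...]] = {
    "api-key": ("OPENAI_API_KEY",),
    "anthropic-api-key": ("ANTHROPIC_API_KEY",),
    "gemini-api-key": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
    "azure-api-key": ("AZURE_OPENAI_API_KEY",),
    "azure-api-url": ("AZURE_OPENAI_ENDPOINT",),
    "aws-access-key": ("AWS_ACCESS_KEY_ID",),
    "aws-secret-key": ("AWS_SECRET_ACCESS_KEY",),
    "aws-region": ("AWS_DEFAULT_REGION", "AWS_REGION"),
}


def collect_env_api_keys(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Map element IDs to the first non-empty environment value found."""
    env = os.environ if environ is None else environ
    found: Dict[str, str] = {}
    for element_id, names in ENV_SOURCES.items():
        for name in names:
            value = env.get(name, "")
            if value:
                found[element_id] = value
                break
    return found
```

Passing the environment in as a parameter keeps the function testable without touching `os.environ`.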

### Updating Version

Edit `scalene/scalene_config.py`:
```python
scalene_version = "X.Y.Z"
scalene_date = "YYYY.MM.DD"
```

## Dependencies

Key runtime dependencies:
- `rich` - Terminal formatting and colors
- `cloudpickle` - Serialization
- `pynvml` - NVIDIA GPU support (optional)

See `requirements.txt` for full list.

## CLI Structure

Scalene uses a verb-based CLI with two main subcommands:

```bash
# Profile a program (saves to scalene-profile.json by default)
scalene run [options] yourprogram.py

# View an existing profile
scalene view [options] [profile.json]
```

### Run Subcommand Options

```bash
scalene run prog.py                      # profile, save to scalene-profile.json
scalene run -o my.json prog.py           # save to custom file
scalene run --cpu-only prog.py           # profile CPU only (faster)
scalene run -c config.yaml prog.py       # load options from config file
scalene run prog.py --- --arg            # pass args to program
```

### View Subcommand Options

```bash
scalene view                             # open in browser
scalene view --cli                       # view in terminal
scalene view --html                      # save to scalene-profile.html
scalene view --standalone                # save as self-contained HTML (all assets embedded)
scalene view myprofile.json              # open specific profile
```

### Profile Completion Message

After profiling completes, Scalene prints instructions for viewing the profile:
```
Scalene: profile saved to scalene-profile.json
  To view in browser:  scalene view
  To view in terminal: scalene view --cli
```

The filename is only included in the command if a non-default output file was used.

### YAML Configuration

Create a `scalene.yaml` file with options:

```yaml
outfile: my-profile.json
cpu-only: true
profile-only: "mypackage,utils"
cpu-percent-threshold: 5
```

Load with: `scalene run -c scalene.yaml prog.py`

### Advanced Options

Use `scalene run --help-advanced` to see all options including:
- `--profile-all` - profile all code, not just the target program
- `--profile-only PATH` - only profile files containing these strings
- `--profile-exclude PATH` - exclude files containing these strings
- `--profile-system-libraries` - profile Python stdlib and installed packages (skipped by default)
- `--gpu` - profile GPU time and memory
- `--memory` - profile memory usage
- `--stacks` - collect stack traces
- `--profile-interval N` - output profiles every N seconds

### Smoke Tests

Smoke tests in `test/` use the new CLI syntax:

```python
# test/smoketest.py
cmd = [sys.executable, "-m", "scalene", "run", "-o", str(outfile), *rest, fname]
```

### GitHub Workflows

Workflows in `.github/workflows/` use the new CLI:

```yaml
# Profile with interval, then view
- run: python -m scalene run --profile-interval=2 test/testme.py && python -m scalene view --cli

# Profile with module invocation
- run: python -m scalene run --- -m import_stress_test && python -m scalene view --cli
```

## Signal Handling

Scalene uses several Unix signals for profiling. The signal assignments are in `scalene_signals.py`:

| Signal | Purpose | Platform |
|--------|---------|----------|
| `SIGVTALRM` | CPU profiling timer (default) | Unix |
| `SIGALRM` | CPU profiling timer (real time mode) | Unix |
| `SIGILL` | Start profiling (`--on`) | Unix |
| `SIGBUS` | Stop profiling (`--off`) | Unix |
| `SIGPROF` | memcpy tracking | Unix |
| `SIGXCPU` | malloc tracking | Unix |
| `SIGXFSZ` | free tracking | Unix |

### Signal Conflicts with Libraries

Libraries like PyTorch Lightning may also use these signals. The `replacement_signal_fns.py` module handles conflicts:

**On Linux:** Uses real-time signals (`SIGRTMIN+1` to `SIGRTMIN+5`) for redirection. When user code sets a handler for a Scalene signal, their handler is redirected to a real-time signal. Calls to `raise_signal()` and `kill()` are also redirected transparently.

**On macOS/other platforms:** Uses handler chaining. Both Scalene's handler and the user's handler are called when the signal fires.

```python
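import signal
import sys

# Platform-specific signal handling
_use_rt_signals = sys.platform == "linux" and hasattr(signal, "SIGRTMIN")
_signal_redirects = {}  # Scalene signal -> replacement signal (illustrative)

if _use_rt_signals:
    # Linux: redirect to real-time signals
    rt_base = signal.SIGRTMIN + 1
    _signal_redirects[signal.SIGILL] = rt_base
else:
    # macOS: chain handlers (scalene_handler and user_handler are the two
    # handlers registered for the same signal)
    def chained_handler(sig, frame):
        scalene_handler(sig, frame)
        user_handler(sig, frame)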
```
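
Handler chaining has one practical wrinkle: the previously installed handler may be `signal.SIG_DFL` or `signal.SIG_IGN`, which are not callable. A minimal sketch of a chaining factory that accounts for this follows; `chain_handlers` is a hypothetical name and this is not the code in `replacement_signal_fns.py`.

```python
import signal
from typing import Any, Callable

Handler = Callable[[int, Any], None]


def chain_handlers(first: Handler, second: Any) -> Handler:
    """Return a handler that runs `first`, then `second` if it is callable.

    `second` may be signal.SIG_DFL or signal.SIG_IGN, which are not callable
    and are skipped here.
    """

    def chained(sig: int, frame: Any) -> None:
        first(sig, frame)
        if callable(second):
            second(sig, frame)

    return chained
```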

### Frame Line Number Can Be None (Python 3.11+)

In Python 3.11+, `frame.f_lineno` can be `None` in edge cases (e.g., during multiprocessing cleanup). Always use a fallback:

```python
lineno = frame.f_lineno if frame.f_lineno is not None else frame.f_code.co_firstlineno
```
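
Wrapping the fallback in a small helper keeps call sites short. The helper name `safe_lineno` is hypothetical, and the stand-in frame built from `SimpleNamespace` only simulates the `None` case for demonstration.

```python
import sys
from types import SimpleNamespace
from typing import Any


def safe_lineno(frame: Any) -> int:
    """Return the frame's line number, falling back to the code's first line."""
    lineno = frame.f_lineno
    return lineno if lineno is not None else frame.f_code.co_firstlineno


# A real frame normally has an integer line number.
current_line = safe_lineno(sys._getframe())

# Simulated frame whose f_lineno is None, as in the edge case described above.
fake_frame = SimpleNamespace(f_lineno=None, f_code=SimpleNamespace(co_firstlineno=7))
fallback_line = safe_lineno(fake_frame)
```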

## Native Extension Build Issues

### C++ Standard Library Conflicts with vendor/printf

The `vendor/printf/printf.h` header defines macros that conflict with the C++ standard library:

```c
#define vsnprintf vsnprintf_
#define snprintf  snprintf_
```

This breaks `std::vsnprintf` in `<string>` and other headers. **Fix:** Include C++ standard headers BEFORE vendor headers in `src/source/libscalene.cpp`:

```cpp
// Include C++ standard headers FIRST
#include <cstddef>
#include <string>

// Then vendor headers that define conflicting macros
#include <heaplayers.h>  // Eventually includes printf.h
```

## Profiling Guide

See [Scalene-Agents.md](Scalene-Agents.md) for detailed information about interpreting Scalene's profiling output, including Python vs C time, memory metrics, and optimization strategies.

## Debugging Guide

See [Scalene-Debugging.md](Scalene-Debugging.md) for signal handler debugging, async profiling debugging, the profile output pipeline (three separate renderers!), and unbounded growth prevention patterns.

## GUI Development Guide

See [Scalene-GUI.md](Scalene-GUI.md) for adding new columns, Vega-Lite chart types, pie chart best practices (two-wedge rendering, rotating pies), and the chart rendering flow.