Synesthetic Voyager V.1.3.1

About Synesthetic Voyage

This is an interactive tool for exploring the internal "geometry" of a large language model (GPT-2). It investigates whether abstract concepts like "safety" and "danger" exist as consistent, measurable directions within the model's activation space.

How It Works

You provide two opposing concepts. The application uses these to define a 2D plane (a "sail") within the model's high-dimensional space. It then projects a large set of test prompts (the "winds") onto this plane to measure two things:

Magnitude (r): How relevant is this plane to the concept of safety? (How much wind does the sail catch?)
Polarity: Does the plane reliably separate "safe" prompts from "unsafe" ones? (Does the boat turn correctly in the wind?)
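
The geometry described above can be sketched in a few lines. This is a minimal illustration and not the tool's actual code: the function names (`build_plane`, `project`, `polarity_score`), the choice of axes, and the exact definitions of r, θ, and polarity are all assumptions made for the example. Here the "north" axis points from the unsafe concept toward the safe one, r is the fraction of an activation's length that lies in the plane, and polarity is the share of labeled prompts that land on the expected side of the north axis.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    length = math.sqrt(dot(u, u))
    if length < 1e-12:
        raise ValueError("concept vectors must be distinct and not parallel")
    return [a / length for a in u]

def build_plane(safe_vec, unsafe_vec):
    """Return orthonormal (north, east) axes spanning the two concept vectors.

    Assumption: north is the unsafe-to-safe direction; east is what remains
    of the safe vector after removing its north component (Gram-Schmidt).
    """
    north = normalize([s - u for s, u in zip(safe_vec, unsafe_vec)])
    along = dot(safe_vec, north)
    east = normalize([s - along * n for s, n in zip(safe_vec, north)])
    return north, east

def project(activation, north, east):
    """Return (r, theta) for one prompt activation (hypothetical definitions).

    r: length of the in-plane projection divided by the full vector length.
    theta: angle of the projection in radians, measured from the east axis.
    """
    y = dot(activation, north)
    x = dot(activation, east)
    full = math.sqrt(dot(activation, activation))
    r = math.hypot(x, y) / full if full > 0 else 0.0
    theta = math.atan2(y, x)
    return r, theta

def polarity_score(activations, labels, north):
    """Fraction of prompts whose north component has the expected sign.

    labels: +1 for prompts tagged safe, -1 for prompts tagged unsafe.
    """
    hits = sum(
        (1 if dot(a, north) > 0 else -1) == label
        for a, label in zip(activations, labels)
    )
    return hits / len(labels)
```

With real data, `safe_vec`, `unsafe_vec`, and each activation would be hidden-state vectors taken from the model layer under study; the toy functions above only assume they are equal-length lists of floats.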

The Research

This tool is the front-end for a research experiment that found, after rigorous testing, that a simple, statistically robust "safety compass" was not present in the model layer that was studied. This suggests the model's representation of safety is more complex and abstract than human intuition would predict.

Read the Full Experiment Write-up | View Project on GitHub

Shipwright's Console

Define the next leg of your journey.

Concept 1 (Safe/North):

Concept 2 (Unsafe/South):

Activation View: Mean or Last Token
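
The two Activation View options can be read as two ways of collapsing a prompt's per-token activations into a single vector. The sketch below assumes that "Mean" averages the hidden states over all token positions and that "Last Token" keeps only the final position; the page does not spell this out, and `pool_activations` is a hypothetical helper name.

```python
def pool_activations(token_vectors, view="mean"):
    """Collapse a list of per-token vectors into one vector.

    token_vectors: one list of floats per token, all of equal length.
    view: "mean" averages across tokens; "last_token" keeps the final one.
    (Both behaviours are assumptions about what the UI options mean.)
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    if view == "last_token":
        return list(token_vectors[-1])
    if view == "mean":
        count = len(token_vectors)
        return [sum(column) / count for column in zip(*token_vectors)]
    raise ValueError(f"unknown activation view: {view!r}")
```

Mean pooling reflects the whole prompt, while the last-token view reflects only the state after the model has read everything, so the same pair of concepts can define different planes under each option.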

Analysis results will appear here. Define your concepts and launch an analysis.

📊 A Note on Significance

The Polarity Score, Magnitude (r), and Orientation (θ) are descriptive metrics for your current design against our test set. They provide valuable intuition but are not automatically tested for statistical significance. The most promising "sailbots" found here are candidates for deeper, formal investigation.
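
As one example of what a deeper, formal investigation might include, a label-permutation test asks how often a polarity score at least as high as the observed one would arise if the safe/unsafe labels were assigned at random. This is a generic sketch, not necessarily the procedure used in the research write-up; `permutation_p_value`, its parameters, and the sign-based polarity definition are assumptions for illustration.

```python
import random

def permutation_p_value(north_components, labels, n_permutations=2000, seed=0):
    """Estimate how surprising a polarity score is under shuffled labels.

    north_components: each prompt's signed coordinate along the safe axis.
    labels: +1 for prompts tagged safe, -1 for prompts tagged unsafe.
    Returns (observed_score, p_value).
    """
    rng = random.Random(seed)

    def score(current_labels):
        hits = sum(
            (1 if c > 0 else -1) == label
            for c, label in zip(north_components, current_labels)
        )
        return hits / len(current_labels)

    observed = score(labels)
    shuffled = list(labels)
    at_least_as_high = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between prompt and label
        if score(shuffled) >= observed:
            at_least_as_high += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_high + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value for one pair of concepts is still only a starting point: trying many concept pairs in the console and keeping the best one inflates the chance of a spurious hit, so a formal analysis would also need to correct for multiple comparisons.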
