Segments is a web application designed both as an artistic tool and an environmental audio classifier.
It enables users to record, analyze, visualize and, most crucially, resynthesize surrounding soundscapes.
It combines AI-powered analysis with algorithmic synthesis.
Segments transforms, reorganizes and finally renders sonic environments into new auditory structures. It is both a scientific instrument and a sound design tool, where attentive listening merges with auralization creativity.
Through a growing list of playback modes (algorithms), users can reframe field recordings (or any other audio material) into narrative arcs, stochastic reorganizations and spectral transformations.
Liminal Territories
Segments is grounded equally in the conceptual use of algorithms and the phenomenology of sound, with reorganization and resynthesis operating in both of these domains.
Through reorganization, hidden structures emerge, patterns rise to the surface and unnoticed details are brought to the foreground.
Sometimes, what was once background becomes an aesthetic focus.
The composer creates a dialogue between nature’s proto-musical sound material and audio signal processing, discovering new musical potentials already present in the soundscape.
Key Features
Audio Analysis: Classify audio into 521 everyday labels using MediaPipe's YAMNet model, bucketed into biophonic or anthropophonic categories.
Rendering Modes: Create composite renders with modes such as Lorenz attractor, Markov chain, Random mix, Emotion-based and many more.
Multiple Renders: Render and export up to 8 channels based on custom user rules.
Real-Time Processing: Analyze live microphone input or uploaded audio files (WAV/MP3).
Custom Model Training: Train new sound labels in-browser using TensorFlow, saved to localStorage. This is separate from the YAMNet model and does not modify it.
MIDI Integration: Map sound labels to MIDI notes, for controlling synthesizers or IoT devices.
Conceptual Reverb: Add Reverb to the full render file, by category (Biophony / Anthropophony) or by label.
Visualization: View analysis results as tables and pie charts for intuitive insights.
MediaPipe YAMNet vs. TensorFlow Custom Models
Segments uses two distinct AI models for audio analysis: the pre-trained MediaPipe YAMNet model and a TensorFlow-based custom model for user-defined labels. Below is a comparison of their roles and capabilities:
Purpose
MediaPipe YAMNet: Pre-trained model for classifying audio into 521 fixed everyday sound labels (e.g., "Bird," "Car Horn"), bucketed into Biophony or Anthropophony categories.
TensorFlow Custom Model: User-trained model for recognizing new, custom sound labels (e.g., "Coffee Machine") not included in YAMNet's label set.
Training
MediaPipe YAMNet: Fixed and pre-trained; cannot be modified or extended with new labels by users.
TensorFlow Custom Model: Trainable in-browser using 5–20 WAV clips (16 kHz, mono, 1–5 s). Users define new labels, and the model is trained independently of YAMNet.
Label Set
MediaPipe YAMNet: 521 pre-defined labels, non-editable.
TensorFlow Custom Model: User-defined labels, created via the "Train on These Results" button.
Storage
MediaPipe YAMNet: Embedded in the application, not stored locally.
TensorFlow Custom Model: Saved to the browser's localStorage as JSON weights; persists across reloads.
Integration
MediaPipe YAMNet: Provides baseline classification for all audio analysis tasks.
TensorFlow Custom Model: Supplements YAMNet by adding custom labels, which can be used in rendering or MIDI mapping.
Reset
MediaPipe YAMNet: Not applicable (fixed model).
TensorFlow Custom Model: Reset with Ctrl+Shift+X to clear custom model data.
Note: The YAMNet model is used for all initial audio classifications, while the TensorFlow custom model allows users to extend functionality with new sound labels. Custom labels do not modify or add to YAMNet's fixed label set but operate as a separate, parallel model.
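Persisting the custom model's weights as JSON in localStorage can be sketched as follows. This is a simplified illustration, not the app's actual code: the storage key name is invented, and a plain-object stand-in is used so the sketch also runs outside the browser.

```javascript
// Stand-in for window.localStorage so the sketch runs anywhere;
// in the browser you would use localStorage directly.
const storage = (typeof localStorage !== "undefined")
  ? localStorage
  : { _d: {}, setItem(k, v) { this._d[k] = v; }, getItem(k) { return this._d[k] ?? null; } };

const KEY = "segments-custom-model"; // illustrative key name

// Serialize model weights (plain number arrays) to JSON and back.
function saveWeights(weights) {
  storage.setItem(KEY, JSON.stringify(weights));
}

function loadWeights() {
  const raw = storage.getItem(KEY);
  return raw === null ? null : JSON.parse(raw);
}
```

Because the weights survive the JSON round-trip, a trained custom model persists across page reloads without any server.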
How to use Segments
Getting Started
Wait for Model Loading: Ensure the status window (top right) shows "Model loaded".
Drag & Drop Files: Upload WAV or MP3 files via the "Audio Folder Analysis (≤ 10 MB each)", "Single File (any size)" or "RealTime (up to 24h)" sections.
Select Sources: Choose which source(s) to render (Playback Setting -> Source): Folder, Single, Realtime or a combination. All three input methods can be rendered simultaneously.
Render Audio: Select a playback mode (Playback Setting -> Main Mode -> e.g. "Bounce Chaos") and click "Render". One or more new files are created based on the selected playback mode and other settings (Pitch, Volume, Reverb etc.). Re-render with a different algorithm (Main Mode) to hear the differences.
MIDI Mapping: [optionally] Assign labels (from YAMNet or custom model) to MIDI notes in the "MIDI Mapper" section to control external devices.
View Results: Check the "Folder Analysis Results," "Single File Recording Analysis," or "Real-Time Analysis" sections for tables and pie charts showing top labels, categories and confidence scores.
Optional Custom Training: In "Train Custom Sounds" section, train your personal model with new labels. This creates a separate custom model, not an update to YAMNet.
Training Custom Models (TensorFlow)
Segments allows in-browser training of custom sound labels using TensorFlow, independent of the YAMNet model.
Analyze Clips: Upload clips via folder, single file, or real-time analysis (processed by YAMNet).
Train: Click the green "Train on These Results" button under the relevant analysis table, enter a label (e.g., "Coffee Machine"), and train in-browser with TensorFlow.
Add Negative Examples: Optionally upload negative samples to improve model accuracy.
Save Model: If accuracy ≥ 70%, the model auto-saves to localStorage. Use "Save Model" to export as JSON or "Load Model" to import.
Reset Model: Press Ctrl+Shift+X to reset the custom model (does not affect YAMNet).
Custom Model Architecture
Structure: 1-D CNN → Global MaxPool → Dense(16) → Dense(2-softmax)
Augmentation: ±10% time-stretch, ±5 dB pink-noise, ±3 dB gain
Validation: 20% hold-out, target ≥ 70% accuracy
Storage: JSON weights in localStorage
Note: Custom training creates a new model separate from YAMNet. Use headphones to avoid feedback during training or recording. Ensure clips are clean and representative of the target sound.
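The augmentation offsets listed above (±10% time-stretch, ±5 dB pink noise, ±3 dB gain) translate into linear factors via the standard decibel formula gain = 10^(dB/20). A minimal sketch; the helper names are illustrative, not the app's actual API:

```javascript
// Convert a decibel offset to a linear amplitude factor: gain = 10^(dB/20).
function dbToGain(db) {
  return Math.pow(10, db / 20);
}

// Draw one random augmentation setting within the documented ranges:
// +/-10% time-stretch, +/-5 dB pink noise, +/-3 dB gain.
function randomAugmentation() {
  const rand = (lo, hi) => lo + Math.random() * (hi - lo);
  return {
    stretchFactor: rand(0.9, 1.1),        // playback-rate multiplier
    pinkNoiseGain: dbToGain(rand(-5, 5)), // linear noise level
    gain: dbToGain(rand(-3, 3)),          // linear sample gain
  };
}
```

For example, a +6 dB offset roughly doubles amplitude, while -6 dB roughly halves it.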
Rendering Audio
Segments lets you create composite audio mixes from analyzed segments using various playback modes, leveraging labels from YAMNet or the custom TensorFlow model. Each mode is a sorting / sequencing strategy.
Modes marked [L] support looping (see Looping Controls below).
Source options (Playback Setting -> Source):
Realtime: Render only recordings from Realtime Input.
Folder and Single: Combo 1.
Folder and Realtime: Combo 2.
Single and Realtime: Combo 3.
Playback Modes (the list is growing...)
Low → High: Arranges segments from lowest to highest frequency content, sequenced in a Biophony-to-Anthropophony order.
High → Low: The reverse: highest to lowest frequency content, sequenced in an Anthropophony-to-Biophony order.
Lorenz Attractor: Uses chaotic patterns for segment ordering. Based on Lorenz Chaotic Attractor model. [L]
Markov Chain: Probabilistic transitions between segments. [L]
Most Common and Vice Versa: Render by the most common or least common labels.
Random Mix: Random segment ordering. [L]
Alternating: Alternates between Biophony and Anthropophony segments. [L]
Emotional Intensity: Orders segments based on emotional valence (requires custom training). If no training has been performed, then the order is based on the intensity of the segments.
Label Select and Exclude: Filters segments by a specific label keyword (from YAMNet or custom model).
Chaos-Bounce: Lorenz-attractor drives segment order; the randomness slider scales how wild the jumps feel (0 = nearly sequential, 100 = chaotic leaps). [L]
Spectral Flow: Sorts segments by a lightweight “brightness” proxy (RMS of upper-half spectrum) so the mix flows from dull to bright or vice-versa.
Spectral Stutter: A combination of Spectral Flow and Rhythmic Stutter. [L]
Rhythmic Stutter: Make each segment stutter-repeat at the chosen BPM. [L]
Formant-Ladder: Sorts segments by pre-computed average frequency, then picks an evenly-spaced "ladder" (2–12 steps); depending on the material it may sound like ascending/descending vocal formants.
Formant-Reorder: Pitch-shifts every segment up by the chosen ratio, sorts the now-scaled formant frequencies low→high, then optionally shuffles that ladder by the randomization % before returning the re-ordered, pre-shifted segments. [L]
Ecosystem Simulation: Simulates an ecosystem where sounds "compete" based on category rarity; "predators" (rare categories) "consume" order positions from "prey" (common ones), with randomization for mutation. In Advanced Settings adjust Mutation Rate. [L]
Narrative Journey: Orders segments like a story: start with natural sounds, build tension with man-made intrusions, climax in hybrids, resolve in harmony. In Advanced Settings adjust the Narrative Balance between human-made and natural segments. [L]
Quantum: Introduces a quantum-inspired behavior for ordering audio segments, simulating the concept of a quantum wave function and its collapse upon measurement. Unlike deterministic modes like "low→high" (sorting by frequency) or "random" (uniform shuffling), the "quantum" mode balances structure (favoring high-intensity segments) with randomness (probabilistic selection). [L]
Euclidean: Arranges segments based on the Euclidean rhythm algorithm. In Advanced Settings adjust Steps, Pulses and Rotation. [L]
Tidal Rhythm: Sorts segments by duration, assigns start times within a cycle (default 10 seconds) and applies randomization to introduce variability. [L]
Cellular-Repeat: Segments repeat in a 1-D cellular automaton: for every bar a segment is either kept, muted or replaced by its spectral neighbour. In Advanced Settings adjust Cells, Rules, Seed. [L]
User Custom: Custom user algorithms written in JavaScript to test in Segments. The function customOrder(files) takes an array of segments and returns an ordered array based on your algorithm.
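For reference, the pulse pattern behind the Euclidean mode can be sketched with the classic Bresenham-style formulation. This is a simplified illustration of the underlying algorithm; the app's internal implementation may differ:

```javascript
// Generate a Euclidean rhythm: distribute `pulses` onsets as evenly as
// possible across `steps` slots, then rotate the pattern.
function euclideanPattern(steps, pulses, rotation = 0) {
  const pattern = [];
  for (let i = 0; i < steps; i++) {
    // Bresenham-style spacing: slot i fires when (i * pulses) mod steps
    // has not yet wrapped past the pulse count.
    pattern.push((i * pulses) % steps < pulses ? 1 : 0);
  }
  // Rotate left by `rotation` slots.
  const r = ((rotation % steps) + steps) % steps;
  return pattern.slice(r).concat(pattern.slice(0, r));
}
```

For example, euclideanPattern(8, 3) yields [1,0,0,1,0,0,1,0], the familiar evenly-spread three-against-eight pattern; Steps, Pulses and Rotation in Advanced Settings correspond to the three parameters.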
User Custom Algorithms Examples
Alphabetical Order (A-Z): function customOrder(files) {
return files.slice().sort((a, b) => {
const nameA = a.name || '';
const nameB = b.name || '';
return nameA.localeCompare(nameB);
});
}
Random Shuffle: function customOrder(files) {
const shuffled = files.slice();
for (let i = shuffled.length - 1; i > 0; i--) {
const j = Math.floor(Math.random() * (i + 1));
[shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
}
return shuffled;
}
Key Considerations: The function must be named customOrder and defined as function customOrder(files) { ... }, because generateBuffer expects to call customOrder explicitly.
Input: The files parameter is an array of segment objects, each with properties like:
audioBuffer: An AudioBuffer (Web Audio API) with duration, sampleRate (16 kHz), numberOfChannels, and getChannelData(ch).
category: String, either "Natural" or "Man-made".
name: String, the file name (e.g., "bird.wav").
topLabel: String, a primary label (e.g., "bird_call").
topCategories: Array of objects with categoryName (e.g., [{ categoryName: "bird" }, ...]).
frequency: Number, assumed for frequency-based sorting (e.g., in 'low→high').
Output: Must return an array of segment objects (a subset or reordering of files). The array must contain valid segments (each with an audioBuffer).
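Building on the examples above and the segment properties listed in the input spec, a customOrder that mimics the built-in Alternating mode (interleaving "Natural" and "Man-made" segments) could look like this sketch:

```javascript
// Alternate between "Natural" and "Man-made" segments; leftovers of the
// longer list are appended at the end.
function customOrder(files) {
  const natural = files.filter(f => f.category === "Natural");
  const manMade = files.filter(f => f.category === "Man-made");
  const ordered = [];
  const n = Math.max(natural.length, manMade.length);
  for (let i = 0; i < n; i++) {
    if (i < natural.length) ordered.push(natural[i]);
    if (i < manMade.length) ordered.push(manMade[i]);
  }
  return ordered;
}
```

Note that the function returns every input segment exactly once, satisfying the output requirement that each returned object still carries its audioBuffer.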
Looping Controls - Evolving Playback
Evolving Renders: If checked, creates a new random render with each new loop (loop time is specified in Render Length) based on the selected mode. Supported in selected playback modes.
Random Mode: Requires Evolving Renders checked. If checked, for each new loop a random playback mode is selected, along with a new random render. Supported in selected playback modes.
Check Both: For endless, non-repeatable playback.
Label Playback Mode
Label Select: Enter a label name to create a render based only on the chosen label.
Label Exclude: Enter a label name to exclude it from the render.
Rendering Options (Main Controls Section)
Volume: Adjust playback volume.
Time-Stretch: Adjust playback speed (e.g., 1x for normal, 0.5x for slower).
Multi-render rule examples (the numbers after → are the target render channels):
Example: label bird and freq > 1000 or label insect and duration < 2 → 1,2 (all conditions in a clause must be true for its segments to be returned)
Example: category natural and freq 500-2000 or rms > -25 → 3,4,5
Browsers do not directly support multi-channel audio, so the multiple renders cannot be driven to multiple speakers in real time.
However, the multiple render files can be uploaded to your DAW and routed to selected outputs if your audio interface supports it.
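The rule examples above amount to predicates over segment properties. A hypothetical sketch of one such predicate, mirroring "label bird and freq > 1000" (the actual rule parser in Segments is more elaborate; function and property names follow the customOrder input spec):

```javascript
// Keep segments whose topLabel contains the keyword AND whose
// frequency exceeds a threshold — both conditions must hold.
function filterSegments(files, keyword, minFreq) {
  return files.filter(f =>
    (f.topLabel || "").toLowerCase().includes(keyword.toLowerCase()) &&
    (f.frequency || 0) > minFreq
  );
}
```

Each rule clause selects a subset like this, and the matching segments are then routed to the listed render channels.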
There is a known bug, which I am working to fix. It does not affect the core functionality of the app, but it can be annoying.
Importing large files (mostly via Single File input, but sometimes via Folder Input too) can crash the audio engine (first render only).
When this happens, pressing Play may not start playback immediately.
FIX: Press Stop > Render again > Press Play. You may need to repeat these steps 2 or 3 times before audio is ready to play. All subsequent renders using the same files, including other playback modes, function normally.
Using MIDI Functions
Segments maps sound labels (from YAMNet or custom TensorFlow model) to MIDI notes, to control synthesizers, DAWs, or IoT devices.
Steps to Set Up MIDI
Select MIDI Port: Choose an output port in the "MIDI Mapper" section.
Add Mapping: In the "Live MIDI Map" or "Pre-Map Sounds to MIDI" section, assign a label (from YAMNet or custom model) to a MIDI Note, CC, or Channel.
Common Sounds: Pre-map frequently detected YAMNet labels (e.g., "Bird," "Traffic").
Custom Sounds: Add custom-trained labels and assign MIDI parameters.
Save Mappings: Click "Save All Mappings" to store configurations.
Re-scan MIDI: Refresh available MIDI ports if devices change.
Note: Ensure your MIDI device or software is connected and recognized by the browser before mapping.
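Under the hood, a label-to-note mapping boils down to sending a Note On message over the Web MIDI API. A minimal sketch; the mapping table is illustrative (real mappings come from the MIDI Mapper UI), and in the browser the resulting bytes would be passed to a MIDIOutput's send() method:

```javascript
// Build a MIDI Note On message: status byte 0x90 | channel, then note
// and velocity, each clamped to 7 bits.
function noteOn(note, velocity = 100, channel = 0) {
  return [0x90 | (channel & 0x0f), note & 0x7f, velocity & 0x7f];
}

// Illustrative label-to-note table; not the app's stored configuration.
const midiMap = { "Bird": 60, "Car Horn": 48 };

// Look up a detected label and produce the message to send,
// e.g. output.send(messageForLabel("Bird")) in the browser.
function messageForLabel(label) {
  const note = midiMap[label];
  return note === undefined ? null : noteOn(note);
}
```

Unmapped labels simply produce no message, so only the sounds you assign in the MIDI Mapper trigger external devices.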
Practical Scenarios and Applications
Nature Conservation
Use Case: Use Segments to monitor a natural sonic environment. Record 10 minutes of soundscape, analyze it and find the percentage of Biophony sounds (birds, wind) vs. Anthropophony (distant traffic) in a specific area. Pie charts visualize ecosystem health. Hint: Train a custom TensorFlow model to detect a rare frog species, aiding conservation efforts.
Music and Sound Design
Use Case: Upload urban sound recordings, analyze them and render a Markov chain-based mix of traffic and rain sounds. Then map a custom "Car Horn" label (trained via TensorFlow) to MIDI Note C3 to trigger a synth in your DAW, creating a dynamic soundscape for a film score.
Music Creation: Experimental Live Performance
Use Case: Use Segments's real-time analysis during a live set. Stream microphone input and the model detects sounds like "Applause" and "Footsteps", or your custom-trained labels. These are mapped to MIDI CCs to modulate synth parameters in, e.g., Ableton Live.
Smart Home and IoT
Use Case: Train a custom "Doorbell" label using TensorFlow and map it to a MIDI CC to trigger smart lights. Segments detects the doorbell in real time, automating the lighting response.
Film and Games
Use Case: Upload location recordings, filter for YAMNet's "Footsteps" label using Render by Label and download a composite WAV for a game's Foley library.
Spatial Audio
Use Case: Use the multi render section to generate up to 8 different audio channels. Upload to your DAW and use a multi-output audio interface to drive selected speakers.
File Formats and Specifications
Folder / Single Analysis: WAV, MP3; 16–24 bit, mono/stereo; auto-converted to 16 kHz mono; max 10 MB/file; length ≥ 0.1 s.
Real-Time Stream: Web-Audio PCM; Float32, mono; 16 kHz; no size limit; continuous.
Custom-Training Samples: WAV (preferred); 16 bit, mono; 16 kHz; max 10 MB/file; 1–5 s ideal length.
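The limits above suggest a simple pre-flight check before uploading files for Folder Analysis. A sketch with the documented limits (the function name and the file-object shape are illustrative, not part of the app):

```javascript
// Check a candidate file against the Folder Analysis limits:
// WAV/MP3 format, <= 10 MB, and at least 0.1 s of audio.
function checkAnalysisFile(file) {
  const errors = [];
  if (!/\.(wav|mp3)$/i.test(file.name)) errors.push("not a WAV/MP3 file");
  if (file.sizeBytes > 10 * 1024 * 1024) errors.push("file exceeds 10 MB");
  if (file.durationSec < 0.1) errors.push("clip shorter than 0.1 s");
  return { ok: errors.length === 0, errors };
}
```

Files that fail the size check can still be loaded via Single File input, which accepts any size.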
Pie Charts
Purpose: Visualizes the distribution of sounds.
Sound Event Distribution Pie: Analyze the sonic environment for individual sounds (human or natural, with labels).
Background Pie: Visualizes the distribution of dominant, steady background sounds in an audio file or real-time input, focusing on persistent soundscapes rather than transient events.
Background Window (1-30s): Sets the time window for detecting dominant, steady background sounds, filtering out short, one-shot events like bird chirps or claps.
Min Hits: Specifies the minimum number of times a sound label must appear within this window to be considered a significant background element.
Real-Time Constraints: Long recordings (e.g., 1440 minutes) may strain browser memory.
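The Background Window and Min Hits settings amount to counting how often each label recurs inside the window and keeping only the persistent ones. A simplified sketch of that filtering logic (the function name and detection format are illustrative):

```javascript
// Given timestamped label detections, keep labels that occur at least
// `minHits` times within the last `windowSec` seconds before `now`.
// One-shot events (a single chirp or clap) fall below minHits and drop out.
function backgroundLabels(detections, windowSec, minHits, now) {
  const counts = {};
  for (const d of detections) {
    if (now - d.time <= windowSec) {
      counts[d.label] = (counts[d.label] || 0) + 1;
    }
  }
  return Object.keys(counts).filter(label => counts[label] >= minHits);
}
```

Raising Min Hits makes the Background Pie stricter, so only genuinely steady soundscape elements remain.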
Limitations and Known Issues
File Size: Maximum 10 MB per file in Folder Analysis to ensure browser performance and proper analysis. If your system can handle it, larger files can still be analyzed for creative applications, but do not expect accurate label results.
Train Segments: Users need to create and save a custom model in the Training Section first, then train individual segments from analysis.
Sample Rate: All audio is converted to 16 kHz mono for analysis, which may affect high-frequency content. The final render is output at 16-bit/44.1 kHz.
Training Accuracy: Custom TensorFlow models require ≥ 70% accuracy; noisy or insufficient samples may reduce performance.
MIDI Compatibility: Requires browser support for Web MIDI API and connected MIDI devices.
Memory Constraints: Long recordings may strain browser memory. Be patient when uploading large files or folders. Use 'Clear' > 'Render' > 'Clear' > 'Render' if playback fails to start.
Audio Engine: Importing large files sometimes crashes the audio engine. Press Stop -> Render -> Play, up to 3 times, to restore playback.
Future
Multi-output support to spatialize segments.
Extended MIDI functions.
OSC support.
Online user base with custom Playback algorithms.
Standalone application (Desktop/iOS/Android).
Developed:
To support and extend the creative process.
To support my PhD thesis 'Algorithmic Soundscape Composition' at HMU.
For ihearcolors.online
For tokeno.net
By Dimitris Barnias 2025
dbarnias@gmail.com | Contact here for questions, ideas, comments |