Segments is a web application designed both as an artistic tool and an environmental audio classifier.
It enables users to record, analyze, visualize and, most crucially, resynthesize surrounding soundscapes.
It combines AI-powered analysis with algorithmic synthesis.
Segments transforms, reorganizes and finally renders sonic environments into new auditory structures. It is both a scientific instrument and a sound design tool, where attentive listening merges with auralization creativity.
Through a growing list of playback modes (algorithms), users can reframe field recordings (or any other audio material) into narrative arcs, stochastic reorganizations and spectral transformations.
Liminal Territories
Segments is grounded equally in the conceptual use of algorithms and the phenomenology of sound, with reorganization and resynthesis operating in both of these domains.
Through reorganization, hidden structures emerge, patterns rise to the surface and unnoticed details are brought to the foreground.
Sometimes, what was once background becomes an aesthetic focus.
The composer creates a dialogue between nature’s proto-musical sound material and audio signal processing, discovering new musical potentials already present in the soundscape.
Key Features
Audio Analysis: Classify audio into 521 everyday labels using MediaPipe's YAMNet model, bucketed into biophonic or anthropophonic categories.
Rendering Modes: Create composite renders with modes such as Lorenz attractor, Markov chain, Random mix, Emotion-based and many more.
Multiple Renders: Render and export up to 8 channels based on custom user rules.
Real-Time Processing: Analyze live microphone input or uploaded audio files (WAV/MP3).
Custom Model Training: Train new sound labels in-browser using TensorFlow, saved to localStorage. This is separate from the YAMNet model and does not modify it.
MIDI Integration: Map sound labels to MIDI notes, for controlling synthesizers or IoT devices.
Conceptual Reverb: Add Reverb to the full render file, by category (Biophony / Anthropophony) or by label.
Visualization: View analysis results as tables and pie charts for intuitive insights.
MediaPipe YAMNet vs. TensorFlow Custom Models
Segments uses two distinct AI models for audio analysis: the pre-trained MediaPipe YAMNet model and a TensorFlow-based custom model for user-defined labels. Below is a comparison of their roles and capabilities:
Purpose
MediaPipe YAMNet: Pre-trained model for classifying audio into 521 fixed everyday sound labels (e.g., "Bird," "Car Horn"), bucketed into Biophony or Anthropophony categories.
TensorFlow Custom Model: User-trained model for recognizing new, custom sound labels (e.g., "Coffee Machine") not included in YAMNet's label set.
Training
MediaPipe YAMNet: Fixed and pre-trained; cannot be modified or extended with new labels by users.
TensorFlow Custom Model: Trainable in-browser using 5–20 WAV clips (16 kHz, mono, 1–5 s). Users define new labels, and the model is trained independently of YAMNet.
Label Set
MediaPipe YAMNet: 521 pre-defined labels, non-editable.
TensorFlow Custom Model: User-defined labels, created via the "Train on These Results" button.
Storage
MediaPipe YAMNet: Embedded in the application, not stored locally.
TensorFlow Custom Model: Saved to the browser's localStorage as JSON weights; persists across reloads.
Integration
MediaPipe YAMNet: Provides baseline classification for all audio analysis tasks.
TensorFlow Custom Model: Supplements YAMNet by adding custom labels, which can be used in rendering or MIDI mapping.
Reset
MediaPipe YAMNet: Not applicable (fixed model).
TensorFlow Custom Model: Reset with Ctrl+Shift+X to clear custom model data.
Note: The YAMNet model is used for all initial audio classifications, while the TensorFlow custom model allows users to extend functionality with new sound labels. Custom labels do not modify or add to YAMNet's fixed label set but operate as a separate, parallel model.
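Persisting the custom model's weights as JSON in localStorage can be sketched as follows. This is a simplified illustration, not the app's actual code: the storage key name is invented, and a plain-object stand-in is used so the sketch also runs outside the browser.

```javascript
// Stand-in for window.localStorage so the sketch runs anywhere;
// in the browser you would use localStorage directly.
const storage = (typeof localStorage !== "undefined")
  ? localStorage
  : { _d: {}, setItem(k, v) { this._d[k] = v; }, getItem(k) { return this._d[k] ?? null; } };

const KEY = "segments-custom-model"; // illustrative key name

// Serialize model weights (plain number arrays) to JSON and back.
function saveWeights(weights) {
  storage.setItem(KEY, JSON.stringify(weights));
}

function loadWeights() {
  const raw = storage.getItem(KEY);
  return raw === null ? null : JSON.parse(raw);
}
```

Because the weights survive the JSON round-trip, a trained custom model persists across page reloads without any server.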
How to use Segments
Getting Started
Wait for Model Loading: Ensure the status window (top right) shows "Model loaded".
Drag & Drop Files: Upload WAV or MP3 files via the "Audio Folder Analysis (≤ 10 MB each)", "Single File (any size)" or "RealTime (up to 24h)" sections.
Select Sources: Choose which source(s) to render (Playback Setting -> Source): Folder, Single, Realtime or a combination. All three input methods can be rendered simultaneously.
Render Audio: Select a playback mode (Playback Setting -> Main Mode -> e.g. "Bounce Chaos") and click "Render". One or more new files are created based on the selected playback mode and other settings (Pitch, Volume, Reverb etc.). Re-render with a different algorithm (Main Mode) to hear the differences.
MIDI Mapping: [optionally] Assign labels (from YAMNet or custom model) to MIDI notes in the "MIDI Mapper" section to control external devices.
View Results: Check the "Folder Analysis Results," "Single File Recording Analysis," or "Real-Time Analysis" sections for tables and pie charts showing top labels, categories and confidence scores.
Optional Custom Training: In "Train Custom Sounds" section, train your personal model with new labels. This creates a separate custom model, not an update to YAMNet.
Training Custom Models (TensorFlow)
Segments allows in-browser training of custom sound labels using TensorFlow, independent of the YAMNet model.
Analyze Clips: Upload clips via folder, single file, or real-time analysis (processed by YAMNet).
Train: Click the green "Train on These Results" button under the relevant analysis table, enter a label (e.g., "Coffee Machine"), and train in-browser with TensorFlow.
Add Negative Examples: Optionally upload negative samples to improve model accuracy.
Save Model: If accuracy ≥ 70%, the model auto-saves to localStorage. Use "Save Model" to export as JSON or "Load Model" to import.
Reset Model: Press Ctrl+Shift+X to reset the custom model (does not affect YAMNet).
Custom Model Architecture
Structure: 1-D CNN → Global MaxPool → Dense(16) → Dense(2-softmax)
Augmentation: ±10% time-stretch, ±5 dB pink-noise, ±3 dB gain
Validation: 20% hold-out, target ≥ 70% accuracy
Storage: JSON weights in localStorage
Note: Custom training creates a new model separate from YAMNet. Use headphones to avoid feedback during training or recording. Ensure clips are clean and representative of the target sound.
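The augmentation offsets listed above (±10% time-stretch, ±5 dB pink noise, ±3 dB gain) translate into linear factors via the standard decibel formula gain = 10^(dB/20). A minimal sketch; the helper names are illustrative, not the app's actual API:

```javascript
// Convert a decibel offset to a linear amplitude factor: gain = 10^(dB/20).
function dbToGain(db) {
  return Math.pow(10, db / 20);
}

// Draw one random augmentation setting within the documented ranges:
// +/-10% time-stretch, +/-5 dB pink noise, +/-3 dB gain.
function randomAugmentation() {
  const rand = (lo, hi) => lo + Math.random() * (hi - lo);
  return {
    stretchFactor: rand(0.9, 1.1),        // playback-rate multiplier
    pinkNoiseGain: dbToGain(rand(-5, 5)), // linear noise level
    gain: dbToGain(rand(-3, 3)),          // linear sample gain
  };
}
```

For example, a +6 dB offset roughly doubles amplitude, while -6 dB roughly halves it.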
Rendering Audio
Segments lets you create composite audio mixes from analyzed segments using various playback modes, leveraging labels from YAMNet or the custom TensorFlow model. Each mode is a sorting / sequencing strategy.
Modes marked [L] support looping (see Looping Controls below).
Source options (Playback Setting -> Source):
Realtime: Render only recordings from Realtime Input.
Folder and Single: Combo 1.
Folder and Realtime: Combo 2.
Single and Realtime: Combo 3.
Playback Modes (the list is growing...)
Low → High: Arranges segments from lowest to highest frequency content, sequenced in a Biophony-to-Anthropophony order.
High → Low: The reverse: highest to lowest frequency content, sequenced in an Anthropophony-to-Biophony order.
Lorenz Attractor: Uses chaotic patterns for segment ordering. Based on Lorenz Chaotic Attractor model. [L]
Markov Chain: Probabilistic transitions between segments. [L]
Most Common and Vice Versa: Render by the most common or least common labels.
Random Mix: Random segment ordering. [L]
Alternating: Alternates between Biophony and Anthropophony segments. [L]
Emotional Intensity: Orders segments based on emotional valence (requires custom training). If no training has been performed, then the order is based on the intensity of the segments.
Label Select and Exclude: Filters segments by a specific label keyword (from YAMNet or custom model).
Chaos-Bounce: Lorenz-attractor drives segment order; the randomness slider scales how wild the jumps feel (0 = nearly sequential, 100 = chaotic leaps). [L]
Spectral Flow: Sorts segments by a lightweight “brightness” proxy (RMS of upper-half spectrum) so the mix flows from dull to bright or vice-versa.
Spectral Stutter: A combination of Spectral Flow and Rhythmic Stutter. [L]
Rhythmic Stutter: Make each segment stutter-repeat at the chosen BPM. [L]
Formant-Ladder: Sorts segments by pre-computed average frequency, then picks an evenly-spaced "ladder" (2–12 steps); depending on the material it may sound like ascending/descending vocal formants.
Formant-Reorder: Pitch-shifts every segment up by the chosen ratio, sorts the now-scaled formant frequencies low→high, then optionally shuffles that ladder by the randomization % before returning the re-ordered, pre-shifted segments. [L]
Ecosystem Simulation: Simulates an ecosystem where sounds "compete" based on category rarity; "predators" (rare categories) "consume" order positions from "prey" (common ones), with randomization for mutation. In Advanced Settings adjust Mutation Rate. [L]
Narrative Journey: Orders segments like a story: start with natural sounds, build tension with man-made intrusions, climax in hybrids, resolve in harmony. In Advanced Settings adjust the Narrative Balance between human-made and natural segments. [L]
Quantum: Introduces a quantum-inspired behavior for ordering audio segments, simulating the concept of a quantum wave function and its collapse upon measurement. Unlike deterministic modes like "low→high" (sorting by frequency) or "random" (uniform shuffling), the "quantum" mode balances structure (favoring high-intensity segments) with randomness (probabilistic selection). [L]
Euclidean: Arranges segments based on the Euclidean rhythm algorithm. In Advanced Settings adjust Steps, Pulses and Rotation. [L]
Tidal Rhythm: Sorts segments by duration, assigns start times within a cycle (default 10 seconds) and applies randomization to introduce variability. [L]
Cellular-Repeat: Segments repeat in a 1-D cellular automaton: for every bar a segment is either kept, muted or replaced by its spectral neighbour. In Advanced Settings adjust Cells, Rules, Seed. [L]
User Custom: Custom user algorithms written in JavaScript to test in Segments. The function customOrder(files) takes an array of segments and returns an ordered array based on your algorithm.
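For reference, the pulse pattern behind the Euclidean mode can be sketched with the classic Bresenham-style formulation. This is a simplified illustration of the underlying algorithm; the app's internal implementation may differ:

```javascript
// Generate a Euclidean rhythm: distribute `pulses` onsets as evenly as
// possible across `steps` slots, then rotate the pattern.
function euclideanPattern(steps, pulses, rotation = 0) {
  const pattern = [];
  for (let i = 0; i < steps; i++) {
    // Bresenham-style spacing: slot i fires when (i * pulses) mod steps
    // has not yet wrapped past the pulse count.
    pattern.push((i * pulses) % steps < pulses ? 1 : 0);
  }
  // Rotate left by `rotation` slots.
  const r = ((rotation % steps) + steps) % steps;
  return pattern.slice(r).concat(pattern.slice(0, r));
}
```

For example, euclideanPattern(8, 3) yields [1,0,0,1,0,0,1,0], the familiar evenly-spread three-against-eight pattern; Steps, Pulses and Rotation in Advanced Settings correspond to the three parameters.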
User Custom Algorithms Examples
Alphabetical Order (A-Z): function customOrder(files) {
return files.slice().sort((a, b) => {
const nameA = a.name || '';
const nameB = b.name || '';
return nameA.localeCompare(nameB);
});
}
Random Shuffle: function customOrder(files) {
const shuffled = files.slice();
for (let i = shuffled.length - 1; i > 0; i--) {
const j = Math.floor(Math.random() * (i + 1));
[shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
}
return shuffled;
}
Key Considerations: The function must be named customOrder and defined as function customOrder(files) { ... }, because generateBuffer expects to call customOrder explicitly.
Input: The files parameter is an array of segment objects, each with properties like:
audioBuffer: An AudioBuffer (Web Audio API) with duration, sampleRate (16 kHz), numberOfChannels, and getChannelData(ch).
category: String, either "Natural" or "Man-made".
name: String, the file name (e.g., "bird.wav").
topLabel: String, a primary label (e.g., "bird_call").
topCategories: Array of objects with categoryName (e.g., [{ categoryName: "bird" }, ...]).
frequency: Number, assumed for frequency-based sorting (e.g., in 'low→high').
Output: Must return an array of segment objects (a subset or reordering of files). The array must contain valid segments (each with an audioBuffer).
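Building on the examples above and the segment properties listed in the input spec, a customOrder that mimics the built-in Alternating mode (interleaving "Natural" and "Man-made" segments) could look like this sketch:

```javascript
// Alternate between "Natural" and "Man-made" segments; leftovers of the
// longer list are appended at the end.
function customOrder(files) {
  const natural = files.filter(f => f.category === "Natural");
  const manMade = files.filter(f => f.category === "Man-made");
  const ordered = [];
  const n = Math.max(natural.length, manMade.length);
  for (let i = 0; i < n; i++) {
    if (i < natural.length) ordered.push(natural[i]);
    if (i < manMade.length) ordered.push(manMade[i]);
  }
  return ordered;
}
```

Note that the function returns every input segment exactly once, satisfying the output requirement that each returned object still carries its audioBuffer.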
Looping Controls - Evolving Playback
Evolving Renders: If checked, creates a new random render with each new loop (loop time is specified in Render Length) based on the selected mode. Supported in selected playback modes.
Random Mode: Requires Evolving Renders checked. If checked, for each new loop a random playback mode is selected, along with a new random render. Supported in selected playback modes.
Check Both: For endless, non-repeatable playback.
Label Playback Mode
Label Select: Enter a label name to create a render based only on the chosen label.
Label Exclude: Enter a label name to exclude it from the render.
Rendering Options (Main Controls Section)
Volume: Adjust playback volume.
Time-Stretch: Adjust playback speed (e.g., 1x for normal, 0.5x for slower).
Multi-render rule examples (the numbers after → are the target render channels):
Example: label bird and freq > 1000 or label insect and duration < 2 → 1,2 (all conditions in a clause must be true for its segments to be returned)
Example: category natural and freq 500-2000 or rms > -25 → 3,4,5
Browsers do not directly support multi-channel audio, so the multiple renders cannot be driven to multiple speakers in real time.
However, the multiple render files can be uploaded to your DAW and routed to selected outputs if your audio interface supports it.
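The rule examples above amount to predicates over segment properties. A hypothetical sketch of one such predicate, mirroring "label bird and freq > 1000" (the actual rule parser in Segments is more elaborate; function and property names follow the customOrder input spec):

```javascript
// Keep segments whose topLabel contains the keyword AND whose
// frequency exceeds a threshold — both conditions must hold.
function filterSegments(files, keyword, minFreq) {
  return files.filter(f =>
    (f.topLabel || "").toLowerCase().includes(keyword.toLowerCase()) &&
    (f.frequency || 0) > minFreq
  );
}
```

Each rule clause selects a subset like this, and the matching segments are then routed to the listed render channels.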
There is a known bug, which I am working to fix. It does not affect the core functionality of the app, but it can be annoying.
Importing large files (mostly via Single File input, but sometimes via Folder Input too) can crash the audio engine (first render only).
When this happens, pressing Play may not start playback immediately.
FIX: Press Stop > Render again > Press Play. You may need to repeat these steps 2 or 3 times before audio is ready to play. All subsequent renders using the same files, including other playback modes, function normally.
Using MIDI Functions
Segments maps sound labels (from YAMNet or custom TensorFlow model) to MIDI notes, to control synthesizers, DAWs, or IoT devices.
Steps to Set Up MIDI
Select MIDI Port: Choose an output port in the "MIDI Mapper" section.
Add Mapping: In the "Live MIDI Map" or "Pre-Map Sounds to MIDI" section, assign a label (from YAMNet or custom model) to a MIDI Note, CC, or Channel.
Common Sounds: Pre-map frequently detected YAMNet labels (e.g., "Bird," "Traffic").
Custom Sounds: Add custom-trained labels and assign MIDI parameters.
Save Mappings: Click "Save All Mappings" to store configurations.
Re-scan MIDI: Refresh available MIDI ports if devices change.
Note: Ensure your MIDI device or software is connected and recognized by the browser before mapping.
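Under the hood, a label-to-note mapping boils down to sending a Note On message over the Web MIDI API. A minimal sketch; the mapping table is illustrative (real mappings come from the MIDI Mapper UI), and in the browser the resulting bytes would be passed to a MIDIOutput's send() method:

```javascript
// Build a MIDI Note On message: status byte 0x90 | channel, then note
// and velocity, each clamped to 7 bits.
function noteOn(note, velocity = 100, channel = 0) {
  return [0x90 | (channel & 0x0f), note & 0x7f, velocity & 0x7f];
}

// Illustrative label-to-note table; not the app's stored configuration.
const midiMap = { "Bird": 60, "Car Horn": 48 };

// Look up a detected label and produce the message to send,
// e.g. output.send(messageForLabel("Bird")) in the browser.
function messageForLabel(label) {
  const note = midiMap[label];
  return note === undefined ? null : noteOn(note);
}
```

Unmapped labels simply produce no message, so only the sounds you assign in the MIDI Mapper trigger external devices.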
Practical Scenarios and Applications
Nature Conservation
Use Case: Use Segments to monitor a natural sonic environment. Record 10 minutes of soundscape, analyze it and find the percentage of Biophony sounds (birds, wind) vs. Anthropophony (distant traffic) in a specific area. Pie charts visualize ecosystem health. Hint: Train a custom TensorFlow model to detect a rare frog species, aiding conservation efforts.
Music and Sound Design
Use Case: Upload urban sound recordings, analyze them and render a Markov chain-based mix of traffic and rain sounds. Then map a custom "Car Horn" label (trained via TensorFlow) to MIDI Note C3 to trigger a synth in your DAW, creating a dynamic soundscape for a film score.
Music Creation: Experimental Live Performance
Use Case: Use Segments's real-time analysis during a live set. Stream microphone input and the model detects sounds like "Applause" and "Footsteps", or your custom-trained labels. These are mapped to MIDI CCs to modulate synth parameters in, e.g., Ableton Live.
Smart Home and IoT
Use Case: Train a custom "Doorbell" label using TensorFlow and map it to a MIDI CC to trigger smart lights. Segments detects the doorbell in real time, automating the lighting response.
Film and Games
Use Case: Upload location recordings, filter for YAMNet's "Footsteps" label using Render by Label and download a composite WAV for a game's Foley library.
Spatial Audio
Use Case: Use the multi render section to generate up to 8 different audio channels. Upload to your DAW and use a multi-output audio interface to drive selected speakers.
File Formats and Specifications
Folder / Single Analysis: WAV, MP3; 16–24 bit, mono/stereo; auto-converted to 16 kHz mono; max 10 MB/file; length ≥ 0.1 s.
Real-Time Stream: Web-Audio PCM; Float32, mono; 16 kHz; no size limit; continuous.
Custom-Training Samples: WAV (preferred); 16 bit, mono; 16 kHz; max 10 MB/file; 1–5 s ideal length.
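The limits above suggest a simple pre-flight check before uploading files for Folder Analysis. A sketch with the documented limits (the function name and the file-object shape are illustrative, not part of the app):

```javascript
// Check a candidate file against the Folder Analysis limits:
// WAV/MP3 format, <= 10 MB, and at least 0.1 s of audio.
function checkAnalysisFile(file) {
  const errors = [];
  if (!/\.(wav|mp3)$/i.test(file.name)) errors.push("not a WAV/MP3 file");
  if (file.sizeBytes > 10 * 1024 * 1024) errors.push("file exceeds 10 MB");
  if (file.durationSec < 0.1) errors.push("clip shorter than 0.1 s");
  return { ok: errors.length === 0, errors };
}
```

Files that fail the size check can still be loaded via Single File input, which accepts any size.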
Pie Charts
Purpose: Visualizes the distribution of sounds.
Sound Event Distribution Pie: Analyze the sonic environment for individual sounds (human or natural, with labels).
Background Pie: Visualizes the distribution of dominant, steady background sounds in an audio file or real-time input, focusing on persistent soundscapes rather than transient events.
Background Window (1-30s): Sets the time window for detecting dominant, steady background sounds, filtering out short, one-shot events like bird chirps or claps.
Min Hits: Specifies the minimum number of times a sound label must appear within this window to be considered a significant background element.
Real-Time Constraints: Long recordings (e.g., 1440 minutes) may strain browser memory.
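The Background Window and Min Hits settings amount to counting how often each label recurs inside the window and keeping only the persistent ones. A simplified sketch of that filtering logic (the function name and detection format are illustrative):

```javascript
// Given timestamped label detections, keep labels that occur at least
// `minHits` times within the last `windowSec` seconds before `now`.
// One-shot events (a single chirp or clap) fall below minHits and drop out.
function backgroundLabels(detections, windowSec, minHits, now) {
  const counts = {};
  for (const d of detections) {
    if (now - d.time <= windowSec) {
      counts[d.label] = (counts[d.label] || 0) + 1;
    }
  }
  return Object.keys(counts).filter(label => counts[label] >= minHits);
}
```

Raising Min Hits makes the Background Pie stricter, so only genuinely steady soundscape elements remain.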
Limitations and Known Issues
File Size: Maximum 10 MB per file in Folder Analysis to ensure browser performance and proper analysis. If your system can handle it, larger files can still be analyzed for creative applications, but do not expect accurate label results.
Train Segments: Users need to create and save a custom model in the Training Section first, then train individual segments from analysis.
Sample Rate: All audio is converted to 16 kHz mono for analysis, which may affect high-frequency content. The final render is output at 16-bit/44.1 kHz.
Training Accuracy: Custom TensorFlow models require ≥ 70% accuracy; noisy or insufficient samples may reduce performance.
MIDI Compatibility: Requires browser support for Web MIDI API and connected MIDI devices.
Memory Constraints: Long recordings may strain browser memory. Be patient when uploading large files or folders. Use 'Clear' > 'Render' > 'Clear' > 'Render' if playback fails to start.
Audio Engine: Importing large files sometimes crashes the audio engine. Press Stop -> Render -> Play, up to 3 times, to restore playback.
Future
Multi-output support to spatialize segments.
Extended MIDI functions.
OSC support.
Online user base with custom Playback algorithms.
Standalone application (Desktop/iOS/Android).
Developed:
To support and extend the creative process.
To support my PhD thesis 'Algorithmic Soundscape Composition' at HMU.
For ihearcolors.online
For tokeno.net
By Dimitris Barnias 2025
dbarnias@gmail.com | Contact here for questions, ideas, comments |