DeGirumJS predict_batch() + ReadableStream examples
If your browser supports the WebCodecs API, you can use it to develop some really cool video processing pipelines with DeGirumJS!
The WebCodecs API provides low-level access to the individual frames of a video stream. This allows for highly efficient and flexible video processing pipelines directly in the browser. When combined with DeGirumJS's predict_batch() method, you can perform real-time AI inference on a live webcam stream with minimal latency.
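Not all browsers expose these classes yet, so it is worth feature-checking before you build a pipeline. A minimal sketch (the fallback behavior is just an example):
// Sketch: check that the classes used throughout these examples are available.
if (typeof MediaStreamTrackProcessor === 'undefined' ||
    typeof MediaStreamTrackGenerator === 'undefined') {
    // Fall back to a canvas capture loop, or show a message to the user.
    console.warn('This browser does not support MediaStreamTrackProcessor / MediaStreamTrackGenerator.');
}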
The core components of this pipeline are:
1. MediaStreamTrackProcessor: Takes a MediaStreamTrack (such as one from a webcam) and exposes its frames as a ReadableStream of VideoFrame objects.
2. predict_batch(): The DeGirumJS method that can directly consume a ReadableStream of VideoFrame objects and efficiently process them for inference.
3. MediaStreamTrackGenerator: Takes a stream of processed VideoFrame objects and exposes them as a new MediaStreamTrack, which can be displayed in a <video> element.
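At a high level, the three components connect into a single streams pipeline. Here is a minimal skeleton, assuming a webcam source; the transform callback is only a placeholder for the per-frame work shown in the full examples below:
// Minimal pipeline skeleton (sketch only; see the complete examples below).
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const track = stream.getVideoTracks()[0];
const processor = new MediaStreamTrackProcessor({ track });          // VideoFrames out
const generator = new MediaStreamTrackGenerator({ kind: 'video' });  // VideoFrames in
processor.readable
    .pipeThrough(new TransformStream({
        async transform(frame, controller) {
            // ...run inference and/or draw overlays here...
            controller.enqueue(frame); // pass a frame downstream (close any frame you replace)
        }
    }))
    .pipeTo(generator.writable);
// generator is a MediaStreamTrack: wrap it in a MediaStream to show it in a <video> element.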
Here are some examples demonstrating how to build pipelines using these components:
Example 1: ReadableStream as Input
This example demonstrates the most direct way to perform inference on a video stream. We will take the ReadableStream provided by the MediaStreamTrackProcessor and feed it directly into model.predict_batch().
How it works:
* Get a videoTrack from the webcam using navigator.mediaDevices.getUserMedia.
* Create a MediaStreamTrackProcessor to get a ReadableStream of VideoFrame objects.
* Pass this readableStream directly as the data source to model.predict_batch().
* Display the results in a <canvas>.
<p>Inference results from a direct video stream:</p>
<canvas id="outputCanvas"></canvas>
<script src="https://assets.degirum.com/degirumjs/0.1.4/degirum-js.min.obf.js"></script>
<script type="module">
// --- Model Setup ---
const dg = new dg_sdk();
const secretToken = localStorage.getItem('secretToken') || prompt('Enter secret token:');
localStorage.setItem('secretToken', secretToken);
const MODEL_NAME = 'yolov8n_relu6_coco--640x640_quant_n2x_orca1_1';
const ZOO_IP = 'https://cs.degirum.com/degirum/public';
const zoo = await dg.connect('cloud', ZOO_IP, secretToken);
const model = await zoo.loadModel(MODEL_NAME);
// 1. Get video stream from webcam
const mediaStream = await navigator.mediaDevices.getUserMedia({ video: true });
const videoTrack = mediaStream.getVideoTracks()[0];
// 2. Create a processor to get a readable stream of frames
const processor = new MediaStreamTrackProcessor({ track: videoTrack });
const readableStream = processor.readable;
// 3. Feed the stream to predict_batch and loop through results
for await (const result of model.predict_batch(readableStream)) {
    // Display the result on the canvas
    await model.displayResultToCanvas(result, 'outputCanvas');
    // IMPORTANT: Close the frame to release memory.
    // The SDK does not close frames when you provide a raw stream.
    result.imageFrame.close();
}
</script>
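One thing this example does not show is how to stop. Since predict_batch() keeps reading until the stream ends, you can end the loop by stopping the webcam track, which closes the processor's ReadableStream. A minimal sketch, assuming a hypothetical stop button in your page:
// Sketch: stopping the pipeline (stopButton is a hypothetical element).
document.getElementById('stopButton').addEventListener('click', () => {
    // Ending the track closes the processor's readable stream,
    // which should let the for-await loop over predict_batch() finish.
    videoTrack.stop();
});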
Example 2: Real-Time Inference with Display in a <video> Element
While the first example is simple, you might want to output the processed video (with results drawn) into a <video> element, for example for further processing or for use by other libraries in your code. This example uses WebCodecs to re-encode the processed frames back into a video track.
We use a TransformStream to orchestrate the work and a MediaStreamTrackGenerator to create the final output video track. This pattern is more robust and flexible for building complex applications.
How it works:
* A MediaStreamTrackProcessor creates a ReadableStream from the webcam.
* This stream is piped through a TransformStream. Inside the transform function, for each frame:
1. We run inference on the frame using model.predict().
2. If the frame produced a valid result, we use model.displayResultToCanvas() to render it, with the inference results overlaid, onto an OffscreenCanvas; otherwise we draw the original frame onto that canvas.
3. We enqueue a new VideoFrame created from the canvas to the stream's controller.
4. We close the original frame to free up memory.
* The output of the TransformStream is piped to the writable side of a MediaStreamTrackGenerator.
* The MediaStreamTrackGenerator's track is then attached to a <video> element's srcObject.
<p>Inference results inside a video element</p>
<video id="outputVideo" width="640" height="480" autoplay muted></video>
<script src="https://assets.degirum.com/degirumjs/0.1.4/degirum-js.min.obf.js"></script>
<script type="module">
const outputVideo = document.getElementById('outputVideo');
// --- Model Setup ---
const dg = new dg_sdk();
const secretToken = localStorage.getItem('secretToken') || prompt('Enter secret token:');
localStorage.setItem('secretToken', secretToken);
const MODEL_NAME = 'yolov8n_relu6_coco--640x640_quant_n2x_orca1_1';
const ZOO_IP = 'https://cs.degirum.com/degirum/public';
const zoo = await dg.connect('cloud', ZOO_IP, secretToken);
const model = await zoo.loadModel(MODEL_NAME);
// Use an OffscreenCanvas for efficient background rendering
const canvas = new OffscreenCanvas(640, 480);
const ctx = canvas.getContext('2d');
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const videoTrack = stream.getVideoTracks()[0];
const trackProcessor = new MediaStreamTrackProcessor({ track: videoTrack });
const trackGenerator = new MediaStreamTrackGenerator({ kind: "video" });
outputVideo.srcObject = new MediaStream([trackGenerator]);
// Define the transformation logic
const transform = async (frame, controller) => {
    // Run inference on the current frame.
    // Note: We use predict() here, not predict_batch(), as we process one frame at a time.
    const result = await model.predict(frame);
    if (result.result) {
        // If we have a valid result, draw the frame with the results overlaid on the canvas
        await model.displayResultToCanvas(result, canvas);
    } else {
        // Otherwise, just draw the original frame onto our offscreen canvas
        ctx.drawImage(frame, 0, 0);
    }
    // Create a new frame from the canvas and pass it down the pipeline
    controller.enqueue(new VideoFrame(canvas, { timestamp: frame.timestamp }));
    // IMPORTANT: Close the original frame to release its resources.
    frame.close();
};
// Construct the full pipeline!
trackProcessor.readable
.pipeThrough(new TransformStream({ transform }))
.pipeTo(trackGenerator.writable);
</script>
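If you need to tear this pipeline down later, one option is to pass an AbortSignal to pipeTo() and stop the webcam track when you abort. A sketch of that variant, replacing the final pipeline statement above (the names are illustrative):
// Sketch: an abortable variant of the pipeline.
const abortController = new AbortController();
trackProcessor.readable
    .pipeThrough(new TransformStream({ transform }))
    .pipeTo(trackGenerator.writable, { signal: abortController.signal })
    .catch(() => { /* aborting rejects the pipeTo() promise */ });
// Later, to stop everything:
// abortController.abort();
// videoTrack.stop();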
Example 3: Parallel Inference on Four Video Streams
The WebCodecs API and DeGirumJS can handle multiple independent video pipelines at once. This example runs a different model on each of four pipelines and displays the processed streams in a 2x2 grid of canvases.
This architecture is highly scalable. While we use a cloned track here, you could just as easily use four different video sources (e.g., multiple cameras or video files).
How it works:
* Grab a single webcam track (mainVideoTrack).
* Clone the track four times so each pipeline gets its own independent MediaStreamTrack.
* For each pipeline:
1. Load a separate model instance.
2. Create a MediaStreamTrackProcessor for the cloned track to get a ReadableStream of VideoFrames.
3. Pass the stream directly to model.predict_batch().
4. For each inference result, render detections onto the assigned <canvas> element using model.displayResultToCanvas().
5. Close the frame after processing to release memory.
<!DOCTYPE html>
<html>
<head>
<title>DeGirumJS four-canvas parallel demo</title>
<style>
html,
body {
    margin: 0;
    height: 100%;
}
#canvas-grid {
    display: grid;
    grid-template-columns: repeat(2, 1fr);
    grid-template-rows: repeat(2, 1fr);
    width: 100vw;
    height: 100vh;
}
canvas {
    width: 100%;
    height: 100%;
    background: #000;
    display: block;
}
</style>
</head>
<body>
<div id="canvas-grid">
<canvas id="canvas_0" width="640" height="480"></canvas>
<canvas id="canvas_1" width="640" height="480"></canvas>
<canvas id="canvas_2" width="640" height="480"></canvas>
<canvas id="canvas_3" width="640" height="480"></canvas>
</div>
<script src="https://assets.degirum.com/degirumjs/0.1.4/degirum-js.min.obf.js"></script>
<script type="module">
// ----- Model setup -----
const dg = new dg_sdk();
const secretToken = localStorage.getItem('secretToken') || prompt('Enter secret token:');
localStorage.setItem('secretToken', secretToken);
const MODEL_NAMES = [
'yolov8n_relu6_coco--640x640_quant_n2x_orca1_1',
'yolov8n_relu6_face--640x640_quant_n2x_orca1_1',
'yolov8n_relu6_hand--640x640_quant_n2x_orca1_1',
'yolov8n_relu6_widerface_kpts--640x640_quant_n2x_orca1_1'
];
const NUM_PIPELINES = MODEL_NAMES.length;
const ZOO_IP = 'https://cs.degirum.com/degirum/public';
const zoo = await dg.connect('cloud', ZOO_IP, secretToken);
// Grab the webcam once and clone the track
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const mainVideoTrack = stream.getVideoTracks()[0];
async function setupPipeline(index, videoTrack) {
    // Load a separate model instance for each stream
    const model = await zoo.loadModel(MODEL_NAMES[index]);
    // Processor gives us a ReadableStream<VideoFrame>
    const processor = new MediaStreamTrackProcessor({ track: videoTrack });
    const readable = processor.readable;
    // Iterate over batched predictions
    for await (const result of model.predict_batch(readable)) {
        // Draw detections to the right canvas
        await model.displayResultToCanvas(result, `canvas_${index}`);
        // IMPORTANT: Always close frames when supplying a raw stream
        result.imageFrame.close();
    }
}
// Create and launch four independent pipelines
for (let i = 0; i < NUM_PIPELINES; i++) {
    setupPipeline(i, mainVideoTrack.clone());
}
</script>
</body>
</html>