Result Object Structure + Examples

The predict and predict_batch functions of the AIServerModel and CloudServerModel classes return a result object that holds the inference results together with frame information and the original input image.

Example:

let someResult = await someModel.predict(image);
console.log(someResult);

A result object is typically structured like this:

{
    "result": [
        [
            { "category_id": 1, "label": "foo", "score": 0.2 },
            { "category_id": 0, "label": "bar", "score": 0.1 }
        ],
        "frame123"
    ],
    "imageFrame": imageBitmap
}

Accessing the Result Data

Inference Results: Access the main results using someResult.result[0].
Frame Info / Number: Get the frame information or frame number using someResult.result[1].
Original Input Image: Access the original input image with someResult.imageFrame.
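
The three access paths above can be combined with array destructuring. The sketch below uses a hand-written stand-in object instead of a real model call, and unpackResult is a helper name made up for this example, not part of the SDK.

```javascript
// Hypothetical stand-in for the object returned by someModel.predict(image).
const someResult = {
  result: [
    [
      { category_id: 1, label: "foo", score: 0.2 },
      { category_id: 0, label: "bar", score: 0.1 },
    ],
    "frame123",
  ],
  imageFrame: null, // an image object (imageBitmap above) in a real result
};

// unpackResult is an illustrative helper, not an SDK function.
function unpackResult(res) {
  // result[0] holds the inference results, result[1] the frame info.
  const [inferenceResults, frameInfo] = res.result;
  return { inferenceResults, frameInfo, image: res.imageFrame };
}

const { inferenceResults, frameInfo } = unpackResult(someResult);
console.log(frameInfo);
console.log(inferenceResults.map((r) => r.label));
```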

Inference Result Types

The inference results can be one of the following types:

Detection: Contains bbox, category_id, label, and score, plus an optional mask (instance segmentation) and optional landmarks (pose estimation).
Classification: Contains category_id, label, and score for single-label classification.
Multi-Label Classification: Contains classifier name and a results array with multiple { label, score } entries.
Segmentation Mask: Contains shape and data, where data is a flat array (typically a Uint8Array) of per-pixel class IDs.
Pose Detection: Contains a landmarks array in which each landmark has a category_id, landmark coordinates, connectivity information, and an optional score and label.
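
When the model type is not known in advance, the fields listed above can be used to guess which kind of entry a result array holds. The following is only a heuristic sketch: guessResultType is a hypothetical helper, and the order of the checks (for example, treating any entry with landmarks as a pose result) is an assumption made for this example.

```javascript
// guessResultType is a hypothetical helper, not part of the SDK.
// It inspects one entry of someResult.result[0] and guesses its type
// from the fields described in the list above.
function guessResultType(entry) {
  if (entry === null || typeof entry !== "object") return "unknown";
  if ("landmarks" in entry) return "pose";
  if ("bbox" in entry) return "detection";
  if ("classifier" in entry && Array.isArray(entry.results)) {
    return "multi-label classification";
  }
  if ("shape" in entry && "data" in entry) return "segmentation mask";
  if ("label" in entry && "score" in entry) return "classification";
  return "unknown";
}

console.log(guessResultType({ category_id: 0, label: "dog", score: 0.76 }));
console.log(guessResultType({ shape: [1, 2, 2], data: [0, 1, 1, 0] }));
```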

Example Results

1. Detection Result

Detection results include bounding boxes (bbox) along with category IDs, labels, and confidence scores. A mask (for instance segmentation) or landmarks (for pose estimation) may also be included.

[
  {
    "category_id": 2,
    "label": "person",
    "score": 0.98,
    "bbox": [0.1, 0.2, 0.4, 0.8]
  },
  {
    "category_id": 3,
    "label": "dog",
    "score": 0.87,
    "bbox": [0.5, 0.3, 0.9, 0.7],
    "mask": {
      "width": 128,
      "height": 128,
      "data": "<RLE string>"
    }
  }
]
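
A common first step with detection results is to drop low-confidence entries and sort the rest. The sketch below does this using only the score field shown above; filterDetections and the 0.5 default threshold are choices made for this example, not SDK features.

```javascript
// filterDetections is an illustrative helper, not an SDK function.
// It keeps detections with score >= minScore, highest score first.
function filterDetections(detections, minScore = 0.5) {
  return detections
    .filter((d) => d.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

// Sample data copied from the detection example above (mask omitted).
const detections = [
  { category_id: 3, label: "dog", score: 0.87, bbox: [0.5, 0.3, 0.9, 0.7] },
  { category_id: 2, label: "person", score: 0.98, bbox: [0.1, 0.2, 0.4, 0.8] },
];

for (const d of filterDetections(detections, 0.9)) {
  console.log(`${d.label}: ${d.score}`);
}
```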

2. Classification Result

Single-label classification results contain a label, score, and category_id.

[
  {
    "category_id": 0,
    "label": "dog",
    "score": 0.76
  }
]
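
If a classification result array holds more than one entry, the highest-scoring one can be picked out as follows. This is a minimal sketch; topClass is a hypothetical helper name, and the second entry in the sample data is invented for illustration.

```javascript
// topClass is an illustrative helper, not an SDK function.
// Returns the entry with the highest score, or null for an empty array.
function topClass(classifications) {
  if (classifications.length === 0) return null;
  return classifications.reduce((best, c) => (c.score > best.score ? c : best));
}

const classifications = [
  { category_id: 0, label: "dog", score: 0.76 },
  { category_id: 1, label: "cat", score: 0.12 }, // invented for this example
];

const best = topClass(classifications);
console.log(best.label, best.score);
```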

3. Multi-Label Classification Result

Multi-label classification results include a classifier name and an array of label-score pairs.

[
  {
    "classifier": "scene_tags",
    "results": [
      { "label": "beach", "score": 0.85 },
      { "label": "sunset", "score": 0.65 }
    ]
  }
]
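
Multi-label output is usually reduced to the labels whose score clears some threshold. The sketch below groups those labels by classifier name; activeLabels and the 0.5 default threshold are assumptions made for this example.

```javascript
// activeLabels is an illustrative helper, not an SDK function.
// Returns an object mapping each classifier name to the labels
// whose score is at least `threshold`.
function activeLabels(multiLabelResults, threshold = 0.5) {
  const out = {};
  for (const { classifier, results } of multiLabelResults) {
    out[classifier] = results
      .filter((r) => r.score >= threshold)
      .map((r) => r.label);
  }
  return out;
}

// Sample data copied from the multi-label example above.
const multiLabel = [
  {
    classifier: "scene_tags",
    results: [
      { label: "beach", score: 0.85 },
      { label: "sunset", score: 0.65 },
    ],
  },
];

console.log(activeLabels(multiLabel, 0.7));
```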

4. Segmentation Mask Result

Semantic segmentation results provide a shape and a data array, which is typically returned as a Uint8Array.

[
  {
    "shape": [1, 256, 256],
    "data": [0, 1, 1, 0, /* ... repeated for 256*256 entries ... */]
  }
]
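
The flat data array can be summarized or indexed directly. The sketch below counts pixels per class ID and looks up the class at a pixel. Both helper names are hypothetical, and classAt assumes that shape is [1, height, width] and that data is stored row by row; the text above does not confirm that layout.

```javascript
// countPixelsPerClass and classAt are illustrative helpers, not SDK functions.

// Counts how many pixels carry each class ID.
function countPixelsPerClass(mask) {
  const counts = new Map();
  for (const classId of mask.data) {
    counts.set(classId, (counts.get(classId) ?? 0) + 1);
  }
  return counts;
}

// ASSUMPTION: shape is [1, height, width] and data is row-major.
function classAt(mask, x, y) {
  const width = mask.shape[2];
  return mask.data[y * width + x];
}

// Tiny 2x2 mask made up for this example.
const tinyMask = { shape: [1, 2, 2], data: new Uint8Array([0, 1, 1, 0]) };

console.log(countPixelsPerClass(tinyMask));
console.log(classAt(tinyMask, 1, 0));
```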

5. Pose Detection Result

Pose detection results include landmarks for each detected person, where each landmark has coordinates, optional score and label, and connectivity information.

[
  {
    "category_id": 0,
    "label": "person",
    "score": 0.98,
    "bbox": [0.1, 0.2, 0.4, 0.8],
    "landmarks": [
      {
        "category_id": 0,
        "landmark": [0.15, 0.25],
        "connect": [1],
        "label": "nose",
        "score": 0.99
      },
      {
        "category_id": 1,
        "landmark": [0.15, 0.35],
        "connect": [0],
        "label": "left_eye",
        "score": 0.97
      }
    ]
  }
]
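
The connectivity information can be turned into a list of skeleton edges for drawing. The sketch below assumes that each value in connect is the category_id of another landmark in the same landmarks array, which matches the example above but is not stated explicitly; skeletonEdges is a hypothetical helper name.

```javascript
// skeletonEdges is an illustrative helper, not an SDK function.
// ASSUMPTION: values in `connect` are category_id values of other
// landmarks belonging to the same pose entry.
function skeletonEdges(pose) {
  const byId = new Map(pose.landmarks.map((lm) => [lm.category_id, lm]));
  const seen = new Set();
  const edges = [];
  for (const lm of pose.landmarks) {
    for (const otherId of lm.connect ?? []) {
      const other = byId.get(otherId);
      if (!other) continue; // connected landmark missing from this pose
      // Record each undirected edge only once (0-1 and 1-0 are the same).
      const key = `${Math.min(lm.category_id, otherId)}-${Math.max(lm.category_id, otherId)}`;
      if (seen.has(key)) continue;
      seen.add(key);
      edges.push({
        from: lm.label ?? lm.category_id, // label is optional
        to: other.label ?? other.category_id,
        points: [lm.landmark, other.landmark],
      });
    }
  }
  return edges;
}

// Sample data copied from the pose example above.
const pose = {
  category_id: 0,
  label: "person",
  score: 0.98,
  bbox: [0.1, 0.2, 0.4, 0.8],
  landmarks: [
    { category_id: 0, landmark: [0.15, 0.25], connect: [1], label: "nose", score: 0.99 },
    { category_id: 1, landmark: [0.15, 0.35], connect: [0], label: "left_eye", score: 0.97 },
  ],
};

console.log(skeletonEdges(pose));
```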