SegmentMeIfYouCan: A Benchmark for Anomaly Segmentation
Deep CNNs are unreliable outside of their training distribution
Perception failures were at the heart of past accidents
We benchmark semantic anomalies: the identification of objects that do not fit into any of the class definitions.
Anomaly Track: detect and localize anomalies, i.e. objects that belong to none of the Cityscapes classes.
Obstacle Track: detect and localize anything on the road that is not drivable area.
Both tracks are based on real-world images.
Public leaderboard and submission instructions at
Anomalies can appear anywhere in the image.
Anomalies vary widely in size.
Wide variety of environments.
Not anomaly (white): 19 Cityscapes classes
Anomaly (orange): out-of-distribution (OoD) objects
Void (black): unknown objects in the background, gaps inside anomalies, etc.
In this example, the anomaly is the carriage, including the horses. Other unknown objects in the background (e.g. parasols, a chair) and gaps inside the carriage are labeled void.
Obstacles appear at different distances.
Different road surfaces.
Different lighting and weather conditions.
Not obstacle (white): drivable area
Obstacle (orange): objects placed on the road
Void (black): everything outside the road, gaps inside obstacles
The road is the region of interest. The basket is labeled as an obstacle; the background and the gap are void and ignored in the evaluation.
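In both tracks, void pixels are excluded before any metric is computed. A minimal sketch of that step, assuming the ground-truth label maps encode not anomaly/not obstacle as 0, anomaly/obstacle as 1 and void as 255 (our assumption about the file format, not stated on this poster); `evaluated_pixels` is a hypothetical helper name:

```python
import numpy as np

# Assumed label encoding (check against the actual ground-truth files).
LABEL_NORMAL = 0    # white: Cityscapes classes / drivable area
LABEL_ANOMALY = 1   # orange: anomaly / obstacle
LABEL_VOID = 255    # black: ignored in the evaluation


def evaluated_pixels(score_map: np.ndarray, label_map: np.ndarray):
    """Return (scores, targets) for all non-void pixels.

    score_map: H x W array of per-pixel anomaly scores.
    label_map: H x W array using the assumed encoding above.
    targets is a boolean array, True where the pixel is anomalous.
    """
    if score_map.shape != label_map.shape:
        raise ValueError("score map and label map must have the same shape")
    valid = label_map != LABEL_VOID          # drop void pixels entirely
    scores = score_map[valid]
    targets = label_map[valid] == LABEL_ANOMALY
    return scores, targets


# Toy 3 x 3 example: one anomalous pixel and one void pixel (top right).
labels = np.array([[0, 0, 255],
                   [0, 1, 0],
                   [0, 0, 0]], dtype=np.uint8)
scores = np.array([[0.1, 0.2, 0.9],
                   [0.0, 0.8, 0.3],
                   [0.2, 0.1, 0.0]])
s, t = evaluated_pixels(scores, labels)
print(s.size, int(t.sum()))  # 8 evaluated pixels, 1 of them anomalous
```

Pooling the returned scores and targets over all images gives the input for the pixel-wise metrics; the high score on the void pixel has no effect on the result.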
Anomaly Segmentation Performance Metrics
Classic pixel-wise metrics:
AUROC: area under the receiver operating characteristic curve (TPR vs. FPR)
AUPRC: area under the precision-recall curve (precision vs. recall)
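Both pixel-wise metrics treat every non-void pixel as one binary classification sample, with anomaly as the positive class. A sketch using scikit-learn on synthetic scores (the data below is made up for illustration; `average_precision_score` is scikit-learn's step-wise estimate of the area under the precision-recall curve, used here as a stand-in for AUPRC, not necessarily how the benchmark computes it):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Synthetic pooled data: anomaly scores and binary targets of all
# non-void pixels (void assumed to be removed already). 10% anomalous.
rng = np.random.default_rng(0)
targets = np.concatenate([np.zeros(900, dtype=bool), np.ones(100, dtype=bool)])
scores = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(2.0, 1.0, 100)])

# AUROC: area under the TPR-vs-FPR curve, anomaly = positive class.
auroc = roc_auc_score(targets, scores)

# AUPRC: area under the precision-vs-recall curve (step-wise estimate).
auprc = average_precision_score(targets, scores)

print(f"AUROC = {auroc:.3f}, AUPRC = {auprc:.3f}")
```

AUPRC is the more telling of the two when anomalous pixels are rare: a random detector reaches an AUROC of 0.5 but an AUPRC of only about the fraction of anomalous pixels (0.1 in this toy data).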
Recent component-wise metrics:
sIoU: adjusted component-wise intersection over union with respect to the ground truth
PPV: component-wise positive predictive value (precision) with respect to the prediction
TP: ground-truth component with sIoU greater than a given threshold τ
FN: ground-truth component with sIoU smaller than τ
FP: predicted component with PPV smaller than τ
$\bar{F}_1$ (mean F1): $F_1(\tau) = \frac{2\,TP(\tau)}{2\,TP(\tau) + FP(\tau) + FN(\tau)} \in [0,1]$, averaged over $\tau = 0.25, 0.30, \dots, 0.75$
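Given the sIoU of every ground-truth component and the PPV of every predicted component, the averaged F1 follows directly from these definitions. A sketch, assuming components from all images are pooled before counting and that ties (sIoU or PPV exactly equal to τ) count as FN or FP; `mean_f1` is a hypothetical name:

```python
import numpy as np


def mean_f1(siou_gt, ppv_pred, thresholds=None):
    """Component-wise F1 averaged over thresholds tau.

    siou_gt:  sIoU of every ground-truth component (all images pooled).
    ppv_pred: PPV of every predicted component (all images pooled).
    """
    if thresholds is None:
        # tau = 0.25, 0.30, ..., 0.75 (11 values)
        thresholds = np.linspace(0.25, 0.75, 11)
    siou_gt = np.asarray(siou_gt, dtype=float)
    ppv_pred = np.asarray(ppv_pred, dtype=float)
    f1_scores = []
    for tau in thresholds:
        tp = np.sum(siou_gt > tau)      # ground-truth components detected
        fn = np.sum(siou_gt <= tau)     # ground-truth components missed (ties assumed FN)
        fp = np.sum(ppv_pred <= tau)    # predicted components mostly off-target (ties assumed FP)
        denom = 2 * tp + fp + fn
        # With no components at all F1 is undefined; 0.0 is our choice here.
        f1_scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1_scores))


print(round(mean_f1([0.9, 0.62, 0.1], [0.95, 0.68, 0.2]), 4))  # 0.5818
```

In the example, the second ground-truth component counts as detected only up to τ = 0.60, and the second prediction turns into a false positive from τ = 0.70 on, so F1 drops as τ grows.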
Ordinary vs. adjusted component-wise intersection over union, two examples: IoU = 68.18% vs. sIoU = 87.01%; IoU = 21.68% vs. sIoU = 68.44%.
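The difference between the two IoU variants comes from which pixels count towards the union. Following our reading of the definition: for a ground-truth component $k$, let $\hat{K}(k)$ be the union of all predicted components that intersect $k$, and $A(k)$ the pixels of all other ground-truth components; then $\mathrm{sIoU}(k) = |k \cap \hat{K}(k)| \,/\, |(k \cup \hat{K}(k)) \setminus A(k)|$, and the PPV of a predicted component is the fraction of its pixels that lie on ground-truth components. A sketch with `scipy.ndimage.label`, where 4-connectivity, a binary prediction mask and no void handling are simplifying assumptions; `component_metrics` is a hypothetical name, not the benchmark's implementation:

```python
import numpy as np
from scipy import ndimage


def component_metrics(gt_mask, pred_mask):
    """Per-component sIoU and ordinary IoU (ground truth) and PPV (prediction).

    gt_mask, pred_mask: H x W boolean arrays (anomaly / predicted anomaly).
    Components are 4-connected regions (scipy.ndimage.label default).
    """
    gt_lab, n_gt = ndimage.label(gt_mask)
    pred_lab, n_pred = ndimage.label(pred_mask)

    sious, ious = [], []
    for k in range(1, n_gt + 1):
        comp = gt_lab == k
        # K_hat(k): union of predicted components that touch component k
        touching = np.unique(pred_lab[comp])
        touching = touching[touching > 0]
        pred_union = np.isin(pred_lab, touching)
        # A(k): pixels of all other ground-truth components
        others = gt_mask & ~comp
        inter = np.sum(comp & pred_union)
        ious.append(inter / np.sum(comp | pred_union))
        # Adjusted union leaves out A(k); it still contains comp, so it is > 0.
        sious.append(inter / np.sum((comp | pred_union) & ~others))

    ppvs = []
    for j in range(1, n_pred + 1):
        comp = pred_lab == j
        # Share of the predicted component lying on any ground-truth component
        ppvs.append(np.sum(comp & gt_mask) / np.sum(comp))
    return np.array(sious), np.array(ious), np.array(ppvs)


# Two adjacent objects covered by a single predicted blob.
gt = np.array([[1, 1, 0, 1, 1, 0, 0]], dtype=bool)
pred = np.array([[1, 1, 1, 1, 1, 0, 0]], dtype=bool)
siou, iou, ppv = component_metrics(gt, pred)
# sIoU = 2/3 for each object, IoU = 2/5 for each object, PPV = 4/5
```

Ordinary IoU penalizes each object for the part of the blob that lies on the other object; sIoU removes those pixels from the union. Since the numerator is the same and the denominator can only shrink, sIoU is never lower than IoU.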
##### Evaluation and submission

Loaders and metrics code: [github.com/SegmentMeIfYouCan/road-anomaly-benchmark](https://github.com/SegmentMeIfYouCan/road-anomaly-benchmark)

Submit your outputs for evaluation against the private ground truths. A public validation set is available.

```python
from road_anomaly_benchmark.evaluation import Evaluation

ev = Evaluation(
    method_name='MyMethod',
    dataset_name='ObstacleTrack-all',
)

for fr in ev.get_frames():
    # run your detector on the benchmark image;
    # my_method is your own function (not part of the package)
    # returning a per-pixel anomaly score map
    anomaly_p = my_method(fr.image)
    ev.save_result(fr, anomaly_p)

# files are written in a background thread; wait until all are saved
ev.wait_to_finish_saving()
```
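`my_method` stands for your own detector. As a hypothetical stand-in, the maximum softmax probability (MSP) baseline scores each pixel by one minus the largest softmax probability of a semantic segmentation network. A sketch, assuming the network (not shown) returns class logits of shape C x H x W; `msp_anomaly_score` is a hypothetical name:

```python
import numpy as np


def msp_anomaly_score(logits: np.ndarray) -> np.ndarray:
    """Maximum softmax probability (MSP) anomaly score.

    logits: C x H x W array of class scores from a semantic segmentation
    network (e.g. one trained on the 19 Cityscapes classes).
    Returns an H x W map where higher values mean "more anomalous".
    """
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=0, keepdims=True)
    # Low confidence in the most likely class gives a high anomaly score.
    return 1.0 - probs.max(axis=0)


# Toy check with random logits for 19 classes on a 4 x 6 image.
logits = np.random.default_rng(0).normal(size=(19, 4, 6))
anomaly_p = msp_anomaly_score(logits)
print(anomaly_p.shape)  # (4, 6)
```

Inside the evaluation loop this would become `anomaly_p = msp_anomaly_score(my_network(fr.image))`, where `my_network` is likewise hypothetical. Scores lie in [0, 1 - 1/C], reaching the maximum when all classes are equally likely.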
##### Reusable metrics

* Metrics can be used with other datasets
* Add new metrics in a modular way

```bash
# Pixel classification (PR, ROC) metric
python -m road_anomaly_benchmark metric PixBinaryClass \
    method1,method2 ObstacleTrack-validation

# Instance level metrics
python -m road_anomaly_benchmark metric SegEval-ObstacleTrack \
    method1,method2 dset1,dset2
```
##### Generate curves and tables

```bash
python -m road_anomaly_benchmark comparison \
    MyComparison \
    metric1,metric2 \
    method1,method2 \
    dset1,dset2
```
Advantages in the benchmark results:

* Tailored solutions
* Out-of-distribution training data