Effortlessly explore and summarize your data in SceneBox's rich web interface. Use advanced queries, interactive dashboards, and flexible filters to curate training datasets, examine bias, and uncover data bugs. Searching and summarizing your data has never been easier.
ML-generated embeddings are a powerful way to explore vast datasets. Use SceneBox's embeddings view to visualize your data, then cluster, select, curate, and spot outliers by diving into various corners of your dataset. You can either bring your own embedding spaces or use SceneBox's model zoo to index your data.
Managing and searching your multi-sensor temporal data has never looked so good. Perception logs are often composed of multiple streams of data from various sources including RGB cameras, Lidars, time-series, GPS, etc.
The SceneBox event engine synchronizes data across any number of sources to enable queries such as: "Find all the Lidar scenes when a car and a pedestrian were detected from a side camera, vehicle speed was >50 km/h, and it was raining, in Seattle, at night".
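To make the idea concrete, here is a toy sketch of how such a query resolves once streams are synchronized into per-moment event records. This is plain Python over a hypothetical `Event` record, not the SceneBox API; all names are illustrative.

```python
# Toy sketch (not the SceneBox API): filtering hypothetical synchronized
# event records the way the example query in the text would resolve.
from dataclasses import dataclass

@dataclass
class Event:
    """One time-synchronized slice across sensor streams (illustrative)."""
    detections: set       # objects detected by the side camera
    speed_kmh: float      # vehicle speed from the time-series stream
    weather: str          # enrichment tag, e.g. "rain"
    city: str             # from GPS reverse-geocoding
    time_of_day: str      # "day" or "night"
    lidar_scene_id: str   # the Lidar scene this moment belongs to

def find_scenes(events):
    """Return Lidar scene ids matching the example query from the text."""
    return [
        e.lidar_scene_id
        for e in events
        if {"car", "pedestrian"} <= e.detections
        and e.speed_kmh > 50
        and e.weather == "rain"
        and e.city == "Seattle"
        and e.time_of_day == "night"
    ]

events = [
    Event({"car", "pedestrian"}, 62.0, "rain", "Seattle", "night", "scene-001"),
    Event({"car"}, 80.0, "rain", "Seattle", "night", "scene-002"),
    Event({"car", "pedestrian"}, 40.0, "rain", "Seattle", "night", "scene-003"),
]
print(find_scenes(events))  # ['scene-001']
```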
Found a few samples of an interesting corner case, but don't have the right query or metadata to find more? SceneBox uses ML to search fully unstructured datasets. Think "Google Photos" for your data. SceneBox enables 1-to-N and M-to-N search across one or multiple embedding spaces.
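The core of 1-to-N search can be sketched as ranking a dataset's embeddings by similarity to a query embedding. The snippet below uses plain cosine similarity over a tiny hand-made index; SceneBox's search is more sophisticated, and all names here are illustrative.

```python
# A minimal sketch of 1-to-N similarity search over an embedding space,
# using cosine similarity (illustrative only; not the SceneBox internals).
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, index, top_k=3):
    """Rank indexed items by similarity to the query; most similar first."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_embedding, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in ranked[:top_k]]

# Hypothetical 3-d embeddings for three images:
index = {
    "img_rainy_night": [0.9, 0.1, 0.0],
    "img_sunny_day":   [0.0, 1.0, 0.2],
    "img_rainy_dusk":  [0.8, 0.2, 0.1],
}
print(search([1.0, 0.0, 0.0], index, top_k=2))
# ['img_rainy_night', 'img_rainy_dusk']
```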
Send your curated datasets to your labeler with a single click (or a single API call) and manage all the annotated data and annotation workflows from a unified interface. SceneBox's Annotation Hub provides integrations with best-in-class annotation platforms. SceneBox also provides a hosted CVAT to streamline your labeling workflows.
Data operations often involve many interdependent moving pieces, and full visibility into the entire process is key to curating the best datasets. Manage your data campaigns by viewing associated datasets, the status of labeling operations, pre-tagging, labeling consensus, and much more.
Use powerful metrics such as mean intersection over union (IoU) to quickly compare models (including ground truths) or annotators. An IoU distribution and confusion matrix are provided to help you visually debug your model/data, find the corner cases where you need to collect more data for training, and identify labeling noise/errors. Then use similarity search to find more raw data.
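For reference, the IoU metric mentioned above is the overlap area of two boxes divided by the area of their union. A minimal sketch for axis-aligned boxes in `(x1, y1, x2, y2)` form:

```python
# Intersection over union (IoU) for axis-aligned bounding boxes,
# given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A prediction overlapping a ground-truth label by half its width:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333...
```

An IoU near 1.0 means a prediction (or annotation) closely matches the reference box; low values flag the corner cases and labeling errors the distribution view surfaces.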
Look at the embeddings of your data to easily identify label noise, or find discrepancies in annotations from multiple labelers or models.
In this example, a laptop is mislabeled as a TV. In the image embeddings view, the mislabeled datapoint appears inside a cluster of other laptops, rendered in a color that does not match the rest of its cluster. This visually singles out the incorrect label, making the error obvious and helping you debug your labels.
You can deploy SceneBox over your data lakes across multiple data sources (AWS, GCS, Azure, on-prem). It acts as a window into your data without changing its residency or requiring you to send large volumes of raw data to other servers.
In addition to SceneBox SaaS, SceneBox can be deployed over any major cloud VPC (AWS, GCP, Azure), or on-premise for full control and privacy of your data. With SceneBox's cloud-agnostic microservices architecture, it only requires a Kubernetes cluster to operate.
SceneBox allows programmatic interactions for custom integrations or the automation of data operations using Python and REST APIs.
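As a hedged sketch of what a REST integration looks like, the snippet below assembles an authenticated JSON request with the Python standard library. The host, route, and payload are hypothetical placeholders, not documented SceneBox endpoints; consult the actual API reference for real routes.

```python
# Hypothetical REST call assembly (endpoint and payload are placeholders,
# not documented SceneBox routes). The request is built but not sent.
import json
from urllib.request import Request

BASE_URL = "https://example.scenebox.host/api"  # placeholder host

def build_search_request(token, query):
    """Assemble an authenticated JSON POST request object."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        f"{BASE_URL}/search",                    # hypothetical route
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_search_request("MY_TOKEN", {"weather": "rain", "city": "Seattle"})
print(req.full_url, req.get_method())
```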