Our models are optimized for different sizes, performance targets, and use cases. They are cost-effective yet resilient, and designed to extract the most value as reliably as possible. The table below compares our Holo models with one another, making it easy to tell them apart and showing the progress we’ve made in pushing the limits of what our models can do.

Holo models

Below is a summary of how our models compare with each other:
| Model | Size | Tensor type | General purpose | Use case |
| --- | --- | --- | --- | --- |
| Holo1.5 72B | 73.4B params | BF16 | Highest accuracy; designed for cutting-edge research and advanced reasoning | Complex, large-scale enterprise tasks requiring maximum capability |
| Holo1.5 7B | 8.29B params | BF16 | Balanced performance across accuracy, speed, and efficiency | Versatile choice for production workloads and mid-scale applications |
| Holo1.5 3B | 3.75B params | BF16 | Lightweight yet capable; optimized for responsiveness | Common tasks in interactive applications |
| Holo1 7B | 8.29B params | BF16 | Higher accuracy; suited to large-scale inference | Full-scale tasks |
| Holo1 3B | 3.75B params | BF16 | Optimized for efficiency and for running locally on your own hardware | Common tasks |
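Because every model in the table stores its weights as BF16 (2 bytes per parameter), the parameter counts translate directly into a rough memory floor. The sketch below is a back-of-the-envelope estimate under that assumption only: it ignores activations, KV cache, and runtime overhead, so real deployments need more memory. The helper name `weight_memory_gb` is hypothetical.

```python
# Back-of-the-envelope weight memory for BF16 checkpoints.
# Assumption: 2 bytes per parameter (BF16); activations, KV cache and
# runtime overhead are deliberately ignored, so this is only a floor.
BYTES_PER_BF16_PARAM = 2

# Parameter counts (in billions) taken from the model table.
PARAMS_BILLIONS = {
    "Holo1.5 72B": 73.4,
    "Holo1.5 7B": 8.29,
    "Holo1.5 3B": 3.75,
}


def weight_memory_gb(params_billions: float,
                     bytes_per_param: int = BYTES_PER_BF16_PARAM) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives gigabytes
    directly, because the two factors of 1e9 cancel.
    """
    return params_billions * bytes_per_param


for name, params in PARAMS_BILLIONS.items():
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB of weights in BF16")
```

For example, the 73.4B-parameter Holo1.5 72B needs roughly 146.8 GB just for its weights, versus about 7.5 GB for the 3.75B-parameter models.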

Comparing with other models

Below is a summary of how our models compare with competitor models:

Holo1.5

The charts and tables below show how Holo1.5 compares with competing models on UI Localization and on Screen Content Understanding via Question Answering. UI Localization is the agent’s ability to find and interact with specific parts of a user interface. Screen Content Understanding via Question Answering, on the other hand, measures how well the agent understands a UI’s structure and function, judged by the quality of its answers to questions about the screen.
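Benchmarks in the ScreenSpot family typically score a localization prediction as correct when the predicted click point falls inside the target element's bounding box. The sketch below assumes that point-in-box rule; each benchmark in the tables may define correctness differently, and `Box`, `localization_accuracy`, and the example coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a target UI element, in pixels (hypothetical)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Edges count as inside the element.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def localization_accuracy(predictions, targets) -> float:
    """Fraction of predicted (x, y) clicks that land inside their target box.

    Assumes the point-in-box correctness rule described above.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


# Hypothetical example: the third click misses its target.
targets = [Box(10, 10, 50, 30), Box(100, 200, 180, 240), Box(0, 0, 20, 20)]
predictions = [(30, 20), (150, 210), (40, 40)]
print(localization_accuracy(predictions, targets))
```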

State-of-the-art (SOTA) UI Localization

[Charts: Holo1.5 Figures 1 and 2, UI Localization results]
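| Model | WebClick | Showdown | ScreenSpot-v2 | ScreenSpot-Pro | Ground-UI-1K | OSWorld-G | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Holo1.5-3B | 81.45 | 67.50 | 91.66 | 51.49 | 83.20 | 61.57 | 72.81 |
| Holo1.5-7B | 90.24 | 72.17 | 93.31 | 57.94 | 84.00 | 66.27 | 77.32 |
| Holo1.5-72B | 92.43 | 76.84 | 94.41 | 63.25 | 84.50 | 71.80 | 80.54 |
| Qwen2.5-VL-3B | 71.20 | 50.30 | 80.00 | 29.30 | 76.40 | 34.31 | 56.92 |
| Qwen2.5-VL-7B | 76.51 | 52.00 | 85.60 | 29.00 | 80.70 | 40.59 | 60.73 |
| Qwen2.5-VL-72B | 88.29 | 41.00 | 93.30 | 55.60 | 85.40 | 61.96 | 70.93 |
| UI-TARS-1.5-7B | 86.10 | 58.00 | 94.00 | 39.00 | 84.20 | 61.40 | 70.45 |
| Holo1-7B | 84.04 | 64.27 | 89.85 | 26.06 | 78.50 | 47.25 | 65.00 |
| Holo1-3B | 79.35 | 59.96 | 88.91 | 23.66 | 74.75 | 42.16 | 61.47 |
| UI-Venus-7B | 84.44 | 67.32 | 94.10 | 50.80 | 82.30 | 58.80 | 72.96 |
| UI-Venus-72B | 77.00 | 75.58 | 95.30 | 61.90 | 75.50 | 70.40 | 75.95 |
| Sonnet 4 | 93.00 | 72.00 | 93.00 | 19.10 | 84.00 | 59.60 | 70.12 |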
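The Average column in the UI Localization table appears to be the unweighted mean of the six benchmark scores in each row. The sketch below checks that assumption for a few rows copied from the table; `recomputed_average` is a hypothetical helper, not part of any Holo tooling.

```python
import math
from statistics import mean

BENCHMARKS = ("WebClick", "Showdown", "ScreenSpot-v2",
              "ScreenSpot-Pro", "Ground-UI-1K", "OSWorld-G")

# (six benchmark scores, reported Average), copied from the UI Localization table.
ROWS = {
    "Holo1.5-3B": ((81.45, 67.50, 91.66, 51.49, 83.20, 61.57), 72.81),
    "Holo1.5-7B": ((90.24, 72.17, 93.31, 57.94, 84.00, 66.27), 77.32),
    "Holo1.5-72B": ((92.43, 76.84, 94.41, 63.25, 84.50, 71.80), 80.54),
    "UI-Venus-72B": ((77.00, 75.58, 95.30, 61.90, 75.50, 70.40), 75.95),
}


def recomputed_average(scores) -> float:
    """Unweighted mean over all benchmarks, rounded to two decimals like the table."""
    if len(scores) != len(BENCHMARKS):
        raise ValueError(f"expected {len(BENCHMARKS)} scores, got {len(scores)}")
    return round(mean(scores), 2)


for model, (scores, reported) in ROWS.items():
    ours = recomputed_average(scores)
    status = "matches" if math.isclose(ours, reported, abs_tol=1e-9) else "differs from"
    print(f"{model}: recomputed {ours:.2f} {status} reported {reported:.2f}")
```

Under this assumption every benchmark carries equal weight, so a large gap on a single benchmark (for example ScreenSpot-Pro) moves the Average by one sixth of that gap.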

Screen Content Understanding via Question Answering

[Charts: Holo1.5 Figures 3 and 4, Screen Content Understanding results]
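| Model | VisualWebBench | WebSRC | ScreenQAShort | ScreenQAComplex | Average |
| --- | --- | --- | --- | --- | --- |
| Holo1.5-3B | 78.50 | 94.80 | 87.90 | 81.40 | 85.65 |
| Holo1.5-7B | 82.60 | 95.90 | 91.00 | 83.20 | 88.17 |
| Holo1.5-72B | 83.80 | 97.20 | 91.90 | 87.10 | 90.00 |
| Qwen2.5-VL-3B | 58.00 | 93.00 | 86.00 | 76.00 | 78.25 |
| Qwen2.5-VL-7B | 69.00 | 95.00 | 87.00 | 81.10 | 83.02 |
| Qwen2.5-VL-72B | 76.30 | 97.00 | 87.90 | 83.20 | 86.10 |
| UI-TARS-1.5-7B | 79.70 | 92.90 | 88.70 | 79.20 | 85.12 |
| Holo1-3B | 54.10 | 93.90 | 78.30 | 53.50 | 69.95 |
| Holo1-7B | 38.10 | 95.30 | 83.30 | 65.10 | 70.45 |
| UI-Venus-7B | 60.90 | 96.60 | 86.30 | 82.30 | 81.52 |
| UI-Venus-72B | 74.10 | 96.70 | 88.60 | 83.30 | 85.67 |
| Claude-Sonnet-4 | 58.90 | 96.00 | 87.00 | 75.70 | 79.40 |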
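To read the Screen Content Understanding table per benchmark rather than by average, one can rank the models within each column. The sketch below uses a hypothetical `top_two` helper over the scores copied from the table and reports each benchmark's leader and its margin over the runner-up.

```python
# Screen Content Understanding scores, copied from the table.
# Column order matches BENCHMARKS.
BENCHMARKS = ("VisualWebBench", "WebSRC", "ScreenQAShort", "ScreenQAComplex")

SCORES = {
    "Holo1.5-3B": (78.50, 94.80, 87.90, 81.40),
    "Holo1.5-7B": (82.60, 95.90, 91.00, 83.20),
    "Holo1.5-72B": (83.80, 97.20, 91.90, 87.10),
    "Qwen2.5-VL-3B": (58.00, 93.00, 86.00, 76.00),
    "Qwen2.5-VL-7B": (69.00, 95.00, 87.00, 81.10),
    "Qwen2.5-VL-72B": (76.30, 97.00, 87.90, 83.20),
    "UI-TARS-1.5-7B": (79.70, 92.90, 88.70, 79.20),
    "Holo1-3B": (54.10, 93.90, 78.30, 53.50),
    "Holo1-7B": (38.10, 95.30, 83.30, 65.10),
    "UI-Venus-7B": (60.90, 96.60, 86.30, 82.30),
    "UI-Venus-72B": (74.10, 96.70, 88.60, 83.30),
    "Claude-Sonnet-4": (58.90, 96.00, 87.00, 75.70),
}


def top_two(benchmark: str):
    """Return the two highest-scoring (model, score) pairs for one benchmark."""
    col = BENCHMARKS.index(benchmark)
    ranked = sorted(SCORES.items(), key=lambda item: item[1][col], reverse=True)
    return [(model, scores[col]) for model, scores in ranked[:2]]


for bench in BENCHMARKS:
    (best, best_score), (runner_up, runner_score) = top_two(bench)
    margin = best_score - runner_score
    print(f"{bench}: {best} {best_score:.2f} (+{margin:.2f} over {runner_up})")
```

With these scores, Holo1.5-72B leads every column; its narrowest margin is on WebSRC, 0.20 points ahead of Qwen2.5-VL-72B.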

Holo1

The charts below show how Holo1 compares with competing models when used as a localizer. Pareto-optimal performance on WebVoyager means that, on the WebVoyager benchmark, the model offers the best accuracy/cost tradeoff among current models: no other model is both more accurate and cheaper. UI Localization is the agent’s ability to find and interact with specific parts of a user interface.
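Pareto optimality on two axes can be made concrete in a few lines: a point is on the Pareto front if no other point is at least as cheap and at least as accurate while being strictly better on one of the two. The sketch below computes that front; the `Point` type, the agent names, and all numbers are made up for illustration and are NOT WebVoyager results.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    cost: float      # lower is better (e.g. cost per task)
    accuracy: float  # higher is better (e.g. task success rate)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    return (a.cost <= b.cost and a.accuracy >= b.accuracy
            and (a.cost < b.cost or a.accuracy > b.accuracy))


def pareto_front(points):
    """Return the points that no other point dominates, sorted by cost."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.cost)


# Toy, made-up numbers purely for illustration.
points = [
    Point("agent_a", cost=0.10, accuracy=0.60),
    Point("agent_b", cost=0.20, accuracy=0.80),
    Point("agent_c", cost=0.25, accuracy=0.70),  # dominated by agent_b
    Point("agent_d", cost=0.50, accuracy=0.85),
]
print([p.name for p in pareto_front(points)])
```

A point can sit on the front without being the most accurate or the cheapest overall; it only has to be impossible to improve one axis without giving up the other.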

Surfer-H: Pareto-Optimal Performance on WebVoyager

[Chart: Surfer-H performance on WebVoyager (Holo1 3B, Figure 2)]

State-of-the-Art (SOTA) UI Localization

[Chart: Holo1 UI Localization results (Holo1 3B, Figure 3)]