Holo models
Below is a summary of how our models compare with each other:

Model | Size | Tensor type | General purpose | Use case |
---|---|---|---|---|
Holo1.5 72B | 73.4B params | BF16 | Highest accuracy, designed for cutting-edge research and advanced reasoning. | Complex, large-scale enterprise tasks requiring maximum capability. |
Holo1.5 7B | 8.29B params | BF16 | Balanced performance across accuracy, speed, and efficiency. | Versatile choice for production workloads and mid-scale applications. |
Holo1.5 3B | 3.75B params | BF16 | Lightweight yet capable, optimized for responsiveness. | Ideal for common tasks in interactive applications. |
Holo1 7B | 8.29B params | BF16 | Higher accuracy, suited to large-scale inference. | Full-scale tasks. |
Holo1 3B | 3.75B params | BF16 | Optimized for efficiency and local, on-device deployment. | Common tasks. |
Comparing with other models
Below is a summary of how our models compare with competitor models:

Holo1.5
The charts and tables below demonstrate how Holo1.5 compares with competing models in both UI Localization and Screen Content Understanding via Question Answering. UI Localization is the agent's ability to find and interact with specific parts of a user interface. Screen Content Understanding via Question Answering, on the other hand, refers to the agent's structural and functional understanding of a UI, measured by the quality of the answers it provides.

State-of-the-art (SOTA) UI Localization


Model | WebClick | Showdown | ScreenSpot-v2 | ScreenSpot-Pro | Ground-UI-1K | OSWorld-G | Average |
---|---|---|---|---|---|---|---|
Holo1.5-3B | 81.45 | 67.50 | 91.66 | 51.49 | 83.20 | 61.57 | 72.81 |
Holo1.5-7B | 90.24 | 72.17 | 93.31 | 57.94 | 84.00 | 66.27 | 77.32 |
Holo1.5-72B | 92.43 | 76.84 | 94.41 | 63.25 | 84.50 | 71.80 | 80.54 |
Qwen2.5-VL-3B | 71.20 | 50.30 | 80.00 | 29.30 | 76.40 | 34.31 | 56.92 |
Qwen2.5-VL-7B | 76.51 | 52.00 | 85.60 | 29.00 | 80.70 | 40.59 | 60.73 |
Qwen2.5-VL-72B | 88.29 | 41.00 | 93.30 | 55.60 | 85.40 | 61.96 | 70.93 |
UI-TARS-1.5-7B | 86.10 | 58.00 | 94.00 | 39.00 | 84.20 | 61.40 | 70.45 |
Holo1-7B | 84.04 | 64.27 | 89.85 | 26.06 | 78.50 | 47.25 | 65.00 |
Holo1-3B | 79.35 | 59.96 | 88.91 | 23.66 | 74.75 | 42.16 | 61.47 |
UI-Venus-7B | 84.44 | 67.32 | 94.10 | 50.80 | 82.30 | 58.80 | 72.96 |
UI-Venus-72B | 77.00 | 75.58 | 95.30 | 61.90 | 75.50 | 70.40 | 75.95 |
Sonnet 4 | 93.00 | 72.00 | 93.00 | 19.10 | 84.00 | 59.60 | 70.12 |
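The Average column in the table above appears to be the unweighted mean of the six per-benchmark scores, rounded to two decimals. A minimal sketch checking this for the Holo1.5-7B row (scores copied from the table):

```python
# Scores for Holo1.5-7B, copied from the UI Localization table above.
holo15_7b = {
    "WebClick": 90.24,
    "Showdown": 72.17,
    "ScreenSpot-v2": 93.31,
    "ScreenSpot-Pro": 57.94,
    "Ground-UI-1K": 84.00,
    "OSWorld-G": 66.27,
}

# Unweighted mean across the six benchmarks, rounded to two decimals.
average = round(sum(holo15_7b.values()) / len(holo15_7b), 2)
print(average)  # 77.32, matching the table's Average column
```

The same computation reproduces the other rows as well (e.g. Holo1.5-3B's scores average to 72.81).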
Screen Content Understanding via Question Answering


Model | VisualWebBench | WebSRC | ScreenQA-Short | ScreenQA-Complex | Average |
---|---|---|---|---|---|
Holo1.5-3B | 78.50 | 94.80 | 87.90 | 81.40 | 85.65 |
Holo1.5-7B | 82.60 | 95.90 | 91.00 | 83.20 | 88.17 |
Holo1.5-72B | 83.80 | 97.20 | 91.90 | 87.10 | 90.00 |
Qwen2.5-VL-3B | 58.00 | 93.00 | 86.00 | 76.00 | 78.25 |
Qwen2.5-VL-7B | 69.00 | 95.00 | 87.00 | 81.10 | 83.02 |
Qwen2.5-VL-72B | 76.30 | 97.00 | 87.90 | 83.20 | 86.10 |
UI-TARS-1.5-7B | 79.70 | 92.90 | 88.70 | 79.20 | 85.12 |
Holo1-3B | 54.10 | 93.90 | 78.30 | 53.50 | 69.95 |
Holo1-7B | 38.10 | 95.30 | 83.30 | 65.10 | 70.45 |
UI-Venus-7B | 60.90 | 96.60 | 86.30 | 82.30 | 81.52 |
UI-Venus-72B | 74.10 | 96.70 | 88.60 | 83.30 | 85.67 |
Claude-Sonnet-4 | 58.90 | 96.00 | 87.00 | 75.70 | 79.40 |
Holo1
The charts below demonstrate how Holo1 compares with competing models as a Localizer. Pareto-optimal performance on WebVoyager refers to how the model performs on the WebVoyager benchmark, offering the best accuracy/cost tradeoff among current models. UI Localization is the agent's ability to find and interact with specific parts of a user interface.

Surfer-H: Pareto-Optimal Performance on WebVoyager

State-of-the-Art (SOTA) UI Localization
