This guide shows you how to deploy a Holo model as a real-time endpoint using the managed service Amazon SageMaker.
Prerequisites
Make sure you have subscribed to the model in AWS Marketplace. The notebook does not require a GPU; its purpose is to use the AWS API (boto3) to deploy the endpoint. Ensure that the selected IAM role has sufficient privileges. You may start with a role that has the AmazonSageMakerFullAccess policy attached, and check that its trust relationship policy allows the action sts:AssumeRole for the service principal sagemaker.amazonaws.com.
Step 1: Install required Python dependencies
Install the required packages and import the libraries used in the rest of this guide.

Step 2: Set up the SageMaker session and client
Set up a SageMaker session and client so you can connect to AWS and run your models.

Step 3: Select the Holo model package
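Your Marketplace subscription exposes a per-region model package ARN. The ARN and helper below are illustrative placeholders; copy the real ARN from your subscription's listing page:

```python
# Placeholder ARN; replace with the real one from your Marketplace subscription.
MODEL_PACKAGE_ARN = (
    "arn:aws:sagemaker:us-east-1:123456789012:model-package/holo-1-placeholder"
)

def model_package_for(region: str, arn_by_region: dict) -> str:
    """Look up the per-region model package ARN, failing loudly if unmapped."""
    try:
        return arn_by_region[region]
    except KeyError:
        raise ValueError(f"Model package not available in region {region!r}")
```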
Choose your Holo model package.

Step 4: Deploy Holo
Deploy a SageMaker real-time endpoint hosted on a GPU instance. For general information on real-time inference with Amazon SageMaker, refer to the SageMaker documentation. The deployed endpoint leverages vLLM serve and therefore supports the OpenAI APIs, exposing the v1/chat/completions endpoint.
Step 4a. Define the endpoint configuration
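A sketch of the endpoint configuration, assuming hypothetical resource names and a single ml.g5.2xlarge GPU instance (adjust the instance type and count to your needs). The create calls are shown commented out because they require valid credentials and your real model package ARN:

```python
MODEL_NAME = "holo-1-model"                      # hypothetical resource names
ENDPOINT_CONFIG_NAME = "holo-1-endpoint-config"

def build_production_variant(model_name: str,
                             instance_type: str = "ml.g5.2xlarge",
                             instance_count: int = 1) -> dict:
    """Describe how SageMaker hosts the model: one GPU instance by default."""
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": instance_count,
    }

# With credentials and the real model package ARN, register the model and the
# endpoint configuration (sagemaker_client is the boto3 client from Step 2):
# sagemaker_client.create_model(
#     ModelName=MODEL_NAME,
#     ExecutionRoleArn=role_arn,
#     Containers=[{"ModelPackageName": MODEL_PACKAGE_ARN}],
#     EnableNetworkIsolation=True,  # typically required for Marketplace models
# )
# sagemaker_client.create_endpoint_config(
#     EndpointConfigName=ENDPOINT_CONFIG_NAME,
#     ProductionVariants=[build_production_variant(MODEL_NAME)],
# )
```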
Step 4b. Create the endpoint
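Endpoint creation is asynchronous. A sketch using the hypothetical names above, with the boto3 waiter blocking until the endpoint reports InService (this can take several minutes):

```python
ENDPOINT_NAME = "holo-1-endpoint"  # hypothetical endpoint name

def is_in_service(description: dict) -> bool:
    """True once describe_endpoint reports the endpoint ready for traffic."""
    return description.get("EndpointStatus") == "InService"

# With credentials, create the endpoint from the Step 4a configuration and
# block until it is in service:
# sagemaker_client.create_endpoint(
#     EndpointName=ENDPOINT_NAME,
#     EndpointConfigName=ENDPOINT_CONFIG_NAME,
# )
# sagemaker_client.get_waiter("endpoint_in_service").wait(EndpointName=ENDPOINT_NAME)
# assert is_in_service(sagemaker_client.describe_endpoint(EndpointName=ENDPOINT_NAME))
```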
Step 5: Run an example
The endpoint is now in service. You can use the SageMaker invoke_endpoint API to perform real-time inference on the deployed Holo-1 model.
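A sketch of an invocation, assuming a hypothetical endpoint name and the OpenAI chat-completions request shape served by vLLM; the exact fields accepted (including any model field) depend on the container:

```python
import json

def build_chat_payload(prompt: str, max_tokens: int = 256) -> dict:
    """OpenAI-style chat-completions request body, as served by vLLM."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_payload("Locate the search button in this screenshot.")

# With credentials and an InService endpoint (sagemaker_runtime from Step 2):
# response = sagemaker_runtime.invoke_endpoint(
#     EndpointName="holo-1-endpoint",       # hypothetical endpoint name
#     ContentType="application/json",
#     Body=json.dumps(payload),
# )
# result = json.loads(response["Body"].read())
# print(result["choices"][0]["message"]["content"])
```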
- Using AWS SageMaker to invoke Holo-1 for a localization task
- Using AWS SageMaker to invoke Holo-1 for a navigation task