multicloud365

Deploy Qwen models with Amazon Bedrock Custom Model Import

By admin
June 15, 2025
in AI and Machine Learning in the Cloud

We're excited to announce that Amazon Bedrock Custom Model Import now supports Qwen models. You can now import custom weights for Qwen2, Qwen2_VL, and Qwen2_5_VL architectures, including models like Qwen 2, 2.5 Coder, Qwen 2.5 VL, and QwQ 32B. You can bring your own customized Qwen models into Amazon Bedrock and deploy them in a fully managed, serverless environment, without having to manage infrastructure or model serving.

In this post, we cover how to deploy Qwen 2.5 models with Amazon Bedrock Custom Model Import, making them accessible to organizations looking to use state-of-the-art AI capabilities within the AWS infrastructure at an effective cost.

Overview of Qwen models

Qwen 2 and 2.5 are families of large language models, available in a wide range of sizes and specialized variants to suit diverse needs:

  • General language models: Models ranging from 0.5B to 72B parameters, with both base and instruct versions for general-purpose tasks
  • Qwen 2.5-Coder: Specialized for code generation and completion
  • Qwen 2.5-Math: Focused on advanced mathematical reasoning
  • Qwen 2.5-VL (vision-language): Image and video processing capabilities, enabling multimodal applications

Overview of Amazon Bedrock Custom Model Import

Amazon Bedrock Custom Model Import enables the import and use of your customized models alongside existing foundation models (FMs) through a single serverless, unified API. You can access your imported custom models on-demand and without the need to manage the underlying infrastructure. Accelerate your generative AI application development by integrating your supported custom models with native Amazon Bedrock tools and features like Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, and Amazon Bedrock Agents. Amazon Bedrock Custom Model Import is generally available in the US-East (N. Virginia), US-West (Oregon), and Europe (Frankfurt) AWS Regions.

Now, we'll explore how you can use Qwen 2.5 models for two common use cases: as a coding assistant and for image understanding. Qwen2.5-Coder is a state-of-the-art code model, matching capabilities of proprietary models like GPT-4o. It supports over 90 programming languages and excels at code generation, debugging, and reasoning. Qwen 2.5-VL brings advanced multimodal capabilities. According to Qwen, Qwen 2.5-VL is not only proficient at recognizing objects such as flowers and animals, but also at analyzing charts, extracting text from images, interpreting document layouts, and processing long videos.

Prerequisites

Before importing the Qwen model with Amazon Bedrock Custom Model Import, make sure that you have the following in place:

  1. An active AWS account
  2. An Amazon Simple Storage Service (Amazon S3) bucket to store the Qwen model files
  3. Sufficient permissions to create Amazon Bedrock model import jobs
  4. Verification that your Region supports Amazon Bedrock Custom Model Import

Use case 1: Qwen coding assistant

In this example, we will demonstrate how to build a coding assistant using the Qwen2.5-Coder-7B-Instruct model.

  1. Go to Hugging Face and search for and copy the Model ID Qwen/Qwen2.5-Coder-7B-Instruct.

You'll use Qwen/Qwen2.5-Coder-7B-Instruct for the rest of the walkthrough. We don't demonstrate fine-tuning steps, but you can also fine-tune before importing.

  2. Use the following command to download a snapshot of the model locally. The Python library for Hugging Face provides a utility called snapshot_download for this:
from huggingface_hub import snapshot_download

# Download every file in the model repository into a local folder
snapshot_download(repo_id="Qwen/Qwen2.5-Coder-7B-Instruct",
                  local_dir="./extractedmodel/")

Depending on your model size, this could take a few minutes. When completed, your Qwen Coder 7B model folder will contain the following files.

  • Configuration files: Including config.json, generation_config.json, tokenizer_config.json, tokenizer.json, and vocab.json
  • Model files: Four safetensor files and model.safetensors.index.json
  • Documentation: LICENSE, README.md, and merges.txt
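Before uploading, it can help to confirm that the download produced the files listed above. The following is a minimal sketch, not part of the original walkthrough: the function name check_model_folder and the EXPECTED_FILES list are illustrative assumptions built from the file names mentioned in this post.

```python
from pathlib import Path

# File names taken from the list above; adjust for other model variants.
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "tokenizer_config.json",
    "tokenizer.json",
    "vocab.json",
    "model.safetensors.index.json",
]


def check_model_folder(model_dir):
    """Return (missing_files, safetensors_shards) for a downloaded model folder."""
    folder = Path(model_dir)
    # Expected files that are not present in the folder
    missing = [name for name in EXPECTED_FILES if not (folder / name).is_file()]
    # Weight shards found in the folder, sorted by name
    shards = sorted(p.name for p in folder.glob("*.safetensors"))
    return missing, shards


# Example usage (hypothetical path):
# missing, shards = check_model_folder("./extractedmodel")
# print("Missing:", missing, "Shards:", shards)
```

An empty missing list and a non-empty shards list suggest the snapshot is complete enough to upload.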

  3. Upload the model to Amazon S3, using boto3 or the command line:

aws s3 cp ./extractedmodel s3://yourbucket/path/ --recursive
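The step above mentions boto3 as an alternative to the command line. A minimal sketch of that route follows, assuming an S3 client created with boto3.client("s3"); the helper name upload_directory is hypothetical, and the bucket and prefix are the same placeholders used in the CLI command. Skipping hidden directories is a design choice of this sketch, intended to leave out any local download cache.

```python
import os


def upload_directory(s3_client, local_dir, bucket, prefix):
    """Upload every file under local_dir to s3://bucket/prefix/, keeping relative paths."""
    uploaded = []
    clean_prefix = prefix.strip("/")
    for root, dirs, files in os.walk(local_dir):
        # Skip hidden directories (for example, a local download cache)
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in sorted(files):
            local_path = os.path.join(root, name)
            relative = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            key = f"{clean_prefix}/{relative}" if clean_prefix else relative
            # upload_file(Filename, Bucket, Key) is the standard boto3 S3 client call
            s3_client.upload_file(local_path, bucket, key)
            uploaded.append(key)
    return uploaded


# Example usage (placeholders, requires AWS credentials):
# import boto3
# keys = upload_directory(boto3.client("s3"), "./extractedmodel", "yourbucket", "path/")
```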

  4. Start the import model job using the following API call:
import boto3

# Control-plane Bedrock client; replace the placeholder values below with your own
bedrock_client = boto3.client("bedrock")

response = bedrock_client.create_model_import_job(
    jobName="uniquejobname",
    importedModelName="uniquemodelname",
    roleArn="fullrolearn",
    modelDataSource={
        's3DataSource': {
            's3Uri': "s3://yourbucket/path/"
        }
    }
)

You can also do this using the AWS Management Console for Amazon Bedrock.

  1. In the Amazon Bedrock console, choose Imported models in the navigation pane.
  2. Choose Import a model.
  3. Enter the details, including a Model name, Import job name, and model S3 location.
  4. Create a new service role or use an existing service role. Then choose Import model.
  5. After you choose Import on the console, you should see the status as Importing while the model is being imported.

If you're using your own role, make sure you add the trust relationship described in Create a service role for model import.
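The trust relationship itself is not reproduced in this post. As a rough sketch, a trust policy that lets the Amazon Bedrock service assume your role typically looks like the document built below. The helper name build_trust_policy is hypothetical, and the condition keys and ARN pattern reflect our understanding of the AWS documentation, so treat them as assumptions and verify them against Create a service role for model import.

```python
import json


def build_trust_policy(account_id, region):
    """Build a trust policy document allowing Amazon Bedrock to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
                # Assumed conditions: restrict use to import jobs in your own account
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnEquals": {
                        "aws:SourceArn": f"arn:aws:bedrock:{region}:{account_id}:model-import-job/*"
                    },
                },
            }
        ],
    }


# Example usage (placeholder account ID):
# print(json.dumps(build_trust_policy("123456789012", "us-east-1"), indent=2))
```

The role also needs permission to read the model files from your S3 bucket; that permissions policy is separate from the trust policy shown here.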

After your model is imported, wait for model inference to be ready, and then chat with the model on the playground or through the API. In the following example, we append Python to prompt the model to directly output Python code to list items in an S3 bucket. Remember to use the correct chat template to enter prompts in the format required. For example, you can get the correct chat template for any compatible model on Hugging Face using the code below:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

# Instead of using model.chat(), we directly use model.generate()
# But you need to use tokenizer.apply_chat_template() to format your inputs as shown below
prompt = "Write sample boto3 python code to list files in a bucket stored in the variable `my_bucket`"
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

Note that when using the invoke_model APIs, you must use the full Amazon Resource Name (ARN) for the imported model. You can find the Model ARN in the Bedrock console, by navigating to the Imported models section and then viewing the Model details page.

After the model is ready for inference, you can use Chat Playground in the Bedrock console or APIs to invoke the model.
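To make the API route concrete, here is a minimal sketch of invoking the imported coder model with a chat-templated prompt. It assumes a bedrock-runtime client from boto3 and reuses the request fields (prompt, temperature, max_gen_len, top_p) that appear in the vision example later in this post. The helper names and default values are illustrative, and the response keys checked in extract_text mirror that later example rather than a guaranteed schema.

```python
import json


def build_request_body(prompt, temperature=0.3, max_gen_len=512, top_p=0.9):
    """Serialize the prompt and sampling parameters into a JSON request body."""
    return json.dumps({
        "prompt": prompt,
        "temperature": temperature,
        "max_gen_len": max_gen_len,
        "top_p": top_p,
    })


def invoke_imported_model(client, model_arn, prompt, **params):
    """Call invoke_model with the full model ARN and return the parsed JSON response."""
    response = client.invoke_model(
        modelId=model_arn,
        body=build_request_body(prompt, **params),
        accept="application/json",
        contentType="application/json",
    )
    return json.loads(response["body"].read().decode("utf-8"))


def extract_text(result):
    """Return the generated text if the response uses one of the expected shapes."""
    for key in ("choices", "outputs"):
        if key in result:
            return result[key][0]["text"]
    return None


# Example usage (requires AWS credentials and a completed import):
# import boto3
# client = boto3.client("bedrock-runtime")
# result = invoke_imported_model(client, "<full ARN of the imported model>", text)
# print(extract_text(result) or result)
```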

Use case 2: Qwen 2.5 VL image understanding

Qwen2.5-VL-* offers multimodal capabilities, combining vision and language understanding in a single model. This section demonstrates how to deploy Qwen2.5-VL using Amazon Bedrock Custom Model Import and test its image understanding capabilities.

Import Qwen2.5-VL-7B to Amazon Bedrock

Download the model from Hugging Face and upload it to Amazon S3:

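import os

# Enable faster downloads. Set this before importing huggingface_hub so the
# library picks it up; it also requires the hf_transfer package to be installed.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

hf_model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
local_directory = "extractedmodel"  # placeholder: local folder for the downloaded files

# Download model locally
snapshot_download(repo_id=hf_model_id, local_dir=f"./{local_directory}")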

Next, import the model to Amazon Bedrock (either via Console or API):

response = bedrock.create_model_import_job(
    jobName=job_name,
    importedModelName=imported_model_name,
    roleArn=role_arn,
    modelDataSource={
        's3DataSource': {
            's3Uri': s3_uri
        }
    }
)
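Import jobs run asynchronously, so the call above returns before the model is usable. One way to wait for completion is to poll the job status, as in the sketch below. It assumes the boto3 bedrock client's get_model_import_job call and its InProgress, Completed, and Failed status values; the helper name, polling interval, and retry limit are illustrative choices.

```python
import time

# Statuses after which polling should stop
TERMINAL_STATES = {"Completed", "Failed"}


def wait_for_import_job(bedrock, job_identifier, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Poll the import job until it reaches a terminal status, then return that status."""
    for _ in range(max_polls):
        status = bedrock.get_model_import_job(jobIdentifier=job_identifier)["status"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Import job {job_identifier} did not finish after {max_polls} polls")


# Example usage (requires AWS credentials):
# status = wait_for_import_job(bedrock, response["jobArn"])
# print("Import job finished with status:", status)
```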

Test the vision capabilities

After the import is complete, test the model with an image input. The Qwen2.5-VL-* model requires proper formatting of multimodal inputs:

import json

import boto3
from transformers import AutoProcessor

# Runtime client used for inference; model_id must be the full ARN of the imported model
client = boto3.client("bedrock-runtime")
model_id = "your-imported-model-arn"  # placeholder

def generate_vl(messages, image_base64, temperature=0.3, max_tokens=4096, top_p=0.9):
    # Apply the chat template so the prompt matches the format the model expects
    processor = AutoProcessor.from_pretrained("Qwen/QVQ-72B-Preview")
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    response = client.invoke_model(
        modelId=model_id,
        body=json.dumps({
            'prompt': prompt,
            'temperature': temperature,
            'max_gen_len': max_tokens,
            'top_p': top_p,
            'images': [image_base64]
        }),
        accept="application/json",
        contentType="application/json"
    )

    return json.loads(response['body'].read().decode('utf-8'))

# Using the model with an image
# image_to_base64 is a helper (not defined in this snippet) that returns the
# file's contents as a base64-encoded string
file_path = "cat_image.jpg"
base64_data = image_to_base64(file_path)

messages = [
    {
        "role": "user",
        "content": [
            {"image": base64_data},
            {"text": "Describe this image."}
        ]
    }
]

response = generate_vl(messages, base64_data)

# Print response
print("Model Response:")
if 'choices' in response:
    print(response['choices'][0]['text'])
elif 'outputs' in response:
    print(response['outputs'][0]['text'])
else:
    print(response)
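The test code calls image_to_base64 without defining it. A minimal sketch of such a helper, using only the Python standard library, could look like this; the implementation is our assumption about what the original helper does.

```python
import base64


def image_to_base64(file_path):
    """Read a file in binary mode and return its contents as a base64 string."""
    with open(file_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
```

Define this helper before running the test code so that base64_data can be computed.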
    

When provided with an example image of a cat, the model accurately describes key features such as the cat's position, fur color, eye color, and general appearance. This demonstrates the Qwen2.5-VL-* model's ability to process visual information and generate relevant text descriptions.

The model's response:

This image features a close-up of a cat lying down on a soft, textured surface, likely a couch or a bed. The cat has a tabby coat with a mix of dark and light brown fur, and its eyes are a striking green with vertical pupils, giving it a captivating look. The cat's whiskers are prominent and extend outward from its face, adding to the detailed texture of the image. The background is softly blurred, suggesting a cozy indoor setting with some furniture and possibly a window letting in natural light. The overall atmosphere of the image is warm and serene, highlighting the cat's relaxed and content demeanor.

Pricing

You can use Amazon Bedrock Custom Model Import to use your custom model weights within Amazon Bedrock for supported architectures, serving them alongside Amazon Bedrock hosted FMs in a fully managed way through On-Demand mode. Custom Model Import doesn't charge for model import. You are charged for inference based on two factors: the number of active model copies and their duration of activity. Billing occurs in 5-minute increments, starting from the first successful invocation of each model copy. The pricing per model copy per minute varies based on factors including architecture, context length, Region, and compute unit version, and is tiered by model copy size. The number of custom model units required for hosting depends on the model's architecture, parameter count, and context length.

Amazon Bedrock automatically manages scaling based on your usage patterns. If there are no invocations for 5 minutes, it scales to zero and scales up when needed, though this might involve cold-start latency of up to a minute. Additional copies are added if inference volume consistently exceeds single-copy concurrency limits. The maximum throughput and concurrency per copy is determined during import, based on factors such as input/output token mix, hardware type, model size, architecture, and inference optimizations.

For more information, see Amazon Bedrock pricing.
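The billing rule described above can be approximated with a short calculation. The sketch below rounds active time up to whole 5-minute increments and multiplies by the number of model copies and a per-copy, per-minute price. The function name is hypothetical, the price is a made-up input rather than a real rate, and the sketch simplifies by assuming every copy is active for the same duration.

```python
import math

# Billing granularity stated in the pricing description
BILLING_INCREMENT_MINUTES = 5


def estimate_inference_cost(active_minutes, price_per_copy_minute, model_copies=1):
    """Return (billed_minutes, estimated_cost) under 5-minute billing increments."""
    # Round the active time up to the next whole billing increment
    windows = math.ceil(active_minutes / BILLING_INCREMENT_MINUTES)
    billed_minutes = windows * BILLING_INCREMENT_MINUTES
    return billed_minutes, billed_minutes * model_copies * price_per_copy_minute


# Example with a hypothetical price of 0.5 per copy per minute:
# 12 active minutes round up to 15 billed minutes; two copies give 15 * 2 * 0.5
billed, cost = estimate_inference_cost(12, 0.5, model_copies=2)
```

For real estimates, take the per-minute price for your model's size, Region, and compute unit version from the Amazon Bedrock pricing page.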

Clean up

To avoid ongoing charges after completing the experiments:

  1. Delete your imported Qwen models from Amazon Bedrock Custom Model Import using the console or the API.
  2. Optionally, delete the model files from your S3 bucket if you no longer need them.

Keep in mind that while Amazon Bedrock Custom Model Import doesn't charge for the import process itself, you're billed for model inference usage and storage.
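For the API route, a minimal cleanup sketch follows. It assumes the boto3 bedrock client's list_imported_models and delete_imported_model calls; the helper name is hypothetical, and the sketch reads only the first page of results, so accounts with many imported models would need pagination.

```python
def delete_imported_models(bedrock, model_names):
    """Delete imported models whose names are in model_names; return the names deleted."""
    wanted = set(model_names)
    deleted = []
    # Only the first page of results is read here (no nextToken handling)
    summaries = bedrock.list_imported_models().get("modelSummaries", [])
    for summary in summaries:
        if summary["modelName"] in wanted:
            bedrock.delete_imported_model(modelIdentifier=summary["modelArn"])
            deleted.append(summary["modelName"])
    return deleted


# Example usage (requires AWS credentials):
# import boto3
# delete_imported_models(boto3.client("bedrock"), ["uniquemodelname"])
```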

Conclusion

Amazon Bedrock Custom Model Import empowers organizations to use powerful publicly available models like Qwen 2.5, among others, while benefiting from enterprise-grade infrastructure. The serverless nature of Amazon Bedrock eliminates the complexity of managing model deployments and operations, allowing teams to focus on building applications rather than infrastructure. With features like auto scaling, pay-per-use pricing, and seamless integration with AWS services, Amazon Bedrock provides a production-ready environment for AI workloads. The combination of Qwen 2.5's advanced AI capabilities and Amazon Bedrock managed infrastructure offers an optimal balance of performance, cost, and operational efficiency. Organizations can start with smaller models and scale up as needed, while maintaining full control over their model deployments and benefiting from AWS security and compliance capabilities.

For more information, refer to the Amazon Bedrock User Guide.


About the Authors

Ajit Mahareddy is an experienced Product and Go-To-Market (GTM) leader with over 20 years of experience in Product Management, Engineering, and Go-To-Market. Prior to his current role, Ajit led product management building AI/ML products at leading technology companies, including Uber, Turing, and eHealth. He is passionate about advancing Generative AI technologies and driving real-world impact with Generative AI.

Shreyas Subramanian is a Principal Data Scientist and helps customers by using generative AI and deep learning to solve their business challenges using AWS services. Shreyas has a background in large-scale optimization and ML and in the use of ML and reinforcement learning for accelerating optimization tasks.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Dharinee Gupta is an Engineering Manager at AWS Bedrock, where she focuses on enabling customers to seamlessly utilize open source models through serverless solutions. Her team specializes in optimizing these models to deliver the best cost-performance balance for customers. Prior to her current role, she gained extensive experience in authentication and authorization systems at Amazon, developing secure access solutions for Amazon offerings. Dharinee is passionate about making advanced AI technologies accessible and efficient for AWS customers.

Lokeshwaran Ravi is a Senior Deep Learning Compiler Engineer at AWS, specializing in ML optimization, model acceleration, and AI security. He focuses on enhancing efficiency, reducing costs, and building secure ecosystems to democratize AI technologies, making cutting-edge ML accessible and impactful across industries.

June Won is a Principal Product Manager with Amazon SageMaker JumpStart. He focuses on making foundation models easily discoverable and usable to help customers build generative AI applications. His experience at Amazon also includes mobile shopping applications and last mile delivery.
