DEV Community

Harish Aravindan
Serverless Bedrock: How I invoke Claude from Lambda in warrantyAI

Every week I ship a new piece of warrantyAI, an AI-powered warranty management system I'm building on AWS. This week was Week 8: a 3-agent LangGraph pipeline wired to Bedrock.

Before the agents could do anything, I needed one thing to work cleanly: invoking Claude from a Lambda function without a server, without a container fleet, without an inference endpoint sitting idle burning money.

Building warrantyAI on AWS with AI-powered pipeline | Harish Aravindan on LinkedIn

Week 8 of building warrantyAI 👉 For those seeing this for the first time: I'm a Senior Cloud Engineer building an AI-powered warranty management system on AWS, from scratch, one week at a time. No shortcuts. Real architecture. Real cost numbers.

This week: I wired 3 AI agents together using LangGraph.

The problem warrantyAI solves: most people lose track of their warranties. Appliances expire. Repairs get denied. Money is wasted. warrantyAI reads the document, classifies the risk, and reminds you before it's too late. This week I built the core pipeline that makes that happen: Reader → Classifier → Reminder.

📄 Reader Agent: customer uploads a warranty PDF to S3. Textract pulls the raw text. Bedrock Haiku structures it into named fields: product, brand, expiry date, serial number.

🔍 Classifier Agent: takes those fields and classifies the warranty. Haiku first, fast and cheap. If confidence drops below 70%, it automatically retries with Sonnet. The GovernanceShield guardrail (built in Week 7) runs on every invocation. Outputs: category, expiry date, risk level.

🔔 Reminder Agent: reads the risk level from shared state and generates a human-readable notification via Haiku. Publishes to SNS, but only for medium and high risk. Low risk? Message generated, not sent. Deliberate FinOps decision: SNS isn't free at scale.

Repo: https://lnkd.in/gsndTpQV

What held this together: WarrantyState, one typed Python dict shared across all 3 agents. No message queues between agents. No shared database mid-pipeline. Each agent reads from it and writes back a partial update. LangGraph handles the sequencing.

What connected cleanly from previous weeks:
✔ Week 7 GovernanceShield guardrail: one import, plugged straight in
✔ Per-agent IAM roles already existed: zero new permissions needed
✔ S3 audit bucket already live: all 3 agents write to it

Building incrementally pays off. What's your multi-agent orchestration framework of choice right now? #AIPlatformEngineering #LangGraph #AWSBedrock #warrantyAI #Serverless #AI
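The WarrantyState mentioned above can be sketched as a TypedDict. The field names here are my guesses reconstructed from the post's description of the three agents, not the repo's actual definition:

```python
from typing import TypedDict


class WarrantyState(TypedDict, total=False):
    """Shared state passed between the three agents (hypothetical field names)."""
    # Written by the Reader agent
    product: str
    brand: str
    expiry_date: str
    serial_number: str
    # Written by the Classifier agent
    category: str
    risk_level: str
    confidence: float
    model_used: str
    # Written by the Reminder agent
    notification: str


# Each agent reads the dict and writes back only its own partial update.
state: WarrantyState = {"product": "Refrigerator", "brand": "LG"}
state["risk_level"] = "high"
```

`total=False` lets each agent contribute only its own keys, which matches the "partial update" pattern the post describes.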


Here's exactly how I did it.


Why serverless + Bedrock is the right combo

Bedrock's invoke_model API is synchronous and stateless. It takes a request, returns a response. That's exactly what Lambda is built for. No warm model, no GPU instance, no ECS cluster. You pay per invocation, per token.

For warrantyAI's workload (sporadic document uploads, not a real-time chat product) this matters. My entire system runs under $1.30/day.


The setup: IAM first, always

Before writing any code, the Lambda execution role needs this policy statement:

{
  "Effect": "Allow",
  "Action": [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream"
  ],
  "Resource": [
    "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-haiku-4-5-20251001",
    "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-sonnet-4-6"
  ]
}

Scope it to specific model ARNs. Not *. Ever.
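For context, that statement sits inside a standard IAM policy document; a minimal sketch (the Sid is just my label):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowClaudeInvoke",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-haiku-4-5-20251001",
        "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-sonnet-4-6"
      ]
    }
  ]
}
```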


The invoke wrapper

This is the core function I reuse across all 3 agents in warrantyAI:

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="ap-south-1")

HAIKU  = "anthropic.claude-haiku-4-5-20251001"
SONNET = "anthropic.claude-sonnet-4-6"

def invoke_bedrock(prompt: str, model_id: str = HAIKU, max_tokens: int = 512) -> str:
    """
    Invoke a Bedrock Claude model from Lambda.
    Returns the text response as a string.
    """
    response = bedrock.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [
                {"role": "user", "content": prompt}
            ]
        })
    )
    body = json.loads(response["body"].read())
    return body["content"][0]["text"].strip()

That's it. Stateless, reusable, testable in isolation.


Haiku-first, Sonnet fallback

Haiku is fast and cheap. Sonnet is accurate and expensive. In warrantyAI's Classifier agent, I try Haiku first. If it returns low confidence, I retry with Sonnet automatically:

def classify_warranty(structured_data: dict) -> dict:
    prompt = build_classify_prompt(structured_data)

    # Attempt 1: Haiku
    result = invoke_bedrock(prompt, model_id=HAIKU)
    parsed = json.loads(result)

    # Fallback: Sonnet if confidence < 0.7
    if parsed.get("confidence", 0) < 0.7:
        result = invoke_bedrock(prompt, model_id=SONNET)
        parsed = json.loads(result)
        parsed["model_used"] = "sonnet"
    else:
        parsed["model_used"] = "haiku"

    return parsed

In practice, Haiku handles ~85% of documents. Sonnet kicks in for complex commercial warranties with ambiguous clause structures.
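One fragility in classify_warranty: json.loads(result) assumes the model returned bare JSON, but models sometimes wrap output in ```json fences or a leading sentence. A defensive parser I would add here (my addition, not from the warrantyAI repo):

```python
import json
import re


def parse_model_json(text: str) -> dict:
    """Extract the first JSON object from a model reply, tolerating fences and prose."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object in model output: {text[:80]!r}")
    return json.loads(match.group(0))


# Works even when Claude wraps the answer in a markdown code fence:
raw = '```json\n{"category": "appliance", "confidence": 0.92, "risk_level": "low"}\n```'
parsed = parse_model_json(raw)
print(parsed["category"], parsed["confidence"])  # -> appliance 0.92
```

Swapping this in for the bare json.loads calls turns a malformed reply into a clear exception instead of a cryptic JSONDecodeError.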


Three things that will burn you

1. The body is a StreamingBody, not a string.
Always call .read() before json.loads(). Forget this once and you'll spend 20 minutes confused.

# Wrong
body = json.loads(response["body"])

# Right
body = json.loads(response["body"].read())

2. Payload limits on Lambda.
Lambda caps synchronous invocation payloads at 6 MB in each direction. Bedrock responses are usually tiny, but if you're passing large documents in your prompt, chunk them first. I cap prompts at 4,000 characters in the Reader agent.
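The 4,000-character cap can be as simple as a slice; a sketch of both capping and chunking (helper names are mine, not the repo's):

```python
def cap_text(text: str, limit: int = 4000) -> str:
    """Hard-truncate extracted document text before it enters the prompt."""
    return text[:limit]


def chunk_text(text: str, size: int = 4000) -> list[str]:
    """Split a long document into prompt-sized pieces instead of dropping the tail."""
    return [text[i:i + size] for i in range(0, len(text), size)]


doc = "x" * 9000
print(len(cap_text(doc)))                  # -> 4000
print([len(c) for c in chunk_text(doc)])   # -> [4000, 4000, 1000]
```

Capping is cheaper (one model call); chunking preserves the whole document at the cost of multiple calls, so pick per agent.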

3. Bedrock is regional.
Not all models are available in all regions. ap-south-1 (Mumbai) supports Haiku and Sonnet. If you get a ResourceNotFoundException, check model availability in your region first before debugging your code.


Cost reality check

For warrantyAI's workload (roughly 50 documents/day):

| Model | Avg tokens/call | Cost/call | Daily cost |
| --- | --- | --- | --- |
| Haiku (all 50 calls) | ~800 | ~$0.0004 | ~$0.02 |
| Sonnet (~15% of calls) | ~800 | ~$0.006 | ~$0.045 |

Total Bedrock cost: roughly $0.065/day for this workload. The ~8 Sonnet fallback calls dominate the bill, which is exactly why the Haiku-first routing exists.
The rest of my $1.30/day budget goes to Textract, SNS, and S3.


What's next

This pattern is the foundation for the entire warrantyAI pipeline. Next Sunday I'll cover how I wired these invocations into a LangGraph StateGraph: three agents, one shared state dict, no message queues.

Follow along if you're building serverless AI on AWS. I publish every Sunday on LinkedIn.

This is part of the Serverless Meets AI series: practical AWS patterns from building warrantyAI.
