This post is co-written with David Gildea and Tom Nijs from Druva.
Druva is a leading provider of data security solutions, trusted by over 6,000 customers, including 65 of the Fortune 500. Customers use the Druva Data Security Cloud, a fully managed SaaS solution, to secure and recover data from all threats. Independent software vendors (ISVs) like Druva are integrating AI assistants into their solutions to make software more accessible.
Dru, the Druva backup AI copilot, enables real-time interaction and personalized responses, with users engaging in a natural conversation with the software. From finding inconsistencies and errors across the environment to scheduling backup jobs and setting retention policies, users need only ask and Dru responds. Dru can also recommend actions to improve the environment, remedy backup failures, and identify opportunities to improve security.
In this post, we show how Druva approached natural language querying (NLQ)—asking questions in English and getting tabular data as answers—using Amazon Bedrock, the challenges they faced, sample prompts, and key learnings.
Use case overview
The following screenshot illustrates the Dru conversation interface.
In a single conversation interface, Dru provides the following:
- Interactive reporting with real-time insights – Users can request data or customized reports without extensive searching or navigating through multiple screens. Dru also suggests follow-up questions to enhance the user experience.
- Intelligent responses and a direct conduit to Druva's documentation – Users can gain in-depth knowledge about product features and functionalities without manual searches or watching training videos. Dru also suggests resources for further learning.
- Assisted troubleshooting – Users can request summaries of top failure reasons and receive suggested corrective measures. Dru on the backend decodes log data, deciphers error codes, and invokes API calls to troubleshoot.
- Simplified admin operations, with increased seamlessness and accessibility – Users can perform tasks like creating a new backup policy or triggering a backup, governed by Druva's existing role-based access control (RBAC) mechanism.
- Customized site navigation through conversational commands – Users can instruct Dru to navigate to specific site areas, eliminating the need for manual menu exploration. Dru also suggests follow-up actions to speed up task completion.
Challenges and key learnings
In this section, we discuss the challenges and key learnings of Druva's journey.
Overall orchestration
Initially, we adopted an AI agent approach and relied on the foundation model (FM) to make plans and invoke tools using the reasoning and acting (ReAct) method to answer user questions. However, we found the objective too broad and complex for the AI agent. The AI agent would take more than 60 seconds to plan and respond to a user question. Sometimes it would even get stuck in a thought loop, and the overall success rate wasn't satisfactory.
We decided to move to a prompt chaining approach using a directed acyclic graph (DAG). This approach allowed us to break the problem down into multiple steps:
- Identify the API route.
- Generate and invoke private API calls.
- Generate and run data transformation Python code.
Each step became an independent stream, so our engineers could iteratively develop and evaluate performance and speed until each worked well in isolation. The workflow also became more controllable by defining proper error paths.
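The following is a minimal sketch of what such a chained workflow can look like, assuming each stream is implemented as its own function and passed in as a callable; the function names, signatures, and error messages are illustrative assumptions rather than Druva's actual implementation.

```python
from typing import Callable, Optional

def answer_question(
    question: str,
    identify_api_route: Callable[[str], Optional[str]],             # Stream 1: classify question to a route
    generate_and_invoke_api: Callable[[str, str], Optional[list]],  # Stream 2: build and call the API
    generate_and_run_transformation: Callable[[str, list], dict],   # Stream 3: transform data into the answer
) -> dict:
    """Chain the three streams with explicit error paths instead of a free-form agent loop."""
    api_route = identify_api_route(question)
    if api_route is None:
        return {"error": "no matching API route found"}

    data = generate_and_invoke_api(question, api_route)
    if data is None:
        return {"error": f"API call for route {api_route} failed"}

    return generate_and_run_transformation(question, data)
```

Because each stream is an ordinary function, it can be developed, evaluated, and timed in isolation before being composed into the full graph.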
Stream 1: Identify the API route
Out of the hundreds of APIs that power Druva products, we needed to match the exact API the application needs to call to answer the user question. For example, "Show me my backup failures for the past 72 hours, grouped by server." Having similar names and synonyms in API routes makes this retrieval problem more complex.
Initially, we formulated this task as a retrieval problem. We tried different methods, including k-nearest neighbor (k-NN) search of vector embeddings, BM25 with synonyms, and a hybrid of both across fields including API routes, descriptions, and hypothetical questions. We found that the simplest and most accurate way was to formulate it as a classification task for the FM. We curated a small list of examples in question-API route pairs, which helped improve the accuracy and make the output format more consistent.
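As an illustration of such curated examples, the pairs can be as simple as the following; the questions and routes shown here are hypothetical and not Druva's actual API.

```python
# Hypothetical question-API route pairs used as few-shot examples in the classification prompt
FEW_SHOT_EXAMPLES = [
    {"question": "Show me my backup failures for the past 72 hours, grouped by server",
     "api_route": "/reporting/v1/backup-jobs"},
    {"question": "List all retention policies updated last month",
     "api_route": "/config/v1/retention-policies"},
    {"question": "How much storage did each server consume last week?",
     "api_route": "/reporting/v1/storage-consumption"},
]
```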
Stream 2: Generate and invoke private API calls
Next, we generate the API call with the correct parameters and invoke it. FM hallucination of parameters, particularly those with a free-form JSON object, is one of the major challenges in the whole workflow. For example, an unsupported key such as server can appear in the generated parameters.
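The following is a hypothetical illustration of such output; the field names and values are assumptions, not Druva's actual API schema.

```python
# Hypothetical FM-generated parameters; "server" is not defined in the (assumed) API schema
generated_params = {
    "pageSize": 100,
    "jobStatus": "Failed",
    "server": "prod-db-01",  # unsupported key hallucinated by the FM
}
```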
We tried different prompting techniques, such as few-shot prompting and chain of thought (CoT), but the success rate was still unsatisfactory. To make API call generation and invocation more robust, we separated this task into two steps:
- First, we used an FM to generate parameters in a JSON dictionary instead of the full API request headers and body.
- Afterwards, we wrote a postprocessing function to remove parameters that didn't conform to the API schema.
This method provided successful API invocation, at the expense of getting more data than required for downstream processing.
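A minimal sketch of such a postprocessing step, assuming the allowed parameter names can be derived from the API's swagger definition (the helper and field names here are assumptions):

```python
def filter_params(generated_params: dict, allowed_keys: set) -> dict:
    """Keep only the FM-generated keys that the API schema actually defines."""
    return {k: v for k, v in generated_params.items() if k in allowed_keys}

# Hypothetical usage: "server" is dropped because it isn't in the (assumed) schema
allowed_keys = {"pageSize", "jobStatus", "startTime", "endTime"}
params = filter_params({"pageSize": 100, "jobStatus": "Failed", "server": "prod-db-01"}, allowed_keys)
# params == {"pageSize": 100, "jobStatus": "Failed"}
```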
Stream 3: Generate and run data transformation Python code
Next, we took the response from the API call and transformed it to answer the user question. For example, "Create a pandas dataframe and group it by server column." Similar to stream 2, FM hallucination is again an obstacle. Generated code can contain syntax errors, such as confusing PySpark functions with pandas functions.
After trying many different prompting techniques without success, we looked at the reflection pattern, asking the FM to self-correct its code in a loop. This improved the success rate at the expense of more FM invocations, which were slower and more expensive. We found that although smaller models are faster and more cost-effective, at times they had inconsistent results. Anthropic's Claude 2.1 on Amazon Bedrock gave more accurate results on the second try.
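The following is a minimal sketch of this reflection loop, assuming an invoke_fm helper that wraps an Amazon Bedrock model invocation; the helper, prompt wording, and retry limit are assumptions.

```python
from typing import Callable

def generate_code_with_reflection(question: str, sample_data: list,
                                  invoke_fm: Callable[[str], str], max_retries: int = 2) -> str:
    """Ask the FM for transformation code, then ask it to fix its own syntax errors in a loop."""
    code = invoke_fm(f"Write pandas code to answer: {question}\nSample data: {sample_data}")
    for _ in range(max_retries):
        try:
            compile(code, "<generated>", "exec")  # cheap syntax check before the code is run
            break
        except SyntaxError as err:
            # Reflection: feed the error back so the FM can correct its own output
            code = invoke_fm(f"This Python code fails with {err}:\n{code}\nReturn a corrected version, code only.")
    return code
```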
Model choices
Druva selected Amazon Bedrock for several compelling reasons, with security and latency being the most important. A key factor in this decision was the seamless integration with Druva's services. Using Amazon Bedrock aligned naturally with Druva's existing environment on AWS, maintaining a secure and efficient extension of their capabilities.
Moreover, one of our primary challenges in developing Dru involved selecting the optimal FMs for specific tasks. Amazon Bedrock effectively addresses this challenge with its extensive array of available FMs, each offering unique capabilities. This variety enabled Druva to conduct rapid and comprehensive testing of various FMs and their parameters, facilitating the selection of the most suitable one. The process was streamlined because Druva didn't need to delve into the complexities of running or managing these diverse FMs, thanks to the robust infrastructure provided by Amazon Bedrock.
Through the experiments, we found that different models performed better in specific tasks. For example, Meta Llama 2 performed better with the code generation task; Anthropic Claude Instant was good for efficient and cost-effective conversation; whereas Anthropic Claude 2.1 was good at getting the desired responses in retry flows.
These were the latest models from Anthropic and Meta at the time of this writing.
Solution overview
The following diagram shows how the three streams work together as a single workflow to answer user questions with tabular data.
The following are the steps of the workflow:
- The authenticated user submits a question to Dru, for example, "Show me my backup job failures for the last 72 hours," as an API call.
- The request arrives at the microservice on our existing Amazon Elastic Container Service (Amazon ECS) cluster. This process consists of the following steps:
- A classification task using the FM provides the available API routes in the prompt and asks for the one that best matches the user question.
- An API parameters generation task using the FM gets the corresponding API swagger, then asks the FM to suggest key-value pairs for the API call that can retrieve data to answer the question.
- A custom Python function verifies, formats, and invokes the API call, then passes the data in JSON format to the next step.
- A Python code generation task using the FM samples a few records of data from the previous step, then asks the FM to write Python code that transforms the data to answer the question.
- A custom Python function runs the Python code and returns the answer in tabular format.
To maintain user and system security, we ensure in our design that:
- The FM can't directly connect to any Druva backend services.
- The FM resides in a separate AWS account and virtual private cloud (VPC) from the backend services.
- The FM can't initiate actions independently.
- The FM can only respond to questions sent from Druva's API.
- Normal customer permissions apply to the API calls made by Dru.
- The call to the API (Step 1) is only possible for an authenticated user. The authentication component lives outside the Dru solution and is used across other internal solutions.
- To avoid prompt injection, jailbreaking, and other malicious activities, a separate module checks for these before the request reaches this service (Amazon API Gateway in Step 1).
For more details, refer to Druva's Secret Sauce: Meet the Technology Behind Dru's GenAI Magic.
Implementation details
In this section, we discuss Steps 2a–2e in the solution workflow.
2a. Look up the API definition
This step uses an FM to perform classification. It takes the user question and a full list of available API routes with meaningful names and descriptions as the input, and responds with the API route that best matches the question. The following is a sample prompt:
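The prompt below is an illustrative sketch; the wording, formatting, and API routes shown are assumptions, not Druva's actual prompt or API.

```python
# Illustrative classification prompt (wording and routes are assumptions)
classification_prompt = """You are given a user question and a list of available API routes.
Pick the single route that best answers the question. Respond with the route only.

API routes:
- /reporting/v1/backup-jobs: list backup jobs with status, timing, and failure reason
- /config/v1/retention-policies: list and manage data retention policies
- /reporting/v1/storage-consumption: storage consumed per server over time

Question: Show me my backup job failures for the last 72 hours

Route:"""
```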
2b. Generate the API call
This step uses an FM to generate API parameters. It first looks up the corresponding swagger for the API route (from Step 2a). Next, it passes the swagger and the user question to an FM, which responds with key-value pairs for the API route that can retrieve relevant data. The following is a sample prompt:
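Again, this is an illustrative sketch; the swagger snippet, parameter names, and wording are assumptions.

```python
# Illustrative parameter-generation prompt (swagger snippet and wording are assumptions)
parameter_prompt = """You are given the swagger definition of an API route and a user question.
Return a JSON dictionary of parameters that would retrieve the data needed to answer the question.
Only use parameters defined in the swagger. Respond with JSON only.

Swagger for /reporting/v1/backup-jobs:
  parameters:
    - name: jobStatus   (string, one of: Successful, Failed, Skipped)
    - name: startTime   (ISO 8601 timestamp)
    - name: endTime     (ISO 8601 timestamp)
    - name: pageSize    (integer, max 500)

Question: Show me my backup job failures for the last 72 hours

Parameters:"""
```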
2c. Validate and invoke the API call
In the previous step, even with an attempt to ground responses with the swagger, the FM can still hallucinate wrong or nonexistent API parameters. This step uses a programmatic method to verify, format, and invoke the API call to get data. The following is the pseudo code:
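The sketch below illustrates the verify, format, and invoke flow under stated assumptions; the swagger structure, endpoint, and authentication scheme are illustrative, not Druva's actual service.

```python
import requests  # assumed HTTP client; the actual service may differ

def validate_and_invoke(api_route: str, generated_params: dict, swagger: dict,
                        base_url: str, token: str) -> dict:
    """Verify FM-generated parameters against the swagger, format them, and call the API."""
    allowed = {p["name"] for p in swagger["parameters"]}
    # Drop hallucinated keys that the API schema doesn't define
    params = {k: v for k, v in generated_params.items() if k in allowed}
    response = requests.get(
        f"{base_url}{api_route}",
        params=params,
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # JSON data passed to the next step
```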
2d. Generate Python code to transform data
This step uses an FM to generate Python code. It first samples a few records of input data to reduce input tokens. Then it passes the sample data and the user question to an FM, which responds with a Python script that transforms the data to answer the question. The following is a sample prompt:
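The following is an illustrative sketch of such a code-generation prompt; the wording, sample records, and output convention are assumptions.

```python
# Illustrative code-generation prompt (wording and sample records are assumptions)
codegen_prompt = """You are given a few sample records from an API response and a user question.
Write Python code using pandas (not PySpark) that transforms the full dataset, available in a
variable named `records` (a list of dicts), into a dataframe that answers the question.
Assign the final dataframe to a variable named `result`. Respond with Python code only.

Sample records:
[{"jobId": 101, "server": "prod-db-01", "status": "Failed", "errorCode": "E1004"},
 {"jobId": 102, "server": "prod-db-02", "status": "Failed", "errorCode": "E2310"}]

Question: Show me my backup job failures for the last 72 hours, grouped by server

Python code:"""
```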
2e. Run the Python code
This step involves a Python script, which imports the generated Python package, runs the transformation, and returns the tabular data as the final response. If an error occurs, it invokes the FM to try to correct the code. When everything fails, it returns the input data. The following is the pseudo code:
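The sketch below illustrates the run, correct, and fall back behavior under stated assumptions; the invoke_fm helper and the convention that generated code assigns a result variable are assumptions.

```python
from typing import Callable
import pandas as pd

def run_generated_code(code: str, records: list, question: str,
                       invoke_fm: Callable[[str], str], max_retries: int = 1):
    """Run the generated transformation; ask the FM to correct errors; fall back to the raw input."""
    for attempt in range(max_retries + 1):
        try:
            scope = {"pd": pd, "records": records}
            exec(code, scope)          # generated code is expected to assign a `result` variable
            return scope["result"]     # tabular answer produced by the generated code
        except Exception as err:
            if attempt == max_retries:
                break
            # Reflection: feed the runtime error back so the FM can correct the code
            code = invoke_fm(
                f"The following Python code fails with: {err}\n{code}\n"
                f"Fix it so it answers: {question}. Respond with Python code only."
            )
    return pd.DataFrame(records)       # when everything fails, return the input data as-is
```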
Conclusion
Using Amazon Bedrock for the solution foundation led to remarkable achievements in accuracy, as evidenced by the following metrics in our evaluations using an internal dataset:
- Stream 1: Identify the API route – Achieved a perfect accuracy rate of 100%
- Stream 2: Generate and invoke private API calls – Maintained this standard with a 100% accuracy rate
- Stream 3: Generate and run data transformation Python code – Attained a highly commendable accuracy of 90%
These results are not just numbers; they are a testament to the robustness and efficiency of the Amazon Bedrock based solution. With such high levels of accuracy, Druva is now poised to confidently broaden their horizons. Our next goal is to extend this solution to encompass a wider range of APIs across Druva products, scaling up usage and significantly enriching the experience of Druva customers. By integrating more APIs, Druva will offer a more seamless, responsive, and contextual interaction with Druva products, further enhancing the value delivered to Druva users.
To learn more about Druva's AI solutions, visit the Dru solution page, where you can see some of these capabilities in action through recorded demos. Visit the AWS Machine Learning blog to see how other customers are using Amazon Bedrock to solve their business problems.
About the Authors
David Gildea is the VP of Product for Generative AI at Druva. With over 20 years of experience in cloud automation and emerging technologies, David has led transformative projects in data management and cloud infrastructure. As the founder and former CEO of CloudRanger, he pioneered innovative solutions to optimize cloud operations, later leading to its acquisition by Druva. Currently, David leads the Labs team in the Office of the CTO, spearheading R&D into generative AI initiatives across the organization, including projects like Dru Copilot, Dru Investigate, and Amazon Q. His expertise spans technical research, commercial planning, and product development, making him a prominent figure in the field of cloud technology and generative AI.
Tom Nijs is an experienced backend and AI engineer at Druva, passionate about both learning and sharing knowledge. With a focus on optimizing systems and using AI, he's dedicated to helping teams and developers bring innovative solutions to life.
Corvus Lee is a Senior GenAI Labs Solutions Architect at AWS. He is passionate about designing and developing prototypes that use generative AI to solve customer problems. He also keeps up with the latest developments in generative AI and retrieval techniques by applying them to real-world scenarios.
Fahad Ahmed is a Senior Solutions Architect at AWS and assists financial services customers. He has over 17 years of experience building and designing software applications. He recently found a new passion for making AI services accessible to the masses.