Generative AI was at the forefront at AWS Summit Stockholm 2024. Acronyms like LLMs, FMs and RAG were in the air throughout the day. I found out that there is indeed no better way to spend a sunny summer’s day than in a well-cooled, dimly lit convention hall with a bunch of cloud experts.
As a Data Scientist, I was interested in getting to know the AWS Machine Learning (ML) and AI offering better, particularly with the recent developments in GenAI in mind. I’ve often found it difficult to navigate the vastness of features that is AWS SageMaker, as well as the various high-level AI and ML services that AWS provides. My learning goal for the convention was to form a big picture that would help me navigate the AWS offering.
I made some progress toward that goal, but I also learned a lot about developing GenAI applications in general. I will try to share those learnings here in a manner that those not too familiar with ML or AI might also find useful, as GenAI application development should by no means be solely the domain of ML specialists. More and more cloud experts and software developers find themselves implementing apps that leverage GenAI.
The whole machine learning world is a jungle of acronyms, and generative AI has brought several new ones with it. The GenAI paradigm is that specialized organizations train foundational models (FMs), which are generative in both the general and the technical sense: they are trained to produce new data. These organizations provide their FMs downstream, to users who leverage them in various creative ways.
Text-generating large language models (LLMs) have been at the forefront of the emergence of GenAI and have entered public awareness because of the killer app that is ChatGPT. I find that often when people say GenAI, they are talking specifically about LLM-powered chatbots. I will likely fall into that trap as well.
It is always fun to pick up a new acronym or two: in Stockholm, I picked up the DevOps derivations FMOps and LLMOps. The presentation on FMOps was excellent and provided me with new insights beyond the AWS offering (some links at the end of the post).
I’ve used that presentation’s three-way division of foundational model users into Consumers, Fine-tuners, and Providers to map some of the AWS services to various GenAI topics, alongside general notions about said topics.
Note that I haven’t personally even tried all the services I mention: I’m simply saying these services exist and can perform a given job, not that they will do it well.
Consumers
Leveraging foundational models does not require considerable ML expertise. Using retrieval-augmented generation (RAG), users can bring their own data without having to train an ML model. And you don’t necessarily need to implement RAG from the ground up yourself either; high-level services are available for that. Most FM users are Consumers: they use pre-trained models by prompting them, without training them further.
AWS Bedrock is AWS’ fully managed go-to solution for GenAI application development. It provides high-level functionality to deploy and consume text- and image-based FMs, as well as functionality to build apps around FMs.
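To give an idea of how little code consuming an FM through Bedrock takes, here is a minimal sketch of prompting a Claude model with boto3. The model ID and request body follow Anthropic’s messages format; treat both as examples to verify against the current Bedrock documentation rather than a definitive recipe.

```python
# A minimal sketch: prompting a Claude model on Bedrock via boto3.
# The model ID and request body format are examples; check the Bedrock docs for your model.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": "Give a one-paragraph summary of retrieval-augmented generation."}
    ],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    body=json.dumps(body),
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```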
Where more control is required, SageMaker is also an option for Consumers. SageMaker includes every tool under the sun for ML model training, deployment, and beyond. I don’t mean that entirely in a positive way: it can be quite intimidating. FMs are available in SageMaker through the JumpStart functionality, which provides various templates for you to work with.
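For the SageMaker route, JumpStart models can be deployed with a few lines of the SageMaker Python SDK. A rough sketch follows; the model ID is just an example, and the exact request format depends on the model you pick.

```python
# A rough sketch of deploying a JumpStart foundational model with the SageMaker Python SDK.
# The model ID and payload format are examples and vary by model.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")
predictor = model.deploy()  # provisions a real-time inference endpoint

response = predictor.predict({"inputs": "Explain SageMaker JumpStart in one sentence."})
print(response)

predictor.delete_endpoint()  # endpoints are billed while running, so clean up
```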
I would identify three key concerns for Consumers: FM selection, RAG, and prompt engineering. RAG essentially means that the end user’s query is used to search for relevant documents (retrieval), which are added to the prompt (augmentation), which is given as the input to the FM (generation).
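To make the three steps concrete, below is a deliberately simplified sketch of the flow. The embed() and generate() helpers are placeholders for whichever embedding model and FM you end up using.

```python
# A simplified RAG sketch: retrieve relevant documents, augment the prompt, generate an answer.
# embed() and generate() are placeholders for your embedding model and FM of choice.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError  # call your embedding model here

def generate(prompt: str) -> str:
    raise NotImplementedError  # call your foundational model here

def rag_answer(query: str, documents: list[str], top_k: int = 3) -> str:
    # Retrieval: rank documents by cosine similarity to the query embedding.
    query_vec = embed(query)
    scores = []
    for doc in documents:
        doc_vec = embed(doc)
        score = float(np.dot(query_vec, doc_vec) / (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)))
        scores.append(score)
    top_docs = [doc for _, doc in sorted(zip(scores, documents), reverse=True)[:top_k]]

    # Augmentation: add the retrieved documents to the prompt.
    context = "\n\n".join(top_docs)
    prompt = f"Use the following context to answer the question.\n\n{context}\n\nQuestion: {query}"

    # Generation: the FM answers based on the augmented prompt.
    return generate(prompt)
```

In a production setting the document embeddings would of course be computed ahead of time and stored in a vector database, rather than on every query.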
In Bedrock, RAG can be implemented using the Agents and Knowledge Bases functionalities. In SageMaker, implementing RAG is a more involved process, and there are many alternatives. For instance, you might use the intelligent search service AWS Kendra for retrieval. Or you might implement the retrieval step from scratch by deploying your own embedding model; pre-trained embedding models are available on JumpStart. Or you might even want to use a third-party service, such as Azure AI Search.
To select the best FM for their use case, Consumers need to evaluate different options. Evaluation of FMs is a complex topic, because the quality of the outputs is typically hard to measure. Human evaluation may be required; for that, you can use SageMaker Ground Truth (its typical use case is labelling training data for supervised learning). FMs can also be used to perform the evaluation automatically, but the evaluator needs to be a different FM. Otherwise, your model will simply validate its own outputs.
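As a sketch of what FM-based evaluation can look like, one model answers and a different model grades the answer. The generate_with() helper is a placeholder for your FM client of choice, such as a Bedrock call or a SageMaker endpoint.

```python
# A sketch of FM-as-judge evaluation: the judge must be a different model than the candidate.
def generate_with(model_id: str, prompt: str) -> str:
    raise NotImplementedError  # placeholder for your FM client (Bedrock, SageMaker endpoint, ...)

def evaluate_answer(question: str, candidate_model: str, judge_model: str) -> str:
    answer = generate_with(candidate_model, question)
    grading_prompt = (
        "Rate the following answer from 1 to 5 for factual accuracy and relevance.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with the score and a one-sentence justification."
    )
    # Use a different model as the judge so the candidate is not validating its own output.
    return generate_with(judge_model, grading_prompt)
```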
Prompt engineering is the art of designing the inputs to the FM so that the outputs meet the requirements. There was an interesting presentation on the subject at the Summit, which I’ve linked at the end. Without broaching the broad topic any further, I will mention that I found it particularly interesting that LLMs often do well with XML tags, which can be used to structure the prompt and to get the model to output ‘pseudo-structured’ data (Claude at least does, which seems to be the go-to LLM for Bedrock).
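For example, a prompt might wrap the context and the question in XML tags and ask for the answer in tags as well, which makes the output easy to extract. The tag names below are arbitrary; this is only an illustration of the idea.

```python
# A sketch of an XML-structured prompt. The tags delimit instructions from data
# and ask the model for easily parseable, 'pseudo-structured' output.
document = "(the retrieved document text goes here)"
question = "What does the document say about fine-tuning?"

prompt = f"""You are a careful assistant.

<document>
{document}
</document>

<question>
{question}
</question>

Answer using only the document above. Wrap your answer in <answer></answer> tags."""
```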
Fine-tuners
Techniques such as RAG are bottlenecked by the quality of retrieval, and thus the results might not meet the high expectations set by state-of-the-art FMs. To properly integrate your own data with an FM, the model must be trained further. This is a computationally heavy process, given the typical size of FMs. Continuing the training of a pre-trained model is called fine-tuning. Fine-tuners usually require machine learning expertise.
Model fine-tuning can be performed on Bedrock or SageMaker. On Bedrock, it’s a black-box process where you provide the data and then magic happens under the hood.
On SageMaker, you are responsible for the process. Fine-tuning happens much like any other training, except that you start from a pre-trained model as a base. Foundational models are available on SageMaker JumpStart.
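In practice this can look roughly like the sketch below, using the JumpStart estimator from the SageMaker Python SDK. The model ID and S3 path are placeholders; fine-tuning support and the expected data format vary by model.

```python
# A rough sketch of fine-tuning a JumpStart model. Model ID and S3 URI are placeholders.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="huggingface-llm-falcon-7b-bf16",  # example; pick a model that supports fine-tuning
)

# The training channel points to your own data, prepared in the format the model expects.
estimator.fit({"training": "s3://my-bucket/fine-tuning-data/"})

# The fine-tuned model can then be deployed like any other SageMaker model.
predictor = estimator.deploy()
```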
When fine-tuning, you continue the training where the provider left off. Because the amount of new data you bring is substantially smaller than the original training data, fewer training iterations are needed. However, the number of parameters in the FM remains the same (and considerably large), so the memory requirements will be substantial.
Luckily, there are techniques for reducing the number of parameters to be estimated during fine-tuning. Such techniques are referred to as parameter-efficient fine-tuning (PEFT). SageMaker JumpStart includes models with support for PEFT techniques, such as LoRA.
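To give an idea of what LoRA does under the hood, here is a small sketch using the Hugging Face peft library directly; JumpStart wraps this kind of machinery for you. The base model and target module names are examples and vary by architecture.

```python
# A sketch of LoRA-based parameter-efficient fine-tuning with the Hugging Face peft library.
# The base model and target_modules are examples; module names depend on the architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")  # example base model

lora_config = LoraConfig(
    r=8,                                 # rank of the low-rank update matrices
    lora_alpha=16,                       # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # attention projections to adapt (Falcon-specific)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically a small fraction of the base model's parameters
```

Only the small LoRA matrices are trained, while the original weights stay frozen, which is what brings the memory requirements down.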
Fine-tuners need to understand their models in a much deeper way than Consumers do. After all, once an FM is fine-tuned, it is essentially a new model. Fine-tuning is not surgical: every iteration in the training process alters the model in a holistic way, and not always for the better. SageMaker’s Clarify functionality can be used to detect biases in a model. Model Monitor can be used for continuous monitoring of a model.
Providers
Providers train foundational models from scratch. This requires vast amounts of resources, both in terms of computation and, perhaps more crucially, data. Training new foundational models is therefore the purview of large corporations and dedicated AI labs. The real beauty of foundational models is that most FM users do not need to enter this territory to reap the benefits of state-of-the-art ML.
My picks from the Summit
Finally, I would like to share my favourite presentations from the Summit. These presentations should be good study material for GenAI application development in general, not just on AWS. AWS publishes presentations from their conferences on their AWS Events YouTube channel.
- On FMOps: the source of the classification of FM users I’ve used here, and of most of this post in general. The recording is not from Stockholm, but the content is the same: FMOps/LLMOps: Generative AI from idea to production on AWS
- On prompt engineering: Prompt engineering best practices for LLMs on Amazon Bedrock