Data Control and AI: How Private LLMs Offer a Secure Alternative to Cloud-Based Models
Filipe Lourenço, Principal Engineer at Critical Software, explores how private LLMs address key challenges such as data privacy, intellectual property protection, and regulatory compliance.
Through a practical analysis, we examine the feasibility and challenges of implementing on-premises LLMs, demonstrating that it is possible to maintain high-quality results while ensuring full control over the data.
The Challenge
Many companies today face significant challenges when deploying AI solutions using large language models (LLMs), particularly with cloud-based models such as OpenAI's GPT. Let us take a closer look at some of these challenges:
- Data Privacy: Some organisations have strict policies that require their data to remain within their premises or private cloud. This presents a challenge when relying on LLMs hosted in third-party clouds.
- Intellectual Property Protection: Relying on cloud-hosted LLMs may expose companies to risks of third-party access, making it difficult to safeguard proprietary information and sensitive business data.
- Regulatory Compliance: Legal and regulatory frameworks in certain industries or regions mandate that data must be processed and stored within specific geographic boundaries, potentially restricting the use of cloud-based solutions.
Given these obstacles, private large language models (LLMs) could serve as an alternative to cloud-based models. However, every rose has its thorn: adopting private LLMs within an organisation introduces technical challenges of its own. One critical concern is whether the quality, relevance, and accuracy of their responses can match the performance of cloud-based LLMs like OpenAI’s GPT. Another is the infrastructure required to support them, and how to ensure adequate response times and scalability.
The Approach: Exploring On-Prem LLMs as a Viable Alternative
To address these challenges, we launched an internal initiative to explore whether on-premises LLMs could offer a viable alternative to cloud-based models. The first activity in this initiative was setting up an infrastructure that supports multiple LLMs and integrating them with systems that previously relied on cloud-based models (such as Azure OpenAI), in order to verify whether similar outcomes could be achieved with smaller, private LLMs.
In terms of architecture design, we considered a typical setup for generative AI solutions that combines LLMs with Retrieval-Augmented Generation (RAG). Our approach was to replace the cloud-based LLM with an open-source, private LLM, ensuring that the entire solution could be deployed on the client’s premises (a minimal sketch of this setup follows the list below). Next, we conducted a state-of-the-art evaluation, assessing the current landscape of open-source LLMs and selecting the most suitable models for our use cases. Finally, we adapted in-house AI solutions developed by Critical, originally based on Azure OpenAI, by replacing the LLM with private models. This allowed us to deliver a complete on-premises solution for:
- CoBot: This AI-powered solution leverages a Generative AI Chatbot specifically designed for analysing and explaining legacy codebases (e.g., Cobol), paving the way for seamless modernisation and future-proofing of critical systems. You can learn more about CoBot in our latest article.
- IBE: A Graph-RAG solution that organises internal company information into a comprehensive knowledge base, reducing helpdesk demand, streamlining regulatory impact evaluations, and enabling process improvements through increased awareness.
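To make the pattern above more concrete, here is a minimal, hypothetical sketch of the RAG flow with a privately hosted model: retrieve the most relevant context, assemble a grounded prompt, and send it to an on-premises endpoint instead of a cloud API. The endpoint URL, model name, and response schema are illustrative assumptions, not the exact interfaces used in CoBot or IBE.

```python
"""Minimal RAG sketch with a privately hosted LLM (illustrative only)."""
import requests

# Toy in-memory "knowledge base"; a real deployment would use a vector store
# populated with embeddings (e.g. produced by a model such as gte-large-en-v1.5).
DOCUMENTS = [
    "CoBot analyses legacy COBOL programs and produces plain-language summaries.",
    "IBE organises internal company information into a graph-based knowledge base.",
]


def retrieve(question: str, k: int = 1) -> list[str]:
    """Naive keyword-overlap retrieval standing in for vector similarity search."""
    words = set(question.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:k]


def ask_private_llm(question: str) -> str:
    """Build a grounded prompt and send it to the on-premises model server."""
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = requests.post(
        "http://localhost:8080/generate",  # hypothetical on-prem endpoint
        json={"model": "llama-3.1-70b-instruct", "prompt": prompt},
        timeout=60,
    )
    return response.json()["text"]  # illustrative response schema


if __name__ == "__main__":
    print(ask_private_llm("What does CoBot do with legacy code?"))
```

The key point is that only the generation call changes when moving off the cloud: the retrieval and prompting logic stays the same, while the request targets a model running inside the client’s own environment.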
Key Findings and Insights
Our experiments provided several key insights into the performance and viability of private LLMs:
- Response Optimisation: We optimised responses by tuning the prompts for each specific model. Once a model is selected, adjusting the prompts to suit it can significantly improve the results.
- DeepEval Assessment: We used DeepEval to measure the relevance and accuracy of responses, and the open-source LLMs scored well. One example is the analysis of generated code descriptions and pseudo-code in CoBot (a small evaluation sketch follows this list).
- Manual Comparison with OpenAI: In a manual comparison between open-source LLMs (e.g., llama-70b-instruct) and GPT-4, we found that the latter generally produced better results out of the box. However, although the open-source models required more question-response iterations than Azure OpenAI, they were capable of delivering results comparable to the larger model.
- Outcomes Vary Across Use Cases: It’s advisable to explore different models to determine the most suitable one for each specific use case.
- Infrastructure Requirements: Deploying in the cloud offers low initial costs and the flexibility to scale up as needed, making it well suited to uncertain phases, such as development, when the models that best address a given need are still being identified. On-prem infrastructure, on the other hand, keeps all data under the client’s control, although it requires a significant initial investment in hardware that can be recovered over time with extended usage. Where feasible, leveraging the cloud for initial testing minimises uncertainty and paves the way for a smooth transition to on-premises once the right models and infrastructure requirements are clear.
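As a concrete illustration of the evaluation step, the sketch below scores a single question-answer pair for answer relevancy using DeepEval. The question, answer, and retrieval context are made-up examples, and note that DeepEval metrics rely on an evaluator LLM under the hood, which can itself be pointed at a local model in a fully on-premises setup.

```python
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Made-up example of an answer produced by a private LLM for a CoBot-style query.
test_case = LLMTestCase(
    input="What does the PROCESS-ORDERS paragraph do?",
    actual_output="It reads each order record, validates the totals and writes a summary line.",
    retrieval_context=["PROCESS-ORDERS reads ORDER-FILE, validates WS-TOTAL and writes RPT-LINE."],
)

# Scores how relevant the answer is to the question (0.0 to 1.0).
metric = AnswerRelevancyMetric(threshold=0.7)
metric.measure(test_case)
print(metric.score, metric.reason)
```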
A Path Forward for On-Prem LLM Adoption
Our initiative showed that replacing cloud-based LLMs with private, on-prem models is viable for companies that cannot use online LLMs, whether due to data privacy concerns or regulatory compliance requirements, particularly given recent advancements in open-source models. Let me share some of the key lessons we learned:
- Experimenting with different models is necessary to identify the best fit for each purpose.
- Response optimisation through prompt tuning is essential and varies depending on the model being used.
- Compared to Azure OpenAI, additional prompt iterations may be required to achieve similar levels of completeness in the information obtained from the LLM.
- Depending on the use case and approach, using different models for distinct tasks within the same solution, with specialised sub-models handling specific subsets of the task, may improve results (see the sketch after this list).
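As a simple illustration of the last two points, the sketch below keeps a per-task configuration that pairs each task with its own model and prompt template, so that specialised models handle the subsets of work they are best suited to. The task names, model identifiers, and templates are illustrative assumptions, not the configuration used in our solutions.

```python
# Illustrative routing of distinct tasks to different private models,
# each with its own prompt template.
TASK_CONFIG = {
    "code_explanation": {
        "model": "codestral-22b",  # hypothetical identifier for a code-oriented model
        "template": "You are a COBOL expert. Explain the following code:\n\n{payload}",
    },
    "document_qa": {
        "model": "llama-3.1-70b-instruct",
        "template": "Answer the question using only the provided context.\n\n{payload}",
    },
}


def build_request(task: str, payload: str) -> dict:
    """Select the model and prompt template registered for a given task."""
    cfg = TASK_CONFIG[task]
    return {"model": cfg["model"], "prompt": cfg["template"].format(payload=payload)}


print(build_request("code_explanation", "MOVE WS-TOTAL TO OUT-TOTAL."))
```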
The Technology Behind the Solution
You may be wondering what cutting-edge technologies we used to ensure the success of this initiative. Allow me to show you:
- NVIDIA Triton Inference Server: A flexible and scalable inference server that simplifies the deployment and management of machine learning models (a small client-side sketch follows this list).
- Models Evaluated: The main models evaluated during the experiment were MistralAI Mistral 7B Instruct, MistralAI Codestral 22B, MistralAI Mixtral 8x22B Instruct, Meta Llama-3.1 8B Instruct and Meta Llama-3.1 70B Instruct, alongside the Alibaba-NLP gte-large-en-v1.5 embedding model.
- DeepEval: An evaluation framework used to measure the relevance and accuracy of responses.
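For readers curious about the serving side, here is a minimal sketch of calling a text-generation model deployed on Triton Inference Server using its Python HTTP client. The model name and the tensor names text_input and text_output depend on how the model is packaged in the Triton model repository, so treat them as assumptions rather than fixed interfaces.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server running locally (default HTTP port is 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# String inputs are sent as BYTES tensors; tensor names depend on the model's config.
prompt = np.array([["Summarise the PROCESS-ORDERS paragraph of this COBOL program."]], dtype=object)
text_input = httpclient.InferInput("text_input", list(prompt.shape), "BYTES")
text_input.set_data_from_numpy(prompt)

text_output = httpclient.InferRequestedOutput("text_output")

# Model name is illustrative; it must match a model in Triton's model repository.
result = client.infer(
    model_name="llama-3_1-8b-instruct",
    inputs=[text_input],
    outputs=[text_output],
)
print(result.as_numpy("text_output"))
```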
Do you have any questions or want to extend the debate? Talk to our Principal Engineer, Filipe Lourenço.