In this guide, we explore the key techniques for enhancing Large Language Models (LLMs). Learn about effective strategies such as hyperparameter tuning, regularization, and choosing the right metrics.
Povindrakanth
April 19, 2024
LLMs (Large Language Models) are trained on large amounts of public internet text, and as a result, they have only general knowledge. However, for most business applications, you’ll need to give them access to your own data for them to be truly useful. Unlike in ChatGPT, you would want this data to be accessible by default, rather than having to add it manually every time you ask the model to perform a task.
Adding a whole document to the query, including irrelevant information, is also expensive and slow.
This limitation can be overcome by giving the model access to a database that surfaces only the sections of the document that are relevant to answering the question. This approach is known as RAG (Retrieval-Augmented Generation).
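To make the idea concrete, here is a minimal sketch of the retrieval step. It assumes a simple word-overlap score as a stand-in for the embedding search a production RAG system would normally use, and the function names (`retrieve`, `build_prompt`) are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(question: str, section: str) -> int:
    """Count how many question tokens also occur in the section."""
    q = Counter(tokenize(question))
    s = Counter(tokenize(section))
    return sum(min(q[t], s[t]) for t in q)


def retrieve(question: str, sections: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k sections with the highest overlap, skipping zero-score ones."""
    ranked = sorted(sections, key=lambda sec: score(question, sec), reverse=True)
    return [sec for sec in ranked[:top_k] if score(question, sec) > 0]


def build_prompt(question: str, sections: list[str]) -> str:
    """Put only the retrieved sections into the prompt, not the whole document."""
    context = "\n".join(f"- {sec}" for sec in retrieve(question, sections))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Only the sections that share words with the question reach the model, which is what keeps the query short and cheap compared with pasting in the whole document.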
Role-based access controls are simply good business practice for larger organizations. Letting people only access those features that are necessary to perform their job reduces distractions and minimizes the risk of accidental or deliberate misuse of permissions.
Moreover, many businesses operate in industries that have stringent requirements on who can access what data and are subject to audits.
In the context of LLMs, role-based access often differentiates whether users can modify the ‘instruction prompt’ and LLM settings or whether they are consumers of apps that were carefully designed by their colleagues.
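As an illustration, the split between people who build apps and people who consume them could be sketched as below. The role names, action names, and the `edit_instruction_prompt` helper are assumptions made for this example, not the permission model of any particular product.

```python
from enum import Enum


class Role(Enum):
    BUILDER = "builder"    # may edit the instruction prompt and LLM settings
    CONSUMER = "consumer"  # may only run apps designed by colleagues


# Hypothetical permission table: which actions each role may perform.
PERMISSIONS = {
    Role.BUILDER: {"run_app", "edit_prompt", "edit_settings"},
    Role.CONSUMER: {"run_app"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Check whether the role's permission set contains the action."""
    return action in PERMISSIONS.get(role, set())


def edit_instruction_prompt(role: Role, app: dict, new_prompt: str) -> dict:
    """Return a copy of the app with a new instruction prompt, if the role permits it."""
    if not is_allowed(role, "edit_prompt"):
        raise PermissionError(f"{role.value} may not edit the instruction prompt")
    return {**app, "instruction_prompt": new_prompt}
```

Keeping the permission table in one place also makes it easier to show an auditor exactly who can change what.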
Imagine you’re the IT manager responsible for using AI within the organization; how would you know if what you have is effective?
While you may not want to read exactly what every employee is asking the company AI, you might want to know:
- Who in the org is the power user that can teach others?
- What do people use it for?
- What is the most popular use case?
- Are people asking the right questions, or do they need some training?
- How long does it take before they get a response?
- What is the cost per query?
Additionally, some organizations have a legal obligation to be able to audit their use of AI.
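Several of the questions above can be answered from query metadata alone, without reading anyone's prompts. The sketch below assumes a hypothetical `QueryLog` record with a user, a use-case label, a latency, and a cost; how those fields get recorded depends entirely on your own setup.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class QueryLog:
    user: str
    use_case: str      # hypothetical label, e.g. assigned per app
    latency_s: float   # seconds until the response arrived
    cost_usd: float    # cost of this single query


def summarize(logs: list[QueryLog]) -> dict:
    """Aggregate usage metadata; expects at least one log entry."""
    per_user = Counter(log.user for log in logs)
    per_use_case = Counter(log.use_case for log in logs)
    return {
        "power_user": per_user.most_common(1)[0][0],
        "top_use_case": per_use_case.most_common(1)[0][0],
        "avg_latency_s": mean(log.latency_s for log in logs),
        "avg_cost_usd": mean(log.cost_usd for log in logs),
    }
```

Whether people are asking the right questions is harder to automate, and usually still needs a human to review a sample of queries with consent.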
Using AI is great, but it is not free. While most queries are cheap and the majority of your staff will keep their usage within bounds, there is always that one individual who unknowingly or knowingly racks up a huge computing bill.
For example, this could happen if you ask a model with a large context window (GPT4-preview) to analyze a spreadsheet with 10,000 rows. All rows will be taken as context. Do this a few times, and you’re looking at tens of dollars for some playful experimentation.
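A back-of-the-envelope version of that calculation, plus a simple per-user spending cap, might look like the sketch below. The tokens-per-row figure and the price per 1,000 input tokens are placeholder assumptions, not quoted prices; substitute the numbers for your own data and model. `BudgetGuard` is a hypothetical helper.

```python
def estimate_query_cost(rows: int, tokens_per_row: int,
                        usd_per_1k_input_tokens: float) -> float:
    """Estimate the input cost when every row is sent to the model as context."""
    input_tokens = rows * tokens_per_row
    return input_tokens / 1000 * usd_per_1k_input_tokens


class BudgetGuard:
    """Track spend per user and refuse queries that would exceed a monthly cap."""

    def __init__(self, monthly_limit_usd: float) -> None:
        self.monthly_limit_usd = monthly_limit_usd
        self.spent_usd: dict[str, float] = {}

    def charge(self, user: str, cost_usd: float) -> bool:
        """Record the cost and return True, or return False if the cap would be exceeded."""
        spent = self.spent_usd.get(user, 0.0)
        if spent + cost_usd > self.monthly_limit_usd:
            return False
        self.spent_usd[user] = spent + cost_usd
        return True


# Assumed numbers for illustration only: 10 tokens per row, $0.01 per 1K input tokens.
cost = estimate_query_cost(rows=10_000, tokens_per_row=10, usd_per_1k_input_tokens=0.01)
```

Under these assumed numbers a single query costs about a dollar, so a few dozen experiments are enough to reach tens of dollars.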
Have you ever shared or received good prompts from your colleagues through a messaging app and later found yourself scrolling back to find that one prompt? Organization-wide, this creates a lot of friction and wasted time searching through messaging apps.
What is needed is a prompt management system, where people can store and find the prompts they need in an organized manner.
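At its simplest, such a system is a shared store of named, tagged prompts with search on top. The in-memory sketch below shows only the shape of the idea; `PromptEntry` and `PromptLibrary` are hypothetical names, and a real system would add persistence, versioning, and access control.

```python
from dataclasses import dataclass, field


@dataclass
class PromptEntry:
    name: str
    text: str
    tags: set[str] = field(default_factory=set)


class PromptLibrary:
    """A shared, searchable collection of prompts, keyed by name."""

    def __init__(self) -> None:
        self._entries: dict[str, PromptEntry] = {}

    def save(self, entry: PromptEntry) -> None:
        """Store the prompt, replacing any existing prompt with the same name."""
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        """Return names of prompts whose name, text, or tags match the term."""
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or term in e.text.lower()
            or term in {t.lower() for t in e.tags}
        )
```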
You’ve got your prototype AI application and it works for a demo, but is it ready to be rolled out commercially? Almost certainly not. The first version is usually between 20% and 50% reliable and still has ‘dumb modes of failure’ and ample room for quality improvement.
Your engineers will need a system to debug these cases and get it to a level of reliability that makes it suitable for your enterprise. For this, they’ll need a monitoring system, a systematic way to measure performance, and a way to A/B test which modifications worked and which didn’t.
We see that most businesses require at least 90% reliability for workflows where a ‘human-in-the-loop’ can approve answers and handle difficult edge cases, and more than 99% reliability for fully automated workflows.
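One way to make "reliability" measurable is to run each variant of the application against the same fixed set of test cases and compare pass rates with the thresholds above. The sketch below assumes exact-match grading, which is the simplest possible check; real evaluations often need fuzzier scoring. The function names are hypothetical.

```python
from typing import Callable


def pass_rate(app: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Fraction of test cases where the app's answer exactly matches the expected one."""
    if not cases:
        raise ValueError("need at least one test case")
    passed = sum(1 for question, expected in cases if app(question).strip() == expected)
    return passed / len(cases)


def meets_bar(rate: float, human_in_the_loop: bool) -> bool:
    """Apply the thresholds from the text: at least 90% with a human reviewer, above 99% without."""
    return rate >= 0.90 if human_in_the_loop else rate > 0.99


def ab_test(variant_a: Callable[[str], str], variant_b: Callable[[str], str],
            cases: list[tuple[str, str]]) -> str:
    """Run both variants on the same cases and name the one with the higher pass rate."""
    rate_a, rate_b = pass_rate(variant_a, cases), pass_rate(variant_b, cases)
    if rate_a == rate_b:
        return "tie"
    return "A" if rate_a > rate_b else "B"
```

Running both variants on identical cases is what makes the comparison fair: any difference in pass rate comes from the modification, not from the questions asked.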
Congratulations! You have developed a reliable LLM application and consistently get the output you want; now what? You are going to use it, of course!
For example, if you are using an LLM to analyze the transcript of a sales call, chances are that you are going to copy-paste parts of the analysis into your CRM (Customer Relationship Management software, for example, HubSpot). Wouldn’t it be great if you didn’t have to do that either? The true power of LLMs can be unlocked when they become invisible and are integrated with existing workflows.
The integrations between LLMs and other software often require a bit more attention than standard API integrations. This is because the model output needs to be reliably parsed in a particular way (often JSON) to pass through an API. Usually, this has to happen while passing through layers of authentication and carrying metadata along.
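The parsing step is where these integrations tend to break, because models often wrap the JSON in extra chatter. A defensive parser could look like the sketch below. The field names in `REQUIRED_FIELDS` are made up for this example and do not correspond to any real CRM's schema.

```python
import json

# Hypothetical fields the downstream API call would need.
REQUIRED_FIELDS = {"contact_name": str, "deal_stage": str, "summary": str}


def parse_model_output(raw: str) -> dict:
    """Extract the JSON object from a model reply and validate its fields."""
    # Models sometimes add text before or after the JSON, so slice out the object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    # Reject replies with missing or wrongly typed fields before calling any API.
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"field {name!r} is missing or not a {expected_type.__name__}")
    return data
```

Failing loudly here is deliberate: a rejected reply can be retried or sent to a human, whereas a malformed record written into the CRM is much harder to spot later.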
Engineers are powerful, but they are not gods. Even if they have crafted the perfect application, they cannot make the LLM read your mind.
Let’s say you have made a system that can answer questions about your sales data, and you ask it, ‘What were John’s deals in Q1?’. Did you mean:
- Q1 2024 or Q1 2023?
- John Doe or John Smith? Or did you mean Jon?
- Open deals or closed deals?
Specificity is the name of the game. While LLMs can feel like magic, they cannot perform miracles.
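To show how quickly these ambiguities pile up, here is a toy checker that flags the three problems from the example question. The rules are hand-written assumptions covering only this one case; it is an illustration of why specific questions matter, not a general solution.

```python
import re


def find_ambiguities(question: str, known_people: list[str]) -> list[str]:
    """Return clarifying questions for a sales query that is not specific enough."""
    issues = []
    # A quarter without a four-digit year could refer to any year.
    if re.search(r"\bQ[1-4]\b", question) and not re.search(r"\b(19|20)\d{2}\b", question):
        issues.append("Which year?")
    # A first name shared by several people does not identify one of them.
    for first_name in sorted({p.split()[0] for p in known_people}):
        matches = [p for p in known_people if p.split()[0] == first_name]
        mentioned = re.search(rf"\b{re.escape(first_name)}\b", question)
        full_name_given = any(p in question for p in matches)
        if mentioned and len(matches) > 1 and not full_name_given:
            issues.append(f"Which {first_name}: " + " or ".join(matches) + "?")
    # 'deals' without a status could mean open or closed deals.
    if re.search(r"\bdeals?\b", question, re.I) and not re.search(r"\b(open|closed)\b", question, re.I):
        issues.append("Open or closed deals?")
    return issues
```

A question such as "What were John Doe's closed deals in Q1 2024?" passes all three checks, which is exactly the level of specificity staff should be trained to aim for.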
Upon request, Query Vary runs clinics to educate staff on how LLMs work and help you identify suitable AI use cases for your organization.