Building with LLMs: A Guide to Selection and Evaluation
"How Difficult Is It to Choose the Right LLM Model for Your Application?"
What are Large Language Models (LLMs)?
Humans are good at creative work: writing poems, composing music, making art. Until recently, machines had little chance of competing with humans at creative tasks. Thanks to LLMs and their generative capabilities, machines are becoming good at creative work too. This branch of AI is called Generative AI, and it goes beyond traditional AI by generating something new altogether rather than analyzing something that already exists in your database.
LLMs are trained using deep learning techniques on large amounts of text data. They are computationally intensive because they optimize many parameters to capture complex underlying patterns accurately. LLMs learn entity relationships and semantics in language to generate human-like content across Natural Language Processing (NLP) tasks. Most LLMs are too large to run on a single computer and are provided as a service over an API or web interface. The hardware required depends on model size, training data size, and training duration, as well as the specific architecture.
Fine-tuning, though it comes last, is a significant aspect of working with LLMs, because no model out there is one-size-fits-all. Fine-tuning follows pre-training: once the model has learned general language patterns during pre-training, it is further trained on a smaller, domain-specific dataset for the particular use case. More often than not, choosing the wrong model for your use case means extensive fine-tuning and altering model weights to make it behave as needed. This is time-consuming and may still not deliver results for your purpose. Hence, picking the right foundation model is most important; even large models like GPT come with trade-offs.
Parameters to consider before selecting an LLM
1) Cost: There are two kinds of costs associated with LLMs: the cost to train and the cost of inference. The cost of calling an LLM is the inference cost. Unlike ordinary database queries, running LLMs requires a costly stack of GPUs. These GPUs are expensive to procure and maintain in terms of raw power and memory usage. Why does this matter?
These expenses affect the profitability of every business using LLMs and fall under subscription COGS (cost of goods sold). Since many suppliers are new to the industry and still subscale, they have yet to feel this pressure. However, these expenses will be a problem for established providers.
Inference cost, in turn, consists of three components: 1) prompt cost (proportional to the length of the prompt), 2) generation cost (proportional to the generation length), and 3) sometimes a fixed cost per query.
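The three components above can be sketched as a simple cost estimator. The per-token prices and the fixed per-query fee below are purely illustrative assumptions; check your provider's pricing page for real numbers.

```python
def estimate_query_cost(prompt_tokens, output_tokens,
                        prompt_price_per_1k, output_price_per_1k,
                        fixed_cost_per_query=0.0):
    """Estimate the cost of a single LLM API call.

    prompt cost     -- proportional to prompt length
    generation cost -- proportional to generation length
    fixed cost      -- a flat per-query fee some providers charge
    """
    prompt_cost = prompt_tokens / 1000 * prompt_price_per_1k
    generation_cost = output_tokens / 1000 * output_price_per_1k
    return prompt_cost + generation_cost + fixed_cost_per_query

# Example: 1,500 prompt tokens and 500 generated tokens at
# hypothetical rates of $0.01 and $0.03 per 1k tokens.
cost = estimate_query_cost(1500, 500, 0.01, 0.03)
print(f"${cost:.4f}")  # → $0.0300
```

Multiplying this per-query figure by expected daily traffic gives a quick first cut at how inference cost feeds into COGS.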
2) Tokens/second: the rate at which a model can process or generate tokens (words, subwords, or characters) within a given time frame. A model's response speed depends on this metric. A token limit can also be set beforehand when fine-tuning a model.
Why it matters:
Real-Time Responsiveness: In many applications, especially those that require real-time or near-real-time responses, the speed at which the model can process tokens is crucial.
Latency: Latency is the delay between sending an input to the model and receiving the output. Lower tokens per second generally correspond to higher latency.
Scalability/Concurrent Requests: As your application gains more users and experiences increased usage, the demand for the LLM model will also grow. Models with higher tokens per second rates can handle more requests concurrently, making them scalable for larger user bases. LLM orchestration frameworks provide a way to manage and control LLMs.
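The link between tokens/second and latency can be made concrete with a back-of-the-envelope model. The time-to-first-token value below is an assumed queueing/prefill delay, not a measured one; real values vary by provider and load.

```python
def estimate_latency(output_tokens, tokens_per_second,
                     time_to_first_token=0.5):
    """Rough end-to-end latency (seconds) for a generated response.

    time_to_first_token is an assumed prefill/queueing delay;
    the rest is generation time at the model's decode rate.
    """
    return time_to_first_token + output_tokens / tokens_per_second

# 300 generated tokens at two hypothetical decode rates:
slow = estimate_latency(300, 50)    # 0.5 + 6.0 = 6.5 s
fast = estimate_latency(300, 150)   # 0.5 + 2.0 = 2.5 s
```

Tripling throughput here cuts perceived latency by more than half, which is why tokens/second dominates the real-time responsiveness and scalability concerns above.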
3) Accuracy of output required: A larger model might be helpful if your application needs extremely high accuracy, but it has its own trade-offs. A smaller model can be more appropriate if you are targeting constrained devices or real-time processing. Even open-source models are proving more capable for specific use cases.
4) Customization and Fine-tuning: This could be a key element if your product has to understand a specific context. However, training large models with pre-defined weights might be complex, and models might behave differently than expected.
Earlier models simply grew in parameter count, but it is now considered better to train smaller models for longer on more data. This reduces model size and the cost of using the model at inference time. Open-source LLMs like Llama present a viable alternative by allowing LLMs tailored to specific needs to be built with targeted data. Additionally, the cost of employing open-source LLMs is significantly lower, they allow on-prem hosting for sensitive data, and open-source models fine-tuned with Reinforcement Learning from Human Feedback (RLHF) can rival OpenAI's GPT on specific tasks.
5) Ease of APIs and integrations: Well-designed APIs enable seamless connectivity, rapid deployment, and flexible adaptation of LLMs.
All these factors are tied up ultimately with the user experience.
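One way to tie the five parameters together is a simple weighted decision matrix. The candidate names, scores (1 to 5), and weights below are entirely made up for illustration; you would substitute your own priorities and benchmark results.

```python
# Hypothetical weights reflecting how much each parameter matters
# to a particular product; they must sum to 1.0.
WEIGHTS = {"cost": 0.3, "speed": 0.2, "accuracy": 0.3,
           "customization": 0.1, "integration": 0.1}

# Made-up 1-5 scores for two illustrative candidates.
candidates = {
    "large-hosted-model": {"cost": 2, "speed": 3, "accuracy": 5,
                           "customization": 2, "integration": 5},
    "small-open-model":   {"cost": 5, "speed": 5, "accuracy": 3,
                           "customization": 5, "integration": 3},
}

def score(model_scores, weights=WEIGHTS):
    """Weighted sum of a model's per-parameter scores."""
    return sum(weights[k] * model_scores[k] for k in weights)

best = max(candidates, key=lambda name: score(candidates[name]))
print(best)  # → small-open-model (4.2 vs 3.4 under these weights)
```

A user-experience-heavy product might shift weight toward speed and integration, flipping the outcome; the point is to make the trade-offs explicit rather than the numbers authoritative.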
Most use cases today based on text, code, audio, video, or image generation, in real time or otherwise, are only partially fulfilled by LLMs. There is also rarely an off-the-shelf stack you can use in your products. Hence, founders are building proprietary stacks or waiting for capable multimodal models. Nevertheless, the pace at which these models improve and the stack matures is remarkable.
What do you look at while selecting LLMs? What did I miss?
If you found this piece helpful or interesting, don't hesitate to share it with your network.