hiamitabha (Amitabha Banerjee)


The past year has seen a rapid pace of delivery of open-to-use Large Language Models (LLMs) for Generative AI. Simultaneously, a number of service providers have sprung up that offer cloud-based APIs for generative inference using these LLMs. The list is long: Together.AI, Fireworks.AI, Mosaic ML (Databricks), Anyscale, Perplexity Labs... Assuming that you are fine with the privacy and security issues that sending your data to a cloud-hosted solution brings, these providers can help you launch your Generative AI use case with little friction.

Given that all the cloud-based providers offer access to the same or similar open models, how do you distinguish between them? One way is to look at the performance vs. cost tradeoff. This post talks about performance.

To understand performance, one needs benchmarks and metrics. There are a number of benchmarks and leaderboards for comparing solutions. But before you start comparing data points flying across the internet, I suggest you consider the following.

1) Play with your own benchmark: You need to understand the performance for your use case, not for some random workload that an off-the-shelf benchmark seeks to represent and its authors dreamed up. You need to build and run your own benchmark. You could either understand and rework a publicly available benchmark like LLMPerf, or design your own. Contrary to conventional wisdom, writing a benchmark from the ground up isn't hard. I wrote a simple benchmark called GenAI-Bench to compare the response time of cloud providers. You can do it too.
2) Understand the metrics you care about. Is your use case a chatbot that needs a quick turnaround for the first word? Or do you need to parse a large number of documents and get a summary overnight? Correctly identifying what you care about will help you narrow down the best provider for your needs.
3) Keep generating data points. Cloud providers routinely update their software, so metrics change. Measure them periodically.
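
To make the three points above concrete, here is a minimal sketch of a home-grown latency benchmark in Python. It is not GenAI-Bench or LLMPerf, just an illustration. It assumes the provider's client can be wrapped as a function that takes a prompt and yields text chunks as they stream in; `fake_provider` is a hypothetical stand-in that simulates delays, and you would replace it with a wrapper around your provider's actual streaming client. Chunks are counted as a rough proxy for tokens. Each run records time to first chunk (what a chatbot cares about), total time and throughput (what a batch job cares about), and a timestamp so that runs repeated over time can be compared.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator, List


def measure_stream(stream: Iterable[str]) -> dict:
    """Time one streamed response: first-chunk latency, total time, chunks/s."""
    start = time.perf_counter()
    first_chunk_at = None
    n_chunks = 0
    for _ in stream:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()
        n_chunks += 1
    total = time.perf_counter() - start
    return {
        # None if the stream produced nothing at all
        "ttft_s": None if first_chunk_at is None else first_chunk_at - start,
        "total_s": total,
        "chunks": n_chunks,
        "chunks_per_s": n_chunks / total if total > 0 else 0.0,
    }


def run_benchmark(
    make_stream: Callable[[str], Iterable[str]],
    prompts: List[str],
    repeats: int = 3,
) -> List[dict]:
    """Run every prompt `repeats` times and timestamp each measurement."""
    results = []
    for prompt in prompts:
        for _ in range(repeats):
            result = measure_stream(make_stream(prompt))
            result["timestamp"] = time.time()  # lets you compare runs over time
            results.append(result)
    return results


def fake_provider(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a real streaming API client (assumption)."""
    time.sleep(0.05)  # simulated delay before the first chunk
    for word in ("benchmark", "your", "own", "workload"):
        yield word
        time.sleep(0.01)  # simulated delay between chunks


if __name__ == "__main__":
    # Use prompts that resemble your own workload, not a generic one.
    runs = run_benchmark(fake_provider, ["Summarize this document."], repeats=3)
    ttfts = [r["ttft_s"] for r in runs if r["ttft_s"] is not None]
    print(f"median time to first chunk: {statistics.median(ttfts):.3f} s")
    print(f"median total time: {statistics.median(r['total_s'] for r in runs):.3f} s")
```

Scheduling this script (for example, once a day) and appending the results to a file is one simple way to keep data points fresh as providers update their software.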