Mixtral 8x7B: What you need to know about Mistral AI’s latest model

We’re Neon, a cloud-native serverless Postgres that scales your AI apps to millions of users with pgvector. If you’ve been looking for a database for your RAG apps that adapts to your application loads, you’re in the right place. Give Neon a try, and let us know what you think. In this post, Raouf tells you what you need to know about Mixtral 8x7B, the new LLM by Mistral AI.

Mistral AI, the company behind the Mistral 7B model, has released its latest model: Mixtral 8x7B (Mixtral). The model supports a 32k-token context window, generates better code, and matches or outperforms GPT-3.5 on most standard benchmarks.

In this article, we’ll review the new text-generation and embedding models by Mistral AI.

Background

Mistral AI has emerged as a strong contender in the open-source large language model sphere with their Mistral 7B model, which outperforms existing models like Llama 2 (13B parameters) across multiple benchmarks.

In a previous comparative analysis, we concluded that, although impressive, the Mistral 7B instruct model optimized for chat needed some improvements before being seen as an alternative to the gpt-* models.

Mixtral might change all of that, as it pushes the frontier of open models. According to recent benchmarks, Mixtral matches or outperforms Llama 2 70B and GPT-3.5 on most tasks.

Benchmark	Llama 2 70B	GPT-3.5	Mixtral 8x7B
MMLU (MCQ in 57 subjects)	69.9%	70.0%	70.6%
HellaSwag (10-shot)	87.1%	85.5%	86.7%
ARC Challenge (25-shot)	85.1%	85.2%	85.8%
WinoGrande (5-shot)	83.2%	81.6%	81.2%
MBPP (pass@1)	49.8%	52.2%	60.7%
GSM-8K (5-shot)	53.6%	57.1%	58.4%
MT Bench (for Instruct Models)	6.86	8.32	8.30

Developing with Mixtral 8x7B Instruct

If you plan to fine-tune Mixtral and run your own inference, note that Mixtral requires much more RAM and more GPUs than Mistral 7B. While Mistral 7B works well on a 1-GPU instance with 24GB of RAM, Mixtral requires 64GB of RAM and 2 GPUs, which raises the cost by a factor of about 3.5 ($1.3/h vs. $4.5/h).
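To put those hourly rates in perspective, here is a small back-of-the-envelope calculation. It only uses the two prices quoted above; the 24-hour running time is an assumption picked for illustration, and real cloud pricing will vary by provider and region.

```python
# Hourly instance prices quoted in the paragraph above (USD per hour).
MISTRAL_7B_HOURLY = 1.3  # 24GB RAM, 1 GPU
MIXTRAL_HOURLY = 4.5     # 64GB RAM, 2 GPUs

# Assumption for illustration: the instance runs for a full day.
HOURS = 24


def cost_ratio(baseline_hourly: float, other_hourly: float) -> float:
    """Return how many times more expensive `other_hourly` is than the baseline."""
    return other_hourly / baseline_hourly


def running_cost(hourly: float, hours: float) -> float:
    """Return the total cost of running an instance for `hours` hours."""
    return hourly * hours


ratio = cost_ratio(MISTRAL_7B_HOURLY, MIXTRAL_HOURLY)
print(f"Mixtral costs about {ratio:.2f}x more per hour than Mistral 7B")
print(f"Mistral 7B for {HOURS}h: ${running_cost(MISTRAL_7B_HOURLY, HOURS):.2f}")
print(f"Mixtral for {HOURS}h:    ${running_cost(MIXTRAL_HOURLY, HOURS):.2f}")
```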

Luckily for developers, Mistral AI offers an API, currently in beta and invite-only. They also provide client libraries for Python and JavaScript.

The example below uses the Python library.

Prerequisite: install the `mistralai` client library using `pip`:

pip install mistralai

Then send a chat request:

import os

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

# Read the API key from the environment rather than hard-coding it
api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-tiny"

client = MistralClient(api_key=api_key)

messages = [
    ChatMessage(role="user", content="What is the elephant database?")
]

chat_response = client.chat(
    model=model,
    messages=messages
)

# Print the text of the first completion
print(chat_response.choices[0].message.content)

If you’re familiar with the OpenAI client library, you will notice the similarity between the two SDKs. The Mistral AI library follows the same chat-completion pattern, so it can serve as a near drop-in replacement and keeps migrations straightforward.

The Mistral AI API provides three text-generation models:

mistral-tiny based on Mistral-7B-v0.2
mistral-small based on Mixtral-8x7B-v0.1
mistral-medium based on an internal prototype model

Mistral-embed: The new embedding model

In addition to the text-generation models, Mistral AI’s API gives you access to `mistral-embed`, a BGE-large-like embedding model that produces 1024-dimension vectors. It is also accessible via the client library:

import os

from mistralai.client import MistralClient

api_key = os.environ["MISTRAL_API_KEY"]

client = MistralClient(api_key=api_key)

# Embed a batch of texts; the batch here holds a single string
embeddings_batch_response = client.embeddings(
    model="mistral-embed",
    input=["I love Postgres!"],
)

What does it mean for your AI apps?

The Mistral AI API gives developers an alternative to gpt-3.5-turbo with a similar interface and, in the case of the mistral-tiny and mistral-small models, a lower price.

Below is the price comparison per one million tokens.

Price (USD per 1M tokens)	mistral-tiny	mistral-small	mistral-medium	gpt-3.5-turbo-1106	gpt-3.5-turbo-instruct
Input	$0.15	$0.64	$2.68	$1.00	$1.50
Output	$0.45	$1.93	$8.06	$2.00	$2.00
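To see what these rates mean for a concrete workload, the sketch below estimates the bill for each model. The prices come straight from the table above; the workload of 10 million input tokens and 2 million output tokens is a made-up figure for illustration, and `estimate_cost` is a helper name we chose, not part of any SDK.

```python
# (input price, output price) in USD per one million tokens, from the table above.
PRICES_PER_MILLION = {
    "mistral-tiny": (0.15, 0.45),
    "mistral-small": (0.64, 1.93),
    "mistral-medium": (2.68, 8.06),
    "gpt-3.5-turbo-1106": (1.0, 2.0),
    "gpt-3.5-turbo-instruct": (1.5, 2.0),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens.
INPUT_TOKENS = 10_000_000
OUTPUT_TOKENS = 2_000_000

for model_name in PRICES_PER_MILLION:
    cost = estimate_cost(model_name, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model_name}: ${cost:.2f}")
```

With this workload, mistral-tiny and mistral-small come in below both gpt-3.5-turbo variants, while mistral-medium is the most expensive option.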

Keep in mind that mistral-embed produces 1024-dimension vectors. If you previously stored 1536-dimension ada v2 embeddings with pgvector, you will need to re-create the embeddings with mistral-embed before you can switch:

# `client` is the MistralClient created in the previous example
embeddings_batch_response = client.embeddings(
    model="mistral-embed",
    input=["text 1", "text 2", "text 3"],
)
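For a larger corpus, you will probably want to re-embed in batches and check the vector dimension before writing anything back to Postgres. The sketch below shows one possible way to do that. The `reembed` and `batched` helpers and the batch size are our own illustrative choices, and the embedder is passed in as a function so the example runs offline with a stand-in. With the real client, that function could wrap `client.embeddings(...)`, assuming the response exposes each vector as `.data[i].embedding`.

```python
from typing import Callable, Iterator, List, Sequence

MISTRAL_EMBED_DIM = 1024  # dimension of mistral-embed vectors, as stated above


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def reembed(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 32,  # hypothetical default, tune for your workload
    expected_dim: int = MISTRAL_EMBED_DIM,
) -> List[List[float]]:
    """Re-create embeddings batch by batch and validate their dimension."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        result = embed_batch(list(batch))
        if len(result) != len(batch):
            raise ValueError("embedder returned a different number of vectors")
        for vector in result:
            if len(vector) != expected_dim:
                raise ValueError(
                    f"expected {expected_dim} dimensions, got {len(vector)}"
                )
        vectors.extend(result)
    return vectors


def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    """Stand-in embedder so the sketch runs without an API key."""
    return [[float(len(text))] * MISTRAL_EMBED_DIM for text in batch]


vectors = reembed(["text 1", "text 2", "text 3"], fake_embed_batch, batch_size=2)
print(len(vectors), len(vectors[0]))
```

On the database side, a pgvector column declared as `vector(1536)` for ada v2 will not accept these vectors, so the new embeddings need a `vector(1024)` column.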

The mistral-embed model for text embedding is slightly more expensive than the text-embedding-ada-002 model.

	mistral-embed	ada v2
Input	$0.107	$0.1

Note that Mistral AI’s pricing is in euros, and the tables above show rates converted to USD.

Conclusion

The release of Mixtral 8x7B by Mistral AI represents a significant leap forward for open-source LLMs. With its enhanced capabilities like 32k token support, improved code generation, and competitive performance against gpt-3.5-turbo, Mixtral is poised to be a game-changer for developers and AI enthusiasts alike.

While the model’s resource requirements can be a barrier for some, the Mistral AI API and its Python and JavaScript client libraries offset those limitations.

The pricing of the Mistral AI API, particularly for the mistral-tiny and mistral-small models, presents a more cost-effective alternative to gpt-3.5-* models. This, along with the mistral-embed model for text embedding, makes Mixtral an attractive option for a wide range of AI apps and Retrieval Augmented Generation pipelines.

However, it’s worth noting that transitioning to Mixtral, especially for those who previously used models like ada v2 for embedding, may require some adjustments in terms of re-creating embeddings and accommodating the slightly higher cost of mistral-embed.

Overall, Mixtral 8x7B marks an exciting development in the AI field, offering powerful and efficient tools for a variety of applications. As Mistral AI continues to innovate and expand its offerings, it will undoubtedly play a crucial role in shaping the future of AI technology.
