haive.core.models.embeddings.base

Base Embedding Models Module.

This module provides the foundational abstractions for embedding models in the Haive framework. It includes a common base configuration class and one configuration class per embedding provider; each configuration instantiates a model that transforms text into high-dimensional vector representations for semantic search, clustering, and other NLP tasks.

Typical usage example:

>>> from haive.core.models.embeddings.base import create_embeddings, HuggingFaceEmbeddingConfig
>>>
>>> # Create a HuggingFace embedding model configuration
>>> config = HuggingFaceEmbeddingConfig(
...     model="sentence-transformers/all-MiniLM-L6-v2"
... )
>>>
>>> # Instantiate the embedding model
>>> embeddings = create_embeddings(config)
>>>
>>> # Use the model to embed documents or queries
>>> doc_embeddings = embeddings.embed_documents(["Text to embed"])
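Embedding vectors like these are usually compared by cosine similarity for semantic search. The sketch below assumes the instantiated model follows LangChain's `Embeddings` interface, where `embed_documents` returns one list of floats per text and `embed_query` returns a single list of floats; the 3-dimensional vectors are made-up stand-ins for real model output, and `cosine_similarity` and `top_k` are illustrative helpers, not part of haive.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    k: int = 1,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embed_documents(...) and embed_query(...) output.
doc_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_embedding = [0.9, 0.1, 0.0]
print(top_k(query_embedding, doc_embeddings, k=2))
```

Cosine similarity ignores vector length, so documents are ranked by direction only; some providers return pre-normalized vectors, in which case a plain dot product gives the same ranking.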

Classes

`AnyscaleEmbeddingConfig`	Configuration for Anyscale embedding models.
`AzureEmbeddingConfig`	Configuration for Azure OpenAI embedding models.
`BaseEmbeddingConfig`	Base configuration for embedding models.
`BedrockEmbeddingConfig`	Configuration for AWS Bedrock embedding models.
`CloudflareEmbeddingConfig`	Configuration for Cloudflare Workers AI embedding models.
`CohereEmbeddingConfig`	Configuration for Cohere embedding models.
`FastEmbedEmbeddingConfig`	Configuration for FastEmbed embedding models.
`HuggingFaceEmbeddingConfig`	Configuration for HuggingFace embedding models.
`JinaEmbeddingConfig`	Configuration for Jina AI embedding models.
`LlamaCppEmbeddingConfig`	Configuration for LlamaCpp local embedding models.
`MockTorch`	Mock torch module for documentation builds.
`OllamaEmbeddingConfig`	Configuration for Ollama embedding models.
`OpenAIEmbeddingConfig`	Configuration for OpenAI embedding models.
`SecureConfigMixin`	Mixin for securely handling API keys from environment variables.
`SentenceTransformerEmbeddingConfig`	Configuration for SentenceTransformer embedding models.
`VertexAIEmbeddingConfig`	Configuration for Google Vertex AI embedding models.
`VertexAIEmbeddings`	Mock VertexAI embeddings to avoid slow imports.
`VoyageAIEmbeddingConfig`	Configuration for Voyage AI embedding models.

Functions

`create_embeddings`(config)	Factory function to create embedding models from a configuration.
`get_huggingface_embeddings`()	Lazy import of HuggingFaceEmbeddings.

Module Contents

class haive.core.models.embeddings.base.AnyscaleEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Anyscale embedding models.

This class configures embedding models from Anyscale.

Parameters: data (Any)

provider: Set to EmbeddingProvider.ANYSCALE

model: The model name (defaults to thenlper/gte-large)

base_url: The base URL for the Anyscale API

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate an Anyscale embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to AnyscaleEmbeddings
Returns: The instantiated embedding model
Return type: AnyscaleEmbeddings

class haive.core.models.embeddings.base.AzureEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Azure OpenAI embedding models.

This class configures embedding models from Azure OpenAI services, supporting environment variable resolution for credentials.

Parameters: data (Any)

provider: Set to EmbeddingProvider.AZURE

model: The Azure deployment name for the embedding model

api_version: The Azure OpenAI API version to use

api_base: The Azure endpoint URL

api_type: The API type (typically “azure”)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

get_api_key()

Get the API key as a string.

Returns: The API key
Return type: str

instantiate(**kwargs)

Instantiate an Azure OpenAI embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to AzureOpenAIEmbeddings
Returns: The instantiated embedding model
Return type: AzureOpenAIEmbeddings

class haive.core.models.embeddings.base.BaseEmbeddingConfig(/, **data)

Bases: pydantic.BaseModel, SecureConfigMixin

Base configuration for embedding models.

This abstract base class defines the common interface for all embedding model configurations, ensuring consistent instantiation patterns across providers.

Parameters: data (Any)

provider: The embedding provider (e.g., Azure, HuggingFace)

model: The specific model identifier or name

api_key: The API key for the provider (if required)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

abstractmethod instantiate(**kwargs)

Instantiate the embedding model with the configuration.

Parameters: **kwargs – Additional keyword arguments to pass to the model constructor
Returns: The instantiated embedding model
Return type: Any
Raises: NotImplementedError – Must be implemented by subclasses
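A stdlib-only sketch of the contract `BaseEmbeddingConfig` describes: shared fields on an abstract base, plus an `instantiate` method that each provider overrides. It swaps pydantic for `dataclasses` and `abc` so it runs standalone; `SketchEmbeddingConfig`, `ToyEmbeddingConfig`, and `ToyEmbeddings` are hypothetical classes, not haive APIs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SketchEmbeddingConfig(ABC):
    """Hypothetical stand-in for BaseEmbeddingConfig (not the real class)."""

    provider: str
    model: str
    api_key: Optional[str] = None

    @abstractmethod
    def instantiate(self, **kwargs: Any) -> Any:
        """Build the provider's embedding object from this configuration."""
        raise NotImplementedError


class ToyEmbeddings:
    """Hypothetical model: maps each text to [length, vowel count]."""

    def __init__(self, model: str) -> None:
        self.model = model

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        vowels = sum(c in "aeiou" for c in text.lower())
        return [float(len(text)), float(vowels)]


@dataclass
class ToyEmbeddingConfig(SketchEmbeddingConfig):
    """Hypothetical provider config with its own defaults."""

    provider: str = "toy"
    model: str = "toy-v1"

    def instantiate(self, **kwargs: Any) -> ToyEmbeddings:
        # Config fields become constructor arguments; kwargs may override them.
        return ToyEmbeddings(model=kwargs.get("model", self.model))


embeddings = ToyEmbeddingConfig().instantiate()
print(embeddings.embed_query("hello"))
```

Because `SketchEmbeddingConfig` is abstract, constructing it directly raises `TypeError`; only subclasses that implement `instantiate` can be created.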

class haive.core.models.embeddings.base.BedrockEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for AWS Bedrock embedding models.

This class configures embedding models from the AWS Bedrock service.

Parameters: data (Any)

provider: Set to EmbeddingProvider.BEDROCK

model: The model ID (defaults to amazon.titan-embed-text-v1)

region: AWS region

credentials_profile_name: AWS credentials profile name

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate an AWS Bedrock embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to BedrockEmbeddings
Returns: The instantiated embedding model
Return type: BedrockEmbeddings

class haive.core.models.embeddings.base.CloudflareEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Cloudflare Workers AI embedding models.

This class configures embedding models from Cloudflare Workers AI.

Parameters: data (Any)

provider: Set to EmbeddingProvider.CLOUDFLARE

model: The model name (defaults to @cf/baai/bge-small-en-v1.5)

account_id: Cloudflare account ID

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a Cloudflare Workers AI embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to CloudflareWorkersAIEmbeddings
Returns: The instantiated embedding model
Return type: CloudflareWorkersAIEmbeddings

class haive.core.models.embeddings.base.CohereEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Cohere embedding models.

This class configures embedding models from Cohere services.

Parameters: data (Any)

provider: Set to EmbeddingProvider.COHERE

model: The Cohere model name for embeddings (defaults to embed-english-v3.0)

input_type: Type of input to be embedded (defaults to search_document)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a Cohere embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to CohereEmbeddings
Returns: The instantiated embedding model
Return type: CohereEmbeddings

class haive.core.models.embeddings.base.FastEmbedEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for FastEmbed embedding models.

This class configures FastEmbed models: lightweight, efficient embedding models that can run on CPU.

Parameters: data (Any)

provider: Set to EmbeddingProvider.FASTEMBED

model: The model name (defaults to BAAI/bge-small-en-v1.5)

max_length: Maximum sequence length

cache_folder: Where to cache the model files

use_cache: Whether to use embedding caching

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a FastEmbed embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to FastEmbedEmbeddings
Returns: The instantiated embedding model
Return type: FastEmbedEmbeddings
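Several configurations on this page (FastEmbed, HuggingFace, SentenceTransformer) expose a `use_cache` flag, but the page does not say how that cache works. As one possible approach, a wrapper can memoize vectors per input text so that repeated texts reach the model only once. `CachedEmbeddings` and `fake_embed` are hypothetical, and the cache here is an unbounded in-memory dict.

```python
from typing import Callable, Dict, List


class CachedEmbeddings:
    """Hypothetical wrapper that memoizes an embedding function per input text."""

    def __init__(self, embed_fn: Callable[[List[str]], List[List[float]]]) -> None:
        self._embed_fn = embed_fn
        self._cache: Dict[str, List[float]] = {}

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # dict.fromkeys de-duplicates while keeping order; only unseen texts
        # are sent to the underlying model, and each of them only once.
        missing = [t for t in dict.fromkeys(texts) if t not in self._cache]
        if missing:
            for text, vector in zip(missing, self._embed_fn(missing)):
                self._cache[text] = vector
        return [self._cache[t] for t in texts]


calls: List[List[str]] = []


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Toy model: records each batch it receives and returns [len(text)]."""
    calls.append(list(texts))
    return [[float(len(t))] for t in texts]


cached = CachedEmbeddings(fake_embed)
cached.embed_documents(["alpha", "beta", "alpha"])
cached.embed_documents(["beta", "gamma"])
print(calls)  # [['alpha', 'beta'], ['gamma']]
```

A production cache would usually bound its size or persist to disk and key on the model name as well as the text, so that switching models does not return stale vectors.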

class haive.core.models.embeddings.base.HuggingFaceEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for HuggingFace embedding models.

This class configures embedding models from HuggingFace’s model hub, with support for local caching and hardware acceleration.

Parameters: data (Any)

provider: Set to EmbeddingProvider.HUGGINGFACE

model: The HuggingFace model ID (defaults to all-MiniLM-L6-v2)

model_kwargs: Additional keyword arguments for model instantiation

encode_kwargs: Additional keyword arguments for encoding

query_encode_kwargs: Additional keyword arguments for query encoding

multi_process: Whether to use multi-processing for encoding

cache_folder: Where to cache the model files

show_progress: Whether to show progress bars

use_cache: Whether to use embedding caching

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a HuggingFace embedding model.

If initialization fails, this method handles the error and cleans up GPU memory.

Parameters: **kwargs – Additional keyword arguments to pass to HuggingFaceEmbeddings
Returns: The instantiated embedding model
Return type: HuggingFaceEmbeddings
Raises: Exception – If model instantiation fails after cleanup attempt
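The description above says a failed HuggingFace initialization triggers GPU memory cleanup, and that an exception is raised if instantiation still fails. The exact sequence is not documented on this page; the sketch assumes one plausible reading (clean up, retry once, then re-raise). `build_with_cleanup` and `default_cleanup` are hypothetical helpers; with PyTorch installed, a cleanup hook could also call `torch.cuda.empty_cache()`.

```python
import gc
from typing import Callable, TypeVar

T = TypeVar("T")


def build_with_cleanup(
    factory: Callable[[], T],
    cleanup: Callable[[], None],
    retries: int = 1,
) -> T:
    """Call factory(); on failure run cleanup() and retry, re-raising the last error."""
    attempt = 0
    while True:
        try:
            return factory()
        except Exception:
            cleanup()  # e.g. release cached GPU memory before the next attempt
            if attempt >= retries:
                raise  # out of retries: propagate the original exception
            attempt += 1


def default_cleanup() -> None:
    # Hypothetical hook: collect unreachable objects so their memory can be freed.
    # With torch installed one might also call torch.cuda.empty_cache() here.
    gc.collect()
```

Retrying only helps when the failure is transient (for example, memory held by a previous model); a bad model ID fails the same way on every attempt, which is why the last error is re-raised unchanged.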

class haive.core.models.embeddings.base.JinaEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Jina AI embedding models.

This class configures embedding models from Jina AI.

Parameters: data (Any)

provider: Set to EmbeddingProvider.JINA

model: The model name (defaults to jina-embeddings-v2-base-en)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a Jina AI embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to JinaEmbeddings
Returns: The instantiated embedding model
Return type: JinaEmbeddings

class haive.core.models.embeddings.base.LlamaCppEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for LlamaCpp local embedding models.

This class configures embedding models using LlamaCpp for local execution.

Parameters: data (Any)

provider: Set to EmbeddingProvider.LLAMACPP

model: Required model name parameter (for compatibility with BaseEmbeddingConfig)

model_path: Path to the model file

n_ctx: Context size for the model

n_batch: Batch size for inference

n_gpu_layers: Number of layers to offload to GPU

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a LlamaCpp embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to LlamaCppEmbeddings
Returns: The instantiated embedding model
Return type: LlamaCppEmbeddings

class haive.core.models.embeddings.base.MockTorch

Mock torch module for documentation builds.

class haive.core.models.embeddings.base.OllamaEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Ollama embedding models.

This class configures embedding models from Ollama, which runs locally and doesn’t require an API key.

Parameters: data (Any)

provider: Set to EmbeddingProvider.OLLAMA

model: The Ollama model name (defaults to nomic-embed-text)

base_url: The base URL for the Ollama server

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate an Ollama embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to OllamaEmbeddings
Returns: The instantiated embedding model
Return type: OllamaEmbeddings

class haive.core.models.embeddings.base.OpenAIEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for OpenAI embedding models.

This class configures embedding models from OpenAI services, supporting multiple model types and configurations.

Parameters: data (Any)

provider: Set to EmbeddingProvider.OPENAI

model: The OpenAI model name for embeddings (defaults to text-embedding-3-small)

dimensions: Output dimensions for the embedding vectors

show_progress_bar: Whether to show progress bars during embedding

chunk_size: Batch size for embedding operations

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate an OpenAI embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to OpenAIEmbeddings
Returns: The instantiated embedding model
Return type: OpenAIEmbeddings
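`chunk_size` caps how many texts go into each embedding request. The sketch assumes the provider accepts a list of texts per call and shows only count-based batching; `embed_in_chunks` and `fake_batch` are hypothetical, and the actual `OpenAIEmbeddings` client may split inputs differently (for example by token count).

```python
from typing import Callable, List


def embed_in_chunks(
    texts: List[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    chunk_size: int,
) -> List[List[float]]:
    """Embed texts in consecutive batches of at most chunk_size, preserving order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), chunk_size):
        batch = texts[start : start + chunk_size]
        vectors.extend(embed_batch(batch))  # one request per batch
    return vectors


batch_sizes: List[int] = []


def fake_batch(batch: List[str]) -> List[List[float]]:
    """Toy provider call: records the batch size and returns [len(text)] per text."""
    batch_sizes.append(len(batch))
    return [[float(len(t))] for t in batch]


result = embed_in_chunks(["a", "bb", "ccc", "dddd", "eeeee"], fake_batch, chunk_size=2)
print(batch_sizes)  # [2, 2, 1]
```

Larger chunks mean fewer requests but bigger payloads; the right value depends on the provider's per-request limits.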

class haive.core.models.embeddings.base.SecureConfigMixin

Mixin for securely handling API keys from environment variables.

This mixin provides methods for securely resolving API keys from explicitly provided values or, as a fallback, from environment variables.

classmethod resolve_api_key(v, info)

Resolve the API key from the provided value or from environment variables.

Parameters:
v – The provided API key value
info (pydantic.ValidationInfo) – ValidationInfo containing field data
Returns: The resolved API key as a SecretStr
Return type: SecretStr
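The resolution order described above (an explicitly provided key first, then an environment variable) can be sketched without pydantic. `Secret` is a minimal stand-in for pydantic's `SecretStr` (it keeps the `get_secret_value()` method name), and the provider-to-variable mapping in `ENV_VARS` is a hypothetical example, since this page does not list the variable names the mixin actually reads.

```python
import os
from typing import Mapping, Optional


class Secret:
    """Minimal stand-in for pydantic's SecretStr: hides the value in repr()."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "Secret('**********')"


# Hypothetical provider -> environment variable mapping, for illustration only.
ENV_VARS = {"openai": "OPENAI_API_KEY", "cohere": "COHERE_API_KEY"}


def resolve_api_key(
    value: Optional[str],
    provider: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[Secret]:
    """Prefer an explicit key; otherwise fall back to the provider's env variable."""
    if value:
        return Secret(value)
    env_name = ENV_VARS.get(provider)
    env_value = environ.get(env_name) if env_name else None
    return Secret(env_value) if env_value else None


key = resolve_api_key(None, "openai", environ={"OPENAI_API_KEY": "sk-test"})
print(key)  # Secret('**********')
```

Wrapping the key means it does not leak into logs or tracebacks through `repr()`; an accessor in the spirit of `AzureEmbeddingConfig.get_api_key()` would unwrap it with `get_secret_value()` only when the client is built.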

class haive.core.models.embeddings.base.SentenceTransformerEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for SentenceTransformer embedding models.

This class configures embedding models from the SentenceTransformers library, which provides efficient and accurate sentence and text embeddings.

Parameters: data (Any)

provider: Set to EmbeddingProvider.SENTENCE_TRANSFORMERS

model: The model name or path (defaults to all-MiniLM-L6-v2)

cache_folder: Where to cache the model files

use_cache: Whether to use embedding caching

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a SentenceTransformer embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to SentenceTransformerEmbeddings
Returns: The instantiated embedding model
Return type: SentenceTransformerEmbeddings

class haive.core.models.embeddings.base.VertexAIEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Google Vertex AI embedding models.

This class configures embedding models from Google Vertex AI.

Parameters: data (Any)

provider: Set to EmbeddingProvider.VERTEXAI

model: The model name (defaults to textembedding-gecko@latest)

project: Google Cloud project ID

location: Google Cloud region

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a Google Vertex AI embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to VertexAIEmbeddings
Returns: The instantiated embedding model
Return type: VertexAIEmbeddings

class haive.core.models.embeddings.base.VertexAIEmbeddings(*args, **kwargs)

Mock VertexAI embeddings to avoid slow imports.

class haive.core.models.embeddings.base.VoyageAIEmbeddingConfig(/, **data)

Bases: BaseEmbeddingConfig

Configuration for Voyage AI embedding models.

This class configures embedding models from Voyage AI.

Parameters: data (Any)

provider: Set to EmbeddingProvider.VOYAGEAI

model: The model name (defaults to voyage-2)

voyage_api_url: The API URL for Voyage AI

voyage_api_version: The API version for Voyage AI

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)

Instantiate a Voyage AI embedding model.

Parameters: **kwargs – Additional keyword arguments to pass to VoyageEmbeddings
Returns: The instantiated embedding model
Return type: VoyageEmbeddings

haive.core.models.embeddings.base.create_embeddings(config)

Factory function to create embedding models from a configuration.

This function simplifies the instantiation of embedding models by delegating to the appropriate configuration class.

Parameters: config (BaseEmbeddingConfig) – The embedding model configuration
Returns: The instantiated embedding model
Return type: Any

Example:

>>> config = HuggingFaceEmbeddingConfig(model="sentence-transformers/all-MiniLM-L6-v2")
>>> embeddings = create_embeddings(config)
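`create_embeddings` hands construction off to the configuration object. A sketch of that delegation, plus a hypothetical registry that picks a configuration class from a `provider` string (for example when settings are loaded from a file), could look like this; `EchoConfig`, `CONFIG_REGISTRY`, `create_embeddings_sketch`, and `create_from_dict` are illustrative names, not haive APIs.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass
class EchoConfig:
    """Hypothetical config whose instantiate() returns a descriptive string."""

    model: str = "echo-small"

    def instantiate(self, **kwargs: Any) -> str:
        return f"EchoEmbeddings(model={self.model!r})"


# Hypothetical registry mapping provider names to configuration classes.
CONFIG_REGISTRY: Dict[str, Type[Any]] = {"echo": EchoConfig}


def create_embeddings_sketch(config: Any, **kwargs: Any) -> Any:
    """Delegate construction to the configuration's own instantiate() method."""
    if not callable(getattr(config, "instantiate", None)):
        raise TypeError(f"{type(config).__name__} has no instantiate() method")
    return config.instantiate(**kwargs)


def create_from_dict(spec: Dict[str, Any]) -> Any:
    """Build a config from {'provider': name, **fields}, then create the model."""
    fields = dict(spec)  # copy so the caller's dict is not mutated
    provider = fields.pop("provider")
    config_cls = CONFIG_REGISTRY.get(provider)
    if config_cls is None:
        raise ValueError(f"unknown embedding provider: {provider!r}")
    return create_embeddings_sketch(config_cls(**fields))


print(create_from_dict({"provider": "echo", "model": "echo-large"}))
# EchoEmbeddings(model='echo-large')
```

Keeping provider-specific logic inside each configuration's `instantiate` means the factory never needs a provider-by-provider `if` chain; adding a provider only adds a class (and, in this sketch, a registry entry).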

haive.core.models.embeddings.base.get_huggingface_embeddings()

Lazy import of HuggingFaceEmbeddings.
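Lazy importing defers an expensive import until the first call. A generic sketch with `importlib` and `functools.lru_cache` follows; `lazy_attr` is a hypothetical helper, and the demo loads a stdlib class so it runs anywhere. This page does not state which package `get_huggingface_embeddings` actually imports from.

```python
import importlib
from functools import lru_cache
from typing import Any


@lru_cache(maxsize=None)
def lazy_attr(module_name: str, attr_name: str) -> Any:
    """Import module_name on first call and return one of its attributes.

    lru_cache makes later calls with the same arguments return the cached
    object directly, skipping the import machinery entirely.
    """
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Defining lazy_attr imports nothing; the import happens on this first call.
OrderedDict = lazy_attr("collections", "OrderedDict")
print(OrderedDict.__name__)  # OrderedDict
```

The trade-off is that a missing dependency surfaces at first use rather than at module import time, so callers should expect an `ImportError` from the lazy call site.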