haive.core.models.embeddings.base

Base Embedding Models Module.

This module provides the foundational abstractions for embedding models in the Haive framework. It includes base classes and implementations for different embedding providers that transform text into high-dimensional vector representations for use in semantic search, clustering, and other NLP tasks.

Typical usage example:

>>> from haive.core.models.embeddings.base import create_embeddings, HuggingFaceEmbeddingConfig
>>>
>>> # Create a HuggingFace embedding model configuration
>>> config = HuggingFaceEmbeddingConfig(
...     model="sentence-transformers/all-MiniLM-L6-v2"
... )
>>>
>>> # Instantiate the embeddings model
>>> embeddings = create_embeddings(config)
>>>
>>> # Use the model to embed documents or queries
>>> doc_embeddings = embeddings.embed_documents(["Text to embed"])
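For semantic search, embedding vectors are usually compared by cosine similarity. The following is a minimal sketch, assuming the vectors are plain lists of floats. The helper names `cosine_similarity` and `rank_documents` are illustrative and not part of this module, and the toy vectors stand in for real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices sorted from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
ranking = rank_documents(query, docs)
```

In practice the query and document vectors would come from the instantiated embeddings model rather than being written by hand.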

Classes

AnyscaleEmbeddingConfig

Configuration for Anyscale embedding models.

AzureEmbeddingConfig

Configuration for Azure OpenAI embedding models.

BaseEmbeddingConfig

Base configuration for embedding models.

BedrockEmbeddingConfig

Configuration for AWS Bedrock embedding models.

CloudflareEmbeddingConfig

Configuration for Cloudflare Workers AI embedding models.

CohereEmbeddingConfig

Configuration for Cohere embedding models.

FastEmbedEmbeddingConfig

Configuration for FastEmbed embedding models.

HuggingFaceEmbeddingConfig

Configuration for HuggingFace embedding models.

JinaEmbeddingConfig

Configuration for Jina AI embedding models.

LlamaCppEmbeddingConfig

Configuration for LlamaCpp local embedding models.

MockTorch

Mock torch module for documentation builds.

OllamaEmbeddingConfig

Configuration for Ollama embedding models.

OpenAIEmbeddingConfig

Configuration for OpenAI embedding models.

SecureConfigMixin

Mixin for securely handling API keys from environment variables.

SentenceTransformerEmbeddingConfig

Configuration for SentenceTransformer embedding models.

VertexAIEmbeddingConfig

Configuration for Google Vertex AI embedding models.

VertexAIEmbeddings

Mock VertexAI embeddings to avoid slow imports.

VoyageAIEmbeddingConfig

Configuration for Voyage AI embedding models.

Functions

create_embeddings(config)

Factory function to create embedding models from a configuration.

get_huggingface_embeddings()

Lazy import of HuggingFaceEmbeddings.

Module Contents

class haive.core.models.embeddings.base.AnyscaleEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for Anyscale embedding models.

This class configures embedding models from Anyscale.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.ANYSCALE

model

The model name (defaults to thenlper/gte-large)

base_url

The base URL for the Anyscale API

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate an Anyscale embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to AnyscaleEmbeddings

Returns:

The instantiated embedding model

Return type:

AnyscaleEmbeddings

class haive.core.models.embeddings.base.AzureEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for Azure OpenAI embedding models.

This class configures embedding models from Azure OpenAI services, supporting environment variable resolution for credentials.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.AZURE

model

The Azure deployment name for the embedding model

api_version

The Azure OpenAI API version to use

api_base

The Azure endpoint URL

api_type

The API type (typically “azure”)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

get_api_key()[source]

Get the API key as a string.

Returns:

The API key

Return type:

str

instantiate(**kwargs)[source]

Instantiate an Azure OpenAI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to AzureOpenAIEmbeddings

Returns:

The instantiated embedding model

Return type:

AzureOpenAIEmbeddings

class haive.core.models.embeddings.base.BaseEmbeddingConfig(/, **data)[source]

Bases: pydantic.BaseModel, SecureConfigMixin

Base configuration for embedding models.

This abstract base class defines the common interface for all embedding model configurations, ensuring consistent instantiation patterns across providers.

Parameters:

data (Any)

provider

The embedding provider (e.g., Azure, HuggingFace)

model

The specific model identifier or name

api_key

The API key for the provider (if required)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

abstractmethod instantiate(**kwargs)[source]

Instantiate the embedding model with the configuration.

Parameters:

**kwargs – Additional keyword arguments to pass to the model constructor

Returns:

The instantiated embedding model

Return type:

Any

Raises:

NotImplementedError – Must be implemented by subclasses
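The contract described above can be illustrated with a simplified, stdlib-only analogue. This is a sketch, not the real class: `SketchBaseConfig`, `SketchFakeConfig`, `FakeEmbeddings`, and the `batch_size` field are all hypothetical, and the rule that call-time keyword arguments override config fields is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Any


class FakeEmbeddings:
    """Hypothetical stand-in for a provider's embedding client."""

    def __init__(self, model: str, **options: Any) -> None:
        self.model = model
        self.options = options


@dataclass
class SketchBaseConfig:
    """Simplified analogue of BaseEmbeddingConfig (not the real class)."""

    model: str

    def instantiate(self, **kwargs: Any) -> Any:
        raise NotImplementedError("Must be implemented by subclasses")


@dataclass
class SketchFakeConfig(SketchBaseConfig):
    """Hypothetical provider config that builds a FakeEmbeddings."""

    batch_size: int = 32

    def instantiate(self, **kwargs: Any) -> FakeEmbeddings:
        # Assumption: call-time kwargs take precedence over config fields.
        params = {"batch_size": self.batch_size, **kwargs}
        return FakeEmbeddings(model=self.model, **params)


emb = SketchFakeConfig(model="demo-model").instantiate(batch_size=8, timeout=5)
```

Each provider config in this module follows the same shape: fields describe the model, and `instantiate` turns them into a concrete embeddings object.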

class haive.core.models.embeddings.base.BedrockEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for AWS Bedrock embedding models.

This class configures embedding models from AWS Bedrock service.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.BEDROCK

model

The model ID (defaults to amazon.titan-embed-text-v1)

region

AWS region

credentials_profile_name

AWS credentials profile name

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate an AWS Bedrock embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to BedrockEmbeddings

Returns:

The instantiated embedding model

Return type:

BedrockEmbeddings

class haive.core.models.embeddings.base.CloudflareEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for Cloudflare Workers AI embedding models.

This class configures embedding models from Cloudflare Workers AI.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.CLOUDFLARE

model

The model name (defaults to @cf/baai/bge-small-en-v1.5)

account_id

Cloudflare account ID

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate a Cloudflare Workers AI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to CloudflareWorkersAIEmbeddings

Returns:

The instantiated embedding model

Return type:

CloudflareWorkersAIEmbeddings

class haive.core.models.embeddings.base.CohereEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for Cohere embedding models.

This class configures embedding models from Cohere services.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.COHERE

model

The Cohere model name for embeddings (defaults to embed-english-v3.0)

input_type

Type of input to be embedded (defaults to search_document)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate a Cohere embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to CohereEmbeddings

Returns:

The instantiated embedding model

Return type:

CohereEmbeddings

class haive.core.models.embeddings.base.FastEmbedEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for FastEmbed embedding models.

This class configures FastEmbed models, which are lightweight and efficient embeddings that can run on CPU.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.FASTEMBED

model

The model name (defaults to BAAI/bge-small-en-v1.5)

max_length

Maximum sequence length

cache_folder

Where to cache the model files

use_cache

Whether to use embedding caching

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate a FastEmbed embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to FastEmbedEmbeddings

Returns:

The instantiated embedding model

Return type:

FastEmbedEmbeddings

class haive.core.models.embeddings.base.HuggingFaceEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for HuggingFace embedding models.

This class configures embedding models from HuggingFace’s model hub, with support for local caching and hardware acceleration.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.HUGGINGFACE

model

The HuggingFace model ID (defaults to all-MiniLM-L6-v2)

model_kwargs

Additional keyword arguments for model instantiation

encode_kwargs

Additional keyword arguments for encoding

query_encode_kwargs

Additional keyword arguments for query encoding

multi_process

Whether to use multi-processing for encoding

cache_folder

Where to cache the model files

show_progress

Whether to show progress bars

use_cache

Whether to use embedding caching

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate a HuggingFace embedding model.

This method includes error handling and GPU memory cleanup in case of initialization failures.

Parameters:

**kwargs – Additional keyword arguments to pass to HuggingFaceEmbeddings

Returns:

The instantiated embedding model

Return type:

HuggingFaceEmbeddings

Raises:

Exception – If model instantiation fails after cleanup attempt
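The error handling described for this method can be pictured as a "try, clean up, try once more" pattern. The sketch below is an assumption about the general shape, not the actual implementation: `build_with_cleanup` and `free_accelerator_memory` are hypothetical names, and whether the real method retries exactly once is not stated here.

```python
import gc
from typing import Callable, TypeVar

T = TypeVar("T")


def free_accelerator_memory() -> None:
    """Run garbage collection and, if PyTorch is importable, clear its CUDA cache."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return  # no PyTorch installed, so there is no CUDA cache to clear
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def build_with_cleanup(builder: Callable[[], T]) -> T:
    """Call builder(); on failure, free memory and try exactly once more."""
    try:
        return builder()
    except Exception:
        free_accelerator_memory()
        # A second failure propagates to the caller unchanged.
        return builder()
```

A caller would pass a zero-argument callable that constructs the model, so the retry logic stays independent of any particular provider.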

class haive.core.models.embeddings.base.JinaEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for Jina AI embedding models.

This class configures embedding models from Jina AI.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.JINA

model

The model name (defaults to jina-embeddings-v2-base-en)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate a Jina AI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to JinaEmbeddings

Returns:

The instantiated embedding model

Return type:

JinaEmbeddings

class haive.core.models.embeddings.base.LlamaCppEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for LlamaCpp local embedding models.

This class configures embedding models using LlamaCpp for local execution.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.LLAMACPP

model

Required model name parameter (for compatibility with BaseEmbeddingConfig)

model_path

Path to the model file

n_ctx

Context size for the model

n_batch

Batch size for inference

n_gpu_layers

Number of layers to offload to GPU

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate a LlamaCpp embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to LlamaCppEmbeddings

Returns:

The instantiated embedding model

Return type:

LlamaCppEmbeddings

class haive.core.models.embeddings.base.MockTorch[source]

Mock torch module for documentation builds.

class haive.core.models.embeddings.base.OllamaEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for Ollama embedding models.

This class configures embedding models from Ollama, which runs locally and doesn’t require an API key.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.OLLAMA

model

The Ollama model name (defaults to nomic-embed-text)

base_url

The base URL for the Ollama server

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate an Ollama embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to OllamaEmbeddings

Returns:

The instantiated embedding model

Return type:

OllamaEmbeddings

class haive.core.models.embeddings.base.OpenAIEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for OpenAI embedding models.

This class configures embedding models from OpenAI services, supporting multiple model types and configurations.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.OPENAI

model

The OpenAI model name for embeddings (defaults to text-embedding-3-small)

dimensions

Output dimensions for the embedding vectors

show_progress_bar

Whether to show progress bars during embedding

chunk_size

Batch size for embedding operations

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate an OpenAI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to OpenAIEmbeddings

Returns:

The instantiated embedding model

Return type:

OpenAIEmbeddings
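To make the role of a batch size such as chunk_size concrete, the sketch below splits a list of texts into fixed-size batches. It only illustrates the idea of batching. `iter_chunks` is a hypothetical helper, and how the underlying client actually uses chunk_size is not described in this reference.

```python
from typing import Iterator, Sequence


def iter_chunks(texts: Sequence[str], chunk_size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most chunk_size texts."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(texts), chunk_size):
        yield list(texts[start:start + chunk_size])


# Five texts with a batch size of two give batches of 2, 2, and 1.
batches = list(iter_chunks(["a", "b", "c", "d", "e"], chunk_size=2))
```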

class haive.core.models.embeddings.base.SecureConfigMixin[source]

Mixin for securely handling API keys from environment variables.

This mixin provides methods for securely resolving API keys from environment variables or explicitly provided values, with appropriate fallbacks.

classmethod resolve_api_key(v, info)[source]

Resolve API key from provided value or environment variables.

Parameters:
  • v – The provided API key value

  • info (pydantic.ValidationInfo) – ValidationInfo containing field data

Returns:

The resolved API key as a SecretStr

Return type:

SecretStr
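The resolution order described above (explicit value first, environment variable as fallback) might look roughly like the following. This is a stdlib-only sketch under stated assumptions: `resolve_key` and `MaskedSecret` are hypothetical, `MaskedSecret` merely imitates the masking behaviour of pydantic's SecretStr, and the `<PROVIDER>_API_KEY` naming scheme is an assumption rather than something this page documents.

```python
import os
from typing import Mapping, Optional


class MaskedSecret:
    """Tiny stand-in for pydantic's SecretStr: hides the value when printed."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "MaskedSecret('**********')"


def resolve_key(
    value: Optional[str],
    provider: str,
    env: Mapping[str, str] = os.environ,
) -> MaskedSecret:
    """Prefer an explicit key; otherwise fall back to the environment."""
    if value:
        return MaskedSecret(value)
    # Assumption: the variable is named <PROVIDER>_API_KEY.
    return MaskedSecret(env.get(f"{provider.upper()}_API_KEY", ""))
```

Passing the environment as a mapping keeps the function easy to test without touching real credentials.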

class haive.core.models.embeddings.base.SentenceTransformerEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for SentenceTransformer embedding models.

This class configures embedding models from SentenceTransformers library, which provides efficient and accurate sentence and text embeddings.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.SENTENCE_TRANSFORMERS

model

The model name or path (defaults to all-MiniLM-L6-v2)

cache_folder

Where to cache the model files

use_cache

Whether to use embedding caching

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate a SentenceTransformer embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to SentenceTransformerEmbeddings

Returns:

The instantiated embedding model

Return type:

SentenceTransformerEmbeddings

class haive.core.models.embeddings.base.VertexAIEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for Google Vertex AI embedding models.

This class configures embedding models from Google Vertex AI.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.VERTEXAI

model

The model name (defaults to textembedding-gecko@latest)

project

Google Cloud project ID

location

Google Cloud region

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate a Google Vertex AI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to VertexAIEmbeddings

Returns:

The instantiated embedding model

Return type:

VertexAIEmbeddings

class haive.core.models.embeddings.base.VertexAIEmbeddings(*args, **kwargs)[source]

Mock VertexAI embeddings to avoid slow imports.

class haive.core.models.embeddings.base.VoyageAIEmbeddingConfig(/, **data)[source]

Bases: BaseEmbeddingConfig

Configuration for Voyage AI embedding models.

This class configures embedding models from Voyage AI.

Parameters:

data (Any)

provider

Set to EmbeddingProvider.VOYAGEAI

model

The model name (defaults to voyage-2)

voyage_api_url

The API URL for Voyage AI

voyage_api_version

The API version for Voyage AI

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

instantiate(**kwargs)[source]

Instantiate a Voyage AI embedding model.

Parameters:

**kwargs – Additional keyword arguments to pass to VoyageEmbeddings

Returns:

The instantiated embedding model

Return type:

VoyageEmbeddings

haive.core.models.embeddings.base.create_embeddings(config)[source]

Factory function to create embedding models from a configuration.

This function simplifies the instantiation of embedding models by delegating to the appropriate configuration class.

Parameters:

config (BaseEmbeddingConfig) – The embedding model configuration

Returns:

The instantiated embedding model

Return type:

Any

Example:

>>> config = HuggingFaceEmbeddingConfig(model="sentence-transformers/all-MiniLM-L6-v2")
>>> embeddings = create_embeddings(config)
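The delegation idea behind this factory can be sketched without the library. Here `create_embeddings_sketch`, `LocalConfig`, and `RemoteConfig` are hypothetical, and the dictionaries they return stand in for real embedding models. The point is only that calling code never branches on the provider.

```python
from typing import Any, Protocol


class SupportsInstantiate(Protocol):
    """Anything exposing an instantiate(**kwargs) method."""

    def instantiate(self, **kwargs: Any) -> Any: ...


def create_embeddings_sketch(config: SupportsInstantiate) -> Any:
    """Delegate construction to the config object itself."""
    return config.instantiate()


class LocalConfig:
    """Hypothetical config for a locally hosted model."""

    def instantiate(self, **kwargs: Any) -> Any:
        return {"backend": "local"}


class RemoteConfig:
    """Hypothetical config for a hosted API."""

    def instantiate(self, **kwargs: Any) -> Any:
        return {"backend": "remote"}


# The same call works for every config type.
models = [create_embeddings_sketch(c) for c in (LocalConfig(), RemoteConfig())]
```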

haive.core.models.embeddings.base.get_huggingface_embeddings()[source]

Lazy import of HuggingFaceEmbeddings.
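A lazy import defers a costly import until the first call instead of paying for it when the module loads. The body of this function is not shown on this page, so the sketch below only illustrates the general pattern, using the standard-library `decimal` module as a stand-in for a heavy dependency. `get_decimal_class` is a hypothetical name.

```python
def get_decimal_class():
    """Return the Decimal class, importing its module only when first called."""
    # The import runs on the first call. Later calls reuse the module that
    # Python has already cached in sys.modules.
    from decimal import Decimal

    return Decimal


Decimal = get_decimal_class()
total = Decimal("1.10") + Decimal("2.20")
```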