haive.core.models.embeddings.base¶
Base Embedding Models Module.
This module provides the foundational abstractions for embedding models in the Haive framework. It includes base classes and implementations for different embedding providers that transform text into high-dimensional vector representations for use in semantic search, clustering, and other NLP tasks.
Typical usage example:
>>> from haive.core.models.embeddings.base import create_embeddings, HuggingFaceEmbeddingConfig
>>>
>>> # Create a HuggingFace embedding model configuration
>>> config = HuggingFaceEmbeddingConfig(
...     model="sentence-transformers/all-MiniLM-L6-v2"
... )
>>>
>>> # Instantiate the embeddings model
>>> embeddings = create_embeddings(config)
>>>
>>> # Use the model to embed documents or queries
>>> doc_embeddings = embeddings.embed_documents(["Text to embed"])
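The vectors returned by embed_documents are typically compared with a similarity measure when used for semantic search. The following is a minimal, stdlib-only sketch of cosine-similarity ranking over plain lists of floats. The function names cosine_similarity and rank_by_similarity are hypothetical and are not part of this module.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query: Sequence[float], docs: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In practice the query vector and the document vectors would come from the same embedding model, so that they share a vector space.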
Classes¶
| AnyscaleEmbeddingConfig | Configuration for Anyscale embedding models. |
| AzureEmbeddingConfig | Configuration for Azure OpenAI embedding models. |
| BaseEmbeddingConfig | Base configuration for embedding models. |
| BedrockEmbeddingConfig | Configuration for AWS Bedrock embedding models. |
| CloudflareEmbeddingConfig | Configuration for Cloudflare Workers AI embedding models. |
| CohereEmbeddingConfig | Configuration for Cohere embedding models. |
| FastEmbedEmbeddingConfig | Configuration for FastEmbed embedding models. |
| HuggingFaceEmbeddingConfig | Configuration for HuggingFace embedding models. |
| JinaEmbeddingConfig | Configuration for Jina AI embedding models. |
| LlamaCppEmbeddingConfig | Configuration for LlamaCpp local embedding models. |
| MockTorch | Mock torch module for documentation builds. |
| OllamaEmbeddingConfig | Configuration for Ollama embedding models. |
| OpenAIEmbeddingConfig | Configuration for OpenAI embedding models. |
| SecureConfigMixin | Mixin for securely handling API keys from environment variables. |
| SentenceTransformerEmbeddingConfig | Configuration for SentenceTransformer embedding models. |
| VertexAIEmbeddingConfig | Configuration for Google Vertex AI embedding models. |
| VertexAIEmbeddings | Mock VertexAI embeddings to avoid slow imports. |
| VoyageAIEmbeddingConfig | Configuration for Voyage AI embedding models. |
Functions¶
| create_embeddings(config) | Factory function to create embedding models from a configuration. |
| | Lazy import of HuggingFaceEmbeddings. |
Module Contents¶
- class haive.core.models.embeddings.base.AnyscaleEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for Anyscale embedding models.
This class configures embedding models from Anyscale.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.ANYSCALE
- model¶
The model name (defaults to thenlper/gte-large)
- base_url¶
The base URL for the Anyscale API
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- class haive.core.models.embeddings.base.AzureEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for Azure OpenAI embedding models.
This class configures embedding models from Azure OpenAI services, supporting environment variable resolution for credentials.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.AZURE
- model¶
The Azure deployment name for the embedding model
- api_version¶
The Azure OpenAI API version to use
- api_base¶
The Azure endpoint URL
- api_type¶
The API type (typically “azure”)
- class haive.core.models.embeddings.base.BaseEmbeddingConfig(/, **data)[source]¶
Bases: pydantic.BaseModel, SecureConfigMixin
Base configuration for embedding models.
This abstract base class defines the common interface for all embedding model configurations, ensuring consistent instantiation patterns across providers.
- Parameters:
data (Any)
- provider¶
The embedding provider (e.g., Azure, HuggingFace)
- model¶
The specific model identifier or name
- api_key¶
The API key for the provider (if required)
- abstractmethod instantiate(**kwargs)[source]¶
Instantiate the embedding model with the configuration.
- Parameters:
**kwargs – Additional keyword arguments to pass to the model constructor
- Returns:
The instantiated embedding model
- Return type:
Any
- Raises:
NotImplementedError – Must be implemented by subclasses
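The configuration-plus-instantiate pattern described above can be sketched as follows. This is an illustrative, stdlib-only stand-in, not the Haive implementation: it uses a dataclass instead of pydantic.BaseModel, and the names SketchEmbeddingConfig, FakeEmbeddingConfig, and FakeEmbeddings are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SketchEmbeddingConfig(ABC):
    """Hypothetical stand-in for BaseEmbeddingConfig (stdlib only)."""

    model: str
    api_key: Optional[str] = None

    @abstractmethod
    def instantiate(self, **kwargs: Any) -> Any:
        """Build and return the provider's embedding model."""
        raise NotImplementedError


class FakeEmbeddings:
    """Toy model exposing embed_documents, as in the module example."""

    def __init__(self, model: str, dim: int = 4) -> None:
        self.model = model
        self.dim = dim

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Placeholder vectors: the text length repeated `dim` times.
        return [[float(len(t))] * self.dim for t in texts]


@dataclass
class FakeEmbeddingConfig(SketchEmbeddingConfig):
    """Hypothetical provider config that fills in instantiate()."""

    def instantiate(self, **kwargs: Any) -> FakeEmbeddings:
        # Extra keyword arguments are forwarded to the model constructor.
        return FakeEmbeddings(model=self.model, **kwargs)
```

Because instantiate is abstract, the base class cannot be constructed directly in this sketch; only subclasses that implement it can.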
- class haive.core.models.embeddings.base.BedrockEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for AWS Bedrock embedding models.
This class configures embedding models from AWS Bedrock service.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.BEDROCK
- model¶
The model ID (defaults to amazon.titan-embed-text-v1)
- region¶
AWS region
- credentials_profile_name¶
AWS credentials profile name
- class haive.core.models.embeddings.base.CloudflareEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for Cloudflare Workers AI embedding models.
This class configures embedding models from Cloudflare Workers AI.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.CLOUDFLARE
- model¶
The model name (defaults to @cf/baai/bge-small-en-v1.5)
- account_id¶
Cloudflare account ID
- class haive.core.models.embeddings.base.CohereEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for Cohere embedding models.
This class configures embedding models from Cohere services.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.COHERE
- model¶
The Cohere model name for embeddings (defaults to embed-english-v3.0)
- input_type¶
Type of input to be embedded (defaults to search_document)
- class haive.core.models.embeddings.base.FastEmbedEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for FastEmbed embedding models.
This class configures FastEmbed models, which are lightweight and efficient embeddings that can run on CPU.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.FASTEMBED
- model¶
The model name (defaults to BAAI/bge-small-en-v1.5)
- max_length¶
Maximum sequence length
- cache_folder¶
Where to cache the model files
- use_cache¶
Whether to use embedding caching
- class haive.core.models.embeddings.base.HuggingFaceEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for HuggingFace embedding models.
This class configures embedding models from HuggingFace’s model hub, with support for local caching and hardware acceleration.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.HUGGINGFACE
- model¶
The HuggingFace model ID (defaults to all-MiniLM-L6-v2)
- model_kwargs¶
Additional keyword arguments for model instantiation
- encode_kwargs¶
Additional keyword arguments for encoding
- query_encode_kwargs¶
Additional keyword arguments for query encoding
- multi_process¶
Whether to use multi-processing for encoding
- cache_folder¶
Where to cache the model files
- show_progress¶
Whether to show progress bars
- use_cache¶
Whether to use embedding caching
- instantiate(**kwargs)[source]¶
Instantiate a HuggingFace embedding model.
This method includes error handling and GPU memory cleanup in case of initialization failures.
- Parameters:
**kwargs – Additional keyword arguments to pass to HuggingFaceEmbeddings
- Returns:
The instantiated embedding model
- Return type:
HuggingFaceEmbeddings
- Raises:
Exception – If model instantiation fails after cleanup attempt
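The error handling and memory cleanup mentioned above might look roughly like the sketch below. It assumes the cleanup consists of a garbage-collection pass plus torch.cuda.empty_cache() when torch is installed and CUDA is available, and that the original error is re-raised afterwards; the actual method may behave differently. The names free_accelerator_memory and instantiate_with_cleanup are hypothetical.

```python
import gc
from typing import Any, Callable


def free_accelerator_memory() -> None:
    """Best-effort cleanup; the torch step is skipped if torch is not installed."""
    gc.collect()
    try:
        import torch  # optional dependency, only used when available
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def instantiate_with_cleanup(
    factory: Callable[..., Any],
    cleanup: Callable[[], None] = free_accelerator_memory,
    **kwargs: Any,
) -> Any:
    """Call factory(**kwargs); on failure, run cleanup and re-raise the error."""
    try:
        return factory(**kwargs)
    except Exception:
        cleanup()
        raise
```

Passing the cleanup step as a parameter keeps the sketch testable without a GPU or a torch installation.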
- class haive.core.models.embeddings.base.JinaEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for Jina AI embedding models.
This class configures embedding models from Jina AI.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.JINA
- model¶
The model name (defaults to jina-embeddings-v2-base-en)
- class haive.core.models.embeddings.base.LlamaCppEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for LlamaCpp local embedding models.
This class configures embedding models using LlamaCpp for local execution.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.LLAMACPP
- model¶
Required model name parameter (for compatibility with BaseEmbeddingConfig)
- model_path¶
Path to the model file
- n_ctx¶
Context size for the model
- n_batch¶
Batch size for inference
- n_gpu_layers¶
Number of layers to offload to GPU
- class haive.core.models.embeddings.base.MockTorch[source]¶
Mock torch module for documentation builds.
- class haive.core.models.embeddings.base.OllamaEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for Ollama embedding models.
This class configures embedding models from Ollama, which runs locally and doesn’t require an API key.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.OLLAMA
- model¶
The Ollama model name (defaults to nomic-embed-text)
- base_url¶
The base URL for the Ollama server
- class haive.core.models.embeddings.base.OpenAIEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for OpenAI embedding models.
This class configures embedding models from OpenAI services, supporting multiple model types and configurations.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.OPENAI
- model¶
The OpenAI model name for embeddings (defaults to text-embedding-3-small)
- dimensions¶
Output dimensions for the embedding vectors
- show_progress_bar¶
Whether to show progress bars during embedding
- chunk_size¶
Batch size for embedding operations
- class haive.core.models.embeddings.base.SecureConfigMixin[source]¶
Mixin for securely handling API keys from environment variables.
This mixin provides methods for securely resolving API keys from environment variables or explicitly provided values, with appropriate fallbacks.
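As a rough illustration of that resolution order, the sketch below prefers an explicitly provided key and falls back to an environment variable. The precedence, the function name resolve_api_key, and any environment variable name passed to it are assumptions; the mixin's real methods are not shown on this page.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str], env_var: str) -> Optional[str]:
    """Return the explicit key if given, else the environment value, else None."""
    if explicit:
        return explicit
    # Treat an unset or empty environment variable as "no key available".
    return os.environ.get(env_var) or None
```

Returning None rather than raising lets providers that need no key, such as local models, use the same code path.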
- class haive.core.models.embeddings.base.SentenceTransformerEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for SentenceTransformer embedding models.
This class configures embedding models from SentenceTransformers library, which provides efficient and accurate sentence and text embeddings.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.SENTENCE_TRANSFORMERS
- model¶
The model name or path (defaults to all-MiniLM-L6-v2)
- cache_folder¶
Where to cache the model files
- use_cache¶
Whether to use embedding caching
- class haive.core.models.embeddings.base.VertexAIEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for Google Vertex AI embedding models.
This class configures embedding models from Google Vertex AI.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.VERTEXAI
- model¶
The model name (defaults to textembedding-gecko@latest)
- project¶
Google Cloud project ID
- location¶
Google Cloud region
- class haive.core.models.embeddings.base.VertexAIEmbeddings(*args, **kwargs)[source]¶
Mock VertexAI embeddings to avoid slow imports.
- class haive.core.models.embeddings.base.VoyageAIEmbeddingConfig(/, **data)[source]¶
Bases: BaseEmbeddingConfig
Configuration for Voyage AI embedding models.
This class configures embedding models from Voyage AI.
- Parameters:
data (Any)
- provider¶
Set to EmbeddingProvider.VOYAGEAI
- model¶
The model name (defaults to voyage-2)
- voyage_api_url¶
The API URL for Voyage AI
- voyage_api_version¶
The API version for Voyage AI
- haive.core.models.embeddings.base.create_embeddings(config)[source]¶
Factory function to create embedding models from a configuration.
This function simplifies the instantiation of embedding models by delegating to the appropriate configuration class.
- Parameters:
config (BaseEmbeddingConfig) – The embedding model configuration
- Returns:
The instantiated embedding model
- Return type:
Any
Example:
>>> config = HuggingFaceEmbeddingConfig(model="sentence-transformers/all-MiniLM-L6-v2")
>>> embeddings = create_embeddings(config)
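The delegation described for create_embeddings can be sketched in a few lines. This is an assumption about its shape based on the description above, not the actual source; create_embeddings_sketch, EchoConfig, and EchoEmbeddings are hypothetical names, and the sketch runs without Haive installed.

```python
from typing import Any, List, Protocol


class SupportsInstantiate(Protocol):
    """Anything with an instantiate() method, such as an embedding config."""

    def instantiate(self, **kwargs: Any) -> Any: ...


def create_embeddings_sketch(config: SupportsInstantiate) -> Any:
    """Hypothetical stand-in for create_embeddings: delegate to the config."""
    return config.instantiate()


class EchoEmbeddings:
    """Toy model that maps each text to a one-element vector of its length."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t))] for t in texts]


class EchoConfig:
    """Toy configuration whose instantiate() builds the toy model."""

    def instantiate(self, **kwargs: Any) -> EchoEmbeddings:
        return EchoEmbeddings()
```

With this shape the factory needs no provider-specific branches, since each configuration class knows how to build its own model.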