Scalar indexes organize data by scalar attributes (e.g., numbers, categories) and enable fast filtering of vector data. They accelerate retrieval of scalar data associated with vectors, thus enhancing query performance. LanceDB supports three types of scalar indexes:
- BTREE: Stores column data in sorted order for binary search. Best for columns with many unique values.
- BITMAP: Uses bitmaps to track value presence. Ideal for columns with few unique values (e.g., categories, tags).
- LABEL_LIST: Special index for List<T> columns supporting array_has_all and array_has_any queries.
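To make the difference between the three index types concrete, here is a toy model in plain Python. It is only a sketch of the ideas (sorted keys with binary search, one bitmap per distinct value, one bitmap per list label) and does not reflect LanceDB's actual on-disk structures. The sample rows and all helper names are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample rows; the row id is the position in the list.
rows = [
    {"price": 30, "genre": "fantasy", "tags": ["magic", "epic"]},
    {"price": 10, "genre": "sci-fi", "tags": ["space"]},
    {"price": 20, "genre": "fantasy", "tags": ["magic"]},
    {"price": 20, "genre": "sci-fi", "tags": ["space", "epic"]},
]

# BTREE-style: keep (value, row_id) pairs sorted so a range is a binary search.
sorted_prices = sorted((row["price"], i) for i, row in enumerate(rows))
keys = [value for value, _ in sorted_prices]


def range_lookup(lo, hi):
    """Return row ids whose price satisfies lo <= price <= hi."""
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    return sorted(row_id for _, row_id in sorted_prices[start:end])


def bitmap_to_ids(bits):
    """Expand an integer bitmask into the list of row ids it marks."""
    return [i for i in range(len(rows)) if (bits >> i) & 1]


# BITMAP-style: one bitmask per distinct value (cheap when values are few).
genre_bitmaps = {}
for i, row in enumerate(rows):
    genre_bitmaps[row["genre"]] = genre_bitmaps.get(row["genre"], 0) | (1 << i)

# LABEL_LIST-style: one bitmask per label that appears inside the list column.
tag_bitmaps = {}
for i, row in enumerate(rows):
    for tag in row["tags"]:
        tag_bitmaps[tag] = tag_bitmaps.get(tag, 0) | (1 << i)


def has_all(tags):
    """Rows whose tag list contains every requested tag (bitmap AND)."""
    bits = (1 << len(rows)) - 1
    for tag in tags:
        bits &= tag_bitmaps.get(tag, 0)
    return bitmap_to_ids(bits)


def has_any(tags):
    """Rows whose tag list contains at least one requested tag (bitmap OR)."""
    bits = 0
    for tag in tags:
        bits |= tag_bitmaps.get(tag, 0)
    return bitmap_to_ids(bits)


print(range_lookup(15, 25))                     # rows priced between 15 and 25
print(bitmap_to_ids(genre_bitmaps["fantasy"]))  # rows in the fantasy genre
print(has_all(["magic", "epic"]))
print(has_any(["space", "epic"]))
```

The sorted structure pays off when almost every value is distinct, while the bitmap approaches stay small only when the number of distinct values or labels is low.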
Choosing the Right Index Type
| Data Type | Filter | Index Type |
|---|---|---|
| Numeric, String, Temporal | <, =, >, in, between, is null | BTREE |
| Boolean, numbers or strings with fewer than 1,000 unique values | <, =, >, in, between, is null | BITMAP |
| Lists of numbers or strings with low cardinality | array_has_any, array_has_all | LABEL_LIST |
Scalar Index Operations
1. Build the Index
You can create multiple scalar indexes within a table. By default, the index type is BTREE, but you can configure another type such as BITMAP.
If you are using LanceDB Enterprise, the create_scalar_index API returns immediately, but the scalar index is built asynchronously. To wait until all data is fully indexed, specify the wait_timeout parameter on create_scalar_index() or call wait_for_index() on the table.
2. Check Index Status
3. Update the Index
Updating the table data (adding, deleting, or modifying records) requires that you also update the scalar index. You can do this by calling optimize, which triggers an update to the existing scalar index.
New data added after creating the scalar index will still appear in search results if optimize is not used, but with increased latency due to a flat search on the unindexed portion. LanceDB Enterprise automates the optimize process, minimizing the impact on search speed.
4. Run Indexed Searches
The following scan will be faster if the column book_id has a scalar index:
Scalar indexes can also speed up scans that combine a vector search or full-text search with a prefilter:
Index UUID Columns
LanceDB supports scalar indexes on UUID columns (stored as FixedSizeBinary(16)), enabling efficient lookups and filtering on UUID-based primary keys.
To use FixedSizeBinary, ensure you have:
- Python SDK version 0.22.0 or later
- TypeScript SDK version 0.19.0 or later
1. Define UUID Type
2. Generate UUID Data
3. Create Table with UUID Column
4. Create and Wait for the Index
5. Perform Operations with the UUID Index
Index nested fields
You can build a scalar index on a field inside a struct column by passing the canonical dot-separated path to create_index. This is useful when filters
target attributes nested under a metadata-style column, for example
metadata.user_id or metadata.event.type.
If a literal segment of the path itself contains a dot (for example a column
named user.id nested inside metadata), wrap that segment in backticks so
LanceDB can tell the dot apart from the path separator: metadata.`user.id`.
list_indices() echoes the same canonical path back, so the column you pass in
round-trips through index metadata regardless of nesting depth or escaping.
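The escaping rule can be sketched as a small helper that builds a path string from its segments. `quote_field_path` is a hypothetical function written for this example, not part of the LanceDB SDK, and it does not attempt to handle segments that themselves contain backticks.

```python
def quote_field_path(segments):
    """Join path segments with dots, wrapping dotted segments in backticks."""
    quoted = []
    for segment in segments:
        if "`" in segment:
            # Assumption: backticks inside a segment are out of scope here.
            raise ValueError("segments containing backticks are not handled")
        quoted.append(f"`{segment}`" if "." in segment else segment)
    return ".".join(quoted)


# A plain nested field needs no escaping.
print(quote_field_path(["metadata", "user_id"]))
# A segment that contains a literal dot is wrapped in backticks.
print(quote_field_path(["metadata", "user.id"]))
# Deeper nesting works the same way.
print(quote_field_path(["metadata", "event", "type"]))
```

The resulting string is what you would pass as the column when creating the index.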
Composite indexes that cover multiple columns aren’t supported yet. Each
create_index call must target a single (possibly nested) field path.