Orca Basic Usage Examples

This notebook shows basic usage of the OrcaLib library, with examples of inserting, querying, updating, and deleting data in an OrcaDB instance.

Import Orca

[1]:

import orcalib as orca
import pandas as pd
import numpy as np

Authentication

Note that you need an OrcaDB instance running, either locally or in the cloud, to use this notebook. Adjust the following configuration to match your instance.

[2]:

import os
orca.set_credentials(
    api_key=os.getenv("ORCADB_API_KEY", "my_api_key"),
    secret_key=os.getenv("ORCADB_SECRET_KEY", "my_secret_key"),
    endpoint=os.getenv("ORCADB_ENDPOINT", "http://localhost:1583"),
)
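Because the cell above falls back to placeholder values when the environment variables are unset, it can help to check which settings are actually in effect before connecting. The following is a minimal sketch using only the standard library. The helper name `resolve_orca_settings` is hypothetical and not part of OrcaLib; only the environment variable names and default values are taken from the cell above.

```python
import os

# Defaults copied from the authentication cell above.
DEFAULTS = {
    "ORCADB_API_KEY": "my_api_key",
    "ORCADB_SECRET_KEY": "my_secret_key",
    "ORCADB_ENDPOINT": "http://localhost:1583",
}


def resolve_orca_settings(environ=None):
    """Return (settings, missing).

    `missing` lists the variables that were unset and fell back to defaults.
    Hypothetical helper for illustration; not part of OrcaLib.
    """
    environ = os.environ if environ is None else environ
    settings = {}
    missing = []
    for name, default in DEFAULTS.items():
        value = environ.get(name)
        if value is None:
            missing.append(name)
            value = default
        settings[name] = value
    return settings, missing


settings, missing = resolve_orca_settings()
if missing:
    print("Using placeholder defaults for:", ", ".join(missing))
```

If the message lists `ORCADB_API_KEY` or `ORCADB_SECRET_KEY`, the notebook is running with the placeholder credentials rather than real ones.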

Creating a Database, a Table and an Index

[3]:

from orcalib import TextT, IntT, DocumentT, Float32T, VectorT
from orcalib import TableCreateMode

db = orca.OrcaDatabase("my_database")

table = db.create_table(
    "my_table",
    page_id=IntT.unique.notnull,
    title=TextT.unique.notnull,
    content=TextT.notnull,
    score=Float32T.notnull,
    vector=VectorT[768].notnull,
    if_table_exists=TableCreateMode.REPLACE_CURR_TABLE,
)

db.create_text_index(
    index_name="title_index",
    table_name="my_table",
    column="title",
)

db.create_text_index(
    index_name="content_index",
    table_name="my_table",
    column="content",
)

Creating index title_index of type text on table my_table with column title
Creating index content_index of type text on table my_table with column content

[3]:

text index: content_index on my_database.my_table.content (text)

Inserting Data

This section shows how to insert data into OrcaDB using the insert method.

[4]:

# Simple argument based insert
table.insert(
    page_id=1,
    title="Page 1",
    content="Today it is sunny.",
    score=0.5,
    vector=np.random.rand(768).tolist(),
)

# Insert with a dictionary
table.insert(
    {
        "page_id": 2,
        "title": "Page 2",
        "content": "I like cheese.",
        "score": 0.75,
        "vector": np.random.rand(768).tolist(),
    }
)

# Insert with a list of dictionaries
table.insert(
    [
        {
            "page_id": 3,
            "title": "Page 3",
            "content": "The car was blue.",
            "score": 0.25,
            "vector": np.random.rand(768).tolist(),
        },
        {
            "page_id": 4,
            "title": "Page 4",
            "content": "My favorite nacho topping is cheddar cheese.",
            "score": 0.1,
            "vector": np.random.rand(768).tolist(),
        },
    ]
)

# Insert a dataframe
df = pd.DataFrame(
    {
        "page_id": [5, 6],
        "title": ["Page 5", "Page 6"],
        "content": ["Two plus two equals four.", "The cat is on the mat."],
        "score": [0.9, 0.2],
        "vector": [np.random.rand(768).tolist(), np.random.rand(768).tolist()],
    }
)

table.insert(df)
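The table definition declares every column as `notnull` and fixes the vector length at 768, so it may be worth checking rows on the client side before passing them to `insert`. The sketch below is one possible pre-check in plain Python. The helper `validate_row` is hypothetical and not part of OrcaLib, and how OrcaDB itself reports constraint violations is not covered here.

```python
import random

VECTOR_DIM = 768  # matches VectorT[768] in the table definition above
REQUIRED_COLUMNS = ("page_id", "title", "content", "score", "vector")


def validate_row(row):
    """Return a list of problems found in `row`; an empty list means none were found.

    Hypothetical helper for illustration; not part of OrcaLib.
    """
    problems = []
    # Every column in the example table is declared notnull.
    for column in REQUIRED_COLUMNS:
        if row.get(column) is None:
            problems.append(f"missing value for {column!r}")
    # The vector column has a fixed dimensionality.
    vector = row.get("vector")
    if vector is not None and len(vector) != VECTOR_DIM:
        problems.append(
            f"vector has {len(vector)} dimensions, expected {VECTOR_DIM}"
        )
    return problems


good_row = {
    "page_id": 7,
    "title": "Page 7",
    "content": "It rained all day.",
    "score": 0.4,
    "vector": [random.random() for _ in range(VECTOR_DIM)],
}
bad_row = {**good_row, "title": None, "vector": [0.0] * 10}

print(validate_row(good_row))
print(validate_row(bad_row))
```

Rows for which `validate_row` returns an empty list could then be passed to `table.insert` in any of the forms shown above.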

Querying Data

OrcaDB supports a query builder that allows you to build complex queries using a simple API. This section shows how to query data from OrcaDB using the select method.

[5]:

# Basic query on page_id column

result = (
    table
    .select("page_id", "title", "score")
    .where(table.page_id >= 3)
    .order_by(table.score)
    .df(limit=100)
)

result

[5]:

	page_id	title	score
0	4	Page 4	0.10
1	6	Page 6	0.20
2	3	Page 3	0.25
3	5	Page 5	0.90

OrcaDB supports a SQL-like query language for querying data. This section shows how to query data from OrcaDB using the query method.

[6]:

# Raw SQL for more complex queries

result = db.query(
    """
    SELECT page_id, title, score
    FROM my_table
    WHERE page_id >= 3
    ORDER BY score
    LIMIT 100
    """
)

result

[6]:

	page_id	title	score
0	4	Page 4	0.10
1	6	Page 6	0.20
2	3	Page 3	0.25
3	5	Page 5	0.90
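To sanity-check the ordering without an OrcaDB instance, the same statement can be run against an in-memory SQLite database loaded with the six example rows. This is only an illustrative sketch: SQLite stands in for OrcaDB here, and OrcaDB's SQL dialect may differ in other respects.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for OrcaDB (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (page_id INTEGER, title TEXT, score REAL)")

# The page_id, title, and score values inserted earlier in this notebook.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "Page 1", 0.5),
        (2, "Page 2", 0.75),
        (3, "Page 3", 0.25),
        (4, "Page 4", 0.1),
        (5, "Page 5", 0.9),
        (6, "Page 6", 0.2),
    ],
)

rows = conn.execute(
    """
    SELECT page_id, title, score
    FROM my_table
    WHERE page_id >= 3
    ORDER BY score
    LIMIT 100
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

The rows come back in ascending score order (pages 4, 6, 3, 5), matching the result shown above.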

Index Queries

One key capability of Orca is semantic querying. In this example, we scan the content index for the records whose content is most similar in meaning to a given query string. Note that this is a semantic match, not an exact match.

[7]:

result = (
    db
    .scan_index(
        index_name="content_index",
        query="I also like cheese"
    )
    .select(table.page_id, table.content)
    .df(limit=3)
)

result

[7]:

	page_id	content
0	2	I like cheese.
1	4	My favorite nacho topping is cheddar cheese.
2	6	The cat is on the mat.
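A common way to rank records by similarity is to turn each text into a vector and sort by cosine similarity to the query vector. The sketch below illustrates that idea with a deliberately crude stand-in for an embedding model (lowercase word counts). It is an assumption for illustration only: this notebook does not state how OrcaDB's text index computes similarity, and word counts capture word overlap rather than meaning.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# The content values inserted earlier in this notebook, keyed by page_id.
documents = {
    1: "Today it is sunny.",
    2: "I like cheese.",
    3: "The car was blue.",
    4: "My favorite nacho topping is cheddar cheese.",
    5: "Two plus two equals four.",
    6: "The cat is on the mat.",
}

query_vector = embed("I also like cheese")
ranked = sorted(
    ((page_id, cosine_similarity(query_vector, embed(text)))
     for page_id, text in documents.items()),
    key=lambda item: item[1],
    reverse=True,
)

# Show the two best matches; the remaining pages share no words with the query.
for page_id, score in ranked[:2]:
    print(page_id, round(score, 3))
```

With this toy representation, pages 2 and 4 rank first because they share words with the query. A learned embedding model would also be able to score texts that share no words at all.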

Updating Data

This section shows how to update data in OrcaDB using the update method.

[8]:

table.update(
    {
        "title": "Page 2 (edited)",
        "content": "I don't like cheese.",
    },
    table.page_id == 2,
)

table.select("page_id", "title", "content").where(table.page_id <= 2).df(limit=10)

[8]:

	page_id	title	content
0	1	Page 1	Today it is sunny.
1	2	Page 2 (edited)	I don't like cheese.

Now let's run another semantic query against the content index to check that the update is reflected in the index results.

[9]:

result = (
    db
    .scan_index(
        index_name="content_index",
        query="Do you like nachos?"
    )
    .select(table.page_id, table.title, table.content)
    .df(limit=3)
)

result

[9]:

	page_id	title	content
0	4	Page 4	My favorite nacho topping is cheddar cheese.
1	2	Page 2 (edited)	I don't like cheese.
2	3	Page 3	The car was blue.

Deleting Data

This section shows how to delete data from OrcaDB using the delete method.

[10]:

table.delete(table.page_id == 2)

table.select("page_id", "title", "content").where(table.page_id <= 2).df(limit=10)

[10]:

	page_id	title	content
0	1	Page 1	Today it is sunny.