Orca Basic Usage Examples
This notebook provides basic usage examples for the OrcaLib library. It covers creating a table and text indexes, then inserting, querying, updating, and deleting data in an OrcaDB instance.
Import Orca
[1]:
import orcalib as orca
import pandas as pd
import numpy as np
Authentication
Note that you need an OrcaDB instance running, either locally or in the cloud, to use this notebook. Adjust the following configuration to match your OrcaDB instance.
[2]:
import os

orca.set_credentials(
    api_key=os.getenv("ORCADB_API_KEY", "my_api_key"),
    secret_key=os.getenv("ORCADB_SECRET_KEY", "my_secret_key"),
    endpoint=os.getenv("ORCADB_ENDPOINT", "http://localhost:1583"),
)
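Because the fallback values in the cell above are placeholders, it can help to see which settings actually come from the environment before connecting. The following is a minimal sketch using only the standard library. The helper name `resolve_orcadb_settings` is hypothetical and not part of OrcaLib; the variable names and defaults are copied from the cell above.

```python
import os

# Defaults copied from the cell above; they are placeholders, not real credentials.
DEFAULTS = {
    "ORCADB_API_KEY": "my_api_key",
    "ORCADB_SECRET_KEY": "my_secret_key",
    "ORCADB_ENDPOINT": "http://localhost:1583",
}


def resolve_orcadb_settings(environ=os.environ):
    """Return (settings, missing).

    settings maps each variable name to its resolved value; missing lists the
    names that were not set and fell back to a placeholder default.
    Hypothetical helper for illustration only; not part of OrcaLib.
    """
    settings = {}
    missing = []
    for name, default in DEFAULTS.items():
        value = environ.get(name)
        if value is None:
            missing.append(name)
            value = default
        settings[name] = value
    return settings, missing


settings, missing = resolve_orcadb_settings()
if missing:
    print("Using placeholder defaults for:", ", ".join(missing))
```

The resolved values could then be passed to `orca.set_credentials` exactly as in the cell above.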
Creating a Database, a Table and an Index
[3]:
from orcalib import TextT, IntT, DocumentT, Float32T, VectorT
from orcalib import TableCreateMode

db = orca.OrcaDatabase("my_database")

table = db.create_table(
    "my_table",
    page_id=IntT.unique.notnull,
    title=TextT.unique.notnull,
    content=TextT.notnull,
    score=Float32T.notnull,
    vector=VectorT[768].notnull,
    if_table_exists=TableCreateMode.REPLACE_CURR_TABLE,
)

db.create_text_index(
    index_name="title_index",
    table_name="my_table",
    column="title",
)

db.create_text_index(
    index_name="content_index",
    table_name="my_table",
    column="content",
)
Creating index title_index of type text on table my_table with column title
Creating index content_index of type text on table my_table with column content
[3]:
text index: content_index on my_database.my_table.content (text)
Inserting Data
This section shows how to insert data into OrcaDB using the insert method.
[4]:
# Simple argument based insert
table.insert(
    page_id=1,
    title="Page 1",
    content="Today it is sunny.",
    score=0.5,
    vector=np.random.rand(768).tolist(),
)

# Insert with a dictionary
table.insert(
    {
        "page_id": 2,
        "title": "Page 2",
        "content": "I like cheese.",
        "score": 0.75,
        "vector": np.random.rand(768).tolist(),
    }
)

# Insert with a list of dictionaries
table.insert(
    [
        {
            "page_id": 3,
            "title": "Page 3",
            "content": "The car was blue.",
            "score": 0.25,
            "vector": np.random.rand(768).tolist(),
        },
        {
            "page_id": 4,
            "title": "Page 4",
            "content": "My favorite nacho topping is cheddar cheese.",
            "score": 0.1,
            "vector": np.random.rand(768).tolist(),
        },
    ]
)

# Insert a dataframe
df = pd.DataFrame(
    {
        "page_id": [5, 6],
        "title": ["Page 5", "Page 6"],
        "content": ["Two plus two equals four.", "The cat is on the mat."],
        "score": [0.9, 0.2],
        "vector": [np.random.rand(768).tolist(), np.random.rand(768).tolist()],
    }
)
table.insert(df)
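The table definition declares every column as not-null, `page_id` and `title` as unique, and `vector` as 768-dimensional, so a quick client-side check before inserting a batch may catch mistakes early. The sketch below is an assumption about what such a check could look like; `check_records` is a hypothetical helper, not an OrcaLib function, and it only inspects the batch itself, not rows already stored in the table.

```python
REQUIRED_COLUMNS = ("page_id", "title", "content", "score", "vector")
VECTOR_DIM = 768  # matches VectorT[768] in the table definition


def check_records(records, dim=VECTOR_DIM):
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper for illustration only; not part of OrcaLib.
    """
    problems = []
    seen_ids, seen_titles = set(), set()
    for i, row in enumerate(records):
        # Every column is declared notnull, so flag absent or None values.
        for col in REQUIRED_COLUMNS:
            if row.get(col) is None:
                problems.append(f"row {i}: missing value for '{col}'")
        vector = row.get("vector")
        if vector is not None and len(vector) != dim:
            problems.append(f"row {i}: vector has length {len(vector)}, expected {dim}")
        # page_id and title are declared unique, so flag duplicates within the batch.
        for col, seen in (("page_id", seen_ids), ("title", seen_titles)):
            value = row.get(col)
            if value is not None:
                if value in seen:
                    problems.append(f"row {i}: duplicate {col} {value!r}")
                seen.add(value)
    return problems


batch = [
    {"page_id": 7, "title": "Page 7", "content": "Hello.", "score": 0.3, "vector": [0.0] * 768},
    {"page_id": 7, "title": "Page 8", "content": "World.", "score": 0.4, "vector": [0.0] * 10},
]
for problem in check_records(batch):
    print(problem)
```

For a DataFrame, the same check could be applied to `df.to_dict("records")`.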
Querying Data
OrcaDB supports a query builder that allows you to build complex queries using a simple API. This section shows how to query data from OrcaDB using the select method.
[5]:
# Basic query on page_id column
result = (
    table
    .select("page_id", "title", "score")
    .where(table.page_id >= 3)
    .order_by(table.score)
    .df(limit=100)
)
result
[5]:
| | page_id | title | score |
|---|---|---|---|
| 0 | 4 | Page 4 | 0.10 |
| 1 | 6 | Page 6 | 0.20 |
| 2 | 3 | Page 3 | 0.25 |
| 3 | 5 | Page 5 | 0.90 |
OrcaDB also supports a SQL-like query language. This section shows how to run the same query as raw SQL using the query method.
[6]:
# Raw SQL for more complex queries
result = db.query(
    """
    SELECT page_id, title, score
    FROM my_table
    WHERE page_id >= 3
    ORDER BY score
    LIMIT 100
    """
)
result
[6]:
| | page_id | title | score |
|---|---|---|---|
| 0 | 4 | Page 4 | 0.10 |
| 1 | 6 | Page 6 | 0.20 |
| 2 | 3 | Page 3 | 0.25 |
| 3 | 5 | Page 5 | 0.90 |
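The query builder and the raw SQL above express the same filter and sort, which is why both return the same four rows. To try that logic without a running OrcaDB instance, the sketch below runs the same SELECT statement against an in-memory SQLite database from the Python standard library. SQLite is only a stand-in here: nothing in this notebook says that OrcaDB's SQL dialect matches SQLite's beyond this simple query.

```python
import sqlite3

# The page_id, title, and score values inserted earlier in this notebook.
rows = [
    (1, "Page 1", 0.5),
    (2, "Page 2", 0.75),
    (3, "Page 3", 0.25),
    (4, "Page 4", 0.1),
    (5, "Page 5", 0.9),
    (6, "Page 6", 0.2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (page_id INTEGER, title TEXT, score REAL)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)

# Same statement as the db.query call above: filter, sort ascending by score, cap the row count.
result_rows = conn.execute(
    """
    SELECT page_id, title, score
    FROM my_table
    WHERE page_id >= 3
    ORDER BY score
    LIMIT 100
    """
).fetchall()
conn.close()

for row in result_rows:
    print(row)
```

Pages 1 and 2 are dropped by the WHERE clause, and the remaining rows come back in ascending score order.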
Index Queries
One key capability of Orca is semantic querying. In this example, we scan content_index for the records whose content is most similar in meaning to the query string. The match is semantic rather than exact: the query text does not need to appear verbatim in any record.
[7]:
result = (
    db
    .scan_index(
        index_name="content_index",
        query="I also like cheese"
    )
    .select(table.page_id, table.content)
    .df(limit=3)
)
result
[7]:
| | page_id | content |
|---|---|---|
| 0 | 2 | I like cheese. |
| 1 | 4 | My favorite nacho topping is cheddar cheese. |
| 2 | 6 | The cat is on the mat. |
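This notebook does not describe how the text index scores matches. One common way to implement semantic matching in general is to embed each text as a vector and rank by cosine similarity. The sketch below illustrates that idea only; it is not OrcaDB's implementation, and the three-dimensional vectors are hand-made toy values rather than the output of a real embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy embeddings; the axes loosely mean (food, weather, animals).
documents = {
    "I like cheese.": [0.9, 0.0, 0.1],
    "Today it is sunny.": [0.0, 1.0, 0.0],
    "The cat is on the mat.": [0.1, 0.1, 0.9],
}
# Stands in for an embedding of the query "I also like cheese".
query_vector = [1.0, 0.0, 0.0]

# Rank documents from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda text: cosine_similarity(documents[text], query_vector),
    reverse=True,
)
for text in ranked:
    print(text)
```

With these toy vectors the cheese sentence ranks first even though it is not an exact match for the query text, which mirrors the behavior shown in the result above.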
Updating Data
This section shows how to update data in OrcaDB using the update method.
[8]:
table.update(
    {
        "title": "Page 2 (edited)",
        "content": "I don't like cheese.",
    },
    table.page_id == 2,
)
table.select("page_id", "title", "content").where(table.page_id <= 2).df(limit=10)
[8]:
| | page_id | title | content |
|---|---|---|---|
| 0 | 1 | Page 1 | Today it is sunny. |
| 1 | 2 | Page 2 (edited) | I don't like cheese. |
Now let’s run another semantic query against content_index to confirm that the index reflects the update.
[9]:
result = (
    db
    .scan_index(
        index_name="content_index",
        query="Do you like nachos?"
    )
    .select(table.page_id, table.title, table.content)
    .df(limit=3)
)
result
[9]:
| | page_id | title | content |
|---|---|---|---|
| 0 | 4 | Page 4 | My favorite nacho topping is cheddar cheese. |
| 1 | 2 | Page 2 (edited) | I don't like cheese. |
| 2 | 3 | Page 3 | The car was blue. |
Deleting Data
This section shows how to delete data from OrcaDB using the delete method.
[10]:
table.delete(table.page_id == 2)
table.select("page_id", "title", "content").where(table.page_id <= 2).df(limit=10)
[10]:
| | page_id | title | content |
|---|---|---|---|
| 0 | 1 | Page 1 | Today it is sunny. |