To get started you will need to have the following installed:
- uv (installation instructions)
- aws cli (installation instructions)
- AWS cdk cli (see install instructions)
Note
If you don't have uv installed (or don't want to), see alternate instructions below.
Install dependencies in a virtual environment:
uv sync --all-groups
Activate the virtual environment:
source .venv/bin/activate
While logged into AWS, verify that the app will build:
cd osdp && cdk synth
Important
If cdk cannot locate the python deps, restarting your shell will typically fix it.
Your application configuration values should be provided in a file named osdp/config.toml. (An example of the format is provided in config.toml.example.)
stack_prefix = "my-stack"
embedding_model_arn = ""
foundation_model_arn = ""
manifest_fetch_concurrency = 15
ead_process_concurrency = 10
[data]
type = "iiif"
collection_url = "https://api.dc.library.northwestern.edu/api/v2/collections/ecacd539-fe38-40ec-bbc0-590acee3d4f2?as=iiif"
# Alternatively, for EAD data use the following structure for data
# [data]
# type = "ead"
# [data.s3]
# bucket = "my-bucket"
# prefix = "my-prefix"
[tags]
project = "my-project"
- stack_prefix (str) (Required) - will be prepended to the CloudFormation stack name on deploy. (For NU devs this is not needed; it will use the DEV_PREFIX env var in AWS.)
- data (dict) (Required) - Describes either the IIIF collection URL or the S3 location of EAD source XML files to load on initial app creation.
- embedding_model_arn (str) (Required) - Embedding model to use for the Bedrock Knowledge Base.
- foundation_model_arn (str) (Required) - Foundation model to use for Bedrock RetrieveAndGenerate invocations.
- tags (dict) - Key-value pair tags applied to all resources in the stack.
- manifest_fetch_concurrency (int) - The concurrency to use when retrieving IIIF manifests from your API.
- ead_process_concurrency (int) - The concurrency to use when processing EAD files.
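The required values above can be checked before running cdk. The following is a minimal sketch, not part of the OSDP codebase: `validate_config` and `REQUIRED_KEYS` are hypothetical names, and the function assumes the TOML file has already been parsed into a dict (for example with the `tomli` package on Python 3.10). It treats stack_prefix as always required, which ignores the DEV_PREFIX case, and it assumes both `bucket` and `prefix` are needed for EAD data.

```python
# Hypothetical helper: checks a parsed config dict against the options listed above.
REQUIRED_KEYS = ("stack_prefix", "embedding_model_arn", "foundation_model_arn", "data")


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []

    # Top-level required values must be present and non-empty.
    for key in REQUIRED_KEYS:
        if not config.get(key):
            problems.append(f"missing or empty required value: {key}")

    # The [data] table needs different fields depending on its type.
    data = config.get("data") or {}
    data_type = data.get("type")
    if data_type == "iiif":
        if not data.get("collection_url"):
            problems.append("data.collection_url is required when data.type is 'iiif'")
    elif data_type == "ead":
        s3 = data.get("s3") or {}
        for key in ("bucket", "prefix"):  # assumption: both are needed
            if not s3.get(key):
                problems.append(f"data.s3.{key} is required when data.type is 'ead'")
    elif data:
        problems.append(f"data.type must be 'iiif' or 'ead', got {data_type!r}")

    return problems


# The example config leaves both model ARNs empty, so two problems are reported.
example = {
    "stack_prefix": "my-stack",
    "embedding_model_arn": "",
    "foundation_model_arn": "",
    "data": {"type": "iiif", "collection_url": "https://example.org/collection?as=iiif"},
}
for problem in validate_config(example):
    print(problem)
```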
Run the deploy command, providing the name of your stack. This will be your stack_prefix + -OSDP-Prototype. It can also be obtained by running cdk ls:
cd osdp && cdk ls
mystack-OSDP-Prototype # <-- it's this one
OsdpPipelineStack
OsdpPipelineStack/staging/OSDP-Prototype
Deploy
cd osdp && cdk deploy mystack-OSDP-Prototype
Additional data can be loaded by manually invoking the step function.
- Note that to load EAD data after an initial IIIF load, you will need to grant S3 GetObject and ListObjects permissions to both the state machine and the EAD processing lambda.
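As a rough illustration of those permissions, the sketch below builds an IAM policy document for reading EAD files from a bucket. `ead_read_policy` is a hypothetical helper, not something the stack provides, and it assumes a non-empty prefix. In IAM, listing objects is authorized by the `s3:ListBucket` action on the bucket itself, while `s3:GetObject` applies to the objects under the prefix. How the policy gets attached to the state machine and lambda roles is left to you.

```python
import json


def ead_read_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document allowing reads of objects under bucket/prefix."""
    prefix = prefix.strip("/")  # normalize so the ARNs below have single slashes
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read individual EAD files under the prefix.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # List objects; scoped to the prefix with a condition key.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


print(json.dumps(ead_read_policy("my-bucket", "my-prefix"), indent=2))
```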
Example state machine input for IIIF load:
{
"workflowType": "iiif",
"collection_url": "https://api.dc.library.northwestern.edu/api/v2/collections/ecacd539-fe38-40ec-bbc0-590acee3d4f2?as=iiif",
"s3": {
"Bucket": "yourstackname-osdp-prototype-xxxxxxxx",
"Key": "manifests.csv"
}
}
Example state machine input for EAD load:
{
"s3": {
"SourceBucket": "my-bucket",
"SourcePrefix": "my-prefix"
},
"workflowType": "ead"
}
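If you prefer to start a load from a script rather than the console, the two inputs above can be built and submitted with boto3. This is a sketch under assumptions: `iiif_input`, `ead_input`, and `start_load` are hypothetical helper names, the default key `manifests.csv` is taken from the IIIF example above, and the state machine ARN must be looked up from your deployed stack.

```python
import json


def iiif_input(collection_url: str, bucket: str, key: str = "manifests.csv") -> dict:
    """State machine input for an IIIF load, mirroring the example above."""
    return {
        "workflowType": "iiif",
        "collection_url": collection_url,
        "s3": {"Bucket": bucket, "Key": key},
    }


def ead_input(source_bucket: str, source_prefix: str) -> dict:
    """State machine input for an EAD load, mirroring the example above."""
    return {
        "workflowType": "ead",
        "s3": {"SourceBucket": source_bucket, "SourcePrefix": source_prefix},
    }


def start_load(state_machine_arn: str, payload: dict) -> str:
    """Start an execution and return its ARN. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the payload helpers work without boto3 installed

    client = boto3.client("stepfunctions")
    response = client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    return response["executionArn"]


# Build and inspect a payload without calling AWS.
print(json.dumps(ead_input("my-bucket", "my-prefix"), indent=2))
```

To submit it, pass the ARN of your deployed state machine: `start_load(state_machine_arn, ead_input("my-bucket", "my-prefix"))`.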
To install uv locally
Ensure you are using Python 3.10:
python --version
# Python 3.10.5
Create a virtual environment:
python -m venv .venv
Activate it:
source .venv/bin/activate
Install uv within the virtual environment:
pip install uv
Install dependencies:
uv sync --all-groups
This is an ECR repository that is consumed by the OSDP.
See its readme for more information.
The OSDP primarily uses Python, but NodeJS is used for one lambda.
Follow the steps in the quickstart
Tip
If using VSCode, set your Python interpreter by opening the Command Palette (⇧⌘P) and choosing Python: Select Interpreter. Set the path to ./.venv/bin/python. Read more here.
Ensure you are using Node v22:
node --version
# v22.13.1
One lambda uses Node; install its dependencies:
cd functions/build_function
npm i
cd ../../
To run the tests:
pytest
If the app has not been synthesized, tests may fail. See synth instructions.
To run linting:
ruff check .
or
ruff check --fix .
To run formatting:
ruff format .
Useful cdk commands to be run within the osdp/ directory:
- cdk ls - list all stacks in the app
- cdk synth - emits the synthesized CloudFormation template
- cdk deploy - deploy this stack to your default AWS account/region
- cdk diff - compare deployed stack with current state
- cdk docs - open CDK documentation
Enjoy!