Anzo® Unstructured - operator image

cambridgesemantics/unstructured-operator
Standalone image
Single-stream repository
Cambridge Semantics
Tags: 3.1.3-20250912060136, latest, 3.1.3
Overview

Description

Graph Studio Unstructured Operator (anzo-unstructured-operator)

By Altair Engineering Inc.

 

Supported tags

About Graph Studio Unstructured

The Graph Studio Unstructured solution has two main parts:

Microservice Leader

The Microservice Leader works in concert with the Graph Studio Agent and Graph Studio Unstructured to ingest unstructured data into RDF suitable for use in a Graph Studio Data Fabric. The Microservice Leader provides queuing and coordination functions for a cluster of Graph Studio Unstructured workers. Ingesting unstructured data with a Graph Studio Server requires the configuration and use of one or more Microservice Leader nodes.

 

Graph Studio Unstructured Worker

The Graph Studio Unstructured Worker works in concert with the Microservice Leader and Graph Studio Agent to ingest unstructured data into RDF suitable for use in a Graph Studio Data Fabric. Graph Studio Unstructured supports extraction from plain text, HTML, PDF, Word, and Excel files, and creation of annotations based on regular expressions, a linked data knowledge base, or third-party annotators via a REST interface. Graph Studio Unstructured uses Graph Studio's distributed microservice framework (via the Graph Studio Agent and Microservice Leader) to allow timely processing of large document sets. Use of Graph Studio Unstructured requires an appropriately licensed Graph Studio Server installation.

 

Project Status: stable

Operator Version: v1

Prerequisites

  • Kubernetes cluster, versions 1.25 through 1.32
  • kubectl, versions 1.25 through 1.32
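The version range above can be checked mechanically before installing. The following is a minimal sketch, not part of the operator distribution: the function name supported_k8s_version is hypothetical, and it assumes the range means minor versions 25 through 32 of Kubernetes 1.x.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a version string such as "v1.30.2" falls
# inside the supported range listed above (assumed to mean 1.25 to 1.32).
supported_k8s_version() {
  local version="${1#v}"        # strip an optional leading "v"
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"    # text after the first dot
  local minor="${rest%%.*}"     # text before the next dot, if any
  minor="${minor%%[!0-9]*}"     # drop suffixes such as "+" used by some distributions
  [ "$major" = "1" ] && [ -n "$minor" ] && [ "$minor" -ge 25 ] && [ "$minor" -le 32 ]
}
```

Pass it a version string taken from the output of kubectl version, for example supported_k8s_version "v1.30.2" && echo supported.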

Setting up prerequisites

Steps to deploy Graph Studio Unstructured Operator

Graph Studio Unstructured Docker Images Used for Deployment

When you deploy Graph Studio Unstructured using the operator, the following images are used for the deployment. Reference docker commands for downloading the latest release of each image are given below.

 

Graph Studio Unstructured Operator

  • To download the latest release, use: docker pull registry.connect.redhat.com/cambridgesemantics/unstructured-operator

Anzo Microservices Leader

  • To download the latest release, use: docker pull registry.connect.redhat.com/cambridgesemantics/anzo-microservices-leader

Anzo Unstructured Worker

  • To download the latest release, use: docker pull registry.connect.redhat.com/cambridgesemantics/anzo-unstructured-worker
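To fetch all three images in one pass, the pull commands can be wrapped in a loop. This is a sketch only: the function name pull_images and the CONTAINER_CLI override are hypothetical conveniences, while the registry path and repository names are the ones listed above.

```shell
#!/usr/bin/env bash
# Registry path and repository names as listed in this section.
REGISTRY="registry.connect.redhat.com/cambridgesemantics"
IMAGES="unstructured-operator anzo-microservices-leader anzo-unstructured-worker"

# Pull every image, stopping at the first failure.
# CONTAINER_CLI is a hypothetical override; set it to "echo docker" for a dry run.
pull_images() {
  local cli="${CONTAINER_CLI:-docker}"
  local image
  for image in $IMAGES; do
    $cli pull "$REGISTRY/$image" || return 1
  done
}
```

Running CONTAINER_CLI="echo docker" pull_images prints the three pull commands without contacting the registry.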

Steps to Deploy

  • Create the namespace (set its name in metadata.name of the manifest)
    kubectl create -f deploy/v1_namespace_default.yaml
  • Setup Service Account
    kubectl create -f deploy/default_v1_serviceaccount_unstructured-operator.yaml --namespace <namespace>
  • Setup RBAC
    kubectl create -f deploy/default_rbac.authorization.k8s.io_v1_role_unstructured-operator.yaml  --namespace <namespace>
    kubectl create -f deploy/default_rbac.authorization.k8s.io_v1_rolebinding_unstructured-operator.yaml --namespace <namespace>
    kubectl create -f deploy/rbac.authorization.k8s.io_v1_clusterrole_unstructured-operator.yaml
    kubectl create -f deploy/rbac.authorization.k8s.io_v1_clusterrolebinding_unstructured-operator.yaml  --namespace <namespace>
  • Setup the CRD
    kubectl create -f deploy/crds/apiextensions.k8s.io_v1_customresourcedefinition_anzounstructureds.anzounstructured.clusters.cambridgesemantics.com.yaml
  • Deploy unstructured-operator
    kubectl create -f deploy/default_apps_v1_deployment_unstructured-operator.yaml --namespace <namespace>
  • Deploy the Graph Studio Custom Resource (CR), i.e. the Graph Studio Unstructured deployment
    kubectl apply -f deploy/default_anzounstructured.clusters.cambridgesemantics.com_v1_anzounstructured_au01.yaml --namespace <namespace>

NOTE: Before deploying the operator, edit deploy/default_apps_v1_deployment_unstructured-operator.yaml so that it references the correct Docker image.
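The deployment order above can be scripted. The sketch below is an illustration under stated assumptions, not a supported installer: the function name deploy_all and the KUBECTL override are hypothetical, and it uses kubectl apply throughout (instead of kubectl create) so that the script can be rerun. It passes --namespace only to namespaced manifests, and the namespace argument is assumed to match metadata.name in the namespace manifest.

```shell
#!/usr/bin/env bash
# Manifests in the order listed above. "ns" marks namespaced resources,
# "cluster" marks cluster-scoped ones (Namespace, ClusterRole,
# ClusterRoleBinding, CRD).
MANIFESTS="
cluster deploy/v1_namespace_default.yaml
ns deploy/default_v1_serviceaccount_unstructured-operator.yaml
ns deploy/default_rbac.authorization.k8s.io_v1_role_unstructured-operator.yaml
ns deploy/default_rbac.authorization.k8s.io_v1_rolebinding_unstructured-operator.yaml
cluster deploy/rbac.authorization.k8s.io_v1_clusterrole_unstructured-operator.yaml
cluster deploy/rbac.authorization.k8s.io_v1_clusterrolebinding_unstructured-operator.yaml
cluster deploy/crds/apiextensions.k8s.io_v1_customresourcedefinition_anzounstructureds.anzounstructured.clusters.cambridgesemantics.com.yaml
ns deploy/default_apps_v1_deployment_unstructured-operator.yaml
ns deploy/default_anzounstructured.clusters.cambridgesemantics.com_v1_anzounstructured_au01.yaml
"

# Apply every manifest in order, stopping at the first failure.
# KUBECTL is a hypothetical override; set it to "echo kubectl" for a dry run.
deploy_all() {
  local namespace="$1" kubectl="${KUBECTL:-kubectl}" scope file
  [ -n "$namespace" ] || { echo "usage: deploy_all <namespace>" >&2; return 2; }
  while read -r scope file; do
    [ -n "$file" ] || continue   # skip blank lines in the list
    if [ "$scope" = "ns" ]; then
      $kubectl apply -f "$file" --namespace "$namespace" </dev/null || return 1
    else
      $kubectl apply -f "$file" </dev/null || return 1
    fi
  done <<EOF
$MANIFESTS
EOF
}
```

Running KUBECTL="echo kubectl" deploy_all my-namespace prints the nine commands without touching a cluster, which is a cheap way to review the order before a real run.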

 

Steps to delete AnzoUnstructured CR and Graph Studio Unstructured Operator

  • Delete Unstructured CR
    kubectl delete -f deploy/default_anzounstructured.clusters.cambridgesemantics.com_v1_anzounstructured_au01.yaml --namespace <namespace>
  • Delete unstructured-operator
    kubectl delete -f deploy/default_apps_v1_deployment_unstructured-operator.yaml --namespace <namespace>
  • Delete RBAC
    kubectl delete -f deploy/default_rbac.authorization.k8s.io_v1_role_unstructured-operator.yaml --namespace <namespace>
    kubectl delete -f deploy/default_rbac.authorization.k8s.io_v1_rolebinding_unstructured-operator.yaml --namespace <namespace>
    kubectl delete -f deploy/rbac.authorization.k8s.io_v1_clusterrole_unstructured-operator.yaml --namespace <namespace>
    kubectl delete -f deploy/rbac.authorization.k8s.io_v1_clusterrolebinding_unstructured-operator.yaml  --namespace <namespace>
  • Delete Service Account
    kubectl delete -f deploy/default_v1_serviceaccount_unstructured-operator.yaml --namespace <namespace>
  • Delete CRD
    kubectl delete -f deploy/crds/apiextensions.k8s.io_v1_customresourcedefinition_anzounstructureds.anzounstructured.clusters.cambridgesemantics.com.yaml

Graph Studio Unstructured Custom Resource (CR) Specification

The following table lists the configurable parameters for Graph Studio Unstructured and their default values (CR API version: v1).

Parameter | Description | Default
metadata.name | Name of CR | au01
metadata.namespace | Namespace of CR |
metadata.labels | Dictionary of (key: val) as labels of CR |
spec.volumes | List of persistent volumes for Graph Studio Unstructured | commented out; uncomment to add a value
spec.volumes.[i].name | Name for persistent volume |
spec.volumes.[i].mountPath | Path where persistent volume should be mounted inside container |
spec.volumes.[i].pv | Attributes to configure persistent volume, of type v1.PersistentVolume |
spec.volumes.[i].pvc | Attributes to configure persistent volume claim, of type v1.PersistentVolumeClaim |
spec.msLeader.nodeConfig.spec | Configuration specification for Graph Studio Unstructured Leader pods |
spec.msLeader.nodeConfig.spec.replicas | Number of pods for Graph Studio Unstructured Leader | 1
spec.msLeader.nodeConfig.spec.serviceName | Name of headless service for Graph Studio Unstructured | au--ms
spec.msLeader.nodeConfig.spec.template.spec.serviceAccountName | Service account name for pods | unstructured-operator
spec.msLeader.nodeConfig.spec.template.spec.containers.x.Name | Name of Graph Studio Unstructured Leader container | ms
spec.msLeader.jvmMemory | Graph Studio Unstructured leader JVM memory |
spec.msLeader.bootProperties | Graph Studio Unstructured leader specific boot properties |
spec.auWorker.nodeConfig.spec | Configuration specification for Graph Studio Unstructured Worker pods |
spec.auWorker.nodeConfig.spec.replicas | Number of pods for Graph Studio Unstructured Worker | 1
spec.auWorker.nodeConfig.spec.serviceName | Name of headless service for Graph Studio Unstructured | au--w
spec.auWorker.nodeConfig.spec.template.spec.serviceAccountName | Service account name for pods | unstructured-operator
spec.auWorker.nodeConfig.spec.template.spec.containers.x.Name | Name of Graph Studio Unstructured Worker container | w
spec.auWorker.bootProperties | Graph Studio Unstructured worker specific boot properties |
spec.auWorker.jvmMemory | Graph Studio Unstructured worker JVM memory |
spec.bootProperties | Boot properties, i.e. the environment variables for the Graph Studio Unstructured CR | commented out; uncomment to add a value
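As an illustration of how the parameter paths in the table map onto the CR, the sketch below builds a JSON merge patch that changes spec.auWorker.nodeConfig.spec.replicas. The function name worker_replicas_patch is hypothetical, and whether the operator reconciles a patched replica count on a running deployment is an assumption that should be verified against the product documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON merge patch that sets the worker
# replica count at spec.auWorker.nodeConfig.spec.replicas.
worker_replicas_patch() {
  local replicas="$1"
  case "$replicas" in
    ''|*[!0-9]*)
      echo "replicas must be a non-negative integer" >&2
      return 2
      ;;
  esac
  printf '{"spec":{"auWorker":{"nodeConfig":{"spec":{"replicas":%s}}}}}\n' "$replicas"
}
```

The patch could then be applied to the sample CR au01 with kubectl patch, using the resource name from the CRD file name:

    kubectl patch anzounstructureds.anzounstructured.clusters.cambridgesemantics.com au01 --namespace <namespace> --type merge -p "$(worker_replicas_patch 3)"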

References

https://docs.cambridgesemantics.com/

Published

Generally Available

Size

67.9 MB


General information

The following information was extracted from the containerfile and other sources.

Summary: Graph Studio® Unstructured Operator, ubi9 Image
Description: Graph Studio® Unstructured Operator lets a user deploy and manage life-cycle of Graph Studio® Unstructured via Graph Studio.
Provider: Cambridge Semantics
Maintainer: https://altair.com/customer-support

Technical information

The following information was extracted from the containerfile and other sources.

Repository name: Graph Studio® Unstructured Operator, ubi9 Image
Image version: 3.1.3
Architecture: amd64

Get this image

Terms & conditions: Before downloading or using this Container, you must agree to the Red Hat subscription agreement located at redhat.com/licenses. If you do not agree with these terms, do not download or use the Container. If you have an existing Red Hat Enterprise Agreement (or other negotiated agreement with Red Hat) with terms that govern subscription services associated with Containers, then your existing agreement will control.
