GraphQL User guide#

This guide documents the basics of using Pyxis GraphQL, as well as using GraphQL in general. To optimize your queries, see partners.md.

If you’re looking for how to acquire access to Pyxis in the first place, see here.

GraphQL#

Because GraphQL is a relatively new language, this guide includes a short overview of it. While most users are familiar with REST and might want to apply the same kind of thinking to GraphQL, this is not the best approach. GraphQL does not have anything similar to REST endpoints; instead, it defines types with fields that resemble objects. Let’s look at an example of such a type for a user object:

type User {
    id: ID
    name: String
    surname: String
    age: Int
}

Four fields are defined here (id, name, surname, age), each with a data type (ID, String and Int). You might be wondering about the ID type: it is a built-in scalar for unique identifiers that is serialized as a String. A GraphQL schema is composed of such types, along with a special Query type that defines what can be queried. Let’s define the Query type:

type Query {
    get_user(id: ID!): User
}

This defines a get_user query that takes one argument named id and returns a User type. The id argument is required to be non-null by the ! type modifier. All of the code examples mentioned so far would be a part of the schema on the server side.

From the client side you’d be able to query a user with an id of abc123 as follows:

query Example{
    get_user(id: "abc123"){
        name
        surname
    }
}

Note how we are able to query for just the fields that we need (name and surname in this case), which is the biggest difference from REST APIs. If you’re making just one query, you can omit the query Example part. Another great feature is that you can run several queries in a single call. When the same query appears more than once with different arguments, each occurrence needs its own alias (first and second below; see the Aliases section):

{
    first: get_user(id: "abc123"){
        name
        surname
    }
    second: get_user(id: "edf456"){
        name
        age
    }
}

This concludes the short introduction to GraphQL to get you started. To learn more, it is recommended that you take a look at some of the listed sources:

Quickstart#

In this section we’ll cover how we do GraphQL in Pyxis. Note that you should have some basic knowledge of GraphQL going into this section.

First, get your Pyxis GraphQL up and running, as described in the README.md. You should have http://localhost:8000/graphql/ showing you a GraphQL playground. On the right side you can browse the schema definition and documentation.

We use GraphQLCRUD for query naming. Queries starting with get_ fetch a single resource, while queries that begin with find_ fetch paginated results.

Response format and error handling#

Unlike REST, GraphQL does not use HTTP status codes to communicate errors. Instead, applications are expected to communicate errors in the response. That’s why we use a unified format across Pyxis GraphQL:

{
    data {
        error {
          status
          detail
        }
        data {}
    }
    errors [{
        message  
    }]
}

The top-level data/errors structure is provided by the GraphQL server. If errors is not present, the top-level data is valid. Inside it, we provide our own data and error fields, which report any errors that occurred while resolving the query. So when fetching a resource, you’d first check the error key; if it is null, the query was successful and the data field is valid. The type of the inner data depends on the query, more specifically on the type you’re querying for.
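The two-layer check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name unwrap and the exception PyxisGraphQLError are hypothetical helpers, not part of Pyxis, and the response is assumed to be already parsed from JSON into a dict.

```python
from __future__ import annotations

from typing import Any


class PyxisGraphQLError(Exception):
    """Hypothetical exception raised when either error layer reports a problem."""


def unwrap(response: dict[str, Any], query_name: str) -> Any:
    """Return the inner data of one query, or raise if any layer failed."""
    # Layer 1: errors reported by the GraphQL server itself.
    top_errors = response.get("errors")
    if top_errors:
        messages = "; ".join(e.get("message", "unknown error") for e in top_errors)
        raise PyxisGraphQLError(f"GraphQL error: {messages}")

    # Layer 2: the error object that sits next to the inner data.
    result = response["data"][query_name]
    error = result.get("error")
    if error is not None:
        raise PyxisGraphQLError(f"{error.get('status')}: {error.get('detail')}")

    return result["data"]
```

A caller would pass the parsed response together with the query name (or alias) it asked for, for example unwrap(response, "get_image").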

get_ queries#

Let’s look at a single-resource query first.

GraphQL query:

{
  get_image(id: "abc") {
    error {
      status
      detail
    }
    data {
      _id
      brew {
        build
        nvra
      }
    }
  }
}

REST endpoint call equivalent:

https://catalog.redhat.com/api/containers/v1/images/id/abc?include=_id,brew.build,brew.nvra

Here we are fetching a single container Image with the id of abc and getting its _id and some nested brew fields.

So, executing the query successfully should yield a similar response to this example:

{
  "data": {
    "get_image": {
      "error": null,
      "data": {
        "_id": "abc",
        "brew": {
          "build": "package-1.1-11",
          "nvra": "package-1.1-11.architecture"
        }
      }
    }
  }
}

find_ queries#

Find queries use the same format, but add some more parameters because of pagination. Let’s look at the definition of the find_images query:

find_images(page_size: Int = 50, page: Int = 0, sort_by: [sort_by!])

Notice that you can specify how many results per page you want, as well as the page, just as you would in Pyxis REST. The defaults are 50 results per page, starting at the first page (page 0). You can also specify sorting with a list of sort_by items, each of which consists of a field name and either DESC or ASC for the order. Let’s see an example now.

GraphQL query:

{
  find_images(page_size: 3, sort_by: [{ field: "creation_date", order: DESC }]) {
    error {
      status
      detail
    }
    page
    page_size
    total
    data {
      _id
      creation_date
    }
  }
}

REST endpoint call equivalent:

https://catalog.redhat.com/api/containers/v1/images?page_size=3&sort_by=creation_date[desc]&include=data._id,data.creation_date

In this query we request 3 images with the most recent creation_date. Because this result is paginated, we can also get page, page_size and total number of results for the query. The result might look something like this:

{
  "data": {
    "find_images": {
      "error": null,
      "page": 0,
      "page_size": 3,
      "total": 607006,
      "data": [
        {
          "_id": "A",
          "creation_date": "2020-11-06T06:51:40.244000+00:00"
        },
        {
          "_id": "B",
          "creation_date": "2020-11-06T06:51:38.670000+00:00"
        },
        {
          "_id": "C",
          "creation_date": "2020-11-06T06:51:35.356000+00:00"
        }
      ]
    }
  }
}
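Since find_ results report page, page_size and total, a client can walk through all pages until the total is covered. The following Python sketch shows one way to do that. The names iter_results and fetch_page are hypothetical: fetch_page stands for whatever function you use to run a find_ query for a given page and page size and return the object found under the query name (with the error, total and data keys shown above).

```python
from __future__ import annotations

from typing import Any, Callable, Iterator


def iter_results(
    fetch_page: Callable[[int, int], dict[str, Any]],
    page_size: int = 50,
) -> Iterator[Any]:
    """Yield every item of a paginated find_ query, page by page."""
    page = 0  # pages are zero-based, matching the default page: Int = 0
    while True:
        result = fetch_page(page, page_size)
        if result.get("error") is not None:
            raise RuntimeError(f"query failed: {result['error']}")

        items = result["data"]
        yield from items

        # Stop on an empty page or once the pages fetched so far cover the total.
        if not items or (page + 1) * page_size >= result["total"]:
            break
        page += 1
```

With a total as large as the one in the example response, iterating over everything means many calls, so narrowing the query with a filter first is usually the better choice.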

Advanced filtering#

For find_ queries, there’s complex filtering available in the filter argument for the query. Let’s start with a simple query where we fetch only published repositories.

GraphQL query:

{
  find_repositories (filter: {published: {eq: true}}){
    error {
      status
      detail
    }
    page
    page_size
    total
    data {
      _id
    }
  }
}

REST endpoint call equivalent:

https://catalog.redhat.com/api/containers/v1/repositories?filter=published==true&include=data._id

The syntax for a filter item is {field: {condition: value}} where available conditions differ for each data type. GraphQL playground should provide autocomplete with a list of available filters for every field, but generally we replicate the functionality of the REST filtering language.
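When queries are assembled in code, the {field: {condition: value}} structure maps naturally onto nested dictionaries. The Python sketch below renders such a dictionary as a GraphQL input literal (keys unquoted, values in JSON form). The function to_graphql_literal is a hypothetical helper, not something Pyxis provides, and it does not handle enum values such as DESC, which must stay unquoted.

```python
import json


def to_graphql_literal(value) -> str:
    """Render nested dicts, lists and scalars as a GraphQL input literal."""
    if isinstance(value, dict):
        # GraphQL input objects use unquoted field names.
        fields = ", ".join(f"{key}: {to_graphql_literal(val)}" for key, val in value.items())
        return "{" + fields + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_graphql_literal(item) for item in value) + "]"
    # json.dumps yields true/false/null, numbers and double-quoted strings.
    return json.dumps(value)


published_only = {"published": {"eq": True}}
query = "{ find_repositories(filter: %s) { data { _id } } }" % to_graphql_literal(published_only)
```

Passing the filter as a query variable avoids string building altogether, but that requires the name of the filter input type from the schema, which this guide does not list.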

Chaining filters#

Let’s look at nested filters and chaining filters together, this time focusing only on the filter field to avoid repeating the rest of the query.

{
  find_repositories(
    filter: {
      and: [
        { published: { eq: true } }
        { contacts: { email_address: { eq: "john@doe.com" } } }
      ]
    }
  )
}

REST filter equivalent:

filter=published==true and contacts.email_address=="john@doe.com"

At the top level (right after filter), you can chain filters with either the and keyword or the or keyword, followed by a list of filters. One of the filters in the above query also specifies a filter for the nested field contacts.email_address. At the top level you can also combine and with or, like this:

{
  find_repositories(
    filter: {
      and: [
        { published: { eq: true } }
        {
          or: [
            { contacts: { email_address: { eq: "bob@doe.com" } } }
            { contacts: { email_address: { eq: "alice@doe.com" } } }
          ]
        }
      ]
    }
  )
}

REST filter equivalent:

filter=published==true and (contacts.email_address=="bob@doe.com" or contacts.email_address=="alice@doe.com")

The main takeaway from chaining filters is that it can only happen at the top level, not inside a nested field. For example, this query is invalid because or appears inside contacts:

{
  find_repositories(
    filter: {
      and: [
        { published: { eq: true } }
        {
          contacts: {
            or: [
              { email_address: { eq: "bob@doe.com" } }
              { email_address: { eq: "alice@doe.com" } }
            ]
          }
        }
      ]
    }
  )
}

Filtering arrays#

Arrays have their own special filters, which you can use to filter by array size (currently only the eq condition is available), by a condition on the item at a specified index, or by the elemMatch operator. If example_field is an array, you can filter by its size with the example_field_size field, by index with the example_field_index field, and by elemMatch with the example_field_elemMatch field.

Let’s try to filter images by the number of layers and specify what the top layer should be.

{
  find_images(
    filter: {
      and: [
        { parsed_data: { layers_size: { eq: 1 } } }
        {
          parsed_data: {
            layers_index: { index: 0, condition: { eq: "sha256:layer_hash" } }
          }
        }
      ]
    }
  )
}

REST filter equivalent:

filter=parsed_data.layers=size=1 and parsed_data.layers.0=="sha256:layer_hash"

The syntax for the _index field is {example_field_index: {index: Int condition: {cond: value}}}.
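The _size and _index naming convention can be captured in a few small Python helpers that build the filter as nested dictionaries. This is a sketch only: size_filter, index_filter and nest are hypothetical names introduced for illustration, and the result still has to be rendered into the query or sent as a variable.

```python
def size_filter(field: str, size: int) -> dict:
    """Filter on array length; only the eq condition is available for sizes."""
    return {f"{field}_size": {"eq": size}}


def index_filter(field: str, index: int, condition: dict) -> dict:
    """Filter on the item at a given index, e.g. condition={"eq": "value"}."""
    return {f"{field}_index": {"index": index, "condition": condition}}


def nest(path: list, leaf: dict) -> dict:
    """Wrap a filter under its parent fields, e.g. ["parsed_data"]."""
    for parent in reversed(path):
        leaf = {parent: leaf}
    return leaf


# Rebuilds the filter from the query above: one layer, with a given top layer.
layers_filter = {
    "and": [
        nest(["parsed_data"], size_filter("layers", 1)),
        nest(["parsed_data"], index_filter("layers", 0, {"eq": "sha256:layer_hash"})),
    ]
}
```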

The following query returns images that are in the rhel repository and are published.

GraphQL query:

{
  find_images(
    filter: {
      repositories_elemMatch: {
        and: [
          {repository: {eq: "rhel"}}
          {published: {eq: true}}
        ]
      }
    }
  ) {
    data {
      _id
      creation_date
      repositories {
        repository
        published
      }
    }
  }
}

REST endpoint call equivalent:

https://catalog.redhat.com/api/containers/v1/images?filter=repositories=em(repository=="rhel" and published==true)&include=data._id,data.creation_date,data.repositories.repository,data.repositories.published

Filtering subobjects and NULL#

You can use filters to find objects where subobjects do or do not exist, or are set to NULL. This is done with queries similar to the one below.

GraphQL query:

{
  find_images(
    filter: {
      repositories: {
        eq: null
      }
    }
  ) {
    data {
      _id
      creation_date
      repositories {
        repository
        published
      }
    }
  }
}

REST endpoint call equivalent:

https://catalog.redhat.com/api/containers/v1/images?filter=repositories==null&include=data._id,data.creation_date,data.repositories.repository,data.repositories.published

The query looks for images where repositories is either null or not set at all. To find all the objects that have this field set, replace the eq operator with ne.

Edges#

Edges are another new concept that GraphQL introduces. You can imagine schema types as nodes in a graph connected by edges. Edges in this context are references to other types in the schema. So what are the fantastic edges and where to find them? They can be identified as fields with a *Response data type in a type definition in the schema. Each edge also means that another call to Pyxis REST has to be made, so be careful about querying too many edges, as that might result in very slow queries.

Let’s look at a query utilizing an edge in the Image type to get vulnerabilities for the image.

GraphQL query:

{
  get_image(id: "abc") {
    error {
      status
      detail
    }
    data {
      _id
      edges {
        vulnerabilities(page_size: 5) {
          error {
            status
            detail
          }
          data {
            _id
            active
          }
        }
      }
    }
  }
}

The GraphQL query would be equivalent to the following two REST endpoint calls:

https://catalog.redhat.com/api/containers/v1/images/id/abc?include=_id

https://catalog.redhat.com/api/containers/v1/images/id/abc/vulnerabilities?page_size=5&include=data._id,data.active

And the edge field definition in the schema:

type Image {

    ...

    vulnerabilities(page_size: Int = 50, page: Int = 0, filter: VulnerabilityFilter): VulnerabilityListResponse

    ...

}

You can see that since this is a get many type of edge, you can control pagination and even filter the results. Beware, though: you can’t affect the parent type with these filters, nor can you use the filter argument on the parent get_image to filter edge fields.

There are also get one edges which contain only a single type, e.g. the rpm_manifest edge in Image:

type Image {

    ...

    rpm_manifest: RPMManifestResponse

    ...

}

In which case the query would look like this:

{
  get_image(id: "abc") {
    error {
      status
      detail
    }
    data {
      _id
      edges {
        rpm_manifest {
          error {
            status
            detail
          }
          data {
            _id
          }
        }
      }
    }
  }
}

The GraphQL query would be equivalent to the following two REST endpoint calls:

https://catalog.redhat.com/api/containers/v1/images/id/abc?include=_id

https://catalog.redhat.com/api/containers/v1/images/id/abc/rpm_manifest?include=_id
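Because each edge in the queries above selects its own error and data fields, it seems reasonable to check every requested edge separately rather than rely on the parent's error alone. The Python sketch below collects the errors of all edges of one resource. The function edge_errors is a hypothetical helper, and the input is assumed to be the inner data object of a get_ response that includes the edges field.

```python
def edge_errors(resource: dict) -> dict:
    """Map edge name to its error object for every edge that reported one."""
    edges = resource.get("edges") or {}
    return {
        name: edge["error"]
        for name, edge in edges.items()
        if edge and edge.get("error") is not None
    }


image = {
    "_id": "abc",
    "edges": {
        "vulnerabilities": {"error": None, "data": [{"_id": "v1", "active": True}]},
        "rpm_manifest": {"error": {"status": 404, "detail": "Not found"}, "data": None},
    },
}
failed = edge_errors(image)  # only rpm_manifest is reported here
```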

Variables#

Variables can be used to replace static query parameters and make queries reusable and more readable. When parameters are replaced by query variables, the query can be used repeatedly with different variable values. Variable values are sent as part of the request payload, in a dictionary stored under the variables key.

Each variable must be declared in the query’s parameter list using the syntax $variableName: VariableType. It can then be referenced inside the query by its name.

For example, a simple get_image query can be transformed using variables as follows:

Old query:

{
  get_image(id: "000000000000000000000123") {
    error {
      status
      detail
    }
    data {
      _id
    }
  }
}

Generalized query with variable:

query GetImageQuery($imageID: ObjectIDFilterScalar){
  get_image(id: $imageID) {
    error {
      status
      detail
    }
    data {
      _id
    }
  }
}

Payload with variables:

{
  "query": "...",
  "variables": {
    "imageID": "000000000000000000000123"
  }
}
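The payload above can be assembled with Python's standard library as sketched below. The URL is the local playground address from the Quickstart; build_request is a hypothetical helper name, and authentication is left out entirely because this guide does not describe it. The request is only constructed here, not sent.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:8000/graphql/"  # local instance from the Quickstart

QUERY = """
query GetImageQuery($imageID: ObjectIDFilterScalar) {
  get_image(id: $imageID) {
    error { status detail }
    data { _id }
  }
}
"""


def build_request(query: str, variables: dict, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body holds the query and its variables."""
    payload = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(QUERY, {"imageID": "000000000000000000000123"})
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

The same query string can then be reused for other images by changing only the variables dictionary.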

Aliases#

Aliases can be used to rename fields returned by a GraphQL query. An alias is created using the alias_name: selected_field syntax.

This is useful for calling multiple queries with the same name in one GraphQL request.

In this example, using aliases avoids the error that a query name conflict would otherwise cause:

{
  image1: get_image(id: "000000000000000000000000") {
    error {
      status
      detail
    }
    data {
      _id
    }
  }
  image2: get_image(id: "000000000000000000000001") {
    error {
      status
      detail
    }
    data {
      _id
    }
  }
}

Thanks to the aliases, the query returns a valid JSON response without duplicate keys:

{
  "data": {
    "image1": {
      "error": null,
      "data": {
        "_id": "000000000000000000000000"
      }
    },
    "image2": {
      "error": null,
      "data": {
        "_id": "000000000000000000000001"
      }
    }
  }
}
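When the list of IDs is only known at run time, the aliased query can be generated. The Python sketch below produces one aliased get_image selection per ID, following the image1, image2 naming used above. The function aliased_get_images is a hypothetical helper, and the selected fields are fixed to error and _id for brevity.

```python
import json


def aliased_get_images(image_ids: list) -> str:
    """Build one request with an aliased get_image selection per image ID."""
    parts = []
    for number, image_id in enumerate(image_ids, start=1):
        parts.append(
            # json.dumps quotes and escapes the ID as a string literal.
            f"  image{number}: get_image(id: {json.dumps(image_id)}) {{\n"
            "    error { status detail }\n"
            "    data { _id }\n"
            "  }"
        )
    return "{\n" + "\n".join(parts) + "\n}"


query = aliased_get_images(["000000000000000000000000", "000000000000000000000001"])
```

Each alias then becomes a key in the response, so the results can be read back as image1, image2 and so on.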