Home / Stream/ Integrations/ Sources/ Google Cloud/Google Cloud Pub/Sub Source

Google Cloud Pub/Sub Source

Cribl Stream supports receiving data records from Google Cloud Pub/Sub, a managed real-time messaging service for sending and receiving messages between applications.

Type: Pull | TLS Support: YES (secure API) | Event Breaker Support: No

Prerequisites

When working with Google Cloud Pub/Sub, you need to know:

  • The project where your credentials originated.
  • The project that contains any topic and subscription you want to work with.

If the desired topic and subscription reside in the same project where your credentials originated, describe them using either short names or fully-qualified names.

If the desired topic and subscription reside in a different project than the one where your credentials originated, you must describe them using fully-qualified names.

Here’s an example of a fully-qualified topic name: projects/my-project-id/topics/my-topic-id.

The short name for the same topic would be: my-topic-id.

Configure Cribl Stream to Receive Data from Pub/Sub

  1. On the top bar, select Products, and then select Cribl Stream. Under Worker Groups, select a Worker Group. Next, you have two options:
    • To configure via QuickConnect, navigate to Routing > QuickConnect. Select Add Source and select the Source you want from the list, choosing either Select Existing or Add New.
    • To configure via the Routes, select Data > Sources. Select the Source you want. Next, select Add Source.
  2. In the New Source modal, configure the following under General Settings:
    • Enabled: Toggle on to enable the Source.
    • Input ID: Enter a unique name to identify this Pub/Sub Source definition. If you clone this Source, Cribl Stream will add -CLONE to the original Input ID.
    • Description: Optionally, enter a description.
    • Topic ID: ID of the Pub/Sub topic from which to receive events.
    • Subscription ID: ID of the subscription to use when receiving events.
  3. Next, you can configure the following Optional Settings:
    • Create topic: If toggled on, Cribl Stream will create the topic on Pub/Sub if it does not exist.
    • Create subscription: If toggled on (default), Cribl Stream will create the subscription on Pub/Sub if it does not exist.
    • Ordered delivery: If toggled on, Cribl Stream will receive events in the order that they were added to the queue. For ordered delivery to work correctly, ensure that both the process receiving events and the subscription from which messages are read have ordered delivery enabled.
    • Region: Region to retrieve messages from. Select default to allow Google to auto-select the nearest region. (If you’ve enabled Ordered delivery, the selected region must be allowed by message storage policy.)
    • Tags: Optionally, add tags that you can use to filter and group Sources in Cribl Stream’s UI. These tags aren’t added to processed events. Use a tab or hard return between (arbitrary) tag names.
  4. Optionally, you can adjust the Authentication, Processing, and Advanced settings, or Connected Destinations outlined in the sections below.
  5. Select Save, then Commit & Deploy.

Authentication

Use the Google authentication method drop-down to select a Google authentication method:

  • Auto: This option uses the GOOGLE_APPLICATION_CREDENTIALS environment variable to provide the location of a credential JSON file. The environment variable can also be set directly in JSON format. When this option is set, Cribl Stream will first check whether the GOOGLE_APPLICATION_CREDENTIALS environment variable exists and properly set with valid credentials. If this environmental variable is missing or incorrect, Cribl Stream checks for credentials in two additional fallback locations, as described in the official Google Cloud documentation: How Application Default Credentials works. In addition to using the GOOGLE_APPLICATION_CREDENTIALS environment variable, you can use credentials attached to your Google Cloud instance.

  • Manual: With this default option, you use the Service account credentials field to enter the contents of your service account credentials file (a set of JSON keys), as downloaded from Google Cloud. To insert the file itself, click the upload button at this field’s upper right. As an alternative, you can use environment variables, as outlined here.

  • Secret: Use the drop-down to select a key pair that you’ve configured in Cribl Stream’s internal secrets manager or (if enabled) an external KMS.

Processing Settings

Fields

In this section, you can define new fields or modify existing ones using JavaScript expressions, similar to the Eval function.

  • The Field Name can either be a new field (unique within the event) or an existing field name to modify its value.
  • The Value is a JavaScript expression (enclosed in quotes or backticks) to compute the field’s value (can be a constant). Select this field’s advanced mode icon (far right) if you’d like to open a modal where you can work with sample data and iterate on results.

This flexibility means you can:

  • Add new fields to enrich the event.
  • Modify existing fields by overwriting their values.
  • Compute logic or transformations using JavaScript expressions.

Pre-Processing

In this section’s Pipeline drop-down list, you can select a single existing Pipeline or Pack to process data from this input before the data is sent through the Routes.

Advanced Settings

Backlog limit: When the Destination exerts backpressure, this setting limits the number of events that Cribl Stream will queue for processing before it stops retrieving further events. Defaults to 1000 events.

Number of concurrent streams: How many streams to pull messages from at one time. Doubling the value doubles the number of messages this Source pulls from the topic (if available), while consuming more CPU and memory. Defaults to 5.

Request timeout (ms): Pull request timeout, in milliseconds. Defaults to 60000 ms (i.e., 1 minute).

Environment: If you’re using GitOps, optionally use this field to specify a single Git branch on which to enable this configuration. If empty, the config will be enabled everywhere.

Connected Destinations

Select Send to Routes to enable conditional routing, filtering, and cloning of this Source’s data via the Routing table.

Select QuickConnect to send this Source’s data to one or more Destinations via independent, direct connections.

Internal Fields

Cribl Stream uses a set of internal fields to assist in handling of data. These “meta” fields are not part of an event, but they are accessible, and Functions can use them to make processing decisions.

Fields for this Source:

  • __messageId – ID of the message from Google.
  • __projectId – ID of the Google project from which the data was received.
  • __publishTime – Time at which the event was originally published to the Pub/Sub topic.
  • __subscriptionIn – The subscription from which the event was received.
  • __topicIn – The topic from which the event was received.

Google Cloud Roles and Permissions

Your Google Cloud service account should have at least the following roles on subscriptions:

  • roles/pubsub.subscriber
  • roles/pubsub.viewer or roles/viewer

As an alternative, you can rely on the following discrete permissions, which are included in the above roles:

  • pubsub.snapshots.seek
  • pubsub.subscriptions.consume
  • pubsub.topics.attachSubscription

To enable Cribl Stream’s Create topic and/or Create subscription options, your service account should have one of the following (or higher) roles:

  • roles/pubsub.editor
  • roles/editor

Either editor role confers multiple permissions, including those from the lower viewer, subscriber, and publisher roles. For additional details, see the Google Cloud Access Control topic.

How Cribl Stream Pulls Data

Pub/Sub treats all the Worker Nodes as members of a Consumer Group, and each Worker gets its share of the load from Pub/Sub. This is the same process as normal Kafka. By default, Workers will poll every 1 minute. In the case of Leader failure, Worker Nodes will continue to receive data as normal.

Troubleshooting

The Source’s configuration modal has helpful tabs for troubleshooting:

Live Data: Try capturing live data to see real-time events as they are ingested. On the Live Data tab, click Start Capture to begin viewing real-time data.

Logs: Review and search the logs that provide detailed information about the ingestion process, including any errors or warnings that may have occurred.

You can also view the Monitoring page that provides a comprehensive overview of data volume and rate, helping you identify ingestion issues. Analyze the graphs showing events and bytes in/out over time.