Step-By-Step

How To: Access Seek Connect, Set Up a Data Pipeline, Review Data, and Transform Data

 

Step 1: How do I access Seek Connect?

If your organization has subscribed to the Seek Connect capability, you can access Seek Connect directly in Insight Cloud through the Connect option in the left-side navigation pane.

Once you select the Connect option, Seek Connect opens in a new browser window, where you grant authentication permissions. This is a one-time action: afterward, whenever you are logged into Insight Cloud, you are also authenticated in Seek Connect.


The Seek Connect landing page is the Jobs page, where you can manage all source connections, data transformations, and publications. To create a new Job, click the Create New + button on the far right and choose whether to create a new source, transformation, or publication.

Step 2: How do I create a new Data Pipeline?
  • First, we will cover the process of creating a new source.

  • Once you select the Source option, you are taken to the Source Job creation page.

  • To name your dataset, click on the Untitled Source box, type the new name you want the dataset to have, and press Enter.

  • To set up your file source, choose the connector you want from the dropdown menu and follow the on-screen instructions to fill out the fields required to connect to the source data.

  • Once you have entered the correct connection information (for an S3 bucket in this example), you have the option to adjust the file format.

  • Select the correct format for your data (CSV, Parquet, Avro, or JSON) and adjust any parameters as needed. Next, click the Continue button at the bottom of the page to create your source.
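
To illustrate why the file format and its parameters matter, here is a minimal sketch using only Python's standard library, not Seek Connect itself. The sample data, the pipe delimiter, and the helper names read_csv and read_json_lines are all hypothetical. It shows the same two records parsed correctly only when the delimiter setting matches the file.

```python
import csv
import io
import json

# Hypothetical sample data for illustration; not taken from Seek Connect.
CSV_TEXT = "HOST_ID|IS_SUPERHOST\n101|t\n102|f\n"
JSON_LINES_TEXT = (
    '{"HOST_ID": "101", "IS_SUPERHOST": "t"}\n'
    '{"HOST_ID": "102", "IS_SUPERHOST": "f"}\n'
)


def read_csv(text, delimiter=","):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_json_lines(text):
    """Parse newline-delimited JSON into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# With the default comma delimiter, each line collapses into a single column.
wrong = read_csv(CSV_TEXT)

# With the matching delimiter, the columns are split as intended.
right = read_csv(CSV_TEXT, delimiter="|")

# The same records stored as JSON lines parse to the same structure.
from_json = read_json_lines(JSON_LINES_TEXT)
```

If a preview shows every row squeezed into one column, as in the wrong result here, a mismatched delimiter is a likely cause and worth checking before continuing.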

Step 3: How do I review a sample of the source data?
  • Once you have completed the Select Source step and established a connection to the source, Seek Connect will redirect you to the Review files screen.

  • On this screen, you can select a source file asset and preview a sample of the source data. 

  • Click on the Back button to return to the Create dataset screen and modify your source setup. 

  • Click on Confirm to proceed with the data upload process. 

  • Seek Connect automatically creates the necessary landing and ingestion sources and destinations to generate the pipeline to your S3 bucket. The Upload page confirms that the Source Job has been completed.

  • After all of the fields above are completed, you are taken to a preview page where you can save this Source Job for use in Transformation and Publication Jobs.

  • Once the connection is Active, you can click Save to return to the home page to create a new Data Transformation Job. 
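
If you want to inspect a file yourself before connecting it, the sample preview from the Review files step can be approximated locally. The sketch below is an assumption-laden illustration using Python's standard library: the function name preview_rows and the sample data are hypothetical, and it only handles CSV input with a header row.

```python
import csv
import io
from itertools import islice


def preview_rows(stream, n=5):
    """Return the header and up to n data rows from a CSV stream."""
    reader = csv.reader(stream)
    header = next(reader, [])        # first line is treated as the header
    rows = list(islice(reader, n))   # read at most n rows, not the whole file
    return header, rows


# Hypothetical sample data standing in for a source file.
sample = io.StringIO("HOST_ID,IS_SUPERHOST\n101,t\n102,f\n103,t\n")
header, rows = preview_rows(sample, n=2)
```

Reading only the first few rows keeps the check fast even for large files, which is the same reason a preview shows a sample rather than the full dataset.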

Step 4: How do I perform a Data Transformation?
  • To begin your Data Transformation Job, click the Create New + button and select the Transformation option from the drop-down menu.

  • On the next page, you can name your Data Transformation Job and select the source(s) you would like to transform.

  • Select your source from the left-hand panel. You can click the preview button to see a sample of the data to help inform your SQL transformation. 

  • To transform this dataset before publishing, navigate to the untitled_transformation.sql tab. Here, you can write SQL code to select or subset the data you want.

  • To help with your SQL query, we have included a copy feature for each of your data sources. Simply hover your cursor over the desired source and click the copy button.

  • Paste this snippet into the untitled_transformation.sql file to begin writing your transformation.

  • Now, you can add any additional SQL code to transform your dataset. 

  • For this example, we are going to apply a simple transformation to subset the table on the IS_SUPERHOST column so that we only have Superhosts in the dataset. 

  • To subset this table to only include Superhosts, add WHERE IS_SUPERHOST = 't' to the code snippet.

  • You can preview the results of this query by clicking the Preview button.

  • We can see from the above preview that the only rows included are Superhosts, so we are seeing the expected result from our above SQL query. 

  • Before saving this Data Transformation Job, you can edit the title, rename the file, or choose to materialize this dataset as a table instead of a view.

  • Materializing the data as a table stores the results and refreshes them regularly, based on upstream source data updates. Saving it as a view generates the results each time a user queries it. Each method has compute and performance implications to consider. Views are usually the more cost-effective choice, but not always.

  • Once you have finished editing the Data Transformation Job, click Save in the upper right corner. You will be prompted to name your transformation. Give it a unique name, because this name is used in the Publication Job.
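
As a rough illustration of the Superhost filter and the table-versus-view choice described above, the sketch below uses Python's built-in sqlite3 module rather than Seek Connect itself. The listings table, its rows, and the names superhosts_view and superhosts_table are hypothetical. Note one difference: in SQLite a table created from a query is a one-time snapshot that must be rebuilt by hand, whereas Seek Connect refreshes materialized tables based on upstream updates.

```python
import sqlite3

# In-memory database with a hypothetical source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (HOST_ID INTEGER, IS_SUPERHOST TEXT)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [(101, "t"), (102, "f"), (103, "t")],
)

# The transformation: keep only Superhosts (note the straight quotes).
FILTER_SQL = "SELECT * FROM listings WHERE IS_SUPERHOST = 't'"

# A view stores the query and reruns it every time it is read.
conn.execute(f"CREATE VIEW superhosts_view AS {FILTER_SQL}")

# A table stores the query results as they were when it was built.
conn.execute(f"CREATE TABLE superhosts_table AS {FILTER_SQL}")

# Simulate an upstream update arriving after both objects exist.
conn.execute("INSERT INTO listings VALUES (104, 't')")


def count_rows(name):
    """Count rows in a table or view (name is trusted, for demo use only)."""
    return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]


view_count = count_rows("superhosts_view")    # includes the new row
table_count = count_rows("superhosts_table")  # still the earlier snapshot
flags = {row[1] for row in conn.execute("SELECT * FROM superhosts_view")}
```

The view picks up the new Superhost immediately because it reruns the query on every read, while the table keeps its earlier contents until it is refreshed. That is the trade-off to weigh: a view pays the query cost on each read, and a table pays it on each refresh.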