Posts Tagged ‘sqlactivity’

SqlActivity & RedshiftCopyActivity fails ? Use shellCommandActivity instead

Written by mannem on . Posted in Data Pipelines

There are several limitations of SQLActivity and RedshiftCopyActivity. If the psql/sql commands are too complex, these activities fail to prepare the statements correctly and will throw out some errors which cannot be easily rectified. So, you always have the option to use shellCommandActivity to run your complex script.

This article guides you to create a shell Script and corresponding Data-Pipeline template to run your complex script directly(part 1)or when present in S3(Part 2 ). As the true purpose of Data-Pipelines is automation, The script can also take arguments that you can reference using placeholders.


The following shell script takes arguments referenced in ScriptArguments object of the ShellCommandActivity. Its runs COPY command to copy files from S3 to PostgreRDS. Another example shows a copy from S3 to Redshift.

Run a PSQL command to copy from S3 to PostgreRDS

Run a PSQL command to copy from S3 to Redshift

A sample Pipeline template to copy from s3 to RDSPostgres

You may use a similar definition to copy from S3 to Redshift.


If your script is in S3 and you wanna pass arguments to your script:

Ex: insert into :v1 select * from source where = :v2; -> s3://YourBucket/event_data_yyyymm.sql

$1 -> S3File location
$2 -> Password for Redshift
$3 -> Table Name
$4 -> condition value

A snippet of Definition for ShellCommandActivity can be :

Running complex queries on redshift with Data-pipelines

Written by mannem on . Posted in Data Pipelines, Redshift

Sometimes AWS Data-Pipelines SQLActivity may not support complex queries. This is because Data-Pieplines SqlActivity passes this script to JDBS executeStatement(Prepared statement). This script is supposed to be idempotent. So here’s an alternative to run psql/sql commands using Data-Pipelines.

Suppose you have the following psql command,

select 'insert into event_data_' ||to_char(current_date,'yyyymm')|| '_v2 select * from stage_event_data_v2 where event_time::date >= '''||to_char(current_date, 'YYYY-MM')||'-01'' and event_time::date <= '''||last_day(current_date)||''';';

and it should output,

insert into event_data_201511_v2 select * from stage_event_data_v2 where event_time::date >= '2015-11-01' and event_time::date <= '2015-11-30';

This is a valid command in psql and can be successfully executed with workbenches and psql shell.

But using Data-pipelines, executing the above command will throw and error:

ERROR processing query/statement. Error: Parsing failed

This is because the script appears to be changing(not idempotent) when it is executed.

If you have a complex redshift commands and wish to performing operations against Redshift that involve custom logic. You could rather write a program in your favorite language and run it
using ShellCommandActivity. This is a quite valid way of interacting with Redshift.

There are several ways to do this. I am including a shell script and its Data-pipelne template as a reference here.

Sample shell command:

Sample Data-pipelines template:

Some info on the script and Data-pipeline:

1. This script file has 2 arguments (Arg 1 is the sql script that you need to execute , Arg 2 is the Redshift password). These arguments are provided in Data-pipeline shellCommandActivity object using scriptArgument field.

2. The script outputs its result to v2.sql and uploads to s3 bucket (with -t tuples only option), so that you can run the script later.

3. The Data-pipeline template uses the *myRedshiftPass parameter id to hide the password from DataPipelines.