Push data to AWS Kinesis Firehose with AWS API Gateway via Service Proxy

Written by mannem on . Posted in Kinesis

 


With API Gateway, developers can create and operate APIs for their back-end services without developing and maintaining infrastructure to handle authorization and access control, traffic management, monitoring and analytics, version management, and software development kit (SDK) generation.

API Gateway is designed for web and mobile developers who want to provide secure, reliable access to back-end APIs from mobile apps, web apps, and server apps built internally or by third-party ecosystem partners. The business logic behind the APIs can either be provided by a publicly accessible endpoint that API Gateway proxies calls to, or it can run entirely as a Lambda function.

In this article, we will create a publicly accessible API endpoint to which your application can issue POST requests. Via the service proxy, the contents of the POST request go to Firehose as a PutRecord API call, and the data eventually lands in S3, Redshift, or an ES cluster, depending on your Firehose settings. Using a service proxy eliminates the need to invoke an AWS Lambda function.

The end result:

1. Your application issues a POST request to the API Gateway endpoint that you create (see the example request after this list).

2. API Gateway authenticates the request and translates it into a PutRecord API call via the service proxy, putting the data “SampleDataStringToFirehose” into your Firehose stream.

3. Firehose eventually hydrates the destination (S3, Redshift, or an ES cluster) with the data from your POST requests.
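
For example, with a hypothetical invoke URL (yours will differ; API Gateway generates it when you deploy the API in step 5), the request could look like this:

# hypothetical endpoint and resource path; replace with the invoke URL from your own deployment
curl -H "Content-Type: application/json" -X POST \
  https://abcd1234.execute-api.us-west-2.amazonaws.com/prod/firehose \
  -d '{
    "DeliveryStreamName": "test",
    "Record": {
      "Data": "SampleDataStringToFirehose"
    }
  }'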


Here’s a step-by-step walkthrough on setting this up:

This walkthrough assumes you have explored the other walkthroughs in http://docs.aws.amazon.com/apigateway/latest/developerguide/getting-started-intro.html
1. Creating Gateway:

> Create an API in the API Gateway web console.
> Create a resource under that API and create a POST method on it.
> For this method, choose the integration type as Advanced and select “AWS Service Proxy”.
> Method settings:

Region -> your desired region
Service -> Firehose
Subdomain -> leave empty
HTTP Method -> POST
Action -> PutRecord
Role -> ARN of a role that can be assumed by API Gateway and has a policy allowing at least the ‘PutRecord’ action on your Firehose stream. A sample role that allows all Firehose actions is attached at the end of this post.
Ex: arn:aws:iam::618548141234:role/RoleToAllowPUTsOnFirehose

Confused? You can also check out a sample role creation here: http://docs.aws.amazon.com/apigateway/latest/developerguide/getting-started-aws-proxy.html#getting-started-aws-proxy-add-roles

2. Testing:

Save this method and TEST it with the following request body (the format is documented on the PutRecord API reference page).

Replace ‘test’ with your Firehose delivery stream name.

http://docs.aws.amazon.com/firehose/latest/APIReference/API_PutRecord.html
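
A minimal test body, using the plain-text sample data from this post (the Base64 issue it triggers is covered in the next step):

{
    "DeliveryStreamName": "test",
    "Record": {
        "Data": "SampleDataStringToFirehose"
    }
}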

3. Verify S3 contents:

Now check the S3 contents that Firehose is supposed to hydrate (after the S3 buffer interval or buffer size, whichever is satisfied first).

The contents will be in a binary format like äfiõÚ)≤ÈøäœÏäjeÀ˜nöløµÏm˛áˇ∂ø¶∏ß∂)‡ , which isn’t the data that you just pushed via the API call.

This is because Firehose expects the data blob to be Base64-encoded. (You can confirm this by running aws firehose put-record --delivery-stream-name test --debug --region us-west-2 --record Data=SampleDataStringToFirehose , which automatically Base64-encodes the data blob before sending the request.) While we pass ‘SampleDataStringToFirehose’ as the data, the AWS CLI actually sends ‘U2FtcGxlRGF0YVN0cmluZ1RvRmlyZWhvc2U=’:

{'body': '{"Record": {"Data": "U2FtcGxlRGF0YVN0cmluZ1RvRmlyZWhvc2U="}, "DeliveryStreamName": "test"}' ,

where base64-encoded(SampleDataStringToFirehose) = 'U2FtcGxlRGF0YVN0cmluZ1RvRmlyZWhvc2U='

So, you need to apply a transformation to your POST payload to Base64-encode the Data field.

You can use the $util.base64Encode() function in a mapping template to do this encoding at the API Gateway layer.

4. Applying transformations:

Using transformations, you can modify the JSON schema during the request and response cycles.

By defining a mapping template, the request and response payloads can be transformed to reflect a custom schema.

For a request body like the one used in the test above, here's a sample mapping template (Content-Type: application/json) that I created from the documentation; it passes the delivery stream name through and Base64-encodes the Data field:
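
{
    "DeliveryStreamName": "$input.path('$.DeliveryStreamName')",
    "Record": {
        "Data": "$util.base64Encode($input.path('$.Record.Data'))"
    }
}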

Usage:

> While testing the resource: Integration Request -> Add mapping template -> Content-Type = application/json.
> Instead of Input passthrough, choose mapping template, paste the template above, and save.
> Now test with a request body similar to the one you used before, and verify “Method request body after transformations” in the Logs section;
it should look like the transformed body shown after this list.

> You may need to modify the mapping template so that it includes whatever payload you want for your application.
> Instead of applying these transformations in API GW, you can also have your client encode the data before framing the request to API GW.
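
With the template above and the same test body, the transformed request body should look like this (matching the Base64 value the AWS CLI produced earlier):

{
    "DeliveryStreamName": "test",
    "Record": {
        "Data": "U2FtcGxlRGF0YVN0cmluZ1RvRmlyZWhvc2U="
    }
}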

5. Deployment:

Now that we have a working method that can issue PutRecord calls to Firehose, we deploy the API to get a publicly accessible HTTPS endpoint for POST requests. Your application can issue POST requests to this endpoint; the contents of those requests go to Firehose as PutRecord API calls, and the data eventually lands in S3 or Redshift based on your Firehose settings.
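
The invoke URL that API Gateway generates on deployment follows this format (append your resource path to it):

https://{restapi_id}.execute-api.{region}.amazonaws.com/{stage_name}/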

Make sure you include a Content-Type: application/json header in the POST request. You can also try application/x-amz-json-1.1.

6. Monitor and Extend:
  • Monitoring – Check the CloudWatch monitoring tab on your Firehose stream for incoming records and bytes. You can also check CloudWatch Logs to investigate failures.
  • Of course, also verify the contents of the S3 bucket / Redshift tables / ES cluster.
  • Extend – You can extend this setup to other API calls on other AWS services required by your client app. A similar setup can be used to POST data to Kinesis streams from your applications.


A sample role:
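
A sketch, assuming you create the role by hand: the trust policy lets API Gateway assume the role, and the permissions policy allows all Firehose actions (scope it down to firehose:PutRecord on your delivery stream for least privilege).

Trust relationship:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": { "Service": "apigateway.amazonaws.com" },
            "Action": "sts:AssumeRole"
        }
    ]
}

Permissions policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [ "firehose:*" ],
            "Resource": [ "*" ]
        }
    ]
}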



Comments (12)

  • dan


    Thanks for this post. Was able to set up the flow in my environment. Question (I’m new to APIs)… if I wanted to pass multiple fields in the POST and parse them into separate fields in Redshift, how would I go about it?


    • mannem


      Well, suppose your Redshift table called ‘people’ is created with columns like

      id,F.name,L.name

      You can specify your data blob with some delimiter like ‘|’ and new line character like \n :

      For example :

      1. The post request could be like :

      curl -H "Content-Type: application/json" -X POST https://bvvfrgw123.execute-api.us-west-2.amazonaws.com/prod/firehose2 -d '
      {
      "DeliveryStreamName": "test",
      "Record": {
      "Data": "12345|Bob|Smith\n678910|Sam|Green\n"
      }
      }'

      You can configure your API GW to use either the ‘PutRecordBatch’ or ‘PutRecord’ API call, with the necessary transformations.

      2. Firehose creates an S3 staging file with the above Data line.

      3. Now it will run a Redshift COPY from S3 to the Redshift table ‘people’, based on the COPY command options that you give while setting up Firehose.

      The COPY command by default uses ‘|’ as the DELIMITER and \n as the newline.

      So, with the above data blob, the data should be automatically loaded into the respective columns and rows like

      id,F.name,L.name
      12345 Bob Smith
      678910 Sam Green

      ————-

      http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html

      http://docs.aws.amazon.com/firehose/latest/dev/basic-create.html#console-to-redshift


  • nitu


    Thanks for a very detailed writeup. It is quite useful.

    Could you tell me how to add a newline as part of the custom schema for S3?

    {
    "DeliveryStreamName": "$input.path('$.DeliveryStreamName')",
    "Record": {
    "Data": "$util.base64Encode($input.path('$.Record.Data'))"
    }
    }
    I don’t have control over the incoming data and hence I need to add \n only in the mapping. When I try to add \n to Data, it simply copies \n into S3 but doesn’t translate it to a newline.

    Could you please advise

    thanks


  • BobF


    I just want to pass along a REALLY big “Thank You!” for this great article. It was easy to follow and worked perfectly. I don’t even want to think about how many hours you likely saved me.


    • mannem


      Glad that it worked for you.


  • Jigar


    I have a Lambda function which has the latest DynamoDB stream record. Now I want to pass that record to Firehose using this API, so how can I call and invoke this API from the same Lambda function?


    • mannem


      Calling a deployed API involves submitting requests to the execute-api component of API Gateway. The request URL is the invoke URL generated by API Gateway when the API is successfully deployed. You can obtain this invocation URL from the API Gateway console, or you can construct it yourself according to the following format: https://{restapi_id}.execute-api.{region}.amazonaws.com/{stage_name}/

      Amazon API Gateway REST requests are HTTP requests. From your Lambda function, depending on the programming language, you need to import the necessary HTTP library and invoke the GET or POST methods on the above URL. Based on how you deployed the API, you may need to sign the headers of this HTTP request (AWS_IAM authorization plus an API key, if enabled). Instead of manually framing authentication headers in your code, you can use the AWS SDK in Lambda to easily sign your requests.

      http://docs.aws.amazon.com/apigateway/api-reference/making-http-requests/

      http://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-call-api.html


  • SADHASIVAM


    Hi,

    We wanted to pass all request headers as part of the Base64-encoded Data attribute. I tried to add them, but it doesn’t seem to work and always returns an internal server error. Any idea?

    thanks


  • Jean-michel


    Great blog. I followed the doc to set the policies and create the role but keep getting an error. Where is the role that allows all actions?
    The error is:
    assumed-role/APIGatewayAWSProxyExecRole/BackplaneAssumeRoleSession is not authorized to perform: firehose:PutRecord on resource


    • mannem


      Hi Jean-michel, it looks like the role being assumed by API GW does not have permission to make PutRecord calls against your Firehose resource. You will need a role that allows this call.

      The policy document on your role (the one you attach to API GW while you create the method; see Step 1) can be like:
      {
      "Version": "2012-10-17",
      "Statement": [
      {
      "Effect": "Allow",
      "Resource": [
      "*"
      ],
      "Action": [
      "firehose:*"
      ]
      }
      ]
      }

      This means API GW will have access to all firehose API calls on all of your resources.


  • keerthivasan santhanakrishnan


    Thanks a lot for your blog, it was really useful. Before reading this, I was guessing that there should be a way to use API Gateway as a Kinesis proxy; you made it very clear and I was able to get the pipeline running very fast. Thanks a lot.


