Push data to AWS Kinesis Firehose with AWS API Gateway via Service Proxy

Written by mannem on . Posted in Kinesis



With API Gateway, developers can create and operate APIs for their back-end services without developing and maintaining infrastructure to handle authorization and access control, traffic management, monitoring and analytics, version management, and software development kit (SDK) generation.

API Gateway is designed for web and mobile developers who want to provide secure, reliable access to back-end APIs from mobile apps, web apps, and server apps that are built internally or by third-party ecosystem partners. The business logic behind the APIs can either be provided by a publicly accessible endpoint that API Gateway proxies calls to, or it can run entirely as a Lambda function.

In this article, we will create a publicly accessible API endpoint to which your application can issue POST requests. Via the service proxy, the contents of each POST request go to Firehose as a PutRecord API call, and eventually the data lands in S3, Redshift, or an ES cluster, depending on your Firehose settings. Using a service proxy removes the need to invoke an AWS Lambda function.

The end result would be:

1. Your application issues a POST request to the API Gateway endpoint that you create.


2. API Gateway authenticates this request and translates it into a PutRecord API call via the service proxy, which puts the data “SampleDataStringToFirehose” into your Firehose.

3. Firehose eventually hydrates the destination (either S3 or Redshift) with the data from your POST requests.

Here’s a step-by-step walkthrough on setting this up:

This walkthrough assumes you have explored the other walkthroughs at http://docs.aws.amazon.com/apigateway/latest/developerguide/getting-started-intro.html
1. Creating Gateway:

> Create an API in API Gateway through the web console.
> Create a resource under that API and create a POST method.
> In this method, choose the integration type Advanced and select “AWS Service Proxy”.
> Method settings:

Select the desired Region,
Service as Firehose,
Leave subdomain empty,
HTTP method -> POST,
Action -> PutRecord
Role -> ARN of a role that can be assumed by API Gateway and has policies allowing at least the ‘PutRecord’ action on your Firehose. A sample role that allows all actions is attached later.
Ex: arn:aws:iam::618548141234:role/RoleToAllowPUTsOnFirehose

Confused? You can also check out a sample role creation here: http://docs.aws.amazon.com/apigateway/latest/developerguide/getting-started-aws-proxy.html#getting-started-aws-proxy-add-roles

2. Testing:

Save this method and TEST it with the following request body, which can be found on the PutRecord API call webpage.

Replace ‘test’ with your Firehose stream name.
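
A minimal sketch of such a test body, assuming the PutRecord request shape quoted in step 3 of this post (a `DeliveryStreamName` plus a `Record` holding `Data`). The stream name `test` and the sample string come from this post; the helper name `build_test_body` is hypothetical.

```python
import json


def build_test_body(stream_name, data):
    """Return a JSON request body in the PutRecord shape (hypothetical helper)."""
    return json.dumps(
        {
            "DeliveryStreamName": stream_name,
            "Record": {"Data": data},
        },
        indent=2,
    )


# Paste the printed JSON into the request body field of the method test page.
print(build_test_body("test", "SampleDataStringToFirehose"))
```

The data is deliberately left unencoded here, which is what leads to the behavior described in the next step.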


3. Verify S3 contents:

If you now look at the S3 contents that Firehose is supposed to hydrate (after the S3 buffer interval or buffer size is reached, whichever comes first),

the contents will be in a binary format like äfiõÚ)≤ÈøäœÏäjeÀ˜nöløµÏm˛áˇ∂ø¶∏ß∂)‡, which isn’t the data that you just pushed via the API call.

This is because Firehose expects the data blob to be encoded in Base64. You can confirm this by running ( aws firehose put-record --delivery-stream-name test --debug --region us-west-2 --record Data=SampleDataStringToFirehose ), which automatically encodes the data blob in Base64 before sending the request. Although we pass ‘SampleDataStringToFirehose’ as the data, the debug output shows that the AWS CLI actually sends ‘U2FtcGxlRGF0YVN0cmluZ1RvRmlyZWhvc2U=’:

{'body': '{"Record": {"Data": "U2FtcGxlRGF0YVN0cmluZ1RvRmlyZWhvc2U="}, "DeliveryStreamName": "test"}' ,

where base64-encoded(SampleDataStringToFirehose) = ‘U2FtcGxlRGF0YVN0cmluZ1RvRmlyZWhvc2U=’

So you need to apply a transformation to your POST payload to encode the Data in Base64.

You can use a $util function such as $util.base64Encode() to do the encoding at the API Gateway layer.
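
The encoding itself is easy to check locally. The following is a small sketch using Python's standard `base64` module; the helper name `encode_for_firehose` is hypothetical and only mirrors what the CLI does to the data blob.

```python
import base64


def encode_for_firehose(text):
    """Base64-encode a UTF-8 string, as the Data blob is expected to be."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


encoded = encode_for_firehose("SampleDataStringToFirehose")
print(encoded)  # U2FtcGxlRGF0YVN0cmluZ1RvRmlyZWhvc2U=

# Decoding gives back the original string, which is what should land in S3.
print(base64.b64decode(encoded).decode("utf-8"))
```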

4. Applying transformations:

Using transformations, you can modify the JSON schema during the request and response cycles.

By defining a mapping template, the request and response payloads can be transformed to reflect a custom schema.

For a request body like:

Here’s a sample mapping template that I created after checking the documentation (application/json):


> While testing a resource, go to Integration Request -> Add mapping template -> Content-Type = application/json
> Instead of Input passthrough, choose Mapping template, paste your template, and save.
> Now test with a request body similar to the one you used before, and verify in the Logs section under “Method request body after transformations”;
it should look like

> You may need to modify the mapping template so that it includes whatever payload you want for your application.
> Instead of using these transformations in API Gateway, you can also have your client encode the data before framing a request to API Gateway.
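
To make the effect of the transformation concrete, here is a rough Python emulation of what a mapping template of the form `"Data": "$util.base64Encode($input.path('$.Record.Data'))"` (the shape quoted in the comments on this post) does to an incoming body. It only illustrates the data flow and is not code that runs inside API Gateway; the function name `apply_mapping` is hypothetical.

```python
import base64
import json


def apply_mapping(request_body):
    """Emulate the mapping: copy the stream name, Base64-encode Record.Data."""
    incoming = json.loads(request_body)
    raw = incoming["Record"]["Data"]
    transformed = {
        "DeliveryStreamName": incoming["DeliveryStreamName"],
        "Record": {
            "Data": base64.b64encode(raw.encode("utf-8")).decode("ascii"),
        },
    }
    return json.dumps(transformed)


before = '{"DeliveryStreamName": "test", "Record": {"Data": "SampleDataStringToFirehose"}}'
print(apply_mapping(before))
```

The same function is also a fair picture of what a client would do if it encoded the data itself instead of relying on the gateway.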

5. Deployment:

Now that we have a working method that can issue PutRecord to Firehose, we deploy the API to get a publicly accessible HTTP endpoint for POST requests. Your application can issue POST requests to this endpoint; the contents of these requests go to Firehose as PutRecord API calls, and eventually the data goes to S3/Redshift based on your Firehose settings.

Make sure you include the Content-Type: application/json header in the POST request. You can also try application/x-amz-json-1.1.
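
As an illustration, a client could build such a request with Python's standard library as sketched below. The endpoint URL is a placeholder (an assumption, not a real deployment), the helper name `build_post` is hypothetical, and the sketch assumes the Base64 mapping template from step 4 is in place so the client can send plain text.

```python
import json
import urllib.request


def build_post(endpoint, stream_name, data):
    """Build (but do not send) a JSON POST request for the deployed endpoint."""
    payload = json.dumps(
        {
            "DeliveryStreamName": stream_name,
            "Record": {"Data": data},
        }
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: substitute the Invoke URL of your own deployed stage.
request = build_post(
    "https://example.execute-api.us-west-2.amazonaws.com/prod/firehose",
    "test",
    "SampleDataStringToFirehose",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```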

6. Monitor and Extend:
  • Monitoring – Check the CloudWatch monitoring tab on your Firehose for incoming records and bytes. You can also check CloudWatch Logs for failures.
  • Of course, you can also verify the contents of the S3 bucket / Redshift tables / ES cluster.
  • Extend – You may extend this to work with other API calls on other AWS services required by your client app. A similar setup can be used to POST data to Kinesis streams from your applications.


A sample role :
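
A minimal sketch of what such a role's two policy documents might look like, built as Python dictionaries and printed as JSON. It assumes the role is meant to allow every Firehose action on every resource, as described in step 1 and in the comments; the variable names are hypothetical, and in practice you would likely narrow both `Action` and `Resource`.

```python
import json

# Trust policy: lets the API Gateway service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "apigateway.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: all Firehose actions on all resources (an assumption,
# deliberately broad to match "allows all actions").
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["firehose:*"],
            "Resource": ["*"],
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(permissions_policy, indent=2))
```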


Comments (12)

  • dan


    Thanks for this post. Was able to set up the flow in my environment. Question (I’m new to APIs)… if I wanted to pass in multiple fields to the POST and parse them into separate fields in Redshift, how would I go about it?


    • mannem


      Well, suppose your Redshift table called ‘people’ is created with columns like


      You can specify your data blob with a delimiter like ‘|’ and a newline character like \n:

      For example:

      1. The POST request could look like:

      curl -H "Content-Type: application/json" -X POST https://bvvfrgw123.execute-api.us-west-2.amazonaws.com/prod/firehose2 -d '{
      "DeliveryStreamName": "test",
      "Record": {
      "Data": "12345|Bob|Smith\n678910|Sam|Green\n"
      }
      }'

      You can configure your API Gateway method to use either the ‘PutRecordBatch’ or the ‘PutRecord’ API call, with the necessary transformations.

      2. Firehose creates an s3 staging file with the above Data line.

      3. Now it will run Redshift COPY from S3 to Redshift table ‘people’ based on the COPY command options that you give while setting up Firehose.

      The COPY command by default uses ‘|’ as the DELIMITER and \n as the newline.

      So, with that above datablob, the data should be automatically loaded to respective columns and rows like

      12345 Bob Smith
      678910 Sam Green
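
A small sketch of how that blob breaks down into rows and columns, using plain Python string handling. It only mimics the delimiter logic described above and does not involve Redshift; the function name `rows_from_blob` is hypothetical.

```python
def rows_from_blob(blob, delimiter="|"):
    """Split a delimited data blob into rows of column values."""
    return [line.split(delimiter) for line in blob.splitlines() if line]


blob = "12345|Bob|Smith\n678910|Sam|Green\n"
for row in rows_from_blob(blob):
    print(" ".join(row))
```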





  • nitu


    Thanks for a very detailed writeup. It is quite useful.

    Could you tell me how to add a newline as part of the custom schema for S3?

    {
    "DeliveryStreamName": "$input.path('$.DeliveryStreamName')",
    "Record": {
    "Data": "$util.base64Encode($input.path('$.Record.Data'))"
    }
    }
    I don’t have control over the incoming data, and hence I need to add \n only in the mapping. When I try to add \n to Data, it simply copies a literal \n to S3 and doesn’t translate it to a newline.

    Could you please advise?



  • BobF


    I just want to pass along a REALLY big “Thank You!” for this great article. It was easy to follow and worked perfectly. I don’t even want to think about how many hours you likely saved me.


    • mannem


      Glad that it worked for you.


  • Jigar


    I have a Lambda function that holds the latest DynamoDB stream record, and I want to pass that record to Firehose using this API. How can I call this API from the same Lambda function?


    • mannem


      Calling a deployed API involves submitting requests to the execute-api component of API Gateway. The request URL is the Invoke URL generated by API Gateway when the API is successfully deployed. You can obtain this invocation URL from the API Gateway console or you can construct it yourself according to the following format: https://{restapi_id}.execute-api.{region}.amazonaws.com/{stage_name}/

      Amazon API Gateway REST requests are HTTP requests. From your Lambda function, depending on the programming language, you need to import the necessary HTTP library and invoke GET or POST methods on the above URL. Depending on how you deployed the API, you may need to sign this HTTP request (AWS_IAM, plus an API key if enabled). Instead of manually building authentication headers in your code, you can use the AWS SDK in Lambda to sign your requests easily.
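
For illustration, the URL format above can be assembled with a one-line helper. The function name `invoke_url` is hypothetical, and the sample values are taken from the curl example earlier in these comments.

```python
def invoke_url(restapi_id, region, stage_name):
    """Build the Invoke URL of a deployed API stage from its parts."""
    return f"https://{restapi_id}.execute-api.{region}.amazonaws.com/{stage_name}/"


print(invoke_url("bvvfrgw123", "us-west-2", "prod"))
# https://bvvfrgw123.execute-api.us-west-2.amazonaws.com/prod/
```

The resource path of the method (for example `firehose2`) is appended to this base URL when making the request.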







    We wanted to pass all request headers as part of the Base64-encoded Data attribute. I tried to concatenate and add them, but it does not seem to work and always returns an internal server error. Any idea?



  • Jean-michel


    Great blog. I followed the doc to set policies and create the role but keep getting an error. Where is the role that allows all actions?
    The error is
    assumed-role/APIGatewayAWSProxyExecRole/BackplaneAssumeRoleSession is not authorized to perform: firehose:PutRecord on resource


    • mannem


      Hi Jean-michel, it looks like the role being assumed by API Gateway does not have permission to make PutRecord calls to your Firehose resource. You will need a role that allows this call.

      The policy document on your role (the one you attach to API Gateway when you create the method; see STEP 1) can be like:
      {
      "Version": "2012-10-17",
      "Statement": [
      {
      "Effect": "Allow",
      "Resource": ["*"],
      "Action": ["firehose:*"]
      }
      ]
      }

      This means API Gateway will have access to all Firehose API calls on all of your resources.


  • keerthivasan santhanakrishnan


    Thanks a lot for your blog, it was really useful. Before reading this, I was guessing that there should be a way to use API Gateway as a Kinesis proxy. You made it very clear, and I was able to get the pipeline running very fast. Thanks a lot.

