Search DynamoDB tables using Elasticsearch/Kibana via Logstash plugin

Written by mannem on . Posted in Dynamo DB, Elasticsearch


The Logstash plugin for Amazon DynamoDB gives you a near real-time view of the data in your DynamoDB table. It uses DynamoDB Streams to parse and output data as it is added to the table. After you install and activate the plugin, it first scans the existing data in the specified table, then starts consuming your updates from Streams and outputs them to Elasticsearch or to another Logstash output of your choice.

Logstash is a data pipeline service that processes and parses data, then outputs it to a selected location in a selected format. Elasticsearch is a distributed, full-text search server. For more information about Logstash and Elasticsearch, go to https://www.elastic.co/products/elasticsearch.

Amazon Elasticsearch Service is a managed service that makes it easy to deploy, operate, and scale Elasticsearch in the AWS Cloud: aws.amazon.com/elasticsearch-service/


This article includes an installation guide that was tested on an EC2 instance where all the prerequisites are installed and Logstash is configured to connect to Amazon Elasticsearch Service using the input/output plugins and start indexing records from DynamoDB. All the instructions are here:
https://github.com/mannem/logstash-input-dynamodb


Logstash configuration:

After running a similar command on the shell, Logstash should successfully start and begin indexing the records from your DynamoDB table.


Throughput considerations:


Kibana:


References:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.DynamoDBLogstash.html

https://aws.amazon.com/blogs/aws/new-logstash-plugin-search-dynamodb-content-using-elasticsearch/

https://github.com/awslabs/logstash-input-dynamodb


Similar plugins:

https://github.com/kzwang/elasticsearch-river-dynamodb

Query AWS ES cluster by signing http requests with AWS IAM roles (python)

Written by mannem on . Posted in Elasticsearch

The public AWS documentation provides some Python examples for signing HTTP requests with an IAM user's credentials to access other AWS resources, in this case an AWS ES cluster whose access policy is restricted to those IAM users.

If you wish to restrict access to the ES cluster with IAM roles instead, the signing process is a bit different.

The document (http://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html) seems to cover only IAM users, not IAM roles.


Changing the signed header

Signing requests with IAM role credentials needs one additional value, the session token, which is added to the request under the header name ‘x-amz-security-token’.

So, in ESrequest.py, replace this line:

headers = {'x-amz-date':amzdate, 'Authorization':authorization_header}

with the following, which should work for signing requests with IAM role credentials:

headers = {'x-amz-date':amzdate, 'Authorization':authorization_header, 'x-amz-security-token':token}

Here, the session ‘token’ string must be obtained from the corresponding IAM role. (The credential trio for a role, that is the access key, the secret key and the session token, is rotated frequently and has an expiration date, so make sure you are using an unexpired token.)
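A minimal sketch of the header change and the expiry check, for illustration only. The function names `build_role_headers` and `is_expired` are hypothetical and not part of ESrequest.py, and the timestamp format assumes the `Expiration` value looks like `2017-05-17T15:09:54Z`, as returned by the EC2 instance metadata service.

```python
from datetime import datetime, timezone


def build_role_headers(amzdate, authorization_header, token):
    """Return the request headers from the post, with the session token added."""
    return {
        'x-amz-date': amzdate,
        'Authorization': authorization_header,
        'x-amz-security-token': token,
    }


def is_expired(expiration, now=None):
    """Return True if a UTC timestamp like '2017-05-17T15:09:54Z' has passed."""
    now = now or datetime.now(timezone.utc)
    # Assumed format: ISO 8601 with a trailing 'Z' and no fractional seconds.
    expires_at = datetime.strptime(expiration, '%Y-%m-%dT%H:%M:%SZ')
    return now >= expires_at.replace(tzinfo=timezone.utc)
```

Checking `is_expired(...)` before each request and fetching fresh credentials when it returns True is one simple way to avoid sending a stale token.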


Obtaining ‘token’ string from IAM Roles

If an EC2 instance has assumed a role, you can get the role's credentials from the instance metadata with

curl http://169.254.169.254/latest/meta-data/iam/security-credentials/MyEc2Role
(http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials)

In Python, these credentials can be fetched with the ‘requests’ module and parsed accordingly.

The token can be obtained similarly from other services that assume IAM roles, such as Lambda.
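The fetch-and-parse step could look roughly like the sketch below. It uses `urllib` from the standard library rather than ‘requests’ to stay dependency-free. The function names are hypothetical, the role name is whatever your instance uses (`MyEc2Role` in the curl example), and the JSON keys (`AccessKeyId`, `SecretAccessKey`, `Token`, `Expiration`) are the ones the instance metadata service returns. It mirrors the plain curl call, so an instance that enforces IMDSv2 would need an extra session-token request that is not shown here.

```python
import json
import urllib.request

# Link-local metadata endpoint, same as in the curl command above.
METADATA_URL = 'http://169.254.169.254/latest/meta-data/iam/security-credentials/'


def parse_role_credentials(body):
    """Extract the credential trio and expiration from the metadata JSON text."""
    doc = json.loads(body)
    return {
        'access_key': doc['AccessKeyId'],
        'secret_key': doc['SecretAccessKey'],
        'token': doc['Token'],
        'expiration': doc['Expiration'],
    }


def fetch_role_credentials(role_name, timeout=2):
    """Fetch credentials for role_name; only reachable from inside an EC2 instance."""
    with urllib.request.urlopen(METADATA_URL + role_name, timeout=timeout) as resp:
        return parse_role_credentials(resp.read().decode('utf-8'))
```

On an instance, `fetch_role_credentials('MyEc2Role')['token']` would give the string to place in the ‘x-amz-security-token’ header, while the access key and secret key feed the signing step itself.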


Find the end-to-end code here: https://github.com/mannem/elasticsearchDev/blob/master/ES_V4Signing_IAMRoles
