Saturday, February 02, 2013

Amazon Web Services (AWS) pocket reference for Business Intelligence Architects / Architecture Design

I'm reading: Amazon Web Services (AWS) pocket reference for Business Intelligence Architects / Architecture DesignTweet this !
Every large scale IT organization is organized in some form of verticals / Strategic Business Units (SBU), or in some other form. These may be grouped by geography / technology / industry groups etc. Almost inevitably every such organization has a cloud computing capability, and most of cloud based projects / architectures are designed and developed by this capability. This may work till you are working in the capacity of an architect for your own set of projects that just deal with your technology.

I believe that when one intends to grow as an enterprise architect, one needs to collaborate with SMEs from cross environments / technologies / platforms, and for the same one needs to have a good understanding of a variety of each of it.

Why Amazon Web Services (AWS) - AWS is probably the largest cloud player in providing IaaS. Azure and other such platforms have started providing IaaS recently, but their major strength is PaaS where they provide technology to build solutions and the infra is managed by them. If one intends to develop solutions that have a very broad mix variety of technologies, then one would have to opt a very strong IaaS cloud environment, than a PaaS environment.

Below are some of my quick notes on the world of Amazon Web Services, that one might want to keep in consideration while architecting BI solutions on AWS.

1) AWS has two types of clouds : Public / Virtual private cloud (VPC)

In public cloud servers are under AWS control, which can be configured by user. In VPC, servers are hosted within AWS but part of corporate network. IPs are under the control of the corporate network and security between the corporate network and servers hosted on AWS is the obligation of the corporate.

2) Amazon Simple Storage Service (S3) :

  • Its an object store, where one can store any type of data in huge amounts, and the same can be accessed using the API provided by amazon for S3. 
  • It's a highly available service, as it stores copies of data in multiple locations. It can be used as a staging location for migrating data across availability zones when using Elastic Block Store Disk.
  • When data is stored into S3, the datatype is stored in a metadata tag. When a client accesses the data, it can check this tag to ensure that the data is read accordingly.
  • S3 can store an object with max 5 GB in size. S3 objects can be accessed via REST/SOAP/HTTP. Third party tools are available to handle storage management inside S3.
3) Amazon Elastic Compute Cloud - EC2

  • Provides scalable and flexible compute capacity EC2 instance provides interface to manage Amazon Machine Image (AMI, also known as bundle). Amazon, and other third party providers like RightScale, IBM and others provide ready images for use.
  • Any software installation would be lost from EC2 instance, once the instance is "terminated". Persistent images are also available which can persist software changes, once the instance is stopped (but not terminated). These images are based on EBS or S3 instance store.
  • If you use a SQL Server 2008 R2 AMI, then the license cost of SQL Server is included in the cost of running the instance. One cannot use their own purchased licenses to offset the cost of SQL Server license in a AWS provided SQL Server AMI.
  • One can allocate static IP address to an instance using AWS "Elastic IP", and after that once can RDP to the same using the same IP / DNS every time. Without an Elastic IP, the IP address for the instance would change every time the instance is started and stopped. Elastic IPs are chargeable.
Billing types for EC2 instance
  • Reserved Instance - This instance type requires reserving the instance for a fixed term. It includes an up-front cost, along with usage charges. This instance is cheaper than Unreserved instance.
  • Unreserved Instance - This instance is billed on pay-per-use basis, but is comparatively expensive than Reserved Instance.
  • Spot Instance - These are unique type of EC2 instances, which are basically amazon's way to handle spare capacity. You need to set a price and number of instances you need. When the average spot price falls below the price set by you, the instances would be allocated to your account. But downside is that once the average spot price rise above the price set by you, those instance would stop.
  • In AWS, you are not billed for any data transfer between AWS components (for example data transfer between S3 and EC2). But for any data traffic that goes in and out of the instance using Internet, is billable. 
  • Various categories of EC2 instances available like Micro, Standard, Cluster Compute, High-Memory Cluster, Cluster GPU, High Memory, High CPU, High Storage, High I/O etc. Also each of them have small, medium, large scaling for each category. A comparison can be seen from here: http://www.ec2instances.info , http://aws.amazon.com/ec2/instance-types/
4) Amazon Elastic Block Storage (EBS)

  • Its the storage system / disk where EC2 instance would store and persist data. EBS is created, configured and managed out of EC2 instance and not within it. Even if an EC2 instance has been terminated, data stored on EBS would persist.
  • EBS volumes can be 1 GB to 1 TB in size.
  • EBS volume availability is restricted to the region and availability zone in which they are created. It's possible to make it available within a different zone by creating a snapshot of EBS and storing it into S3, and again creating a new EBS from the snapshot stored in S3. But EBS cannot be made available across regions by any means.
  • One EC2 instance can have many EBS volumes, but one EBS volume cannot be shared by multiple EC2 instances.
5) Amazon Security Groups

  • It provides a way to restrict access on EC2 instances, by configuring ports, ip and servers that can connect to an EC2 instance. It acts as a firewall for an EC2 instance.
  • All the EC2 instance on which a security group is applied, does not become part of a common group / subnet.
6) Amazon CloudWatch

  • Cloudwatch are of two types in AWS - Basic CloudWatch and Detailed CloudWatch.
  • Basic CloudWatch is available with EC2 instance. It collects different performance metrics related to the EC2 instance.
  • Detailed CloudWatch enables a detailed monitoring of EC2 instances, with alerts and notifications.

7) Amazon Elastic Load Balancing (ELB)

  • Elastic Load Balancing can be used for two major purposes - Load balancing and Fault tolerance.
  • As a load balancer it can distribute incoming traffic to different servers in a load balanced fashion.
  • As a fail over balancer, it can detect a failed / unresponsive / unhealthy EC2 instance and route traffic to other instances as required.


  • Amazon RDS provides full featured database services using MySQL, Oracle as well as SQL Server database engine.
  • RDS provides fault-tolerance / high availability by creating Multi-AZ Deployments. With this option, one instance of RDS is created in the availability zone selected by user, and second instance is created in an alternative availability zone. Both instances are kept upto date in parallel. The second instance is not visible / available, until the first instance becomes unavailable, and when it does, the second instance takes over immediately.
  • RDS instance can be configured to create Read Replica which are copies of the RDS instance, that can be used for reporting purposes.
  • RDS instances are backed up by default in AWS and this backup remains available for a limited time. Backups are totally configurable and can be persisted indefinitely too.


  • Amazon SNS is a publish and subscribe model using which systems or user can generate and/or receive alerts and/or notifications.
  • There are three methods in which alerts / notifications are delivered: Email / Http based web service call / A message via Simple Queue Service (SQS).

10) Amazon CloudFront

Its the Content Delivery Network of AWS that distributes and caches content at the nearest servers based on user request patterns.

11) Amazon Elastic MapReduce (EMR)

  • Amazon EMR provides features to process large amounts of data using Hadoop based processing combined with other AWS products.
  • EMR also provides option to run HBase (column oriented, distributed, NoSQL database) on Hadoop clusters which enables real-time data access to Hadoop in cloud.

12) Amazon Identity and Access Management (IAM) and Amazon CloudFormation provides means to control permissions to AWS resources as well as manage AWS resources as a system respectively. Amazon Route 53 is a highly available and scalabe Domain Name System (DNS) management service that can be used with AWS IAM to manage domains with faster performance.

2 comments:

Robert P. Calfee said...

This post is probably where I got the most useful information for my research. Thanks for posting, maybe we can see more on this.
Are you aware of any other websites on this
msbi


David Bravo said...

Its very informative and useful blog.Aws solutions architect is the way to build highly scalable and reliable applications in aws cloud.

Related Posts with Thumbnails