Friday, November 30, 2012

OSPF Fast Convergence Tools - Event Propagation


In continuation of my previous post about OSPF Fast Convergence – Event Detection, I will share my ideas about the OSPF event propagation tools. Event propagation mainly covers the LSA generation process. Once an event has been detected, an LSA is generated to reflect the change. The LSA is not generated immediately; by default OSPF waits 5000 ms before generating the new LSA. This delay applies only to Router and Network LSAs.

To get faster OSPF convergence (Read More About OSPF High Availability), the LSA generation timers can be tuned. Lowering the initial throttle timer reduces OSPF convergence time, while the exponential hold timer still protects the router during prolonged instability.

Below is the command syntax to configure LSA throttling:
router ospf 10
 timers throttle lsa all <start> <hold> <max>
 timers lsa arrival <arrival>

start: [default = 0 ms] The default is to generate the LSA immediately after the first trigger is received.
Recommended = 10 ms

hold: [default = 5000 ms] Small values in increments of 20 ms should be considered if multiple link failures may occur on the same router. This ensures that if all failures have not been advertised by the first regenerated LSA, the following one will be promptly triggered.
Recommended = 100 ms

max: [default = 5000 ms] The default value may be kept unchanged. Prior to the introduction of the LSA throttle timer, the behavior was to wait MinLSInterval between distinct originations of any particular LSA, with a default MinLSInterval of 5 seconds. The default value therefore provides a response equivalent to pre-LSA-throttle behavior.
Recommended = 5000 ms

arrival: [default = 1000 ms] This controls the minimum interval for accepting the same LSA. The "same LSA" is defined as an LSA instance containing the same LSA ID, LSA type, and advertising router ID. If an instance of the same LSA arrives sooner than the configured interval, it is dropped. Note that the arrival interval should be less than or equal to the hold interval.
Recommended = 80 ms
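
As a minimal sketch, the recommended values above would be applied like this (process ID 10 matches the earlier example; exact keyword availability varies by IOS release):

router ospf 10
 timers throttle lsa all 10 100 5000
 timers lsa arrival 80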



Thursday, November 29, 2012

OSPF Fast Convergence Tools


What is Network convergence?
Network convergence is the time required to detect the event, propagate the event, process the event, and update the Routing and Forwarding Information Bases, after which traffic is routed to an alternate path during an outage of the primary network path.

OSPF has various tools and techniques for fast convergence, as mentioned below:
Event Detection
Event Propagation
Event Processing
RIB Update

Read OSPF High Availability Techniques with SSO, NSF and NSR

I will be covering the various OSPF event detection tools in this post:
• Carrier Delay
• Bidirectional Forwarding Detection
• Physical Links
• IP Event Dampening
• Fast Hellos

1. Carrier Delay:- When the physical interface state changes, the software must notify the routing process of that change. By default the carrier delay is 2 seconds, so we can set it to zero for fast link-down detection by configuring "carrier-delay msec 0" under the interface.
Note:- If a flap lasts less than the carrier-delay interval (2 seconds by default), the router suppresses the state change, so no log is generated for it and nothing appears in syslog.
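A minimal interface sketch (the interface name is just an example):

interface GigabitEthernet0/1
 carrier-delay msec 0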

2. Bidirectional Forwarding Detection:- BFD is a lightweight hello protocol, independent of any routing protocol, that can detect forwarding-path failures in milliseconds. OSPF registers as a BFD client, so when the BFD session to a neighbor fails, OSPF tears down the adjacency immediately instead of waiting for the dead interval. A configuration sketch follows below.
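A minimal sketch of enabling BFD for OSPF on IOS (the interface name, timer values and process ID are examples):

interface GigabitEthernet0/1
 bfd interval 50 min_rx 50 multiplier 3
!
router ospf 10
 bfd all-interfaces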

3. Physical links:- If you are using a PoS interface, link failure detection is almost always < 50 ms, so there is no need to configure BFD on PoS interfaces. If you are using Gigabit Ethernet interfaces, keep auto-negotiation enabled on the physical interface, as it helps detect unidirectional failures. It is enabled by default, and it is recommended not to disable it.

4. IP Event Dampening:- This tool mitigates flapping links. The concept is the same as dampening in BGP: the router tracks the flapping interface and applies a penalty for each flap, and if the accumulated penalty exceeds the suppress threshold, the interface is held in the down state from the routing protocol's perspective until the penalty decays, as sketched below.
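A minimal sketch, shown with the IOS default dampening parameters made explicit (half-life 5 s, reuse 1000, suppress 2000, max-suppress 20 s; the interface name is an example):

interface GigabitEthernet0/1
 dampening 5 1000 2000 20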

5. Fast Hellos:- OSPF fast hello packets improve convergence at the expense of stability: the process is very CPU-intensive and can sometimes lead to high-CPU problems. Fast hellos can be configured under the interface using the "ip ospf dead-interval minimal hello-multiplier 3" command. The minimal keyword sets the dead interval to 1 second, and hello-multiplier 3 means 3 hello packets are sent each second.



Wednesday, November 28, 2012

CISCO EIGRP DUAL Algorithm


Enhanced Interior Gateway Routing Protocol (EIGRP) is an advanced distance vector routing protocol proprietary to Cisco. Highly valued for its ease of deployment and fast convergence, EIGRP is used in many large enterprise networks. EIGRP retains all of the advantages of distance-vector protocols while adding a strong feature set for selecting loop-free paths.

EIGRP scales effectively in a well-designed network and provides extremely quick convergence times with minimal network traffic. EIGRP advantages include:
• Consumes fewer network resources than OSPF
• Transmits only partial updates, not the full routing table
• Rapid convergence times for changes in the network topology

Let's take a deep dive to get a better understanding of the Cisco EIGRP DUAL algorithm.


What is Reported Distance?
As depicted in Figure 1, A has three directly connected neighbors through which it can reach E. The distance a neighbor reports to reach E is known as the Reported Distance. So A has three reported distances from its neighbors, as mentioned below:
• B can reach E with a cost of 10
• C can reach E with a cost of 10
• D can reach E with a cost of 30



What is Feasible Distance?
The total cost to reach E from A via each neighbor is known as the feasible distance. As depicted in Figure 2, A has three candidate distances to reach E, with the costs mentioned below:
• A can reach E via B with a cost of 20
• A can reach E via C with a cost of 25
• A can reach E via D with a cost of 45
The best of the three becomes the feasible distance, and the corresponding neighbor becomes the successor. So as per the above output, B is nominated as the successor with a feasible distance of 20.

How is the feasible successor, or loop-free alternate, selected? After selecting the successor, DUAL looks at the Reported Distance of each remaining neighbor and checks which one satisfies the Feasibility Condition: Reported Distance < Feasible Distance. From the above two outputs we can conclude the following (a short sketch of this check follows the list):
• C can reach E with a cost of 10 (Reported Distance), so C's reported distance (10) < the feasible distance (20). This path is loop free, and C becomes the feasible successor.
• D can reach E with a cost of 30 (Reported Distance), so D's reported distance (30) > the feasible distance (20). This does not satisfy the condition for selecting a feasible successor (Reported Distance < Feasible Distance), so DUAL marks this path as a potential loop.
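
As a minimal sketch of this selection logic in Python (the costs restate the figures above; the names and structure are illustrative, not EIGRP's actual implementation):

# Minimal sketch of DUAL successor / feasible-successor selection.
# Each neighbor maps to (reported distance, cost of the A-neighbor link).
neighbors = {
    "B": (10, 10),  # total cost via B = 20
    "C": (10, 15),  # total cost via C = 25
    "D": (30, 15),  # total cost via D = 45
}

# Feasible distance = best total cost; its neighbor is the successor.
totals = {n: rd + lc for n, (rd, lc) in neighbors.items()}
successor = min(totals, key=totals.get)
fd = totals[successor]

# Feasibility Condition: reported distance < feasible distance.
feasible_successors = [n for n, (rd, _) in neighbors.items()
                       if n != successor and rd < fd]

print(successor, fd, feasible_successors)  # B 20 ['C']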

So now A has marked B as its successor and C as its feasible successor. Let's assume the link between A and B goes down, as per Figure 3. EIGRP examines the available paths to E and declares C, which was earlier selected as the feasible successor (the loop-free path), as the new successor (best path to reach E). EIGRP then looks again for a feasible successor, and as per the Feasibility Condition, the Reported Distance from D (30) is still greater than the Feasible Distance, so EIGRP considers D's path a potential loop and does not qualify it as a feasible successor.


Now assume the link between A and C goes down, as shown in Figure 4. EIGRP looks for a loop-free path but, as shown in Figure 3, none is available. However, A still has the neighbor D, which might have a loop-free path to E. So A puts E into the Active state and sends a query to D; D replies that it has a loop-free path available to E. Once A receives the reply from D, it begins using that path to reach E.


Read More About Remembering PE-CE EIGRP

Read More About EIGRP adjacency issues with TLV



Monday, November 26, 2012

BGP Graceful Restart, NSR and NSF



As we have already seen (OSPF High Availability with SSO, NSF and NSR), there are two different mechanisms to prevent routing protocol re-convergence during a processor switchover: Graceful Restart (Non Stop Forwarding) and Non Stop Routing (NSR). Both allow the forwarding of data packets to continue along known routes during the switchover. With Graceful Restart, routing protocol information is restored from the peers after the switchover, while with Non Stop Routing, routing protocol information is continuously maintained on the standby processor.

BGP Graceful Restart for NSF
• BGP Graceful Restart is described in RFC 4724
• BGP has been enhanced with NSF-capability and awareness
• Routers running these protocols can detect a switchover and take the necessary actions to continue forwarding network traffic and to recover route information from the peer devices
• NSF Aware
– A router is NSF-aware if it is running NSF-compatible software.
• NSF Capable
– A router is NSF-capable if it has been configured to support NSF; therefore, it would rebuild routing information from NSF-aware or NSF-capable neighbors.
• BGP support for NSF requires that neighbor routers are NSF-aware or NSF-capable
• A router that is NSF-aware functions like a router that is NSF-capable with one exception: an NSF-aware router is incapable of performing an SSO operation
• A router that is NSF-aware is capable of maintaining a peering relationship with an NSF-capable neighbor during an NSF SSO operation, as well as holding routes for that neighbor during the SSO operation
• NSF awareness for BGP is not enabled by default, because BGP runs over a TCP connection. It can be enabled with the bgp graceful-restart command under the BGP process; the session must then be reset so the capability is exchanged in the Open message. A configuration sketch follows below.
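
A minimal sketch (the AS number is an example; the restart-time and stalepath-time keywords are optional and shown with their IOS default values):

router bgp 65000
 bgp graceful-restart restart-time 120
 bgp graceful-restart stalepath-time 360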


Friday, November 23, 2012

How does OSPF behave with SSO, NSF and NSR


In my last post, I tried to explain OSPF High Availability with SSO, NSF and NSR. In this post I will give a snapshot of how OSPF behaves with SSO, NSF and NSR.

OSPF without SSO/NSF
1. All the OSPF neighborships are up and running. At some point, Router A has an issue with its primary RP and restarts.
2. The adjacency between routers A and B goes down.
3. Router B removes the adjacency and clears all routes learned from router A from its forwarding table.
4. Router B informs all its connected peers, C, D and E, about the change concerning router A.
5. Router A comes back up after the reboot and re-establishes the peer relationship with router B.
6. Router B adds the routes back into its routing table.
7. Router B informs its peer neighbors about the change.


OSPF with SSO/NSF
1. All the OSPF neighborships are up and running, and the Graceful Restart capability has been exchanged between Router A and Router B.
2. At some point, Router A has an issue with its primary RP and switches over to the secondary RP.
3. Router B does not remove the associated routes from its routing table.
4. Router B does not inform its peers about the change.
5. Router A's standby RP re-establishes the adjacency.
6. Router B updates router A with its routing information.
7. Router A updates router B with its routing information.


OSPF with NSR (Non Stop Routing)
1. Router A synchronizes all of its OSPF state and databases to the secondary (standby) RP.
2. Router A's primary RP fails.
3. Router B does not remove routes from its table and does not inform its neighbors about any OSPF change.
4. Router A's standby RP continues running OSPF and forwarding traffic, using the state synchronized by the primary RP.


Thursday, November 22, 2012

OSPF High Availability with SSO, NSF and NSR


Nonstop Forwarding with Stateful Switchover (NSF/SSO) is a pair of redundancy mechanisms for intra-chassis route processor failover.

SSO synchronizes Layer 2 protocol state, hardware L2/L3 tables (MAC, FIB, adjacency table), configuration, ACL and QoS tables.

NSF gracefully restarts routing protocol neighbor relationships after an SSO failover.

1. Newly active redundant route processor continues forwarding traffic using synchronized HW forwarding tables.
2. NSF-capable routing protocols (e.g., OSPF) request a graceful neighbor restart. Routing neighbors re-form with no traffic loss.
3. Both a Cisco proprietary mechanism and the IETF standard (RFC 3623) are available.
4. The Graceful Restart capability must always be enabled for all protocols, but this is only necessary on routers with dual route processors that will be performing switchovers.
5. Graceful Restart awareness is on by default for non-TCP-based interior routing protocols (OSPF, IS-IS and EIGRP). These protocols start operating in GR mode as soon as one side is configured.
6. TCP-based protocols (BGP) must enable GR on both sides of the session, and the session must be reset to enable GR. The information enabling GR is sent in the Open message for these protocols.

Nonstop Routing (NSR) is a stateful redundancy mechanism for intra chassis route processor (RP) failover.

NSR, unlike NSF with SSO:
1. Allows the routing process on the active RP to synchronize all necessary data and state with the routing protocol process on the standby RP.
2. When a switchover occurs, the routing process on the newly active RP has all the necessary data and state to continue running without requiring any help from its neighbor(s).
3. Standards are not necessary, as NSR does NOT require additional communication with protocol peers.
4. NSR is desirable in cases where a routing protocol peer doesn't support the Cisco or IETF standards for the Graceful Restart capability exchange.
5. NSR uses more system resources due to the information transfer to the standby processor.

Deployment Considerations for SSO/NSF and NSR

1. From a high level, you need to protect the interfaces (SSO), the forwarding plane (NSF) and the control plane (GR or NSR).
2. Enabling SSO also enables NSF.
3. Each routing protocol peer needs to be examined to ensure both that its NSF capability has been enabled and that its peers have NSF awareness enabled.
4. While configuring OSPF with NSF, make sure all the peer devices that participate in OSPF are OSPF NSF-aware.
5. Configuring OSPF with Non Stop Routing (NSR) does not require peer devices to be NSR-capable; it only requires more system resources. Both NSF and NSR can be active at the same time, with NSF used as a fallback.

Read More About OSPF Design Consideration

Design Considerations:OSPF Network Stability

What is Bidirectional Forwarding Detection?

Configuration:-
OSPF with Nonstop Forwarding
redundancy
mode sso
!
router ospf 1
nsf [cisco | ietf]


OSPF with Nonstop Routing
nsr process-failures switchover
!
router ospf 1
nsr


Wednesday, November 21, 2012

OSPF Routing Protocol Summary


What Is OSPF?
• Open Shortest Path First
• Link State Protocol using the Shortest Path First algorithm (Dijkstra) to calculate loop-free routes
• Used purely within the TCP/IP environment
• Designed to respond quickly to topology changes but using minimal protocol traffic
• Used in both Enterprise and Service Provider Environment
• Uses IP protocol 89
• Metric is cost, based on interface bandwidth by default (10^8 / BW in bps)
• Sends partial route updates only when there are changes
• Sends hello packets every 10 sec with a dead timer of 40 sec over point-to-point and broadcast networks
• Sends hello packets every 30 sec with a dead timer of 120 sec over NBMA networks
• Uses multicast address 224.0.0.5 (ALL SPF Routers)
• Uses multicast address 224.0.0.6 (ALL DR Routers)
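
To make the cost formula concrete: with the default reference bandwidth of 100 Mbps, a 10 Mbps Ethernet interface gets a cost of 10^8 / 10^7 = 10, a T1 (1.544 Mbps) gets roughly 64, and every interface of 100 Mbps or faster collapses to cost 1. On networks with Gigabit or faster links it is common to raise the reference bandwidth so faster links stay distinguishable, e.g. "auto-cost reference-bandwidth 10000" under the OSPF process (the value 10000, in Mbps, is just an example).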

Different Types of OSPF LSAs
1. Router Link State Advertisement (Type 1 LSA)
2. Network Link State Advertisement (Type 2 LSA)
3. Summary Link State Advertisement (Type 3 and Type 4 LSA)
4. External Link State Advertisement (Type 5 LSA)

Different types of OSPF Packet
1. Hello
2. Database description
3. Link State Request
4. Link State Update
5. Link State Acknowledgement

Different Types of OSPF Areas
Regular Area: ABRs forward all LSAs from the backbone
• Summary LSAs (summarized or non-summarized) from other areas are injected
• External LSAs are injected

Stub Area: A stub area is an area with a single exit point (if you need multiple exit points, configure it as an NSSA) into which External LSAs are not flooded
• Summary LSAs from other areas are injected
• Type 5 LSAs are not injected
• A default route is injected into the area as a Summary LSA
• All routers in the area must be defined as stub
• External link flaps will not be injected
• Consolidates specific external links into the default route 0.0.0.0
• Used in networks with a lot of Type 5 LSAs

Totally Stubby Area
A totally stubby area forwards only the default route 0.0.0.0.
The ABR blocks not only the AS External LSAs but also all Summary LSAs, except a single Type 3 LSA to advertise the default route.
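
A minimal configuration sketch (the process ID and area number are examples):

router ospf 10
 area 1 stub
!
! On the ABR only, add the no-summary keyword to make area 1 totally stubby:
router ospf 10
 area 1 stub no-summary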

Not So Stubby Areas (NSSA)
Benefits of a stub area, but an ASBR is allowed

A new type of external LSA (Type 7) is used:
• Type 7 LSAs are flooded throughout the area
• No Type 5 external LSAs exist in the area
• Type 7 LSAs are converted into Type 5 LSAs when flooded into area 0 by the ABR
• Filtering and summarization are allowed at the ABR
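
A minimal configuration sketch for an NSSA (the process ID and area number are examples):

router ospf 10
 area 2 nssa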

Areas are used to make OSPF Scale
• OSPF uses a 2 level hierarchical model
• One SPF per area, flooding done per area
• Regular, Stub, Totally Stubby and NSSA Area Types
• A router has a separate LS database for each area to which it belongs
• All routers belonging to the same area should have identical databases
• SPF calculation is performed independently for each area
• LSA flooding is bounded by area
• A link flap in the area triggers an SPF run on the order of n log n calculations, where n is the number of links in the area

Read More About OSPF Design Consideration
Design Considerations:OSPF Network Stability



Tuesday, November 20, 2012

ITU’s Secret Plans to regulate the Internet


The World Conference on International Telecommunications (WCIT 2012) of the ITU is to be held during December 3 - 14 in Dubai to review the current International Telecommunication Regulations (ITRs).

The ITU has regulated telegraph and telephone services since 1865, coordinates the shared global use of the radio spectrum, and plays a role in the assignment of satellite orbits.

Now the ITU seeks to include Internet governance in its sphere of control. Several proposals to be discussed at the WCIT, in a closed setting, would bring about adverse changes to the way the Internet works.

A simplified infographic, as presented by equaltimes.org, is posted below. Please take a look at it and share it with your colleagues and contacts. You can also express your concern by signing a petition.




Monday, November 19, 2012

Capacity Planning Hadoop Cluster


In the previous post, I explained the working architecture of Hadoop along with its components, such as the name node and data node. In this post, I will explain how to capacity-plan a Hadoop cluster in terms of data nodes. Name node capacity planning will be covered in an upcoming post. I am assuming that the deployment starts in 2013, with data and data node capacity projections given through 2017. Below are the assumptions considered while capacity planning the Hadoop cluster:

As per the assumptions listed above, we start with 1 TB of daily data in 2013, and for capacity building we assume 5% data growth per month from 2014 onwards. In 2013 we have 1080 TB of data, and by the end of 2017 we have 8711 TB of data, almost 8 times the starting year. Below is the table showing how many UCS servers (data nodes) will be required to handle 8711 TB of data.
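
Since the original assumption table is not reproduced above, the sketch below shows only the shape of such a projection; the replication factor, 30-day months, and per-node usable capacity are illustrative assumptions, not the post's actual figures:

# Illustrative data-node capacity projection (parameters are examples).
daily_tb = 1.0          # raw ingest per day in 2013 (from the post)
monthly_growth = 0.05   # 5% growth per month from 2014 (from the post)
replication = 3         # HDFS replication factor (assumed)
node_usable_tb = 24.0   # usable disk per data node (assumed)

cumulative_tb = 0.0
for year in range(2013, 2018):
    for month in range(12):
        if year > 2013:
            daily_tb *= 1 + monthly_growth
        cumulative_tb += daily_tb * 30 * replication
    nodes = -(-cumulative_tb // node_usable_tb)   # ceiling division
    print(f"{year}: ~{cumulative_tb:,.0f} TB stored, ~{nodes:.0f} data nodes")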

Graphical View of the projections:-



Saturday, November 17, 2012

Big Data Telecom Use Cases


After reading a lot of blogs and websites, I could conclude only the below-listed use cases for Big Data in the telecom market.
1. Customer churn prevention
2. Customer loyalty
3. New Campaign Management
4. Call Detail Record Analysis
5. Network Performance and Optimization by reading logs from the various devices
6. Fraud and Suspicious traffic detection
7. Contact Center Text Mining
8. Multi-device event stream analysis, correlating firewall, IDS and switch activity
Feel free to comment if you have more.


Thursday, November 15, 2012

Hadoop File System Metadata Replication By Using Secondary Name Node


FSIMAGE and EDITS are the two most important files of the name node. FSIMAGE holds a snapshot of all the metadata of the Hadoop cluster, whereas EDITS contains the incremental metadata changes. The incremental information is maintained in a separate file because it requires write operations, whereas FSIMAGE is served directly from RAM. Over time the EDITS log file grows, and after a name node failure or restart, replaying it into FSIMAGE can take a long time.

How do we eliminate this problem in case of a failure of the primary name node?

The solution is the secondary name node. The logic is as simple as swapping two numbers using a temp variable. Below is the message exchange between the primary name node and the secondary name node.

1. The secondary name node asks the primary name node to start writing incremental changes to a new file, EDITS.NEW.
2. The secondary name node copies the FSIMAGE and EDITS files from the primary name node.
3. The secondary name node applies EDITS to FSIMAGE, producing a new FSIMAGE file.
4. It sends the new FSIMAGE back to the primary name node.
5. The primary name node renames the EDITS.NEW file to EDITS.


Wednesday, November 14, 2012

Hadoop Cluster - File Read Function


The Anatomy of Hadoop Write explains how the client writes a file to the Hadoop HDFS cluster. Once the file is written, it can also be read. In Hadoop, the client sends a request to the name node asking for access to a file, and the name node returns the data node names along with the blocks they hold; the client then reads the blocks directly from those data nodes. Below is the process of how it works.


Tuesday, November 13, 2012

Hadoop Cluster – The Anatomy of Hadoop Pipeline Write


Now we know about Big Data, HDFS, MapReduce and the different types of Hadoop nodes. In this post, I will touch on how the client writes files to the Hadoop cluster, with the different options available. A client has a file, file 1, which is split into three blocks named A, B and C, as depicted in the figure below.

Writing blocks to different data nodes within the same rack:
Step 1:- The client sends a request to the name node saying, "I have three blocks of file 1; please tell me which nodes I should write these blocks to."
Step 2:- The name node replies with the assignments: data node 1 for block A, data node 2 for block B, and data node 3 for block C.
Step 3:- The client writes block A to data node 1. Writing of block B does not start until the client receives the acknowledgement for block A. Thereafter it writes C.
Figure 1

As per Figure 1, this does not provide redundancy or availability in case of failure of any of the data nodes or the TOR (Top of Rack) switch.

Data node 1 automatically looks for the nearest available data node and replicates block A to it. The same is done by data nodes 2 and 3 for blocks B and C within the same rack, as depicted in Figure 2. This replication maintains redundancy and availability if any single data node fails, but it cannot provide availability if the whole rack fails.
Note: the above implementation is based on a replication factor of 2.
Figure 2

To maintain higher uptime, there is the option of the "Hadoop pipeline write". The client sends a write request for block A to data node 1. Once the request is received, data node 1 looks for the nearest available data node, not in the same rack but in a different rack, as depicted in Figure 3. Data node 1 copies block A to data node 4, and data node 4 in turn copies it to data node 5, forming a write pipeline. In the same way, all the blocks get copied to their respective nodes.
Data node 5 acknowledges block A to data node 4, data node 4 acknowledges to data node 1, and data node 1 sends the acknowledgement to the client.
Figure 3

The underlying network can be layer 2 or layer 3. If it is a layer 2 network, loops must be avoided so the full bandwidth can be used optimally, because the Hadoop pipeline write requires more bandwidth.
The Figure 3 implementation is based on a replication factor of 3.
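
As a minimal sketch of the rack-aware placement described above (the topology, node names and function are illustrative, not Hadoop's actual API):

# Illustrative replica placement mirroring Figure 3: first replica on
# the node the client writes to, second replica in a different rack,
# third replica on another node in that second rack.
import random

topology = {                       # rack -> data nodes (hypothetical)
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
}

def place_replicas(first_node):
    first_rack = next(r for r, ns in topology.items() if first_node in ns)
    remote_rack = random.choice([r for r in topology if r != first_rack])
    second = random.choice(topology[remote_rack])
    third = random.choice([n for n in topology[remote_rack] if n != second])
    return [first_node, second, third]

print(place_replicas("dn1"))   # e.g. ['dn1', 'dn4', 'dn5']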


Sunday, November 11, 2012

Hadoop Architecture – Types of Hadoop Nodes in Cluster - Part 2


In continuation of the previous post (Hadoop Architecture – Hadoop Distributed File System), a Hadoop cluster is made up of the following main node types:
1. Name Node
2. Data Node
3. Job Tracker
4. Task Tracker

The above depicts the logical architecture of Hadoop nodes, but physically a data node and a task tracker can be placed on a single physical machine, as shown in the diagram below.



There are a few other secondary node types, named the secondary name node, backup node and checkpoint node. The diagram above shows some of the communication paths between the different types of nodes in the Hadoop cluster. A client is shown communicating with a JobTracker as well as with the NameNode and with any DataNode. There is only one active NameNode in the cluster; one can plan for a redundant name node, but it has to be switched over manually. While file data is stored in blocks at the data nodes, the metadata for a file is stored at the NameNode. If there is one node in the cluster to spend money on for the best enterprise hardware and maximum reliability, it is the NameNode. The NameNode should also have as much RAM as possible, because it keeps the entire filesystem metadata in memory, whereas data nodes can run on commodity hardware.

A typical HDFS cluster has many DataNodes. They store the blocks of data, and when a client requests a file, it finds out from the NameNode which DataNodes store the blocks that make up that file, then reads the blocks directly from the individual DataNodes. Each DataNode also reports to the NameNode periodically with the list of blocks it stores. A JobTracker node manages MapReduce jobs; there is only one of these per cluster. It receives jobs submitted by clients, schedules the Map and Reduce tasks on the appropriate TaskTrackers in a rack-aware manner (Hadoop knows the network topology), and monitors for any failing tasks that need to be rescheduled on a different TaskTracker. To achieve parallelism for your map and reduce tasks, there are many TaskTrackers in a Hadoop cluster. Each TaskTracker spawns Java Virtual Machines to run your map or reduce tasks.


Friday, November 9, 2012

Hadoop Architecture – Hadoop Distributed File System - Part 1


A Hadoop cluster is a collection of racks. Every rack contains nodes, which are generally just computers. Grouping the nodes within each rack, and then grouping the racks together, forms the Hadoop cluster.
Hadoop Cluster

Hadoop has two major components:
1. Hadoop Distributed File System, a.k.a. HDFS
2. MapReduce
I will be discussing HDFS in more detail in this post. HDFS runs on top of the existing file systems on each node in a Hadoop cluster. Hadoop works best with very large files: the larger the file, the less time Hadoop spends seeking to the next data location on disk and the more time it runs at the bandwidth limit of your disks. Seeks are generally expensive operations that are useful when you only need to analyze a small subset of your dataset; since Hadoop is designed to run over your entire dataset, it is best to minimize seeks by using large files. Hadoop is designed for streaming or sequential data access rather than random access. Sequential data access means fewer seeks, since Hadoop only seeks to the beginning of each block and then reads sequentially from there. Hadoop uses blocks to store a file or parts of a file.


HDFS-Blocks
The default block size is 64 MB, and many systems run with block sizes of 128 MB or larger. A Hadoop block is stored as a file on the underlying file system. Since the underlying file system itself stores files as much smaller blocks, one Hadoop block may consist of many blocks in the underlying file system, as shown in the figure.


Advantages of Blocks
1. It is easy to calculate how many blocks fit on a disk.
2. A file may be larger than any single disk in the network.
3. If a file is smaller than the block size, only the space actually needed is used; this mainly matters for the last block. E.g., a 420 MB file is split as shown below:
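
A minimal sketch of that split, using the 64 MB default block size from above (the function name is illustrative):

# Split a file into HDFS-style fixed-size blocks (sizes in MB).
def split_into_blocks(file_mb, block_mb=64):
    full, last = divmod(file_mb, block_mb)
    return [block_mb] * full + ([last] if last else [])

print(split_into_blocks(420))   # [64, 64, 64, 64, 64, 64, 36]
# Six full 64 MB blocks plus one final 36 MB block; the last block
# consumes only 36 MB on the underlying file system, not a full 64 MB.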




Wednesday, November 7, 2012

What is Big Data or Hadoop?


Imagine you have 100 MB of data stored in a structured way (an RDBMS) and you need to process it. The best place to do this is your personal computer, because a PC has no problem processing this kind of data; in fact, a PC will cope with up to a few GB of data.

But what happens when:
1. Data grows exponentially and you are approaching the limits of your computer?
2. Data arrives in unstructured form?
3. Data becomes a burden to your IT?


Management wants to derive information from both relational and unstructured data. The answer is Hadoop. Hadoop is an open source project of the Apache Foundation, written in Java, and developed by Doug Cutting, who named it after his son's toy elephant. Hadoop uses MapReduce and Google File System technologies as its foundation. Hadoop is intended for distributed deployment rather than for massively parallel processing on a single machine. It is optimized to handle massive quantities of data, which can be structured (like an RDBMS), unstructured (tweets or Facebook comments, etc.) or semi-structured, using commodity hardware, that is, relatively inexpensive computers. Hadoop replicates its data across different computers, so that if one goes down, the data is processed on one of the replicated computers.

Hadoop is not suitable for Online Transaction Processing workloads, where data is randomly accessed on structured data like a relational database. Nor is it suitable for Online Analytical Processing or Decision Support System workloads, where data is sequentially accessed on structured data like a relational database to generate reports that provide business intelligence. Hadoop is used for Big Data: it complements Online Transaction Processing and Online Analytical Processing. It is not a replacement for a relational database system.

So, what is Big Data?
With all the devices available today to collect data, such as RFID readers, microphones, cameras, sensors and so on, we are seeing an explosion in the data being collected worldwide. Big Data is a term used to describe large collections of data (also known as datasets) that may be unstructured and grow so large and so quickly that they are difficult to manage with regular database or statistics tools. Therefore, Big Data solutions based on Hadoop and other analytics software are becoming more and more relevant for every type of industry.
