These days the IT world is abuzz with the term “Big Data”. So, what really is Big Data? By definition:
“An amount of data so huge that it can’t be processed – stored, searched, analysed, shared etc. – through traditional computing.” Here, traditional computing refers to non-parallel, non-distributed systems such as Relational Database Management Systems (RDBMS).
The three V’s of Big Data are:
- Variety: Big Data can arrive from various sources and in various formats like CSV, JSON, TSV etc.
- Velocity: Data is generated at very high, and varying, speeds.
- Volume: The volume of Big Data is very high, possibly even in terms of petabytes.
What is the source of Big Data?
The most common source of Big Data is the log generated by any system. For example, whenever we access a website, it logs our IP address, browser info, OS, datetime of access and a few other details about us to a log file. A website with huge traffic can generate log files of around 1-2 GB per day or even more. Every day, we create approximately 2.5 quintillion bytes of data. It is estimated that 90% of the data in the world today has been created in the last two years alone. This is the velocity at which we are producing data! New sources of this kind of data are piling up with each passing day.
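To make the log example concrete, here is a minimal sketch of extracting those details (IP, datetime, request, status) from a single access-log line. The field layout assumed below is the common Apache-style format; a real website's log format may differ:

```python
import re
from datetime import datetime

# Assumed Apache-style line: IP - - [datetime] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Extract IP, timestamp, request and status from one access-log line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # line did not match the assumed format
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields

sample = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
entry = parse_log_line(sample)
```

Multiply one such record by every visit to a busy site, every day, and the 1-2 GB/day figure above becomes easy to believe.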
Why do we really need to process such huge data?
The answer is – to derive some meaningful information/insight out of it. Meaningful information most commonly refers to business value that can be derived from the Big Data. For example, if we are deploying a new version of a website, we can run analytics on the log files of the older as well as the newer version of the website to answer questions such as – was the new design widely accepted? Did it cause a negative impact on the traffic to the website? We can even use the logs to predict the expected traffic during a particular period of a day in a week. This is the era of collective intelligence.
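The traffic-prediction idea above could start from a simple aggregation: count hits per hour of day across the logs. A minimal sketch, using hypothetical (hour, URL) pairs in place of parsed log entries:

```python
from collections import Counter

def hits_per_hour(entries):
    """Count requests per hour of day from (hour, url) log entries."""
    return Counter(hour for hour, _url in entries)

# Toy log entries: (hour of access, requested URL).
log = [(9, "/home"), (9, "/about"), (13, "/home"),
       (13, "/pricing"), (13, "/home"), (21, "/home")]

traffic = hits_per_hour(log)
peak_hour, peak_hits = traffic.most_common(1)[0]
```

Run over weeks of real logs, the same per-hour histogram becomes a crude but useful predictor of expected traffic at a given hour.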
Why can’t we use traditional computing to process Big Data?
The very first constraint is time. By the time a traditional system processes Big Data and retrieves meaningful information from it, that information would most probably have lost its significance. Secondly, there would be lots of issues while processing Big Data on a single node (a standalone system) – what if the system goes down after 80% of the processing is complete? We would then have to redo that 80% of the processing. And there would be lots of memory issues, like our jobs hitting heap-space limits, out-of-memory errors etc.
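The memory problem, at least, has a standard single-machine mitigation: stream the file line by line instead of loading it all into RAM. A minimal sketch (the 500-counting task is a made-up example; note this helps with memory only – it does nothing for the restart-from-zero problem, which is what distributed frameworks solve):

```python
import os
import tempfile

def count_server_errors(path):
    """Stream a log file line by line; memory use stays roughly constant
    regardless of file size, unlike reading the whole file at once."""
    errors = 0
    with open(path) as f:          # iterating a file yields one line at a time
        for line in f:
            if " 500 " in line:    # e.g. HTTP 500 responses
                errors += 1
    return errors

# Demo with a tiny file; the same code would work for a multi-GB log.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write('203.0.113.7 "GET /a" 200 10\n')
    f.write('203.0.113.7 "GET /b" 500 10\n')
    f.write('203.0.113.7 "GET /c" 500 10\n')
    path = f.name

n_errors = count_server_errors(path)
os.remove(path)
```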
For big players like Google, Amazon etc., setting up a huge cluster of hundreds of nodes to process Big Data is not a big deal, but for smaller players and startups, this isn’t feasible. Here comes into the picture another buzzword of the IT sector – cloud computing. Cloud computing has made it possible even for startups to process Big Data at a reasonable cost. One of the most popular cloud services is Amazon’s AWS – Amazon Web Services. AWS provides a number of services that enable even small companies to process Big Data – like S3 or Glacier for cheap storage of Big Data, and Elastic MapReduce (EMR) for processing of Big Data.
Advances in Big Data affect us on a daily basis – be it the ‘friend suggestions’ on Facebook or the ‘personalized ads’ on Google. There is a whole new world of recommendation engines that do this processing on highly parallel and distributed systems. Industry is constantly mining data (actually Big Data) to provide a highly personalized user experience and simultaneously create a new kind of advertising business, wherein the advertiser knows beforehand what you need to buy and which products a particular customer is interested in!