( ESNUG 554 Item 2 ) -------------------------------------------- [12/10/15]
Subject: Jolly's quick-and-dirty cheat sheet for those exploring Big Data
All Big Data systems share these common traits:
- Data is broken into many small pieces called "shards".
- Shards are stored and distributed across many smaller cheap disks.
- These cheap disks exist on cheap Linux machines.
Cheap == low memory, consumer-grade disks and CPUs.
- Shards can be stored redundantly across multiple disks, to
build resiliency. (Cheap disks and cheap computers have
higher failure rates).
Big Data software (like Hadoop) uses simple, powerful techniques so that
both the data and the compute are massively parallel.
- from http://www.deepchip.com/items/0554-01.html
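To make the shard-and-replicate idea above concrete, here is a minimal Python sketch. It is only an illustration, not how Hadoop or any particular system places data: the function names (make_shards, place_shards, readable_shards), the round-robin placement, and the disk counts are all assumptions made up for this example.

```python
from collections import defaultdict

def make_shards(records, shard_size):
    """Break a list of records into fixed-size pieces ("shards")."""
    return [records[i:i + shard_size]
            for i in range(0, len(records), shard_size)]

def place_shards(num_shards, num_disks, replicas):
    """Assign each shard to `replicas` distinct disks, round-robin.

    Hypothetical placement policy chosen for simplicity; real systems
    also weigh rack layout, free space, and load.
    """
    if replicas > num_disks:
        raise ValueError("need at least as many disks as replicas")
    placement = defaultdict(list)
    for shard_id in range(num_shards):
        for r in range(replicas):
            placement[shard_id].append((shard_id + r) % num_disks)
    return dict(placement)

def readable_shards(placement, failed_disks):
    """Return the shard ids that still have at least one live copy."""
    return [shard_id for shard_id, disks in placement.items()
            if any(d not in failed_disks for d in disks)]

records = list(range(10))
shards = make_shards(records, shard_size=3)        # 4 shards
placement = place_shards(len(shards), num_disks=4, replicas=2)
survivors = readable_shards(placement, failed_disks={1})
```

With two copies of every shard, losing any single disk in this toy layout still leaves every shard readable. Losing two neighboring disks (for example 1 and 2) loses the one shard whose copies both lived there, which is why real deployments often keep more than two copies.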
From: [ John "Jolly" Lee of Apache Ansys ]
Hi, John,
P.S. Here are some cheat sheet links for those who want to explore Big Data more.
By far, the most popular Big Data system is Hadoop. It's open source and
was originally developed by some Yahoo engineers
http://en.wikipedia.org/wiki/Apache_Hadoop
who took the ideas from the seminal Google MapReduce research paper
http://research.google.com/archive/mapreduce.html
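For readers who have not seen the MapReduce pattern before, the classic word-count example can be sketched in a few lines of plain Python. This is a single-process toy, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample text are assumptions for illustration. In a real cluster, each map call would run on the machine that already holds that shard.

```python
from collections import defaultdict

def map_phase(shard):
    """Map step: emit a (word, 1) pair for every word in one shard."""
    return [(word.lower(), 1) for line in shard for word in line.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)

def word_count(shards):
    # Each shard is mapped independently, so this loop is the part
    # that a cluster would spread across many machines.
    mapped = [pair for shard in shards for pair in map_phase(shard)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Two hypothetical shards, each a list of text lines.
shards = [["big data is big"], ["data is sharded"]]
counts = word_count(shards)
```

The map and reduce functions never look at more than one shard or one key at a time, which is what lets the same program scale from one machine to thousands without changes to its logic.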
It's not hard for a SW engineer to start playing with Big Data systems.
Hadoop is available for download
http://hadoop.apache.org/releases.html
but it usually requires root access to install. Or you can use Amazon
to start playing with pre-installed systems
http://aws.amazon.com/elasticmapreduce
Most companies will want to go with a commercially supported solution.
Just as many companies go with Red Hat for a commercially supported
Linux distro, many companies are also going with Cloudera, MapR, and
Hortonworks for Hadoop-based Big Data systems.
It's interesting to note that Intel invested about $740M in Cloudera, Google
Capital led a $110M funding round in MapR, and Hortonworks is a spin-out
from Yahoo. Other
Big Data related companies that your readers may want to read up on are:
Databricks, Splunk, Pivotal, Palantir, MongoDB, and Tableau.
- John "Jolly" Lee
Ansys-Apache, Inc. San Jose, CA
---- ---- ---- ---- ---- ---- ----
Related Articles
ANSS "Jolly" on why Big Data is a bad fit for EDA and chip design