What is Big Data?
Big data describes a holistic information management strategy that includes and integrates many new types of data and data management alongside traditional data. The term is also widely used to describe the exponential growth and availability of data, both structured and unstructured. Big data may become as important to business, and to society, as the Internet has. Why? More data may lead to more accurate analyses.
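One standard way to make "more data may lead to more accurate analyses" concrete is the standard error of a sample mean. Under the simplifying assumption that the n observations are independent draws from a population with standard deviation σ, the typical random error of the estimate shrinks with the square root of the sample size:

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}},
\qquad
\mathrm{SE}(\bar{x})\ \text{with}\ 4n\ \text{observations}
  = \frac{\sigma}{\sqrt{4n}}
  = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{n}}
```

So quadrupling the amount of data halves the random error. More data does not remove systematic bias, however: if data are collected in a skewed way, a larger sample only estimates the wrong quantity more precisely.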
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or other advanced methods to extract value from data, and seldom to a particular size of data set. More accurate analysis of big data may lead to more confident decision making, and better decisions can mean greater operational efficiency, cost reductions and reduced risk.
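Predictive analytics covers many techniques. As one minimal illustration, not a method taken from the sources, the sketch below fits a straight-line trend to historical values by ordinary least squares and uses it to forecast the next value. The function names and the monthly order figures are hypothetical.

```python
def fit_linear_trend(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred cross-products and centred squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Predict y for a new x from a fitted (intercept, slope) pair."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical history: order counts for months 1 to 4 (not from the sources).
months = [1, 2, 3, 4]
orders = [100, 120, 140, 160]
model = fit_linear_trend(months, orders)
forecast = predict(model, 5)
print(model, forecast)  # (80.0, 20.0) 180.0
```

Real predictive analytics on big data would use richer models and far more observations; the point here is only that a model learned from past data is used to predict values that have not been observed yet.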
Volume.
Many factors contribute to the increase in data volume: transaction-based data stored over the years, unstructured data streaming in from social media, and increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was mainly a storage issue. With decreasing storage costs, other issues emerge, including how to determine what is relevant within large data volumes and how to use analytics to create value from the relevant data.
Velocity.
Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.
Variety.
Data today comes in all types of formats: structured, numeric data in traditional databases; information created by line-of-business applications; unstructured text documents, email, video and audio; stock ticker data; and financial transactions. Managing, merging and governing these different varieties of data is something many organizations still grapple with.
Characteristics:
Volume
The quantity of generated data is important in this context. The size of the data determines the value and potential of the data under consideration, and whether it can be considered big data at all. The name ‘big data’ itself refers to size, hence this characteristic.
Variety
Variety refers to the category, or type, of the data, which is an essential fact for data analysts to know. Knowing it helps the people who analyze and work with the data use it effectively to their advantage, and thus realize the value of big data.
Velocity
‘Velocity’ in this context means how fast the data is generated and processed to meet the demands and challenges of growth and development.
Variability
Variability refers to the inconsistency the data can show at times, which hampers the process of handling and managing the data effectively.
Veracity
The quality of captured data can vary greatly. Accurate analysis depends on the veracity of the source data.
Complexity
Data management can be very complex, especially when large volumes of data come from multiple sources. Data must be linked, connected, and correlated so users can grasp the information the data is supposed to convey.
Applications:
- Government
  - United States of America
  - India
  - United Kingdom
- International development
- Manufacturing
  - Cyber-Physical Models
- Media
- Internet of Things (IoT)
- Technology
- Private sector
  - Retail
  - Retail Banking
  - Real Estate
- Science
- Science and research
Leading Big Data Companies:
- IBM
- HP
- EMC
- Teradata
- Oracle
- SAP
- Microsoft
- Amazon Web Services
- VMware
Advantages and Disadvantages of Cloud Storage:
Advantages:
1. Usability:
Many cloud storage services provide desktop folders for Macs and PCs, which let users drag and drop files between cloud storage and their local storage.
2. Bandwidth:
Instead of emailing files to individuals as attachments, you can email recipients a web link to the file.
3. Accessibility:
Stored files can be accessed from anywhere with an Internet connection.
4. Disaster Recovery:
Businesses are strongly advised to have a backup plan ready in case of an emergency. Cloud storage can serve as such a backup plan by providing a second copy of important files. These files are stored at a remote location and can be accessed through an Internet connection.
5. Cost Savings:
Businesses and organizations can often reduce annual operating costs by using cloud storage, which has been quoted at about 3 cents per gigabyte. Users can see additional savings because storing information remotely does not consume their own power.
Disadvantages:
1. Usability:
Be careful when using drag and drop to put a document into the cloud storage folder: it moves the document out of its original folder and into the cloud storage location. If you want to keep the document in its original location as well, copy and paste it into the cloud storage folder instead.
2. Bandwidth:
Several cloud storage services set a specific bandwidth allowance. If an organization exceeds the allowance, the additional charges can be significant. However, some providers allow unlimited bandwidth. Companies should weigh this factor when choosing a cloud storage provider.
3. Accessibility:
If you have no Internet connection, you have no access to your data.
4. Data Security:
There are concerns about the safety and privacy of important data stored remotely. The possibility of private data being commingled with other organizations’ data makes some businesses uneasy.
5. Software:
If you want to work with your files locally on multiple devices, you’ll need to install the service’s software on every device.
Big Data Analytics Tools:
More and more tools offer real-time processing of big data. Because Hadoop, at the time of writing, does not offer real-time big data analytics, other products are needed for that purpose. Fortunately, there are quite a few tools, many of them open source, that do the job well.
Storm
Storm, acquired and open-sourced by Twitter and now an Apache Software Foundation project, is a real-time distributed computation system. It does for real-time processing what Hadoop does for batch processing: it provides a set of general primitives for performing real-time analyses. Storm is easy to use and works with any programming language. It is highly scalable and fault-tolerant.
Cloudera
Cloudera offers Cloudera Enterprise RTQ, which provides real-time, interactive analytical queries of data stored in HBase or HDFS. It is built around Cloudera Impala, Cloudera’s open source query engine.
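Impala lets analysts query data stored in HDFS or HBase with SQL. The sketch below does not use Impala or any Cloudera software; it uses Python's built-in sqlite3 module as a small local stand-in, purely to show the shape of an interactive analytical query (a GROUP BY aggregation). The table, columns, function name and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical aggregation query. The GROUP BY / ORDER BY / LIMIT shape is the
# kind of analytical SQL an engine such as Impala answers interactively over
# much larger tables; here SQLite runs it on a tiny in-memory table.
TOP_PAGES_SQL = """
    SELECT page, COUNT(*) AS views
    FROM page_views
    GROUP BY page
    ORDER BY views DESC, page ASC
    LIMIT ?
"""


def top_pages(rows, limit=3):
    """Load (page, user_id) rows into an in-memory table; return the most viewed pages."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE page_views (page TEXT, user_id INTEGER)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        return conn.execute(TOP_PAGES_SQL, (limit,)).fetchall()
    finally:
        conn.close()


# Hypothetical click data, not from the sources.
views = [
    ("/home", 1), ("/home", 2), ("/cart", 1),
    ("/home", 3), ("/cart", 2), ("/about", 1),
]
print(top_pages(views, limit=2))  # [('/home', 3), ('/cart', 2)]
```

In a real deployment the data would already live in tables backed by HDFS or HBase, and the query would be sent to the query engine rather than loaded into SQLite first.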
GridGain
GridGain is an open source, enterprise grid computing platform made for Java. It is compatible with the Hadoop Distributed File System (HDFS) and offers an alternative to Hadoop’s MapReduce. GridGain provides a distributed, in-memory, real-time and scalable data grid that links data sources with different applications.
SpaceCurve
The technology SpaceCurve is developing can discover underlying patterns in multidimensional geodata. Geodata differs from conventional data because mobile devices generate it very quickly and in forms that traditional databases were not designed to handle. SpaceCurve offers a big data platform, and its tool set a world record on February 12, 2013 for running complex queries at tens of gigabytes per second.
References:
https://en.wikipedia.org/wiki/Big_data
http://www.sas.com/en_us/insights/big-data/what-is-big-data.html
http://www.datamation.com/applications/30-big-data-companies-leading-the-way-1.html
http://bigdata-madesimple.com/5-advantages-and-disadvantages-of-cloud-storage/
https://datafloq.com/read/the-power-of-real-time-big-data/225