Big Data is a relatively new term that came into wide use in the 2000s.
What is “Big Data”?
I found this video that explains the concept of Big Data in a concise way.
Along with the video, two articles were a big help in showing me what Big Data is and how it affects digital citizens.
Doug Laney broke down Big Data into the 3 V’s: Volume, Velocity, and Variety.
Article 1 adds two more dimensions (though only one of them actually starts with a V):
“Variability. In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal and event-triggered peak data loads can be challenging to manage. Even more so with unstructured data.
Complexity. Today’s data comes from multiple sources, which makes it difficult to link, match, cleanse and transform data across systems. However, it’s necessary to connect and correlate relationships, hierarchies and multiple data linkages or your data can quickly spiral out of control.”
Basically, Big Data refers to the massive amount of data that we come in contact with every day. The conversation around Big Data is: What are organizations and companies doing with our personal information and data?
It isn’t always a bad thing though! Look at how Big Data can help educators:
“Educators armed with data-driven insight can make a significant impact on school systems, students and curriculums. By analyzing big data, they can identify at-risk students, make sure students are making adequate progress, and can implement a better system for evaluation and support of teachers and principals.”
Our third article, Big Data in Education, describes the University of Alabama as a great model for how schools can use Big Data. Before adopting Big Data tools, Alabama’s data analysts had to manually pull information and transcribe it into spreadsheets. As you can imagine, that would be an extremely time-consuming, tedious task for a university with over 38,000 students. They can now get better results in less time, which leads to better quality work from employees.
Other universities using Big Data include:
The University of Central Florida uses data visualization to create and meet the provost’s challenging 2020 goals.
The University of Idaho uses analytics and data visualization to create a new, interactive way to share data with dozens of different constituents.
The University of Louisville created a primary data platform to support the university’s strategic planning process in just 60 days.
Some proposed uses of Big Data are outrageously extreme. Article 4, The Future of Big Data and Analytics in K-12 Education, mentions an idea from Max Ventilla. He sees the classroom of the future having cameras that capture every action, facial expression, etc. that the students make, every day, all school year long. He envisions infrared cameras, tracking devices like a Fitbit, and Chromebooks that track every movement and every second spent on the device. This is an extreme form of Big Data collection that may step over a lot of boundaries. The information would be used to help administrators and teachers. They could see that students perform better at certain times of the day, or create custom assignments to help students succeed academically.
How do you see the future of Big Data in education? How do you think it would be helpful? Where do you draw the line in it being too invasive?
According to article 5, 5 Dramatic Impacts of Big Data on Education, there are 5 ways that Big Data is already impacting education.
- Empowering Better Decision Making
- Student’s Results
- Career Prediction
- Mapping Concept
- Enhancing the Learning Experience
Some people view Big Data as a helpful tool that can encourage innovation and individualization. Others view Big Data as a privacy breach.
Our sixth article, Big Data Privacy Is A Bigger Issue Than You Think, sums up the qualms very well: “Big data analytics has the power to provide insights about people that are far and above what they know about themselves.”
I enjoyed reading through the privacy statistics in the Big Data and the Future of Privacy article.
A few mentioned are:
“Google is more than 1 million petabytes in size and processes more than 24 petabytes of data a day, a volume that is thousands of times the quantity of all printed material in the U.S. Library of Congress.
More than 1 billion unique users visit YouTube each month and over 6 billion hours of video are watched each month on YouTube – that’s almost an hour for every person on Earth, and 50% more than last year.
90 percent of the data in the world today has been created in the past two years.
In 2012, data was forecasted to double every two years through the year 2020.
In 2020, the amount of digital data produced will exceed 40 zettabytes, which is the equivalent of 5,200 gigabytes for every man, woman and child on planet earth.”
Reading through that just makes me realize how BIG Big Data can be.
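As a rough sanity check on the last statistic quoted above, here is a small Python sketch that works out how many people the "40 zettabytes" and "5,200 gigabytes per person" figures imply. It assumes decimal (SI) units, and the function and variable names are just made up for this illustration.

```python
# Back-of-the-envelope check of the quoted figure, assuming decimal (SI) units.
ZETTABYTE_IN_GB = 10**12  # 1 ZB = 10^21 bytes and 1 GB = 10^9 bytes


def implied_population(total_zettabytes: float, gb_per_person: float) -> float:
    """Return the number of people implied by a total volume and a per-person share."""
    total_gb = total_zettabytes * ZETTABYTE_IN_GB
    return total_gb / gb_per_person


people = implied_population(40, 5200)
print(f"Implied population: {people / 1e9:.2f} billion")
```

The result lands a little under 8 billion people, which is in the right range for a world population estimate around 2020, so the two numbers in the quote are at least consistent with each other.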
There are several technologies that help businesses and companies in their Big Data journey.
The article 15 Big Data Technologies To Watch gives examples of the most sought-after ones.
I’ll include their top 3 below to give you a sense of the popular names and the role each plays in working with Big Data.
1. The Hadoop Ecosystem
While Apache Hadoop may not be as dominant as it once was, it’s nearly impossible to talk about big data without mentioning this open source framework for distributed processing of large data sets. Last year, Forrester predicted, “100% of all large enterprises will adopt it (Hadoop and related technologies such as Spark) for big data analytics within the next two years.”
Over the years, Hadoop has grown to encompass an entire ecosystem of related software, and many commercial big data solutions are based on Hadoop. In fact, Zion Market Research forecasts that the market for Hadoop-based products and services will continue to grow at a 50 percent CAGR through 2022, when it will be worth $87.14 billion, up from $7.69 billion in 2016.
Key Hadoop vendors include Cloudera, Hortonworks and MapR, and the leading public clouds all offer services that support the technology.
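To see how the growth figures in the quote above fit together, here is a short Python sketch of the compound annual growth rate (CAGR) arithmetic. The dollar amounts and years come from the quote; the function names are my own, and this is only a simple check, not how Zion Market Research produced its forecast.

```python
def project_value(start_value: float, annual_growth_rate: float, years: int) -> float:
    """Grow start_value by a fixed percentage each year for the given number of years."""
    return start_value * (1 + annual_growth_rate) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the quote: $7.69 billion in 2016, $87.14 billion in 2022 (6 years).
projected_2022 = project_value(7.69, 0.50, 6)
rate = implied_cagr(7.69, 87.14, 6)

print(f"7.69 billion growing 50% a year for 6 years: {projected_2022:.2f} billion")
print(f"Growth rate implied by the two dollar figures: {rate:.1%}")
```

Growing $7.69 billion by 50 percent a year for six years gives roughly $87.6 billion, so the quoted "50 percent CAGR" looks like a rounded version of the rate implied by the two dollar figures.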
2. Apache Spark
Apache Spark is part of the Hadoop ecosystem, but its use has become so widespread that it deserves a category of its own. It is an engine for processing big data within Hadoop, and it’s up to one hundred times faster than the standard Hadoop engine, MapReduce.
In the AtScale 2016 Big Data Maturity Survey, 25 percent of respondents said that they had already deployed Spark in production, and 33 percent more had Spark projects in development. Clearly, interest in the technology is sizable and growing, and many vendors with Hadoop offerings also offer Spark-based products.
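If you are wondering what MapReduce actually does, here is a toy word count in plain Python that follows the same three steps: map, shuffle, and reduce. This is not Hadoop or Spark code, and it runs on a single machine; the function names and sample sentences are made up just to show the idea. Real engines spread these steps across many computers.

```python
from collections import defaultdict


def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all the emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine each word's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three steps in order."""
    return reduce_phase(shuffle_phase(map_phase(documents)))


# Hypothetical sample data for the illustration.
sample_posts = ["big data in education", "big data and privacy"]
counts = word_count(sample_posts)
print(counts)
```

The point of splitting the work this way is that the map and reduce steps can each run on many machines at once, which is what makes the approach practical for very large data sets.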
3. R
R, another open source project, is a programming language and software environment designed for working with statistics. The darling of data scientists, it is managed by the R Foundation and available under the GPL 2 license. Many popular integrated development environments (IDEs), including Eclipse and Visual Studio, support the language.
Several organizations that rank the popularity of various programming languages say that R has become one of the most popular languages in the world. For example, the IEEE says that R is the fifth most popular programming language, and both Tiobe and RedMonk rank it 14th. This is significant because the programming languages near the top of these charts are usually general-purpose languages that can be used for many different kinds of work. For a language that is used almost exclusively for big data projects to be so near the top demonstrates the significance of big data and the importance of this language in its field.
To sum up what Big Data is and how it fits into education, privacy, and technology, here is a video that goes over what we talked about and our readings. The video also includes examples of Big Data tools that companies can use.