Big Data Standards and Benchmarks

 

Big Data Abstract

 

Big Data is a field that is growing rapidly at universities and in companies around the world. However, very few high schools worldwide offer Big Data as a class. One thing holding schools back is the lack of a formal curriculum for Big Data in a high school setting. The goal of this work is to create such a curriculum. The concept of offering Big Data as a high school class originated at Concordia International School in Shanghai with my coauthor Peter Tong, who developed the class with support from IBM. I teach this course at Concordia International School in Hanoi. In collaboration with my colleague, and alongside graduate-level work in Data Science at Harvard University, we have created a formal high school curriculum for Big Data that includes Standards and Benchmarks. This is a key step toward Big Data being offered by a large number of high schools, and a formal curriculum should facilitate the spread of Big Data as a course offering in high schools around the world.

 

Neil Whitehead

 

Course Description:

 

What is “Big Data”? In the simplest terms, it refers to the tools, processes, and procedures that allow an organization to create, manipulate, and manage very large data sets and storage facilities. As of the beginning of 2013, the world is creating more data each day than in the past four decades combined. Sifting through such sheer quantities of data is a demanding task for any person. Big Data is a new course that encompasses information technology, science, and mathematics. This course will focus on the conceptual understanding and applied theory behind Big Data analytics rather than on explicit formulas and technical jargon. The main objective of this course is to create “awareness”: to expose you to the realm of Big Data and the hidden dangers it might bring. The course will include some hands-on experience using Big Data analytics in practical, real-life projects. Upon completion, you will be more aware of the Big Data phenomenon.

By Dr. Peter Tong

 

Standards and Benchmarks Big Data – High School Curriculum

By Neil Whitehead and Dr. Peter Tong

Standard 1

Recognize how Big Data influences our world in ways that we do not necessarily notice.

Benchmark 1.1 Understand how having more data fundamentally changes what kinds of patterns we can see. We can see patterns that were previously invisible.

Benchmark 1.2 Understand how Big Data is behind the scenes in many aspects of our lives, such as credit ratings and insurance premiums.

Standard 2

Data Wrangling: Understand the challenges of collecting and cleaning data, including joining data sets that do not match or that contain conflicting information.

 

Benchmark 2.1 Understands how data sets can be messy

Benchmark 2.2 Understands the challenges of joining data sets

Benchmark 2.3 Understands the challenges of data sets with conflicting data.

Benchmark 2.4 Structured and unstructured data. Understands the challenges of transforming data, such as changing the units of one data set to connect it with another data set, working with sentiment data, and converting picture or video data for analysis. (Students may use tools such as Data Monarch.)

Benchmark 2.5 Can successfully take data and organize it in a format that can be accepted by software such as Watson Analytics or Tableau
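
As an illustration of Benchmarks 2.1 through 2.5, the following is a minimal classroom sketch in Python, not part of the original curriculum. The city names, temperatures, tolerance, and function names are all hypothetical and chosen only for illustration. It cleans messy keys, converts units so that two data sets can be joined, flags conflicting or missing values, and writes the result as CSV text that spreadsheet or visualization software could import.

```python
import csv
import io

# Hypothetical data: two small data sets that do not line up cleanly.
temps_f = [
    {"city": "Hanoi ", "temp_f": 86.0},      # stray trailing space
    {"city": "shanghai", "temp_f": 77.0},    # inconsistent capitalization
    {"city": "Boston", "temp_f": 59.0},      # absent from the second source
]
temps_c = [
    {"city": "Hanoi", "temp_c": 30.0},
    {"city": "Shanghai", "temp_c": 29.0},    # disagrees with the first source
]


def clean_city(name):
    """Normalize a messy key: trim spaces and fix capitalization."""
    return name.strip().title()


def f_to_c(temp_f):
    """Convert Fahrenheit to Celsius so both sources use the same unit."""
    return (temp_f - 32.0) * 5.0 / 9.0


def join_sources(source_f, source_c, tolerance=1.0):
    """Join the two sources on the cleaned city name and label each row."""
    by_city = {clean_city(r["city"]): r["temp_c"] for r in source_c}
    rows = []
    for r in source_f:
        city = clean_city(r["city"])
        converted = round(f_to_c(r["temp_f"]), 1)
        other = by_city.get(city)
        if other is None:
            status = "missing in second source"
        elif abs(converted - other) > tolerance:
            status = "conflict"
        else:
            status = "match"
        rows.append({"city": city, "temp_c_a": converted,
                     "temp_c_b": other, "status": status})
    return rows


def to_csv(rows):
    """Write the joined rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["city", "temp_c_a", "temp_c_b", "status"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


joined = join_sources(temps_f, temps_c)
print(to_csv(joined))
```

Students could extend a sketch like this by deciding what to do with the rows marked as conflicts, which is often the hardest part of data wrangling.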

Standard 3

Understand how data storage has changed over time and how this has contributed to the rise of Big Data.

Benchmark 3.1 Understands how the cost per memory unit has dropped over time

Benchmark 3.2 Understands how computers are now networked so that a data set can be stored on many computers.

 

Standard 4

Understand the role of data production in the rise of Big Data

Benchmark 4.1 Understands the sources of data. This includes machines and devices producing data

Benchmark 4.2 Students are able to find good sources of data to work with in a project format

Benchmark 4.3 Students can understand case studies of how data such as GPS data is collected and used

Benchmark 4.4 Students can understand how the volume of data has increased due to devices creating data

Standard 5

Understand how companies are deriving value from Big Data  

Benchmark 5.1: Understand how Big Data is used by companies such as Walmart for product placement

Benchmark 5.2 Understand how companies such as travel companies leverage Big Data to earn a profit

Benchmark 5.3 Understand how Big Data is used by companies such as Amazon to make recommendations.

 

Standard 6

Develop an understanding of how changes in data processing have contributed to the rise of Big Data and of AI techniques such as machine learning (examples include Hadoop)

Benchmark 6.1 Understands how distributed processing frameworks such as Hadoop have contributed to the rise of Big Data

Benchmark 6.2 Understand how processing speed has changed in recent years

Benchmark 6.3 Understands the necessary interrelationship between the growth in storage, processing, bandwidth, and security

 

Standard 7

Develop an understanding of the likely future of Big Data, particularly with regard to the Internet of Things

 

Benchmark 7.1 Understands the history of the Internet of Things including RFID and how it gained traction

Benchmark 7.2 Understand how Moore’s Law has contributed to the progression from mainframe computers to the Internet of Things.

Benchmark 7.3 Understands sensing in the Big Data context (examples include the health benefits of sensors such as Fitbit and the safety implications of sensors in car seats)

Benchmark 7.4 Understand how Big Data will give rise to smart appliances

Benchmark 7.5 Understands how Big Data will give rise to smart homes

Benchmark 7.6 Understands how Big Data will give rise to smart cities

Benchmark 7.7 Understands how AI will give rise to self-driving cars

 

Standard 8

Develop an understanding of the risks associated with Big Data.

 

Benchmark 8.1 Understands privacy concerns

Benchmark 8.2 Understands predictive risks, such as the possibility of people being accused of crimes that they are predicted to be about to commit

Benchmark 8.3 Understands hacking risks, such as the risk of someone hacking into a self-driving car and causing a crash, or into a smart toaster and causing a fire.

 

Standard 9

Find correlations between data sets and do analysis. This includes finding good sources of data, cleaning data, asking good questions of the data, and making models. Students may use software products such as Watson Analytics and spreadsheets to help with this analysis.

 

Benchmark 9.1: Is able to ask meaningful questions about the data to obtain insights (make and test hypotheses)

Benchmark 9.2: Is able to find patterns and relationships such as correlation in a data set.

Benchmark 9.3: Understands the limitations of correlation (correlation does not mean causation, and combining groups of data can lead to misunderstandings, as in Simpson’s Paradox)

Benchmark 9.5: Uses correlations to draw conclusions
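
To make Benchmarks 9.2 and 9.3 concrete, here is a minimal sketch in Python, assuming a small made-up data set with two groups (the numbers and the `pearson` helper are hypothetical and not taken from the course materials). It computes the Pearson correlation coefficient within each group and for the combined data. Within each group the relationship is negative, but the combined data shows a positive correlation, which is the kind of reversal described by Simpson’s Paradox.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measurements for two separate groups.
group_a_x, group_a_y = [1, 2, 3], [6, 5, 4]
group_b_x, group_b_y = [6, 7, 8], [11, 10, 9]

r_a = pearson(group_a_x, group_a_y)
r_b = pearson(group_b_x, group_b_y)
r_all = pearson(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"Group A correlation:  {r_a:.2f}")
print(f"Group B correlation:  {r_b:.2f}")
print(f"Combined correlation: {r_all:.2f}")
```

A useful discussion question is which of the three numbers a student should trust, and what extra information about the groups would be needed to decide.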

 

Communication and Presentation

Standard 10

Create meaningful visual displays of data.

 

Benchmark 10.1 Understands how different visual variables work better for different types of data

Benchmark 10.2 Understands how visualizations can be misleading

Benchmark 10.3 Can successfully use technology to make meaningful visual displays of a large data set

Standard 11

Use good presentation skills to explain findings to an audience

 

Benchmark 11.1 Can successfully make a presentation to an audience using good eye contact and voice projection

Benchmark 11.2 Can successfully make a presentation to an audience with good use of visual displays

Benchmark 11.3 Can successfully make a presentation to an audience with a good explanation of content related to Big Data

 

References

 

Cognitive Class, IBM, https://cognitiveclass.ai/. Accessed 23 Sept. 2017.

 

Common Core State Standards Mathematics, National Governors Association Center for Best Practices, 2010, www.corestandards.org/Math/Practice/. Accessed 23 Sept. 2017.

 

Cukier, Kenneth, and Viktor Mayer-Schönberger. Big Data. New York, Houghton Mifflin Harcourt, 2013.

 

Finlay, Steven. Predictive Analytics, Data Mining, Big Data, Myths, Misconceptions and Methods. New York, Palgrave Macmillan, 2014.

 

Kilback, Brent, Ph.D. Personal interview.

 

Next Generation Science Standards: For States, By States, National Science Teachers Association, 2013, nextgenscience.org/. Accessed 23 Sept. 2017.

 

Siegel, Eric. Predictive Analytics. Hoboken, New Jersey, John Wiley and Sons Inc., 2016.

 

Thomas, Rob. Big Data Revolution. John Wiley and Sons Ltd, 2015.

 

Tong, Peter, Ph.D. Personal interview.