CSCI946 - Big Data Analytics Week 1

Sources of Big Data

Big Data is generated from various sources, including:

  • Mobile sensors
  • Social media platforms
  • Smart grids
  • Video rendering
  • Medical imaging
  • Genetic data
  • Surveillance cameras
  • Geophysical data

Understanding Big Data: It’s Not Just About Size

Big Data isn’t defined solely by its size; it’s characterized by several key properties, often referred to as the “Vs”:

  • Velocity: The speed at which data is generated and processed.
  • Volume: The sheer amount of data produced.
  • Variety: The different types of data (structured, semi-structured, unstructured).
  • Value: The business value that can be derived from the data.
  • Veracity: The reliability and accuracy of the data.
  • Variability: The changing nature of data, including inconsistencies and peaks in data flow.
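
As a rough illustration, a few of these properties can be measured on a tiny batch of events. This is a minimal sketch rather than a standard definition: the `events` data, the `parse` helper, and the particular formulas (bytes for volume, events per second for velocity, share of complete readings for veracity) are all assumptions made for the example.

```python
import json

# Hypothetical batch of (timestamp_in_seconds, payload) events.
events = [
    (0.0, '{"sensor": "a", "temp": 21.5}'),
    (0.5, '{"sensor": "b", "temp": null}'),
    (1.0, 'free-text note: fan is noisy'),
    (2.0, '{"sensor": "a", "temp": 22.0}'),
]

# Volume: total payload size in bytes.
volume_bytes = sum(len(payload.encode("utf-8")) for _, payload in events)

# Velocity: events per second over the observed time window.
duration = events[-1][0] - events[0][0]
velocity = len(events) / duration


def parse(payload):
    """Return the decoded JSON value, or None if the payload is not JSON."""
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


parsed = [parse(payload) for _, payload in events]

# Variety: how many payloads are JSON records versus free text.
json_count = sum(record is not None for record in parsed)
text_count = len(events) - json_count

# Veracity (one possible proxy): share of JSON records with a usable reading.
complete = sum(
    1 for record in parsed
    if record is not None and record.get("temp") is not None
)
veracity = complete / json_count

print(volume_bytes, velocity, json_count, text_count, veracity)
```

Value and variability are harder to reduce to a single number, since they depend on the business question and on how the flow changes over time.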

Structures of Big Data

Big Data can be categorized based on its structure, which influences how it is processed and analyzed:

  1. Unstructured Data:
    • Description: Data without a predefined model or organization. It often includes large amounts of text but can also encompass images, videos, and social media content.
    • Examples: Email bodies, videos, social media posts, satellite images, presentations.
  2. Quasi-Structured Data:
    • Description: Textual data that follows loose, inconsistent patterns rather than a fixed schema. Fields can usually be recovered, but only with extra effort and tooling such as custom parsers.
    • Examples: Web server logs, network logs, event logs.
  3. Semi-Structured Data:
    • Description: Data that doesn’t reside in a traditional database but still has some organizational properties, often using tags or markers to enforce a hierarchical structure.
    • Examples: XML files, JSON documents, emails (with metadata such as sender, receiver, and date).
  4. Structured Data:
    • Description: Highly organized data, typically stored in databases with a predefined schema. It is arranged into rows and columns, making it easy to enter, query, and analyze.
    • Examples: Relational databases, spreadsheets, SQL tables.

This classification matters because the structure of the data determines which tools and techniques are appropriate for processing and analyzing it.
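
The difference shows up in how much work it takes to get usable fields out of each kind of data. The following is a minimal sketch using only the Python standard library; all sample data, the `LOG_PATTERN` regular expression, and the variable names are hypothetical and chosen for illustration.

```python
import csv
import io
import json
import re

# Structured: rows and columns with a fixed, known schema (hypothetical CSV).
csv_text = "id,name,age\n1,Ana,34\n2,Ben,29\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys and a nested hierarchy (hypothetical JSON).
json_text = '{"user": {"name": "Ana", "tags": ["admin", "staff"]}}'
doc = json.loads(json_text)

# Quasi-structured: a web server log line. There is a loose pattern,
# but fields only become usable after writing a parser for it.
log_line = (
    '192.168.0.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326'
)
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)
log_fields = LOG_PATTERN.match(log_line).groupdict()

# Unstructured: free text. There are no fields to read directly,
# only features derived from the content, such as a word count.
text = "Great service, but the delivery was late again."
word_count = len(text.split())

print(rows[0]["name"], doc["user"]["tags"], log_fields["status"], word_count)
```

The structured and semi-structured cases need only a generic parser, the log line needs a pattern written for that specific format, and the free text yields nothing until some analysis is applied to it.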

Business Intelligence vs. Data Science

  • Business Intelligence (BI): Focuses on analyzing and exploring past data to make informed business decisions.
  • Data Science: Emphasizes predicting future trends and patterns by analyzing current and historical data.
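
The contrast can be sketched on the same small dataset. In this hedged example, the `monthly_sales` figures are made up, and a least-squares straight line stands in for whatever predictive model would actually be used; it is one simple choice, not the method prescribed by the course.

```python
from statistics import mean

# Hypothetical sales figures for four consecutive months.
monthly_sales = [100.0, 110.0, 120.0, 130.0]

# BI-style question: what happened? Summarize the past.
total = sum(monthly_sales)
average = mean(monthly_sales)

# Data-science-style question: what is likely to happen next?
# Fit a least-squares line y = intercept + slope * x to the history.
xs = list(range(len(monthly_sales)))
x_bar = mean(xs)
y_bar = mean(monthly_sales)
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, monthly_sales))
    / sum((x - x_bar) ** 2 for x in xs)
)
intercept = y_bar - slope * x_bar

# Extrapolate one step beyond the observed months.
forecast = intercept + slope * len(monthly_sales)

print(f"total={total}, average={average}, next month forecast={forecast}")
```

The first half only reports on data that already exists, while the second half uses that history to estimate a value that has not been observed yet.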
