Data Glossary 📖

This Data Glossary is a guide to the world of data and analytics. It explains key data concepts, tools, and practices in clear, concise terms, so that you can make data-driven decisions with more confidence. Please refer to the bookshelf if you want to dive deeper into these topics. This glossary is a constant work in progress: the world of data evolves rapidly and new terms pop up all the time, so check back regularly to stay up to date.

Data

Data is a collection of facts, figures, and information from which conclusions can be drawn or decisions can be made. In a more technical sense, data refers to qualitative or quantitative attributes of a variable or set of variables. Data is typically collected through observations, measurements, research, or analysis, and can come in various forms such as numbers, words, images, videos, or sounds.

In the context of information technology and computing, data is often referred to as the raw information or inputs that are processed by a computer.

In the business world, data is crucial for making informed decisions, understanding market trends and customer behavior, and improving products, services, or operations. In the age of AI, businesses increasingly use data to augment their operations and services, for example through predictive analytics and AI solutions offered to customers.

Data Contract

From a technical perspective, a data contract formally defines details like data schema, extraction cadence, transport protocols, and support contacts. For example, a contract could specify that data from system X will be extracted daily at 3 AM UTC via REST API in JSON format providing user ID, name, email, and purchase history fields. The contract names the engineer and product manager to contact on each side if issues arise. Data contracts should be standardized and programmatically queryable to enable integration into data pipelines and workflows.
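To make "standardized and programmatically queryable" more concrete, here is a minimal sketch of the example contract above expressed as plain data in Python. The class name DataContract, the helper missing_fields, the field names, and the contact addresses are all hypothetical placeholders chosen for illustration; real contract formats vary from team to team.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class DataContract:
    """Hypothetical, minimal representation of a data contract."""
    source_system: str
    schedule_utc: str        # extraction cadence
    transport: str           # transport protocol
    data_format: str
    fields: tuple            # schema, reduced to field names for brevity
    producer_contact: str
    consumer_contact: str


def missing_fields(contract: DataContract, record: dict) -> list:
    """Return the contract fields that are absent from a record."""
    return [name for name in contract.fields if name not in record]


contract = DataContract(
    source_system="system X",
    schedule_utc="daily at 03:00",
    transport="REST API",
    data_format="JSON",
    fields=("user_id", "name", "email", "purchase_history"),
    producer_contact="engineer@producer.example",  # placeholder address
    consumer_contact="pm@consumer.example",        # placeholder address
)

# The contract is plain data, so a pipeline can serialize and query it.
contract_json = json.dumps(asdict(contract))

# A record that lacks one of the agreed fields.
record = {"user_id": 1, "name": "Ada", "email": "ada@example.com"}
print(missing_fields(contract, record))
```

A pipeline step could run a check like missing_fields on incoming records and alert the contacts named in the contract when the agreed schema is not met.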

From a business perspective, a data contract sets clear expectations between data consumers and producers by stating what data will be shared, how often, and by what method. Having data contracts helps ensure teams get the reliable, high-quality data they need to drive business outcomes. If formal agreements seem too strict, informal discussions can also set expectations between providers and consumers. This helps providers understand downstream needs so consumers get the right data at the right time.

Data Product

From a technical perspective:
A data product combines a data domain with product thinking, serving as the core unit within a Data Mesh framework. It encapsulates a set of related, identifiable data managed with consistency and a measure of quality and accuracy, aiming to be discoverable, understandable, trustworthy, and interoperable. At its architectural core, a data product features interoperable interfaces like APIs and pipelines for data consumption and ingestion, ensuring data security, discoverability, and governance through published metadata and federated governance models. It operates within clear boundaries, with an empowered owner overseeing its evolution, contributing to an ecosystem where data is easily shared, consumed, and governed. This approach requires data engineers to focus on creating data products that are easy to build, deploy, secure, and manage, leveraging formal contracts, versioning, and security protocols to facilitate seamless data interoperability and to enhance the data product's value chain within the enterprise.

From a business perspective:
A data product represents a strategic asset in the realm of data-driven decision-making, blending specific business needs with data insights to deliver tangible value. It is designed to make data easily accessible, shareable, and governable across an organization, thus fostering a culture of informed decision-making and agility. By marrying data domains—cohesive sets of relevant data—with product thinking, data products become key to unlocking potential value, facilitating better customer experiences, and driving innovation. For business leaders, the focus on data products within a Data Mesh architecture emphasizes the importance of treating data as a product: one that requires ownership, clear management, and a vision for continuous improvement. This approach ensures that data assets are not just stored and managed, but actively contribute to the organization's strategic goals, making data a cornerstone of competitive advantage and operational excellence.

If you want to read more on the topic, here is a detailed explanation

Data Warehouse

From a technical standpoint, a data warehouse is a centralized repository designed to store integrated data from multiple, often disparate, sources. It's structured specifically for query and analysis, supporting business intelligence activities including analytics, reporting, and data mining.
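As a rough illustration of "integrated data from multiple sources, structured for query and analysis", the sketch below loads rows from two made-up source systems into one in-memory SQLite database and runs a reporting query across them. SQLite is only a stand-in here, not a real warehouse engine, and the table names (dim_customer, fact_order) and sample rows are assumptions for the example.

```python
import sqlite3

# Rows from two hypothetical source systems with different shapes.
crm_rows = [("c1", "Ada", "DE"), ("c2", "Linus", "FI")]   # customer system
shop_rows = [("c1", 30.0), ("c1", 12.5), ("c2", 99.0)]    # order system

# One central store; an in-memory SQLite database stands in for the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE dim_customer (customer_id TEXT PRIMARY KEY, name TEXT, country TEXT)"
)
warehouse.execute("CREATE TABLE fact_order (customer_id TEXT, amount REAL)")
warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", crm_rows)
warehouse.executemany("INSERT INTO fact_order VALUES (?, ?)", shop_rows)

# A reporting query that needs data from both sources at once.
revenue_by_country = warehouse.execute(
    """
    SELECT c.country, SUM(o.amount)
    FROM fact_order AS o
    JOIN dim_customer AS c ON c.customer_id = o.customer_id
    GROUP BY c.country
    ORDER BY c.country
    """
).fetchall()
print(revenue_by_country)
```

The point of the sketch is the last query: once the data sits in one place with a shared key, a question that spans both systems becomes a single join.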

From a business perspective, it's a digital storage system that allows businesses to consolidate all their data from different sources into one central location. It's designed for efficient analysis and reporting, aiding in informed decision-making.

Database

A database is a structured collection of data stored in a computer system. It is designed to efficiently store, retrieve, manage, and manipulate data. Databases are fundamental to modern computing and are managed through a Database Management System (DBMS).
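The store, retrieve, and manipulate operations can be sketched with SQLite, a small relational DBMS that ships with Python's standard library. The inventory table and its rows are invented for illustration.

```python
import sqlite3

# An in-memory relational database managed by SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, quantity INTEGER)")

# Store: insert two inventory records.
conn.executemany("INSERT INTO inventory VALUES (?, ?)", [("A-1", 10), ("B-2", 0)])

# Manipulate: three units of A-1 were sold.
conn.execute("UPDATE inventory SET quantity = quantity - 3 WHERE sku = ?", ("A-1",))

# Retrieve: list only the items that are still in stock.
in_stock = conn.execute(
    "SELECT sku, quantity FROM inventory WHERE quantity > 0"
).fetchall()
print(in_stock)
```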

Businesses use this digital system to store and organize business data. It is a crucial tool for managing information in various forms, such as customer records, transactions, and inventory data.

Note that this term generally refers to relational databases, which organize data in a highly structured way. Unstructured data is usually stored in NoSQL databases or simply in object stores in the cloud.
