SkillRary


What is Data Modeling?

  • Amruta Bhaskar
  • May 18, 2021
  • 0 comment(s)
  • 2361 Views

Data modeling is the process of creating a data model for the data to be stored in a database. This data model is a conceptual representation of data objects, the associations between different data objects, and the rules that govern them. Data modeling helps in the visual representation of data and enforces business rules, regulatory compliance, and government policies on the data. Data models ensure consistency in naming conventions, default values, semantics, and security while ensuring the quality of the data.

A data model is defined as an abstract model that organizes data description, data semantics, and consistency constraints of data. The data model emphasizes what data is needed and how it should be organized, rather than what operations will be performed on the data. A data model is like an architect's building plan: it helps to build conceptual models and set relationships between data items.

Data models are built around business needs. Rules and requirements are defined upfront through feedback from business stakeholders so they can be incorporated into the design of a new system or adapted in the iteration of an existing one.

Data can be modeled at various levels of abstraction. The process begins by collecting information about business requirements from stakeholders and end users. These business rules are then translated into data structures to formulate a concrete database design. A data model can be compared to a roadmap, an architect’s blueprint or any formal diagram that facilitates a deeper understanding of what is being designed.

Data modeling employs standardized schemas and formal techniques. This provides a common, consistent, and predictable way of defining and managing data resources across an organization, or even beyond.

Ideally, data models are living documents that evolve along with changing business needs. They play an important role in supporting business processes and planning IT architecture and strategy. Data models can be shared with vendors, partners, and/or industry peers.

There are three main types of data models that organizations use. Each type of data model serves a different purpose and has its own advantages.

1.   Conceptual data model

A conceptual data model is a visual representation of database concepts and the relationships between them. Typically, a conceptual data model won’t include details of the database itself but instead focuses on establishing entities, characteristics of an entity, and relationships between them. These data models are created for a business audience, especially key business stakeholders. 
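A conceptual model of this kind can be sketched in code to make the idea concrete. The sketch below uses plain Python classes for two hypothetical entities (Customer and Order, not taken from the article): at this level only the entities and the relationship between them are named, with no database detail.

```python
from dataclasses import dataclass

# Hypothetical entities for an online-store domain. At the conceptual
# level we capture entities, their key characteristics, and how they
# relate, not tables, types, or keys.

@dataclass
class Customer:
    """Entity: a person who places orders."""
    name: str

@dataclass
class Order:
    """Entity: a purchase; each order is placed by one customer."""
    customer: Customer  # relationship: Order "is placed by" Customer

alice = Customer(name="Alice")
order = Order(customer=alice)
print(order.customer.name)  # the relationship links the two entities
```

Because the audience is business stakeholders, a real conceptual model would usually be a diagram; the code form is just a compact way to state the same entities and relationship.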


2.   Logical data model

A logical data model is often the next step after conceptual data modeling. This data model further defines the structure of the data entities and sets the relationships between them. The attributes of each data entity are clearly defined. Usually, a logical data model is used for a specific project, since the project will have certain requirements for the structure. The model can still be integrated into other logical models to provide a better understanding of the scope. At this level of data modeling, normalization is applied up to third normal form (3NF), but no primary or secondary keys are needed.
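One way to picture the logical level is as a schema description that pins down every entity's attributes and types while staying independent of any database engine. The sketch below uses a plain Python dictionary; the entity and attribute names are illustrative, not from the article.

```python
# A logical model fully enumerates attributes, their types, and the
# relationships between entities, but names no database engine, keys,
# or storage details. All names here are made up for illustration.

logical_model = {
    "Customer": {
        "attributes": {"first_name": "string", "last_name": "string",
                       "phone": "string"},
        "relationships": {"lives_at": "Address"},
    },
    "Address": {
        "attributes": {"street": "string", "city": "string",
                       "zip_code": "string"},
        "relationships": {},
    },
}

for entity, spec in logical_model.items():
    print(entity, "->", ", ".join(spec["attributes"]))
```

A dictionary like this could later be translated mechanically into engine-specific DDL, which is exactly the step the physical model takes.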

3.   Physical data model

A physical data model is used for database-specific modeling. Just like the logical model, a physical model is built for a specific project but can be integrated with other physical models for a comprehensive view. The model goes into more detail, covering columns, constraints, and primary and foreign keys. In this model the columns carry exact types and attributes, and the data should be normalized as well. A physical model designs the internal schema.
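At the physical level the model targets a concrete engine. The sketch below uses SQLite via Python's standard `sqlite3` module to show the engine-specific detail the article describes: exact column types, primary keys, foreign keys, and NOT NULL constraints. Table and column names are hypothetical.

```python
import sqlite3

# A physical model spells out engine-specific detail: column types,
# primary keys, foreign keys, and constraints. Names are illustrative.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        street     TEXT NOT NULL,
        city       TEXT NOT NULL,
        zip_code   TEXT NOT NULL
    );
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL,
        address_id  INTEGER REFERENCES address(address_id)
    );
""")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

The same logical model would produce different DDL for another engine (different type names, index syntax, and so on), which is why the physical model is the only level tied to a specific database product.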

As a discipline, data modeling invites stakeholders to evaluate data processing and storage in painstaking detail. Data modeling techniques have different conventions that dictate which symbols are used to represent the data, how models are laid out, and how business requirements are conveyed. All approaches provide formalized workflows that include a sequence of tasks to be performed in an iterative manner. Those workflows generally look like this:

  • Identify the entities. The process of data modeling begins with the identification of the things, events or concepts that are represented in the data set that is to be modeled. Each entity should be cohesive and logically discrete from all others.
  • Identify key properties of each entity. Each entity type can be differentiated from all others because it has one or more unique properties, called attributes. For instance, an entity called “customer” might possess such attributes as a first name, last name, telephone number and salutation, while an entity called “address” might include a street name and number, a city, state, country and zip code.
  • Identify relationships among entities. The earliest draft of a data model will specify the nature of the relationships each entity has with the others. In the above example, each customer “lives at” an address. If that model were expanded to include an entity called “orders,” each order would be shipped to and billed to an address as well. These relationships are usually documented via unified modeling language (UML).
  • Map attributes to entities completely. This will ensure the model reflects how the business will use the data. Several formal data modeling patterns are in widespread use. Object-oriented developers often apply analysis patterns or design patterns, while stakeholders from other business domains may turn to other patterns.
  • Assign keys as needed, and decide on a degree of normalization that balances the need to reduce redundancy with performance requirements. Normalization is a technique for organizing data models (and the databases they represent) in which numerical identifiers, called keys, are assigned to groups of data to represent relationships between them without repeating the data. For instance, if customers are each assigned a key, that key can be linked to both their address and their order history without having to repeat this information in the table of customer names. Normalization tends to reduce the amount of storage space a database will require, but it can come at a cost to query performance.
  • Finalize and validate the data model. Data modeling is an iterative process that should be repeated and refined as business needs change.
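The key-assignment and normalization step described above can be sketched concretely. In this SQLite example (all names and values are made up for illustration), each customer gets a key, and orders reference that key instead of repeating the customer's details, which is the redundancy reduction the workflow aims for.

```python
import sqlite3

# Normalization via keys: the customer's name is stored exactly once,
# and each order links back to it through customer_id. Illustrative data.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),
    item TEXT)""")
conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 'book'), (2, 1, 'lamp')])

# A join reassembles the full picture on demand, without the name ever
# having been duplicated in the orders table.
rows = conn.execute("""SELECT c.name, o.item
                       FROM orders o
                       JOIN customer c ON o.customer_id = c.id""").fetchall()
print(rows)
```

The trade-off named in the text is visible here: the join saves storage by avoiding duplication, but every query that needs the customer's name now pays the cost of that join.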

Data modeling might seem like an abstract process, far removed from the data analytics projects that drive concrete value for the organization. But data modeling is necessary foundational work that not only allows data to be stored in a database more easily but also positively impacts data analytics.

These are some of the key benefits of data modeling and why organizations will continue to use data models:

  • Higher-quality data - The visual depiction of requirements and business rules allows developers to foresee what could become large-scale data corruption before it happens. Data models also let developers define rules that monitor data quality, reducing the chance of errors.
  • Increased internal communication about data and data processes - Creating data models is a forcing function for the business to define how data is generated and moved throughout applications.
  • Reduced development and maintenance costs - Because data modeling surfaces errors and inconsistencies early in the process, they are far easier and cheaper to correct.
  • Improved performance - An organized database operates more efficiently; a well-designed schema avoids endless searching and returns results faster.

 


