The Case for Modeling Part 2

The regular readers of this blog know that my database modeling book was cancelled when Microsoft pulled the rug out from under M and the repository. I did a lot of good writing about modeling in general, so I wanted to put some of it up here on the blog, since the cancellation is official now. This is the second part of Chapter 1.

Why modeling?

Much software is designed on napkins. A story, a mockup, a model is really all that you need to design how a piece of software works.

This book isn’t about user stories or screen mockups. It is about models, and modeling models. It is one part of the trifecta of software design, and it is a very important part.

What is a model? When you say model to my son, he thinks Legos. That’s not far from the truth. A model is the whole of the component parts that make up software. In the example in Chapter 1, the model consisted of trucks, boxes, and the relationship between the two.

The further you get into the so-called ‘enterprise’ world (meaning really, really big software), the more a consistent model makes it possible to design intelligent software. This is because a good model brings consistent terminology, an abstraction of ideas, and an understandable message to the party.

Creation of a consistent terminology

Working on a model makes you decide what to call things in your domain. In the example in Chapter 1, Boxes could be called Containers. They aren’t. They are called Boxes.

Purists will say that it isn’t just the act of deciding what to call things that matters, but instead the decisions that are made. I disagree on matters of principle here. The simple act of deciding to call something a box and not a container is important. Discussion on what a container is matters. Later on, might you have a bucket? Should boxes and buckets both be containers?

This conversation is frustrating but important. This is the selection of nouns in your new language, and nouns matter. They participate in a common language for your business, both in everyday discussion and in software development.

Language and software design

They are called programming ‘languages’ for a reason. Spoken languages have nouns and verbs, as do programming languages. The difference is that precision matters a lot more in programming languages, so the ‘dictionary’ of nouns needs to be exactly accurate.

In programming languages, you define your nouns and verbs. Nouns are class instances, and verbs are methods. In a language like Visual Basic, you define the members of the language as you go:

Public Class Box
    Public Destination As String
End Class

The system has a concept of a box, and the box has a concept of a destination. We didn’t call the box a container, we called it a box. The box has a destination. This is where the box is going. We didn’t call it an endpoint. An endpoint might mean something else.

Verbs work the same way. You might be able to route a box, for instance.

Public Class Box
    Public Destination As String

    Public Function Route() As Boolean
        Return True
    End Function
End Class

Now the Box knows a verb – you can route a box. It is an action word – something you can do. We are literally building a new language here – the hip term is a Domain Specific Language, or DSL. It is not a general language. It is specific to our needs.

There is a problem here, though. The Visual Basic code is something that is only used by the developers. Left to their own devices, programmers will come up with the language to describe your business all on their own, and then it won’t be a common language, used by the business and the IT staff.

The key is in a common language

The reason that domain languages are important to modeling is that the software comes out a lot better if both the business users and the software developers use the same words. I probably don’t even have to mention this, because almost all of the fine readers of this book have had that conversation:

“But that process only occurs if the router has that one form.”

“Which form?”

“Oh, I don’t know. That one form with the routing information.”

“Which routing information?”

“I’m not sure. It’s a little different every time.”

A common language mitigates this conversation, because everyone knows to say “destination” and not “that information that is needed for the route.” Things have a name. Those names reduce confusion and lead to better software.

Description of ideas in an abstraction

Creation of a model provides an abstract space to work with the ideas in the system. The architect and the business analyst don’t have to depend on concrete examples with exceptions left and right to design the software; they can work in generalities.

Boxes can be routed, for example. The development team doesn’t have to deal with the fact that Box ABC123 went to Cleveland, and Box DEF456 went to Cincinnati. The exceptions in specific cases are important, and need to be modeled in their own right, but they don’t need to be in the general abstraction that is used to talk about the system at the 1,000-foot view.

Making models understandable

Abstraction is important when talking to users. To define user stories, the business analyst needs to talk to users all along the process tree. Most users don’t have visibility into the whole process. The receiving guy doesn’t know how Box ABC123 got to Cleveland, but he probably understands that the box was routed. Even if he doesn’t, this idea can be explained, because it is understandable.

Development teams get entrenched in detail. This detail is necessary as the software gets developed. With an understandable model, though, there is always somewhere to go when you need to return to ‘home base’ and cover the big picture again.

Images versus text

Usually, architects develop diagrams to show models, like Figure 1-4. This is fine, but it gets to be far too complex after the details are added. Few diagrams can be easily searched, organized or simplified. Since we are looking for an abstraction, simplicity is king.

Figure 1-4: A model diagram

An option for a model is to eschew the boxes and use text. Models developed in text can be more easily simplified, and are much more searchable.

Text has a lot of benefits when you are creating an abstraction. First, the viewer doesn’t get tied up by the lines. Often, relationships between items (represented by the lines) are far too complex to make a readable diagram. If you have a simple list of the items that are related, you can just read them rather than tracing them.

The other benefit of using text for the abstraction is that there are fewer blocks to broadening the abstraction. When you realize that the box actually has a relationship with a product, you can just add it, rather than finding the product box on the other side of the page and finding a place for the line.
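To make the text-over-diagram idea concrete, here is a minimal sketch in C#. The `Relationship` record, the `TextModel` class and the verbs ("contains", "has", "holds") are all made up for illustration; they are not part of any modeling tool. Each relationship is one readable line, adding the product relationship is one more line, and searching the model is a simple filter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical text-style model: one relationship per line, no diagram needed.
public record Relationship(string From, string Verb, string To);

public static class TextModel
{
    public static readonly List<Relationship> Relationships = new List<Relationship>
    {
        new Relationship("Truck", "contains", "Box"),
        new Relationship("Box", "has", "Destination"),
        // Broadening the abstraction later is a one-line change.
        new Relationship("Box", "holds", "Product"),
    };

    // Search the model: every relationship a given entity takes part in.
    public static IEnumerable<Relationship> Involving(string entity) =>
        Relationships.Where(r => r.From == entity || r.To == entity);

    public static void Main()
    {
        foreach (var r in Involving("Box"))
            Console.WriteLine(r.From + " " + r.Verb + " " + r.To);
    }
}
```

Try doing that search against a picture of boxes and lines.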

The conversation between the architect and the developer

The third consideration of creating an abstraction is the conversation amongst the development team members. The problem domain is rarely known when a group of developers get together to solve a particular business problem. Usually a new feature is just that – new – and the terminology being used is foreign to all but the business people directly involved with the process.

Use of a model mitigates this issue by firming up the terminology right from the start. If all of the entities in the business domain are defined and named, the conversation can revolve around the abstraction rather than examples. Even if the business unit itself uses mixed terminology for parts of the business domain, the model will sort out the communication between the architect and the developers.

This conversation is rarely called out, but it is a very important conversation indeed. In small and large projects, the big picture is usually in the architect’s head rather than written down. The architect tries to pass the feature specific information to the developer, shielding them from the big picture. The developer tries to build a focused feature without insight into the overall system. Hilarity ensues.

The construction of a well-understood model provides an abstraction that the architects can use to give the developers context. Context is very important in system development. It doesn’t get in the way of the detailed development work that the programmers are completing, but it does lead to the discovery of contextual errors. Developers can find problems that relate to the system as a whole and apply the fix holistically, rather than just locally in their own feature.

Forming an understandable message

Creating a model allows for communication to be constructed that actually makes sense – especially as it relates to change. In order to communicate to the users and developers about a system, it is necessary to clearly describe the focus points of the system. To clearly describe anything about the system, it is necessary to start with a model.

To some extent, this pales in comparison with the need to provide the next guy with a path. Documentation of systems at the design level is notoriously bad. It is never up to date, practically never complete. It is confusing and passes on no knowledge of the business domain. It is useless. Software modeling – done correctly – mitigates this considerably.

Communicating to the user

The user is the person for whom the software is written. It is important that the development team can communicate on a few different levels. First, the user needs to be able to pass on business functionality to the architect. Second, the development team needs to confirm functionality after consuming it into the rest of the system.

During the initial communication about the system, the first thing that needs to be completed is a software model. The entities in the model will provide a base level of communication about the rest of the software. As functionality is discussed, there will be no need to focus on specifics because the model is specifically understood, not just generally assumed.

As important is the review of software designs. When user stories and mockups are reviewed with the user prior to commencing development, there is an understood terminology behind the discussion.

Communicating to the developer

The conversation between the architect and the developer was discussed above, but it bears mention again in this context. Creation of a context for the developer to create features makes a huge difference.

Context isn’t the only issue, however. Users communicate with developers too - especially in agile environments. Testers communicate with developers. Managers talk to everyone. Wouldn’t it be nice if everyone could just use the same terminology? I certainly think so.

The model brings an accurate terminology to the whole team, especially when communicating with the developer. As with the other examples in this section, the terminology reduces errors and the time it takes to communicate ideas.

Communicating to the future

Probably the largest afterthought in system documentation is the next guy. After all, the development staff isn’t being paid to make version two easier, are they? They are being paid to write version one. Nonetheless, having to edit a system is a necessity. Either the technology will move on and the update wizard will come calling, or the business rules will change.

The best way to improve on systems documentation is to remove the problem of updating it on a regular basis. A quality software model will assist with that because it will update as the code is updated. Working from a model removes some of the need for comprehensive documentation.

The business case

It is tough enough to explain to the CIO why you need to upgrade to TFS 2010. Describing just how important it is to totally change the development methodology of the organization every few years is really a problem.

In Software Language Engineering, Anneke Kleppe points out that we need to upgrade our development methodologies not because it’s the shiny new thing, but because we are doing more with less. Dijkstra (1972) called it the Software Crisis, and it is getting worse.

Taming complexity

Year after year, software developers are asked to tackle more complexity. Kleppe describes an environment where it used to be enough to have ‘Hello World’ show up on the screen. With the advent of the GUI, consider all of the technologies that have to be mastered to get it on the screen - CSS, windowing, threading perhaps, markup, the list goes on.

What’s more, the additional computing power has led to additional expectations of users. Now you are expected to deliver identity and membership, with ‘Hello Mr. Sempf’. Or to deliver even more, localization and personalization, with ‘Good evening Mr. Sempf.’

Modeling software tames complexity. Part of the growing complexity of software is that no one person knows the whole system. It is a common story: features need to be added, but the UI guy is on vacation and now there are 24 business rules in the RDBMS.

Accurate models make for understandable software. While nothing makes software easy, not having an accurate model will certainly make it harder to understand.

Enhancing communication

From the project management perspective, nothing makes life better than communication. Whether agile or waterfall, trust and transparency are key. Both of these characteristics require communication. Communication is hard when half of the people at the table call a container of items a ‘box’ and half of them call it a ‘carton.’

Modeling software enhances communication. Even if all that is done is the simple act of validating the common language, it will make talking about the project a lot easier. I hope that you would take it even further than that.

As a project grows, features are added. Building a system of language ensures that the concepts in the application are referred to the same way over time. This is essential to good communication, and ease of updating.

With real software modeling, and a metadata implementation, the model follows the software. This means that throughout the lifecycle, the semantics are an intrinsic part of development through the domain specific languages provided in the model.

The Case For Modeling Part 1

The regular readers of this blog know that my database modeling book was cancelled when Microsoft pulled the rug out from under M and the repository. I did a lot of good writing about modeling in general, so I wanted to put some of it up here on the blog, since the cancellation is official now. This is the first part of Chapter 1.

While I don’t want to spend a lot of time on a history lesson, some background information is necessary to make sure we are all on the same page. If you, the reader, have spent a lot of time watching the world of modeling pass us all by, then you might be able to safely skip the rest of this chapter.

What led to this

The rest of you: stick around. It’s a story of wonder and woe. You’ll laugh, you’ll cry. It’s better than Cats. You’ll read it again and again.

In all seriousness, the path that led to Microsoft’s latest software modeling effort is very interesting. It cuts to the core of business computer programming - how do I model my business process in code?

It is a problem that developers have faced since we were punching tape. In the end, it is all zeros and ones, so how do we make tools that make it easier to describe moving a box from point A to point B? That’s the heart of software modeling.

To break it down, let’s start with a description of the actual problem we face - moving from the model to the code. Then we will inventory some of the tools that have been tried over the years - many of which have led to SQL Modeling. Finally, we’ll look over CASE and the 800 pound gorilla: UML.

A description of the problem

Imagine you are the new CIO of a distribution company. Your company, let’s call it M Logistics, has no software. Yes, that’s what I said. They have no software at all. They use this weird stuff made out of ground up trees. They call it paper. You call it inefficient.

You are going to need to get some software in there, and you are going to need to do it fast. There are no off-the-shelf solutions available (more on that later) so you are going to have to build it.

Get out the whiteboard. Starting with a design is the best way, most will admit. There are questions that need to be asked.

What goes into a distribution company, information-wise? The truck shows up with boxes. You unload the boxes, sort them in different ways perhaps. Then you put them in different trucks and send them off to a store.

“Fair enough,” you might think. “We will start with a truck containing boxes.” The drawing probably looks like Figure 1-1.

Figure 1-1: The whiteboard - truck contains boxes

Great! Good start. Over the next few hours you map out the rest of the relationships between objects in your scope - or your domain model - and the whiteboard will be filled with goodness.

Oh, but wait. You can’t ship the whiteboard to the central office, so you will probably need to make a document. A word processor for the text will solve that problem, and some diagramming software for the model, like Figure 1-2.

Figure 1-2: The document - truck contains boxes

Nice work! Now you can package it up and send it to Central Office. All we have to do now is wait for approval.

Moving to code

The day has come and approval from Central Office has arrived (along with your new business cards, sweet!). It is time to start coding. Fire up Visual Studio, reference the carefully reviewed document, and start sketching out the domain model like the following listing.

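using System.Collections.Generic;

namespace Warehouse
{
    public class Truck
    {
        // A truck contains many boxes, so the property is a collection.
        public List<Box> Boxes { get; set; }
    }
    public class Box
    {
    }
}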

That’s a good start. We can work on the rest later. Now we all know that the C# class isn’t everything; we need a database to back it up. It is time to fire up the RDBMS of choice and model up a database. Depending on your tool, it should look like Figure 1-3.

Figure 1-3: The database - trucks contain boxes

We are certainly getting somewhere. We have a domain model. We have a database. We … need some way to hook them together. Alright, enter your ORM of choice, or maybe roll up your sleeves and roll your own.

Eventually the decision is to use Entity Framework. Fine - still you have quite a few lines of code, mapping the database to the domain model:

    public partial class Truck : EntityObject
    {
        public static Truck CreateTruck(global::System.Int32 id)
        {
            Truck truck = new Truck();
            truck.Id = id;
            return truck;
        }

        [EdmScalarPropertyAttribute(EntityKeyProperty=true, IsNullable=false)]
        [DataMemberAttribute()]
        public global::System.Int32 Id
        {
            get
            {
                return _Id;
            }
            set
            {
                if (_Id != value)
                {
                    OnIdChanging(value);
                    ReportPropertyChanging("Id");
                    _Id = StructuralObject.SetValidValue(value);
                    ReportPropertyChanged("Id");
                    OnIdChanged();
                }
            }
        }
        private global::System.Int32 _Id;
        partial void OnIdChanging(global::System.Int32 value);
        partial void OnIdChanged();
    
        [EdmRelationshipNavigationPropertyAttribute("Model1", "TruckBox", "Box")]
        public EntityCollection<Box> Boxes
        {
            get
            {
                return ((IEntityWithRelationships)this).RelationshipManager
                    .GetRelatedCollection<Box>("Model1.TruckBox", "Box");
            }
            set
            {
                if ((value != null))
                {
                    ((IEntityWithRelationships)this).RelationshipManager
                        .InitializeRelatedCollection<Box>("Model1.TruckBox", "Box", value);
                }
            }
        }
    }

And now, finally, we can get around to actually writing the software. That is a lot of steps. Something is rotten in Denmark, methinks.

The gap is wider than you think

There are two things that are clearly wrong here.

First, you had to concern yourself with object/relational mapping. Aren’t we past that? Isn’t the concept well enough understood that we don’t need to put the code in books anymore?

Second, you touched the basic model five times. That’s far too many. Why can’t we use the model we start with as the domain model, the database and the ORM?
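To show what “one model, many outputs” could look like, here is a minimal sketch in C#. It is not how SQL Modeling or any ORM actually works; the `Column` and `Entity` records and the `OneModel` class are hypothetical names I made up for illustration. The point is only that a single description of Box can be projected into both a table and a class, so the model is touched once.

```csharp
using System;
using System.Linq;

// Hypothetical single source of truth for one entity.
public record Column(string Name, string SqlType, string ClrType);
public record Entity(string Name, Column[] Columns);

public static class OneModel
{
    public static readonly Entity Box = new Entity("Box", new[]
    {
        new Column("Id", "INT", "int"),
        new Column("TruckId", "INT", "int"),
    });

    // Project the model into a table definition.
    public static string ToSql(Entity entity)
    {
        var columns = entity.Columns.Select(c => c.Name + " " + c.SqlType + " NOT NULL");
        return "CREATE TABLE " + entity.Name + " (" + string.Join(", ", columns) + ");";
    }

    // Project the same model into a domain class.
    public static string ToClass(Entity entity)
    {
        var properties = entity.Columns.Select(
            c => "public " + c.ClrType + " " + c.Name + " { get; set; }");
        return "public class " + entity.Name + " { " + string.Join(" ", properties) + " }";
    }

    public static void Main()
    {
        Console.WriteLine(ToSql(Box));
        Console.WriteLine(ToClass(Box));
    }
}
```

A real tool has to handle keys, relationships and types far more carefully than this, which is exactly why the problem keeps getting solved over and over.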

Mapping code to models

The first of these problems, object / relational mapping or O/RM, has been solved many times. We saw above how it is a problem that has to be solved every single time we build a data driven piece of software.

The solutions range from open source products with a wide community following to expensive, rarely used commercial products that solve very specific issues.

The core problem is that gap

The reason that there are so many different solutions is the range of that gap between the objects and the database. The example at M Logistics is a fairly straightforward one, but there are numerous more sophisticated examples.

  • In a many to many situation, should the ORM model the joining table? What if it has data associated with it?
  • What happens if there are several different databases?
  • Also, sometimes some of the data silos in an entity model aren’t databases at all.
  • We haven’t even talked about the pattern of the domain model. Repository? Lazy loading?
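The first question in the list is easier to see in code. Below is a minimal sketch, assuming a hypothetical many-to-many relationship between boxes and products where the joining table carries a quantity. `Product`, `BoxProduct` and `TotalItems` are illustrative names, not part of the M Logistics listing above. Once the join has data of its own, it pretty much has to become a class in the domain model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string Name { get; set; } = "";
}

// The joining table as a first-class entity, because it carries its own data.
public class BoxProduct
{
    public Product Product { get; set; } = new Product();
    public int Quantity { get; set; }
}

public class Box
{
    // Each link between a box and a product goes through BoxProduct.
    public List<BoxProduct> Contents { get; } = new List<BoxProduct>();

    public int TotalItems() => Contents.Sum(link => link.Quantity);
}

public static class JoinSketch
{
    public static void Main()
    {
        var box = new Box();
        box.Contents.Add(new BoxProduct { Product = new Product { Name = "Widget" }, Quantity = 12 });
        box.Contents.Add(new BoxProduct { Product = new Product { Name = "Gadget" }, Quantity = 3 });
        Console.WriteLine(box.TotalItems()); // prints 15
    }
}
```

If the join had no Quantity, a mapper could hide it and expose a plain list of products instead. Different tools make that call differently, which is part of the gap.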

The more issues in an O/RM, the more sophisticated the software doing the mapping must be. The more sophisticated the software is, the fewer people will want to use it. This leads to fragmentation in the software market. It also leads to confused users.

Various solutions

There are all kinds of object / relational mappers out there. Eric Nelson has a great inventory in his Entity Framework talk, which points out the following items:

  • NHibernate - a rewrite of Hibernate, which is originally an open source Java tool. Well used and mature.
  • EntitySpaces - a commercial tool that moves a lot of SQL functionality into the C# code.
  • LLBLGenPro - another commercial tool that generates code for you to do the mapping.
  • DevForce - a newer O/RM mostly for the web space.
  • XPO - a DevExpress product, eXpress Persistent Objects. It is a complete abstraction layer.
  • Lightspeed - a Mindscape product that has a stunning visual interface.
  • Open Access - the Telerik entry that depends on convention to avoid reflection.

This list is just in the Microsoft development space, too. There are lots of O/RMs for Java and other platforms. NHibernate, in fact, is a port of Hibernate, the well-known Java O/RM.

Microsoft’s tries

One would imagine that, since all of these products are developed to fix a weakness in the Microsoft development space, eventually Microsoft would step in and fix it themselves. Well, they have tried. In fact, they have been trying all along, but let’s just start with the .NET era.

Typed datasets were the first .NET implementation of an ORM back in 2002, and they are still pretty cool for small projects. They effectively represented an in-memory copy of the data that would be committed to the RDB on demand. They became unwieldy for large projects, though.

ObjectSpaces was going to be the ‘next big thing’ as part of .NET 1.1 and 2.0 but never saw the light of day. I am not sure what happened there. Maybe we can get the scoop sometime.

The Microsoft Business Framework was, I think, part of what would be the Dynamics group and grew out of the SharePoint models. I think. Anyway, it doesn’t matter because it didn’t ship either.

Then there was WinFS. If you haven’t heard about that I’ll tell you over a beer sometime. Never shipped.

Finally, in 2007, lambda expressions were added to C# and Visual Basic and LINQ was introduced. LINQ did actually ship, and it is currently the best way to query a model of a database in your code for the whole Microsoft stack. This book isn’t about LINQ, but if you don’t know it you should learn it. It is a very powerful technology.

The nice thing about LINQ is that it is datasource agnostic – exactly what we needed. It can talk directly to a SQL Server database, yes, but it can also talk to another object model (like a normalized Data Access Layer) or an Entity Model built in EDMX.
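Here is a minimal sketch of what datasource agnostic means in practice, using LINQ to Objects over an in-memory list. The `Box` class and the city names are just sample data (Dayton is made up for the example). The query is written against `IEnumerable<Box>`, so it neither knows nor cares where the boxes came from.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Box
{
    public string Destination { get; set; } = "";
}

public static class LinqSketch
{
    // Works on any IEnumerable<Box>: a list, an array, or rows already loaded from a database.
    public static IEnumerable<string> DestinationsStartingWith(IEnumerable<Box> boxes, string prefix) =>
        from b in boxes
        where b.Destination.StartsWith(prefix, StringComparison.Ordinal)
        orderby b.Destination
        select b.Destination;

    public static void Main()
    {
        var boxes = new List<Box>
        {
            new Box { Destination = "Cleveland" },
            new Box { Destination = "Dayton" },
            new Box { Destination = "Cincinnati" },
        };
        Console.WriteLine(string.Join(", ", DestinationsStartingWith(boxes, "C")));
        // prints: Cincinnati, Cleveland
    }
}
```

Against a database-backed provider the same query shape is typically written over `IQueryable<Box>` so that the provider can translate it to SQL; that part is not shown here.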

So that is where we leave the history lesson on O/RM. The next significant move by Microsoft is SQL Modeling, which is the topic of the rest of the book. First, though, let’s talk about the second topic – modeling the software.

Modeling with CASE and the gorilla in the room

A lot of people have tried to solve the modeling problem too. Remember how the model got redone five times? Most people agree that that needs to get better.

The Unified Modeling Language

The 800 pound gorilla in the room is the UML. Three people who have forgotten more about software design than I will ever know (Grady Booch, James Rumbaugh and Ivar Jacobson, the ‘Three Amigos’) got together one day. They left their egos at the door, took their separate modeling methods, and built this system for diagramming software.

This system is the Unified Modeling Language or UML, and it is the most complete, most comprehensive possible way to describe how a piece of software works. I can’t even do it justice here, so I will just direct you to Martin Fowler’s great books on the subject – especially UML Distilled. I don’t leave home without it.

But the UML has two problems. First, it isn’t code. Since it is so very comprehensive, it can be converted to code, but you have to completely fill out the model, which usually takes longer than just writing the code! (As an aside, I often write the domain model in the language of choice, and then convert it to the UML.)

The other problem with the UML is that it doesn’t inherently have any knowledge of the database. The project still has the O/RM problem from the last section. Though the UML is expressive enough to infer a schema, it is sometimes too expressive, and the schema is only as good as the tools.

Many tools have attempted to solve this, from the simple Microsoft Visio to the mighty IBM Rational Rose. None of them do it well, because that is not really what the UML is designed for – it is not an O/RM. You are forced to write yet another model, like the venerable Object Role Model, or even just an Entity Model, and then manually do the mapping.

There is clearly room for a simpler yet better modeling pattern here. We may have created a system in the UML that has become too unwieldy to use. Yet just writing on whiteboards prevents remote team access, and is tough to distribute for review.

<whatever> Driven Design

A current trend is to use part of the architectural patterns as the design medium. For instance, Test Driven Design, or TDD, is basically the use of unit tests to describe expected outcomes of user stories: write the tests, code until they pass, and then refactor.

TDD is a great pattern, and I use it a lot. However, it is tough to review, and nigh impossible to generate documentation from for a formal document. It almost has to be used in conjunction with something else.
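For anyone who hasn’t seen it, here is a minimal sketch of the TDD rhythm in C#, without a test framework so it stands alone. The user stories are hypothetical (“a box without a destination cannot be routed”, “a box with a destination can be routed”), and the `Box.Route` rule is just the simplest code that makes the tests pass. In real life you would use a unit test framework instead of hand-rolled checks.

```csharp
using System;

public class Box
{
    public string Destination { get; set; } = "";

    // Simplest implementation that satisfies the tests below; refactor afterwards.
    public bool Route() => !string.IsNullOrWhiteSpace(Destination);
}

public static class RoutingTests
{
    // Story: "A box without a destination cannot be routed."
    public static void BoxWithoutDestinationIsNotRouted()
    {
        var box = new Box();
        if (box.Route()) throw new Exception("Expected routing to fail without a destination.");
    }

    // Story: "A box with a destination can be routed."
    public static void BoxWithDestinationIsRouted()
    {
        var box = new Box { Destination = "Cleveland" };
        if (!box.Route()) throw new Exception("Expected routing to succeed with a destination.");
    }

    public static void Main()
    {
        BoxWithoutDestinationIsNotRouted();
        BoxWithDestinationIsRouted();
        Console.WriteLine("All tests passed.");
    }
}
```

The tests say a lot about what routing means, but notice that nobody at Central Office is going to review them.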

Another alternative is Design by Contract, sometimes called Contract Driven Design or CDD. It works well in large teams, can be used with TDD, and can generate the UML if you need it to. It has a steep learning curve though, and is often overkill for small projects.
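A minimal sketch of the contract idea, written with plain guard clauses rather than any particular contracts library. The `Routed` flag and the specific conditions are assumptions for illustration. The method states what must be true before it runs (the precondition) and what it promises afterwards (the postcondition).

```csharp
using System;

public class Box
{
    public string Destination { get; set; } = "";
    public bool Routed { get; private set; }

    public void Route()
    {
        // Precondition: the caller must supply a destination first.
        if (string.IsNullOrWhiteSpace(Destination))
            throw new InvalidOperationException("Precondition failed: Destination is required.");

        Routed = true;

        // Postcondition: after routing, the box is marked as routed.
        if (!Routed)
            throw new InvalidOperationException("Postcondition failed: box not marked as routed.");
    }
}

public static class ContractSketch
{
    public static void Main()
    {
        var box = new Box { Destination = "Cincinnati" };
        box.Route();
        Console.WriteLine(box.Routed); // prints True

        try
        {
            new Box().Route(); // violates the precondition
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

Dedicated contract tooling expresses the same thing declaratively and can check it statically, which is where the learning curve comes in.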

There are a host of others; you probably already have your favorite. Few of them do the O/RM well, and most of them are too tough to use for 10,000-foot-view types of modeling.

Especially for forms-over-data types of applications (which constitute well over half of the projects we all develop), there needs to be a new solution. Somewhere there is a solution that is easy to learn, distribute, and review, and that translates well to code and the database. SQL Modeling is that solution.

Bill Sempf

Husband. Father. Pentester. Secure software composer. Brewer. Lockpicker. Ninja. Insurrectionist. Lumberjack. All words that have been used to describe me recently. I help people write more secure software.
