Passive Data vs “Active” Data

21 December 2004

In recent SOA-related posts I’ve alluded to this, but I wanted to call it out directly. SOA is founded on the principle of passive data. Dima Maltsev called this out in a recent comment to a previous blog entry:

“In one of the posts Rockford is saying that SOA doesn't address the need of moving data and also the logic associated with that data. I think this is one of the main ideas of SOA and I would rather state it positively: SOA wants you to separate the logic and data.”

Yes, absolutely. SOA is exactly like structured or procedural programming in this regard. All those concepts we studied way back in the COBOL/FORTRAN days while drawing flowcharts and data flow diagrams are exactly what SOA is all about.

Over the past 20-30 years two basic schools of thought have evolved. One says that data is a passive entity, the other attempts to embed data into active entities (often called objects or components).

Every now and then someone tries to merge the two concepts, providing a scheme by which sometimes data is passive, while other times it is contained within active entities. Fowler’s Data Transfer Objects (DTO) design pattern and my CSLA .NET are examples of recent efforts in this space, though we are hardly alone.

But most people stick with one model or the other. Mixing the models is complex and typically requires extra coding. Thus, most software is created using a passive data model, or an active entity model alone. And the reality is that the vast majority of software uses a passive data model.

In my speaking I often draw a distinction between data-centric and object-oriented design. Data-centric design is merely a variation on procedural programming, with the addition of a formalized data container of some sort. In some cases this is as basic as a 2-dimensional array, but in most cases it is a RecordSet, ResultSet, DataSet or some variation on that theme. In .NET it is a DataTable (or a collection of DataTables in a DataSet).

The data-centric model is one where the application goes to the database and retrieves data.
The data in the database is passive, and when the application gets it, the data is in a container – say a DataTable. This container is passive as well.

I hear you arguing already. “Both the database and DataTable make sure certain columns are numeric!”, or “They both make sure the primary key is unique!”.

Sure, that is true. Over the past decade some tiny amount of intelligence has crept into our data containers, but nothing really interesting. Nothing that makes sure the number in the column is a good number – that it is in the right range, or that it was calculated with the right formula.

Anything along the lines of validation, calculation or manipulation of data occurs outside the data container. That outside entity is the actor; the data container is merely a vessel for passive data.

And that’s OK. That works. Most software is written this way, with the business logic in the UI or a function library (or maybe a rules engine), acting against the data.

The problem is that most people don’t recognize this as procedural programming. Since the DataSet is an object, and your UI form is an object, the assumption is that we’re object-oriented. Thus we don’t rigorously apply the lessons learned back in the FORTRAN days about how to organize our code into reusable procedures and organize those procedures into function libraries. Instead we plop the code into the UI behind button clicks and key press events. Any procedural organization is a token effort, unorganized and informal.

Which is why I favor an active entity approach – in the form of object-oriented business entity objects.

In an active entity model, data is never left out “on its own” in a passive state. Well, except for when it is in the database, because we’re stuck using relational databases and the passive data concept is so deeply embedded in that technology it is hopeless… But once the data comes out of the database it is in an active entity.
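
To make the contrast concrete, here is a minimal sketch of the two models side by side. It is written in Python rather than .NET purely for brevity, and every name in it (OrderRow, validate_order_row, Order) and every rule (quantity between 1 and 100, positive price) is invented for illustration, not taken from CSLA .NET or any real application.

```python
from dataclasses import dataclass


# Passive data: a container that knows a column is numeric, and nothing more.
@dataclass
class OrderRow:
    """Hypothetical stand-in for a row in a DataTable."""
    quantity: int
    unit_price: float

    def __post_init__(self) -> None:
        # About as much "intelligence" as a typed column: numeric or not.
        if not isinstance(self.quantity, int):
            raise TypeError("quantity must be an integer")


def validate_order_row(row: OrderRow) -> list:
    """External procedure holding the business rules (the 'actor')."""
    errors = []
    if not 1 <= row.quantity <= 100:
        errors.append("quantity must be between 1 and 100")
    if row.unit_price <= 0:
        errors.append("unit_price must be positive")
    return errors


# Active entity: the same rules travel with the data and cannot be skipped.
class Order:
    def __init__(self, quantity: int, unit_price: float) -> None:
        self.quantity = quantity      # both assignments go through the setters
        self.unit_price = unit_price

    @property
    def quantity(self) -> int:
        return self._quantity

    @quantity.setter
    def quantity(self, value: int) -> None:
        if not 1 <= value <= 100:
            raise ValueError("quantity must be between 1 and 100")
        self._quantity = value

    @property
    def unit_price(self) -> float:
        return self._unit_price

    @unit_price.setter
    def unit_price(self, value: float) -> None:
        if value <= 0:
            raise ValueError("unit_price must be positive")
        self._unit_price = value

    @property
    def total(self) -> float:
        # Calculation lives with the data too.
        return self._quantity * self._unit_price
```

With the passive container, OrderRow(quantity=-5, unit_price=9.99) is accepted without complaint, because -5 is a perfectly good integer; the rules only apply if some caller remembers to run validate_order_row. With the active entity, Order(quantity=-5, unit_price=9.99) raises a ValueError before the bad value can be stored.
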
Again, this is typically an entity object, designed using object-oriented concepts. The primary point here is that the data is never exposed in a raw form. It is never passive. Any external entity using the data can count on the data being validated, calculated and manipulated based on a consistent set of rules that are included with the data.

In this model we avoid putting the logic in the UI. We avoid the need to create procedures and organize them into function libraries like in the days of COBOL. Instead, the logic is part and parcel of the data. They are one.

Most people don’t take this approach. Historically it has required more coding and more effort. With .NET 1.x some of the overhead is gone, since basic data binding to objects is possible. However, there’s still the need to map the objects to/from the database and that is certainly extra effort. Also, the data binding isn’t on a par with that available for the DataSet.

In .NET 2.0 the data binding of objects will be on a par with (or better than) binding with a DataSet, so that end of things is improving nicely. The issue of mapping data to/from the database remains, and it appears that it will continue to be an issue for some time to come.

In any case, along comes SOA. SOA is all about active entities sending messages to each other. When phrased like that it sounds almost object-oriented, but don’t be fooled. The active entities are procedures. Each one is stateless, and each one is defined by a formal contract that specifies the name of the procedure, the parameters it accepts and the results it returns.

Some people will argue that they aren’t stateless. Indigo, for instance, will allow stateful entities just like DCOM and Remoting do today. But we all know (after nearly 10 years’ experience with DCOM) that stateful entities don’t scale and don’t lead to reliable systems. So if you really want to have an unscalable and unreliable solution then go ahead and use stateful designs.
I’ll be over here in stateless land where things actually work. :-)

The point being, these service-entities are not objects, they are procedures. They accept messages. Message is just another word for data, so they are procedures that accept data. The data is described by a schema – often an XSD. This schema information has about the same level of “logic” as a database or DataSet – which is to say it is virtually useless. It can make sure a value is numeric, but it can’t make sure the number is any good.

So Dima is absolutely correct. SOA is all about separating the data (messages) from the logic (procedures aka services).

Is this a good thing? Well that’s a value judgement. Did you like how procedural programming worked the first time around? Do you like how data-centric (aka procedural) programming works in VB or PowerBuilder? If so, then you’ll like SOA – at least conceptually.

Or do you like how object-oriented programming works? Do you appreciate the consistency and centralized nature of having active entities that wrap your data at all times? Do you feel that the extra effort of doing OO is worth it to gain the benefits? If so, then you’ll probably balk at SOA.

Note that I say that people happy with procedural programming will conceptually like SOA. That’s because there’s always the distributed parallel thing to worry about.

SOA is inherently a distributed design approach. Yes, you might configure your service-entities to all run on the same machine, or even in the same process. But the point of SOA is that the services are location-transparent. You don’t know that they are local, and at some point in the future they might not be. Thus you must design under the assumption that each call to a service is bouncing off two satellites as it goes half-way around the planet and back.

SOA (as described by Pat Helland at least) is also inherently parallel. Calls to services are asynchronous.
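
As a rough illustration of an asynchronous service call, and of how a proxy can hide it, here is a minimal sketch in Python (chosen only for brevity). Everything named here is hypothetical: price_quote_service, PriceQuoteProxy and the message fields are invented, and the thread pool merely stands in for whatever transport would really carry the message.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def price_quote_service(message: dict) -> dict:
    """Hypothetical stateless service: a procedure that takes a message
    (passive data) and returns another message."""
    return {"total": message["quantity"] * message["unit_price"]}


class PriceQuoteProxy:
    """Hypothetical client-side proxy. The caller never sees the transport;
    a thread pool is used here only as a stand-in for one."""

    def __init__(self) -> None:
        self._transport = ThreadPoolExecutor(max_workers=1)

    def send(self, message: dict) -> Future:
        # Asynchronous: returns at once with a handle for the eventual reply.
        # The message is copied because only data crosses the boundary.
        return self._transport.submit(price_quote_service, dict(message))

    def call(self, message: dict, timeout: float = 5.0) -> dict:
        # Looks synchronous to the caller, but only blocks on the async reply.
        return self.send(message).result(timeout=timeout)

    def close(self) -> None:
        self._transport.shutdown()
```

A caller that uses call() cannot tell that anything asynchronous happened, while a caller that uses send() gets a Future back and has to deal with the parallelism directly.
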
Your proxy object might simulate a synchronous call, so you don’t even realize it was async. But a core idea behind SOA is that of transport-transparency. You don’t know how your message is delivered. Is it via web services? Is it via a queue? Is it via email? You don’t know. You aren’t supposed to care.

But even procedural aficionados may not be too keen to program in a distributed parallel environment. The distributed part is complex and can be slow. The parallel part is complex and, well, very complex.

My prediction? Tools like Indigo will avoid the whole parallel part and it will never come to pass. The idea of transport-transparency (or protocol-transparency) will go the way of the dodo before SOA ever becomes truly popular.

The distributed part of the equation can’t really go away though, so we’re kind of stuck with that. But we’ve dealt with that for nearly 10 years now with DCOM, so it isn’t a big thing.

In the end, my prediction is that SOA will fade away as people realize that we’ll be using it to do exactly what we did with DCOM, RMI, IIOP and various other RPC technologies. The really cool ideas – the ones with the power to be emergent – are already fading away. They aren’t part of Indigo (at least not at the forefront), and they’ll just slip away like a bright and shining dream and we’ll be left with a cool new way to do RPC.

And RPC is fundamentally a passive data technology. Active entities (procedures) send passive data between them – often in the form of arrays or more sophisticated containers like a DataSet or perhaps an XML document with an XSD. It is all the same stuff with tiny variations. Like ice cream – chocolate, vanilla or strawberry, it is all cold and fattening.

So welcome to the new world, same as the old world.