Wednesday, January 25, 2012

Data Management for the Rest of Us

The Three W’s of Data Management

As companies increasingly treat data as a valuable asset, Data Management practices have become more visible in the enterprise. Having specialized in integration for a long time, I have developed a very integration-specific view of the information technology world. Over the years, I would observe data management activities at various companies and think, with relative indifference: what is so special about data management? You have a large, complex database, but nothing interesting happens until the data actually moves some place. To my thinking, data is simply one of the raw materials used in the process of making the business work. It’s the applications, fueled by data moving through them, that implement the business processes. And ultimately, IT serves no purpose other than to automate business processes.

In my most recent client engagement, I was hired as a data integration expert, but because of staff changes, I soon moved into a more general data management and architecture position. I consulted with various technology groups within the company to counsel on topics like data integration, operational data stores, data warehouses, metadata management, and other data management disciplines. Our principles were guided by the Data Management Association (www.DAMA.org), and were also influenced greatly by Bill Inmon’s Corporate Information Factory strategy for Data Warehousing (www.inmoncif.com).

If you’re a data integration specialist like me, finding yourself amidst data management people is a new and unsettling experience. But over the last several months, having sifted through many concepts across the data management disciplines, I have been able to organize the core ideas into distinct categories, and at the same time identify the position of integration technologies within the data management landscape.

The Three W’s of Data Management

Data Management concepts can be organized by asking ourselves three basic questions about data in our enterprise. The questions posed are the What, Where, and the How of Data Management. What is Our Data? Where is Our Data Stored? How Does Data Move Through the Enterprise? Posing these questions and contemplating the answers leads to an understanding of the major categories concerning the Data Management profession. In addition, certain disciplines cut across these categories. As you answer the What, Where, and How of data, you must also consider the required degrees of security, data quality, governance, and metadata management, commensurate with business needs.

What is Our Data?

What is our business data? We need to establish commonly understood terms and definitions, and their relationships. This information should be centralized into a common business glossary, readily accessible to both business people and technologists. The glossary becomes the common vocabulary by which accurate communications take place across the organization. When the business requests new features in a software system, the common vocabulary is used to convey the desired features. Business and IT need a shared understanding of the terms in use, and the glossary provides this.

Beyond the glossary is a much more in-depth and formal study of the information in the enterprise. Information needs to be modeled using standard technologies that have evolved for this purpose. This typically leads to entity-relationship models constructed using tools such as ER Studio or ERwin. A company will typically take a top-down approach, first identifying subject areas for the business information. These are the core information categories which are the subjects of business processing. In banking, for example, subject areas would include Customer, Account, and Loan. Data Modelers then dive into the details, and proceed with building formal ER models of the information. Modeling starts at the conceptual level by defining entities and their relationships. From there, detail is added to arrive at logical models, which include attributes for the entities. Lastly, physical modeling takes the logical model to create an actual implementation on a database like Oracle, SQL Server, or MySQL. In the long-lived enterprise, legacy applications and databases typically predate these modeling efforts, so the newer models are unfortunately more a reflection of how we want the data to be defined and organized than of how it actually is defined and organized. Rationalizing the existing systems against the desired models is usually a long-term task that is seemingly never finished.
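To make that last step concrete, here is a minimal sketch of how the banking entities mentioned above might land in a physical model. The table and column names are hypothetical, and SQLite stands in for an enterprise database purely for illustration:

```python
import sqlite3

# In-memory database standing in for the physical implementation.
conn = sqlite3.connect(":memory:")

# Physical model: the logical Customer and Account entities become tables,
# and their one-to-many relationship becomes a foreign key.
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        balance     REAL NOT NULL DEFAULT 0
    );
""")

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Alice')")
conn.execute(
    "INSERT INTO account (account_id, customer_id, balance) VALUES (10, 1, 250.0)"
)

# The relationship from the conceptual model becomes a join in the physical one.
row = conn.execute("""
    SELECT c.name, a.balance
    FROM customer c JOIN account a ON a.customer_id = c.customer_id
""").fetchone()
print(row)  # ('Alice', 250.0)
```

The point is not the SQL itself, but that each physical construct traces back to an element of the logical and conceptual models.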

Data Modelers, Data Architects, and Business Analysts play key roles in the “what” of data management.

Where is Our Data Stored?

After understanding what our data is, we need to know where the data is stored across the enterprise. The “where” of data management demands that we understand all the data stores used throughout the enterprise. These technologies span relational database management systems, document repositories, email systems, and even basic file systems. The front-line of systems process the core business transactions that essentially run the business. Data usually proceeds from there into centralized repositories, warehouses, and data marts where it is used to manage and plan the business. These repositories can be internal, hosted elsewhere, or even cloud-hosted. It’s important to recognize that data has its own lifecycle. For any given type of information, we need to identify which systems create it, and which can update, read, and delete it. Data in these repositories is subject to certain management activities like access control, encryption, replication, failover, backup, and archival.
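One lightweight way to capture the lifecycle question above, namely which systems may create, read, update, or delete a given type of information, is a simple CRUD matrix. The sketch below uses hypothetical system and entity names, not any real inventory:

```python
# CRUD matrix: for each entity, which systems may perform which operations.
# System and entity names here are illustrative only.
crud_matrix = {
    "Customer": {
        "crm":       {"create", "read", "update"},
        "billing":   {"read"},
        "warehouse": {"read"},
        "archival":  {"read", "delete"},
    },
}

def allowed(entity: str, system: str, operation: str) -> bool:
    """Check whether a system is permitted to perform an operation on an entity."""
    return operation in crud_matrix.get(entity, {}).get(system, set())

print(allowed("Customer", "crm", "create"))      # True
print(allowed("Customer", "billing", "update"))  # False
```

Even at this toy scale, the matrix makes gaps visible: if no system is allowed to delete an entity, archival and retention obligations are probably unaddressed.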

Enterprise architects, data architects and database administrators play key roles in the “where” of data management. Enterprise and data architects map out the landscape of systems and applications within the enterprise to help align with business goals. Database administrators are the custodians of data as it sits in a repository, responsible for applying the proper management principles to guarantee performance, security, availability, and preservation.

How Does Data Move Through the Enterprise?

The "what" and "where" of data management largely deal with "data at rest" in transactional data stores, operational data stores, warehouses, content management systems, and data marts. Data's alter ego in the enterprise is "data in motion", which shows us how data moves through the enterprise. We need to understand the paths taken by data as it flows through the various systems, ultimately leading to products being shipped, payments being received and booked into financial systems, and management reports being generated. Integrating systems is a practice as old as companies having multiple systems. Systems interact using a wide array of technologies, including messaging, services, file transfer, and shared databases, to name just a few. From a data management perspective, knowing the flow of data through the systems is critical to understanding the state of the data and the risks it is subjected to. This study also unlocks the potential of data, pointing to new uses in support of the business.
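The shape data takes as it moves between systems can be sketched in a few lines. The record layout and field names below are hypothetical, showing one common pattern: a flat file-transfer record parsed into a canonical structure and re-emitted as a message:

```python
import json

# A record as it might leave a hypothetical order-entry system:
# a flat, delimited line, typical of file-transfer integration.
flat_record = "1001|Alice|250.00"

# Transformation step: parse the flat record into a canonical structure.
order_id, customer, amount = flat_record.split("|")
canonical = {
    "order_id": int(order_id),
    "customer": customer,
    "amount": float(amount),
}

# The same data as it might enter a downstream system: a JSON message,
# typical of messaging- or service-based integration.
message = json.dumps(canonical)
print(message)
```

Every such hop is a place where meaning can drift, which is why documenting the transformations, not just the endpoints, matters to data management.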


Understanding the data movement in the enterprise leads to a broader understanding of the data management world with respect to the different states of data. Data management activities like modeling, glossary development, and operational management have historically focused on data at rest, and have had little influence on the world of data in motion. Integration specialists along with a mature set of software tools have been successfully managing data in motion for over a decade. Yet the two worlds are still quite different, and much work is yet to be done to bring the formalisms from traditional Data Management disciplines for data at rest over to the world of data in motion. A comprehensive Data Management policy must unify data at rest with data in motion.

Managing Metadata

At the heart of a Data Management program is managing the byproducts of the activities described here. Metadata is information about other data. For example, data residing in a database is described by the tables and columns which define its layout. In fact, most of the information produced from activities I’ve described in this paper is metadata. A logical model, physical model, the business glossary, and XML Schema definitions describing the format of data in motion are all examples of key metadata in the enterprise. It is Metadata Management that ties it all together, providing a unified view of how data works across the enterprise. Metadata is stored and related in a Metadata Repository (MDR). The ability of an MDR to store and relate all types of models is critical to the success of metadata management, and the overall data management program itself.

The critical functions provided by the MDR include lineage tracing, impact analysis, business glossary management, and facilitating reuse. When a user views a business report and wonders what a particular data field represents, the glossary defines the term for her, and lineage analysis shows where the data came from, tracing it from the original source through each system and transformation that touched it. When IT wishes to alter a database table, impact analysis identifies all the downstream systems, reports, and data feeds that are impacted. Finally, the MDR becomes the central repository for IT and business to view available assets to use for new purposes.
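At their core, lineage tracing and impact analysis are graph traversals over the metadata. A toy sketch, with hypothetical asset names standing in for real tables, reports, and feeds:

```python
# A toy metadata repository: data assets and the feeds between them,
# stored as a directed graph. Asset names are hypothetical.
feeds = {
    "orders_db.orders":      ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["sales_report", "monthly_feed"],
}

def downstream(asset):
    """Impact analysis: every asset reachable from the given one."""
    impacted = set()
    stack = list(feeds.get(asset, []))
    while stack:
        current = stack.pop()
        if current not in impacted:
            impacted.add(current)
            stack.extend(feeds.get(current, []))
    return impacted

def upstream(asset):
    """Lineage tracing: every asset from which the given one is derived."""
    sources = set()
    for src, targets in feeds.items():
        if asset in targets:
            sources.add(src)
            sources |= upstream(src)
    return sources

print(downstream("orders_db.orders"))  # everything a schema change would touch
print(upstream("sales_report"))        # where the report's data came from
```

A real MDR adds much more (versioning, glossary links, model storage), but the queries it answers reduce to walks over exactly this kind of graph.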

Summary

Data Management is all about treating data as a valuable asset. In most businesses, data is critical to business operations. Disruptions in data flow, misinterpretations, unauthorized access, and data loss can have a significant detrimental effect on the business, sometimes being fatal.

As we contemplate the activities of Data Management, we must also not forget the role we play as information technologists. Information is at the core of every significant business. But we are not the business; we play a support role. If you’re in retail, the company goal is to sell product. In defense, you protect the homeland and its interests. In financial services, you make money by moving and manipulating money in its various forms, and by selling products that do the same. In all industries, money doesn’t come in the door simply by storing data and moving it around. It’s important to realize the role of Data Management, and the more general role of IT, within the enterprise. These are support roles. You should strive to understand the business, and be in a position to partner with it to instill confidence and align the efforts of IT with its goals. Confidence in IT spurs an open flow of ideas, where technology capabilities spark new business ideas that business and IT can jointly exploit.

Friday, August 21, 2009

SOA Enlightenment

This month, Gartner released the 2009 "hype cycle" curve, showing SOA has finally emerged from the "trough of disillusionment". As Gartner tells us, SOA is no different than other technologies, moving from initial inception, to extreme vendor hype, to disillusionment as users feel swindled when reality falls far short of expectations. See Joe McKendrick's article, which includes the hype cycle graph. SOA is now marching forward on the "slope of enlightenment". At this stage in its lifecycle, the over-hype of past years has been resoundingly squelched, and the fundamental benefits of SOA, without the vendor exaggerations we've come to expect, have become apparent to the masses of IT groups across many industries. These IT groups are now "rolling up their sleeves", in Joe's terms, implementing projects using service oriented principles, and actually getting real value from those efforts. SOA is not saving the world in and of itself, but rather has become an essential tool for those companies seeking greater productivity. SOA makes IT more efficient, and helps focus energy on the essential purpose of IT: making the business run better, both now and in the future when inevitable business changes demand rapid change from software and systems.

Our data services project at XAware continues to gain strength. As companies build business applications using service-oriented principles, accessing data in a service-oriented manner becomes critical. Software from the XAware project addresses this need elegantly, especially when a company has complex data structures and varied formats spread across different systems. XAware is open source, and can be found at www.xaware.org.

Monday, June 22, 2009

Cloud-based Integration

There has been a lot of press over the last year about cloud computing. Web-based applications like Salesforce.com have demonstrated the utility and cost-effectiveness of cloud-based resources. Integration vendors have also begun providing products designed for the cloud. Like the few other cloud-based integration products, the XAware engine can be installed and run in a cloud computing environment. This architecture lets you avoid local infrastructure investments, and is most beneficial when the data sources and applications you need to integrate are themselves cloud-based or SaaS resources. If you are using SaaS or other internet-based resources, placing an integration component like XAware on the internet makes sense. It makes less sense if all your application components are premise-based: that scenario would have multiple premise-based components contributing data to a cloud-based integration application, where the data is combined and transformed, then sent back into premise-based resources. Only a few types of integration applications can afford the performance cost of such a round-trip to and from the cloud.

If you're interested in experimenting with XAware in the cloud, you might read the new Wiki article on installing the XAware engine in Amazon's Elastic Compute Cloud (EC2) environment. This environment lets you pay Amazon for hardware and software resources as you go, providing a very cost-effective and flexible environment for many integration applications.

Tuesday, April 21, 2009

A recent front-page article by Jeff Feinman in SD Times, "Bottom Line: Software had better pay", warns that economic conditions mandate that any funded project had better begin paying back immediately. Supply chain management and automation improvements are cited as types of projects that continue to be funded during this economic downturn. Automation in particular is an area ripe with opportunities aligned with service-oriented architecture (SOA). Even more attractive is the fact that many automation opportunities are focused on improving a small number of business processes. This means that a project has a manageable scope. Project costs are more predictable, thus return on investment (ROI) is more definite. Projects with predictable and immediate ROI are much more attractive than those with fuzzy costs and ROI, and thus are more likely to get funded when budgets are tight.

Why is automation well-aligned with SOA?  Since the early days of SOA hype, the main goal of SOA has been to achieve business agility through the organization of computing functions as interchangeable parts, called services.  A business process is implemented by orchestrating services to accomplish the goals of the process.  Services are designed to be reusable, so a particular service invocation, like "Create new customer", can participate in many different business processes.  Most importantly, new business processes can largely be orchestrated from a comprehensive library of existing services.  Traditional development cycles of 12-18 months are replaced with the creation of a new orchestration, a process that may take just a few weeks.
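The orchestration idea above, a business process assembled from reusable service invocations, can be sketched in a few lines. The services here are hypothetical local stand-ins; in a real SOA each would be a remote call managed by an orchestration engine:

```python
# Hypothetical reusable services; each would normally be a remote invocation.
def create_customer(name):
    """The reusable 'Create new customer' service."""
    return {"customer": name, "id": 1}

def open_account(customer):
    """A second reusable service, composed into many processes."""
    return {"account": 10, "owner": customer["id"]}

def send_welcome(customer):
    """A notification service."""
    return f"Welcome, {customer['customer']}!"

def new_customer_process(name):
    """A 'new customer' business process orchestrated from the services above."""
    customer = create_customer(name)
    account = open_account(customer)
    greeting = send_welcome(customer)
    return customer, account, greeting

print(new_customer_process("Alice")[2])  # Welcome, Alice!
```

The agility claim rests on the bottom function: once the individual services exist, defining a new process is mostly a matter of composing them in a new order.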

While most companies are years away from having a comprehensive library of services, automation projects present an attractive "delivery vehicle" to begin or continue growing the service inventory. Automation is the process of exploiting computer resources to augment or replace manual processes traditionally performed by humans. Architected in a service-oriented manner, a process or orchestration environment provides a visual tool to design and manage a business process. Activities in the process are generally implemented by services. Candidate tools for the process layer include ActiveEndpoints, Apache ODE, and ProcessMaker. Tools in the services layer would of course include XAware for information-oriented services, and Eclipse-based tools when custom-coded services are required.

So, while times are tough and budgets are tight, development work continues in key areas. Companies can't afford to stop reacting to market demands or improving operations. Service-oriented implementation strategies will continue to play a key role in the important work currently underway. And, if architected properly, service-oriented projects help lay the groundwork for more strategic benefits in the future.

Wednesday, January 7, 2009

The Essence of SOA

Anne Thomas Manes’ recent article proclaiming the death of SOA has started a firestorm across many SOA and IT-related blogs and forums. I wrote about it here. In her post, Anne refers to the severe disillusionment some feel towards the over-hyped term, “SOA”, and proposes we drop the term altogether and simply refer to the core concept as “services”.

The term “SOA” is at the same time ubiquitous and ambiguous. So much energy has gone into molding it into a marketing story that it seems no two people share the same definition. The truth is, SOA is really just a simple evolution in IT development strategy. I believe the essence of SOA is building software components as “interchangeable parts”, an idea that precedes even Eli Whitney. It is evolutionary (not revolutionary), because we’ve tried to do this for decades, culminating in object-oriented and component-based software development. SOA is simply the next stepping stone, which loosens the chains of platform and vendor lock-in, catalyzed by internet and XML-based standards. It’s still about interchangeable parts, components that can be recombined and reused to either build new systems or rapidly change existing ones. And it’s a superior strategy even if you don’t plan to reuse or recombine, because componentized systems are easier to build, manage, troubleshoot, and support.

In the software industry, I believe we are travelling a path similar to that of other industries, finding that interchangeable parts are easy to design in the small, but exponentially more difficult to design in the large. Wiper blades and radios are easily replaced in your car. But I would just love to install a new hybrid-electric engine in my beloved ’96 Jeep Cherokee. Interchangeable parts on such a grand scale are much more problematic. I’m sure there are marketing mechanics at work here, too. GM wraps their latest electric engine in the Chevy Volt, available next year for $42,000. They don’t want to sell just an engine.

So, I think the term we use to describe the concepts behind SOA is less important than agreeing on the core, underlying essence of SOA. I believe this to be extending the idea of “interchangeable parts”. Personally, I find myself avoiding the term “SOA” more and more, perhaps in a subconscious effort to avoid the pained or befuddled look on those faces in the room. Instead, I gravitate towards the term “service” or “service oriented”. Above all, we need to understand and accept that we are not revolutionaries. We are just carrying forward what others have already set in motion. By communicating this point, we gain credibility in our conversations with business people who control budgets. The alternative is to position this concept as “the next big thing”, something business people seem to immediately distrust.

Tuesday, January 6, 2009

Is SOA Dead?

Burton Group analyst and SOA guru Anne Thomas Manes recently blogged that “SOA is Dead” (http://apsblog.burtongroup.com/2009/01/soa-is-dead-long-live-services.html), referring to the disillusionment and even disgust some feel towards the over-hyped term. But she was referring just to the term SOA, not the concept itself. On the contrary, service orientation has seemed to find firm roots in diverse areas such as mashups, RIA, BPM, cloud computing, and others.

I completely agree that service orientation is here to stay. But I don’t agree that the term SOA is going away any time soon. We are in the typical “trough of disillusionment” Gartner speaks about, as a huge wave of over-hype sets high expectations for a technology. Industry buys into the vision, then slowly comes to realize it is not a silver bullet. Hard work is still to be done to extract the benefits of the new technology. When so many people express disappointment in a technology, negative momentum builds, and soon a consensus develops that the new technology is a failure at best, and evil at worst. Such is the case with SOA.

To be sure, some technologies never fully emerge from the trough. Artificial Intelligence and Object Databases are two examples that never achieved widespread adoption after huge early-stage hype. Others fare much better, like EAI and even Java. I remember the early Java days working at MCI circa 1997. Despite huge investments including the best consultants Sun had to offer, projects were massively under-performing, or even failing altogether. But the gradual maturation of the platform and supporting tools pulled Java from the trough of disillusionment to eventually make it the most popular programming language ever.

Anne concludes by saying that we need to move away from the term SOA and simply use the term “services”, since that is the core foundation of the concept. I think that’s fine for the time being. In fact, I’ve recently found myself using the term “service orientation” instead of SOA anyway. But I believe this is a temporary diversion. Eventually, market noise will settle down, and we in the software industry will finally develop a consensus on what this concept really is. When that happens, I believe, it will mark the emergence from the trough of disillusionment, and the return to calling this concept “SOA” once again, without fear of scorn.

See my related post here.

Wednesday, December 3, 2008

The Long Tail of IT

In these days of recession and shrinking IT budgets, development groups are forced to do more with less. This appears to be an opportunity for growth for Open Source projects, as companies find it difficult to purchase products, or even expand use of products they currently own. Open source products are available to assist with a wide range of IT problems. And with a very low cost of entry, development groups can kick the tires, and even implement an entire project, without awaiting the decision of an enterprise architecture group or budget committee. In a recent meeting with AMR Research, analyst Dave Brown talked about “Long Tail” effects within IT, where large, mainstream projects are still getting funding, but the large number of smaller, tactical projects are left to fend for themselves. This is exactly where Open Source can make the biggest impact… the large number of tactical projects going on within a company, often “flying under the radar” of the corporate enterprise architects. And it is not just the stealth projects benefiting from Open Source. Many projects designated as “tactical” or short-term solutions have the flexibility to select the most expedient solution, which often turns out to include Open Source. I discussed Long Tail effects in the creation of services here.

So, in addition to areas where Open Source has a solid beachhead, like Linux usage for corporate servers, it certainly appears that Open Source is making additional headway, filling many nooks and crannies in the IT development space.