Thinking About Data

“Though the mills of God grind slowly, yet they grind exceeding small…” Henry Wadsworth Longfellow translated this famous line from a German poem written by Friedrich von Logau in 1654. Imagine if Longfellow had worked as a data architect in today’s Information Technology industry. Perhaps he would have rendered this now-famous line as follows: “Though the databases of God grind slowly, yet they grind exceedingly small.”

This is often how I feel when I begin investigating a database to diagnose performance problems, or to start documenting the schema and constructing ETL processes to populate a reporting database or data warehouse. As data modelers, data architects, and database developers (all of whom I will collectively refer to as database people for the remainder of this article), we are taught to think about data relationally and dimensionally. Relationally, we normalize data for production OLTP databases, organizing it in such a way as to minimize redundancy and dependency.

Dimensionally, we design data warehouses to have facts and dimensions that are easily understandable and provide valuable decision support to various business entities. High-quality, reliable data that is easy to query and consume is the goal of these design patterns. During my twenty-year career, however, I have discovered that well-designed database schemas and data models are the rare exception, not the rule. And there are a couple of common themes underpinning the bad designs I encounter.

Poorly designed databases are often the result of designers who simply view databases as a convenient resting place for data, instead of thinking about how data will flow through them and, more importantly, which people and internal business entities will want to consume that data. This erroneous view frequently stems from a lack of formal training in database normalization, relational algebra, dimensional modeling, and data object modeling; skills that I believe are essential for anyone serious about enterprise-level database design. The database schema is foundational to almost every business system, so it is imperative to involve skilled database people early in the design process. Failure to do so may result in flawed database schemas that suffer from one or more of the following issues:

  • Lack of extensibility
  • Difficulty of maintenance
  • Lack of scalability
  • Difficulty of querying
  • A high degree of data anomalies that render the data unreliable
  • Performance problems

Having worked with a lot of weird and strange database designs (I could probably write an entire book on the subject), I want to briefly mention some of the more commonly encountered database design errors. I will classify these errors into two general groups: design errors in production OLTP databases, and design errors in databases intended for reporting.

Production OLTP databases

  1. A completely flattened, un-normalized schema. Databases designed this way will have all sorts of issues when placed under production load; performance and scalability problems, for example. I often hear this line from developers: “Well, it worked fine in QA with a small amount of data, but as soon as we deployed to production and threw serious traffic at it, all sorts of performance problems emerged.” Flattened schemas like this frequently lack declarative referential integrity, which leads to data anomalies. Getting reliable reporting from this environment is difficult at best.
  2. A highly normalized schema, possibly over-normalized for the context, but lacking in sufficient or useful payload. Every part of the data model has been taken to its furthest normal form and data integrity is well enforced through primary key/foreign key relationships. But getting useful data requires both excessive table joins and deriving or calculating needed values at query time. For example, I worked on the invoicing portion of an application (the invoicing code was written in Perl) in which customer product usage was calculated during the invoicing process. The final usage totals were printed on the paper invoices, but not stored anywhere in the database. Finance needed this information, both current and twelve months of history, and to get it I had to recreate large parts of the invoicing process in SQL; a herculean task. When I asked the developers why customer usage invoicing totals were not stored in the database, they responded, “I guess we never thought anyone would want that data.” A schema sketch illustrating both errors follows this list.
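
To make both errors concrete, here is a minimal schema sketch in SQL (all table and column names are hypothetical, not taken from the applications described above). It normalizes the data and enforces declarative referential integrity, while still persisting the derived invoice total that the business will inevitably ask for:

    -- Normalized core entities; integrity enforced through PK/FK relationships.
    CREATE TABLE Customer (
        CustomerId   int IDENTITY PRIMARY KEY,
        CustomerName varchar(100) NOT NULL
    );

    CREATE TABLE Product (
        ProductId   int IDENTITY PRIMARY KEY,
        ProductName varchar(100) NOT NULL,
        UnitPrice   decimal(10,2) NOT NULL
    );

    CREATE TABLE Invoice (
        InvoiceId   int IDENTITY PRIMARY KEY,
        CustomerId  int NOT NULL REFERENCES Customer(CustomerId),
        InvoiceDate date NOT NULL,
        -- The lesson of the Perl invoicing story: persist the usage total
        -- at invoicing time so no one has to recreate the calculation later.
        UsageTotal  decimal(12,2) NOT NULL
    );

    CREATE TABLE InvoiceLine (
        InvoiceId int NOT NULL REFERENCES Invoice(InvoiceId),
        ProductId int NOT NULL REFERENCES Product(ProductId),
        Quantity  int NOT NULL,
        PRIMARY KEY (InvoiceId, ProductId)
    );

A flattened design would instead repeat customer and product attributes on every row with nothing enforcing their consistency; the foreign keys above are what prevent the data anomalies described in the first error.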

Reporting databases intended to serve internal business entities

  1. Attempts to build internal reporting functions directly against production OLTP databases.  A discussion of OLTP optimization techniques vs. OLAP optimization techniques is beyond the scope of this article. But suffice it to say that attempts to run I/O intensive reporting against a production OLTP system which is not optimized for reporting will cause tremendous performance problems.
  2. Building out a secondary physical database server and placing an exact copy of a production OLTP database on it as a “reporting instance” database.  Doing this will certainly remove any chance of overwhelming the production server.  But it will not provide a database optimized for reporting.
  3. Adding a few somewhat haphazard aggregation tables to the “reporting instance” database mentioned above. This may temporarily reduce query times for reports relying on aggregated data, but it is not a long-term substitute for a properly designed dimensional reporting model (a minimal star-schema sketch follows this list).
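
By contrast with the stopgaps above, a dimensional design organizes the same data as facts and dimensions. Here is a minimal star-schema sketch (hypothetical names, continuing the invoicing example from the OLTP section):

    -- Dimension tables hold the descriptive attributes the business slices by.
    CREATE TABLE DimDate (
        DateKey      int PRIMARY KEY,      -- e.g. 20130115
        CalendarDate date NOT NULL,
        [Month]      tinyint NOT NULL,
        [Year]       smallint NOT NULL
    );

    CREATE TABLE DimCustomer (
        CustomerKey  int IDENTITY PRIMARY KEY,
        CustomerName varchar(100) NOT NULL,
        Region       varchar(50) NOT NULL
    );

    -- The fact table holds one row per customer per day, with additive measures.
    CREATE TABLE FactUsage (
        DateKey        int NOT NULL REFERENCES DimDate(DateKey),
        CustomerKey    int NOT NULL REFERENCES DimCustomer(CustomerKey),
        UsageAmount    decimal(12,2) NOT NULL,
        InvoicedAmount decimal(12,2) NOT NULL
    );

    -- Twelve months of customer usage history, the request that once required
    -- recreating the invoicing process in SQL, becomes a trivial query:
    SELECT c.CustomerName, d.[Year], d.[Month], SUM(f.UsageAmount) AS Usage
    FROM FactUsage f
    JOIN DimDate d     ON d.DateKey = f.DateKey
    JOIN DimCustomer c ON c.CustomerKey = f.CustomerKey
    WHERE d.CalendarDate >= DATEADD(MONTH, -12, GETDATE())
    GROUP BY c.CustomerName, d.[Year], d.[Month];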

Data models are often given short shrift because the original developers, being inexperienced with relational and dimensional databases, do not think correctly about data. This flawed thinking frequently results in poor database designs that perform badly and contain unreliable data that is difficult to query. I want to leave you with a specific example of this point by briefly relating a somewhat recent client experience of mine.

My client at the time had recently purchased a start-up company whose product was a complex ad serving application/engine.  The SQL Server databases foundational to the application were suffering severe performance problems which rippled through the entire system and resulted in a less-than-stellar customer experience.

At my client’s behest I executed a major system review and quickly ascertained two primary issues: a data architecture that limited scalability, and incorrect table indexing that was a direct result of that architecture. I kept their developers and Program Manager involved with my solution development process, and after a successful deployment, which solved their performance issues, the Program Manager made a key statement to me. She said, “Wow, I never understood the value and importance that a person with strong data architecture and DBA skills could bring to a project like this. You can be certain that on all future projects of this magnitude I will insist on, and budget for, a person with your skillset to be involved at the outset to ensure we avoid these types of database issues.”

Every IT, Program, and Project Manager would do well to heed her advice. Consider spending some time with your recruiting department to find an experienced data architect with a successful track record at the enterprise level. It will be time and money well spent.

Tech Trends for 2013

Daniel Maycock is one of Slalom’s acknowledged thought leaders in the realm of new and emerging technology.

2012 brought many significant technology advances in a number of key areas, including the mainstream adoption of LTE and the rise of Big Data and analytics to dominate the enterprise IT agenda.

Companies went from adopting cloud platforms and services to leveraging those services and transforming their businesses.

  • Windows 8 has shown just how important Internet connectivity will be for computing in many capacities.
  • Every major IT vendor has focused to some extent on the convergence of mobile, cloud, analytics, and social, and on helping companies make IT a central part of every aspect of their business.
  • From Salesforce to Azure, cloud-based solutions are expected to grow even more in 2013.

As more and more companies wake up to this new reality, the question is not if adoption of key technologies such as cloud and mobile will take place, but how quickly, and what can be done to make them work for the business as fast as possible. Furthermore, as these technologies are integrated deeper into the enterprise, it will be critical to keep in mind what other technologies will follow in their path.

Windows Server 8: Part 4—DirectAccess

Slalom consultant and accomplished Microsoft systems developer Derek Martin sheds light on Windows Server 8 (WS8) through his insightful blog series focusing on his research within the technical preview documentation, personal experimentation with the product, and thoughts on how it can apply to the real world as soon as it is released to manufacturing (RTM).

Slalom Consultant Derek Martin is an accomplished Microsoft systems developer and integrator, experienced in developing and deploying SharePoint and CRM solutions, integrating line of business applications, and leveraging existing infrastructure investments.

Remote access is a tough nut to crack. On the one hand, every employee needs access to their corporate resources at all times of the day and from whatever device they can shake a stick at. On the other hand, IT can’t just tear down the firewall and let everyone in, for obvious reasons. For a long time, dedicated virtual private network (VPN) equipment (like an ASA appliance) or VPN software (like Microsoft RRAS) was really the best option for secure, reliable connectivity when not on premises. The challenges with those solutions are severalfold.

Windows Server 8: Part 3—BranchCache

Perhaps my favorite Windows Server 8 feature, BranchCache™, has received tremendous updates in this release. BranchCache is a technology originally released in the Windows Server 2008 versions that allows companies with large remote-office deployments to ease wide area network (WAN) traffic for file-server purposes by securely caching files at remote offices, on either servers or even workstations (using something akin to P2P technology). Once a client from a remote office grabs a file off a server via the WAN, it gets cached there (again, on either servers or client workstations) for other users to get. This operation is transparent to the end user; the only thing they notice is that files open faster! To me, this is dark magic.
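
For the PowerShell-inclined, here is a minimal sketch of turning this on (hedged: these are the BranchCache cmdlet names as I understand them from the preview documentation; verify against your build):

    # Enable distributed-cache mode on branch-office clients: peers cache
    # WAN-fetched content for each other, with no branch server required.
    Enable-BCDistributed

    # Inspect the service mode and the local cache that WAN reads now fill.
    Get-BCStatus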

Windows Server 8: Part 2—Making the Administration of Servers Easier

It comes as no surprise that, in order to build a private cloud (or a public one, if you’re in the hosting space), the fewer server admins per server, the better. Some enterprises have a hard limit of, say, 20 to 1; others with impressive automation and orchestration have gotten up to 100 to 1 or, more likely, have broken out individual server administration tasks to individual teams: this team for patching, that team for AD, etc.

In WS8, the convergence of knowledge around maintaining large numbers of server instances becomes almost fun. There really is no longer a reason to have one large server running dozens or hundreds of apps simply because they are easier to administer. In fact, because of the continuous fine-tuning of memory and processing requirements, dozens of WS8 instances can run on a single host without any trouble. But the tools had better be there—and they are!

Windows Server 8: Part 1—Active Directory

One of the most stable and well-designed pieces of software ever to come out of Microsoft is Active Directory. While its development was riddled with twists and turns (and the occasional run-in with Novell), the identity platform for millions of businesses is a stalwart of stability. Few outside of IT administration ever know of its existence, and that is the point. The basic ability to authenticate and gain access to corporate resources, be that a computer, a server, or a file, would not work without the consistent resiliency of this gorgeous database.

The biggest advantage of Active Directory over other products is its ubiquity. Microsoft has spent untold sums making it accessible and secure to a variety of applications, the entire MSFT stack included. It is also the underpinning of major infrastructural components of many software packages, including Exchange, Lync, SharePoint, and thousands of LoB applications. In recent versions, it has gained prominence in the realm of federation, enabling businesses to seamlessly federate their authentication and authorization stores using technologies like ADFS and claims authentication. The changes in WS8 around Active Directory continue to provide support for the major system components from previous versions (great backward compatibility) as well as the new features throughout Windows Server (described later in this series), and support administrators by addressing common pain points. Here’s what’s new:

Simplified Deployment: Similar to the Windows 8 client, where everything is touch first, in WS8 everything is PowerShell first! Deploying Active Directory is so much easier now that it can be accurately scripted. All but the smallest companies need to deploy multiple AD servers. In WS8, you can deploy Active Directory on multiple servers at once, export your graphical user interface (GUI)-based configuration to a series of PowerShell scripts, and clone brand-new domain controllers (think sysprep on steroids) for rapid Active Directory forest topology builds. A sketch of what that scripting can look like appears below.
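
As a rough illustration, here is a minimal sketch of scripting a brand-new forest with the ADDSDeployment cmdlets (the domain names and parameter values are hypothetical; verify cmdlet behavior against your build of the preview):

    # Add the AD DS role bits to the server, management tools included.
    Add-WindowsFeature AD-Domain-Services -IncludeManagementTools

    # Dry-run the forest configuration first; nothing is changed yet.
    Test-ADDSForestInstallation -DomainName "corp.contoso.com"

    # Promote the server to the first domain controller of a new forest.
    Install-ADDSForest `
        -DomainName "corp.contoso.com" `
        -DomainNetbiosName "CORP" `
        -InstallDns `
        -SafeModeAdministratorPassword (Read-Host -AsSecureString "DSRM password")

Run the same script against each new server (or clone a prepared DC) and the forest topology builds itself repeatably, which is exactly what makes the PowerShell-first approach so valuable.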

Safer Virtualization Support: This exact issue has bitten me! What’s the cardinal sin with virtual machines and Active Directory? DON’T P2V a DC (without following very complex prescriptive processes); the dreaded USN rollback will bite you. Active Directory in WS8, however, recognizes that virtual domain controllers are what enterprises need, so it builds in additional replication logic to keep time synchronized with hosts, along with safeguards that keep things in check when domain controllers are virtualized.

These updates to Active Directory, along with refined management experiences and new, simpler PowerShell commands, will make administering your forest much easier. I’ll update this post as I learn more about the underpinnings of Active Directory; I’m particularly interested to see if the schema has changed any!

The Coming Storm of Windows Server 8—A Multipart Series: Intro

If ever the technology that enables millions of businesses and this new concept of the cloud have converged on a single piece of software, it is in Windows Server 8 (WS8). While the rest of the planet contemplates the good, the bad, and the ugly of the Windows 8 client (myself included), I wanted to take an opportunity to digest the underpinnings of the server counterpart, to see if the server team was doing things as radical as their client-team cousins. The answer? A resounding YES. In fact, in many ways, what is happening on the server side of the shop is more disruptive and compelling than the client side. Sexier, even, you ask? Only to a nerd’s nerd (me) and maybe a few of my colleagues.

WS8 focuses on making the life of the administrator easier. It streamlines management, completes features introduced in previous versions, and adds a host of new features that make this the largest software update to Windows Server since Windows 2000.

WS8 also bridges many of the remaining gaps between the public cloud and the private cloud. It allows dynamic and automated workflows for the rapid provisioning and re-provisioning of resources. It brings its hypervisor, Hyper-V, into competitive parity (and then some!) with industry heavyweight VMware, and does so at a fraction of the cost.

This series focuses on my research within the technical preview documentation, my own personal experimentation with the product, and thoughts on how it can apply to the real world as soon as it is released to manufacturing (RTM). Over the coming months, I hope you enjoy what you see!

The State of Storage in 2012—to the Cloud!

There is a great article in Information Week this week about the 2012 State of Storage that I wanted to comment on. If you don’t have a subscription, that’s okay. The basic premise is that solid-state drive (SSD) costs are really starting to drop, and the idea of enterprises using SSD storage area networks (SANs), or higher numbers of SSDs in existing SAN technology, is starting to gain traction.

This is certainly true and will continue to provide great performance improvements for needed input/output operations per second (IOPS). It’s telling that, to this day, many enterprises, including the five that I work closely with, are stuck in the traditional models of storage. This isn’t their fault; these kinds of sea changes take time, and there are obvious risks to upending a trusted SAN solution. But the writing is on the wall: traditional massive storage arrays for both performance applications and archival/compliance/storage requirements are going to look very different in a few short years.

An Ordinary Day with the Client

Slalom Consultant David Van De Sompele's expertise includes performance tuning, production DBA activities, data modeling, ETL processes, and reporting. He's based in Slalom's Seattle office.

If a day with the client is so ordinary, what is there to write about? That is a fair question, and the answer to it is the topic of this article.

Dictionary.com defines the word ordinary as common, mediocre, and unexceptional. But if that is the value you are delivering to your client, then you should consider a career other than consulting. As consultants, we specialize in various technologies and methodologies. Blending this expertise with creativity and innovation, we help companies solve complex business and technological problems. Over time this becomes ordinary to us (it’s what we do), but this ordinary effort from the consultant should look like an extraordinary effort from the client’s perspective.

On rare occasions you may be presented with a particularly challenging situation: uncharted waters, or an unpredictable fluke for which no one has the answer. And as a senior consultant you must be willing to dig deep and bring the entire breadth of your professional experience to bear in order to solve the problem. To better illustrate this, I’ll relate a recent experience of mine. It was one of those situations in which everything seemed to be going well until a very strange problem occurred and I was asked to jump in and solve it.

The Final Two Silos

It is interesting to watch the enterprise space, with all its efforts at integration and collaboration, continue to struggle with fundamental technology differences and a lack of willingness to improve. There seem to be two final silos of not just data, but process and infrastructure: the divides between historically perceived enterprise holdings and new-technology enterprise holdings.

  1. The database. Most large enterprises seem to have heavy investments in Oracle and MSSQL. Full disclosure: I have my preference, but there are valid reasons for it.
  2. The programming platform. Most large enterprises have a heavy investment split between Java and .Net, with a sprinkling of other things here and there for one-offs. IBM comes into play here as well with its horrid product line.
  3. Enterprise Content Management. Every large enterprise is working through this challenge. The two major camps that I see left are EMC and Microsoft. The file server is dead, and if you are still using one, well, quit that… better yet, call me and I’ll tell you how to quit it and save 80% of your TCO. But I digress.
