Day 2 :: Contemplating Data Design

I find myself thinking like I did the first day, in a very data centric way. It is similar to how they introduce Object Orientated Programming (OOP) concepts as modeling code after real-life objects. Of course shortly after this, they explain that this is not always the best and offer a few counter examples. Regardless, I find the real world the most intuitive tool to use when initially solidifying a concept. Using this thought process, you can start to envision how you want your data to be usable by your and possibly other applications (this is the intro enterprise thinking, your data schema will likely be used my a slew of different applications in any enterprise). What do the relationships really mean about the data you are trying to represent? Is this data best represented on this table or is really its own entity that should be stored in a look up table? How is entity A related to entity B?
All of these questions get a little hairy when you start to create a vision for multiple complex applications as you want your data to be self explanatory and highly reusable along with avoiding duplication and maintenance problems found in non-normalized schema. Even a simple application like the one we are writing can make you stop and think about the pros and cons of a particular arrangement, let us take two examples from my brainstorming session…
Link me this data man!
First, the simple one. I was getting in the habit(while building an EER diagram) of creating link tables with unique identity columns. This is the de facto standard and often times completely the correct choice when building a link table. While I was specifying my ‘Identifying Relationship’s I realized that I was making a mistake. An Identifying Relationship is a term used to describe the use of the foreign key as an identifying column. When thinking if a relationship is identifying, consider if the link table entry can/should be uniquely identifiable with out the entity it is linking against. In my case, I was using a link table to relate technologies to the projects I used them on. If the Technology didn’t exist there would be many projects with an entry and the link wouldn’t be identifiable, if the projects where gone there would be several technologies that would not allow the record to be uniquely identified. There are two ways to solve this issue: 1.) Add a link ‘id’ column that acts as a primary key. 2.) Make these foreign key’s a composite primary key marking them as an Identifying Relationship. I suppose you could also make a composite key out of all three columns, but this would effectively be the same as having a singular id column that kept the records unique.
Now, why would I stop following convention and remove the traditional ID column that keeps the rows unique? Well, I don’t want technologies to be defined to any single project more than once. It makes no sense to say I used Java and Java on this great project I was working on, nor does it make any sense in reverse to say on project A and project A I used Java.
Link me once, I didn’t want to look up, link me twice, I’m starting to get the picture, but link me three times, wtf…
This consideration is another link table thought. Originally, I had designed from the project entity out. Each project was likely created because of a Job (but didn’t have to be) and each Job had an Employer. Then I started to think about well the person working on the project is also likely to be the employee representing the Job and an employee of the the Employer who sponsors the Job. It was not a solid design to have People only related to projects as you wouldn’t know if the person on the project was from any particular employer or if they had any particular role on the project. Enter my contemplations on how to best represent the link between the three entities.

On the left a 3 primary key link table and the right a more traditional employer and person link through a single job entity.
After drawing up these two schema ideas, I wrote down in plain English what each relationship could mean. You’ll notice in the red marker a (BAD?) string written in black. The question came down to the idea that even though a Job role is often common among multiple employers, they are never identical in every aspect. Therefore, a job should not be allowed to have multiple employers. This made it an easy decision to let the Job table be a link between person and employer. The last question is whether multiple people could hold the same Job for an employer at different points in time. This is a little tricky as it is very likely a Job defined by an employer could be held by multiple people over the course of the company’s life. In our application we really don’t have to worry about that as it is intended to be for a single person, me, but what if I wanted to expand on the idea and allow friends and family to document their own experience in my system? Or maybe I just want to allow my person table to be full of my contacts and I’m interested in their previous job history as connections. At any rate, I was not ready to accept a less than optimal structure if I could avoid it.
What I really wanted was the best of both these two designs, an employment link between a person and a job, but keeping each job specific to a single employer. All I had to do was remove one link from the tri-disaster.

Using a link table only between Person and Job allows jobs to be dependent on an employer but allow multiple people to have held or to hold that position.
Now, I shouldn’t be so harsh on the three-way link, after all who doesn’t like the idea of a three-way. Consider the situation where I would want to build a system that was intended for multiple employers to manage HR affairs. Wouldn’t it be a cool feature to allow young companies to choose from a list of defined job roles/positions when creating job listings or exploring the structure of their own company? Certainly, I could just search the jobs in the system and duplicate any that a new employer wants to copy, but this would be bad practice and lead to headaches if we wanted to updated all instances of Sr. Software Engineer to have the same description. On the other hand, copying and thinking of the entity as a unique job to that company (even though initially it isn’t) might be perfect as it allows the employer to tweak this job description to fit their needs without modifying the company’s job description they stole it from. It would just be hard to display a user friendly search result listing of jobs as there maybe 100 different versions of ‘Sr. Software Engineer’.
Anyway, I think that is enough analysis for today. Tomorrow we’ll try to finish up this discussion with a completed EER diagram.

Recent Comments