Unit Testing and Decisions on BAD Data

I was having a conversation with a friend about a particular predicament: they got fussed at for taking time to write unit tests for someone else's code, after being directed to do exactly that.  Not only had this individual been directed to write these tests, but the estimates had been provided on the understanding that the application was already complete, or at least in a testing state.  So unit test writing began on the assumption that the application was further along than it really was, only to reveal that the application needed major fixes.

Now correct me if I am wrong, but unit testing is done to find these issues and errors.  Sometimes, most of the time, almost all of the time, unit testing is going to take longer than writing the application straight.  But the application will be more solid, have better integrity, support regression testing, allow faster future development, crash less, be better organized, and assuredly better meet the requirements it is designed to meet.  So when someone takes the time to assure this level of integrity, the last thing a manager, or anyone, should do is fail to recognize what is going on while unit tests are being created.  Writing a unit test, especially under a TDD process, is NOT JUST WRITING CODE!

Rant Number 345,251,090,253.

So, to all the managers in the world, please read and focus on what a unit test is.  Understanding the reason and purpose of a unit test is vital to getting the results that unit testing promises.

So don't ask your developers to go and unit test applications that they did not write unless you expect them to go through and literally verify every single "unit" of work that is being performed.  They have to write a "unit test" for each "unit" of work so they must understand each "unit".  That means, for all practical purposes, learning the entire use case and application in detail.  Remember, if done properly, this will probably take almost as much time as actually writing the application did.  It'll be worth it, but make sure to have good expectations of what kind of resources and effort this will take!
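
To make the idea of a "unit" concrete, here is a minimal sketch in Python.  The order_total function and its rules are hypothetical, invented purely for illustration and not taken from my friend's application.  The point is only that each behavior of the unit gets its own test, which is why whoever writes the tests has to understand the unit first.

```python
import unittest


def order_total(unit_price, quantity, discount_rate=0.0):
    """Hypothetical unit of work: price a single order line."""
    if quantity < 0:
        raise ValueError("quantity must not be negative")
    return round(unit_price * quantity * (1 - discount_rate), 2)


class OrderTotalTests(unittest.TestCase):
    """One test per behavior the unit is expected to have."""

    def test_plain_total(self):
        self.assertEqual(order_total(2.50, 4), 10.00)

    def test_discount_is_applied(self):
        self.assertEqual(order_total(10.00, 3, discount_rate=0.1), 27.00)

    def test_negative_quantity_is_rejected(self):
        with self.assertRaises(ValueError):
            order_total(1.00, -1)
```

Saved in a file, tests like these can be run with python -m unittest.  Notice that three small tests already required knowing the rounding rule, the discount rule, and the error rule; multiply that by every unit in an application someone else wrote.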

Writing unit tests after the fact like this is not a good way to go about coding, but sometimes it needs to be done.  It is definitely good to do test driven development instead of the post test development style.

So please stop expecting the mythical hype, get rid of the pointy-haired boss syndrome, and start dealing with reality.

Posted by: Adron
Posted on: 10/31/2007 at 11:37 PM
Categories: Rants

Learned, Relearned, and Learned Again, Agile Like

I have to admit, it is pleasant to "relearn" something versus trying to learn it the first time.  The images, explanations, and past learning pour back into my memory.  The memories and experiences are almost always good, the times fun, and the excitement of figuring out something the first time is renewed.

With that, I've dived full force back into my most recent study of business intelligence and various OLAP technologies, from MDX to the storage mechanisms.  Lately I've been reviewing the storage mechanisms again, working to figure out what is inside the "black box" that makes the OLAP Cube.  Much of it I've recollected over the last few weeks with ease, but the black box is still a bit of a mystery.

What got me thinking about this wasn't the action of relearning these last few weeks, but an article I read that has to do with this exact topic.  "The Secret Sauce of Highly Productive Software Development" hits the nail on the head about what makes an Agile team truly effective and faster, sometimes exponentially faster, than a traditional team of software developers.  Learning is a key to making a team fast.  The team must always learn; every day, every hour, something different comes up and they must adapt.  This covers everything from the new pattern to implement a needed feature to the simple "I gotta hit this button even though it is not in the documentation" knowledge.

With all that being said, if there is one feature of Agile Methodology that I have to pull out of the bag of tricks and call important, it is learning and adapting to the tools and the project.  This aspect of the methodology is more important than almost any other part; of course, all the other parts are built around enabling the developers to learn, adapt, and have a process that can handle and integrate the "learning" into the actual development process as much as possible.

All this thinking on the topic of Agile and the basic inferiority of previous methods lends more credence to why Agile itself is permeating almost every methodology these days.  It is strange, almost like the other methodologies are attempting to "learn" Agile.  There are of course those people out there that are naysayers of Agile, and that is fine; they'll probably be out of the software development sector of the industry within the next 1-5 years.  If not, they'll be relegated to the "maintenance role" of 35 years and will hopefully be happy with their "white picket fence" dream home.  I personally am ready for the challenge of staying well ahead of the curve, far away from the mediocrity of the average, and pushing the envelope as much as possible.  Addictions aren't generally a good thing, but I am addicted to solving this learning bottleneck within project efforts.  Not only am I addicted, I'm intrinsically connected to the whole learning concept.

On that note, I'm back to the blogs, books, and personal efforts of the day.

Posted by: Adron
Posted on: 10/30/2007 at 5:03 PM
Categories: Discussion Points or Ideas

Pond's Laws of System Design (A.K.A. Something to Ponder)

In my eternal effort to keep up with everything, or at least the things related to what I do, I found an excellent list of laws of system design by Mr. Ward Pond over on his blog.  I always find lists interesting, as statistics show almost everyone does, but these types are actually of use.  Many of the points are true to the occupation and definitely worth reading and acknowledging.  My own thoughts on Mr. Pond's Laws mostly reinforce the undertone of each law.  The ones I have my 2 cents to add to follow.

#5 Pond mentions that the "value you add goes triple for the value that the job brings to you".  I absolutely agree.  I would also add the corollary that without value being brought to the individual doing the job, it is a worthless job.  If you can't enjoy it, even in some small way, you should probably either find a way to enjoy it somehow or get the hell out of that type of work.  When #5 goes the other way, not only do you not add value to what you do and to your coworkers' efforts, but you detract extensively from your own life.

#6 "Play" with the product.  If you aren't, are you even really doing your job?  This is something I commonly ask of advanced DBAs, Software Developers, and even Architects that don't write code anymore.  What are you doing then?  Why are you not doing it?  Get back to it; if you aren't playing, something is big time wrong!

#8 Please, I implore people to know this: simply know thyself.  Without this, you will never be good at the whole aspect of human contact, which, regardless of what some might think, is pivotal to doing a good job at anything!

...and that leads me to the last law that I really must reiterate...

#10 Your code is a communication with someone else, who will likely come after you are gone.  This is true, it is frustrating, and sometimes agonizing, but simply leaving some kind of whimsical comment is better than nothing.  I've made gross errors in this area of my career, but I always try to keep improving my comments.  On the same note, but from the other end of the view, make sure you learn to grasp and read comments and code.  Make sure you actually know the technology (see #6); don't blame inadequate knowledge and ability on the "guy that left" because you don't understand patterns, objects, the language, the technology stack, or other scope.  Always learn, keep learning, and don't stop.  With enough knowledge, even the most cryptic spaghetti isn't all that bad to fix.

Posted by: Adron
Posted on: 10/29/2007 at 5:16 PM
Categories: Rants | Discussion Points or Ideas

Oh my freaking !@#!

I just spent multiple minutes on the phone with Chase Cardholder Services attempting to find out the bloody freaking zip code to send a stupid payment to.  You would think that would be easy.

I couldn't find it on the site, I'm sure it is buried there somewhere.

I could barely get to it on the phone.  I had to go through two phases of entering my 16 digit account number, plus other pieces of information just to get the blasted address!  Something is seriously wrong with their navigation.

One of the other interesting tidbits is that while navigating this, I was also reading a blog entry.  That blog entry just so happens to point out a lazy, lazy navigation practice - "show less choices".

There is a connected irony between my complaint and his.  They need to get someone on their navigation and logical structure of their site right away!  I just experienced the negatives of their navigation, and their lack of providing information in a coordinated and collected fashion.  The billing address should BE UNDER THE PAYMENTS SECTION!!!  If anyone is paying attention, FIX THIS ASAP.

The services are good, the cards are decent, the point programs rock, but blast it I want to be able to use them better!

Thanks,
Perturbed Adron

Posted by: Adron
Posted on: 10/26/2007 at 10:47 AM
Categories: Rants

Business Intelligence Observations

As I sit through the second day of training I can't help but think back upon my previous experiences with OLAP and BI.  My thoughts run back to some other times doing the same things, but it just wasn't initially called BI or OLAP, simply reports for executives and business decision making.  I find it interesting that there are specific labels for all of these efforts.

BI for the most part is just views, stored procedures, and other data getters that coordinate and correlate data into "information".  The difference now is that there are tools and other applications that assist.  This has been done for ages and ages.  There have been autonomous database servers set up just for this purpose before.  They were not called OLAP or BI servers, just simply report servers.

One of the solutions that I worked with in the past, prior to dedicated OLAP and BI solutions, was back at SCP Pool Corp.  We had a database that was used for OLTP work and another database that stored denormalized reporting data.  In addition to that we had breakdowns of data into views and other assorted organizations to provide extensive reports.

This solution was nothing more than what the BI and OLAP Cubes provide today.  The data was similar, the speed was similar, and the capabilities were similar.

I will admit, though, that even with similar capabilities, what is simplified and assured with SSAS & SSIS is vastly better than the manually created and manually coded solutions of yesteryear.  I just like to draw the correlation that it isn't particularly NEW, it is just DIFFERENT.  The decrease in the time it takes to put together report cubes and process them because of simplified structures, along with the ongoing enhancements to the underlying engines themselves, makes BI and OLAP work worth the investment.

Once again, just like interurban passenger cars are now light rail vehicles and horseless carriages are now cars, reporting databases are now OLAP/BI Cubes.

Posted by: Adron
Posted on: 10/24/2007 at 10:28 PM
Categories: SCP Pool Corp. | Discussion Points or Ideas | Business Intelligence and Analytics

BI/OLAP Training Day #1

Yesterday I attended Maxamine Training as part of my initiation at WebTrends.  Maxamine is a web site scanner for finding tags, JavaScript, and other assorted objects within a web application/site.  The tool is a client application with server based storage.  All in all, it is a decent product, and anyone in need of a web site scanner should definitely check it out.

Today I'm sitting in day one of Business Intelligence and OLAP Training.  It is an introductory class, which places me at risk of being a little bored, but it is still good to get a solid review with SQL Server 2005 Tools.  My last serious experience was with SQL Server 2000 way back in the days of my Jacksonville work.  SQL Server 2005 has some definite changes that are advantageous over the previous version.  Even though this class is introductory, I decided it would be a great opportunity to create a set of notes and a write up in relation to the class.  These notes and descriptions are what will follow over the next few days.

Definitions

OLTP - Online Transaction Processing

This form of database is used primarily for transaction processing systems.  It holds normalized data, is optimized for data entry, and has a focus on data integrity.

OLAP - Online Analytical Processing

This form of database/warehouse is used primarily for analytical systems.  It holds denormalized data, is heavily indexed and aggregated, and has a focus on reporting and analytics.

Star Schema - A Star Schema is a schema used in business intelligence OLAP Cubes that has the following characteristics: a single fact table, multiple dimension tables, all dimension tables directly related to the fact table, and the quickest processing time.

Snowflake Schema - A Snowflake Schema is a schema used in business intelligence OLAP Cubes that has the following characteristics: a single fact table, multiple dimension tables, dimension tables that can be related to the fact table through other dimension tables, and slower processing.  A snowflake schema is sometimes faster to load than a star schema, but not to process.

Fact Table Design - A fact table is generally built from multiple OLTP tables.  It contains foreign key columns to the dimension tables along with the measure values, and it provides a single level of granularity.  Fact tables should also have surrogate keys, be flattened to a star schema, and may need to deal with nulls; the fact table is also the shared bus for dimensions.  It is preferred, and sometimes a key need, to not have nulls in the cube.
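
To tie the star schema and fact table definitions together, here is a small sketch using Python's built-in sqlite3 module.  SQLite stands in for a real warehouse only because it is easy to run; the class used SQL Server 2005.  The table names, column names, and sample rows are all made up for illustration.  It shows one fact table keyed to two dimension tables by surrogate keys, with a roll-up query on top.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,  -- surrogate key
    product_code TEXT NOT NULL,        -- business key from the OLTP system
    product_type TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,      -- surrogate key
    day      TEXT NOT NULL,
    quarter  TEXT NOT NULL
);
-- Single fact table: foreign keys to each dimension plus the measures,
-- all at one level of granularity (one row per product per day).
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    quantity    INTEGER NOT NULL,
    amount      REAL    NOT NULL
);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "SKU-100", "Pump"), (2, "SKU-200", "Filter")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, "2007-01-15", "Q1"), (2, "2007-04-02", "Q2")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 300.0), (2, 1, 5, 100.0), (1, 2, 1, 150.0)])

# Every dimension joins directly to the fact table: that is the "star".
rows = conn.execute("""
    SELECT p.product_type, d.quarter, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key = f.date_key
    GROUP BY p.product_type, d.quarter
    ORDER BY p.product_type, d.quarter
""").fetchall()

for row in rows:
    print(row)
```

In a snowflake version of this sketch, product_type would move out to its own table and dim_product would point at it, which adds one more join between the fact table and the type.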

Catalog - A database defined on an OLAP server.

Cube - A multidimensional data subset derived from relational tables, or a data warehouse, organized for rapid query response.  What I have called in the past, "Database Views on turbo steroids"!

Dimensions - Dimensions represent how the data is accessed.  Usually a dimension represents a thing, a person, a place, or some other type of entity.  Dimensions answer the "who", "what", or "where" of a question, as well as the "how" or "by" part.

Hierarchy - A hierarchy is a structure within a dimension that represents the organization of data grouped in levels from most general to most specific.  An attribute hierarchy is a flat hierarchy containing a single attribute.  This would include a single level break out such as a product type.  A user defined hierarchy is a hierarchical arrangement of attributes.  This would include a break out of product type into sub product type, creating a multilevel hierarchy.  A level represents a specific detail about data within the dimension.  A member represents a specific value contained within a level.
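
A quick way to picture levels and members is to roll the same data up at two depths of a user defined hierarchy.  This is only a sketch in plain Python with made-up product types and amounts, and the roll_up helper is my own illustration, not anything SSAS provides.

```python
from collections import defaultdict

# Hypothetical rows: (product type, sub product type, sales amount).
# "Product type" and "sub product type" are the levels; values such as
# "Pumps" or "Spa Pumps" are the members of those levels.
sales = [
    ("Pumps", "Pool Pumps", 300.0),
    ("Pumps", "Spa Pumps", 150.0),
    ("Filters", "Sand Filters", 100.0),
    ("Pumps", "Pool Pumps", 50.0),
]


def roll_up(rows, depth):
    """Sum the amount by the first `depth` levels of the hierarchy."""
    totals = defaultdict(float)
    for *levels, amount in rows:
        totals[tuple(levels[:depth])] += amount
    return dict(totals)


by_type = roll_up(sales, 1)     # most general level
by_subtype = roll_up(sales, 2)  # drill down one level

print(by_type)
print(by_subtype)
```

Rolling up at depth 1 behaves like the flat attribute hierarchy, while depth 2 is the drill down through the user defined hierarchy.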

Measures - Measures are key metrics, often referred to as the key performance indicators (KPI), of a business.  Measures answer the "what" part of the question, and are usually additive, discrete, and easily measured.  One example is sales, which is of course easily additive.  Others would include cost per item, item sale price, sold price, and other set values.  A calculated measure is the additive sum of the item sale prices, or sold prices, for a period of time, by quarter, or by some other measurement.

Best Practices from Day #1

  • Use staging areas.  Staging areas help abstract, clean, and often speed up the process.
  • Surrogate keys should be used for dimension tables.
  • Use a shared bus of dimensions.
  • Fact table foreign keys should be the surrogate keys, not business keys.
  • All measures in a fact table need to be at the same level of granularity.
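
The surrogate key points above can be sketched as a small staging step.  This is a rough illustration in Python rather than anything from the class; the function name, the SKU codes, and the row layout are all assumptions I made up for the example.

```python
def assign_surrogate_keys(business_keys, existing=None):
    """Map each business key to a stable integer surrogate key.

    `existing` is an optional mapping from an earlier load, so keys that
    were already assigned keep the same surrogate value.
    """
    mapping = dict(existing or {})
    next_key = max(mapping.values(), default=0) + 1
    for business_key in business_keys:
        if business_key not in mapping:
            mapping[business_key] = next_key
            next_key += 1
    return mapping


# Staging area: rows as they arrive from the OLTP side, keyed by business key.
staged = [("SKU-200", 5, 100.0), ("SKU-100", 2, 300.0)]

key_map = assign_surrogate_keys(["SKU-100", "SKU-200", "SKU-100"])

# Fact rows carry the surrogate key, never the business key.
fact_rows = [(key_map[code], qty, amount) for code, qty, amount in staged]

print(key_map)
print(fact_rows)
```

Keeping the mapping in the staging area means a business key can be renamed or reused upstream without touching the fact table.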

My Two Cents $$

When designing cubes, one of the first things that needs to be done is to forget.  Forget the OLTP, normalized, technical break out of data and storage.  The first thing to do is to think of the questions that need to be answered.

  • What are the roll ups and sales for product x through y for the first quarter broken down into weeks, possibly even days, or time of days?
  • What are the locations that sales occur in, what time of year is it, and how does that correlate to the weather (summer clothes sell in the summer, winter in the winter, etc.)?
  • What is the break out of planned, projected, and literal sales and expenses against revenue?

This is one of the things that I find absolutely exciting and interesting about the field of business intelligence.  Business intelligence is the good way, the best practice way, the smart way to make business decisions based on empirical evidence!

Now For The Microsoft Architecture

In Microsoft's Technology Stack the UDM, or Unified Dimensional Model, is the cube itself represented in the Visual Studio SQL Server Analysis Services Interface.  The UDM combines both relational and analytical elements.

For more about specifics of BIDS, SQL Server Management Studio, and other Microsoft Architecture definitions check out my previous entry on these items.

To assist in visualizing what each part of the SSAS Visual Studio IDE does, make sure to clarify the various levels of abstraction.  The abstracted points to know are the Data Source Views (DSV) in relation to an actual Cube, where external sources come from, and how the various SSAS mechanisms work in conjunction with SQL Server Integration Services (SSIS).

The data source view provides an abstracted layer that the Cube works on.  DSVs are derived from existing warehouse tables.  These DSVs are also correlated appropriately for presentation within the Cube.  A common mistake is to think that a DSV can serve as a cube.  This is not suggested, as major bottlenecks and other issues can occur.

Also, remember that the data warehouse is very important, not just as a data repository but also as an integration server.  Not being able to execute SSIS and the associated ETL processes separately from the actual OLTP database is a sure-fire death sentence for the OLTP Database Server.

OLAP Database

The OLAP database is the top level container for OLAP Objects.  This top level object contains the following objects: data source, data source view, cube, dimension, and security roles.  The data source is the provider or data connection.  The cube access is provided at this level along with the processing of the cube, retrieval with ROLAP and HOLAP, proactive caching, and write backs.

On the last note, as a kind of summary for the class, the following points were made.  One: Don't use any auto build features.  Two: Have your cube designed with the appropriate keys in place.  Three: Use a star schema data source to simplify dimension management.  Four: Dimensions are shared; be careful not to duplicate.  Five: Use surrogate keys in the base data source.  Six: Add all columns of interest as attributes.

My Summary:  The day went well with a lot of material covered.  Fortunately for me, since I was exhausted from a crappy night's sleep, most of the material was merely a refresher.  It was good, though, to step back through all these things and to see the SSAS way of doing things.  The last time I was doing this, as I've probably mentioned, I was condemned to use SQL 2000.  All in all, I had a blast getting to refresh the material and learning a few things about SQL Server 2005 Analysis Services.

The Guru(s) of MDX, OLAP, and SQL Server Analysis Services

While ramping back up for heavy duty OLAP & BI work there were a few names that kept popping up.  These are definitely the heavy hitters in the field.  For my (and your, fellow reader) future reference.

Posted by: Adron
Posted on: 10/22/2007 at 1:40 PM
Categories: Website and Application Write-Ups | Business Intelligence and Analytics

Web Analytics Grasped, Week One Wrap Up

Web analytics are derived in two primary ways so far (more on this later): JavaScript tags and web server log files.  This is pretty much the universal way that web statistics are collected.  There are dozens of books out there that cover this in dozens of different ways: Web Analytics Demystified (which I'm reading now), Web Analytics: An Hour a Day, Actionable Web Analytics, Web Site Measurement Hacks, Google Analytics 2.0, and, diverging into a sub-topic of web analytics, Search Engine Optimization: An Hour a Day.  These books of course cover much more than just web analytic tagging methods, but they're good resources for discovering more about the field.

Web analytics can be used by almost any sector of the web industry.  Some of these divisions include content, commerce, lead generation, and self service sites.  Each of these divisions would track differing statistics, ranging from gross margin, gross margin return on investment (GMROI), net profit, total sales, average order size, or accessory attachment rate all the way to average page views per visit, average visits per visitor, click through on on-site ads, and first-time versus returning visitors.  Web analytics cover the full spectrum of feature sets in almost any type of web application you could think of.

I've also started to stumble into specific techno analytics speak.  One of the words that hit me kind of funny is sessionizing.  First off, it is not technically a word.  It breaks convention from a real word that would be created or used in the English Language.  But really, I guess nobody cares if it gets the point across.  Plus we're talking about the marketing industry here; it couldn't possibly hurt to draw more attention to something with faux words.

Sessionizing is a custom web analytics verb.  Sessionizing is the process of assigning a unique visitor to one or more actions that occurred within a defined period or visit.  There are other words, and analytics speak, that I will cover later.  For now, I'll just let the word sessionizing sink in a little bit.  If you think I'm joking, I'm very serious, so be aware when you hear it in the analytics industry that it truly does mean something now.  I'm sure it will end up in the dictionary within the next few years anyway.  Then it will be as real as any other English word.
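
For the curious, here is roughly what sessionizing could look like in code.  This is my own minimal sketch in Python, not how any particular analytics product does it.  It assumes each hit is just a visitor id plus a timestamp, and it assumes the "defined period" is a 30 minute inactivity timeout, which is a number I picked purely for the example.

```python
from datetime import datetime, timedelta


def sessionize(hits, timeout=timedelta(minutes=30)):
    """Group (visitor_id, timestamp) hits into sessions per visitor.

    A new session starts whenever the gap since that visitor's previous
    hit is longer than `timeout` (an assumed inactivity rule).
    """
    sessions = []      # list of (visitor_id, [timestamps in the session])
    open_session = {}  # visitor_id -> that visitor's most recent session
    for visitor, ts in sorted(hits, key=lambda hit: (hit[0], hit[1])):
        current = open_session.get(visitor)
        if current is None or ts - current[-1] > timeout:
            current = []
            open_session[visitor] = current
            sessions.append((visitor, current))
        current.append(ts)
    return sessions


# Hypothetical hits, deliberately out of order.
hits = [
    ("a", datetime(2007, 10, 19, 9, 0)),
    ("b", datetime(2007, 10, 19, 9, 5)),
    ("a", datetime(2007, 10, 19, 10, 0)),
    ("a", datetime(2007, 10, 19, 9, 10)),
]

sessions = sessionize(hits)
for visitor, timestamps in sessions:
    print(visitor, len(timestamps))
```

Visitor "a" ends up with two sessions here because the 10:00 hit comes 50 minutes after the 9:10 hit, which is past the assumed timeout.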

Really though, it is just one more entertaining part of this part of the technology sector!

Starting here has made me ponder some other things as well.  Some I can't mention yet; others include what I would measure on this site, how I could use this to improve my search engine optimization, and other such things that I generally don't ponder.  I get the feeling that I'll be thinking a lot more about these things in the coming days and weeks.  Cheers to that, I'm off for a few beers and a weekend of doing absolutely nothing in particular!

Posted by: Adron
Posted on: 10/19/2007 at 4:43 PM
Categories: WebTrends | Web Analytics

Thinker, a Real Thinker

Yo!  I made a gross error of neglect in my moving on entry!

Ryan B. - thanks for giving me some serious ideas on book purchases, a very interesting "other section" at Powell's, and for being an all around kick ass programmer!  I hope to run into you in the future!  Maybe I'll be able to show you some of the cool things I have going on in my set path finding navigation application.  I've actually started on it AND am making some decent progress.

Posted by: Adron
Posted on: 10/13/2007 at 1:09 PM

Powell's Recycles :: Books

I decided that on my first official day 100% back in Portland I was finally going to get my "office" set up over in the south east side of town.  In doing so Joleen and I spent about 4 hours getting things cleared up and trash tossed, and I started stacking all the books I no longer needed or wanted.  At the end of the night I had approximately 40 extra books that I had no use for.  Instead of tossing them in the trash with everything else, I decided I'd let Powell's decide whether they could sell them or just recycle them.

I went in today and made an amazing $6.00 credit toward my purchase of some magazines and another book.  I was stoked that Powell's will do this.  In addition, I didn't have to throw anything away or worry about where to take it to recycle; they just handled all the rest.

Powell's ROCKS!  Love this place.  Now off to run some errands and dig into my recently purchased magazines and my Ryan-influenced purchase of O'Reilly's Collective Intelligence.

Posted by: Adron
Posted on: 10/13/2007 at 1:06 PM