Hello! This post is a contribution to T-SQL Tuesday. T-SQL Tuesday is a monthly blogothon where we get together and write about a different topic. March’s topic was to blog about you favorite data type, hosted by Brent Ozar.
I wrote two versions of this post. This one you’re reading in English, and this one in Portuguese.
If you work with data, you probably do not have control over all the data sources you need. What I mean, is that for example, you may receive data from different places. Perhaps it’s your job to centralize and standardize it the best you can, so that it makes sense to your team. Once you understand the data, value can be extracted from it.
When your data comes from different systems, it’s likely you will not have control over the data’s validation. For example, one of you vendors may be very specific about the data types allowed into their systems, which means when the data gets to you, you’ll see something (ideally) more structured. However, your data may also be handled by a group of fed up developers who decided they will allow everything the user wants (special attention to the the verb there being want instead of need, big difference).
Date data types are really important and used in SQL. But for humans, dates can be formatted in some ways… For example, I’m from Brazil, and the way we write our dates is different from the US.
- Brazil: DD/MM/YYYY
- US: MM/DD/YYYY
In SQL, your date type stores data like this: YYYY-MM-DD. No room for mistakes, right? Bam, Wrong.
Remember when I said you may have different data sources and you can’t control their data type validations? Let’s think of the following example:
- Your clients use a 3rd part system
- The 3rd party uses the data to do whatever it is their system does
- They send you that data with the results of your project
- You, a smart data person, tries to load the data into your system. Most importantly, your system is formatted with the data types you expect to receive. So, for example, if you expect a field with a date value, you’ll format your table to have a date type column.
- Let’s use the table “ThirdPartyInfo” as an example.
Now, as should have assumed by now, your third party did not applied any data validation to the date types. Hence, you may get some crazy “dates”, like this:
- 01-02-03 (where to even begin with this one?!)
- and many others….
Here’s what happens when you try to insert something that’s not a date, into any of the date columns:
It does not matter the method you’ll use to input data to your table, you’ll get an error if you’re not passing date values to your date types columns.
How can you avoid issues like this
- Be open to your vendor about why this is important to your data, and explain to them your tables data type. How? Documenting, the thing IT people hate most.
- If the vendor is pushing back, talk to your superiors, and show them a scenario in which you need to spend your precious (
expensive)time to fix this mishap. Enforce this could be avoided if everyone were on the page about the data types for the data you share.
- If nothing above works, or you need a temporary solution, you could validate the data on your end too. More work upfront, but your future self will thank you for putting this effort now.
Learning it the hard way
- Real life example: I recently had an issue that cost me a lot of time. I had received a csv I needed for reporting, and the file had a few date fields. My table, was expecting date types to come in all the fields, but I was getting errors on my data loading job.
- I had to take a step back and find where the issue was. I thought it would be easy to find, by checking the most recent info that got into the system (this was a daily file we received), and so I started looking for the issue. My main mistake, was that I did not isolate the date column that was giving me an error. In the SQL Server, the error message was really vague, I could only tell there was a string trying to be converted to date and I had several different date fields in my table.
- My second big mistake was fixing everything I found on the source file, and I thought was wrong and causing the issues. Spoiler alert: it wasn’t.
- It took me some time to realize that “ISDATE” was an easy way to search for a column that is expecting a date type, but received something else instead:
You see that the return to that query has the string value. Now, you could apply this kind of check as a validation before you load data into your production table, and also use it to troubleshoot issues like mine.
With all that being said, I actually like the DATE data type in SQL. It works great, the real issue is that we as humanity never agreed on a single date format. Sigh. I hope this helps!
If you want you can read more about the Date types here: Date types in SQL
Would you solve my problem in a different way? Tell me how below in the comments (: