Book review – “Storytelling with Data”, by Cole Nussbaumer Knaflic

Hello! As you may have noticed from my Reading List page here, I like to read. Recently, with the new job, I was looking for a book about data visualization. While searching, I came across “Storytelling with Data”, and it was not the first time I had seen it. After checking a few reviews, I decided to invest my time in reading it. Turns out it was a great decision! I liked it so much that I wanted to talk about it here, so here it comes: grab your reading glasses.

Who should read this book?

I believe this book is great for beginners to BI and data visualization. It does not require any fancy tools; in fact, the author says she used Excel in all the examples. In short, if you work with data and have to present it to others, this will be a valuable read.

What’s covered in this book?

In the main chapters, you’ll find information about how to pick the right design for the story you are trying to tell. Below is a very brief summary of some of the chapters.

The importance of context

Sometimes we get so into the analytics part of the visualization that we can easily forget the context behind it. That is why it’s good to think about who this data is for, what they are planning to do with it, and which questions this viz is trying to answer. If you don’t consider this, you’re just plotting numbers on a report that people will not know what to do with.

Key points: engage with the audience, and leave prompts with calls to action that guide them to where they should look to understand what is being presented.

Effective visuals

Sometimes, the shiny things are not the best. As a BI dev, you may want to go crazy and try something that is rarely used, because it’s cool. However, when you do this, you lose sight of the main objective of your work: you’re not doing this for yourself. Somebody or some team requested this, and people are more at ease with things they already know how to operate. If they only use Excel and visualize data with tables, they may not be interested in seeing an area or stacked bar chart, because that translates to extra brain effort.

Key points: always avoid pie and donut charts, because our eyes can misinterpret them. When possible, stick to forms that are familiar to the audience (bars, line graphs etc).

Eliminating clutter

This is a hard one for me. Eliminating what is unnecessary is one of the most difficult things to do. If you try to tell too much at once, you end up not telling anything at all. It’s good practice to keep it short and keep it simple. Answer what is being asked, and remember you do not need to make the audience take the analysis journey with you. That was your job! Their precious time will be used to see the final results. You can brag about the hard work you do in a meeting with your managers, for example (performance reviews exist for this!).

Key points: only show what’s important for the context of the story, remove repetitive things and summarize when possible. Even consider plain old big numbers to highlight very specific points instead of showing a whole graph (for example, when showing how a product’s price changed over time, instead of using a graph, just say “Product X had an increase of XX% in x years.”).

Focusing attention

When looking at a data visualization, it’s easy to get lost. If it’s too colourful, too bright, or too dull, you may not know what key point is being presented to you. As mentioned, the audience is not there to go through the analysis with you. That part should already be done, and you can draw the audience’s attention by using:

  • color: you could tone everything down, for example by using gray tones, and use one single color that catches someone’s attention because it stands out so much.
  • bold letters: bold letters are good because they don’t make your viz look cluttered, as underlining may do, and they are more noticeable than italics.
  • sizing: you can make things bigger to call attention to them, just make sure it’s an appropriate size.

Think like a designer

Put on your designer hat! She mentions in the book that people tend to perceive beautiful designs as more efficient. This point may be neglected when you don’t have much time to put into the design, but taking this extra step shows respect for your audience and your data (aww!).

Key points: who doesn’t like pretty things???


Storytelling is what ties it all together. You could have done a great analysis, found awesome stuff in the data, and presented it beautifully, but if you’re missing the context because you can’t put it into a “story”, then just like that, you may have lost the audience.

Heard of Death by PowerPoint? Yeah. It’s important to create a narrative that guides people to your findings, using their language, showing the steps, involving them and asking them to follow along with you.

Key points: she really got me thinking about the prompts we can give the audience. Using questions is an awesome way of inviting people to think about what they are seeing. And when that is narrated as a story, it’s more compelling. It’s hard to follow numbers on a page; it’s easy to see a story develop in front of you.

Would I recommend this book?

It’s clear by now that I loved it. It feels like I took a shortcut around some headaches that would otherwise have come with experience in BI. That being said, don’t go in expecting technical material about how to actually create vizzes, especially with a tool like Tableau or Power BI. There is nothing in there about that.

Other resources

Cole has a blog called (brace yourselves) Storytelling with Data! It’s full of resources and has more info about the workshops and material they have.

That was it for me. I’m excited to check out the books that were mentioned in this one, and if they’re interesting I may write reviews of them too. See you soon!

My sketchnotes from Tableau Conference 2021 – Data Culture & Data Governance

Hello! Recently I’ve started a new job as a Business Intelligence Analyst (I’m thrilled about this :D). One of my team’s main tools is Tableau, which is used to create data visualizations. I’ve always loved this topic, but it was not something I needed to apply at work… until now!

Tableau’s conference is happening from November 9-12. I attended the first day yesterday, specifically the sessions about Data Governance and Data Culture, since these topics will be present in my day-to-day work.

Some things about me:

  • I’m a visual learner;
  • I like to take notes, they help me structure what I just heard;
  • I love drawing (love does not equal being great at it, you can love it too!);
  • Sometimes it’s hard for me to just sit there and listen to a talk, my mind can drift away very quickly.

I once saw Kendra Little’s sketchnotes method and thought it was awesome, so I’m giving it a try now. Basically, during the live conference you take notes, but without writing too much. This method forced me to condense the topic even more and to use visual aids to make the notes more appealing. For a first try it was interesting. It was a great help for me, because I learn better if I’m writing something down, and doing it this way gave me more freedom.

Without further ado, here are the notes so far! I hope that checking them out helps you understand what each session was about and inspires you to try your own!

Data for the win: how two olympic athletes use data
Data Literacy: How to get your company involved in data
Empower users to know, use & trust governed data
Creating Data-Driven Organizations

I know there are typos in there, my goal was not to take perfect notes, but to absorb as much information as possible. I hope this was useful and let me know if you try it!

Let’s all data!

#Data21 #Tableau

Work/Life Balance – #TSQL2sday

This post is a contribution to T-SQL Tuesday! I’m excited to be here once again 🙂

This month we’re talking about Work/Life balance and I’m contributing with some things that I’ve learned along the way.

When I start thinking about the topic, work/life balance sounds like something that can be separated into two big parts: your personal challenges, and the culture of the place you work at. It’s important to be aligned with your employer’s culture, but only up to a point that makes sense and is comfortable for you.

What I mean by comfort is making sure that you’re able to have your professional life and still have time for your own stuff. After all, work is only part of your life.

I have been working since I was 15. Since I started, I have developed a better idea of how to take care of my mental and physical health. These two topics are my pillars when thinking about the balance I need. I want to be able to make decisions about my professional life while still taking care of myself along the way.

Mental health is still health!

There’s a big stigma that surrounds this theme. Some people are afraid to talk about it. That’s why whenever I’m having a casual conversation at work, I’ll usually tell the person that I’m currently in therapy. I like to talk about my hobbies and things outside work. I believe that sometimes we get stuck on day-to-day tasks, and it’s a nice reminder for others that hey – you are also human! Also, after I open up more, others seem more encouraged to talk about their own stuff.

We need to acknowledge, and be constantly reminded, that we’re all people with our own outside-of-work lives.

Things that help me mentally

  • making sure I take some time off: who does not love vacations? They’re great, but lately I’ve been paying more attention to the hours when my body and mind work best. During the day, I’ll make sure to block time for highly focused work when I’m most alert. I know that I don’t function before a good breakfast, and that coffee (or just taking the break!) is a perfect mood booster for my afternoons, when I’m getting tired. This leads to…
  • taking smarter breaks: this one is easier said than done. I have a hard time not reaching for social media whenever I need a break. So sometimes I’ll try to do something small like grabbing a snack, filling my water bottle, stretching along with a guided video, reading a chapter of a book, reading an article etc.
  • never working when I’m sleepy, thirsty or hungry: this is something I always felt internally. Then, last year I attended a conference where one of the presenters said this as she was finishing her talk, and it was like my unconscious shouting at me “I’ve been trying to warn you about this, other people feel it too!”. This is why it’s so important to pay attention to your internal body clock: knowing when you’re likely to be hungry or tired can help you plan your day better. Pro tip: don’t be the person who schedules meetings during lunch time, people will not love you.
  • having hobbies: besides making good conversation topics, hobbies let me do the manual work I love, like drawing and cooking, and I’ve even learned some macrame. It’s amazing how deep into the flow I get when I’m doing an activity that requires my full attention like that. Doing something that relaxes you is like giving your brain a break from work, so that the next time you have a challenge in front of you, it may be easier to think in a different way and come up with fresh new ideas.
  • meditating: I’m anxious. Most people say I look super zen. Lucky for them (and me), they can’t see what goes on inside an overthinker’s head. Most of the time, I get lost in thought. It happens. I try to deal with it by acknowledging the facts: this is just a thought that I created, or this is just a feeling, and everything is fine, nothing will blow up. To me, meditation is not about relaxing, but about awareness. It’s an anchor that provides tools for when I’m drifting off, concerned about things that will probably never happen.
  • being open with your loved ones: with most of us working from home, it’s likely that you need to share your space with others. I try to be honest about my work, and I clearly say when I’d rather not be disturbed for a couple of minutes, or that maybe it’s better to start lunch without me because I’ll be late.

Taking care of your body

I’m not going to lie and say I love exercising. I wanted to be the person that wakes up at 5AM, drinks some weird green juice, gets yoga done, takes a shower and is ready to work before everyone else is even up! I’m not that person and I probably never will be.

I have a tendency to visualize this perfect scenario and then, if one tiny part does not go as planned, I’ll just throw it all away. Waking up at 7AM instead of 5? I guess I won’t have time to exercise today after all… Did not exercise when I was planning to? Oh, I guess there’s no point in having my healthy dinner then, I’ll just have pizza again.

Because of this behaviour of mine, I need to remind myself that it’s ok if I could not wake up at the right time. Or if today I have a headache that won’t allow me to move as freely as I wanted to. Sometimes I’m just not in the mood.

The important thing is doing little things. Doing the best you can do. Each day. Don’t be too hard on yourself. But also make sure you’re not being too soft, as if you were your own grandma giving in whenever the kids want something.

Some things that help me take care of my body

  • exercising, and doing so because I love my body. Try different things, eventually you will find something that will help you relax, disconnect, feel stronger even! This leads to a more positive mindset for me. Like the day I did a plank for over a minute and felt like NOBODY MOVE, I’LL FINISH THAT PROJECT TODAY.
  • eating healthy, whole foods: I come from Brazil, and we have one of the best cuisines, not because it’s fancy, but because it’s nutritionally rich. Most people eat well-proportioned, home-made food that should be (nutritionally) enough for our bodies. Not to mention we actually leave our computer desks and take proper breaks to eat and refill on energy. I was not someone who cooked, but apparently I like it now, and I’ve learned to value the full process more. At home, we make lists of what we’ll eat during the week, then go grocery shopping, and sometimes we even do meal prep so that we can freeze things. Having my lunch frozen so that all I need to do is microwave it makes me instantly happier and gives me extra free time during my lunch break, yay! (Of course, this is coming from someone with no kids and with a partner who shares this work with me, so I’m speaking from my own experience 🙂 ).

This is part of what helps me maintain a good work/life balance. I’m also part of a group of people at work that encourages others to be more physically active. We promote weekly stretching and meditation sessions with the team. I think that even just talking about it at work and making people see our efforts and concerns reminds them of “hey – other people are feeling the same and they care!”. Thinking of our shared human connection can have a huge impact on our work/life balance.

Special thanks to TJay for hosting this month’s theme 🙂

Do you need SQL to work in the data field?

Illustrations by Camila Henrique

I wanted to talk about this because I see a lot of doubts and a lack of direction among people who are starting out in IT or thinking about switching careers. The short answer to the question “do I need to know SQL to get a job in the data field?” is yes, you do. In the next few paragraphs I explain why I think so.

It’s so easy to get lost among so many programming languages and project methodologies that sometimes the basics get left behind. I believe that having a good foundation can open doors you would not have seen before. I’m certain that SQL is a great skill to have in the data field.

Data jobs have been in high demand for some years. There is a lot of speculation about what a person in this field does (and really, it can vary a lot). However, there is one skill that always shows up in job descriptions. Can you find it in the examples below?

LinkedIn, July 2021

LinkedIn, July 2021

LinkedIn, July 2021

SQL is always in demand for many job openings, because it is part of the basics of the field. When you master the basics, you have a great chance of thriving. You can learn great things about the data field, but in the end, at some point you will need to deal with it: a database. And guess what: SQL is precisely how we communicate with databases, their own language (I think that’s beautiful, stop judging me!).

I’m working on a series of posts dedicated to people who would like to learn SQL from scratch. My focus will be the Microsoft product, MS SQL Server. They have a “version” of SQL just for this database, called T-SQL. I hope to share my knowledge with you and, who knows, maybe help someone along the way. This is my way of giving back to the community 🙂

Questions or suggestions? My comments are always open!

If you’d like to read this same post in English, read it here.

Do you need SQL for a data related job?

Illustrations by Camila Henrique

I wanted to talk about this because I see a lot of doubts and lack of direction from people who are either just starting out in IT land or thinking about switching careers. The short answer to “do I need to know SQL for a data job?” is yes. In the next few paragraphs I explain why I think so.

It’s easy to get so caught up in all the fancy programming languages and project methodologies that sometimes the basics… are just not there. I believe having a good foundation opens doors that you could not see before. And I’m certain that SQL is one hell of a foundation to have in data land.

Data jobs have been hyped for the past few years. There’s a lot of speculation about what a data person’s job is (and it actually can vary a lot). However, there is one skill that seems to endlessly haunt job descriptions. Can you spot it below?

Taken from Linkedin, July 2021

Taken from Linkedin, July 2021

Taken from Linkedin, July 2021

SQL is always in high demand for any data job, because it’s part of our basics, and knowing your basics can help you thrive. You can learn a lot of interesting stuff about data, but in the end, you’ll most likely need to get your hands on a database at some point, and SQL is how you talk to a database (and I think that’s beautiful. Stop judging me!).

I’m working on a series of posts dedicated to people who would like to learn SQL from scratch. I’ll focus on the Microsoft product, MS SQL Server. It has its own version of SQL, called T-SQL. I hope to share my knowledge with you and perhaps help someone along the way. This is my way of giving back to the community 🙂

Any questions or suggestions? My comments are always there for you.

If you want to check out this same post in Portuguese, click here.

Invitation to an online event: Women Techmakers Montreal


Hello! Today’s post is an invitation to the Women Techmakers Montreal event.

Women Techmakers is a program created by Google to celebrate International Women’s Day and to highlight the talent of women in technology. This program has held more than 200 events globally, across 52 countries. The next event will be on Saturday, March 20, 2021.

Last year, I remember being very excited to attend the event in Montreal. But… COVID happened and events were cancelled. Luckily, the organization switched to an online event. It was a wonderful day! Full of super inspiring women and a beautiful community.

This year, I wanted to: 1- help more people discover the event, and 2- get more people to take part!

Who can join?

All genders are welcome! If you like technology, I’m sure you will find a topic that interests you. Sign up and enjoy! (:

Important: since it is held in Montreal, all content will be in English or French.

The event is free and lasts all day (from 9AM to 6PM EST). Here you can check the day’s agenda.

If you want to see what happened last year, here is a playlist with the sessions.

For more information, visit the official site.

I hope to see you there!

Check out this post in English.

Event invitation – Women Techmakers Montreal

Invitation to join online event: Women Techmakers Montreal!

Hello everyone! Today’s post is an invitation to the Women Techmakers Montreal event.

Women Techmakers is a program created by Google to celebrate International Women’s Day and to highlight the talent of women in technology. This program has run over 200 events globally, across 52 countries. The next event will be on Saturday, March 20, 2021.

Last year, I remember being really excited to go to the event in Montreal, but then COVID hit and events everywhere were cancelled. Luckily, the organization switched to an online event instead. It was an amazing day! Full of such inspiring women, and such a beautiful community.

This year, I wanted to: 1- let more people know about the event, and 2- get people to join the event!

Who can join?

All genders are welcome! If you’re into tech, I’m positive you’ll find a topic that will get your attention. Sign up and enjoy (:

The event is totally free and lasts the whole day (from 9AM to 6PM EST). Check out their amazing agenda.

If you’re wondering about what happened last year, check this playlist with their sessions.

Find more info on their site.

I hope to see you there!

T-SQL Tuesday – My (least) favorite SQL data type: DATE

Hello! This post is a contribution to T-SQL Tuesday. T-SQL Tuesday is a monthly blogothon where the community gets together to write about a different topic. March’s topic is your favorite data type. Brent Ozar is this month’s host.

I wrote this post in two versions: this one, originally in Portuguese, and one in English.

If you work with data, you probably don’t have control over all your sources. For example, you may collect data from different places. Perhaps it’s your job to centralize the data and create standards so that it makes sense to your team. Once you understand your data, you can extract its real value.

When your data lives in different systems, you may not have control over the validation that happens behind the scenes.

For example, one of your vendors may be very specific about the data types they allow in their system. This means that when the data gets to you, you will see something (ideally) more structured. However, your data may also come from a system led by developers who decided to leave the user “free” to do whatever they want (note the verb “want” rather than “need”, there is a big difference there).

Date data types are very important in SQL. But for us humans, dates can be formatted in different ways… For example, in Brazil we write dates differently from the United States.

  • Brazil: DD/MM/YYYY
  • United States: MM/DD/YYYY

In SQL, your date type is stored in the database like this: YYYY-MM-DD. No way to get it wrong, right? Wrong.

Remember when I said that your data may have different origins and that, because of this, you have no control over the validations? Let’s think about the following example:

  • Your clients use another vendor’s system, passing their data to it
  • The vendor uses the data to do whatever it is their system does
  • They send you the data with the results of the project
  • You, a smart professional, try to load the data into your system. More importantly, your system is set up with the data types you expect to receive. So, say you want to receive a field with date values: you will set up your table to have a date type.
  • Let’s use the “ThirdPartyInfo” table as an example.

Now, as you may have guessed, your vendor did not apply any kind of validation to these data types. So you may come across some strange date types, like these:

  • Jan/2021
  • 03-20-18
  • 01-02-03 (where to even start with this one?!)
  • 2022/2
  • Among others…

This is what happens when we try to insert something that is not a date into any of the date columns.

No matter which method you use to populate this table, you will get an error if you are not passing date values to your date columns.

How to avoid problems like this

  • Be honest with your vendor about why you need the data types you need, and explain what they are. How to explain? By doing the thing IT professionals hate most: documenting.
  • If the vendor is reluctant to change, talk to your superiors and show them scenarios where you have to spend your precious (and expensive) time just to fix this error. Reinforce that this could be avoided if everyone were on the same page about the types of the data you share.
  • If nothing above works, or you need a temporary solution while the situation gets resolved, you could validate the data on your side. Making the effort now, before the problem, may look like more work, buuut your future self will thank you for having made the effort before the problem arrived.

Learning it the hard way

  • Real-life example: I recently had a problem that cost me quite some time. I received a csv file needed for a report, and the file had some date fields. My table was expecting the date fields to come with… date values. However, my data loading job was having problems.
    • I had to take a step back and try to find the root of the problem. I thought it would be easy to find if I looked at the most recent entries in the file (this kind of file was sent to us every day, updated with the most recent things). My main problem was that I did not isolate the date column that was raising the error. In fact, in SQL Server the error message is almost always very generic, and it did not tell me which column it was. All I knew was that a string was being converted to a date, without success.
    • My second biggest problem was fixing everything I found in the file that could somehow be causing the problem. Spoiler alert: that did not solve my problem.
    • It took me a while to understand that using the “ISDATE” function would be an easy way to look for a column that expected to receive a date type but was receiving something else.

In moments of despair, we forget simple solutions like this one.

Note that the result of this query has a text value. Now you can apply this validation BEFORE loading the data into your production table, in addition to using the same command to investigate problems like this one.

In short: I actually like the DATE type in SQL. It works well, but the real problem is that we failed as humanity, since we never agreed on a single format for dates. I hope this helps you in some way!

If you want to read more about date data types in SQL, read here (in English).

Would you solve my problem in a different way? Tell me below in the comments (:

T-SQL Tuesday – My (least) favorite SQL data type: DATE

Hello! This post is a contribution to T-SQL Tuesday. T-SQL Tuesday is a monthly blogothon where we get together and write about a different topic. March’s topic, hosted by Brent Ozar, was to blog about your favorite data type.

I wrote two versions of this post: this one, which you’re reading in English, and this one in Portuguese.

If you work with data, you probably do not have control over all the data sources you need. What I mean is that, for example, you may receive data from different places. Perhaps it’s your job to centralize and standardize it the best you can, so that it makes sense to your team. Once you understand the data, value can be extracted from it.

When your data comes from different systems, it’s likely you will not have control over the data’s validation. For example, one of your vendors may be very specific about the data types allowed into their systems, which means when the data gets to you, you’ll see something (ideally) more structured. However, your data may also be handled by a group of fed up developers who decided they will allow everything the user wants (special attention to the verb there being want instead of need, big difference).

Date data types are really important and widely used in SQL. But for humans, dates can be formatted in many ways… For example, I’m from Brazil, and the way we write our dates is different from the US.

  • Brazil: DD/MM/YYYY
  • US: MM/DD/YYYY

In SQL, the date type stores data like this: YYYY-MM-DD. No room for mistakes, right? Bam, wrong.

Remember when I said you may have different data sources and you can’t control their data type validations? Let’s think of the following example:

  • Your clients use a 3rd party system
  • The 3rd party uses the data to do whatever it is their system does
  • They send you that data with the results of your project
  • You, a smart data person, try to load the data into your system. Most importantly, your system is formatted with the data types you expect to receive. So, for example, if you expect a field with a date value, you’ll format your table to have a date type column.
  • Let’s use the table “ThirdPartyInfo” as an example.
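
As a minimal sketch of what such a table could look like (only the table name comes from this example; every column name below is hypothetical):

```sql
-- Hypothetical layout for the ThirdPartyInfo example table.
-- Only the table name comes from the post; every column is an assumption.
CREATE TABLE dbo.ThirdPartyInfo (
    ClientId    INT          NOT NULL,
    ProjectName VARCHAR(100) NOT NULL,
    StartDate   DATE         NULL,  -- expects real dates, e.g. 2021-01-15
    EndDate     DATE         NULL   -- same expectation as StartDate
);
```

Having more than one DATE column matters for the story that follows: when a load fails, nothing in the table definition tells you which of them received the bad value.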

Now, as you should have assumed by now, your third party did not apply any data validation to the date types. Hence, you may get some crazy “dates”, like these:

  • Jan/2021
  • 03-20-18
  • 01-02-03 (where to even begin with this one?!)
  • 2022/2
  • and many others…

Here’s what happens when you try to insert something that’s not a date, into any of the date columns:
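
A minimal sketch of that failure, using a cut-down, hypothetical stand-in for the table (a table variable, so the snippet runs on its own) and a made-up value that cannot be a date under any day/month order:

```sql
-- Cut-down, hypothetical stand-in for ThirdPartyInfo;
-- a table variable keeps the sketch self-contained.
DECLARE @ThirdPartyInfo TABLE (ClientId INT, StartDate DATE);

-- A well-formed ISO date is accepted by the DATE column.
INSERT INTO @ThirdPartyInfo (ClientId, StartDate)
VALUES (1, '2021-01-15');

-- '31/31/2021' has no valid month in any day/month order,
-- so this statement raises a conversion error.
INSERT INTO @ThirdPartyInfo (ClientId, StartDate)
VALUES (2, '31/31/2021');
```

On SQL Server the second statement typically fails with a message along the lines of “Conversion failed when converting date and/or time from character string”, which names neither the column nor the row at fault.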

It does not matter which method you use to load data into your table: you’ll get an error if you’re not passing date values to your date type columns.

How can you avoid issues like this?

  • Be open with your vendor about why this is important to your data, and explain your tables’ data types to them. How? By documenting, the thing IT people hate most.
  • If the vendor is pushing back, talk to your superiors, and show them a scenario in which you need to spend your precious (expensive) time to fix this mishap. Reinforce that this could be avoided if everyone were on the same page about the data types for the data you share.
  • If nothing above works, or you need a temporary solution, you could validate the data on your end too. More work upfront, but your future self will thank you for putting in this effort now.
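
One way that validation on your end could look, sketched under assumptions: land the file in a staging table where the date arrives as plain text, then move only the rows whose text converts. TRY_CONVERT (available from SQL Server 2012) is just one option for this, and the table and column names are hypothetical.

```sql
-- Hypothetical staging table: dates arrive as text, exactly as the vendor sent them.
DECLARE @Staging TABLE (ClientId INT, StartDateRaw VARCHAR(30));
-- Hypothetical, cut-down target table with a real DATE column.
DECLARE @ThirdPartyInfo TABLE (ClientId INT, StartDate DATE);

INSERT INTO @Staging (ClientId, StartDateRaw)
VALUES (1, '2021-01-15'),
       (2, '31/31/2021');  -- not a valid date in any day/month order

-- Load only the rows whose text converts to a DATE.
-- TRY_CONVERT returns NULL instead of raising an error when the conversion fails.
INSERT INTO @ThirdPartyInfo (ClientId, StartDate)
SELECT ClientId, TRY_CONVERT(DATE, StartDateRaw)
FROM @Staging
WHERE TRY_CONVERT(DATE, StartDateRaw) IS NOT NULL;

-- List the rejected rows so they can be sent back to the vendor.
SELECT ClientId, StartDateRaw
FROM @Staging
WHERE StartDateRaw IS NOT NULL
  AND TRY_CONVERT(DATE, StartDateRaw) IS NULL;
```

Keep in mind that ambiguous values such as 01-02-03 may still convert, just not to the date the sender meant, because the interpretation depends on the session’s language and DATEFORMAT settings. A check like this catches impossible dates, not misread ones.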

Learning it the hard way

  • Real life example: I recently had an issue that cost me a lot of time. I had received a csv I needed for reporting, and the file had a few date fields. My table was expecting date types in all of those fields, but I was getting errors on my data loading job.
    • I had to take a step back and find where the issue was. I thought it would be easy to find by checking the most recent info that got into the system (this was a daily file we received), and so I started looking for the issue. My main mistake was that I did not isolate the date column that was giving me an error. In SQL Server, the error message was really vague: I could only tell that a string was failing to convert to a date, and I had several different date fields in my table.
    • My second big mistake was fixing everything I found in the source file that I thought was wrong and causing the issues. Spoiler alert: it wasn’t.
    • It took me some time to realize that “ISDATE” was an easy way to search for a column that is expecting a date type, but received something else instead:
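
A minimal sketch of that kind of check, assuming the file has first been loaded into a hypothetical staging table where every date column is kept as text (all names and sample values here are made up for illustration):

```sql
-- Hypothetical staging copy of the file, with every date column kept as text.
DECLARE @Staging TABLE (RowId INT, StartDateRaw VARCHAR(30), EndDateRaw VARCHAR(30));

INSERT INTO @Staging (RowId, StartDateRaw, EndDateRaw)
VALUES (1, '20210115', '20210215'),
       (2, '20210301', '31/31/2021');  -- EndDateRaw is not a valid date

-- ISDATE returns 1 when the text can be read as a date and 0 otherwise,
-- so filtering on 0 surfaces the offending rows and shows which column is bad.
SELECT RowId,
       StartDateRaw, ISDATE(StartDateRaw) AS StartIsDate,
       EndDateRaw,   ISDATE(EndDateRaw)   AS EndIsDate
FROM @Staging
WHERE ISDATE(StartDateRaw) = 0
   OR ISDATE(EndDateRaw) = 0;
```

ISDATE judges the text against the session’s language and DATEFORMAT settings and against the older datetime type, so the same string can pass on one server and fail on another. The sample uses the unseparated YYYYMMDD form for the good values because that form is read the same way regardless of those settings.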
In desperate times, we may be blinded by stress and not think about simple things like this.

You can see that the result of that query includes the string value. Now, you could apply this kind of check as a validation before you load data into your production table, and also use it to troubleshoot issues like mine.

With all that being said, I actually like the DATE data type in SQL. It works great; the real issue is that we as humanity never agreed on a single date format. Sigh. I hope this helps!

If you want, you can read more about the date types here: Date types in SQL

Would you solve my problem in a different way? Tell me how below in the comments (:

Tips on sites and podcasts about data and technology

This post was originally written only in Portuguese, since in it I recommend content for Brazilians.

Photo by Avel Chuklanov on Unsplash

I love keeping myself informed through blogs and podcasts. I’m always subscribing to different sites, blogs and communities out there. When I’m not in the mood to read and I’m looking for something more laid-back, or I want to get to know other professionals, I listen to podcasts. I believe these are easy and fun ways to stay informed about what’s new and what’s going on in the community.

Besides that, sites and podcasts let me learn about things I might never have looked for. With that in mind, I decided to share some sites and podcasts that I follow and recommend. Small reminder: I work in the data field, so most of my tips will be related to it.




If you speak English and want to practice, here are my favorite blogs:

  • Brent Ozar, Microsoft Certified Master: Brent works as a consultant and has several SQL Server trainings, a blog and a YouTube channel (ps: his trainings are fantastic, I highly recommend them!)
  • Haystacks, data science blog by Caitlin Hudon
  • Little Miss Data, data science blog by Laura Ellis
  • SQL Server Central, the SQL Server community blog
  • SQL Authority, blog about SQL Server by Pinal Dave
  • Towards Data Science, data science blog on Medium

Know of any other site or podcast that’s not on the list? Leave a comment below! 😉