Creating a Data Governance Strategy to Live and Hustle By
Almost everything you need to make moves right now is already within your current digital footprint. The challenge is that you are likely generating digital artifacts without organizing them or giving them proper context. Your story, your current situation, your journey and your future are defined in your digital pictures, your readings, your software, your social media postings, the articles you clipped and saved and the notes you have taken.
Your first mission is to learn how to govern the data you bring into your personal and business life so you can maximize the value of that data for yourself. And let’s get straight to the point: put all of your data in one basket!
What I recommend is that you consolidate every piece of data you gather from business, life and family into one repository. Consolidate everything! Your real life is you being yourself 24 hours a day, whether it’s business, relationships or just living, so your digital governance needs to be holistic and show how every piece of data connects to the sum of your true self.
The Approach
The goal of your data governance is to discover how much data you already have to date, how data is entering your life and how you are processing the data to extract the maximum value.
For example, an average person can have pictures on their camera SD card and their mobile phone SD card, additional pictures in the cloud, documents in the cloud, memos on their mobile phone and documents on their tablets and laptops. That is digital data all over the place on separate storage devices, with no cohesive strategy for understanding the digital footprint you are subconsciously collecting, or why you are collecting it.
Your strategy is to put every data point you encounter or create into one central data repository. Then you analyze the data in that repository to start piecing together who you were back then, who you are right now and where you are headed. All of your answers are in the data you gathered, and that is the key you need to unlock to truly know yourself.
You may not have realized this, but the data you are collecting is really the past version of yourself sending clues about who you are. In other words, the data you are collecting is more important than the concept of visions and dreams, and it’s time to decode it.
Looking to the future, you want artificial intelligence or machine learning to be able to look at your data and help you understand who you actually are. That’s the real future goal, not only for you but for your children.
Extract Transform and Load
Extract, Transform, Load, known by the acronym ETL, is a process that pulls data out of its sources, converts it into a consistent form and loads it into a destination. In your data governance strategy, you have to establish an effective operation to extract data from various sources and transform it so it can be loaded properly into your data repository.
Extract from Personal Devices.
There are two types of data you are extracting: physical data and digital data.
Physical Data. This is your notebook, your paper receipts and the other things you either write down or receive, such as a business card.
Digital Data. This is either physical data you captured digitally with your camera or a scanner, or digital data you acquired by downloading from the web, from email attachments and so on.
What you should do is determine which data points occur in your life every day, and there is a lot you can capture. For example: what time you wake up in the morning, how much you spent on public transit or gas, how many calories you take in when you eat, the receipt from the store, your to-do list, printed recipes, a good article you read and more. How are you capturing this data you are consuming? Starting tomorrow, look for opportunities throughout your day to capture physical data and digital data.
Once you have scanned all of your physical data into digital form and identified the digital data on your cell phone, your tablet, your SD cards, your laptop and more, copy all of the digital files to one location on your computer, on multiple computers or on a storage drive. In my extract method, I created a folder named /data_dump on every computer I had and copied every digital asset I owned on each device to that folder. This is how I extracted the data and stored it in one space.
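As a rough illustration of the extract step, here is a minimal Python sketch that copies every file from one device folder into a data_dump folder. The function name extract_to_dump and its parameters are made up for this example, and it assumes the dump folder sits outside the folder being copied. It is a sketch of the idea, not the exact method used above.

```python
import shutil
from pathlib import Path


def extract_to_dump(source_dir: str, dump_dir: str) -> int:
    """Copy every file under source_dir into dump_dir, keeping the folder layout.

    Returns the number of files copied. Nothing is deleted from the source.
    Assumption: dump_dir is NOT located inside source_dir.
    """
    source = Path(source_dir)
    dump = Path(dump_dir)
    copied = 0
    # Build the full list first so the walk is not affected by new files.
    for path in list(source.rglob("*")):
        if not path.is_file():
            continue
        target = dump / path.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also keeps the file's timestamps
        copied += 1
    return copied


# Example call (hypothetical paths):
# extract_to_dump("E:/DCIM", "C:/data_dump/camera_sd_card")
```

Copying instead of moving leaves the originals on each device untouched until you are sure the dump is complete.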
Transform into Unique Data Item
The job of transforming is to treat each item as a unique file, even if the same picture or document resides on two different computers. The problem with most commercial file sync software is how duplicate files are handled: the software usually prompts you to replace the file or ignore it.
There is software on both Apple and Microsoft systems that moves files to an external drive, as well as external hard drive backup software, but these tools just give you a copy of the file folder, which is not what you want. What you want is to transform every file into a unique digital unit, even if it is a duplicate. You will sort everything out later.
What we did was create a Windows-based C# script that scans the folder and copies each file to a folder named after its extension. So if a file is named photo.jpg, it goes to a /JPG folder. In addition, the file is tagged with a GUID, so it is renamed photo_xxxx-xxxx-xxxx-xxxx-xxxxx.jpg to make it a unique data item. This script will be made available for download, so check back later.
By organizing all of the files by file extension, we now have all of our picture files, Word documents, PDF documents and video files separated into their own folders. This sets the stage for a data structure where you can find files by type, our machine learning can find files in the repository and, when we see a video file, we know to store it in the video section. This is the organization we are setting up.
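Until that script is posted, here is a minimal sketch of the same idea in Python rather than C#. It is not the author's script. The function name transform_to_unique_items and the NO_EXTENSION folder for files without an extension are assumptions made for this example, and it uses Python's uuid module to produce the GUID.

```python
import shutil
import uuid
from pathlib import Path


def transform_to_unique_items(dump_dir: str, swamp_dir: str) -> list[Path]:
    """Copy each file into a folder named after its extension, adding a GUID.

    "photo.jpg" becomes "<swamp_dir>/JPG/photo_<guid>.jpg". Duplicates are
    kept on purpose: every copy gets its own GUID and is sorted out later.
    Assumption: swamp_dir is NOT located inside dump_dir.
    """
    swamp = Path(swamp_dir)
    created = []
    for path in sorted(Path(dump_dir).rglob("*")):
        if not path.is_file():
            continue
        # Files with no extension go to a NO_EXTENSION folder (our choice).
        ext = path.suffix.lstrip(".").upper() or "NO_EXTENSION"
        folder = swamp / ext
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / f"{path.stem}_{uuid.uuid4()}{path.suffix}"
        shutil.copy2(path, target)
        created.append(target)
    return created
```

Because each name carries a fresh GUID, two copies of photo.jpg from two computers can sit side by side in the same JPG folder without one overwriting the other.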
Establishing Your Data Load
After extracting and transforming your data into a repository organized by file extension, there is an initial, one-time process to organize the data you already have. After that, you create a system to handle future data on an ongoing basis.
Data governance is a real high-paying job and a component of data science. If you can master how you manage your own data, you will develop skills to manage data at your startup, your operations and your campaigns and even for companies you work at.
Your initial data load is the process of refining the data you loaded into a clean repository to build from. The process is to create a data swamp, refine it into a data lake, then structure it into a data warehouse to be consumed by data marts. We will explain each component in detail, but this is the core of your data governance strategy.
Transform into a Data Lake. You are going to clean up the swamp and remove every piece of data that is not relevant, such as system files and configuration files.
Structure into a Data Warehouse. You are then going to create a structured catalog of all files in a master graph system. The graph system is now the new “source of truth” for your data.
Establish Data Marts. Your data marts are the projects you are working on, from a photo book to a vision board to business documents. You have your data there at your disposal to draw from. This is the powerful end result that lets you approach whatever mission you take on with understanding and purpose.
Establish a Data Swamp
A data swamp is the drop folder containing all of the files you gathered, with nothing yet cleaned or curated. This is the result of the extract and transform part of this article, where you pulled all of the files from all of your devices into one location. Your data swamp should be organized by file type as described earlier in the article, where our script drops each file into a folder named after its extension.
The great thing about a data swamp is knowing you have everything, as raw as possible, in one location. This is a great first step, because the next step is cleaning up the data swamp and transforming it into a data lake.
Transform into Data Lake
A data lake is a clean version of your data repository after removing the junk files that are not needed. A lot of unnecessary files such as configuration files, setting files or application files will be captured during your ETL process. You will seek to get rid of files you don’t need.
In this case, you will see folders named INI, EXE, DLL and others that you do not use. Delete these folders. Keep only the folders like JPG, DOCX, PNG, WAV, MP3 and PDF that you know hold documents, pictures and other files you will be accessing in your repository.
Once you remove the junk data files you collected, you have a data lake, which is the true foundation of your data warehouse and your data governance platform. Now that you have a clean repository, you can begin warehousing the data.
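The cleanup described above can be sketched in a few lines of Python. This is only an illustration: the function name clean_swamp, the KEEP list (taken from the folder names mentioned above) and the dry_run safety switch are assumptions for this example, so adjust the list to the file types you actually use.

```python
import shutil
from pathlib import Path

# Extension folders to keep, taken from the examples in this article.
KEEP = {"JPG", "DOCX", "PNG", "WAV", "MP3", "PDF"}


def clean_swamp(swamp_dir: str, keep: set[str] = KEEP, dry_run: bool = True) -> list[str]:
    """Remove extension folders that are not on the keep list.

    With dry_run=True (the default) nothing is deleted; the function only
    returns the folder names that WOULD be removed so you can review them.
    """
    removed = []
    for folder in sorted(Path(swamp_dir).iterdir()):
        if folder.is_dir() and folder.name.upper() not in keep:
            removed.append(folder.name)
            if not dry_run:
                shutil.rmtree(folder)  # permanent delete, not the recycle bin
    return removed
```

Run it once with the default dry_run=True, read the list it returns, and only then run it again with dry_run=False.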
Organize into a Data Warehouse
Your data lake can now be organized and indexed into a data warehouse. Your data governance goal at this stage is to establish a single source of truth, or SSOT. All of your data resides in this one data lake repository, which is the true source of all of your data. When you want data, you need to be confident that the original file and its location are in your data lake as the single source of truth.
To warehouse your data is to create a database of all of the files, along with metadata so you can properly find and search them in the future. There are several ways to do this. The first way, which a lot of document management systems use, is to store the information about each file in a flat file or binary index. The problem is that this file can easily become huge and, later on, corrupted with bad data.
What we are doing for our data governance is using a database (Microsoft SQL Server, also known as MSSQL) to collect each file name. We created an initial script in C# (similar to the file organizer script) that goes through every directory in our data lake and creates a text file containing the file name, file extension, file path, file size and upload date. The data in the text file is then imported into a database where we can catalog it and add metadata and tags that can be searched.
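To make the cataloging step concrete, here is a minimal Python sketch of the two stages: write a text file listing every file in the data lake, then import that text file into a database. Several things here are stand-ins and not our actual setup: Python replaces C#, SQLite (built into Python) replaces SQL Server, the text file is a CSV, the function and column names are invented for this example, and the file's last-modified date is used in place of the upload date. Keep the catalog file outside the data lake so it does not catalog itself.

```python
import csv
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["file_name", "file_extension", "file_path", "file_size", "upload_date"]


def write_catalog(lake_dir: str, catalog_path: str) -> int:
    """Walk the data lake and write one CSV row per file. Returns the row count."""
    rows = 0
    with open(catalog_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(FIELDS)
        for path in sorted(Path(lake_dir).rglob("*")):
            if not path.is_file():
                continue
            stat = path.stat()
            # Stand-in for the upload date: the file's last-modified date (UTC).
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            writer.writerow([path.name, path.suffix.lstrip(".").upper(),
                             str(path), stat.st_size, modified.date().isoformat()])
            rows += 1
    return rows


def import_catalog(catalog_path: str, db_path: str) -> int:
    """Load the CSV catalog into a 'files' table. Safe to run more than once."""
    with open(catalog_path, newline="", encoding="utf-8") as handle:
        records = [tuple(row[name] for name in FIELDS) for row in csv.DictReader(handle)]
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            "file_name TEXT, file_extension TEXT, file_path TEXT PRIMARY KEY, "
            "file_size INTEGER, upload_date TEXT)"
        )
        # The file path is the key, so re-importing replaces rows instead of duplicating them.
        conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)", records)
        conn.commit()
    finally:
        conn.close()
    return len(records)
```

Using the file path as the primary key is a design choice for this sketch: since every file already has a GUID in its name, the path is unique and the import can be repeated safely.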
Essentially by creating a database representation of the files in the data lake, we created a document management system to now quickly search through our data lake for relevant files. Our files are now warehoused and we now have a proper data management system.
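Tagging and searching can be sketched the same way. The snippet below is an assumption-heavy illustration in Python with SQLite, not our production schema: the tags table and the functions ensure_tags_table, add_tag and find_by_tag are names invented for this example.

```python
import sqlite3


def ensure_tags_table(conn: sqlite3.Connection) -> None:
    """Create the tags table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        "file_path TEXT, tag TEXT, UNIQUE(file_path, tag))"
    )
    conn.commit()


def add_tag(conn: sqlite3.Connection, file_path: str, tag: str) -> None:
    """Attach a tag to a file. Adding the same tag twice has no effect."""
    conn.execute("INSERT OR IGNORE INTO tags VALUES (?, ?)", (file_path, tag.lower()))
    conn.commit()


def find_by_tag(conn: sqlite3.Connection, tag: str) -> list[str]:
    """Return the paths of all files carrying the given tag, sorted by path."""
    rows = conn.execute(
        "SELECT file_path FROM tags WHERE tag = ? ORDER BY file_path", (tag.lower(),)
    ).fetchall()
    return [row[0] for row in rows]
```

Tags are stored in lowercase so that a search for "Recipe" and "recipe" returns the same files.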
Formation of Data Marts
A data mart is whatever project or endeavor you are working on that requires data in your repository. This is where you copy the documents, photos and videos and more from your data warehouse for your political project, your book project, your blog article, your business research, your career growth.
If you have worked for a company, most likely it had network shared folders for you and your department. These are data marts: network folders like /marketing or /projectx on the job where you store documents, media and research. Instead of posting things on Pinterest, you create data marts to showcase home remodeling ideas and other exhibits to yourself. This is what a data mart is for.
So essentially, you have just created data governance by consolidating all of your data into a single source of truth, which is your data warehouse. Whenever you work on a new project, you pull from the single source of truth into the data mart representing your project.
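Pulling from the warehouse into a data mart can be as simple as a query followed by a copy. Here is a hedged Python sketch that assumes a SQLite catalog with a files table holding file_path and file_extension columns. That table layout, the function name build_data_mart and the choice to select by extension are all assumptions for illustration.

```python
import shutil
import sqlite3
from pathlib import Path


def build_data_mart(db_path: str, mart_dir: str, extensions: list[str]) -> list[Path]:
    """Copy every cataloged file with one of the given extensions into mart_dir.

    The originals stay in the warehouse; the mart only receives copies.
    """
    mart = Path(mart_dir)
    mart.mkdir(parents=True, exist_ok=True)
    placeholders = ", ".join("?" for _ in extensions)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            f"SELECT file_path FROM files WHERE file_extension IN ({placeholders}) "
            "ORDER BY file_path",
            [ext.upper() for ext in extensions],
        ).fetchall()
    finally:
        conn.close()
    copied = []
    for (file_path,) in rows:
        source = Path(file_path)
        if source.is_file():  # skip catalog entries whose file is missing
            target = mart / source.name
            shutil.copy2(source, target)
            copied.append(target)
    return copied


# Example call (hypothetical paths): gather pictures for a photo book project.
# build_data_mart("warehouse.db", "marts/photo_book", ["jpg", "png"])
```

Copying rather than moving keeps the warehouse as the single source of truth while each project works on its own set of files.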
Continuous Integration
Now that you have established a data warehouse, the next and most important step is creating an ongoing process to push all future data items into the repository. The key strategy is making sure you have a system in place to move your physical and digital data on a routine basis.
The key is to identify your personal devices that you will use to collect data. We recommend a blank (no lines) sketch notebook to write handwritten notes and tape receipts and business cards. Then take photos with your phone camera to capture physical documents. Make your mobile phone camera your primary capture device from screenshots to photo captures.
Another great tactic is to leverage cloud services, whether free or paid. When you capture items, move them quickly to the cloud drive under a data_dump folder. This way, you have a central repository on the go and, once you get home, you can transfer everything to your data warehouse.
After the files are moved from the cloud to a local data dump folder, a script runs every day to move the data into the data warehouse, using the same code that initially moved the data items into their respective file extension folders. A second part of that script indexes the files into the database, which completes the ETL process. One computer can be tasked with pulling data from the cloud drive and performing the ETL job as the continuous integration process.
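A daily job along these lines could look like the Python sketch below. It is an illustration under stated assumptions, not our actual script: SQLite stands in for the database, the function name run_daily_etl and the files table layout are invented here, and today's date is recorded as the upload date. The scheduling itself would be handled outside the script, for example by Windows Task Scheduler or cron.

```python
import shutil
import sqlite3
import uuid
from datetime import date
from pathlib import Path


def run_daily_etl(dump_dir: str, warehouse_dir: str, db_path: str) -> int:
    """Move new files from the dump folder into the warehouse and index them.

    Each file is moved into a folder named after its extension, renamed with
    a GUID, and recorded in the 'files' table. Returns the number loaded.
    """
    warehouse = Path(warehouse_dir)
    conn = sqlite3.connect(db_path)
    loaded = 0
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            "file_name TEXT, file_extension TEXT, file_path TEXT PRIMARY KEY, "
            "file_size INTEGER, upload_date TEXT)"
        )
        for path in sorted(Path(dump_dir).rglob("*")):
            if not path.is_file():
                continue
            ext = path.suffix.lstrip(".").upper() or "NO_EXTENSION"
            folder = warehouse / ext
            folder.mkdir(parents=True, exist_ok=True)
            target = folder / f"{path.stem}_{uuid.uuid4()}{path.suffix}"
            size = path.stat().st_size
            # Moving (not copying) empties the dump, so the next run only sees new files.
            shutil.move(str(path), str(target))
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
                (target.name, ext, str(target), size, date.today().isoformat()),
            )
            conn.commit()  # commit per file so a crash never loses an index entry
            loaded += 1
    finally:
        conn.close()
    return loaded
```

Because the dump folder is emptied as files are loaded, running the job twice in a row is harmless: the second run simply finds nothing to do.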
Analyzing Your Data
Your life identity is basically the data you collect every day. You may not realize it, but your data is your subconscious talking to you about the reality of your life. And because you established a data warehouse, you can now begin to get a better picture of your position by observing a few metrics.
Look at what type of data you are collecting and not collecting. Are you not collecting what you eat, for example? This is an important data point you need to know to understand your diet. If you are collecting data on how you spend money, then you consider your financials a big priority. Look at what you are collecting and what you are not collecting, and focus on starting to collect the important stuff.
Your file upload dates show what activities you record on a daily basis, which tells you what matters to you. Again, observe what you are capturing each day, and look at both what is being captured and what should be captured on a daily basis. You may be purposely ignoring some aspects of your life while focusing on others, and your data will show this.
Most important, look at the context of what you uploaded: not the files themselves but the bigger picture that you know subconsciously. You know why you are clipping certain photos and online articles around a certain topic. You know in your inner self why you are collecting aspirational data items about what you are looking forward to and the activities you seek to understand better and improve upon.
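Both of these metrics, what types of data you collect and what you capture each day, can be read straight from the catalog. Here is a small Python sketch, assuming a SQLite database with a files table that has file_extension and upload_date columns. The table layout and the function name collection_summary are assumptions for this example.

```python
import sqlite3


def collection_summary(db_path: str) -> dict:
    """Count cataloged files per file type and per upload date."""
    conn = sqlite3.connect(db_path)
    try:
        by_type = conn.execute(
            "SELECT file_extension, COUNT(*) FROM files "
            "GROUP BY file_extension ORDER BY COUNT(*) DESC, file_extension"
        ).fetchall()
        by_day = conn.execute(
            "SELECT upload_date, COUNT(*) FROM files "
            "GROUP BY upload_date ORDER BY upload_date"
        ).fetchall()
    finally:
        conn.close()
    return {"by_type": by_type, "by_day": by_day}
```

File types with low counts, and days with nothing captured at all, point to the parts of your life you are not recording.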
Mastering Your Data to Master the World
When you establish data governance within your life, you begin to know who you are, where you are from and where you are going. You also have a powerful tool at your fingertips to use your data to determine your current position and your future aspirations. It’s all there within the data you collect.
When you want to launch a venture, the majority of the information you truly need to get going will be within the handwritten notes, the pictures and the online clippings you collect. Your data will give you full insight into your position, your shortcomings and strengths and how prepared you are to succeed, based on the data you have and the data you don’t have.
As we move forward into the future, machine learning will learn to look at and analyze the data warehouses you created and help you form a better picture of yourself and your direction. You will also be able to show and tell a story with your collection of data that is unique to you. Mastering data governance will be a key differentiator between you and your competition because you will have better ownership of and insight into your story and journey.