- DIKW pyramid: data + context = information
- Fred Smith (FedEx): Info about the package is as important as the package itself
Data speed
- Uploading to and downloading from the cloud increases latency, which matters in time-poor environments; it also adds junctures/steps where things can go wrong.
Data intuition
- Insurance example: “flighted” by helicopter or plane. What are the correlations that would help us sift cases from low to high cost and then distribute them accordingly?
- Define the situation and setting, and you can eliminate 99% of everything else. In the doctor example, we have three factors (drug, amount, dosage/freq), so now it’s a known classification problem w/ three categories that you hypothesize form w/in those factors.
VAULTIS
- Visible —> known and can be seen
- Accessible —> Can get to it
- Understandable —> If you can access it, do you understand it?
- Linked —> tethered and connected throughout the organization to other important info
Data Governance
- Three mechanisms: Structural + Procedural + Relational
- Data audit as akin to a financial audit
- Resources vs. additional duties
Ingredients for Data Governance
- Metrics and data quality guidance to inform teammates on what right looks like
- Will there be a future where we treat data etiquette as strictly as rifle etiquette?
- DOTMLPF(P) —> Adapt it for governance and data audits
Metadata
- Map vs. territory: metadata is the map for the data
OLTP vs OLAP —> Data Warehouse generation
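A minimal sketch of the OLTP-to-OLAP step, using Python's built-in `sqlite3` with an in-memory database. The table names (`orders`, `sales_by_region`) and the sample rows are assumptions for illustration only; a real warehouse load would look quite different in scale and tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# OLTP side: many small writes, one row per transaction (hypothetical table).
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("East", 10.0), ("East", 15.0), ("West", 7.5)],
)

# OLAP side: a summary table built from the transactional rows,
# shaped for analysis rather than for recording single events.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount "
    "FROM orders GROUP BY region"
)

summary = conn.execute(
    "SELECT region, order_count, total_amount FROM sales_by_region ORDER BY region"
).fetchall()
print(summary)
```

The point of the sketch is the shape change: row-per-event on the transactional side, pre-aggregated rows on the analytical side.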
Exist vs. used: A company had 700 fields but only 10% were used. Before you license data, identify how many data fields are actually used. Do I need 700 fields? So many questions arise.
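One rough way to start on the exist-vs-used question is to measure how often each field is populated. This is only a proxy: a populated field may still go unread, and real usage would need query or report logs. The sample records, the `fill_rates` and `used_fields` helpers, and the 0.5 threshold are all assumptions for illustration.

```python
# Hypothetical sample: each record is one row from a dataset under review.
records = [
    {"id": 1, "name": "A", "fax": None, "region": "East", "legacy_code": ""},
    {"id": 2, "name": "B", "fax": None, "region": "West", "legacy_code": ""},
    {"id": 3, "name": "C", "fax": None, "region": None, "legacy_code": ""},
]

def fill_rates(rows):
    """Return the share of rows in which each field holds a non-empty value."""
    fields = sorted({key for row in rows for key in row})
    rates = {}
    for field in fields:
        filled = sum(1 for row in rows if row.get(field) not in (None, ""))
        rates[field] = filled / len(rows)
    return rates

def used_fields(rows, threshold=0.5):
    """Fields populated in at least `threshold` of rows (assumed cutoff)."""
    return [f for f, rate in fill_rates(rows).items() if rate >= threshold]

rates = fill_rates(records)
used = used_fields(records)
print(f"{len(used)} of {len(rates)} fields look used: {used}")
```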
Pre-work for data quality practices:
- Generate some summary statistics
- Identify potential issues
- Walk the process of common use cases
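The first two pre-work steps can be sketched with the standard library alone. The sample column, the `summarize` and `flag_outliers` helpers, and the "more than 10x the median" rule are assumptions chosen for illustration, not a recommended quality standard.

```python
import statistics

# Hypothetical column of order amounts; None marks a missing value.
amounts = [12.5, 14.0, None, 13.2, 950.0, 12.9, None, 13.7]

def summarize(values):
    """Basic profile of a numeric column: counts, missing values, and spread."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }

def flag_outliers(values, factor=10.0):
    """Flag values more than `factor` times the median (a crude, assumed rule)."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [v for v in present if v > factor * median]

profile = summarize(amounts)
suspects = flag_outliers(amounts)
print(profile)
print("Potential issues:", suspects)
```

A large gap between mean and median, or a nonzero missing count, is the kind of signal to carry into the use-case walkthrough.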
Data downtime: periods when the system isn’t producing data of acceptable quality. The sum of 1 + 2 is what you’re trying to reduce.
1. Incident detection (time to detect) —> 2. Incident resolution (time to resolve)
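A minimal sketch of the downtime sum, assuming each incident records when the problem started, when it was detected, and when it was resolved. The incident log and field names are hypothetical.

```python
from datetime import datetime

# Hypothetical incident log.
incidents = [
    {
        "started": datetime(2024, 1, 3, 8, 0),
        "detected": datetime(2024, 1, 3, 12, 0),
        "resolved": datetime(2024, 1, 3, 15, 0),
    },
    {
        "started": datetime(2024, 1, 10, 9, 0),
        "detected": datetime(2024, 1, 11, 9, 0),
        "resolved": datetime(2024, 1, 11, 11, 0),
    },
]

def hours_between(start, end):
    """Elapsed time between two datetimes, in hours."""
    return (end - start).total_seconds() / 3600

def downtime_hours(log):
    """Sum time-to-detect plus time-to-resolve across all incidents."""
    total = 0.0
    for incident in log:
        time_to_detect = hours_between(incident["started"], incident["detected"])
        time_to_resolve = hours_between(incident["detected"], incident["resolved"])
        total += time_to_detect + time_to_resolve
    return total

total_downtime = downtime_hours(incidents)
print(f"Total data downtime: {total_downtime} hours")
```

Splitting the total into its two parts shows where to invest: slow detection points to monitoring, slow resolution points to lineage and ownership.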
Cascade issues —> propagation issues and alerts that can flood a system, which is why finding the root cause is so important.
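One way to cut through an alert flood is to keep only the alerting tables that have no alerting table upstream of them. The sketch below assumes the pipeline's dependencies are known and form a DAG; the table names and the `root_causes` helper are hypothetical.

```python
# Hypothetical pipeline: each table maps to the upstream tables it reads from.
upstream = {
    "raw_orders": [],
    "clean_orders": ["raw_orders"],
    "daily_revenue": ["clean_orders"],
    "exec_dashboard": ["daily_revenue"],
    "raw_customers": [],
    "customer_report": ["raw_customers", "clean_orders"],
}

def root_causes(alerting, dependencies):
    """Return alerting tables that have no alerting ancestor."""
    alerting = set(alerting)

    def has_alerting_ancestor(table, seen):
        for parent in dependencies.get(table, []):
            if parent in seen:
                continue  # already checked via another path
            seen.add(parent)
            if parent in alerting or has_alerting_ancestor(parent, seen):
                return True
        return False

    return sorted(t for t in alerting if not has_alerting_ancestor(t, set()))

alerts = ["exec_dashboard", "daily_revenue", "clean_orders", "customer_report"]
roots = root_causes(alerts, upstream)
print("Investigate first:", roots)
```

Four alerts collapse to one table to investigate; the other three are likely downstream symptoms.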
Sutton + Sapolsky —> your data defines what you can access and understand, and that is our area of possibility, which is why we don’t have free will. So our data structure is that much more important…