DATAPHREAK - AI-Powered Data Analysis Tool

Merge Files

Merge the currently loaded dataset with another CSV/TSV/Excel file. Choose the key columns and optionally convert Salesforce IDs to 18 characters for matching.


Help & Documentation

Welcome to DATAPHREAK, your comprehensive data analysis and cleaning tool. This guide covers all features and capabilities.

🚀 Getting Started - Your First 5 Minutes

New to DATAPHREAK? Follow this quick walkthrough to get started immediately!

📊 Step 1: Try Sample Data (30 seconds)

• Click the "✨ Try Sample Data" button in the Load Data section
• This loads 400 rows of customer data with some intentional issues to explore
• Perfect for learning without uploading your own files

👀 Step 2: See Your Data Overview (1 minute)

• Check the KPI cards that appear - they show row count, columns, missing values, and duplicates
• Look for the Data Quality Score - higher is better!
• Notice the Quick Actions like "Analyze All" and "Quick Fix"

📈 Step 3: Visualize Your Data (1 minute)

• Click the "Data Charts" button to see a distribution chart for each column
• Hover over bars to see exact counts and percentages
• Age data automatically groups into 5-year ranges

🧹 Step 4: Clean Your Data (2 minutes)

• Click "🧹 Clean All" for smart instant cleaning
• Or go to the "Data Cleaning & IDs" section to choose specific fixes
• Watch the data quality score improve!

💾 Step 5: Export Results (30 seconds)

• Use "Export" to download your cleaned data
• Choose CSV, JSON, or create a Data Dictionary report
• Files are automatically named based on operations performed

🎯 Pro Tip: Start with sample data to learn, then load your own CSV, TSV, or Excel files. Everything happens locally in your browser, so your data never leaves your computer!

Ready to dive deeper? Continue reading the sections below, or jump to any topic using the navigation above.

1. Loading Your Data

Getting your files into DATAPHREAK is easy!

Drag & Drop: Drag your CSV, TSV, or Excel file into the gray drop zone
Browse Files: Click "Choose File" to select from your computer
Try Sample Data: Click "✨ Try Sample Data" to practice with example data first

💡 File Requirements:
• First row should contain column names (like: Name, Email, Age)
• CSV files work best, but TSV (tab-separated) files work too
• Excel files (.xlsx/.xls) are supported directly

2. Understanding Your Results

Once your data loads, here's what you'll see:

KPI Cards: Quick stats showing total rows, columns, missing values, and duplicate records
Data Quality Score: AI-powered assessment (0-100%) measuring completeness, consistency, and pattern validity with A-F grading
Quick Actions: One-click buttons for common tasks:
- Analyze All: Deep-dive into every column
- Clean All: AI-powered cleaning that auto-corrects emails, phones, and dates
- Health Check: Get a detailed data quality report

🎯 What's a good quality score?
• 90-100%: Excellent - very clean data
• 80-89%: Good - minor issues to fix
• 70-79%: Fair - several problems to address
• Below 70%: Needs significant cleaning

AI-Powered Data Quality Assessment

Comprehensive quality scoring that analyzes your entire dataset:

Overall Score: 0-100% assessment with letter grades (A-F) like a report card
Completeness: How much of your data is filled vs missing
Consistency: Measures duplicate rows and data conflicts
Validity: Checks if data follows expected patterns (emails, phones, dates)
Visual Indicators: Color-coded quality cards (green = good, red = needs work)
Actionable Insights: Shows exactly what needs attention
Real-time Updates: Quality score improves as you clean your data
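
To make the three components concrete, here is a minimal sketch of how such a score could be computed. It is not DATAPHREAK's actual formula: the equal weighting, the function names, and the D/F cut-off at 60% are assumptions for illustration. Only the 90/80/70 bands come from the guide above.

```python
def completeness(rows, columns):
    """Share of cells that are filled (not None and not blank)."""
    total = len(rows) * len(columns)
    if total == 0:
        return 1.0
    filled = sum(1 for r in rows for c in columns if (r.get(c) or "").strip())
    return filled / total


def consistency(rows, columns):
    """Share of rows that are not exact repeats of another row."""
    if not rows:
        return 1.0
    distinct = {tuple(r.get(c) for c in columns) for r in rows}
    return len(distinct) / len(rows)


def quality_score(rows, columns, validity=1.0):
    """Average the three components (equal weights are an assumption)."""
    parts = (completeness(rows, columns), consistency(rows, columns), validity)
    return round(100 * sum(parts) / len(parts), 1)


def letter_grade(score_pct):
    """Map a 0-100 score to A-F. The 60% floor for D is assumed."""
    for floor, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score_pct >= floor:
            return grade
    return "F"


rows = [
    {"name": "Ann", "email": "a@x.com"},
    {"name": "Ann", "email": "a@x.com"},  # exact duplicate
    {"name": "Bob", "email": ""},          # missing value
]
score = quality_score(rows, ["name", "email"])
print(score, letter_grade(score))
```

Removing the duplicate row or filling the blank email raises the score, which is the "real-time update" effect described above.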

AI-Powered Duplicate Detection

Advanced duplicate detection that goes beyond basic matching:

Exact Duplicates: Finds rows that are completely identical across all fields
Fuzzy Duplicates: AI similarity matching finds near-duplicates like "John Smith" vs "Jon Smith"
Smart Export: CSV includes both exact_duplicate_group and fuzzy_duplicate_group columns
Performance Optimized: Handles large datasets with chunked processing
Visual Grouping: Clear separation between exact matches and similar records
Row References: Shows exact row numbers for easy verification in Excel
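
The difference between the two kinds of duplicates can be sketched as below. This is an illustration only: it uses Python's standard difflib ratio as the similarity measure, which may differ from the matching DATAPHREAK performs, and the function names and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def exact_duplicate_groups(rows):
    """Give identical rows a shared group id; unique rows get None."""
    seen = {}
    for i, row in enumerate(rows):
        seen.setdefault(tuple(row), []).append(i)
    groups = [None] * len(rows)
    next_id = 1
    for members in seen.values():
        if len(members) > 1:
            for i in members:
                groups[i] = next_id
            next_id += 1
    return groups


def fuzzy_duplicate_pairs(values, threshold=0.8):
    """Return (i, j, ratio) for near-identical, but not identical, values."""
    pairs = []
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            a, b = values[i].lower(), values[j].lower()
            if a == b:
                continue  # exact matches are reported separately
            ratio = SequenceMatcher(None, a, b).ratio()
            if ratio >= threshold:
                pairs.append((i, j, round(ratio, 3)))
    return pairs


rows = [("John Smith", "NY"), ("Jon Smith", "NY"),
        ("John Smith", "NY"), ("Ada Lovelace", "UK")]
print(exact_duplicate_groups(rows))
print(fuzzy_duplicate_pairs(["John Smith", "Jon Smith", "Ada Lovelace"]))
```

The pairwise loop grows quadratically with row count, which is why chunked processing matters on large datasets.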

Field Analysis & Profiling

Detailed analysis for each column in your data with AI pattern detection:

Data Types: Number, date, boolean, string with confidence scores
AI Patterns: Automatically detects emails, phone numbers, and dates with confidence indicators
Pattern Confidence: Green = high confidence, yellow = partial match, gray = low confidence
Coverage: Percentage of filled vs missing values
Uniqueness: How many different values appear in each column
Number Analysis: Minimum, maximum, average, and unusual values
Date Ranges: Earliest and latest dates in date columns
Text Analysis: Character length patterns and text formatting details
Sample Values: Representative examples from each column
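
As a rough illustration of what a column profile contains, the sketch below computes coverage, uniqueness, a simple number-or-string type, and basic number statistics. The function name and the returned keys are hypothetical, and the type check is far simpler than the confidence-scored detection described above.

```python
from collections import Counter


def profile_column(values):
    """Summarise one column: coverage, uniqueness, type, and number stats."""
    filled = [v.strip() for v in values if v is not None and v.strip()]
    numbers = []
    for v in filled:
        try:
            numbers.append(float(v))
        except ValueError:
            pass  # not a number; the column will be treated as text
    profile = {
        "coverage_pct": round(100 * len(filled) / len(values), 1) if values else 0.0,
        "unique": len(set(filled)),
        "type": "number" if filled and len(numbers) == len(filled) else "string",
        "samples": [v for v, _ in Counter(filled).most_common(3)],
    }
    if profile["type"] == "number":
        profile.update(min=min(numbers), max=max(numbers),
                       mean=sum(numbers) / len(numbers))
    return profile


ages = ["34", "29", "", None, "41", "29"]
print(profile_column(ages))
```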

AI Pattern Detection

Intelligent recognition and standardization of common data formats:

Email Detection: Automatically identifies and standardizes email addresses to lowercase
Phone Numbers: Recognizes various phone formats and standardizes to consistent formatting
Date Intelligence: Handles mixed international date formats (US, European, ISO) and converts to YYYY-MM-DD
Smart Confidence: Shows green (high), yellow (partial), or gray (low) confidence indicators
Auto-Correction: Clean All button applies pattern-based fixes automatically
International Support: Properly handles global phone and date formats
Column Intelligence: Uses column names for better pattern prioritization
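
A minimal sketch of pattern-based standardization follows. The regular expression, the list of date formats, and their order are assumptions chosen for illustration; they are not DATAPHREAK's actual rules. Note that an ambiguous value such as 03/04/2024 is read as a US date here simply because that format is tried first.

```python
import re
from datetime import datetime
from typing import Callable, Optional

# Deliberately loose email shape: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Assumed formats, tried in order: ISO, US, European (dots), European (dashes)
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%d-%m-%Y")


def normalize_email(value: str) -> Optional[str]:
    """Trim and lowercase an email, or return None if it does not look like one."""
    v = value.strip().lower()
    return v if EMAIL_RE.match(v) else None


def normalize_date(value: str) -> Optional[str]:
    """Convert a recognised date to YYYY-MM-DD, or return None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def pattern_confidence(values, normalizer: Callable[[str], Optional[str]]) -> float:
    """Share of non-blank values that the normalizer accepts."""
    filled = [v for v in values if v and v.strip()]
    if not filled:
        return 0.0
    return sum(1 for v in filled if normalizer(v) is not None) / len(filled)


print(normalize_email("  John.Smith@Example.COM "))
print(normalize_date("14.03.2024"))
print(pattern_confidence(["a@b.co", "bad", "", None], normalize_email))
```

A confidence share like this could drive the green, yellow, and gray indicators, with the cut-off points left to the tool.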

Data Distribution Charts

Professional-grade interactive charts accessible via the Data Charts button:

Smart Binning: Automatic age grouping (5-year ranges) and intelligent numeric binning
Visual Features: Gradient colors, grid lines, Y-axis labels with professional styling
Statistical Overlays: Mean (μ) and median (M) lines with exact values displayed
Interactive Tooltips: Hover for exact counts, ranges, and percentages
Color Coding: Frequency-based colors (red=high frequency, blue=low frequency)
Categorical Charts: Top 10 values for text columns with horizontal bars
Smooth Animations: Loading transitions and hover effects for enhanced user experience
Print Function: Export histograms to PDF with professional formatting
Responsive Design: Charts automatically scale from 580px to 1160px based on screen size
Age Detection: Automatically formats age data with meaningful 5-year groupings
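
The 5-year age grouping and the mean and median overlays can be sketched in a few lines. This is an illustration under assumptions: the function name is hypothetical, and the real charts may place bin edges differently.

```python
from collections import Counter
from statistics import mean, median


def five_year_bins(ages):
    """Group ages into 5-year ranges; return (label, count, percent) per bin."""
    counts = Counter((int(a) // 5) * 5 for a in ages)
    total = len(ages)
    return [
        (f"{start}-{start + 4}", counts[start], round(100 * counts[start] / total, 1))
        for start in sorted(counts)
    ]


ages = [21, 23, 25, 29, 31, 40]
for label, count, pct in five_year_bins(ages):
    print(label, count, f"{pct}%")
print("mean:", round(mean(ages), 2), "median:", median(ages))
```

Each tuple holds what a hover tooltip needs: the range, the exact count, and the percentage of rows in that range.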

Validation Rules & Quality Checks

Define and enforce data quality standards:

Allowed Values: Comma-separated lists of valid values per column
Pattern Matching: Custom rules for complex data validation
Rule Persistence: Automatically saves rules using column header signatures
Issue Detection: Highlights cells that don't match your rules
Issue Reporting: "Rows with Issues" section shows problematic data
Batch Processing: Apply rules across entire datasets efficiently
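
An allowed-values rule can be sketched as follows. The function names are hypothetical, and two behaviours are assumptions: matching is case-sensitive, and blank cells are skipped rather than flagged. Row numbers start at 2 so they line up with a spreadsheet whose first row is the header.

```python
def parse_allowed(rule_text):
    """Turn a comma-separated rule such as 'Active, Inactive' into a set."""
    return {v.strip() for v in rule_text.split(",") if v.strip()}


def rows_with_issues(rows, rules):
    """Return (row_number, column, value) for every cell that breaks a rule."""
    issues = []
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        for column, allowed in rules.items():
            value = (row.get(column) or "").strip()
            if value and value not in allowed:
                issues.append((row_number, column, value))
    return issues


rules = {"status": parse_allowed("Active, Inactive, Pending")}
rows = [{"status": "Active"}, {"status": "active"},
        {"status": ""}, {"status": "Closed"}]
print(rows_with_issues(rows, rules))
```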

Export Options

Multiple export formats and customization options:

CSV Export: Standard comma-separated format with proper escaping
JSON Export: Array of objects with column headers as keys
Data Dictionary: Complete field definitions with statistics and rules
Enhanced Duplicate Exports: Includes both exact_duplicate_group and fuzzy_duplicate_group columns for comprehensive review
Histogram Printing: Professional PDF reports of distribution charts
Security: Formula injection protection (prefixes dangerous characters)
File Naming: Intelligent naming based on operations performed
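
For readers who want to see what "proper escaping" and "array of objects" mean in practice, here is a small sketch using Python's standard csv and json modules. It illustrates the two output shapes only; the function names are hypothetical and this is not the tool's export code.

```python
import csv
import io
import json


def to_csv(headers, rows):
    """Write rows as CSV; the csv module quotes commas and doubles quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buf.getvalue()


def to_json(headers, rows):
    """Write rows as a JSON array of objects keyed by column header."""
    return json.dumps([dict(zip(headers, row)) for row in rows],
                      ensure_ascii=False, indent=2)


headers = ["name", "note"]
rows = [["Smith, John", 'said "hi"']]
print(to_csv(headers, rows))
print(to_json(headers, rows))
```

In the CSV output, the value containing a comma is wrapped in quotes and the embedded quotes are doubled, so spreadsheet programs read each value back into a single cell.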

Compare Files (Find Similar Records)

Have two files with similar but not identical data? This feature helps you find matching records across files.

📋 Common Use Cases:

Customer Lists: Find "John Smith" in file A that matches "Jon Smith" in file B
Email Variations: Match "j.smith@company.com" with "john.smith@company.com"
Company Names: Find "ABC Corp" that matches "ABC Corporation"

🔧 How to Use:

Step 1: Load your first file (becomes your "primary" dataset)
Step 2: Use "Merge Files" to load a second file (becomes "secondary")
Step 3: Choose which columns to compare between the files
Step 4: Set similarity level (0.80 = very similar, 0.50 = somewhat similar)
Step 5: Click "Compare A↔B" to find matches

⚡ Performance Note: To keep your browser responsive, we limit comparisons for very large files. If you see a message about too many comparisons, try increasing the similarity setting or working with smaller datasets.
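
The effect of the similarity level and the comparison limit can be sketched as below. This is a sketch under assumptions, not the tool's algorithm: it uses Python's difflib ratio as the similarity measure, groups values by first character as a simple form of blocking, and takes the limit as a parameter named max_comparisons.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def compare_files(primary, secondary, threshold=0.80, max_comparisons=200_000):
    """Return (primary_index, secondary_index, ratio) for similar values."""
    # Simple blocking (an assumption): only compare values that share
    # the same first character, which cuts the number of candidate pairs.
    blocks = defaultdict(list)
    for j, value in enumerate(secondary):
        if value:
            blocks[value[0].lower()].append(j)
    candidates = [
        (i, j)
        for i, value in enumerate(primary) if value
        for j in blocks.get(value[0].lower(), [])
    ]
    if len(candidates) > max_comparisons:
        raise ValueError("Too many comparisons; try smaller datasets.")
    matches = []
    for i, j in candidates:
        ratio = SequenceMatcher(None, primary[i].lower(), secondary[j].lower()).ratio()
        if ratio >= threshold:
            matches.append((i, j, round(ratio, 2)))
    return matches


file_a = ["John Smith", "ABC Corp"]
file_b = ["Jon Smith", "ABC Corporation", "Zed Ltd"]
print(compare_files(file_a, file_b, threshold=0.80))
print(compare_files(file_a, file_b, threshold=0.50))
```

With this measure, "John Smith" and "Jon Smith" match at 0.80, while "ABC Corp" and "ABC Corporation" only match once the level is lowered toward 0.50.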

Cross-File Merging & ID Matching

The Merge Files feature combines the loaded dataset with another CSV, TSV, or Excel file. Choose key columns from each dataset, select a join type (Left, Inner, Right, or Full), and optionally enable Salesforce ID conversion for matching. The merged output is downloaded automatically, and the second file stays in memory as your secondary dataset for subsequent cross-file analysis. You can switch between the primary and secondary datasets via the dataset selector next to the dataset name.
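
If the four join types are unfamiliar, the following sketch shows which rows each one keeps. It is a plain-Python illustration with hypothetical names, not the tool's merge code. When both files share a column name, this sketch lets the primary (left) value win, and unmatched rows simply lack the other file's columns; both choices are assumptions.

```python
def merge_rows(left, right, left_key, right_key, how="left"):
    """Join two lists of row dicts on a key column.

    how: "left", "inner", "right", or "full".
    """
    right_index = {}
    for r in right:
        right_index.setdefault(r[right_key], []).append(r)

    merged, matched_keys = [], set()
    for l in left:
        partners = right_index.get(l[left_key], [])
        if partners:
            matched_keys.add(l[left_key])
            merged.extend({**r, **l} for r in partners)  # left values win
        elif how in ("left", "full"):
            merged.append(dict(l))  # keep unmatched primary rows
    if how in ("right", "full"):
        # keep secondary rows that never found a partner
        merged.extend(dict(r) for r in right if r[right_key] not in matched_keys)
    return merged


primary = [{"id": "1", "name": "Ann"}, {"id": "2", "name": "Bob"}]
secondary = [{"id": "2", "city": "Oslo"}, {"id": "3", "city": "Rome"}]
for how in ("left", "inner", "right", "full"):
    print(how, merge_rows(primary, secondary, "id", "id", how))
```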

Themes

Use the theme button to toggle between Dark, Light, and Matrix modes. Your selection is saved locally and persists between sessions.

Settings & Preferences

Open the Settings panel from the top navigation to customise DATAPHREAK. From this modal you can:

Choose a theme – Dark, Light and Matrix Mode styles are available.
Select default cleaning operations – decide which actions (trim spaces, fix letter case, remove accents and convert Salesforce IDs) are pre-selected when you open the Data Cleaning & IDs panel.
Pick your language – switch the interface language once translations are available.
Enable or disable persistence – save preferences and rules to your browser’s local storage or opt to discard them on page reload.
Encrypt local data – optionally protect your preferences and rules with a passphrase using in-browser AES-GCM encryption.

Your preferences are stored locally only and never sent to a server. If persistence is disabled, settings revert to defaults when the page is refreshed.

Keyboard Navigation

You can tab through inputs and use keyboard shortcuts like Ctrl+L to reload the file or Esc to close modals. Most buttons also display an informative tooltip when hovered.

Quick Data Cleaning

Fix common data problems with one click! Select columns and choose which fixes to apply:

🧹 Available Cleaning Options:

Trim Spaces: Removes extra spaces before/after text
Example: " John Smith " becomes "John Smith"
Fix Letter Case: Smart formatting based on field type
Names → Title Case, Emails → lowercase, IDs → UPPERCASE
Remove Accents: Converts special characters to regular letters
Example: "José" becomes "Jose"
Convert Salesforce IDs: Extends 15-character IDs to 18-character format
Useful if you work with Salesforce data
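
For the curious, the 15-to-18 character conversion follows a published Salesforce scheme: the ID is split into three groups of five characters, each group is turned into a number based on which positions hold uppercase letters, and that number picks one suffix character. The sketch below follows that scheme; the function name is hypothetical and DATAPHREAK's own implementation may differ in details such as validation.

```python
SUFFIX_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"  # 32 symbols, one per 5-bit value


def to_18_char_id(sf_id):
    """Extend a 15-character Salesforce ID to its 18-character form."""
    if len(sf_id) == 18:
        return sf_id  # already in the long form
    if len(sf_id) != 15:
        raise ValueError("Expected a 15- or 18-character ID")
    suffix = ""
    for start in (0, 5, 10):
        chunk = sf_id[start:start + 5]
        # Bit i is set when the i-th character of the chunk is an uppercase letter.
        bits = sum(1 << i for i, ch in enumerate(chunk) if "A" <= ch <= "Z")
        suffix += SUFFIX_ALPHABET[bits]
    return sf_id + suffix


print(to_18_char_id("A00000000000000"))
```

Because the suffix encodes the letter case of the original 15 characters, the 18-character form can be matched safely in tools that ignore case, which is why the conversion helps when merging files on ID columns.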

🚀 Quick Start:

Step 1: Click checkboxes next to columns you want to clean
Step 2: Choose which operations to apply
Step 3: Click "Apply" to clean your data
Step 4: Watch your data quality score improve!
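
The select-columns, choose-operations, apply flow above can be sketched as a list of small functions run over the chosen columns. All names here are hypothetical, and the case rules are a simplification (for example, Python's title() would turn "mcdonald" into "Mcdonald").

```python
import unicodedata


def trim_spaces(value):
    """Remove spaces before and after the text."""
    return value.strip()


def remove_accents(value):
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fix_case(value, field_type):
    """Names to Title Case, emails to lowercase, IDs to UPPERCASE."""
    if field_type == "name":
        return value.title()
    if field_type == "email":
        return value.lower()
    if field_type == "id":
        return value.upper()
    return value


def clean_rows(rows, columns, operations):
    """Apply each operation, in order, to the selected columns only."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # leave the original row untouched
        for col in columns:
            for op in operations:
                new_row[col] = op(new_row[col])
        cleaned.append(new_row)
    return cleaned


rows = [{"name": " José ", "city": " Oslo "}]
print(clean_rows(rows, ["name"], [trim_spaces, remove_accents]))
```

Only the selected column changes; the city value keeps its spaces because it was not ticked.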

💡 Pro Tip: Use the "🧹 Clean All" button for AI-powered instant cleaning that auto-corrects email, phone, and date formats, or customize specific operations below.

Unique Keys & IDs

The assistant in the Data Cleaning & IDs panel helps you find one or two fields that can uniquely identify each row. It scans for fields or field pairs with no missing values and no duplicates. These are ideal as primary keys or External IDs. If none exist, the assistant offers to add a new surrogate_id column with sequential values.
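
The search described here can be sketched as: test each field on its own, then each pair of fields, and fall back to a generated ID. The function names are hypothetical, and starting the sequence at 1 is an assumption; the guide only says the values are sequential.

```python
from itertools import combinations


def is_unique_key(rows, fields):
    """True if the fields have no missing values and no repeated combination."""
    seen = set()
    for row in rows:
        key = tuple((row.get(f) or "").strip() for f in fields)
        if "" in key or key in seen:
            return False
        seen.add(key)
    return True


def find_unique_keys(rows, columns):
    """Prefer single-field keys; otherwise look for two-field keys."""
    singles = [(c,) for c in columns if is_unique_key(rows, (c,))]
    if singles:
        return singles
    return [pair for pair in combinations(columns, 2) if is_unique_key(rows, pair)]


def add_surrogate_id(rows):
    """Add a sequential surrogate_id column (starting at 1, by assumption)."""
    return [{"surrogate_id": i, **row} for i, row in enumerate(rows, start=1)]


rows = [
    {"first": "Ann", "last": "Lee", "dept": "A"},
    {"first": "Ann", "last": "Ray", "dept": "A"},
    {"first": "Bob", "last": "Lee", "dept": "A"},
]
keys = find_unique_keys(rows, ["first", "last", "dept"])
print(keys if keys else add_surrogate_id(rows))
```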

Security & Privacy

DATAPHREAK is designed to keep your data private and secure. All processing happens locally in your browser—no network calls are made. This section summarises the key security measures:

Offline-only operation: your datasets never leave your computer.
Sanitised exports: exported CSV values are escaped to prevent formula injection and malicious scripts; cells that begin with =, +, -, or @ are prefixed with a single quote to turn them into plain text.
No macros or scripting engine: unlike traditional spreadsheets, DATAPHREAK has no macro support, eliminating a major attack vector.
Strict Content Security Policy: the page is served with a Content Security Policy that blocks inline and remote scripts.
Cross-file comparison cap: approximate matching across files is capped at ~200,000 comparisons with lightweight blocking to prevent browser lock-ups.
Error handling: file parsing is wrapped in try/catch blocks with friendly error messages.
Local persistence: rule sets and preferences are stored only in your browser’s localStorage; nothing is uploaded to any server. You can disable persistence entirely or encrypt your saved data with a passphrase in the Settings panel.
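
The export sanitising rule described in the list above is simple enough to show directly. The sketch below applies the stated rule literally; the function name is hypothetical. One visible side effect of a literal rule is that a negative number such as -5 is also prefixed and exported as text.

```python
FORMULA_TRIGGERS = ("=", "+", "-", "@")


def neutralize_formula(value):
    """Prefix a single quote to cells a spreadsheet might run as a formula."""
    if value and value[0] in FORMULA_TRIGGERS:
        return "'" + value
    return value


for cell in ["=SUM(A1:A9)", "@cmd", "-5", "John", ""]:
    print(repr(neutralize_formula(cell)))
```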

Legal Notice

This tool is currently in beta and is provided “as is” without warranty of any kind. It was created and is owned by Zachary Sluss. During the beta period it is made available as open source. For inquiries, contact zacsluss@yahoo.com.


Development Log

Development history and feature updates for DATAPHREAK:

v8.6.0 (Current)

Professional Data Visualization Revolution: Complete transformation of chart readability and user insights:
- 📊 Smart Axis Titles: Intelligent X/Y axis labeling based on column patterns (Age, Price, Email domains, etc.)
- 💡 Enhanced Hover Tooltips: Rich insights with count, percentage, rank, and data quality indicators
- 🎨 Data-Driven Colors: Red bars for quality issues, green for clean data, smart outlier detection
- 🎯 Professional Layout: Increased margins, proper typography, and statistical overlays
- 📈 Quality Indicators: Visual badges for data validation (✅ Valid, 🔸 Rare, 🔥 Common)
- 🔧 Enhanced Positioning: Smart tooltip boundary detection prevents edge cutoff
Critical Stability Fix: Resolved JavaScript variable collision that was preventing app functionality:
- 🐛 Variable Conflict: Fixed duplicate 'mean' declaration causing syntax errors
- ⚡ Restored Functionality: All buttons and features now work properly
- 🔍 Code Review: Enhanced error detection and prevention measures

v8.5.0

True Offline Functionality: Complete elimination of external dependencies for authentic offline operation:
- 📦 Embedded SheetJS Library: Fully integrated 951KB Excel processing library directly into HTML file
- 🌐 Zero Network Calls: Removed all CDN references, DNS prefetch hints, and external script sources
- 🔒 Enhanced Security: Updated Content Security Policy to block all external resources while maintaining functionality
- ✅ Excel Support Maintained: Full .xlsx and .xls file compatibility preserved without internet dependency
- 📱 Single-File Promise: True to the "single-file offline-ready" commitment - works completely without internet
- 🚀 Performance Optimized: File size increased to 1.33MB, but embedding the library eliminates network latency and dependency failures
Security Enhancements: Comprehensive security review and improvements:
- 🛡️ Content Security Policy: Strengthened CSP by removing external CDN allowances
- 🔍 Code Analysis: Full security audit confirming no eval(), dangerous functions, or XSS vulnerabilities
- 🧹 Unicode Cleanup: Fixed non-ASCII en-dash characters that caused JSON parsing errors
- ✅ Input Sanitization: Verified proper HTML escaping in all user data handling

v0.8.0

Revolutionary Help System Redesign: Complete transformation of user documentation for optimal new user experience:
- 🚀 Getting Started Section: Brand new 5-minute walkthrough with step-by-step instructions for immediate user success
- 📋 Logical Navigation Structure: Reorganized from beginner → advanced with color-coded sections (Basics/Analysis/Advanced)
- 🎯 Plain English Content: Replaced technical jargon with user-friendly explanations and real-world examples
- 📊 Visual Enhancement: Added icons, step-by-step guides, Pro Tips, and helpful callout boxes throughout
- 🔧 Improved Navigation Pills: Fixed clustered layout with proper spacing, responsive design, and smooth hover animations
Complete Hover Tooltip Enhancement: Comprehensive tooltip system overhaul for superior user guidance:
- ✅ 100% Coverage: Added tooltips to every interactive element including buttons, inputs, selects, and navigation items
- 🎨 Visual Consistency: Standardized all tooltips with emojis, friendly language, and contextual examples
- 📝 Beginner-Friendly Language: Replaced technical terms with plain English and added "what this does" explanations
- 🎯 Contextual Help: Each tooltip provides immediate value with examples like "e.g., ' John ' becomes 'John'"
Professional Icon Integration: Custom data analytics favicon for brand identity:
- 📊 Data-Themed Design: Custom icon featuring data charts and analytics symbols in your brand colors
- 🔄 Universal Compatibility: Professional favicon that displays perfectly in all modern browsers
- 💾 Self-Contained: No external dependencies - everything works offline without internet connection
- 🎨 Theme Coordination: Icon automatically matches your chosen theme colors for consistent branding
Enhanced User Experience: Streamlined interface improvements for better workflow efficiency:
- 🎯 Intuitive Navigation: Improved button layouts and consistent visual design throughout the application
- ⚡ Faster Workflow: Optimized performance and reduced loading times for smoother data processing
- 🛡️ Enhanced Security: Improved data protection and privacy safeguards for sensitive information
- 📱 Better Responsiveness: Enhanced mobile and tablet compatibility for on-the-go data analysis

v0.6.1

Professional-Grade Data Visualization: Revolutionary histogram system with enterprise-level features:
- Interactive data distribution charts with hover tooltips showing exact counts, ranges, and percentages
- Intelligent data grouping including automatic age ranges (5-year groups) and smart number categories
- Statistical overlays with mean (μ) and median (M) lines displayed with exact values
- Frequency-based color coding system (red=high frequency, blue=low frequency)
- Beautiful gradient fills and professional grid lines with Y-axis labels
- Smooth loading animations and staggered bar transitions for enhanced user experience
Enhanced Export Capabilities: Dual-format export system with improved usability:
- JSON export functionality converting rows to objects with headers as keys
- Separate CSV and JSON download buttons with intuitive icons
- Professional PDF data chart printing with formatted layouts
- Formula injection protection for all export formats
Comprehensive Help Documentation: Complete documentation overhaul with:
- 16 organized sections covering all features and capabilities
- Dedicated histogram documentation with detailed feature explanations
- Enhanced navigation with logical categorization and improved discoverability
- Professional documentation standards with consistent formatting and user-friendly language
Responsive Design Excellence: Advanced scalability and user interface improvements:
- Histogram charts automatically scale from 580px to 1160px based on screen size
- Improved age data detection and specialized formatting
- Enhanced categorical data visualization with top 10 value horizontal bars
- Professional modal layouts with optimized space utilization

v0.6.0

Complete Theme System Overhaul: Achieved perfect visual parity across all themes with:
- Enhanced Light Theme with golden accent animations, particle system, and comprehensive visual effects
- Dark Mode cosmic dust particle system with purple accent theme conversion
- Matrix Mode particle system optimization and consistency improvements
- Uniform accent gradient coverage across all button types in all themes
- Perfect particle masking and visibility controls across all themes
Advanced Particle Systems: Implemented three unique particle effects with:
- Light Theme: Golden glowing sparkles with gentle rising animation
- Dark Theme: Purple cosmic dust with natural falling patterns
- Matrix Theme: Enhanced digital rain effects with smooth performance
- Intelligent positioning ensuring visual effects appear in the right places
- Consistent visual effects and controls across all available themes
Visual Consistency Excellence: Perfect theme uniformity including:
- Standardized glow effects, hover animations, and interaction feedback
- Enhanced button gradients with translucent accent color treatments
- Improved border visibility and separator line consistency
- Fixed collapsible card functionality across all header structures
- Universal green dropzone styling with enhanced hover effects

v0.5.4

Interface Flow Optimization: Perfect logical organization including:
- Reordered Data Cleaning section for optimal workflow: Field Selection → Custom Operations → Apply Changes → Quick Actions
- Positioned Apply Changes button directly below Custom Operations checkboxes for clear visual connection
- Moved Quick Actions below Apply Changes to separate preset utilities from custom operations
- Clear task separation making it easy to understand: configure → apply → use utilities
- Improved workflow reducing confusion about which buttons to use when
User Experience Excellence: Streamlined interface design with:
- Perfect logical grouping of related functions for intuitive workflows
- Reduced cognitive load with visually connected operation configuration and execution
- Clear distinction between custom operations and preset quick actions
- Enhanced usability through optimal button placement and visual hierarchy

v0.5.3

UI/UX Enhancements: Major interface improvements including:
- Reorganized Data Cleaning & IDs section with expanded Quick Actions (5 buttons)
- Moved Analyze Keys and Add ID Column to Quick Actions for better accessibility
- Added Download button to Quick Actions for streamlined workflow
- Simplified bottom action area to focus on Apply Changes functionality
- Improved visual balance with evenly distributed quick action buttons
User Experience: Enhanced workflow optimization with:
- More intuitive button grouping: instant actions vs. custom operations
- Reduced scrolling with key tools prominently placed in Quick Actions
- Faster access to critical data preparation features
- Cleaner, more organized interface with logical button placement

v0.5.2

Code Cleanup & Optimization: Major codebase improvements including:
- Removed all orphaned code and unused event handlers for better performance
- Eliminated test code from production export functions
- Removed unused metadata functionality (~120 lines of dead code)
- Cleaned up navigation proxy handlers for non-existent elements
- Streamlined validation system to focus on user-defined rules
UI/UX Improvements: Enhanced user experience with:
- Reordered sections for better workflow: Data Cleaning now appears before Duplicate Analysis
- Removed unused table columns and simplified interface
- Improved logical flow from data loading → cleaning → analysis → validation
- Enhanced export functionality with cleaner output (no test messages)

v0.5.1

Bug Fixes & Code Quality: Critical fixes and improvements including:
- Fixed canvas null reference error that prevented file loading
- Added missing event listeners for Dataset Overview quick action buttons
- Improved null safety with optional chaining for all DOM element access
- Replaced loose equality checks with strict equality for better type safety
- Removed debug console.log statements for cleaner production code
UI Streamlining: Simplified interface with:
- Removed unused heatmap visualization system to focus on core analysis features
- Streamlined Data Quality section to Column Analysis and Quality Statistics views
- Fixed dual-file loading system to properly support primary and secondary datasets
- Enhanced user feedback with improved toast messages and error handling

v0.5.0

Interface Refinement: Major usability improvements including:
- Optimized typography with most readable system fonts across all interface text
- Enhanced button sizing and spacing for better touch targets and visual clarity
- Improved quick action button layouts with proper horizontal distribution
- Added meaningful emojis to action buttons for better visual communication
Layout Improvements: Streamlined section organization with:
- Redesigned Data Cleaning layout with side-by-side field selection and operations
- Enhanced Dataset Overview info panel with larger, more readable text
- Removed non-functional placeholder buttons to reduce interface clutter
- Streamlined navigation for better user experience

v0.4.0

Excel Support: Added comprehensive support for .xlsx and .xls files using SheetJS library, allowing users to analyze Excel workbooks alongside CSV/TSV files
Undo/Redo System: Implemented complete undo/redo functionality for data operations using Command pattern with:
- Visual undo/redo buttons in header with dynamic tooltips
- Keyboard shortcuts (Ctrl+Z for undo, Ctrl+Y for redo)
- Memory management with 50 operation limit
- State persistence for all data transformations
Performance Optimizations: Major performance improvements including:
- Optimized data visualization rendering
- Increased analysis capacity to 1M cells
- Chunked file processing for large files
- Optimized duplicate detection with async processing
- Added progress indicators for long-running operations
Enhanced User Experience: Improved file type validation, better error handling, and streamlined interface for business users

v0.3.9

Initial comprehensive data profiling capabilities
CSV/TSV file support with drag-and-drop interface
Missing values analysis and visualization
Exact duplicate detection and reporting
Field-level statistics and type detection
Rule-based validation with user-defined rules and allowed value sets
Cross-file fuzzy duplicate detection
File merging with multiple join types
Data cleaning operations (trim, case normalization, accent removal)
Salesforce ID conversion (15→18 character)
Theme support (Dark, Light, Matrix Mode)
Local storage persistence with optional encryption
Offline-first architecture with security focus
