feat: implement auto-detection for data formats and enhance dataset input UI

This commit is contained in:
2025-10-15 12:37:00 +03:00
parent a8f5ba44ea
commit acdc619e4e
5 changed files with 374 additions and 144 deletions

View File

@@ -199,6 +199,12 @@ Astrolabe is a focused tool for managing, editing, and previewing Vega-Lite visu
- Support for multiple data sources: - Support for multiple data sources:
- **Inline data**: JSON, CSV, TSV, TopoJSON stored directly - **Inline data**: JSON, CSV, TSV, TopoJSON stored directly
- **URL data**: Remote data sources with format specification - **URL data**: Remote data sources with format specification
- **Intelligent auto-detection system**:
- Single input field for data or URL
- Automatic URL detection and content fetching
- Format detection from content (JSON, CSV, TSV, TopoJSON)
- Confidence scoring (high/medium/low)
- Visual confirmation UI with badges and preview
- Automatic metadata calculation: - Automatic metadata calculation:
- Row count, column count, column names - Row count, column count, column names
- Data size in bytes - Data size in bytes
@@ -217,7 +223,6 @@ Astrolabe is a focused tool for managing, editing, and previewing Vega-Lite visu
- CSV/TSV: `{ values: data, format: { type: 'csv'/'tsv' } }` - CSV/TSV: `{ values: data, format: { type: 'csv'/'tsv' } }`
- TopoJSON: `{ values: data, format: { type: 'topojson' } }` - TopoJSON: `{ values: data, format: { type: 'topojson' } }`
- URL: `{ url: "...", format: { type: '...' } }` - URL: `{ url: "...", format: { type: '...' } }`
- Button-group UI for source/format selection (matches editor style)
- Graceful error handling for CORS and network failures - Graceful error handling for CORS and network failures
- Modal scrolling support for small viewports - Modal scrolling support for small viewports
@@ -246,6 +251,12 @@ Astrolabe is a focused tool for managing, editing, and previewing Vega-Lite visu
- Vega-Lite's native format parsers used for rendering - Vega-Lite's native format parsers used for rendering
- Metadata refresh fetches live data and updates statistics - Metadata refresh fetches live data and updates statistics
- Modal resizes with viewport (max-height: 90vh) - Modal resizes with viewport (max-height: 90vh)
- **Auto-detection algorithms**:
- URL validation with protocol check (http/https)
- JSON parsing with TopoJSON identification
- CSV/TSV detection via delimiter counting and consistency checks
- Format inference from URL file extensions
- Debounced input handling for real-time feedback
--- ---
@@ -388,10 +399,11 @@ Astrolabe is a focused tool for managing, editing, and previewing Vega-Lite visu
- Multi-format support: JSON, CSV, TSV, TopoJSON - Multi-format support: JSON, CSV, TSV, TopoJSON
- Multi-source support: Inline data and URL references - Multi-source support: Inline data and URL references
- Modal-based Dataset Manager with full CRUD - Modal-based Dataset Manager with full CRUD
- Intelligent auto-detection (URL/format/confidence)
- Visual confirmation UI with badges and preview
- Automatic metadata calculation and display - Automatic metadata calculation and display
- URL metadata fetching and refresh - URL metadata fetching and refresh
- Dataset reference resolution in Vega-Lite specs - Dataset reference resolution in Vega-Lite specs
- Button-group UI for source/format selection
- Retro Windows 2000 aesthetic throughout - Retro Windows 2000 aesthetic throughout
### Technical Implementation ### Technical Implementation
@@ -410,4 +422,5 @@ Astrolabe is a focused tool for managing, editing, and previewing Vega-Lite visu
- **Dataset Storage**: IndexedDB with async/Promise-based API, unique name constraint - **Dataset Storage**: IndexedDB with async/Promise-based API, unique name constraint
- **Dataset Resolution**: Async spec transformation before rendering, format-aware data injection - **Dataset Resolution**: Async spec transformation before rendering, format-aware data injection
- **URL Metadata**: Fetch on creation with graceful CORS error handling - **URL Metadata**: Fetch on creation with graceful CORS error handling
- **Modal UI**: Flexbox with overflow:auto, max-height responsive to viewport - **Modal UI**: Flexbox with overflow:auto, max-height responsive to viewport
- **Auto-detection**: URL validation, JSON/CSV/TSV parsing, confidence scoring, real-time feedback

View File

@@ -129,9 +129,13 @@
- Copy reference button (generates `"data": {"name": "..."}`) - Copy reference button (generates `"data": {"name": "..."}`)
- Delete dataset with confirmation - Delete dataset with confirmation
- Refresh metadata button for URL datasets (🔄) - Refresh metadata button for URL datasets (🔄)
- Automatic metadata calculation on creation - **Auto-detection system**:
- Single input field for data or URL
- Automatic URL detection and content fetching
- Format detection (JSON, CSV, TSV, TopoJSON)
- Confidence scoring (high/medium/low)
- Visual confirmation with badges and preview
- URL fetching with CORS error handling - URL fetching with CORS error handling
- Button-group UI for source/format selection
- Unique dataset name constraint (IndexedDB index) - Unique dataset name constraint (IndexedDB index)
- Empty state message for no datasets - Empty state message for no datasets
@@ -197,14 +201,14 @@ src/
├── js/ ├── js/
│ ├── config.js # Global variables, settings, sample data │ ├── config.js # Global variables, settings, sample data
│ ├── snippet-manager.js # Snippet CRUD, storage, search, sort (977 lines) │ ├── snippet-manager.js # Snippet CRUD, storage, search, sort (977 lines)
│ ├── dataset-manager.js # Dataset CRUD, IndexedDB, formats (637 lines) │ ├── dataset-manager.js # Dataset CRUD, IndexedDB, auto-detection (714 lines)
│ ├── panel-manager.js # Layout resizing, toggling, persistence (200 lines) │ ├── panel-manager.js # Layout resizing, toggling, persistence (200 lines)
│ ├── editor.js # Monaco setup, Vega rendering, dataset resolution (150 lines) │ ├── editor.js # Monaco setup, Vega rendering, dataset resolution (150 lines)
│ └── app.js # Event handlers, initialization (197 lines) │ └── app.js # Event handlers, initialization (197 lines)
└── styles.css # Retro Windows 2000 aesthetic └── styles.css # Retro Windows 2000 aesthetic
``` ```
**Total JS Lines**: ~2,161 lines (excluding comments and blank lines) **Total JS Lines**: ~2,238 lines (excluding comments and blank lines)
--- ---

View File

@@ -227,37 +227,25 @@
</div> </div>
<div class="dataset-form-group"> <div class="dataset-form-group">
<div class="dataset-toggle-row"> <label class="dataset-form-label">Data or URL *</label>
<div class="dataset-toggle-section"> <div class="dataset-format-hint">
<span class="dataset-toggle-label">Source:</span> Paste your data (JSON, CSV, or TSV) or a URL. Format will be detected automatically.
<div class="dataset-toggle-group">
<button class="dataset-toggle-btn active" data-source="inline" type="button">Inline</button>
<button class="dataset-toggle-btn" data-source="url" type="button">URL</button>
</div>
</div>
<div class="dataset-toggle-section">
<span class="dataset-toggle-label">Format:</span>
<div class="dataset-toggle-group">
<button class="dataset-toggle-btn active" data-format="json" type="button">JSON</button>
<button class="dataset-toggle-btn" data-format="csv" type="button">CSV</button>
<button class="dataset-toggle-btn" data-format="tsv" type="button">TSV</button>
<button class="dataset-toggle-btn" data-format="topojson" type="button">TopoJSON</button>
</div>
</div>
</div> </div>
<textarea id="dataset-form-input" class="dataset-textarea" placeholder="Paste data or URL here..." rows="12"></textarea>
</div> </div>
<div class="dataset-form-group" id="dataset-url-group" style="display: none;"> <!-- Detection Confirmation UI -->
<label class="dataset-form-label">URL *</label> <div id="dataset-detection-confirm" class="dataset-detection-confirm" style="display: none;">
<input type="text" id="dataset-form-url" class="dataset-input" placeholder="https://example.com/data.csv" /> <div class="detection-header">
</div> <span class="detection-title">Detected:</span>
<div class="detection-badges">
<div class="dataset-form-group" id="dataset-data-group"> <span class="detection-badge" id="detected-format">JSON</span>
<label class="dataset-form-label">Data *</label> <span class="detection-badge" id="detected-source">Inline</span>
<div class="dataset-format-hint" id="dataset-format-hint"> <span class="detected-confidence high" id="detected-confidence">high confidence</span>
JSON array of objects: [{"col1": "value", "col2": 123}, ...] </div>
</div> </div>
<textarea id="dataset-form-data" class="dataset-textarea" placeholder='[{"col1": "value", "col2": 123}, ...]' rows="12"></textarea> <div class="detection-preview-label">Preview:</div>
<pre id="detected-preview" class="detection-preview-box"></pre>
</div> </div>
<div class="dataset-form-group"> <div class="dataset-form-group">

View File

@@ -377,80 +377,239 @@ function closeDatasetManager() {
window.currentDatasetId = null; window.currentDatasetId = null;
} }
// Update format hint and placeholder // Auto-detect data format from pasted content
function updateFormatHint(format) { function detectDataFormat(text) {
const hintEl = document.getElementById('dataset-format-hint'); text = text.trim();
const dataEl = document.getElementById('dataset-form-data');
if (format === 'json') { // Try JSON first
hintEl.textContent = 'JSON array of objects: [{"col1": "value", "col2": 123}, ...]'; try {
dataEl.placeholder = '[{"col1": "value", "col2": 123}, ...]'; const parsed = JSON.parse(text);
} else if (format === 'csv') {
hintEl.textContent = 'CSV with header row: col1,col2\\nvalue1,123\\nvalue2,456'; // Check if it's TopoJSON
dataEl.placeholder = 'col1,col2\nvalue1,123\nvalue2,456'; if (parsed && typeof parsed === 'object' && parsed.type === 'Topology') {
} else if (format === 'tsv') { return { format: 'topojson', parsed, confidence: 'high' };
hintEl.textContent = 'TSV with header row: col1\\tcol2\\nvalue1\\t123\\nvalue2\\t456'; }
dataEl.placeholder = 'col1\tcol2\nvalue1\t123\nvalue2\t456';
} else if (format === 'topojson') { // Check if it's JSON array
hintEl.textContent = 'TopoJSON object: {"type": "Topology", "objects": {...}, "arcs": [...]}'; if (Array.isArray(parsed)) {
dataEl.placeholder = '{"type": "Topology", "objects": {...}}'; return { format: 'json', parsed, confidence: 'high' };
}
// Could be TopoJSON or other JSON object
return { format: 'json', parsed, confidence: 'medium' };
} catch (e) {
// Not JSON, continue checking
}
// Check for CSV/TSV
const lines = text.split('\n').filter(line => line.trim());
if (lines.length >= 2) {
const firstLine = lines[0];
// Count delimiters
const commaCount = (firstLine.match(/,/g) || []).length;
const tabCount = (firstLine.match(/\t/g) || []).length;
// TSV detection
if (tabCount > 0 && tabCount > commaCount) {
// Verify consistency across rows
const isConsistent = lines.slice(0, 5).every(line =>
(line.match(/\t/g) || []).length === tabCount
);
if (isConsistent) {
return { format: 'tsv', parsed: text, confidence: 'high' };
}
}
// CSV detection
if (commaCount > 0) {
// Basic consistency check (at least 2 rows with similar comma count)
const isConsistent = lines.slice(0, 5).every(line => {
const count = (line.match(/,/g) || []).length;
return Math.abs(count - commaCount) <= 1; // Allow 1 comma difference
});
if (isConsistent) {
return { format: 'csv', parsed: text, confidence: 'medium' };
}
}
}
return { format: null, parsed: null, confidence: 'low' };
}
// Check if text is a URL
function isURL(text) {
try {
const url = new URL(text.trim());
return url.protocol === 'http:' || url.protocol === 'https:';
} catch (e) {
return false;
} }
} }
// Toggle between URL and inline data inputs // Detect format from URL extension
function toggleDataSource(source) { function detectFormatFromURL(url) {
const urlGroup = document.getElementById('dataset-url-group'); const urlLower = url.toLowerCase();
const dataGroup = document.getElementById('dataset-data-group'); if (urlLower.endsWith('.json')) return 'json';
if (urlLower.endsWith('.csv')) return 'csv';
if (urlLower.endsWith('.tsv') || urlLower.endsWith('.tab')) return 'tsv';
if (urlLower.endsWith('.topojson')) return 'topojson';
return null;
}
if (source === 'url') { // Fetch and detect format from URL
urlGroup.style.display = 'block'; async function fetchAndDetectURL(url) {
dataGroup.style.display = 'none'; try {
} else { const response = await fetch(url);
urlGroup.style.display = 'none'; if (!response.ok) {
dataGroup.style.display = 'block'; throw new Error(`HTTP ${response.status}: ${response.statusText}`);
}
const text = await response.text();
const detected = detectDataFormat(text);
// If no format detected from content, try URL extension
if (!detected.format) {
const formatFromURL = detectFormatFromURL(url);
if (formatFromURL) {
return {
format: formatFromURL,
content: text,
confidence: 'low',
source: 'url'
};
}
}
return {
format: detected.format,
content: text,
parsed: detected.parsed,
confidence: detected.confidence,
source: 'url'
};
} catch (error) {
throw new Error(`Failed to fetch URL: ${error.message}`);
} }
} }
// Show detected format confirmation UI
function showDetectionConfirmation(detection, originalInput) {
const confirmEl = document.getElementById('dataset-detection-confirm');
const detectedFormatEl = document.getElementById('detected-format');
const detectedSourceEl = document.getElementById('detected-source');
const detectedPreviewEl = document.getElementById('detected-preview');
const detectedConfidenceEl = document.getElementById('detected-confidence');
confirmEl.style.display = 'block';
// Show detected format
detectedFormatEl.textContent = detection.format ? detection.format.toUpperCase() : 'Unknown';
// Show source
detectedSourceEl.textContent = detection.source === 'url' ? 'URL' : 'Inline Data';
// Show confidence indicator
const confidenceClass = detection.confidence === 'high' ? 'high' :
detection.confidence === 'medium' ? 'medium' : 'low';
detectedConfidenceEl.className = `detected-confidence ${confidenceClass}`;
detectedConfidenceEl.textContent = `${detection.confidence} confidence`;
// Show preview
let previewText = '';
if (detection.source === 'url') {
previewText = `URL: ${originalInput}\n\n`;
if (detection.content) {
const lines = detection.content.split('\n');
previewText += `Preview (first 10 lines):\n${lines.slice(0, 10).join('\n')}`;
if (lines.length > 10) {
previewText += `\n... (${lines.length - 10} more lines)`;
}
}
} else {
const lines = originalInput.split('\n');
previewText = lines.slice(0, 15).join('\n');
if (lines.length > 15) {
previewText += `\n... (${lines.length - 15} more lines)`;
}
}
detectedPreviewEl.textContent = previewText;
// Store detection data for later use
window.currentDetection = {
...detection,
originalInput
};
}
// Hide detection confirmation UI
function hideDetectionConfirmation() {
const confirmEl = document.getElementById('dataset-detection-confirm');
confirmEl.style.display = 'none';
window.currentDetection = null;
}
// Show new dataset form // Show new dataset form
function showNewDatasetForm() { function showNewDatasetForm() {
document.getElementById('dataset-list-view').style.display = 'none'; document.getElementById('dataset-list-view').style.display = 'none';
document.getElementById('dataset-form-view').style.display = 'block'; document.getElementById('dataset-form-view').style.display = 'block';
document.getElementById('dataset-form-name').value = ''; document.getElementById('dataset-form-name').value = '';
document.getElementById('dataset-form-data').value = ''; document.getElementById('dataset-form-input').value = '';
document.getElementById('dataset-form-url').value = '';
document.getElementById('dataset-form-comment').value = ''; document.getElementById('dataset-form-comment').value = '';
document.getElementById('dataset-form-error').textContent = ''; document.getElementById('dataset-form-error').textContent = '';
// Reset to inline data source and JSON format // Hide detection confirmation
document.querySelectorAll('[data-source]').forEach(btn => { hideDetectionConfirmation();
btn.classList.toggle('active', btn.dataset.source === 'inline');
});
document.querySelectorAll('[data-format]').forEach(btn => {
btn.classList.toggle('active', btn.dataset.format === 'json');
});
toggleDataSource('inline');
updateFormatHint('json');
// Add listeners if not already added // Add paste handler if not already added
if (!window.datasetListenersAdded) { if (!window.datasetListenersAdded) {
// Source toggle button listeners const inputEl = document.getElementById('dataset-form-input');
document.querySelectorAll('[data-source]').forEach(btn => {
btn.addEventListener('click', function () {
// Update active state
document.querySelectorAll('[data-source]').forEach(b => b.classList.remove('active'));
this.classList.add('active');
toggleDataSource(this.dataset.source);
});
});
// Format toggle button listeners // Handle paste/input with auto-detection
document.querySelectorAll('[data-format]').forEach(btn => { inputEl.addEventListener('input', async function () {
btn.addEventListener('click', function () { const text = this.value.trim();
// Update active state if (!text) {
document.querySelectorAll('[data-format]').forEach(b => b.classList.remove('active')); hideDetectionConfirmation();
this.classList.add('active'); return;
updateFormatHint(this.dataset.format); }
});
const errorEl = document.getElementById('dataset-form-error');
errorEl.textContent = '';
// Check if it's a URL
if (isURL(text)) {
errorEl.textContent = 'Fetching and analyzing URL...';
try {
const detection = await fetchAndDetectURL(text);
errorEl.textContent = '';
if (detection.format) {
showDetectionConfirmation(detection, text);
} else {
errorEl.textContent = 'Could not detect data format from URL. Please check the URL or try pasting the data directly.';
hideDetectionConfirmation();
}
} catch (error) {
errorEl.textContent = error.message;
hideDetectionConfirmation();
}
} else {
// Inline data - detect format
const detection = detectDataFormat(text);
if (detection.format) {
showDetectionConfirmation({
...detection,
source: 'inline'
}, text);
} else {
errorEl.textContent = 'Could not detect data format. Please ensure your data is valid JSON, CSV, or TSV.';
hideDetectionConfirmation();
}
}
}); });
window.datasetListenersAdded = true; window.datasetListenersAdded = true;
@@ -466,8 +625,6 @@ function hideNewDatasetForm() {
// Save new dataset // Save new dataset
async function saveNewDataset() { async function saveNewDataset() {
const name = document.getElementById('dataset-form-name').value.trim(); const name = document.getElementById('dataset-form-name').value.trim();
const source = document.querySelector('[data-source].active').dataset.source;
const format = document.querySelector('[data-format].active').dataset.format;
const comment = document.getElementById('dataset-form-comment').value.trim(); const comment = document.getElementById('dataset-form-comment').value.trim();
const errorEl = document.getElementById('dataset-form-error'); const errorEl = document.getElementById('dataset-form-error');
@@ -479,68 +636,55 @@ async function saveNewDataset() {
return; return;
} }
// Check if we have detected data
if (!window.currentDetection || !window.currentDetection.format) {
errorEl.textContent = 'Please paste data or URL to detect format';
return;
}
const detection = window.currentDetection;
const { format, source, originalInput } = detection;
let data; let data;
let metadata = null; let metadata = null;
if (source === 'url') { if (source === 'url') {
const url = document.getElementById('dataset-form-url').value.trim(); // For URL, we already fetched the content
if (!url) { data = originalInput; // Store the URL string
errorEl.textContent = 'URL is required';
return;
}
// Basic URL validation
try {
new URL(url);
} catch (error) {
errorEl.textContent = 'Invalid URL format';
return;
}
// Fetch metadata from URL // Calculate metadata from fetched content
errorEl.textContent = 'Fetching data from URL...'; if (detection.content) {
try { try {
metadata = await fetchURLMetadata(url, format); metadata = calculateDatasetStats(
errorEl.textContent = ''; detection.parsed || detection.content,
} catch (error) { format,
errorEl.textContent = `Warning: ${error.message}. Dataset will be created without metadata.`; 'inline'
// Continue anyway - URL might require CORS or auth );
await new Promise(resolve => setTimeout(resolve, 2000)); // Show warning briefly // Override to use actual content size
errorEl.textContent = ''; metadata.size = new Blob([detection.content]).size;
} catch (error) {
console.warn('Failed to calculate metadata:', error);
}
} }
data = url; // Store the URL string
} else { } else {
// Inline data // Inline data
const dataText = document.getElementById('dataset-form-data').value.trim(); if (format === 'json' || format === 'topojson') {
if (!dataText) { if (!detection.parsed) {
errorEl.textContent = 'Data is required'; errorEl.textContent = 'Invalid JSON data';
return; return;
}
// Basic validation of data format
try {
if (format === 'json' || format === 'topojson') {
const parsed = JSON.parse(dataText);
if (format === 'json' && !Array.isArray(parsed)) {
errorEl.textContent = 'JSON data must be an array of objects';
return;
}
if (format === 'json' && parsed.length === 0) {
errorEl.textContent = 'Data array cannot be empty';
return;
}
data = parsed; // Store as parsed JSON
} else if (format === 'csv' || format === 'tsv') {
const lines = dataText.trim().split('\n');
if (lines.length < 2) {
errorEl.textContent = `${format.toUpperCase()} must have at least a header row and one data row`;
return;
}
data = dataText; // Store as raw CSV/TSV string
} }
} catch (error) { if (format === 'json' && Array.isArray(detection.parsed) && detection.parsed.length === 0) {
errorEl.textContent = `Validation error: ${error.message}`; errorEl.textContent = 'Data array cannot be empty';
return; return;
}
data = detection.parsed;
} else if (format === 'csv' || format === 'tsv') {
const lines = originalInput.trim().split('\n');
if (lines.length < 2) {
errorEl.textContent = `${format.toUpperCase()} must have at least a header row and one data row`;
return;
}
data = originalInput.trim();
} }
} }

View File

@@ -1014,6 +1014,87 @@ body {
border: 1px solid #e0e0a0; border: 1px solid #e0e0a0;
} }
/* Detection Confirmation UI */
.dataset-detection-confirm {
margin: 12px 0;
padding: 12px;
background: #e8f4f8;
border: 2px solid #4a90c5;
border-radius: 4px;
}
.detection-header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 8px;
}
.detection-title {
font-size: 11px;
font-weight: bold;
color: #000000;
}
.detection-badges {
display: flex;
gap: 6px;
align-items: center;
}
.detection-badge {
background: #316ac5;
color: #ffffff;
padding: 2px 8px;
font-size: 10px;
font-weight: bold;
border: 1px solid #0a246a;
border-radius: 2px;
}
.detected-confidence {
font-size: 9px;
padding: 2px 6px;
border-radius: 2px;
font-weight: normal;
}
.detected-confidence.high {
background: #90ee90;
color: #000000;
border: 1px solid #60c060;
}
.detected-confidence.medium {
background: #ffff90;
color: #000000;
border: 1px solid #d0d060;
}
.detected-confidence.low {
background: #ffb080;
color: #000000;
border: 1px solid #d08050;
}
.detection-preview-label {
font-size: 10px;
font-weight: bold;
margin-bottom: 4px;
color: #000000;
}
.detection-preview-box {
background: #ffffff;
border: 2px inset #c0c0c0;
padding: 8px;
font-family: 'Courier New', monospace;
font-size: 10px;
overflow: auto;
max-height: 150px;
margin: 0;
}
.dataset-form-error { .dataset-form-error {
color: #ff0000; color: #ff0000;
font-size: 11px; font-size: 11px;