How I built DocTextExtractor to power NotteChat's AI-powered document chat, and how you can integrate it into your own Flutter apps.
As a Flutter developer with a passion for simplifying complex problems, I created DocTextExtractor, a lightweight, open-source Dart package that extracts text from .doc, .docx, .pdf, and .md files, as well as from Google Docs URLs.
This tool was born from the challenges I faced while building NotteChat, an app that lets users chat with document content using AI. In this article, I’ll share how I built DocTextExtractor, why it matters, and how you can integrate it into your own Flutter projects.
Why I Built DocTextExtractor
NotteChat empowers students, professionals, and educators to interact with documents conversationally. Users paste a document URL or upload a file, then summarize, explore, or ask questions about the content using AI.
However, supporting multiple document formats (.doc, .docx, PDF, Google Docs, Markdown) posed a serious challenge. Most existing Flutter solutions only worked for specific formats, and there was no unified solution.
So I built DocTextExtractor with these goals:
- Support .doc, .docx, .pdf, .md, and Google Docs URLs
- Handle both local files and URLs
- Enable offline parsing for .doc and .md
- Power AI by providing clean, structured text
- Extract real filenames for better UX
1. Identifying the Need
The core feature of NotteChat—chatting with document content—meant I needed a consistent way to extract text, regardless of format or source.
Key Requirements:
- Unified API for all formats
- Clean filename extraction
- Minimal dependencies
- Cross-platform support (iOS, Android, Web)
I relied on the trusted Flutter/Dart ecosystem with the following tools and packages:
- http: Fetch documents via URLs
- syncfusion_flutter_pdf: Parse and extract PDFs
- archive + xml: Extract from .docx and .doc
- markdown: Convert .md to plain text
- VS Code + GitHub for development and version control
3. Designing the Core Logic
At the heart of the package is the TextExtractor class with a single extractText() method.
Key Features:
- Unified Return Type: A record with text and filename fields for easy use
- Smart Format Detection: Checks HTTP Content-Type or file extension
- Offline Support: No internet required for .doc and .md
- Error Handling: Friendly exceptions (e.g., "Unsupported document type")
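The format detection described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the DocFormat enum, the detectFormat function, and the two lookup tables are hypothetical names I am assuming for the example. It checks the HTTP Content-Type first and falls back to the file extension.

```dart
/// Hypothetical format tags for this sketch; the package's internals may differ.
enum DocFormat { doc, docx, pdf, markdown, unknown }

// Standard MIME types for the supported formats.
const _mimeToFormat = <String, DocFormat>{
  'application/pdf': DocFormat.pdf,
  'application/msword': DocFormat.doc,
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document':
      DocFormat.docx,
  'text/markdown': DocFormat.markdown,
};

const _extensionToFormat = <String, DocFormat>{
  'pdf': DocFormat.pdf,
  'doc': DocFormat.doc,
  'docx': DocFormat.docx,
  'md': DocFormat.markdown,
};

/// Detects the format of [source] (a URL or file path).
/// [contentType] is the optional HTTP Content-Type header value.
DocFormat detectFormat(String source, {String? contentType}) {
  if (contentType != null) {
    // Drop parameters such as "; charset=utf-8" before the lookup.
    final mime = contentType.split(';').first.trim().toLowerCase();
    final byMime = _mimeToFormat[mime];
    if (byMime != null) return byMime;
  }
  // Fall back to the extension of the path part, ignoring any query string.
  final path = Uri.tryParse(source)?.path ?? source;
  final dot = path.lastIndexOf('.');
  if (dot == -1 || dot == path.length - 1) return DocFormat.unknown;
  final extension = path.substring(dot + 1).toLowerCase();
  return _extensionToFormat[extension] ?? DocFormat.unknown;
}
```

Checking the header first helps with download links that carry no extension; the extension fallback covers local files, where no header exists.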
Each format was tackled with custom logic:
- .doc: No existing Dart parser, so I created one using raw XML parsing
- .docx: Unzipped and parsed word/document.xml
- .md: Used markdown package for plain-text conversion
- PDF: Parsed using syncfusion_flutter_pdf
- Google Docs: Converted /edit URLs to /export?format=pdf and parsed as PDF
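To illustrate the Google Docs step, here is a small sketch of rewriting an /edit link into a PDF export link. The function name toGoogleDocsPdfExport is hypothetical, and the sketch assumes the usual /document/d/<id>/... URL shape; the package may handle this differently.

```dart
/// Returns a PDF export URL for a Google Docs document link,
/// or null if [url] does not look like one.
/// Hypothetical helper for illustration only.
Uri? toGoogleDocsPdfExport(String url) {
  final uri = Uri.tryParse(url);
  if (uri == null || uri.host != 'docs.google.com') return null;

  // Expected path shape: /document/d/<id>/edit
  final segments = uri.pathSegments;
  if (segments.isEmpty || segments.first != 'document') return null;
  final marker = segments.indexOf('d');
  if (marker == -1 || marker + 1 >= segments.length) return null;

  final id = segments[marker + 1];
  if (id.isEmpty) return null;

  // Rebuild the URL so that sharing parameters such as ?usp=sharing are dropped.
  return Uri.https(
    'docs.google.com',
    '/document/d/$id/export',
    {'format': 'pdf'},
  );
}
```

The export link only works for documents the requester is allowed to read, so a failed download should still be treated as a normal error case.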
To enhance UX, I added a _extractFilename() method that pulls names from:
- Content-Disposition headers (e.g., filename="report.docx")
- URL segments
- Google Docs metadata (fallback if unavailable)
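A rough sketch of the first two lookups, the Content-Disposition header and the URL segments, might look like this. The public name extractFilename, its parameters, and the 'document' fallback are assumptions for the example and not the package's private _extractFilename() method. The sketch also ignores the extended filename*= form of the header.

```dart
/// Picks a display filename for a downloaded document.
/// Hypothetical helper: tries the Content-Disposition header first,
/// then the last non-empty URL path segment, then [fallback].
String extractFilename(
  String url, {
  String? contentDisposition,
  String fallback = 'document',
}) {
  if (contentDisposition != null) {
    // Matches filename="report.docx" as well as the unquoted filename=report.docx.
    final match = RegExp(
      r'''filename\s*=\s*"?([^";]+)"?''',
      caseSensitive: false,
    ).firstMatch(contentDisposition);
    final name = match?.group(1)?.trim();
    if (name != null && name.isNotEmpty) return name;
  }

  // pathSegments are already percent-decoded, so "My%20Notes.md" becomes "My Notes.md".
  final segments = Uri.tryParse(url)?.pathSegments ?? const <String>[];
  final nonEmpty = segments.where((segment) => segment.isNotEmpty);
  if (nonEmpty.isNotEmpty) return nonEmpty.last;

  return fallback;
}
```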
I tested with:
- .doc: Legacy Word file
- .docx: Modern reports
- .md: GitHub README files
- PDF: Academic papers
- Google Docs: Shared documents
Edge cases included:
- Missing headers
- Large files (>10MB)
- Unsupported formats
To make DocTextExtractor reusable, I decided to publish it on pub.dev, which is also one of the reasons I'm writing this article.
- Published on pub.dev
- MIT Licensed
- Included example app
- Wrote a detailed README with usage examples
DocTextExtractor is the backbone of NotteChat's AI-powered chat with documents.
It enables:
- AI Chat: Clean text fed into AI (e.g., "Summarize this PDF")
- Offline Use: Great for areas with limited internet
- Smart UX: Real filenames and helpful error messages
- Versatile Support: For modern and legacy users
Step 1: Add the Dependency
Add to your app’s pubspec.yaml.
    dependencies:
      doc_text_extractor: ^1.0.0
Run:
    flutter pub get
Step 2: Import and Initialize
Import the package and create a TextExtractor instance:
    import 'package:doc_text_extractor/doc_text_extractor.dart';

    final extractor = TextExtractor();
Step 3: Extract Text from a URL
Use extractText to process URLs for .doc, .docx, .md, PDF, or Google Docs:
    Future<void> processDocumentUrl(String url) async {
      try {
        final result = await extractor.extractText(url);
        final text = result.text;
        final filename = result.filename;
        // Guard the preview: substring(0, 100) throws a RangeError on shorter text.
        final preview = text.length > 100 ? '${text.substring(0, 100)}...' : text;
        print('Filename: $filename');
        print('Text: $preview');
        // Pass text to AI service (e.g., for NotteChat's AI chat)
      } catch (e) {
        print('Error: $e');
        // Show user-friendly error (e.g., "Please convert .doc to .docx")
      }
    }
Example usage
processDocumentUrl('
processDocumentUrl('
Step 4: Extract Text from a Local File
For local files (e.g., user-uploaded .md or .doc), set isUrl: false:
    import 'package:path_provider/path_provider.dart';
    import 'dart:io';

    Future<void> processLocalFile(String filePath) async {
      try {
        final result = await extractor.extractText(filePath, isUrl: false);
        final text = result.text;
        final filename = result.filename;
        // Guard the preview: substring(0, 100) throws a RangeError on shorter text.
        final preview = text.length > 100 ? '${text.substring(0, 100)}...' : text;
        print('Filename: $filename');
        print('Text: $preview');
        // Use text in app logic
      } catch (e) {
        print('Error: $e');
      }
    }
Example usage (inside an async function):
    final dir = await getTemporaryDirectory();
    await processLocalFile('${dir.path}/sample.md');
Step 5: Integrate with your preferred AI API
You can now use the extracted text in your app with AI tools like OpenAI, Gemini, or Sonar APIs.
    class ChatScreen extends StatelessWidget {
      // build() is omitted for brevity; extractor is the TextExtractor from Step 2.

      Future<void> _handleDocument(String url) async {
        final result = await extractor.extractText(url);
        final text = result.text;
        final filename = result.filename;
        // Update session title with today's date (YYYY-MM-DD) and the filename
        final sessionTitle = 'Session ${DateTime.now().toIso8601String().split('T')[0]} - $filename';
        // Summarize with AI (e.g., Sonar API).
        // SonarService is assumed to be your app's own API wrapper, not part of doc_text_extractor.
        final sonarService = SonarService();
        final summary = await sonarService.queryDocument(text, 'Summarize this document');
        // Display in UI
        print('Session: $sessionTitle');
        print('Summary: $summary');
      }
    }
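Most AI APIs limit how much text a single request can carry, so long documents usually need to be split before they are sent. The following is a minimal sketch of one way to do that, assuming a simple character budget; chunkText and the 4000-character default are hypothetical choices for the example, and real limits are counted in tokens and depend on the provider.

```dart
/// Splits [text] into chunks of at most [maxChars] characters,
/// preferring to break at whitespace so words stay intact.
/// Hypothetical helper; not part of doc_text_extractor.
List<String> chunkText(String text, {int maxChars = 4000}) {
  if (maxChars <= 0) {
    throw ArgumentError.value(maxChars, 'maxChars', 'must be positive');
  }
  final chunks = <String>[];
  var start = 0;
  while (start < text.length) {
    var end = start + maxChars;
    if (end >= text.length) {
      end = text.length;
    } else {
      // Look backwards from the window edge for the nearest whitespace.
      final breakAt = text.lastIndexOf(RegExp(r'\s'), end);
      if (breakAt > start) end = breakAt;
    }
    chunks.add(text.substring(start, end).trim());
    start = end;
  }
  // Drop chunks that were only whitespace.
  return chunks.where((chunk) => chunk.isNotEmpty).toList();
}
```

Each chunk can then be summarized on its own and the partial summaries combined, which keeps every request under the size limit.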
Step 6: Enhance UX with Error Handling
Add loading dialogs for large files and user-friendly errors:
    // Inside the catch (e) block of the extraction call, where a BuildContext is available:
    if (e.toString().contains('Unsupported document type')) {
      ScaffoldMessenger.of(context).showSnackBar(
        SnackBar(content: Text('Unsupported format. Try converting to .docx or PDF.')),
      );
    }
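If you handle more than one kind of failure, it can help to keep the mapping from errors to messages in one place. Here is a small sketch of that idea; userMessageFor and the message wording are hypothetical, and the only error text taken from the package is 'Unsupported document type'.

```dart
import 'dart:async';

/// Maps an extraction error to a message suitable for a SnackBar or dialog.
/// Hypothetical helper; adjust the cases to the errors your app actually sees.
String userMessageFor(Object error) {
  if (error is TimeoutException) {
    return 'The document took too long to load. Please try again.';
  }
  if (error.toString().contains('Unsupported document type')) {
    return 'Unsupported format. Try converting to .docx or PDF.';
  }
  // Generic fallback so the user never sees a raw exception string.
  return 'Could not read this document. Please try another file.';
}
```

The catch block then shrinks to a single showSnackBar call with Text(userMessageFor(e)) as its content.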
Final Thoughts
DocTextExtractor started as a necessity for NotteChat but evolved into a powerful, standalone Flutter package. It’s now available for anyone building document-based apps, AI tools, or productivity platforms.
Try it out:
View the source:
If you found this helpful or end up using the package, feel free to share your feedback. I’d love to hear how you’re using it!
Happy coding!
#DocTextExtractor #NotteChat #Flutter #AI #DestinyEd #Dart