becke.ch becke-ch--diff--s0-v1
tool
Name Description
becke.ch compare tool web application

becke.ch compare tool web application (for older browsers or browsers having issues)

becke.ch compare tool library


becke.ch compare tool: A Java application (Web & Standalone) to diff, i.e. compare, entire file hierarchies (zip-files or directories), input streams and documents in different formats.
The underlying algorithm is based on a variation of Eugene W. Myers' diff algorithm (the O(ND) difference algorithm). It can process any kind of input stream (preferably character input streams), supports filtering based on regular expressions and provides different tuning options.
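To illustrate the idea behind the O(ND) algorithm, here is a minimal sketch of the basic greedy textbook form, which only computes the length of the shortest edit script (number of inserted plus deleted characters). It is not the tool's implementation, which is described as a variation; the class and method names are hypothetical.

```java
// Hypothetical sketch: basic greedy O(ND) search, not the becke.ch implementation.
final class MyersSketch {

    /** Returns the minimum number of single-character insertions and deletions that turn a into b. */
    static int editDistance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int max = n + m;
        int offset = max;
        // v[k + offset] holds the furthest x reached so far on diagonal k, where k = x - y.
        int[] v = new int[2 * max + 2];
        for (int d = 0; d <= max; d++) {
            for (int k = -d; k <= d; k += 2) {
                int x;
                if (k == -d || (k != d && v[k - 1 + offset] < v[k + 1 + offset])) {
                    x = v[k + 1 + offset];      // move down: take a character from b (insertion)
                } else {
                    x = v[k - 1 + offset] + 1;  // move right: skip a character of a (deletion)
                }
                int y = x - k;
                // Follow the "snake": matching characters cost nothing.
                while (x < n && y < m && a.charAt(x) == b.charAt(y)) {
                    x++;
                    y++;
                }
                v[k + offset] = x;
                if (x >= n && y >= m) {
                    return d;                   // both inputs consumed with d edits
                }
            }
        }
        return max; // not reached: d = n + m always suffices
    }

    public static void main(String[] args) {
        System.out.println(editDistance("ABCABBA", "CBABAC"));
    }
}
```

The running time grows with the number of differences D rather than with the product of the input lengths, which is why the approach works well for inputs that are mostly similar.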
The underlying library is more accurate than Meld and Eclipse, and sometimes even more accurate than Beyond Compare. It is more powerful than Meld and Eclipse because it supports filtering based on regular expressions. Last but not least, its performance is comparable to that of the aforementioned diff / comparison tools.
NEW (08.2015): Microsoft Word (.doc & .docx) & Excel (.xls & .xlsx) support: Microsoft Word and Excel files can now be compared as well (file loading is based on Apache POI).
NEW (09.2015): OpenOffice & LibreOffice Writer (.odt) support: OpenOffice and LibreOffice Writer files can now be compared as well.
NEW (10.2015): PDF (.pdf) support: PDF files can now be compared as well (file loading is based on Apache PDFBox).
NEW (11.2015): Linear Space Refinement: Improved quality & performance: A variation of Eugene W. Myers' linear space refinement algorithm has been implemented, which further improves the diff quality and performance.
NEW (12.2015): AngularJS: Improved GUI: Implemented the GUI in AngularJS for a better look & feel and user experience.
NEW (01.2016): File hierarchy: Directory & Zip-File Support: Supports the comparison of entire file hierarchies: directories (only stand-alone client) and zip-files and outputs the differences in a tree-view.
NEW (02.2016): Unified Diff Support: Supports unified diff. The unified diff output is often used as input to patch programs. Many projects specifically request that "diffs" be submitted in the unified format, making unified diff format the most common format for exchange between software developers. (See as well: diff utility).
NEW (05.2016): Max-Time Support (Option: max time: Default 1000 milliseconds): The user can now set a maximum time that the comparison may take. Once this time limit is exceeded, the algorithm shows as a percentage how many characters could be compared and returns the remaining characters as non-matching.
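The max-time behaviour can be pictured with the following sketch. It uses a deliberately trivial position-by-position comparison as a stand-in for the real diff and only shows the time budget, the reported percentage and the treatment of the remainder as non-matching. All names are hypothetical, and the clock is passed in as a parameter purely to make the sketch testable.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of a time-budgeted comparison, not the tool's algorithm.
final class MaxTimeSketch {

    /** Number of matching positions and the share of the input examined before the budget ran out. */
    static final class Result {
        final int matching;
        final int percentCompared;

        Result(int matching, int percentCompared) {
            this.matching = matching;
            this.percentCompared = percentCompared;
        }
    }

    static Result compare(String left, String right, long maxMillis, LongSupplier clockMillis) {
        int total = Math.max(left.length(), right.length());
        long deadline = clockMillis.getAsLong() + maxMillis;
        int matching = 0;
        int i = 0;
        for (; i < total; i++) {
            if (clockMillis.getAsLong() > deadline) {
                break; // budget exceeded: everything from position i on counts as non-matching
            }
            if (i < left.length() && i < right.length() && left.charAt(i) == right.charAt(i)) {
                matching++;
            }
        }
        int percent = total == 0 ? 100 : (int) (100L * i / total);
        return new Result(matching, percent);
    }

    public static void main(String[] args) {
        Result r = compare("abc", "abd", 1000, System::currentTimeMillis);
        System.out.println(r.matching + " matching, " + r.percentCompared + "% compared");
    }
}
```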
NEW (06.2016): Performance Improvement: The reconciliation algorithm has further been improved and based on that the performance and diff results became even better.
NEW (07.2016): Lazy Load & Scrollbar Support: Only the nodes/files that are different are loaded, shown and expanded in the explorer. Further nodes are loaded on demand when clicking the corresponding folder icon in the explorer. When navigating from the diff-explorer view to the diff-file view and back again, the scrollbar position in the explorer is maintained (making sure the user does not lose orientation).
NEW (08.2016): Filter support on hierarchies: (Option: "file hierarchy": filtering: Default: "yes"): When comparing entire file hierarchies (directories, zip-files, ...) the line- and overall-filters can now be applied on every single file, which makes the file hierarchy diff result much more accurate i.e. no false positives because the same filtering is now applied on each file as when comparing just single files.
NEW (08.2016): Character-Set/Encoding support: (Option: "file: character set": Default: "UTF-8"): We can now explicitly set the character set/encoding of the left- and right-file before comparing them. This helps to avoid differences caused by files that have been persisted in different encodings.
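The following sketch shows why the encoding matters: the same text stored in two different encodings yields different bytes, and it only compares as equal when each side is decoded with its own character set. The class and method names are hypothetical and unrelated to the tool's API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: decode each side with its own character set before comparing.
final class CharsetSketch {

    static boolean sameText(byte[] left, Charset leftCharset, byte[] right, Charset rightCharset) {
        String leftText = new String(left, leftCharset);
        String rightText = new String(right, rightCharset);
        return leftText.equals(rightText);
    }

    public static void main(String[] args) {
        String text = "f\u00fcr"; // the German word "fuer" written with u-umlaut
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Correct character sets on both sides: the texts are equal.
        System.out.println(sameText(latin1, StandardCharsets.ISO_8859_1, utf8, StandardCharsets.UTF_8));
        // Wrong character set on the left side: a spurious difference appears.
        System.out.println(sameText(latin1, StandardCharsets.UTF_8, utf8, StandardCharsets.UTF_8));
    }
}
```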
NEW (09.2016): Font Size: Diff In-/Output: The user can now set the font size for the diff in- and output sections.
NEW (09.2016): On-the-fly Edit & Save support: The user can edit the diff-content and while typing the diff result gets updated on-the-fly. And last but not least the user can then save the modified files and diff results.
NEW (11.2016): WYSIWYG Edit support: The user can edit the diff-content WYSIWYG (What You See Is What You Get) directly in the HTML table itself (a separate text-area is no longer needed), and while typing the diff result gets updated on-the-fly. (This feature is based on the "contenteditable" support in HTML5.)
NEW (01.2018): Copy & same-size/same-time support on hierarchies: New copy functionality to copy single files or entire sub-directories from left to right and vice versa.
(Option: "file: same size & same time: compare": Default: "no"): When comparing entire file hierarchies (directories, zip-files, ...), files that have the same size and the same timestamp can now be compared as well, so that every single file is compared. This makes the file hierarchy diff result more accurate, but the performance impact can be high depending on the directory size, so the option is off by default.
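One possible reading of this option is sketched below: with the option off, two files with identical size and modification time are assumed to be equal and their content is not read; with the option on, the content is compared anyway. This interpretation, the class name and the parameter name are assumptions, not taken from the tool's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch of a same-size/same-time shortcut when comparing file hierarchies.
final class SameSizeTimeSketch {

    /** Returns true if the two files are considered different. */
    static boolean differ(Path left, Path right, boolean compareSameSizeAndTime) throws IOException {
        boolean sameSize = Files.size(left) == Files.size(right);
        boolean sameTime = Files.getLastModifiedTime(left).equals(Files.getLastModifiedTime(right));
        if (sameSize && sameTime && !compareSameSizeAndTime) {
            return false; // cheap shortcut: assume identical without reading the content
        }
        if (!sameSize) {
            return true;  // different sizes can never have identical content
        }
        // Expensive path: read and compare the full content.
        return !Arrays.equals(Files.readAllBytes(left), Files.readAllBytes(right));
    }
}
```

The shortcut is fast but can miss a change that keeps both size and timestamp, which is the accuracy trade-off the option description refers to.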

scope={organization={becke.ch},interest={business,private},document-type={any,txt,xml,pdf,odt,doc,docx,xls,xslx},programming-language={java},platform={any},category={tool}}

Use Cases, Samples & Screenshots: Some use cases, samples and screen-shots comparing the accuracy of: becke-ch--diff--s0-v1 to Eclipse, Meld and Beyond Compare:
  • Use Case 1: Short text with the words concatenated together
  • Use Case 2: Long text sections that are sparsely matching and not aligned on new-line boundaries e.g. comparing long and complex html markup text with its clear text version copied from the browser.
  • Use Case 3: More powerful than Meld and Eclipse supporting filtering based on regular expressions

Use Case 1: Short text with the words concatenated together:

Use Case 1: Sample:
From: Dies ist ein langer Text den wir vergleichen wollen.
To: DiesxxisteinlangerTextdenwirvergleichen wollllen.

Use Case 1: Comparison:
becke-ch--diff--s0-v1: Success: The concatenated words are correctly detected and the missing spaces are correctly highlighted.
Eclipse: Failure: The concatenated words are NOT correctly detected instead some random word separations are highlighted.
Meld: Success: The concatenated words are correctly detected and the missing spaces are correctly highlighted.
Beyond Compare: Partial Success: The concatenated words are correctly detected BUT the missing spaces are NOT (correctly) highlighted.



Use Case 2: Long text sections that are sparsely matching and not aligned on new-line boundaries e.g. comparing long and complex html markup text with its clear text version copied from the browser:

Use Case 2: Sample:
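From: the html markup that follows, up to the "......" line.
To: the clear text after it, copied from the browser, up to the "....." line.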
<html>
<head>
<title>SELFHTML: HTML-Kurzreferenz</title>
<meta name="description" content="(SELFHTML 7.0) Kurzreferenz der HTML-Befehle">
<meta name="keywords" content="SELFHTML, HTML, Referenz">
<link rel=stylesheet type="text/css" href="wselfhtm.css">
</head>
<body bgcolor=#FFFFFF text=#000000 link=#AA5522 vlink=#772200 alink=#000000>

<p><nobr><a class="an" name="top"><img src="x2.gif" width=16 height=13 border=0></a> <a href="selfhtml.htm"><b>SELFHTML</b></a>/<a href="tq.htm" target="_parent">Quickbar</a></nobr></p>
<hr noshade size=1>
<table cellpadding=4 cellspacing=1 width=100%>
<tr>
<td bgcolor=#EEEEEE class="doc" width=110><img src="xweb.gif" width=106 height=109></td>
<td bgcolor=#EEEEEE class="doc" valign=bottom width=100%><h2>HTML-Kurzreferenz</h2></td>
</tr>
<tr>
<td bgcolor=#EEEEEE class="doc" valign=top align=center>
<img src="x5.gif" width=30 height=20 vspace=6 border=0 alt="Diese Seite ist ein Dokument mit Informationstext">
</td>
<td bgcolor=#FFFFFF valign=top nowrap>

<p>
<img src="xgdown.gif" width=14 height=10 border=0>&nbsp;<a href="#a1"><b>Hinweise zur Kurzreferenz</b></a><br>
<img src="xgdown.gif" width=14 height=10 border=0>&nbsp;<a href="#a2"><b>Allgemeine und dateiweite Angaben</b></a><br>
<img src="xgdown.gif" width=14 height=10 border=0>&nbsp;<a href="#a3"><b>Absatztypen und Textgestaltung</b></a><br>
<img src="xgdown.gif" width=14 height=10 border=0>&nbsp;<a href="#a4"><b>Tabellen</b></a><br>
<img src="xgdown.gif" width=14 height=10 border=0>&nbsp;<a href="#a5"><b>Verweise</b></a><br>
<img src="xgdown.gif" width=14 height=10 border=0>&nbsp;<a href="#a6"><b>Grafiken</b></a><br>
<img src="xgdown.gif" width=14 height=10 border=0>&nbsp;<a href="#a7"><b>Formulare</b></a><br>
<img src="xgdown.gif" width=14 height=10 border=0>&nbsp;<a href="#a8"><b>Frames</b></a><br>
<img src="xgdown.gif" width=14 height=10 border=0>&nbsp;<a href="#a9"><b>Multimedia</b></a><br>
<img src="xgdown.gif" width=14 height=10 border=0>&nbsp;<a href="#a10"><b>Layer</b></a><br>
<img src="xgdown.gif" width=14 height=10 border=0>&nbsp;<a href="#a11"><b>Styles und Scripts</b></a>
</p>

</td>
</tr><tr><td colspan=2 bgcolor=#EEEEEE class="doc"><a href="#bottom"><img src="xgdown.gif" width=14 height=10 border=0></a>&#160;</td></tr>
</table>



<h2 class="Sh2"><a class="an" name="a1">Hinweise zur Kurzreferenz</a></h2>

<p>Die HTML-Kurzreferenz ist f&uuml;r Anwender gedacht, die sich bereits mit HTML auskennen und bei der t&auml;glichen Arbeit einfach eine Hilfe am Bildschirm oder als Ausdruck haben m&ouml;chten, ohne viel klicken zu m&uuml;ssen. Wenn Sie in der verk&uuml;rzten Darstellung auf dieser Seite etwas nicht verstehen oder Probleme mit der Anwendung der Befehle haben, folgen Sie den Verweisen vom Typ <img src="x3.gif" width=15 height=10 border=0>&nbsp;<a name="hier" href="#hier"><b>Beschreibung</b></a>.<br>
Die Kurzreferenz erhebt keinen Anspruch auf Vollst&auml;ndigkeit!</p>
......


SELFHTML/Quickbar

HTML-Kurzreferenz
Diese Seite ist ein Dokument mit Informationstext

Hinweise zur Kurzreferenz
Allgemeine und dateiweite Angaben
Absatztypen und Textgestaltung
Tabellen
Verweise
Grafiken
Formulare
Frames
Multimedia
Layer
Styles und Scripts

Hinweise zur Kurzreferenz

Die HTML-Kurzreferenz ist für Anwender gedacht, die sich bereits mit HTML auskennen und bei der täglichen Arbeit einfach eine Hilfe am Bildschirm oder als Ausdruck haben möchten, ohne viel klicken zu müssen. Wenn Sie in der verkürzten Darstellung auf dieser Seite etwas nicht verstehen oder Probleme mit der Anwendung der Befehle haben, folgen Sie den Verweisen vom Typ Beschreibung.
Die Kurzreferenz erhebt keinen Anspruch auf Vollständigkeit!
.....

Use Case 2: Comparison:
becke-ch--diff--s0-v1: Success: The long text sections that are sparsely matching are correctly detected and highlighted.
Eclipse: Failure: Eclipse cannot handle long text sections that are sparsely matching and that are not aligned on new-line boundaries.
Meld: Failure: Meld cannot handle long text sections that are sparsely matching and that are not aligned on new-line boundaries.
Beyond Compare: Partial Success: The long text sections that are sparsely matching are partially correctly detected and highlighted.


Use Case 3: More powerful than Meld and Eclipse supporting filtering based on regular expressions: Filtering lets users filter out text passages they are not interested in so that they can better focus on the relevant differences, e.g. in Java a user could filter out leading and trailing spaces, comments, etc.


Use Case 3: Sample: Comparing html markup text with its clear-text (copied from the browser): The same sample was already shown (without filtering) in use case 2 (see above). As already mentioned the filter gives the user the ability to focus on the relevant differences - in this use case we basically filtered out the markup tags to be able to better focus on the changes in the text passages:
html markup versus clear-text: regular expression:
  • "(<img[^>]+alt=\"([^\"]+))|(>([^<]+)<)"
This regular expression matches the text between the tags and considers as well the alternate text for images getting displayed.
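A minimal sketch of how this pattern behaves in plain java.util.regex: group 2 captures the alternate text of an image and group 4 captures the text between two tags. How the tool itself consumes the matches is not stated above, so collecting the two groups into a list, trimming them and skipping whitespace-only matches are assumptions made for illustration; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extract the visible text from markup with the filter expression.
final class MarkupFilterSketch {

    static final Pattern VISIBLE_TEXT =
            Pattern.compile("(<img[^>]+alt=\"([^\"]+))|(>([^<]+)<)");

    static List<String> visibleText(String html) {
        List<String> parts = new ArrayList<>();
        Matcher m = VISIBLE_TEXT.matcher(html);
        while (m.find()) {
            // Group 2: alt text of an <img> tag; group 4: text between two tags.
            String text = m.group(2) != null ? m.group(2) : m.group(4);
            if (!text.trim().isEmpty()) {
                parts.add(text.trim()); // skip whitespace-only matches (assumption)
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(visibleText("<td><img src=\"x5.gif\" alt=\"Info\"></td><p><b>Tabellen</b></p>"));
    }
}
```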


Use Case 3: Sample: Shows the difference / benefit of a filtered diff (lower image) versus a non-filtered diff (upper image) applied on java code: Again the user can better focus on the relevant changes because leading-, trailing-spaces, line comments and javadoc were filtered out:
Non-filtered text versus filtered text: regular expression:
  • Per line: "((^\\s+)|(\\s+$)|(^\\s*//.*$))"
  • Overall: "(/\\*\\*.*?\\*/)"
The regular expression per line matches the leading and trailing spaces and the line comments and the overall regular expression matches the javadoc that we want to ignore.
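The two expressions can be tried out with the sketch below, which removes every match before the text would be compared. Several details are assumptions: the overall expression is compiled with Pattern.DOTALL so that it can span a multi-line javadoc block, the overall filter runs before the per-line filter, and the per-line filter is repeated until the line stops changing. The repetition is needed in plain java.util.regex because on an indented comment line the first alternative removes only the leading spaces in a single pass. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: apply the overall filter, then the per-line filter, by deleting matches.
final class JavaSourceFilterSketch {

    static final Pattern PER_LINE = Pattern.compile("((^\\s+)|(\\s+$)|(^\\s*//.*$))");
    // DOTALL is an assumption: it lets ".*?" cross line breaks inside a javadoc block.
    static final Pattern OVERALL = Pattern.compile("(/\\*\\*.*?\\*/)", Pattern.DOTALL);

    static String filter(String source) {
        String withoutJavadoc = OVERALL.matcher(source).replaceAll("");
        String[] lines = withoutJavadoc.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            String previous;
            do {
                // Repeat until stable so that an indented "// ..." line is removed entirely.
                previous = line;
                line = PER_LINE.matcher(line).replaceAll("");
            } while (!line.equals(previous));
            if (i > 0) {
                out.append('\n');
            }
            out.append(line);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("/** Doc. */\n  int x = 1;   \n    // note\nint y = 2;"));
    }
}
```

Note that the per-line expression only removes comments that occupy a whole line; a comment that follows code on the same line is left untouched.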