becke.ch becke-ch--diff--s0-v1 tool

Description:
  • Text-area & DIFF Text: Enter the text (left & right) into the text areas and press the button "DIFF Text" to compare them - or
  • Choose File & DIFF Files: Choose the files (left & right) using the file browser and press the button "DIFF Files/Directories" to compare them (Attention: In the online/web version the file-size is limited to 2*512 KB to avoid misuse of this service) - or
  • Choose Directory & DIFF Directories: Choose the directories (left & right) using the directory browser and press the button "DIFF Files/Directories" to compare them (Attention: The comparison of directories is only supported in the standalone version - in the web version you need to zip the directories before comparing them!)
  • diff algorithm (default smart diff): "smart diff" or "character based diff" or "line based diff":
    • "smart diff" (default): As the name suggests, a smart combination of the "line based diff" and "character based diff" algorithms. The line based diff is applied first; if the match is poor, i.e. below a certain threshold (20% by default), then the character based diff is applied.
    • "character based diff": A good choice for badly aligned text, i.e. texts that do not share common or similar line breaks or that are formatted differently (tags), e.g. comparing HTML text with its plain text (copied from the browser), or comparing books that are formatted differently.
    • "line based diff": A good choice if the from and to input texts are line oriented (as in most programming languages) and a good number of lines match. The non-matching lines are compared in a second run using the character based diff. False-positive line matches can have a bad impact on the matching result, "tearing it apart". Most diff tools (unfortunately) only support line based comparison. See, for example, the "meld" diff tool on Unix, where false-positive matches mess up the whole comparison.
  • max time (default 1000 milliseconds): The number of milliseconds the comparison is allowed to take before it is stopped. Once this time limit is exceeded, the algorithm shows as a percentage how many characters could be compared and returns the remaining characters as non-matching.
  • file hierarchy: filtering (default true): Indicates whether the line- and overall-filters should be applied to every single file when comparing entire file hierarchies (directories, zip-files, etc.). The default is "true", BUT be aware that filtering every single file in big hierarchies can take quite long, especially if overall filters are set, because for overall filtering the whole file has to be read and filtered before it can be compared. Line filters have a much smaller impact on performance because lines are read, filtered and compared one by one, and the comparison stops at the first line that differs.
  • file: same size & same time: compare (default false): Indicates whether files that have the same size and the same last modification date should be compared. Basically, if this flag is set to true then every single file in the hierarchy is compared! The default is "false" BECAUSE if both "file hierarchy: filtering" AND "same size & same time" are set to true, this has a BIG PERFORMANCE IMPACT ON LARGE FILE HIERARCHIES!
  • diff output: merge matching sections (default true): Normally we are only interested in the non-matching sections and want them displayed next to each other. Therefore merge matching sections can/should be set to true.
  • diff output: unified diff: take unchanged lines of original (default true): Unified diff was not developed with filtering (and ignoring empty lines) in mind. Because of filtering, the unchanged lines (aka context lines) in the original and the new file may not fully correspond. To work around this limitation, this option lets the user choose whether to take the unchanged lines from the original or from the new file. Normally the unchanged lines should be taken from the original file because the patch is applied to the original file. (If you want the unified diff output to be fully accurate, you should disable the overall-filtering and the ignoring-of-empty-lines.)
  • diff output: unified diff: number of unchanged lines (default 1): As mentioned above, the unified diff is sometimes fuzzy due to filtering, so the number of unchanged lines in the original and the new file might not correspond. To minimize this impact it is suggested to set the number of unchanged/context lines to "1", because one single line always corresponds. (Normally in unified diff the number of unchanged lines is set to "3".)
  • style: font: size: diff input and output (default 10): The font size that should be used for the diff in- and output.
  • line filter: pattern: enter a regular expression pattern that should be applied to each line. For example, to match the leading and trailing spaces on a line use the following pattern: "(^\s+)|(\s+$)"
  • line filter: capturing groups: enter a comma separated list of the numbers (0,1,... or 1,3,... or 2 or ...) of the capturing groups you are interested in filtering. For example the pattern "(^\s+)|(\s+$)" provides three groups: 0 (always the whole match), 1 (matches "(^\s+)" i.e. the leading spaces) and 2 (matches "(\s+$)" i.e. the trailing spaces)
  • line filter: capturing groups: action: select the action that should be applied, i.e. what should be done with the content matched by the capturing groups: "remove" (the content matched by the group is removed / ignored for the diff) or "keep" (only the content matched by the group is considered for the comparison)
  • overall filter: pattern: analogous to "line filter" but applied (after the line filters have been applied) to the whole text or file. For example a filter that matches the JavaDoc comments would be: "/\*\*.*?\*/"
  • overall filter: capturing groups: analogous to "line filter: capturing groups"
  • overall filter: capturing groups: action: analogous to "line filter: capturing groups: action"
  • diff option: ignore empty lines (default true): Mostly we can ignore empty lines when comparing text (because they have no relevance besides formatting)
  • file: character set (default UTF-8): The character set that is used when reading a file from disk. The default character set on Linux is UTF-8. The default character set on Windows is Windows-1252 (CP-1252), a character encoding of the Latin alphabet that is almost identical to ISO-8859-1 except in the control characters range 80 to 9F (hex).
  • follow symbolic links (default true): Defines whether symbolic links to files and directories should be followed during comparison. The default is true, i.e. symbolic links are followed. (Attention: The symbolic link option is only supported in the standalone version. The web version works on single files or zip-files, where symbolic links are not relevant; the option only matters in the standalone version when comparing entire directories.)
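
The line filter and overall filter options can be illustrated with a small Python sketch. It is not the tool's implementation: the function name `apply_filter` and its parameters are hypothetical, and the exact behaviour of "remove" and "keep" is an assumption based on the descriptions above. Whether the tool lets "." match line breaks in the overall filter is also an assumption (here `re.DOTALL` is passed explicitly).

```python
import re

def apply_filter(text, pattern, groups, action="remove", flags=0):
    """Hypothetical helper: filter `text` by the selected capturing groups.

    action="remove": drop the content matched by the selected groups.
    action="keep":   keep only the content matched by the selected groups.
    """
    outside, inside = [], []   # text outside / inside the selected groups
    pos = 0
    for match in re.finditer(pattern, text, flags):
        # Spans of the selected groups that took part in this match.
        spans = sorted(match.span(g) for g in groups if match.group(g) is not None)
        for start, end in spans:
            if end <= pos:
                continue       # already covered by an overlapping group (e.g. group 0)
            start = max(start, pos)
            outside.append(text[pos:start])
            inside.append(text[start:end])
            pos = end
    outside.append(text[pos:])
    return "".join(inside if action == "keep" else outside)

# Line filter: applied to each line separately.
left = "    int x = 1;   "
right = "int x = 1;"
whitespace = r"(^\s+)|(\s+$)"
print(apply_filter(left, whitespace, [1, 2]) == right)    # True: both lines now compare equal
print(repr(apply_filter(left, whitespace, [1], "keep")))  # '    ' (only the leading spaces)

# Overall filter: applied to the whole text (assumes "." may match newlines).
source = "/** doc\n * more */\nint x;"
print(repr(apply_filter(source, r"/\*\*.*?\*/", [0], "remove", re.DOTALL)))  # '\nint x;'
```

Selecting groups 1 and 2 with "remove" makes two lines that differ only in indentation or trailing spaces compare as equal, while selecting group 0 removes the whole match regardless of which alternative matched.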
History:
  • NEW (08.2015): Microsoft Word (.doc & .docx) & Excel (.xls & .xlsx) support: Now also supports the comparison of Microsoft Word (.doc & .docx) & Excel (.xls & .xlsx) files (file load based on Apache POI)!
  • NEW (09.2015): OpenOffice & LibreOffice Writer (.odt) support: Now also supports the comparison of OpenOffice- and LibreOffice-Writer (.odt) files!
  • NEW (10.2015): PDF (.pdf) support: Now also supports the comparison of PDF-files (file load based on Apache PDFBox)!
  • NEW (11.2015): Linear Space Refinement: Improved quality & performance: Implemented a variation of Eugene W. Myers' linear space refinement algorithm and further improved the diff quality and performance!
  • NEW (12.2015): AngularJS: Improved GUI: Implemented the GUI in AngularJS for a better look & feel and user experience.
  • NEW (01.2016): File hierarchy: Directory & Zip-File Support: Supports the comparison of entire file hierarchies: directories (only stand-alone client) and zip-files and outputs the differences in a tree-view.
  • NEW (02.2016): Unified Diff Support: Supports unified diff. The unified diff output is often used as input to patch programs. Many projects specifically request that "diffs" be submitted in the unified format, making unified diff format the most common format for exchange between software developers. (See also: diff utility).
  • NEW (05.2016): Max-Time Support (Option: max time: Default 1000 milliseconds): The user can now set a max-time for how long the comparison may take. Once this time limit is exceeded, the algorithm shows as a percentage how many characters could be compared and returns the remaining characters as non-matching.
  • NEW (06.2016): Performance Improvement: The reconciliation algorithm has been further improved, which made the performance and the diff results even better.
  • NEW (07.2016): Lazy Load & Scrollbar Support: Only the nodes/files that are different are loaded, shown and expanded in the explorer. Further nodes are loaded on demand when clicking the corresponding folder icon in the explorer. When navigating from the diff-explorer- to the diff-file-view and back again the scrollbar-position in the explorer is maintained (making sure the user does not lose orientation).
  • NEW (08.2016): Filter support on hierarchies: (Option: "file hierarchy": filtering: Default: "yes"): When comparing entire file hierarchies (directories, zip-files, ...) the line- and overall-filters can now be applied on every single file, which makes the file hierarchy diff result much more accurate i.e. no false positives because the same filtering is now applied on each file as when comparing just single files.
  • NEW (08.2016): Character-Set/Encoding support: (Option: "file: character set": Default: "UTF-8"): We can now explicitly set the character set/encoding of the left- and right-file before comparing them. This helps to avoid differences caused by files that have been persisted in different encodings.
  • NEW (09.2016): Font Size: Diff In-/Output: The user can now set the font size for the diff in- and output sections.
  • NEW (09.2016): On-the-fly Edit & Save support: The user can edit the diff-content and while typing the diff result gets updated on-the-fly. And last but not least the user can then save the modified files and diff results.
  • NEW (11.2016): WYSIWYG Edit support: The user can edit the diff-content WYSIWYG (What You See Is What You Get) directly in the HTML table itself (no need anymore for a separate text-area) and while typing the diff result gets updated on-the-fly. (This feature is based on the "contenteditable" support in HTML5.)
  • NEW (01.2018): Copy & same-size/same-time support on hierarchies: NEW Copy functionality to copy single files or entire sub-directories from left to right and vice versa!
    (Option: "file: same size & same time: compare": Default: "no"): When comparing entire file hierarchies (directories, zip-files, ...) the same-size & same-time comparison can now be applied so that every single file is compared, which makes the file hierarchy diff result more accurate, but potentially at a high performance cost depending on the directory size; it is therefore set to false by default.
  • NEW (03.2018): Follow Symbolic Links: The user can select whether symbolic links to files and directories should be followed and considered during comparison or not!
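
The unified diff format mentioned in the list above, and the effect of the "number of unchanged lines" option, can be illustrated with Python's standard `difflib` module. This is only a sketch of the format, not the tool's own output; the sample lines and the helper `unified` are made up for illustration.

```python
import difflib

# Nine sample lines; only the fifth line differs between the two versions.
original = [f"line {i}" for i in range(1, 10)]
changed = list(original)
changed[4] = "line 5 (edited)"

def unified(context_lines):
    """Return the unified diff as a list of lines with the given number of context lines."""
    return list(difflib.unified_diff(
        original, changed,
        fromfile="original", tofile="new",
        n=context_lines, lineterm="",
    ))

for n in (1, 3):
    print(f"# context lines: {n}")
    print("\n".join(unified(n)))
```

With one context line the hunk carries a single unchanged line before and after the change; with the usual three it carries three on each side, which is where filtered input is more likely to produce context lines that do not correspond.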


Tree Navigation: To navigate use: Ctrl+Shift+Cursor-Keys (←↑ →↓) and Ctrl+Shift+Enter (↲) to compare the selected node (and hide or show the tree).
Operations: To copy use: Ctrl+Shift+C.
Save Files: To save the modified file(s) press: Ctrl+Shift+S.

Copyright © 2016 becke.ch. All rights reserved. Mail to: diff--s0-v1 at becke.ch