Overview

Data Prepper lets you extract and prepare data from text files to be used in another application, such as a database or a spreadsheet. With it, you can add input files to a job and for each input file, skip lines, trim lines, replace text in lines, extract columns from files, rearrange the columns, replace or reformat text in columns, join columns, join lines, and merge multiple files in to one output file.

To use it, you add some input files, add some line or column operations, and then press the Run button to run the job. The new files will be written to your output folder.

The Clipboard Data window lets you split the text on the macOS Clipboard into lines and columns. From there, you can trim it, replace it, fill it with text or sequences, add, delete, move, or transform the lines and columns, and then join them back together and placed back on the macOS Clipboard, where you can paste it into another application.

For more information about this app, or for questions or comments, please visit our website.

Adding Input Files

To add input files to a job, click the Job menu, then Add Input File. You can add more than one file at a time. You can also drag and drop files from the Finder into the Input Files list on the left. If the app is not running, you can drag and drop files on to the app icon to launch it with the dropped files. You can also right-click on a file in the Finder, and then open it with the app.

After you run a job, the output files will go to the output folder. To change the output folder, click the Job menu, then Select Output Folder.... You can add files from the output folder to the Input Files list as well, which allows repeated passes on the same file in the same job.

You can rearrange the order of the input files by dragging and dropping the rows within the Input Files list. They will be processed sequentially, starting from the top.

Normally you will want to have the same number of columns in each file, so that the output file will have a uniform number of columns. You can use the Set Initial Number of Columns operation to ensure that each input file in the job has the same number of columns.

To remove input files, select one or more and click the Job menu, then Remove Selected Input Files. Note that this only removes the input file from this app, and does not delete the file itself.

After adding input files, the Preview area will display the first 10 lines of the input file after they have been processed by the Initial operation. You can change what is displayed in the preview area to the text before the initial operation, the columns before the Final operation, or the text after the Final operation by clicking the menu above the preview area.

You can double click on a file in the Input Files list to switch the preview area selection to the text before the initial operation.

Operations

When a job is run, each input file will have operations performed on it before being written to the output folder. You can add operations to an input file by pressing the + in the Operations list on the right side of the window.

You can select multiple input files at once, and change the Initial and Final operations of all of them at the same time. You can also add or delete operations to multiple selected input files at once.

The Skip Line, Trim Line, and Replace Line operations are always performed after the Initial operation and Set Initial Number of Columns operations, and before any column operations.

The operations are described in more detail below.

Initial

Select the character set encoding of the input file using the Decode As option. UTF-8 is the most common character set encoding. If you know the encoding of the input file is not UTF-8, you can select the Windows CP1252 or the Mac Roman character set. Select ASCII to limit the characters to only the first 128 characters in the ASCII character set.

The line separator is used to separate the file's contents into lines. The column separator is used to separate each line into columns. To remove double quotes around columns, which are common in CSV files, check the Remove quotes box.

This operation is always performed first per input file and cannot be deleted.

Final

Join the columns back together with the column separator. Check the Add quotes box to add double quotes around the column. Join lines with the line separator.

Change the character set encoding of the output file using the Encode As option. Enter your desired file name. The Write Mode option lets you either overwrite or append to the output file.

This operation is always performed last per input file and cannot be deleted.

Replace Character

Replace any of the characters with the specified text string. The following escape sequences can be used for common problem characters:

DescriptionHex
\\Backslash0x5c
\rCarriage return0x0d
\nLine feed0x0a
\tHorizontal tab0x09
\vVertical tab0x0b
\fForm feed0x0c
\bBackspace0x08
\aAudible bell0x07

For example, you can enter "\r\n" in the "Replace" field, and "\n" in the "with" field, to replace the line endings of a Windows text file with those of a Unix text file.

ASCII Control Characters are any ASCII values with values below 32 (0x20 in hexadecimal) or value 127 (the Delete character, 0x7f in hex). The "Replace" field will be ignored with this selection.

Uncommon Characters are any that are not in the Unicode Basic Multilingual Plane. This includes most emoji characters. The "Replace" field will be ignored with this selection.

If present, this operation is always performed immediately after the Initial operation. You can have multiple Replace Character operations and they will be performed in the order they appear in the list.

You do not need to split a file into lines and columns to use this operation.

Skip Line

Skip blank lines (lines with no characters), lines that contain a specific string, or lines with certain line numbers. Separate line number ranges with commas. Example: "1,2,4-8" would skip lines 1, 2, 4, 5, 6, 7 and 8. To skip from a line number to the end of the file, omit the second number from the range. Example: "3,5,25-" would skip lines 3, 5, and 25 through the end of the file.

Trim Line

Trim the specified text string, characters, or whitespace from the start, end, or both sides of the line.

Replace Line

Replace either the specified text string or regular expression with the specified text in the line.

Join Lines

Join the specified number of consecutive lines together using the specified separator text string.

This operation is always performed just before the Final operation, regardless of its order in the list of operations. Only one of these operations is performed per input file.

Set Initial Number of Columns

Set the number of columns in all lines. If there are too few columns in a line, new ones are added with the specified text. If there are too many columns in a line, the excess ones are removed.

This operation is always performed just after the Initial operation, regardless of its order in the list of operations. Only one of these operations is performed per input file.

Set Final Number of Columns

Set the number of columns in all lines. If there are too few columns in a line, new ones are added with the specified text. If there are too many columns in a line, the excess ones are removed.

This operation is always performed just before the Final operation, regardless of its order in the list of operations. This allows you to omit trailing columns after rearranging or joining columns. Only one of these operations is performed per input file.

Rearrange Columns

Drag and drop the columns on the left to rearrange them. This operation is always performed just before the Join Columns operation (if not present, then just before the Join Lines operation). Only one of these operations is performed per input file.

Join Columns

Select columns to join and then press the Join button. This operation is always performed just before the Join Lines operation (if not present, then just before the Final operation). Only one of these operations is performed per input file.

Trim Column

Select one or more columns on the left and trim the specified text string, characters, or whitespace from the start, end, or both sides of the selected columns.

Replace Column

Select one or more columns on the left and replace either the specified text string or a regular expression with the specified text in the selected columns.

Reformat Column

Select one or more columns on the left and the desired type of data to reformat it as, and with what new format. If it fails to format, the original value will be used. To use a different value, check the If a column cannot be reformatted box and enter the text to use.

Running the Job

After you have added your input files and edited the Initial and Final operations, and added your desired operations, click the Run Job button to run the job. This will create one or more output files in your output folder containing the included columns in each of the input files. You can press the Stop button while the job is running to stop the job.

The status of the job while it is running will be displayed at the bottom right of the window. After it completes, you can see the entire log file by clicking Log menu, and then Open Last Log File.

The contents of the Output folder can be viewed at any time by clicking the Job menu, then Show Output Folder in Finder. You can drag and drop files from the Output folder to whatever destination you want.

Clipboard Data

The Clipboard Data window can be opened from the Window menu and lets you split the text on the macOS Clipboard into lines and columns. You can then perform operations on the lines and columns, and join them back together. This places the text back on the macOS Clipboard, where you can paste it into another application. The operations you can perform are described below.

To use it, select some text in an app and copy it to the clipboard. Typically this is done by clicking the Edit menu within the app, then Copy. Then in the Clipboard Data window, press the Refresh button to load the text from the macOS Clipboard into the window. Press the Perform button in the toolbar, and select your desired changes to the lines and columns. When done, press the Join button to place the new text on to the macOS Clipboard, and then you can paste it into your desired app.

Some operations work on selected lines or columns. To select a line, click on a numbered cell in the left-most column. To select a column, click on the header of a column. You can select multiple lines or columns by holding down the Shift or Command keys when clicking.

You can edit the value in the cell and then press the Done button when finished.

If a value in a cell requires more than one line to display, double-click on it to see it in a separate window.

To undo the last operation, click the Edit menu, then Undo.

Refresh

To get started, select some text in an app and copy it to the macOS Clipboard. Then in the Clipboard Data window, press the Refresh button. You will see the text displayed in the window below the toolbar.

Split

To perform operations on the text, first it must be split into lines and columns. Press the Split button in the toolbar, and select the line and column separators.

You can enter any text for the line separator, or click the drop down menu and select one of the choices described below.

\nThe line feed character. It is the most common end-of-line character in macOS and UNIX.
\r\nThe carriage return / line feed combination is standard on Windows
\rThe carriage return character. Old Macs used this character to mark the end of lines.

You can enter any text for the column separator, or click the drop down menu and select one of the choices described below.

,The comma character. For data from CSV files, you will typically want to remove quotes as well.
\tThe tab character
|The pipe character

Perform

Click the Perform button to select from the menu of operations described below. You can select cells or lines or columns to operate on by clicking on them. Hold down the Command key to select more than one cell, line, or column.

Trim text strings, characters, or whitespace from one or both sides of the selected cells.

Replace text strings or text that matches regular expressions in the selected cells.

Fill the selected cells with a string or a sequence of numbers.

Add lines above or below, or columns to the left or right of the contents.

Move selected lines up or down, or columns left or right. You can move multiple lines or columns at once.

Delete the first line, last line, first column, or last column. You can delete selected lines or columns by selecting them and clicking the Edit menu, then Delete.

Join lines or columns to concatenate them together.

Merge the contents of selected lines or columns together. For example, if you select 2 lines, and then click on merge up, the contents of each of the elements in the second line will be appended to the contents of each of the elements in the first line. You can change the merge separator that goes between the cells.

Flip the data horizontally, vertically, or diagonally. If you flip the data horizontally or diagonally, any lines that had fewer columns than the longest one will be filled with blank cells.

Join

Press the Join button to join the columns into lines, and lines into text which is then placed back on the macOS Clipboard. After the join, you can paste the contents of the macOS Clipboard into another app.

You can enter any text for the column separator, or click the drop down menu and select one of the choices described below.

,The comma character. For data going to CSV files, you will typically want to add quotes as well.
\tThe tab character
|The pipe character

You can enter any text for the line separator, or click the drop down menu and select one of the choices described below.

\nThe line feed character. It is the most common end-of-line character in macOS and UNIX.
\r\nThe carriage return / line feed combination is standard on Windows
\rThe carriage return character. Old Macs used this character to mark the end of lines.

Previous Jobs

You can save a job in both the Data Prepper window and the Clipboard Data window by clicking the Job menu, then Save Job. The saved jobs can then be found by clicking the Window menu, then Previous Jobs, to show the Previous Jobs window.

When you save a job in the Data Prepper window, the input files and the operations on each input file are saved. When you save a job in the Clipboard Data window, only the operations are saved as there are no input files. The Data Prepper jobs are indicated with a P in a blue box, and the Clipboard Data jobs are indicated with a C in a green box.

To restore operations from an input file in a Data Prepper job, select the job, then select one of the input files in the middle of the Previous Jobs window, then click the Restore Operations button. The operations of the currently selected input file in the Data Prepper window will be replaced with those in the Previous Jobs window.

To perform operations again on text that was refreshed in the Clipboard Data window, open the Previous Jobs window and select the job on the left, then click the Perform Operations button.

To rename a previous job, select the job and press the Enter key on the keyboard and enter the new job name.

To delete previous jobs, select one or more in the left-most list and press the - button underneath it.

To search for previous jobs by job name or last saved date, click the button to the right of the search field. To search by last saved date, enter the date in the format YYYY-MM-DD or YYYY-MM. For example, to see jobs since April 13, 2018, you would enter 2018-04-13. Up to 200 previous jobs can be shown.

Preferences

Click the Data Prepper menu, then Preferences to change the preferences described below.

Decode Error ReplacementText to use when a byte sequence cannot be decoded in the initial operation character set. This can be blank to omit the byte sequence.
Encode Error ReplacementText to use when a character cannot be encoded in the final operation character set. This can be blank to omit the character.
Auto split lines and columnsAutomatically split an input file into lines and columns when it is added to the job.
Open last log after job runsOpens the last log in the default text editor after the job completes.
Open output folder after job runsOpens the output folder after the job completes.

Regular Expressions

A regular expression is a sequence of characters used to match other sequences of characters.

For more info about regular expressions, please see the ICU User Guide.

Examples

Regular Expression Example Matching Strings
abc abc
.bc abc, bbc, cbc, dbc, ...
.*bc bc, abc, aabc, aaabc, bbc, bbbc, bbbbc, cbc, ccbc, cccbc, ...
^\s+ (one or more whitespace characters at the beginning of the string)
\s+$ (one or more whitespace characters at the end of the string)

Common Regular Expression Metacharacters

Character Description
\s Match a white space character. Examples are space, tab, newline, or carriage return.
\S Match a non-white space character.
\d Match any character that is a decimal digit.
\D Match any character that is not a decimal digit.
\w Match a word character.
\W Match a non-word character.
[pattern] Match any one character from the set. 
. Match any character.
^ Match at the beginning of a line.
$ Match at the end of a line.
\ Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ .

Common Regular Expression Operators

Operator Description
| Alternation. A|B matches either A or B.
* Match 0 or more times. Match as many times as possible.
+ Match 1 or more times. Match as many times as possible.
? Match zero or one times. Prefer one.
{n} Match exactly n times
{n,} Match at least n times. Match as many times as possible.
{n,m} Match between n and m times. Match as many times as possible, but not more than m.
*? Match 0 or more times. Match as few times as possible.
+? Match 1 or more times. Match as few times as possible.
?? Match zero or one times. Prefer zero.
{n}? Match exactly n times
{n,}? Match at least n times, but no more than required for an overall pattern match
{n,m}? Match between n and m times. Match as few times as possible, but not less than n.

Common Set Expressions (Character Classes)

Example Description
[abc] Match any of the characters a, b or c
[^abc] Negation - match any character except a, b or c
[A-M] Range - match any character from A to M. The characters to include are determined by Unicode code point ordering.

Character Set Encodings

This app supports converting between UTF-8, Windows CP1252 (Latin 1 / Western European), Mac Roman, and ASCII. To convert files to any other character set encoding, use the iconv command-line program that is included with every Mac.

For example, to convert a text file from UTF-8 to UTF-16, you could run the following program from the Terminal app:

iconv -f UTF-8 -t UTF-16 myfileUTF8.txt > myfileUTF16.txt

Limitations

The Clipboard Data window has been programmed to allow up to 16,384 columns and 1 million lines. However your system may have lower limits than that due to memory or processing limits.