Data Processing – the Foundation of the eDiscovery Process


By Jeff Hudson and Kinny Chan, Precision Discovery

Broadly speaking, there are four steps to compiling evidence in the eDiscovery lifecycle: data collection/preservation, data processing, data review/analysis, and production. Coming after collection and preservation, data processing is often overlooked compared with more talked-about technologies like Technology Assisted Review (TAR). It is, however, no less important. If data processing is inaccurate, the results of an entire eDiscovery workflow can be called into question.

Data processing involves three steps that must be completed properly and thoroughly before data is moved to the next phase of the eDiscovery lifecycle:

  • Text extraction, which enables you to search the text contents of every file.
  • Metadata extraction, which enables you to search information about a specific file, such as who created the file and when it was created.
  • File hashing, which enables you to identify exact file duplicates. Hashing uses a cryptographic hash function to create a digital fingerprint, or “checksum,” for each file. After they’ve been hashed, files that are exactly the same will have the exact same hash value.
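
As a rough illustration of the file hashing step, the sketch below groups files by the hash of their contents so that exact duplicates fall into the same group. It assumes SHA-256 as the hash function, and the function names file_digest and group_duplicates are hypothetical; a given processing tool may use a different algorithm and will typically hash emails and attachments in its own way.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def group_duplicates(paths):
    """Map each digest to the files that share it, keeping only real duplicates."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_digest(path)].append(Path(path))
    # A digest seen only once has no duplicate, so drop those groups.
    return {digest: files for digest, files in groups.items() if len(files) > 1}
```

Because the digest is computed from file contents only, two copies of the same file with different names or locations still land in the same group.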

The validity of an eDiscovery production is often entirely dependent upon the reliability of the text and metadata extraction in data processing. Keyword searching, TAR, semantic analysis, metadata analysis and even file de-duplication all rely on correct text extraction, metadata extraction and file hashing.

One way poor processing can affect the entire eDiscovery lifecycle is when data isn’t properly extracted from files because they are of complex or unknown types.

Consider all the various forms that electronically stored information (ESI) can take – there are common ESI files in eDiscovery such as emails, MS Office files and PDFs, plus less traditional ESI files from sources including structured data (such as proprietary databases), SharePoint, work share programs, voicemails, instant messages, text messages and social media posts. This great variation can wreak havoc on standard eDiscovery data processing systems. In fact, structured data may fail text extraction, metadata extraction, file hashing, or even fail to be processed altogether when using standard eDiscovery tools.

When you encounter these less traditional or structured data files, there are six steps that you should take.

Step 1: Identify where the data is located. Ask where the data comes from and how end users and administrators interacted with the system. In discovery, it is important to understand how the data was used in the ordinary course of business. Asking the original end users and administrators of the source system will help you decide how you will want to process this data.

Step 2: Understand table lists, field lists and schema. The table and field lists will detail the information found in each field of the structured data. The schema will help you understand the relationships between the tables and fields.
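
To make Step 2 concrete, here is a minimal sketch of pulling the table list, field list and table relationships out of a database. It assumes, purely for illustration, that the structured data is available as a SQLite database; other database systems expose the same information through their own catalog views. The function name describe_schema is hypothetical.

```python
import sqlite3


def describe_schema(conn):
    """Return {table: {"fields": [(name, type)], "references": [(column, table, column)]}}."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        fields = [
            (row[1], row[2])
            for row in conn.execute(f"PRAGMA table_info({quoted})")
        ]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        references = [
            (row[3], row[2], row[4])
            for row in conn.execute(f"PRAGMA foreign_key_list({quoted})")
        ]
        schema[table] = {"fields": fields, "references": references}
    return schema
```

The "references" entries only capture relationships that are declared as foreign keys. Relationships that exist by convention alone still have to be learned from the system's administrators, as described in Step 1.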

Step 3: Discuss with your case team how the data needs to be produced. The information from Step 1 will help inform how the structured data will be recreated after processing.

Step 4: Convert and process data. Based on the information gathered from Steps 1 to 3, you should be able to convert the data from a structured data format into individual records where traditional eDiscovery data processing can be applied. After conversion, you might even be able to forgo traditional data processing and load the data directly into an eDiscovery review tool, such as kCura’s Relativity.
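
One possible shape for the conversion in Step 4 is sketched below: each row returned by a query becomes one standalone record, written out as a line of JSON. The SQLite source, the JSON Lines output format and the function names rows_to_records and write_records are all assumptions made for the example; the right query and output format depend on the schema and on what your case team decided in Step 3.

```python
import json
import sqlite3


def rows_to_records(conn, query):
    """Run a query and yield one dict per row, keyed by column name."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


def write_records(records, out_path):
    """Write each record as one JSON line and return how many were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        for record in records:
            # sort_keys gives a stable field order across runs.
            fh.write(json.dumps(record, ensure_ascii=False, sort_keys=True) + "\n")
            count += 1
    return count
```

A caller would pass a query that joins whatever tables are needed to make each record self-contained, for example a message together with its sender's name.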

Step 5: Validate the process. After conversion and data processing, validate the results and confirm the integrity of the data by comparing the input and output.
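
A simple way to compare input and output in Step 5 is to fingerprint every record on both sides and check that the two sets of fingerprints match. The sketch below assumes records are available as Python dictionaries on both sides; record_fingerprint and validate_conversion are hypothetical names, and a real validation would usually add spot checks of individual records by someone who knows the source system.

```python
import hashlib
import json
from collections import Counter


def record_fingerprint(record):
    """Hash a record's canonical JSON form so field order does not matter."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_conversion(source_records, output_records):
    """Compare source and output records by count and by content fingerprint."""
    source = Counter(record_fingerprint(r) for r in source_records)
    output = Counter(record_fingerprint(r) for r in output_records)
    return {
        "source_count": sum(source.values()),
        "output_count": sum(output.values()),
        # Records present in the source but absent (or altered) in the output.
        "missing": sum((source - output).values()),
        # Records in the output with no matching source record.
        "unexpected": sum((output - source).values()),
        "ok": source == output,
    }
```

Matching counts alone would not catch a record whose content changed during conversion, which is why the comparison is made on fingerprints rather than on totals only.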

Step 6: Load the results of data processing into a common review platform for document review and analysis. After Steps 1 to 5 are completed, the files should be ready for review in a traditional eDiscovery review system.

Precision Discovery’s expertise in this area of data processing is one of the main reasons to outsource your eDiscovery project to Precision Discovery. Our experts can help penetrate the veil of complexity and identify and extract information and metadata within non-traditional ESI. From text messages to cloud-based software to social media, we can craft a solution. Fill out our contact form today, and one of our team members will respond as soon as possible.


Jeff Hudson, Managing Director, Consulting Services

Jeff Hudson leads the eDiscovery Consulting group for Precision Discovery. He excels at providing simple, understandable solutions for complex problems. He enjoys great food with great company.




Kinny Chan, Chief Customer Officer

Kinny Chan (@kinnychan) is the lead eDiscovery Consultant for Precision Discovery. He enjoys taking complex challenges and explaining them in simple and understandable terms. He is inspired by the intersection between technology, business and the law.

