Light it Up! Understanding the Format and Specifications of a Production
Proactive Discovery Series | Productions Received | Part 1
Document and data exchange in discovery is typically a two-way street: both sides are obligated to produce records to each other. Yet most eDiscovery blog posts, help articles and industry-marketing attention have focused on the processes used by the producing party to sift through massive amounts of data to make productions to the requesting party. But as the requesting party, what do you do when a rolling production of hundreds of thousands of documents starts landing in your office?
It seems the two-way street is lit only on one side. So let’s see if we can light up the other side.
This article is the first installment in our Proactive Discovery Series: Productions Received. The goal of the series is to provide ideas and solutions for common concerns in discovery. The goal of this installment is to set a baseline for the analysis and metrics needed to understand the format and specifications of a production before attorney review begins.
So it’s 4:30 on the Friday afternoon before a three-day holiday weekend with depositions scheduled to start next Friday. Your team has been waiting for a production, calling and emailing every day for a week requesting updates and an ETA, when, suddenly, a medium-sized FedEx box lands unceremoniously on your desk. Happy Friday!
Inside the box is a hard drive and a cover letter that says the drive contains the first production in a series of rolling productions. The letter also says the data produced are sourced to the 15 stipulated custodians and provides a breakdown of the Bates number ranges assigned to each custodian’s section of the production.
Get it Loaded
The first thing to do is get the drive to your eDiscovery hosting provider, whether it’s in-house litigation support or a third-party vendor. Get the drive to them as soon as possible so they can start decrypting and copying the data to a server for loading into an eDiscovery analysis and review tool such as Relativity.
Once everything is opened and extracted from the drive, generate a file directory listing of the drive. A file directory listing is a text file that shows every folder and file on the drive. From the listing, you can see whether the production contains load files, text files, image files, and/or native files.
This simple step can help identify productions that do not conform to the Electronically Stored Information (ESI) production protocol or order at a very early stage.
For example, if the ESI production protocol says the production format should be TIFF image files with load files and some natives, but the file directory doesn’t show any load files or TIFFs and instead shows only PDF files, then the production likely does not conform to the protocol. In that case, raise the format with the producing party right away, because a PDF-only production may be missing metadata, family relationships and non-PDF native files.
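The file directory check described above can be roughed out in a few lines of Python. This is a minimal sketch, not a vendor tool: the function names are made up for illustration, and the extension groupings (.dat, .opt and .lfp as load files, .tif and .tiff as images) are assumptions based on common conventions that should be adjusted to match the ESI protocol in your matter.

```python
from collections import Counter
from pathlib import Path, PurePosixPath

# Assumed extension groupings; adjust them to the ESI protocol in your case.
LOAD_EXTS = {".dat", ".opt", ".lfp"}
IMAGE_EXTS = {".tif", ".tiff"}
TEXT_EXTS = {".txt"}
PDF_EXTS = {".pdf"}

def list_files(root):
    """Return every file under root as a sorted list of relative paths."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix()
                  for p in root.rglob("*") if p.is_file())

def summarize(paths):
    """Count files per category, based only on the file extension."""
    counts = Counter()
    for path in paths:
        ext = PurePosixPath(path).suffix.lower()
        if ext in LOAD_EXTS:
            counts["load"] += 1
        elif ext in IMAGE_EXTS:
            counts["image"] += 1
        elif ext in TEXT_EXTS:
            counts["text"] += 1
        elif ext in PDF_EXTS:
            counts["pdf"] += 1
        else:
            counts["native_or_other"] += 1
    return counts

def format_warnings(counts):
    """Flag a listing that lacks the load files or TIFFs a protocol calls for."""
    warnings = []
    if counts["load"] == 0:
        warnings.append("No load files found")
    if counts["image"] == 0:
        warnings.append("No TIFF images found")
    return warnings
```

Calling format_warnings(summarize(list_files(drive_path))) on the extracted copy of the drive, where drive_path is wherever your team copied the data, gives a quick first read on whether the format needs to be raised with the producing party. A PDF-only production would trigger both warnings.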
Document hosting platforms require load files that contain metadata and links to the other files in a production.
- Load files are the glue that holds the images, natives, metadata and searchable text together for attorney review. Without load files, you cannot search by date or review a native Excel file, and you cannot run text searches without incurring the additional cost of Optical Character Recognition (OCR). The load files identify where documents begin and end, and also where families of documents begin and end. A great example of document and family associations is an email with attachments: you need to see each individual file, but you also need to know that all of the files in an email are in a family group.
- Searchable text is usually produced in standalone text files that are named for the Bates number of the associated document. So the text in BATES00000001.txt would correspond with the image for BATES00000001.tif, and the load files create this association in the database.
- Image files are usually TIFF files that look like the printed form of a file, but without any paper.
- Native files are files produced in their native format, such as Word (.doc or .docx) and Excel (.xls or .xlsx). These files generally must be opened in their native program (MS Word or MS Excel, in this case) or in a compatible viewer.
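Because text files and images share a Bates number in their file names, a quick cross-check of the two can be sketched as follows. This is an illustration under a simplifying assumption: one image file and one text file per document, as in the BATES00000001 example above. If the production uses single-page TIFFs, pages after the first will normally have no text file of their own, so the "text without image" list is the more telling of the two. The function names are hypothetical, and paths are assumed to use forward slashes.

```python
from pathlib import PurePosixPath

def bates_stems(paths, extensions):
    """Return the file name stems (assumed to be Bates numbers) for the given extensions."""
    return {PurePosixPath(p).stem.upper()
            for p in paths
            if PurePosixPath(p).suffix.lower() in extensions}

def unmatched(paths):
    """Compare image and text file names and report the ones with no counterpart."""
    images = bates_stems(paths, {".tif", ".tiff"})
    texts = bates_stems(paths, {".txt"})
    return {
        "image_without_text": sorted(images - texts),
        "text_without_image": sorted(texts - images),
    }
```

The input is the same list of paths that a file directory listing provides, so no files need to be opened to run the comparison.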
PRE-Review | Initial Analysis
Assuming the production appears compliant with the ESI production protocol based on the file directory, the next step after loading the production is to spot any issues that counsel needs to address.
The initial analysis should include information about corrupt or missing data, assuming that such information is readily apparent to the tech team loading the data. Analysts loading the data may not know that a native file won’t look right in the document viewer, but they will know when natives that are referenced in the load file are missing from the drive. Additionally, if the tech team knows that the load files should contain metadata, then they can alert the attorney case team if the load file is missing certain critical metadata, such as From, To, CC, BCC, Date Sent, Subject, File Creation Date, and Last Modified Date.
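The two checks in this step, missing metadata fields and natives that are referenced but not on the drive, can be sketched roughly as below. Several things here are assumptions: the load file is taken to be a delimited file with a header row, using the delimiter and text qualifier commonly associated with Concordance-style DAT files (ASCII 20 and ASCII 254); the column names BegBates and NativeLink are hypothetical; and the metadata field names simply mirror the list above, while real load files often label them differently. The text is assumed to be already decoded from whatever encoding the producing party used.

```python
import csv
import io

# Field names taken from the list above; real load files may label them differently.
CRITICAL_FIELDS = ["From", "To", "CC", "BCC", "Date Sent", "Subject",
                   "File Creation Date", "Last Modified Date"]

def read_load_file(text, delimiter="\x14", quotechar="\xfe"):
    """Parse delimited load-file text into (header names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text, newline=""),
                            delimiter=delimiter, quotechar=quotechar)
    rows = list(reader)
    return reader.fieldnames or [], rows

def missing_fields(fieldnames, required=CRITICAL_FIELDS):
    """Return the required metadata fields that are absent from the header."""
    present = {name.strip().lower() for name in fieldnames}
    return [field for field in required if field.lower() not in present]

def _normalize(path):
    """Make paths comparable: forward slashes, lower case, no leading ./ or /."""
    path = path.replace("\\", "/").lower()
    while path.startswith("./"):
        path = path[2:]
    return path.lstrip("/")

def missing_natives(rows, files_on_drive,
                    bates_field="BegBates", link_field="NativeLink"):
    """Return (Bates number, link) pairs whose native file is not on the drive."""
    on_drive = {_normalize(p) for p in files_on_drive}
    missing = []
    for row in rows:
        link = (row.get(link_field) or "").strip()
        if link and _normalize(link) not in on_drive:
            missing.append((row.get(bates_field, ""), link))
    return missing
```

The results of both functions are the kind of short list the tech team can hand to the attorney case team: which critical fields are absent from the load file, and which Bates numbers point to natives that never arrived.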
There are more great tools and solutions for productions received, and the rest of the series will provide additional insights as well as sample reporting on production deficiencies, using data visualization to spot production issues, using analytics for analysis and prioritization, and much more.
Aaron Patton, Managing Director
Aaron Patton is a managing director at Precision Discovery who is always on the hunt for common sense solutions to persistent discovery problems. As a Wisconsin Law graduate (2001), Aaron is fascinated by “Law in Action”, especially the intersection of discovery rules and discovery reality. Aaron is based in the Washington, DC area.