Use of Natural Language
Processing (NLP) in
Civil Case Management:
A Report on Three Proof of Concept Projects
By Paula Hannaford-Agor & Jannet Okazaki
May 2023
Contents
Acknowledgements
Introduction
NLP Triage POC
Quality Control POC
Conclusions and Recommendations
Use Cases
Appendix A: POC 1 – Civil Case Data Extraction and Case Matching POC
Appendix B: POC 2 – Civil Case Triage POC
Appendix C: POC 3 – Civil Consumer Debt Cases, Quality Control POC
Appendix D: Civil Case Triage Criteria
Acknowledgements
We would like to express our sincere gratitude
to all those who have contributed to this
proof of concept (POC) study on the use of
Natural Language Processing (NLP) in civil
case processing in state courts. First and
foremost, we want to acknowledge the CCJ
Civil Justice Improvements Committee for their
recommendations to leverage technology to
support effective case management. Their vision
and dedication to improving the civil justice
system have been instrumental in inspiring this
project. We are also thankful to the attendees
at the 2017 Court Technology Conference who
suggested that the use of NLP to extract data
directly from case filings might perform better
than data extracted from court case management
systems for a range of essential case processing
tasks. Their insights and perspectives have been
invaluable in shaping the direction of this study.
A great many individuals helped us throughout
the study. We benefited greatly from the
insights and suggestions of our project advisory
committee members who spent two long days in
a dark conference room helping us outline the
requirements for the POC: Roberto Adelradi
(Eleventh Judicial Circuit Court of Florida), IV
Ashton (LegalServer), Judge Jennifer Bailey
(Eleventh Judicial Circuit Court of Florida),
Katherine Birchfield (McHenry County Circuit
Court, Illinois), Chief Magistrate Gregory
Clifford (Cleveland Municipal Court), Margaret
Hagan (Stanford School of Design), Judge
Steven Houran (Stafford County Superior
Court, New Hampshire), Casey Kennedy
(Texas Judicial Branch), and Kelly Steele (Ninth
Judicial Court of Florida). We also owe a debt
of gratitude to Judge Gina Beovides (Eleventh
Judicial Circuit Court of Florida) who provided
feedback to the vendors during the machine
learning phase of the project; to our research
interns Camden Kelliher, Laura Acker, and
Madeline Williams who spent many hours
manually coding data from civil case filings;
to our NCSC colleagues for their support and
collaboration throughout the project, especially
Jim Harris, Barbara Holmes, Allison Trochesett,
Sarah Gibson, and Keeley Daye; and to Henry
Sal, Jr. of Computing Systems Innovations and
Abhinav Sonami of Leverton Intelligence, the
commercial vendors who donated their time
and talents to participate in the POC.
We want to express our heartfelt appreciation
to the Superior Courts of Arizona in Maricopa
and Pima Counties, the Fifteenth Circuit Court
of Florida (Palm Beach), and the Cleveland
Municipal Court, which provided exceptionally
large troves of court documents for this study,
and to Darren Dang, Karen Hernandez, and
Brett Howard in the Superior Court of Orange
County, California and to Richard McHattie
of the Superior Court of Arizona in Maricopa
County for showing us how NLP can work in real
court environments. Finally, we are grateful to
the State Justice Institute both for its financial
support (SJI 18-P-020) and for its great patience
as we struggled to complete this project in the
midst of a global pandemic. We are confident
that the lessons learned will benefit courts for
many years to come.
The views expressed in this report are those of
the authors and do not necessarily represent
those of the State Justice Institute, the National
Center for State Courts, or the individual
courts, court staff, or vendors who participated
in the project.
Introduction
Natural language processing (NLP) is a field
of computer science, artificial intelligence,
and computational linguistics that employs
predictive analytics and machine learning with
a focus on the interaction between computers
and both written and spoken language. First
developed in the 1950s, NLP has become
increasingly sophisticated over the past two
decades as computational power has increased.
Because legal language tends to be more
structured in format than other linguistic forms,
NLP applications have become particularly
useful for a variety of law-related tasks.
For example, NLP is the primary technique
employed in e-discovery to identify documents
related to a specific query based on keywords
or phrases. This technology is also being used
to extract information from multiple documents
to assess variation in key data elements for risk
management purposes.1
Although courts are vast repositories of
legal documents, they have only recently begun
implementing predictive analytics and machine
learning techniques, including NLP, to support
court operations. For example, one area in
which NLP has shown particular suitability is
the task of redacting information disclosed
in court documents to protect the privacy
interests of litigants and vulnerable third
parties, including children.2 More recently,
courts have begun to explore the potential
benets of NLP and other tools such as data
extraction and robotic process automation
(RPA) for a variety of case processing
tasks. Maricopa County Superior Court, for
example, has used these techniques to extract
information from both paper and electronic
documents to enter into the court's case
management system (CMS). The Superior
Court in Orange County, California is training
these tools to recognize different subtypes of
default judgment motions so that clerks do not
have to open the electronic documents to verify
the type of default sought by plaintiffs.
In 2016, the Conference of Chief Justices
(CCJ) and the Conference of State Court
Administrators (COSCA) endorsed
recommendations to leverage technology to
improve civil case management.3 In particular,
NLP and related tools could be used to support
two areas of civil case processing: sorting
cases at filing based on the anticipated level
of judicial involvement in case management,
and confirming that essential procedural
requirements have been satisfied before
entering final judgments in cases.
1 Lahr Mahler, What is NLP and Why Should Lawyers Care?, ABA Law Practice Today (Feb. 13, 2015), http://www.lawpracticetoday.org/article/nlp-lawyers/.
2 See, e.g., Tom Clarke et al., Automated Redaction Proof of Concept Report (NCSC Sept. 2017).
3 Civil Justice Improvements Committee, A Call to Action: Ensuring Civil Justice for All (NCSC 2016).
Previous efforts to automate civil case triage
based on information extracted from CMS were
only moderately successful in assigning cases to
the correct track, in part because many of the
data elements that experts believe are related
to case complexity are not routinely captured
in CMS. In addition, CMS data elements often
lack sufficient precision to make meaningful
distinctions between cases of varying
complexity.4 NLP might overcome many of the
limitations of CMS data in civil case triage by
identifying and extracting data directly from
case pleading documents. Indeed, NLP could
capture a great deal more information than
CMS data, such as the number and nature of
legal claims asserted and relief requested by the
plaintiff, and the defendant's response to each claim,
including the number and nature of affirmative
defenses, counterclaims, crossclaims, and third-
party claims. Collectively, this information
could be used to determine the level of legal
and interpersonal conict between the parties
and the anticipated volume of discovery, both
of which are recognized as important factors
in pathway assignment. The utility of these
technologies for identifying factors related to
case complexity might even be extended across
multiple cases, for example, by identifying
individual litigants or attorneys who are more
likely to require judicial direction or oversight.
These technologies might also be able to
identify external trends that contribute to
individual case complexity, such as changes in
case law, the regulatory environment, or even
the business practices of significant justice
system stakeholders.
Courts also struggle to ensure quality decision-
making in high-volume court dockets such
as small claims, landlord/tenant, consumer
debt collection, and mortgage foreclosure.
The overwhelming majority of defendants
on these dockets are self-represented and
lack the legal expertise to challenge improper
claims or raise legitimate defenses.5 NLP
could be used to identify information in case
documents that signal the need for additional
scrutiny during in-court hearings or before
entering default judgments. Such information
could include inconsistent information (e.g.,
different defendant names or addresses on
the complaint, the contract, and the service
return affidavit), or the absence of essential
information with the complaint (e.g., copy of
original contract, proof of standing, proof of
timeliness, active military affidavit, or missing or
incorrect documentation of damages and fees).

4 Civil Justice Initiative: Criteria for Automating Pathway Triage in Civil Case Processing (NCSC 2017).
5 During the mortgage foreclosure crisis in 2009-2010, many courts discovered widespread problems in court filings, including lack of standing to foreclose on the property, incomplete mortgage servicing records, and fraudulent certifications (e.g., robo-signing) of mandatory disclosures and documents. See, e.g., Maria Wang, GMAC's 'Robo-Signers' Draw Concerns About Faulty Process, Mistaken Foreclosures, ProPublica (Sept. 29, 2010); Stacy Cowley & Jessica Silver-Greenberg, Behind the Lucrative Assembly Line of Student Debt Lawsuits, N.Y. Times (Nov. 13, 2017), available at https://www.nytimes.com/2017/11/13/business/dealbook/student-debt-lawsuits.html; Mary Spector, Default and Details: Exploring the Impact of Debt Collection Litigation on Consumers and Courts, 6 Va. L. & Bus. Rev. 257, 285 (2011); Peter A. Holland, Junk Justice: A Statistical Analysis of 4,400 Lawsuits Filed by Debt Buyers, U. Md. Francis King Carey School of Law Legal Studies Research Paper No. 2014-15; Federal Trade Commission, Repairing a Broken System: Protecting Consumers in Debt Collection Litigation (2010).
To explore the feasibility of NLP to support
court operations in these two areas, the
National Center for State Courts (NCSC)
designed three distinct Proof of Concept (POC)
projects. NCSC partnered with three general
jurisdiction courts that participated in the CJI
automated civil case triage project to use NLP
techniques to identify and extract key terms
and characteristics from the case pleadings for
use in assigning cases to an appropriate civil
case processing track.6 For quality control over
high-volume dockets, the NCSC worked with
the Cleveland Municipal Court on a POC to
identify inaccurate or missing information from
case documents in its consumer debt collection
docket that would signal the need for increased
judicial review. The NCSC partnered with two
vendors that specialize in NLP technologies
to control for variation in vendor quality. In
addition, NCSC interviewed IT staff in the
superior courts of Maricopa County, Arizona
and Orange County, California about their
experiences implementing these technologies
for purposes similar to the POCs.
6 The courts that participated in the automated
civil case triage project included the Arizona
superior and justice courts; the Missouri circuit
courts; and the Palm Beach, Florida circuit and
county court.
NLP Triage POC
The previous study of automated civil case
triage found that CMS data elements either
lacked sufficient precision to make meaningful
distinctions between cases of varying
complexity or were not recorded in CMS at all.
7
The most important data elements for triage
purposes were those related to case type; the
number of parties; the defendant’s response,
if any, to complaint allegations, including
crossclaims, counterclaims, and third-party
claims; and the defendant’s representation
status. The NLP Triage POC was designed to
test whether NLP could extract those data
elements from case pleading documents
(complaints and answers) with sufficient
accuracy and precision to employ the triage
criteria developed in the automated civil case
triage study.
In preparation for the Triage POC, NCSC
assembled electronic copies of case pleadings
from three of the general jurisdiction courts
that participated in the automated civil triage
study.8 Case pleadings have both structured
and unstructured elements. In all three courts,
pleadings included a case heading on the first
page featuring the name of the court in which
the document was filed; the type of document
(e.g., complaint, answer); the case number;
the case title (plaintiff(s) name v. defendant(s)
name); and the name, contact information, and
bar number of the attorney filing the document.
Figure 1 illustrates a typical case heading. A
date stamp showing the date and time the case
was filed generally appears on the upper right-
hand corner of the document. The content of
the documents following the case headings
was a semi-structured narrative outlining the
plaintiff’s alleged facts of the case (complaint)
or the defendant’s responses (answer), the
legal claims or defenses, and the relief sought,
including demands for a jury trial.

7 Criteria for Automating Pathway Triage in Civil Case Processing, supra note 4.
8 Maricopa and Pima County Superior Courts in Arizona, and the Fifteenth Judicial Circuit Court of Florida.
Figure 1: Complaint Filed in Superior Court of Arizona, Maricopa County
The Triage POC involved two components
(Appendix A). The first component was purely a
data extraction exercise to identify and extract
case information from the pleadings that would
permit judges or trained court staff to assign
cases to a case processing pathway based on
the formulas developed in the automated civil
case triage project. Table 1 displays the key
data elements.
The second component was a relational data
test to match cases based on the court and case
number, to compare the number of defendants
named in the complaint and answer, and to
identify differences in the number of parties,
names, or litigant types. In terms of civil case
processing, this information would indicate
whether a case was "fully joined" – that is, that
all named defendants had responded to the
initial complaint – and whether the court should issue
a case scheduling order or set a date for a
case management conference to establish
expectations for the litigation process.
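To make the relational data test concrete, the short sketch below shows one way the "fully joined" check could be scripted once party names have been extracted from the pleadings. It is illustrative only: the record fields and the fully_joined helper are assumptions for this example, not part of the POC specification or any vendor's implementation.

from dataclasses import dataclass, field

@dataclass
class ExtractedPleading:
    """Fields pulled from a single pleading by the data extraction step (hypothetical)."""
    court: str
    case_number: str
    doc_type: str                                   # "complaint" or "answer"
    defendant_names: list = field(default_factory=list)

def fully_joined(complaint: ExtractedPleading, answers: list) -> bool:
    """Return True if every defendant named in the complaint appears in a matching answer."""
    # Match answers to the complaint on court and case number first.
    answering = {
        name.lower()
        for a in answers
        if (a.court, a.case_number) == (complaint.court, complaint.case_number)
        for name in a.defendant_names
    }
    named = {name.lower() for name in complaint.defendant_names}
    return named.issubset(answering)

# Hypothetical example: both named defendants have answered, so the court could
# issue a scheduling order or set a case management conference.
complaint = ExtractedPleading("Maricopa Superior", "CV2020-001234", "complaint",
                              ["Acme Lending LLC", "Jane Roe"])
answer = ExtractedPleading("Maricopa Superior", "CV2020-001234", "answer",
                           ["Acme Lending LLC", "Jane Roe"])
print(fully_joined(complaint, [answer]))            # True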
A second Triage POC invited vendors to use AI
tools either to review and triage cases based on
the NCSC formulas or to develop and test a new
model based on predictive analytics. This POC
essentially asked the vendors to identify key
data and perform computations on it to create
information pertinent to case management
processes, such as counting the number of
defendants. These computations were then used
to triage the case into a specific path.
Table 1: Data Elements Extracted in NLP Triage POC
Complaint
Court in which the case was filed
Case number
Filing date
Names and types of first six plaintiffs
Names and types of first six defendants
Unknown defendants included in complaint
Case type
Bar number and law firm name of plaintiff attorneys
Plaintiff demand for jury trial
Amount of compensatory damages demanded
Injunctive relief, punitive damages, attorneys' fees, or declaratory judgment demanded
Answer
Answer date
Names and types of defendants in Answer
Bar number and law firm name of defendant attorneys
Defendant allegations of crossclaims, counterclaims or third-party claims
Affirmative defenses
Defendant demand for jury trial
NCSC assigned most of the assembled
documents to a Learning Set that participating
vendors could use in the machine learning
phase to teach their software to extract the
data elements needed for triage. In this process,
an analyst works within the software to identify
and label data elements within the documents.
Through the iterative process, the machine
learns the pattern and reaches a threshold
where it can identify the data elements at a high
level of accuracy. The learning set included
39,765 pleading documents for 34,796 civil
cases filed in the Superior Court of Arizona in
Maricopa County; 9,862 pleading documents
for 5,004 civil cases filed in the Superior Court
of Arizona in Pima County; and 16,632 pleading
documents for 13,724 civil cases filed in the
Fifteenth Judicial Circuit Court of Florida (Palm
Beach County).
Although vendors had the opportunity to ask
clarifying questions about the desired data
extracts, the Triage POC was more complicated
than previous POCs insofar as it required
knowledge of civil procedure and terminology.
In addition, the learning process was conducted
in a static environment (documents saved on
NCSC servers) and was based on computer
algorithms with limited human review and
feedback. Machine learning is an unavoidable
and critical first step to train the software. A
large volume of representative documents and
human review time are required to achieve
desired thresholds of accuracy. The level of
structure within the documents may also
influence machine learning time. For example,
structured forms are easier to learn than
unstructured documents.
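The learning set and test set described here follow the standard supervised machine learning pattern: human-labeled examples train a model, and a held-out set measures whether the desired accuracy threshold has been reached. The sketch below illustrates that pattern with an off-the-shelf scikit-learn text classifier; the labeled snippets are invented placeholders, and the participating vendors' proprietary pipelines were certainly more elaborate than this.

# Minimal sketch of the learning set / test set pattern with scikit-learn.
# The labeled snippets are invented placeholders, not POC data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

documents = [
    "COMPLAINT Plaintiff demands judgment against Defendant for damages",
    "ANSWER Defendant denies each and every allegation of the Complaint",
    "COMPLAINT Plaintiff alleges breach of contract and requests a jury trial",
    "ANSWER Defendant asserts the affirmative defense of accord and satisfaction",
] * 25                                  # repeated so the split has enough rows
labels = ["complaint", "answer", "complaint", "answer"] * 25

# Hold out a test set, analogous to the POC's separately coded Test Set.
X_train, X_test, y_train, y_test = train_test_split(
    documents, labels, test_size=0.2, random_state=0)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(X_train, y_train)                               # learning phase
print(accuracy_score(y_test, model.predict(X_test)))      # compare to the accuracy threshold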
NCSC selected pleading documents for 250
cases as a Test Set that was released to the
vendors at the end of the Learning Phase.
Cases selected for the Test Set were weighted
toward those with higher complexity index
scores to assess the extent to which NLP
methods could improve the accuracy of
triage pathway assignment compared to the
automated civil case triage algorithms.9 Twenty
percent (20%) of the POC Test Set consisted
of cases assigned to the complex pathway
compared to 7% of the cases overall; 40% of the
POC Test Set consisted of cases assigned to the general
pathway and 40% to the streamlined pathway,
compared to 19% and 75%, respectively, of the
cases overall. Due to an error in assigning cases
to the Test Set, 26 cases were not manually
coded by the NCSC. Consequently, the vendor
results reflect 224 usable cases.

9 The algorithms developed as triage criteria for the automated civil case triage project assigned 74% of cases to the correct case processing pathway. For incorrectly assigned cases, however, the algorithms more often failed to elevate cases to a higher pathway (22%) than to elevate cases inappropriately (4%).
Legally trained project staff reviewed the
Test Set cases and documented data elements
related to case complexity. Using the triage
criteria developed in the previous study,
project staff also assigned each case to a case
processing pathway and indicated their
recommendation for a different pathway
if warranted based on their review of the
pleadings. The vendor ran its data-extraction
software on the Test Set and submitted the results to
NCSC project staff to be compared to the
manually coded Test Set. The compiled results
are reported in Table 2.
Table 2: Data Extraction Success Rate
Data Element  Total N  Correct  % Correct
1st Plaintiff Name 208 206 99.0%
Answer Filed 224 221 98.7%
1st Defendant Name 209 206 98.6%
1st Plaintiff Bar Number 192 189 98.4%
Defendant Jury Demand 112 110 98.2%
Plaintiff Law Firm Name 200 195 97.5%
Damages Unspecified 106 103 97.2%
Plaintiff Jury Demand 208 201 96.6%
Cross Claim 110 106 96.4%
1st Defendant Bar Number 100 96 96.0%
Third Party Claim 111 106 95.5%
1st Plaintiff Type 208 197 94.7%
Counter Claim 111 105 94.6%
Afrmative Defenses 108 102 94.4%
Punitive Damages 214 201 93.9%
2nd Defendant Name 147 138 93.9%
Defendant Law Firm 106 99 93.4%
Attorneys Fees 209 195 93.3%
Injunctive Relief 214 198 92.5%
1st Defendant Type 207 190 91.8%
Answer Date 104 105 91.4%
Declaratory Relief 206 187 90.8%
2nd Plaintiff Bar Number 74 64 86.5%
2nd Defendant Bar Number 36 26 72.2%
2nd Plaintiff Name 60 42 70.0%
3rd Plaintiff Bar Number 30 21 70.0%
Compensatory Damages 95 62 65.3%
Unknown Defendants 100 62 62.0%
Case Type 224 69 39.2%
Overall, NLP performed quite well on the
data extraction test, correctly identifying
most of the requested data elements more
than 90% of the time. Many of these data
elements were structured or semi-structured
data located in the document heading, making
them relatively easy to identify and extract.
Others, such as demands for jury trials,
injunctive or declaratory relief, afrmative
defenses and crossclaims, counterclaims,
and third-party claims were often only found
in the unstructured narrative sections of
the pleadings, but were sometimes set off as
subheadings within the documents.
The few instances in which NLP extracted incorrect
information were most often due to incomplete
machine learning concerning idiosyncratic
formatting styles employed by lawyers in
the participating jurisdictions. For example,
many plaintiff lawyers named "John Doe,
Jane Doe," and "XYZ Corporations I through
X" as placeholders among the named defendants
in the event that additional defendants would
be identified at a later time, but NLP did not
recognize these as "unknown defendants."
Similarly, the use of DBA (doing business as) or
AKA (also known as) to designate plaintiff and
defendant pseudonyms was often misidentified
as a second party rather than an alternate
name for the original party. Finally, several
smaller law firms filed pleading documents
with the names and bar numbers of all licensed
attorneys employed by the firm listed on the
letterhead; the filing attorney would
then highlight or mark their name to indicate
that they were counsel of record on the case.
Additional direction during the machine
learning phase would likely have corrected
these errors over time. If uncorrected,
however, those errors would have created
additional errors involving calculations for the
number of parties, which was a key factor in the
triage algorithms.
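Some of that additional direction can be expressed as simple pattern rules. The sketch below shows, purely as an illustration, how placeholder defendants and DBA/AKA aliases of the kind described above could be flagged before party counts are computed; the regular expressions are assumptions based on the examples in this report, not the vendors' actual rules.

import re

# Patterns assumed from the examples in this report, not the vendors' actual rules.
UNKNOWN_DEFENDANT = re.compile(
    r"\b(john|jane)\s+doe\b|\b\w+\s+corporations?\s+[ivx]+\s+through\s+[ivx]+\b",
    re.IGNORECASE)
ALIAS = re.compile(r"\b(d/b/a|dba|a/k/a|aka)\b", re.IGNORECASE)

def classify_party(name: str) -> str:
    """Label a captured party string so placeholders and aliases are not counted as extra parties."""
    if UNKNOWN_DEFENDANT.search(name):
        return "unknown defendant (placeholder)"
    if ALIAS.search(name):
        return "alias of an existing party"
    return "named party"

for party in ["Jane Doe", "XYZ Corporations I through X",
              "Acme Stores, Inc. DBA Acme Hardware", "Maria Lopez"]:
    print(party, "->", classify_party(party))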
Figure 2: Example of Unknown Defendants Not Identified by NLP Technologies
The data element that posed the greatest
difficulty for NLP was identification of the
case type. NLP correctly identified the case
type in only 39.2% of the cases. In those
instances, it did so only because the case
type was prominently included in the case
heading with sufficient detail to be of use for
case triage purposes. For example, "mortgage
foreclosure" and "motor vehicle tort" were
often identified correctly in case headings in
all three participating courts. Other case types
might be identified in the heading as
"non-motor vehicle tort" or "breach of contract."
These more general designations cannot
differentiate a slip-and-fall premises liability
case from a medical malpractice case or a credit
card collection suit from a commercial contract
dispute or partnership dissolution. As a general
rule, medical malpractice, commercial contract
disputes, and partnership dissolution cases are
far more complex and require far more judicial
involvement and oversight than premises
liability or credit card collection cases.
Figure 3: Example of Incorrect Case Type Identification
Ultimately, none of the NLP vendors attempted
the second or third components of the Triage
POC, so the NCSC used their ability to correctly
identify and extract information from the first
component to assess the rate at which they
could have done so. As Table 2 showed, NLP
successfully identified and extracted 90%
or more of most data elements other than
case type. The relational data test required
the NLP vendor to determine whether an
answer was filed in response to the complaint,
successfully count the number of plaintiffs in
the complaint and defendants in the answer,
and determine whether all of the named
defendants had responded to the complaint. It
correctly determined that an answer was filed
in 98.7% of the cases and correctly identified
all plaintiffs and defendants in 83.9% of the
cases. Consequently, it would have successfully
performed the relational data test for 87.6% of
the cases in which an answer was filed.
Successfully completing the third POC
component, however, was heavily dependent
on correctly identifying the case type, the
existence of an answer, the representation
status of the parties, the number of plaintiffs
and defendants, and in many instances, the
relief sought including a jury demand by either
or both parties. Although the success rate was
acceptable for most of these items individually,
NLP correctly identified all of the necessary
information for triage in only 24 cases (10.7%).
Incorrect case type was the most frequently
occurring error.
Quality Control POC
The NLP Quality Control (QC) POC was an
intentionally ambitious test of NLP's ability to
classify documents, extract information, and
analyze and compare the extracted information
to a checklist of case processing requirements
for debt collection cases. See Appendix C
for POC 3. The dataset consisted of 21,469
documents filed in 3,420 unique consumer
debt collection cases disposed in the Cleveland
Municipal Court. The Cleveland Municipal
Court was specifically requested to participate
in the POC because it had recently enacted Civil
Practice Rule 6.13, requiring plaintiffs seeking
default judgments to provide an affidavit of
current military status, proof of assignment from
the original creditor or original party in interest
to the plaintiff, and the last billing statement
from the original creditor sent to the defendant
or an affidavit explaining why the required
documents are not available. If Rule 6.13 is
satisfied, the relevant documentation would
include proof of the plaintiff's standing to bring
suit, proof that the defendant received notice
of the lawsuit, proof that the case was filed
within the Ohio statute of limitations governing
debt collection cases, and proof of the amount
of damages sought.10 Documents related to
100 cases were selected for the QC POC Test
Set while the remaining documents were made
available to vendors as a Learning Set.11
10 The CCJ Civil Justice Improvements Committee identified proof of standing, notice, timeliness, and
amount of damages as elements fundamental to procedural due process that had often not
been observed in high-volume dockets. Supra note 3, at 33-34.
11 All cases selected for the NLP QC Test Set included at minimum the complaint, summons, proof of
service return, and motion for default judgment with accompanying documentation.
Like the Triage POC, data from the QC test
cases were manually coded by project staff and
entered into a dataset for analysis. In addition
to documenting key information, the coders
answered a series of relational questions
related to standing, notice, timeliness, and proof
of claims. Table 3 provides basic descriptive
information about the QC Test Set cases. Of
particular note, 59% of cases were filed by
a plaintiff who purchased the debt from the
original creditor, but only 88% of those cases
included documentation showing the chain of
custody for the debt. Sixteen percent (16%)
of cases included proof that the defendant
received notice of the claim and in an additional
80% of cases notice was presumed because
nothing in the file indicated that the summons
was not delivered. Three cases, however, had
no summons documentation and in one case the
summons was mailed to the plaintiff’s address.
In three cases, the name of the defendant did
not match the debtor named in the contract
on which the suit was predicated. Six cases
did not indicate the date of default, which is
necessary to determine whether the case was
filed within the statute of limitations governing
debt collection cases. Four cases failed to
include proof of the amount claimed in the suit.
Two cases indicated that the debtor had filed
for bankruptcy, which should have stayed the
proceeding in the municipal court. Each of these
inconsistencies should have triggered additional judicial scrutiny before a judgment was entered.
The Quality Control POC was designed to identify those inconsistencies that might have been
overlooked and bring them to the attention of a judicial officer.

Table 3: Description of QC Test Set Cases
Average number of documents 7.6
Average claim amount $2,938.70
Percent of cases served by certified mail 97%
Percent of cases with proof of service 16%
Percent of cases with presumed service 80%
Percent of cases with service date < 1 year 96%
Percent of contested cases 2%
Percent of cases with same defendant and debtor name 97%
Percent of cases filed by original creditor 41%
Percent of cases with proof of ownership by assigned plaintiff 88%
Percent of cases with default date included in documentation 94%
Percent of cases with proof of claims 96%

The electronic documents provided by the Cleveland Municipal Court included .pdf, .tif, and .xml
formats and the image resolution for the documents varied from 200dpi to 400dpi. In addition,
the case number was not always consistently marked on each filing. For example, case number
2018-CVF-06499 appeared variously as 18 CVF 6499, 18CVF 6499, and 2018 CVF 006499
in different documents. Finally, case filings often included duplicate copies of previous filings
(e.g., affidavits included with both the complaint and the motion for judgment), which were
subsequently scanned by court staff as part of the electronic file. Consequently, a significant
challenge for the NLP vendors was correctly identifying the document type, associating the
document with the correct case number, and then ignoring duplicate documents within the same
electronic files.
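One way to address the case-number variation described above is to normalize every captured string to a canonical form before documents are matched to cases. The function below is a hypothetical sketch built around the 2018-CVF-06499 example; the year/type-code/sequence layout and five-digit sequence are assumptions, not a documented Cleveland Municipal Court convention.

import re

def normalize_case_number(raw: str, default_century: str = "20") -> str:
    """Reduce variants like '18 CVF 6499' or '2018 CVF 006499' to '2018-CVF-06499'.

    Assumes a year / type-code / sequence layout with a five-digit sequence,
    based only on the example above; real numbering rules would need confirmation.
    """
    m = re.match(r"\s*(\d{2}|\d{4})[\s-]*([A-Za-z]{2,4})[\s-]*0*(\d+)\s*$", raw)
    if not m:
        return raw.strip()                   # leave unrecognized formats untouched
    year, code, seq = m.groups()
    if len(year) == 2:                       # restore the truncated century
        year = default_century + year
    return f"{year}-{code.upper()}-{int(seq):05d}"

for variant in ["2018-CVF-06499", "18 CVF 6499", "18CVF 6499", "2018 CVF 006499"]:
    print(variant, "->", normalize_case_number(variant))
# All four variants normalize to 2018-CVF-06499, so their documents group under one case.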
The rst task for the POC was to classify the type of document and count the number of unique
documents associated with each case. Table 4 compares the number of unique documents
identied by manual coding and the NLP process. It is clear from the analysis that poor image
resolution and the duplication of documents within les greatly undermined the accuracy of the
NLP document classication process. Variations in the format of the case number (truncation of
year, extraneous leading zeros, and hyphenation or spaces between different sections of the case
Table 3: Description of QC Test Set Cases
Average number of documents 7.6
Average claim amount $2,938.70
Percent of cases served by certied mail 97%
Percent of cases with proof of service 16%
Percent of cases with presumed service 80%
Percent of cases with service date < 1 year 96%
Percent of contested cases 2%
Percent of cases with same defendant and debtor name 97%
Percent of cases led by original creditor 41%
Percent of cases with proof of ownership by assigned plaintiff 88%
Percent of cases with default date included in documentation 94%
Percent of cases with proof of claims 96%
14
number) resulted in the NLP identifying 293 discreet
case numbers for 100 cases.
12
Additional human
interaction during the machine learning phase of the
POC would likely have corrected for the variations
in case number formats. Similarly, the NLP
technology captured the title of documents exactly
as they appeared, but could not classify the type of
document without additional direction during the
machine learning process. For example, the NLP
extraction identified 113 documents as "Certified
Mail Signature," "Certified Mail Unclaimed," or
"Certified Mail Undeliverable," but did not recognize
them as return of service documents. Similar to
its performance in the Triage POC, this lack of
specification made it impossible for the NLP to
perform the subsequent relational tasks to identify
gaps in documentation that would indicate the need
for additional judicial scrutiny before a judgment
was entered.
12 A case number could not be identified for an additional 552 documents.
Table 4: Document Classification
                              Manual Coding   NLP Vendor
Number of cases               100             293
Number of unique documents    762             1301
Complaints                    100             92
Return of Service Documents   112             113
Motions for Judgment          99              93
Summonses                     191             24
Answers                       2               8
Affidavits                    97              105
Judgments                     163             99
Post-judgment filings         43              0
Conclusions and Recommendations
A global movement towards digitalization
is underway and the courts are included in
this trend. With the public becoming more
digitally savvy, there are greater expectations
for courts to embrace digital technology and
innovative approaches. Public interactions
with the court system are a main driver of
change as their demands for quality and speed
of service are evolving both online and ofine.
New ways of working are also inuencing
the court’s workforce. Technology provides
opportunities for courts to work differently
with new approaches to case processing,
remote services, and public access to the
courts.
The tools within Artificial Intelligence
continue to grow and evolve. These proof of
concept and use cases demonstrate that AI
and NLP technology are capable of improving
processes and delivering needed outcomes
given the appropriate machine learning time
and attention to the quality of data. Courts
that implement NLP technology usually start
with areas that contain iterative tasks with
low variability. Identifying iterative processes
that are clear and easy are a common starting
point, yet the benets can be incredible.
Reducing staff time by having technology
deal with redundant tasks allows staff to shift
attention to more complex tasks.
Data are at the core of successful digital
transformation and one of the main benets
of AI technology is that data are no longer
bound by traditional databases. Today data
can be found in more diverse forms such as
images, searchable text, handwriting, and
even audio/spoken word. With the ever-
increasing processing power in computing
systems, large data storage capacities, and
innovative tools, there are huge opportunities
to harness the power of data.
Key Takeaways
Below are some key takeaways that should be considered before courts begin implementation of NLP and
other innovative AI tools.
Data are Central to Innovation
As expected, the quality of the data greatly
impacts future processes. If data is in a
searchable format, such as a .PDF, it is easier for
the software to fully understand the information.
If the information is in a scanned document
image such as .TIF or .JPG, then an Optical
Character Recognition (OCR) process must be
completed before the software can read and
process the information within the document.
The quality of the image resolution is critical
for the OCR process to work effectively, so
courts using scanned images should employ
the minimum resolution standards necessary
for effective OCR. Courts may need to improve
existing document resolution if OCR minimum
requirements are not met before starting the
machine learning process.
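One practical safeguard is to check the resolution of each scanned image before it is sent to OCR. The sketch below uses the open-source Pillow and pytesseract libraries as stand-ins for whatever OCR engine a court actually licenses; the 300 dpi floor is a common rule of thumb rather than a standard stated in this report.

from PIL import Image          # Pillow
import pytesseract             # Python wrapper for the Tesseract OCR engine

MIN_DPI = 300                  # common rule-of-thumb floor for reliable OCR

def ocr_if_adequate(path: str) -> str:
    """Run OCR on a scanned page only if its resolution meets the minimum standard."""
    image = Image.open(path)
    dpi = image.info.get("dpi", (0, 0))[0]          # images without DPI metadata read as 0
    if dpi and dpi < MIN_DPI:
        raise ValueError(f"{path}: {dpi} dpi is below the {MIN_DPI} dpi floor; rescan")
    return pytesseract.image_to_string(image)

# Hypothetical usage:
# text = ocr_if_adequate("affidavit_page1.tif")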
Other markings such as time stamps over text
and handwriting on forms may offer additional
challenges in accuracy. Software recognition of
handwriting and the ability to ignore markings
such as stamps (noise) has improved and will
continue to improve. However, it is still best
to work towards the cleanest documents
possible for scanned images. Ideally, information
submitted into the court case file should be in
a fully digital format. Most information today
is created within a computer, so printing and
scanning information back in as an image should
be avoided. Processes should keep information
"born digital" so it is retained in a fully digital
format throughout the process. Digital time
stamps, digital signatures, and digital notarization
processes help make this possible. Ultimately, the
courts should focus on collecting "information
contained in documents."
Data should follow standards to provide
continuity to the software. Initiatives like the
National Open Data Standards (NODS)13 are
useful for providing courts with standard data
definitions and structures. The more courts can
agree on and use standards, the more easily
software can learn. Standards make sharing and
understanding information between disparate
courts much easier. Standards at the local level
such as standard form structures, standard
data collection methods (portals, guided forms
assembly), and well-designed cover sheets can
help business analysts utilize software tools such
as NLP to a greater potential as these efforts
provide consistent learning. Having to learn
multiple possible terms related to Dissolution of
Marriage, for example, is possible, but the more
“variety” that exists, the more learning must take
place. Variability also impacts the continuous
learning process and courts will have to maintain
a growing catalog of learned terminology with
various degrees of clarity as to what is occurring
in the case.
13 See www.ncsc.org/NODS.
Rethink Processes
Moving into digitization and using tools such as NLP requires courts to ask the
fundamental questions "Why are we doing this process this way?" and "How should we organize
our work?" Courts should also consider how they can create an environment where they can be
fit for the future and adaptable to changing needs. Implementing new innovative tools provides
the perfect opportunity to look at the entire process and make changes that support current
innovative improvements as well as set up future opportunities. It is a time to transform not just
technology, but also human processes, policies, and experiences.
Document Intelligence
Machine learning allows software to read, understand, and identify key data elements. Then the
software can be directed to take actions such as redaction, data extraction, assessment of data,
and assignment into work queues or workows. Whether scanned paper or a natively digital
document, a lot of information is contained in the case record. Finding new ways to tap into that
information is the goal of developing document intelligence strategies.
Traditional Databases
Courts still rely on traditional case management systems with data defined and stored within
databases. Extracted data using document intelligence may be integrated or placed into databases
more easily without relying on manual data entry.
Robotic Process Automation (RPA)
When direct data integration is not feasible or is complex, many courts are using RPA. RPA makes
use of machine learning to identify and extract key elements from the digital court case file, and then
replicate human data entry steps to populate a database. RPA is also used to randomly select case
records for quality control tests as well as other simple iterative tasks that can be learned.
Data Warehouses
During early computing days when data was centrally stored on a mainframe, storage was limited
and highly managed. Now with storage and processing capabilities becoming more robust, it
is possible to collect data from various sources to create a combined data repository in a data
warehouse. This reduces time to conduct analyses from multiple sources because much of
the data has already been combined and placed into a storage space that is a single source of
query. Data warehouses store current data from multiple databases as well as historical data for
purposes of in-depth data analytics.
Advanced Digital Assistants – Chatbots
Courts are making use of NLP and machine learning to create advanced digital assistants and
Chatbots. These assistants and bots help the public with information, guide them to resources
such as standard court forms, provide language access, and connect them to the appropriate court
staff for one-on-one assistance, if needed. These tools also help internal staff with data analytics,
staff education, and assistance with internal resources such as human resources.
Business Intelligence
When courts put the effort into machine learning, this catalog of learned information may be
applied to multiple levels of court case processing. When used at multiple points, the key benet
is the development of business intelligence (BI). Business intelligence leverages technology-
driven processes that collect and store data. Then data analytics can be more rapidly and
comprehensively completed to inform decisions and process improvements. Business intelligence
provides greater capabilities for benchmarking, metrics, and analysis.
Use Cases
The use cases described below make use of NLP as well as other AI tools to perform functions
similar to and separate from the Proofs of Concept in the grant. They are great examples of the
flexibility and variety of uses in the court environment. These use cases focus on improving
internal processes as well as public facing processes and services to improve overall customer
experience (CX).
ARIZONA MARICOPA COUNTY CLERK OF THE SUPERIOR COURT
The Clerk of Court for Maricopa County Superior Court is the record keeper and fiduciary for the
Superior Court of Maricopa County, the fourth largest county in terms of population. The clerk
handles records, documents, and money. Maricopa is an all-electronic court record court, but
filings are submitted both electronically and in paper. Paper is digitized by scanning.
• An average of 36,291 pieces of paper are filed daily.
• The Clerk processes an average of 14,500 documents daily.
• More than 155,000 new cases are filed annually.
• The document image repository holds 78 million scanned images;
paper filings are still scanned.
• The Clerk operates nine geographic locations with multiple filing counters.
• The Clerk processes an average of $563,414 in monies daily.
The main driver for Maricopa's AI initiatives stemmed from the internal question of "how can
we improve our traditional document processing?" In addition to filings, the Clerk's office also
received approximately 30,000 calls per month with questions ranging from case information
and e-filing support to payments and licensing. The Clerk of Court wanted to do more with
technology than configure off-the-shelf systems or develop applications in-house. Instead, the
IT office sought to be "future ready" to take advantage of tools like Artificial Intelligence and
Robotic Process Automation (RPA) and apply them to the environment. The Clerk strategized and
prioritized leveraging emerging technology to transform service delivery and to improve customer
experience. Bold, but calculated.
Strategies used involved:
• Artificial Intelligence
• Robotic Process Automation (RPA)
• Business Intelligence – Data Warehouse
It was also important to invest in talent before taking the journey. Maricopa hired a Chief
of Innovation and AI. It takes a team to configure, train, test, and support the AI. Customer
Experience Engineers were put into place; they are similar to business analysts, but their focus is more on
AI conversations to monitor and improve the customer experience.
Operational Efficiency – Transformation with AI
Many courts still have document management systems, many of them dating from the early days of scanning
paper case files. Even with e-filing, paper filings still occur. Document imaging or "intelligent
capture" is done by scanning the document and putting it through an OCR process to convert
the image into readable data. For documents that are scanned or received natively in a fully
readable format, the focus then shifts to the data within the digital documents. Data
are automatically identified and classified, and data types and classifications are trained to trigger
placement into workflows. Previously this was a manual process, but it has now been automated.
Intelligent capture was customized to fit the needs of the clerk. The Clerk required not only the
document title, but also the case type and docket code. Once those elements are identified, the
case is then routed to be auto-docketed.
Once the intelligent capture process reached the high-90% accuracy confidence threshold, the
Clerk moved to implement Robotic Process Automation (RPA). By enhancing their workforce
with a digital workforce (RPA), the organization improved further with timeliness and efficiency.
With this complement of AI tools and measures there has already been an over 50% improvement
in the turnover of paper documents from processing filings into electronic court records
and docketing, and a 40% efficiency improvement in staff time. This process has allowed for
24/7/365 processing, both attended and unattended.
EXAMPLE OF INTELLIGENT CAPTURE, REDACTION, CONFIDENCE THRESHOLD
RPA – How RPA Robots were used
in Maricopa
Robotic process automation (RPA) is a
business process automation technology
based on metaphorical software robots (bots)
or an artificial intelligence (AI) digital worker.
This involves developing an action list by
having the bot watch a human perform the
task within a software interface and then
learning to perform the automation through
repeated observations. This is an alternative
to using an Application Programming
Interface (API) to exchange information. A
common use for RPA is to train it to identify
data from case documents and perform
data entry functions through an automated
process. This use case for RPA helps with gaps
in the workforce in areas where staff may be
performing iterative tasks that can be learned
and replicated by software.
In Maricopa County, each of the bots was
given a name, including "Ron Burgundy," "World
News Agent," "Yoda," "Alfred," and
"CLEO". Each bot uses NLP to identify
information and is given instructions on steps
to perform via a learning/training process.
RPA mimics human steps such as data entry
or launching a search query on the Internet so
these steps may be automated.
Ron Burgundy is an Internal Testing BOT that
searches websites for new information about
courts and technology and presents it back
to the internal team. World News Agent
assists employees to find information on
external websites.
Yoda is an Internal Slack BOT that assists
employees to find information about
administrative matters and resources, such as signing
up for benets. (Assist Employees)
Alfred is an Internal Slack BOT that assists
the technology division with monitoring and
with managing technology requests. Alfred
has some help desk assistance functions,
including classifying the assistance request
and automatically creating and assigning the
help desk ticket.
CLEO (English) and CLEO (Spanish) are
customer-facing BOT Virtual Assistants that
focus on the customer experience. IBM
Watson is used for voice conversations and
Twilio to connect to Omnichannel. Using NLP,
CLEO appears as a chat bot on the Clerk's
website and allows customers to engage
24/7 in both English and Spanish. CLEO
averages 3,700 chats per month and includes
the ability to seamlessly manage a warm
handoff to a human conversation with a customer
experience (CX) representative. Watson
is used as a knowledge base for human
conversations to help ensure information is
consistent and evolves as it is exposed to new
information. Thus far, customers rate
their experience as satisfactory 80% of the
time. Maricopa will be moving from Chatbots
to conversational AI as the next iteration
in their transformation. Maricopa County
Superior Court is working with the vendor
Computing Systems Innovation (CSISoft)
to implement AI, machine learning, data
extraction, and RPA.
ORANGE COUNTY SUPERIOR COURT OF CALIFORNIA
Project Theme: Data is our Killer App. Orange
County viewed this opportunity with the
slogan "Data is our killer app." To understand
the existing process to transform the area of
document intelligence, areas of workload,
capacity, backlog, jury response rate, and fiscal
impact of policies were reviewed in depth.
Orange County Superior Court of California
was challenged with a high volume of
unique forms entering the court. There is an
investment of time to review these forms, which
is a highly procedural process. Information
contained within the forms triggers placement
into workflows. This process was using an
incredible amount of human processing time
and staffing was not sufficient to keep up.
Many of the forms are paper files scanned and
digitized as an image .PDF rather than having
a native fully digital searchable .PDF. Faced
with this challenge, Orange County looked at
opportunities to transform and digitize the
process.
Even in e-filing scenarios there was a high
rejection rate. The Family Division had a 20%
rejection rate of e-filed forms, and 40% of the
time the reason was incomplete information.
Each form is manually reviewed by a clerk
regardless of entry method, scanned paper or
e-filing. This takes a lot of time.
Transforming this process was accomplished
by starting small and branching out. AI tools
are now mature and "big" because there are
many components to AI that work in various
combinations to address specific processes.
Technology using AI on forms was logical as
forms have structure, which makes it easier
to train AI on repeatable steps since data
is located at defined locations on the form.
Machine learning is a process where AI is
trained to locate data, identify it, and then
process the data as per instructions. As the
number of forms the AI processes and learns
from increases, the more accurate it becomes
over time. The civil division of the court was
selected first since there was mandatory e-filing
using standard forms in place.
Three use cases are in play in Orange County.
1. Document Intelligence and Data
Extraction.
2. Redaction – due to legalization of
cannabis, many court records required
redaction of past offenses.
3. Default Judgments
USE CASE 1 – Document Intelligence and Data Extraction
Document Intelligence is about unlocking the data within the case le or forms. The courts
have lots of documents and untapped information that could be available for query and other
actionable processes and automation scenarios. Document intelligence complements business
intelligence by supplementing data extracted from documents with data from databases and data
warehouses. Document classification is the first step in the process, and in Orange County this
is the Magic Classifier process. Document classification is a manual process to drill down from
the high level to the sub-classification levels needed to properly docket and place the case into a
workflow queue. There is a lot of work being done now using data analytics to determine the key
indicators for classification and then using the iterative machine learning process to train AI to
perform the classification process.
There are 3 case management systems in Orange County: 1) Tyler Odyssey for Family and Juvenile
(SQL); 2) V3 for Civil, Probate, Small Claims (Oracle); and 3) Vision for Criminal (Oracle). There
was already in place an established method of unlocking the data from these sources and putting
them into a data warehouse (Snowflake). There was also an established method to visualize the
data using Power BI, Tableau, SharePoint Online, and MS Excel. The layer that was added was the
AI and Machine Learning layer. It was placed after the data warehouse, so the presentation tools
had more information available. Orange County is using these tools in the AI and machine learning
swim lane: Databricks (data analytics), Azure DevOps, and Azure Form Recognizer (Azure
DevOps and Form Recognizer are completing the data extraction and forward actions).
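As an illustration of where a form-recognition service sits in such a stack, the sketch below shows a generic call to the prebuilt document model in the azure-ai-formrecognizer Python SDK (version 3.2 or later is assumed). The endpoint, key, and file name are placeholders; this is not a description of Orange County's actual pipeline.

# Hypothetical sketch using the azure-ai-formrecognizer SDK (v3.2+); the endpoint,
# key, and file name are placeholders, not Orange County's configuration.
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

client = DocumentAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

with open("efiled_form.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-document", document=f)
result = poller.result()

# Recognized key/value pairs can then feed downstream rules or the data warehouse.
for pair in result.key_value_pairs:
    if pair.key and pair.value:
        print(pair.key.content, "->", pair.value.content)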
The building blocks below take information from the AI and Machine Learning layer through the
document intelligence process and add to the business intelligence. The activity intelligence
integrations, contextual understanding, and business rules are combined with Natural Language
Processing (NLP) to support the processes to the right. These processes range from simple ones, such as case
initiation and document classification of e-filed case information, to more complex processes
supporting redaction, default judgments, and protection orders, to name a few. These building blocks
and automation help the clerk and courts with case processing. Predictive Analytics are used for
such things as case filing levels and workload predictions.
BUILDING BLOCKS
BUILDING BLOCKS
Legend:
Black: Completed
Blue: In progress
DATA ROADMAP
USE CASE 2 – Redaction (Cannabis)
Due to the legalization of marijuana, the courts must retroactively redact portions of court
case files related to cannabis charges. Single count instances are straightforward, but in some
instances, there are multiple counts listed where only the cannabis-related information is to be
redacted. Machine learning must learn the various iterations of how a cannabis-related count
might be referred to, such as "Count Two," which makes learning more challenging. This means the
machine learning must tie the "Count Two" charge to mean redaction of those unobvious words
when encountered. This machine learning process is underway and ongoing. This project is to
avoid a high volume of manual redaction. The vendor partner Orange County is using for this
process is PTFS.
SINGLE COUNT VERSUS MULTIPLE COUNT EXAMPLE
USE CASE 3 – Default Judgments
In Orange County Superior Court,
all default judgments are filed
electronically. The courts received
metadata and PDFs. As these
filings go into a review queue for
default judgments, the clerks
would have to view each one and
determine the correct subtype.
There are 9 subtypes for default
judgments. Making the subtype
determination may require the
clerk to find information from
other sources such as a lookup
in the case management system.
Once the subtype was identified,
it was added to the notes section
in the CMS. Then the clerk
assigned to work the specific
subtype for default judgments
would have to search the notes
to "find" these cases assigned to
them. This was a time-consuming
and inefficient process.
To transform this into a more
efficient digital process, the AI will
scrape the pertinent data from
the default judgment filing, rules
will be applied to the data, there
will be 9 specific sub-queues, and
the rules engine will 1) determine
the appropriate subtype and 2)
place the filing into the correct
queue. Automating this part will
free up clerk time from the heavily manual process of determining subtype and allow them to work on the
queues. No jobs are lost in this process, but the repeatable steps have been automated to allow the clerks
to work on cases in a more timely manner. This will help reduce backlogs.
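A rules engine of this kind can be quite small once the pertinent data have been scraped from the filing. The sketch below is a hypothetical illustration only; the field names, subtype labels, and rules are invented for the example, since this report does not enumerate Orange County's nine subtypes.

# Hypothetical rules-engine sketch; the field names, subtype labels, and rules are
# invented for illustration and are not Orange County's actual nine subtypes.
def route_default_judgment(filing: dict) -> str:
    """Apply ordered rules to scraped filing data and return the name of a sub-queue."""
    rules = [
        (lambda f: f.get("clerk_judgment") and f.get("amount_certain"),
         "clerk-judgment-sum-certain"),
        (lambda f: f.get("court_judgment") and f.get("hearing_requested"),
         "court-judgment-with-hearing"),
        (lambda f: f.get("court_judgment"),
         "court-judgment-on-declarations"),
    ]
    for test, queue in rules:
        if test(filing):
            return queue                    # 1) subtype determined, 2) routed to its queue
    return "manual-review"                  # anything unmatched still goes to a clerk

example = {"court_judgment": True, "hearing_requested": False,
           "amount_certain": True, "clerk_judgment": False}
print(route_default_judgment(example))      # court-judgment-on-declarations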
SAMPLE DEFAULT JUDGMENT
Lessons Learned
1. Start with a relevant business question.
(What problem needs to be solved?)
2. Leverage an integrated technology stack. (Buy
and build can be combined, look at what works
best for the court’s environment).
3. Be agile. (start small, iterate, learn, repeat)
Other Uses of AI
Other uses of AI in Orange County include
Chatbots using Google Contact Center AI in
the areas of the Collections Group and the Jury Group,
since those are high-volume areas where the
court receives a lot of questions. The BOT is used
to answer the common questions coming in.
Collections has a team of 2 people working part
time on the Q/A to refine parameters
around "intent", or "What are you trying to
find?". Business analysts look at the questions
coming in and help refine the ChatBot's ability to
answer incoming questions. Special emphasis is placed on
new questions. This is known as intent mapping.
Orange County is evolving from Chatbots to
conversational AI as their next step in their digital
transformation.
Orange County is using other tools than RPA,
but sees the benefits of this technology. The
term robotic may be misunderstood and make
employees concerned about being replaced by
a robot. Perhaps the "R" should be viewed
as "Repeatable" since this technology is a great
fit for repeatable tasks that the software can
learn by mimicking the pattern through repeated
observations of the steps. RPA is an excellent fit for
older systems where direct integration through an
API may be difficult or unavailable.
Appendix A:
POC 1—Civil Case Data Extraction and Case Matching POC
Background:
The National Center for State Courts has already completed proofs of concept on data redaction
and would like to look at the technology to complete data extraction from civil cases. Data
extraction would include initial document classification and capture of data.
POC Purpose:
The purpose of this POC is to determine the effectiveness and accuracy of extracting specific
targets from civil documents. These extracted data will be critical for use in populating other
applications' databases. It is anticipated that the software will be more effective in finding and
extracting data from the documents, which will lead to more complete and accurate data sets. To
demonstrate some potential use in an outside application component, extracted data will have
some relational comparisons.
Data Set:
The Civil Case Triage dataset consists of approximately 65,000 pleading documents (Complaints ≈
37,000; Answers ≈ 28,000) from the Maricopa County (AZ) Superior Court, the Pima County (AZ)
Superior Court, and the Palm Beach County (FL) Circuit Court.
Data Extraction:
For each document, extract the following information (a minimal sketch of a structured record for these fields follows the subject-matter list below):
• Extract the name of the court in which the document was filed;
• Extract the case number assigned to the document;
• Identify the type of document (e.g., complaint, answer);
• Extract the date the document was filed;
• Is this document written in a language other than English? Y/N
• Is this document written in plain English? Y/N
• Indicate the number of pages in the document.
If the document is a Complaint
• Extract the bar number of plaintiff's lawyer and the name of the law firm; OR
• Indicate that the plaintiff is self-represented.
• How many plaintiffs are named in the Complaint?
• Extract the name of each plaintiff and indicate whether the plaintiff is a person or an
organizational party.
• How many defendants are named in the Complaint?
• Extract the name of each defendant and indicate whether the defendant is a person or an
organizational party.
• Indicate if the plaintiff(s) seeks class action certification? Y/N
Indicate the subject matter of the lawsuit:
• Automobile negligence (Pima, 3,425; Maricopa, 8,177; Palm Beach, 3,425)
• Premises liability (Pima, Maricopa, 754; Palm Beach, 1,042)
• Medical malpractice (Maricopa, 440)
• Legal malpractice (Maricopa, 180)
• Other professional malpractice (Maricopa, 52)
• Product liability (Maricopa, 6)
• Slander/Libel/Defamation (Maricopa, 172)
• Intentional tort – Assault/Battery
• Intentional tort – Vandalism
• Pet attack
• Breach of contract – plaintiff buyer (Maricopa, 35)
• Breach of contract – credit card debt collection
• Breach of contract – student loan debt
• Breach of contract – other consumer debt collection
• Breach of contract – commercial debt collection
• Landlord/tenant – residential eviction
• Landlord/tenant – past due rent collection
• Landlord/tenant – tenant plaintiff (housing violation, deposit collection)
• Landlord/tenant – commercial lease
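As referenced above, the extraction targets could be captured in a structured record along the following lines. This is a minimal sketch; the field names are assumptions for illustration, not a required schema.

# Hypothetical structured record for the extraction targets listed above.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Party:
    name: str
    is_organization: bool

@dataclass
class ExtractedPleading:
    court: str
    case_number: str
    document_type: str               # e.g., "complaint", "answer"
    filing_date: str                 # as printed on the document
    non_english: bool
    plain_english: bool
    page_count: int
    # Complaint-only fields
    plaintiff_attorney_bar_number: Optional[str] = None
    plaintiff_law_firm: Optional[str] = None
    plaintiff_self_represented: bool = False
    plaintiffs: List[Party] = field(default_factory=list)
    defendants: List[Party] = field(default_factory=list)
    class_action_sought: bool = False
    subject_matter: Optional[str] = None    # e.g., "Automobile negligence"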
Outcomes:
Extraction Test
• Capture data in a structured dataset;
• Capture document content for future search capability;
• Generate summary of extracted data.
Relational Data Test
• Match cases based on identical court and case number.
• Compare number of parties in Complaint(s) and Answer(s).
• Identify difference in the number of parties, names, or litigant types.
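A minimal sketch of the relational comparison, assuming the hypothetical record structure sketched above, could match Complaints with Answers on court and case number and flag party differences:

# Match Complaints with Answers on (court, case number) and flag mismatches.
def compare_parties(complaints, answers):
    answers_by_key = {(a.court, a.case_number): a for a in answers}
    findings = []
    for c in complaints:
        a = answers_by_key.get((c.court, c.case_number))
        if a is None:
            continue                      # no matched answer for this complaint
        if len(c.defendants) != len(a.defendants):
            findings.append((c.case_number, "defendant count differs"))
        names_c = {p.name.lower() for p in c.defendants}
        names_a = {p.name.lower() for p in a.defendants}
        if names_c != names_a:
            findings.append((c.case_number, "defendant names differ"))
    return findings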
Appendix B:
POC 2 – Civil Case Triage POC
Background:
The National Center for State Courts captured a diverse data set of civil cases and their outcomes
to develop a case triage model. This model placed cases into one of three categories: 1) simple, 2)
standard, and 3) complex. This model was based on experience from subject matter experts.
POC Purpose:
The purpose of this POC is to determine the effectiveness and viability of using AI tools to triage civil cases into the three categories. These categories assist clerks/courts with workflow.
The vendor may approach this POC by applying the existing triage model or by using AI tools to conduct analytics to determine a more effective model.
Outcomes:
Depending on the vendor's approach to this POC, the anticipated outcomes may fit into one of two categories:
1. Use AI tools within the software to triage cases based on the NCSC model. Compare POC results to actual outcomes in the model.
2. Use AI tools to review and analyze the same civil case types and determine the appropriate case management pathway using a new model based on predictive analytics. Compare POC results to actual outcomes in the model.
Dataset:
The Civil Case Triage dataset consists of approximately 65,000 pleading documents (Complaints ≈
37,000; Answers ≈ 28,000) from the Maricopa County (AZ) Superior Court, the Pima County (AZ)
Superior Court, and the Palm Beach County (FL) Circuit Court.
NCSC will provide complexity scores and raw data for each case based on actual case activity
reported in CMS and will provide complexity thresholds for pathway assignments in each court.
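For the first approach, scoring a case and comparing the score to court-specific thresholds might look like the minimal sketch below. The feature weights and threshold values are placeholders, not the NCSC model or the thresholds NCSC will supply.

# Illustrative triage: compute a complexity score from case features and map
# it to a pathway using per-court thresholds. Weights/thresholds are placeholders.
WEIGHTS = {"represented_both_sides": 2, "defendants": 1, "plaintiffs": 1,
           "answer_filed": 2, "jury_demand": 3}

THRESHOLDS = {"Maricopa": (3, 7), "Pima": (3, 7), "Palm Beach": (4, 8)}  # (standard, complex)

def triage(court: str, features: dict) -> str:
    score = sum(WEIGHTS[k] * int(v) for k, v in features.items() if k in WEIGHTS)
    standard_cut, complex_cut = THRESHOLDS[court]
    if score >= complex_cut:
        return "complex"
    return "standard" if score >= standard_cut else "simple"

print(triage("Maricopa", {"represented_both_sides": True, "defendants": 2,
                          "answer_filed": True, "jury_demand": True}))  # -> complex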
Appendix C:
POC 3 –Civil Consumer Debt Cases, Quality Control POC
Background:
The National Center for State Courts would like to explore the use of AI tools to assist with quality
control in civil cases, specifically the consumer debt collection case type. There is a need to check
completeness of information and other critical indicators to determine if a case is ready to move
forward or requires additional case management.
POC Purpose:
There are a host of requirements to process civil cases in debt collection. This POC will utilize
document classication and data extraction tools to match documents in cases and extract various
required elements. Then these information points will be further analyzed and compared to a
quality control requirements checklist.
Dataset:
The Quality Control dataset consists of 21,469 documents filed in 3,420 unique consumer debt collection cases disposed in the Cleveland Municipal Court. The image resolution varies from 200dpi to 400dpi. This particular jurisdiction will recopy the entire court file upon each filing, and you will find duplicate documents within the images. Software will need to be able to identify and ignore duplicates.
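One simple way to flag duplicate copies is to fingerprint the normalized text of each document. This is a minimal sketch under the assumption that OCR text is already available; in practice, varying scan quality would likely require fuzzier matching than an exact hash.

# Flag duplicate documents within a case file by hashing normalized OCR text.
import hashlib

def text_fingerprint(ocr_text: str) -> str:
    normalized = " ".join(ocr_text.lower().split())      # collapse whitespace
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(documents):
    """documents: iterable of (doc_id, ocr_text); returns the ids to keep."""
    seen, keep = set(), []
    for doc_id, ocr_text in documents:
        fp = text_fingerprint(ocr_text)
        if fp not in seen:
            seen.add(fp)
            keep.append(doc_id)
    return keep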
For each document:
• Identify the document type; in the data set there are duplicate copies in subsequent filings, so document identification will be important to this POC.
• Extract the case number.
If the document type is a Complaint, extract:
• Case number
• Filing date
• Name of Plaintiff
• Number of Defendants
• Name of each Defendant(s)
• Address of each Defendant(s)
• Amount of debt claimed
• Date of default
• Amount of principal claimed
• Amount of interest claimed
• Amount of fees claimed
• Attorney signature Y/N
If the document type is a Return of Service document, extract:
• Case number
• Service date
• Filing date of return
• Who served the notice? (USPS, Sheriff, private process server)
o Name of private process server
o Image of signature on USPS return Y/N
o Failure of service (undeliverable, unclaimed, refused, not served)
• Name of Defendant
• Address of Defendant on summons
• Address of Defendant where served
• Type of service (personal, residence, publication, certified mail, first class mail)
If the document type is an Answer, extract:
• Case number
• Filing date
• Number of defendants
• Name of defendant(s)
• Address of defendant(s)
• Bar number of lawyers, if any
• Is the debt admitted or contested?
• Indicate defenses alleged in Answer:
o Debt satisfied
o Debt discharged/bankruptcy
o Not me
o Not my debt
o Amount in dispute
o Statute of limitations
o Debt invalid
o Identity theft
• Attorney/Party Signature Y/N
If the document includes Supporting Documentation:
• Indicate in which document type the supporting documentation was appended;
• Indicate the page number in the document where the supporting documentation was appended;
• Indicate whether the supporting documentation is a billing statement or statement of debt owed.
If so, extract:
• Case number
• Filing date
• Name of Plaintiff
• Name of Defendant
• Date of original contract/application
• Date of statement
• Date of last payment
• Date of default
• Amount of principal
• Amount of fees
• Amount of interest
• Signature on Affidavit N/A
• Affidavits (attorney or other source)
• Indicate whether the supporting document is an affidavit.
If so:
• Indicate the page number in the document where the affidavit was appended
• Indicate if the Plaintiff is the original creditor Y/N
• If the plaintiff is not the original creditor, indicate whether a statement describing the chain of ownership/custody is included.
• Extract:
• Case number
• Filing date
• Attorney or creditor affidavit
• Signature on Affidavit
If the document type is a Motion for Judgment, extract:
• Case number
• Filing date
• Plaintiff name
• Number of defendants
• Defendant name(s)
• Defendant address(es)
• Amount claimed
• Statement describing proof of standing (original creditor or chain of ownership/custody)
• Military afdavit
• Amountof attorneys’ fees
• Supporting documentation
• Attorney Signature
Outcomes:
Extraction Test
• Capture data in a structured dataset;
• Capture document content for future search capability.
Relational Data Test
The output will be a checklist that will summarize key indicators in a case to assist the court in
determining the quality of the case, identifying issues requiring additional action, and determining
readiness of the case to move forward.
1. Show chain of ownership of the debt if the debt has been sold.
2. Show evidence of debt (contract, billing statement, other documentation)
3. Motion for default judgment – must show supporting documentation and financial accounts.
4. Military service check has been conducted (military service members receive special exemptions/accommodations).
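A minimal sketch of how extracted fields could feed such a checklist is shown below. The field names are assumptions consistent with the extraction lists above, and the rules paraphrase the four indicators; they are not a definitive implementation.

# Illustrative quality-control checklist for a consumer debt case, built from
# extracted fields. Field names are assumptions.
def quality_checklist(case: dict) -> dict:
    checks = {
        "chain_of_ownership_shown": bool(case.get("plaintiff_is_original_creditor")
                                         or case.get("chain_of_ownership_statement")),
        "evidence_of_debt_attached": bool(case.get("supporting_documentation")),
        "default_motion_supported": (not case.get("motion_for_default_judgment")
                                     or bool(case.get("financial_accounts"))),
        "military_service_checked": bool(case.get("military_affidavit")),
    }
    checks["ready_to_move_forward"] = all(checks.values())
    return checks

print(quality_checklist({"plaintiff_is_original_creditor": True,
                         "supporting_documentation": ["billing statement"],
                         "military_affidavit": True}))

Cases that fail one or more checks would be flagged for additional case management rather than moving forward.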
Appendix D: Civil Case Triage Criteria
CIVIL TRIAGE CRITERIA FOR MARICOPA COUNTY SUPERIOR COURT
For each case type, a case is assigned to the General or Complex pathway only if all of the listed conditions are met.

Debt Collection
General Pathway: Not applicable
Complex Pathway: Plaintiff and defendant are represented, 2 or more defendants, answer or responsive pleading filed, and jury demand filed by either party

Landlord/Tenant
General Pathway: All cases
Complex Pathway: Not applicable

Other Contract
General Pathway: Plaintiff represented, 2+ defendants, answer or responsive pleading filed
Complex Pathway: Plaintiff represented, 2+ defendants AND 2+ plaintiffs, and answer or responsive pleading filed

Automobile Tort
General Pathway: Plaintiff and defendant represented, 2+ defendants AND 2+ plaintiffs, answer or responsive pleading filed, and jury demand filed by either party
Complex Pathway: Not applicable

Intentional Tort
General Pathway: Plaintiff and defendant represented, 2+ defendants
Complex Pathway: Plaintiff and defendant represented, 2+ defendants, answer or responsive pleading filed

Medical malpractice
General Pathway: Not applicable
Complex Pathway: All cases

Other malpractice
General Pathway: Not applicable
Complex Pathway: Plaintiff and defendant represented, 2+ defendants, answer or responsive pleading filed

Product liability
General Pathway: Plaintiff and defendant represented, 2+ defendants, answer or responsive pleading filed
Complex Pathway: Plaintiff and defendant represented, 2+ defendants AND 2+ plaintiffs, answer or responsive pleading filed

Premises liability
General Pathway: Plaintiff and defendant represented, 2+ defendants, answer or responsive pleading filed
Complex Pathway: Not applicable

Other tort
General Pathway: Plaintiff and defendant represented, 2+ plaintiffs, answer or responsive pleading filed
Complex Pathway: Not applicable

Real property
General Pathway: Plaintiff represented, 2+ defendants, answer or responsive pleading filed
Complex Pathway: Not applicable

Other civil
General Pathway: Plaintiff and defendant represented, 2+ defendants, answer or responsive pleading filed
Complex Pathway: Not applicable
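Criteria of this kind translate directly into rule checks. The following minimal sketch encodes a few of the Maricopa rows above; the field names are assumptions about what would be extracted for each case, and case types not matching either pathway are left unassigned here rather than mapped to a default pathway.

# Illustrative encoding of selected Maricopa rows from the table above.
def maricopa_pathway(case_type: str, c: dict) -> str:
    if case_type == "Debt Collection":
        complex_rule = (c["plaintiff_represented"] and c["defendant_represented"]
                        and c["defendant_count"] >= 2 and c["answer_filed"]
                        and c["jury_demand"])
        return "complex" if complex_rule else "unassigned"
    if case_type == "Landlord/Tenant":
        return "general"                       # all cases
    if case_type == "Medical malpractice":
        return "complex"                       # all cases
    return "unassigned"                        # remaining rows omitted

print(maricopa_pathway("Debt Collection",
                       {"plaintiff_represented": True, "defendant_represented": True,
                        "defendant_count": 3, "answer_filed": True,
                        "jury_demand": True}))   # -> complex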
Appendix D (con’t): Civil Case Triage Criteria
CIVIL TRIAGE CRITERIA FOR FIFTEENTH JUDICIAL CIRCUIT COURT OF FLORIDA
For each case type, a case is assigned to the General or Complex pathway only if all of the listed conditions are met.

Debt Collection
General Pathway: Plaintiff and defendant are represented, more than 2 defendants, answer or responsive pleading filed
Complex Pathway: Plaintiff and defendant are represented, counterclaim or third party claim filed, answer or responsive pleading filed, and jury demand filed by either party

Landlord/Tenant
General Pathway: Not applicable
Complex Pathway: Not applicable

Other Contract
General Pathway: Not applicable
Complex Pathway: Not applicable

Automobile Tort
General Pathway: Plaintiff and defendant are represented, more than 2 defendants, answer or responsive pleading filed
Complex Pathway: Not applicable

Intentional Tort
General Pathway: Not applicable
Complex Pathway: Not applicable

Medical malpractice
General Pathway: Not applicable
Complex Pathway: Plaintiff and defendant are represented, more than 2 defendants and 2 or more plaintiffs, answer or responsive pleading filed, and jury demand filed by either party

Other malpractice
General Pathway: Not applicable
Complex Pathway: Not applicable

Product liability
General Pathway: Not applicable
Complex Pathway: Plaintiff and defendant represented, more than 3 defendants, answer or responsive pleading filed, and jury demand filed by either party

Premises liability
General Pathway: Not applicable
Complex Pathway: Not applicable

Other tort
General Pathway: Not applicable
Complex Pathway: Plaintiff and defendant represented, more than 2 defendants, answer or responsive pleading filed, and jury demand filed by either party

Real property
General Pathway: Not applicable
Complex Pathway: Not applicable

Other civil
General Pathway: Plaintiff and defendant represented, 2 or more defendants, answer or responsive pleading filed, and jury demand filed by either party
Complex Pathway: Not applicable
Appendix D (con’t): Civil Case Triage Criteria
CIVIL TRIAGE CRITERIA FOR PIMA COUNTY SUPERIOR COURT
For each case type, a case is assigned to the General or Complex pathway only if all of the listed conditions are met.

Debt Collection
General Pathway: Not applicable
Complex Pathway: Not applicable

Landlord/Tenant
General Pathway: Plaintiff and defendant represented
Complex Pathway: Not applicable

Other Contract
General Pathway: Not applicable
Complex Pathway: Plaintiff and defendant represented, answer or responsive pleading filed, and jury demand filed by either party

Automobile Tort
General Pathway: Not applicable
Complex Pathway: Plaintiff and defendant represented, answer or responsive pleading filed, and jury demand filed by either party

Intentional Tort
General Pathway: Not applicable
Complex Pathway: Not applicable

Medical malpractice
General Pathway: Not applicable
Complex Pathway: Plaintiff and defendant represented, 3 or more defendants, and answer or responsive pleading filed

Other malpractice
General Pathway: Not applicable
Complex Pathway: Not applicable

Product liability
General Pathway: Not applicable
Complex Pathway: Not applicable

Premises liability
General Pathway: Not applicable
Complex Pathway: Not applicable

Other tort
General Pathway: Not applicable
Complex Pathway: Plaintiff and defendant represented, answer or responsive pleading filed, and jury demand filed by either party

Real property
General Pathway: Plaintiff and defendant represented, organizational defendant, 3 or more defendants, answer or responsive pleading filed
Complex Pathway: Not applicable

Other civil
General Pathway: Plaintiff and defendant represented, answer or responsive pleading filed, jury demand filed by either party
Complex Pathway: Plaintiff and defendant represented, no organizational parties
ncsc.org/cji
ISBN: 978-0-89656-328-5 © 2023