I found more than a thousand errors, in the form of repeated sections, in the course of parsing the online version of California's legal codes. At first, I thought there might be something wrong with my parsing algorithms; I had, indeed, gone through a number of rounds of bug-fixing. The repeated sections were carried over to the site I've published (calaw.tabulaw.com). Having parsed the sections, I could have cleaned up the duplicates in just a few minutes, but to make sure the problem wasn't on my end, I looked back at the California legislature's website.

In the original data on the California legislature's website, I saw the same sections repeated verbatim. I've collected the 1,368 repeated sections (about 2%), and most look like errors from California's original conversion from print to electronic documents.
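For anyone who wants to run a similar check, here is a minimal sketch of how verbatim repeats could be found in already-parsed sections. It is not the code behind calaw.tabulaw.com. The tuple layout (code name, section number, text), the function names, and the choice to ignore whitespace differences are all assumptions made for illustration.

```python
from collections import defaultdict


def _key(code_name, number, text):
    # Assumption: two sections count as "repeated" when code, number, and
    # text match after collapsing runs of whitespace.
    return (code_name, number, " ".join(text.split()))


def find_repeated_sections(sections):
    """Return {key: [positions]} for every section that appears more than once.

    `sections` is a list of (code_name, section_number, text) tuples,
    a hypothetical shape for parsed output.
    """
    positions = defaultdict(list)
    for index, (code_name, number, text) in enumerate(sections):
        positions[_key(code_name, number, text)].append(index)
    return {key: idxs for key, idxs in positions.items() if len(idxs) > 1}


def drop_repeats(sections):
    """Keep only the first occurrence of each section, preserving order."""
    seen = set()
    kept = []
    for code_name, number, text in sections:
        key = _key(code_name, number, text)
        if key not in seen:
            seen.add(key)
            kept.append((code_name, number, text))
    return kept


if __name__ == "__main__":
    # Made-up sample data: the second entry repeats the first.
    sample = [
        ("Code A", "1084", "The writ of mandamus may be denominated a writ of mandate."),
        ("Code A", "1084", "The writ of mandamus\nmay be denominated a writ of mandate."),
        ("Code A", "1085", "Some other section."),
    ]
    for (code_name, number, _), idxs in find_repeated_sections(sample).items():
        print(f"{code_name} section {number} appears {len(idxs)} times at positions {idxs}")
    print(len(drop_repeats(sample)), "sections remain after cleanup")
```

Keeping the first occurrence and reporting the positions of the rest makes it easy to compare each repeat against the source page before deleting anything.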

Want to see for yourself? Check out these sample sections:

There were also printer's errors that apparently crept in during the conversion from print to electronic format. For example:

Ý1084.] Section Ten Hundred and Eighty-four. The writ of mandamus

may be denominated a writ of mandate.

Do any of these errors cause confusion about what the law is? Maybe not, but they make navigating the law that much more confusing. With almost all legal research now being done electronically, I think it's reasonable to expect official government electronic sources that can be relied upon.