Thursday, March 13, 2014

Open Data and the Role of Government

In my last post, I made the point that government agencies should refer to, and use, official government sources for primary law, where available. The comments, including those by Tom Bruce of Cornell's LII and Annette Nellen of San Jose State University, underscore the important role that third-party sites like LII play in disseminating information obtained from official sources. The presence and widespread use of these sites raises the question: what is the proper role of government in publishing and disseminating primary law online?

For many years, third-party sites provided the only accessible, structured source for certain primary law, and they still often provide the best sources or interfaces for this information.* And for years the open government community has pointed out the failings of government sites and the many areas where we have wanted to see improvement. Examples include this excellent post from Cato's Jim Harper a year ago today, this congressional testimony from Sunlight's Dan Schuman, and the excellent work by Hudson Hollister and the Data Transparency Coalition to pull such policy recommendations together (pdf).

In no small measure due to this public concern, government entities have become better online publishers of their own official documents. As I noted in my last post, the U.S. Code, now published in XML by the Law Revision Counsel, has come a long way since the days when it was updated on a six-year schedule (still the case for the official print version). Other government electronic sources for primary law are also much improved: the sites for rulemaking and for bills, statutes and other legislative information both improve on their aging predecessors.

Where should these sites stop and allow the private sector to take over? Is publishing bulk XML enough?

My view is that government must go beyond publishing bulk structured data. I believe that government should provide an official online source for primary law that includes structured data (XML) presented with modern web features, including:
  • hyperlinked citations, with unique identifiers at the paragraph or section level
  • dynamic navigation of contents (e.g. navigation through tables of contents)
  • full text search
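As a rough illustration of the first item, here is a minimal Python sketch that turns U.S. Code citations in running text into hyperlinks keyed on a section-level identifier. The identifier scheme (`usc/t26/s61/a/1`), the `example.gov` base URL, and the function names are all hypothetical and chosen only for this example; they are not the scheme used by any official site, and the pattern covers only the simplest citation forms.

```python
import re

# Matches simple citations such as "26 U.S.C. § 61(a)(1)" or "26 U.S.C. 7805".
# Real citation grammars are far richer; this pattern is only illustrative.
CITATION = re.compile(
    r"(\d+)\s+U\.S\.C\.\s+(?:§\s*)?(\d+[A-Za-z]*)((?:\([0-9A-Za-z]+\))*)"
)


def section_id(title, section, subsections=""):
    """Build a hypothetical stable identifier, e.g. 'usc/t26/s61/a/1'."""
    parts = re.findall(r"\(([0-9A-Za-z]+)\)", subsections)
    return "/".join(["usc", f"t{title}", f"s{section}", *parts])


def link_citations(text, base_url="https://example.gov/law/"):
    """Wrap each recognized citation in an HTML link to its identifier."""
    def repl(match):
        ident = section_id(match.group(1), match.group(2), match.group(3))
        return f'<a href="{base_url}{ident}">{match.group(0)}</a>'

    return CITATION.sub(repl, text)


print(link_citations("See 26 U.S.C. § 61(a)(1)."))
```

The point of the sketch is that once every section and paragraph has a predictable identifier, linking citations becomes a mechanical step that any publisher, official or third-party, can perform.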
In addition, I believe that an accurate and navigable point-in-time view of the law-- a kind of version control-- should also be included where possible. This would allow us to see the law as it was in force at any date. It may be unrealistic for some data sources to create this kind of record for historical documents, but document drafting processes going forward should include some kind of version control.
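To make the point-in-time idea concrete, here is a minimal Python sketch, assuming each section's history is stored as a list of (effective date, text) pairs. The `SectionHistory` class and its method names are hypothetical, invented for this illustration, and the sample text is placeholder data rather than actual statutory language.

```python
from bisect import bisect_right
from datetime import date


class SectionHistory:
    """Keeps every version of one section, ordered by effective date."""

    def __init__(self):
        self._dates = []  # effective dates, kept sorted
        self._texts = []  # text in force from the matching date onward

    def add_version(self, effective, text):
        # Insert in date order so lookups can use binary search.
        i = bisect_right(self._dates, effective)
        self._dates.insert(i, effective)
        self._texts.insert(i, text)

    def as_of(self, when):
        """Return the text in force on `when`, or None if not yet enacted."""
        i = bisect_right(self._dates, when)
        return self._texts[i - 1] if i else None


history = SectionHistory()
history.add_version(date(2000, 1, 1), "original text (placeholder)")
history.add_version(date(2010, 6, 1), "amended text (placeholder)")
print(history.as_of(date(2005, 3, 15)))
```

A lookup for any date returns the latest version whose effective date is on or before it, which is the same question a reader asks when checking what the law said on the day a transaction occurred.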

What do you think government's role is in publishing primary law? In particular, how important do you think web features such as navigation and search are for the official government version?

* In this category, in addition to LII and the federation of LII sites around the world, I'd include Tim Stanley's original FindLaw and now Justia; Carl Malamud's work; Josh Tauberer's GovTrack; OpenCongress and other Sunlight Foundation sites; Waldo Jaquith's work on state codes and statutes; Xcential's own; and many more.

Tuesday, March 4, 2014

Tax Code: Ask the Experts?

I start today's post with a curious observation: if you go to the IRS website looking for online copies of the U.S. Tax Code, what you find, on their page entitled "Tax Code, Regulations and Official Guidance" is a link to Cornell's Legal Information Institute, which hosts a copy of the U.S. Code.  Below the link is a warning:
CAUTION.  The version of the IRC underlying the retrieval functions presented above is generated from the official version of the U.S. Code made available to the public by Congress.  However, this version is only current through the 1st Session of the 112th Congress convened in 2011.  Before relying on an IRC section retrieved from this or any other publicly accessible version of the U.S. Code, please check the U.S. Code Classification Tables  published by the U.S. House of Representatives to verify that there have been no amendments since that session of Congress.
And after clicking the link the IRS directs you to another warning about links to private sites that reads:
By linking to this private business, the IRS is not endorsing its products, services, or privacy or security policies. We recommend you review the business's information collection policy or terms and conditions to fully understand what information is collected by this private business.
This strikes me as odd.  Hosting and publishing the text of the law is one function that I think both liberals and conservatives would agree should be carried out by the government itself. Here is the IRS saying that you cannot rely on them, or any other U.S. government entity, to provide you an electronic copy of the tax law. And that they don't endorse the information that is provided on the LII site either. Caveat lector, and good luck filing your returns.

Cornell does do a terrific job of compiling, parsing and presenting Federal legal information for free to the public (donate to their efforts, if you can!). And it appears that the information on their site is more up-to-date than the IRS warning would suggest. Nonetheless, their information is not fully up-to-date, and more to the point-- they are not responsible to taxpayers for providing this information. I think that Cornell and other similar efforts are best seen as filling in the gaps where government has not risen to the occasion.

But in this case, there is an excellent, updated electronic version of the U.S. Code, published by the office that compiles the Code: the Law Revision Counsel (LRC). As Cornell rightly notes (under an "update" tab here):
If you suspect that our system may be missing something, please double-check with the Office of the Law Revision Counsel.
Yes, I'm biased-- our company, Xcential, helped the LRC convert the Code into well-structured XML, which the House announced last July. But that fact works both ways-- our team decided to take on this project with the House because of a fundamental belief: the official, accurate, electronic sources for our country's laws should be provided by the government.

Simple idea, no? Let's hope someone at the IRS is listening.

[Next up: IRS's internal reliance on Lexis-Nexis for tax law, opinions and analysis.]