Wednesday, June 29, 2011

Tax Law Access in the 21st Century: Guest Post on 21stCenturytaxation blog

Professor Annette Nellen generously invited me to write a guest article on her blog about technological changes that can make a difference for tax law. Nellen is a tax professor and Director of the MS Taxation Program at San Jose State University, and a thought leader on reform of tax law and practice.

Check out the post and leave your comments here:

Monday, June 27, 2011

Quora Post: What are nontechnical barriers to adopting version control for legislation?

I was invited to answer the question above at Quora, which touches on a number of themes on this blog. It generated an interesting discussion, and I reproduce my answer here:

This Venn diagram explains the most fundamental barrier:
If you squint, you might be able to find a couple of intersections, but not many. I think that this is a problem that can be solved largely by providing a clean, obvious, technical solution for lawmakers. To borrow from the Godfather: offer legislators a solution they can't refuse (more below).

But this question asks about the non-technical* barriers, and these are largely inertial. The legal community is unaware of the powerful text-based tools that could make legal work more accessible to the public and more efficient. Meanwhile, there is no "version control" lobby in Congress. So although adding version control would make a tremendous difference to the efficiency of the legal process, few people understand the value that it would bring. I've written about the potential benefits in a couple of specific cases: What can lawmakers learn from computer science? and Open Source Tax Law.

Much of the current system for drafting, publishing and updating U.S. laws is more than two hundred years old, depending on how you count. It is internally consistent (mostly) and is actually quite sensible for organizing legislation into printed books.**

In the case of U.S. Federal legislation, the significant burden of writing, compiling and publishing U.S. laws is divided among three different institutions: the Office of Legislative Counsel of the U.S. House is in charge of formatting and printing legislative drafts and proposed legislation; the Law Revision Counsel of the U.S. House maintains and updates the U.S. Code on a six-year schedule***; and the Government Printing Office is in charge of printing the official version of the U.S. Code. When these roles were originally established, they provided the human resources and quality assurance to maintain an organized body of law. The challenge is to move from this system to one that is suited for an electronic age.

Each of the three institutions works with legislation in a different primary format. Where metadata has been added, e.g. to create an XML or HTML version, the formats are not consistent with each other. This is a technical barrier that will require a non-technical solution (choosing one format and responsible institution over the other). It's a question of awareness and political will.

This year has seen some progress on both counts. Just a couple of months ago, Speaker of the House John Boehner and majority leader Eric Cantor wrote a letter to the Clerk of the House, calling for e-formats for legislation.**** The Sunlight Foundation has been doing great work in pushing for transparency in government, including more consistency in e-formats for legislation.

This is where I think a technical solution (and technical people) can make a difference. We can develop a solution that "just works": showing a redlined version of laws for any bill, accurately showing changes in the U.S. Code as soon as an amendment is enacted, and letting users browse legislative history like the MacOS Time Machine. A non-partisan solution that could save money and increase transparency, all at the push of a button. I still wouldn't underestimate the power of inertia, but having an elegant and simple technical solution close at hand will make it much more likely that legislators will make the change.
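To make the redlining idea concrete, here is a minimal sketch in Python using the standard library's difflib. It is only an illustration, not a description of any existing legislative system: the `redline` function, the `[-...-]` / `{+...+}` markers and the sample sentences are all made up for this example, and the sample text is not actual law.

```python
import difflib


def redline(old_text: str, new_text: str) -> str:
    """Word-level redline: deletions shown as [-...-], insertions as {+...+}."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


# Invented placeholder sentences, not actual statutory text.
current = "The tax imposed shall be 10 percent of taxable income."
amended = "The tax imposed shall be 12 percent of taxable income."
print(redline(current, amended))
# The tax imposed shall be [-10-] {+12+} percent of taxable income.
```

A real system would need to work on structured sections rather than flat strings, but the core comparison is the same kind of diff that version control tools already perform on source code.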

*By "technical" I assume the question refers to the algorithm that would actually be used to implement version control, and "non-technical", I assume, means the political or historical resistance to change.

**Legislators, and the legal community as a whole, have yet to make the transition from print-centered formatting to electronic. Legal documents--even if originated and consumed electronically--are still formatted as if destined primarily for print.

***The U.S. Code is a compilation of U.S. Federal laws into 50 Titles, divided by subject area.

****I highlight this letter, and some of the technical challenges to converting legislation into version-control-friendly formats, on my blog:

Insert mode

Thursday, June 23, 2011

California Laws Android App in 5 Minutes

I've wrapped the California Laws website in an Android application for even faster access from your mobile phone.

[UPDATE: I have shut down the live CA Laws demo website; provides the internal hyperlinks that I had built into my site, and is kept up-to-date. The Android app is also not working now.] Download the new California Laws app here for free, test it out and let me know what you think.  To install, you need to download directly to your Android device and open from the System tray.


In theory, going from a mobile-friendly website to a web application for Android, iPhone or iPad should be relatively simple. And in practice, it now is, within limits. I used a new web-based service to create this "version 1" California codes Android application. The website that I used to make this app is a bit slower than I'd like for normal page rendering, but otherwise they offer an impressive service. I have no relation to Appgeyser, but this looks like the fastest way to go from website to application, and is a good way to test how your site would look as an application.

A few downsides which can be cured in future versions:
- Tables of contents require scrolling across the screen
- Appgeyser puts an ad at the bottom of the application for their service

Wednesday, June 22, 2011

California Laws Go Mobile, With Headings

You can now browse California legislation from your mobile device. This is an extension of the work, described in these posts, to parse and display California's laws for more user-friendly navigation. I implemented this as a simple web application, using mobile-targeted style sheets, not (yet) a native application. On the devices I've tested, though, it's fast and convenient. Let me know how it works on your iPad, iPhone, Android, Blackberry, or other mobile toy device.

I also implemented an idea by Jason Wilson, and seconded (or at least retweeted) by Robert Richards, to add headings to California law sections, to help provide context. Wilson, of Jones McClure, a legal publisher, has given a lot of thought to legal technology and has many interesting ideas on how to make legal technology better. His suggestion on the California Laws site is just the kind of exchange I was hoping to generate. If you have ideas or suggestions to improve navigation of California's laws (or the Internal Revenue Code), let me know on Twitter (@arihersh or @tabulaw) or in comments below.
Or make the changes yourself and send me a pull request on GitHub, where I've put the code for the California Law website. In some ways, using online collaborative technologies like GitHub brings things full circle for law: lawyers have been doing "open source" collaboration for millennia, taking branches and merging Biblical laws, Hammurabi's Code, the Roman Code and others to create new laws. I'd love to see Jones McClure and other legal publishers join in an effort to provide a truly open source repository of primary laws and court opinions, upon which secondary content and proprietary analysis tools can be built.

In future posts I will flesh out details of how this could work, in the context of California law and in open sourcing the Internal Revenue Code.

Friday, June 17, 2011

Open Source the Tax Code

This week, the U.S. government released a major update to the online version of the Tax Code. For some reason this didn't make headlines.
Here, and in future posts, I will discuss why the text of the Tax Code needs to be "open sourced", and how we're approaching this challenge at Tabulaw. The work to introduce structural metadata to California's laws (ongoing, and now open sourced on GitHub) was a warm-up for this discussion.

This week's update of the Tax Code illustrates the challenge ahead: The update, by the Law Revision Counsel, a small, dedicated office of Congress, incorporates all of the changes that Congress has made to the Code from 2006 through the end of 2010. The Internal Revenue Code is the Federal law that arguably has the greatest impact on the lives of most Americans. And for the time being, the public has an up-to-date version* of this law.

The impact of the nearly 10,000 sections of this law is one reason that President Obama emphasized the need to simplify the Tax Code in his State of the Union Address, saying, "It makes no sense, and it has to change."

Yet this lack of clarity is itself a major impediment to change. If and when Congress takes up the battle over what the tax code should say, we will need as much clarity as possible about what current tax law actually says. The effort will raise a fierce debate about important issues of tax policy, fairness and the future of this country. However, these issues become clouded by a mire of laws, regulations and guidance that even leading experts (and the IRS) struggle to understand and explain. Technology cannot cut through all of the fog, but there are non-partisan, technical solutions that can help make the task easier. An Open Government bill introduced today by Representative Darrell Issa (R) includes an important open data provision that would impact IRS (and other) agency rulings and guidance. I believe that open-sourcing the law itself is a natural corollary to this bill.

By open sourcing, I mean to:
  1. Introduce meaningful metadata into the text.
  2. Parse or draft new tax-related bills so that they can be:
    1. instantly compared to existing laws and, when passed,
    2. used to immediately update a public, online version of the new law.
  3. Create a platform that experts and professionals can use to research, debate and explain the law.
The first two principles are essentially a subset of common-sense "open data" principles such as these from the Sunlight Foundation or these from the initiative. The third is a focus of our work at Tabulaw to improve online tools for legal professionals (more on this soon). We are at an exciting time for initiatives to reinvent participation in government (e.g. PopVox, OpenCongress, Sunlight Foundation's OpenStates etc.). I believe that, especially with respect to the Internal Revenue Code, there is much groundwork that needs to be done by the professional tax community--in clarifying, explaining and simplifying the Code--in order to make public participation in the policy-making process more meaningful.
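As a rough illustration of the first two principles, here is a minimal Python sketch of what "text with metadata" and "a bill that can immediately update the law" might look like. Everything in it is an assumption made for the example: the `Section` and `Amendment` schemas, the `apply_amendment` helper, the section number "9999", the bill name and the sample text are all hypothetical and do not reflect any official format or actual law.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Section:
    """One section of a code, with structural metadata (hypothetical schema)."""
    title: int
    number: str
    heading: str
    text: str
    history: List[str] = field(default_factory=list)  # earlier versions of the text


@dataclass
class Amendment:
    """A machine-readable 'strike X and insert Y' instruction (hypothetical schema)."""
    section: str  # number of the section to amend
    strike: str   # exact text to remove
    insert: str   # text to put in its place
    source: str   # identifier of the amending bill


def apply_amendment(code: Dict[str, Section], amendment: Amendment) -> Section:
    """Apply one amendment in place, keeping the prior text in the section history."""
    section = code[amendment.section]
    if amendment.strike not in section.text:
        raise ValueError(f"text to strike not found in section {amendment.section}")
    section.history.append(section.text)
    section.text = section.text.replace(amendment.strike, amendment.insert, 1)
    return section


# Invented placeholder content, not actual law.
code = {
    "9999": Section(title=26, number="9999", heading="Example rate",
                    text="The rate shall be 10 percent."),
}
bill = Amendment(section="9999", strike="10 percent", insert="12 percent",
                 source="Example Bill 1")
updated = apply_amendment(code, bill)
print(updated.text)     # The rate shall be 12 percent.
print(updated.history)  # ['The rate shall be 10 percent.']
```

Because the amendment is structured data rather than prose instructions, the same record can drive both the comparison against existing law and the update of the public version once the bill passes.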

*The LRC version is up-to-date through January 2011.

Thursday, June 9, 2011

Free Advice to Congress: 5 Better Uses for the Internet

As the Weiner scandal reminds us, Congress doesn't quite have the hang of this internet thing. So I take the liberty here to provide 5 suggestions of better things Congressmen could be doing with their access to the web and our tax dollars:

Monday, June 6, 2011

California Laws: Now with search

I've made some improvements to the California Laws site, which has all of California's legal codes with internal links for easy navigation of the laws. It now also has a fast search engine, powered by Sphinx.

Know anyone who works with California state law? Pass this on to them. Anyone in the legislature? They might want to replace the aging

Wednesday, June 1, 2011

Cleaning Up California Law: Errors in online sections

I found more than a thousand errors, in the form of repeated sections, in the course of parsing the online version of California's legal codes. At first, I thought there might be something wrong with my parsing algorithms -- I had, indeed, gone through a number of rounds of bug-fixing. These repeated sections were carried over to the site I've published. Having parsed the sections, it would take just a few minutes to clean up the duplicates, but just to make sure, I looked back at the California legislature's website.

When I looked at the original data on the California legislature's website, I saw the sections repeated verbatim. I've collected the 1,368 repeated sections (about 2%), and most look like errors in California's original conversion from print to electronic document.
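For readers who want to try something similar, here is a minimal Python sketch of one way to flag repeated sections once the text has been parsed. It is not the code I used on the site: the `find_repeated_sections` function, the section identifiers and the choice to hash whitespace-normalized text are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def find_repeated_sections(sections):
    """Group section ids whose text is identical after collapsing whitespace.

    `sections` is an iterable of (section_id, text) pairs.
    Returns a list of id groups, one per repeated text.
    """
    groups = defaultdict(list)
    for section_id, text in sections:
        normalized = " ".join(text.split())  # ignore line breaks and extra spaces
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(section_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical identifiers; the text is only sample input.
sample = [
    ("A-1", "The writ of mandamus may be denominated a writ of mandate."),
    ("A-2", "The writ of mandamus\nmay be denominated   a writ of mandate."),
    ("A-3", "A different section."),
]
print(find_repeated_sections(sample))  # [['A-1', 'A-2']]
```

Hashing keeps the comparison cheap even across tens of thousands of sections, at the cost of only catching text that is identical apart from whitespace.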

Want to see for yourself? Check out these sample sections:

There were also printer's errors that apparently crept in during the conversion from print to electronic format. For example:

Ý1084.] Section Ten Hundred and Eighty-four. The writ of mandamus

may be denominated a writ of mandate.

Do any of these errors cause confusion about what the law is? Maybe not, but they make navigating the law that much more confusing. With almost all legal research now being done electronically, I think it's reasonable to expect official government electronic sources that can be relied upon.