Monday, September 17, 2012

Guest Post at VoxPopuLII

Grant Vergottini and I were given the opportunity to write about our exploits with legal hackathons and legislative editors over at the VoxPopuLII blog.  Do go over there and disagree with us in the comments.  We welcome a heated discussion.

Thursday, July 19, 2012

Meeting at US House on Congressional Legislation XML Standards

Tomorrow (July 20), there is going to be an "industry day" to discuss the U.S. House modernization project.  This includes projects from both the Office of Legislative Counsel (responsible for bill drafting) and the Law Revision Counsel (responsible for codifying legislation into the U.S. Code).  The project specifications describe goals for data encoding of U.S. legislation that point toward a truly modern system-- in which references in legislation can be identified and easily hyperlinked, and in which it might be possible to get an up-to-date version of a law without having to manually trace the entire amendment history.

As I've mentioned before, technology alone is not enough.  Congress needs to take at least two steps in the way it writes bills, in order to make legislation more amenable to automated updates and analysis:

1. Move ahead with positive law codification: the Law Revision Counsel has prepared a number of Titles of U.S. law for passage into positive law.  This would mean replacing the cobwebs of hundreds of overlapping laws and amendments with a single, authoritative text.  This is never an urgent matter for Congress, but it does represent a very important and long overdue Housecleaning. For each Title that is passed into positive law, Congress now has a single base from which to work when making amendments.

2. Write new bills with full section-by-section text replacements.  Currently, bills may change a single word, a sentence or an entire passage.  The amendment is often described in words ("change the last word of the fourth sentence...").  Replacing entire sections makes automated redlining much more practical and effective, meaning that we could all more easily see what effect any new bill has on the existing legislation.

Of course, there is much more that could be said about each of these points, and about the goals that the Legislative Counsel and Law Revision Counsel have set out.  I am very encouraged to see the political will gathering to make technological changes to the way bills are drafted, and I just hope that Congress can also make the bureaucratic changes necessary to support this new technology.

Friday, June 29, 2012

A Big Step Forward for Legislative Data Standards

Legislative transparency is catching on.  Exhibit 1:, a site to gather tools and propel legislatures toward a more open, participatory model. The site launch was sparked by a conference hosted by the National Democratic Institute (NDI), Sunlight Foundation and the Latin American Network for Legislative Transparency (LANLT) last month in Washington. The site links to the Declaration on Parliamentary Openness, which, among other provisions, calls for release of data in open, structured and non-proprietary formats and for allowing bulk download of data.

As the political movement for open data grows, the legislative unhackathon we held last month showed one of the tools that can be used to support it.  The legislative XML editor (AKN-editor) that we used for the event was initially built as an educational tool - to give users first-hand experience with the Akoma Ntoso XML standard for legislation.  You can see the editor in action in this 10 minute tutorial video that was prepared for the unhackathon.

The editor will now be used to teach Akoma Ntoso in the Lex2012 summer course in Ravenna, Italy. In addition, Professor Monica Palmieri, a leading expert in legislative data standards and a participant in the unhackathon, has also shown the AKN-editor to parliamentary staff in Brazil.

The Latin American Network for Legislative Transparency, a co-sponsor of the  site, has also been very active in educating legislatures about the value of adopting XML standards for legislation, and we are hopeful that the AKN-Editor can help make the transition to data standards more user friendly.  In fact, Chile has recently called for proposals for an XML based editor for their legislative drafting (full disclosure-- I was part of a bid submitted for this RFP), and I can envision more parliaments in Latin America moving down this path, which could ultimately result in a platform for legislation that makes access easier and improves collaboration between nations.

Saturday, May 19, 2012

International Legislative Unhackathon and G+ Hangout

State Senator Leland Yee made a personal appearance. (He's a sponsor of SB1002 in California, an open data bill.) Jim Harper and Francis Avila of the Cato Institute joined us.  Professor David Jung, Director of UC Hastings' Center for State and Local Government Law worked on marking up a local ordinance. Tanzania, Italy and Japan were represented as were Washington, D.C. and Virginia. Rob Richards, as always, kept us all in communication.  Terrific organization and inspiration by Charles Belle (UC Hastings), Pieter Gunst (Stanford) and Karen Suhaka (Denver).

We were also joined by a number of curious (bored?) men from Middle Eastern countries, apparently looking to chat with English speakers. Some stayed to listen.

Participants everywhere were able to learn about the emerging data standards for legislation (Akoma Ntoso), thanks to videos by Monica Palmieri and Grant Vergottini, and tried their hands at marking up legislation from California and a number of other jurisdictions.  We got some great feedback, which we're posting at, and had some very productive discussions about Grant's HTML5 legislative editor, the AKN standard, and the benefits of crowdsourcing vs. having a dedicated (paid) team to mark up legislation.

The event was also an amazing lesson for me on the power of technology to engage the "long tail".  After all, how many people are interested enough in legislative metadata to spend a beautiful Saturday in May hacking on it? There were 15-20 people each at Hastings and Stanford.  The LegiNation group was in Denver, and through the Google+ hangout, Twitter feed and editor on, many more people could participate virtually and, aside from the introductory presentations, asynchronously.

I am particularly curious about how the experience was for virtual participants-- did you feel you could join in when and how you wanted?  Was the Google+ experience satisfactory?  What would you have liked to see more of? will be hosting a series of future events on structured legislation and we'd love to hear what you liked, what you didn't like and what you hope to see in future events.

Thursday, May 17, 2012

International Legislation Hackathon: Remote Participants

The International Legislation Unhackathon will be held this Saturday, May 19. It's not too late to sign up for the event at UC Hastings, Denver or Stanford. If you are not in the San Francisco Bay Area or Denver, there are a number of ways to keep track of the events, and to pitch in:

  • Twitter hashtag: #legalhacks
  • Google+ Hangout --- link will be posted on Saturday morning around 11am PST on
  • View the tutorial video at:
  • Also check for other videos of "Ignite" talks on legislative mark-up
  •  Find a law or bill text, and mark it up using the AKN editor: 
  • Publish your marked-up bill on, and tweet out the link along with the #legalhacks hashtag. 
Have questions? Tweet them out with #legalhacks, or write to grant.vergottini at or arihershowitz at

Sunday, April 22, 2012

Legislative Standards and the International Legislation Unhackathon

The International Legislation Unhackathon is being held May 19 at UC Hastings and Stanford Law Schools. Sign up, if you haven't already, at It is free, and lunch will be provided, thanks to the UC Hastings Center for State and Local Government.

The event is designed to be accessible for non-programmers and non-lawyers (hence an 'un'hackathon) who will 'get their hands dirty' adding metadata to actual legislation, using a developing international standard for legislative data, Akoma Ntoso. Future (and previous) posts will discuss such questions as Why Metadata in Legislation? and Why should legislatures use XML standards. You could get started by reading this excellent post by Andrew Mandelbaum of the National Democratic Institute.

Assuming that you agree that metadata and standards for legislation are a good thing, there are still questions of implementation: (1) At a technical level (does the proposed standard actually match the structure of real legislation 'in the wild'; is it workable, etc.), and (2) At the practical level (will legislatures actually adopt the standard, or can the private sector add the metadata post-facto to legislation?).

This unhackathon will be an experiment in both of these elements of implementation. Grant is developing a browser-based tool to easily add Akoma Ntoso metadata to legislation. The idea is to lower the barrier for anyone to just try it out. It should take no more than 5 minutes to learn how to add the data fields to legislation. Then the real test-- how well does the data model fit the actual data of laws? Can it be extended easily, for example to accommodate the requirements of the DATA Act?

As anyone who has worked in the web world knows, HTML has been an evolving standard, applied differently by different browsers over many years. It has undergone testing in the real world on billions of webpages with millions of authors. Data standards for legislation are, by comparison, much newer and have a much smaller audience. I am hopeful that participants in this unhackathon, including myself, will come away with a better understanding of what data models in legislation can do. And I also expect that the learning will go both ways-- that the developing standard of Akoma Ntoso can be refined through exposure to events like this one and as more legislatures begin to test and ultimately adopt the standard for drafting legislation.

Look for more about the event, our goals, and legislative data from Charles Belle of UC Hastings, Pieter Gunst of Stanford and Lawgives, and Grant Vergottini, who are co-organizing.

Sunday, March 25, 2012

Lead-up to ABA Techshow

Last week started with an article by Christina Farr in VentureBeat on a new generation of legal technology start-ups, including Tabulaw. For all the recent coverage given to law and law school in the New York Times and other mainstream media publications, there has been comparatively little on legal technologies. In part, that may be due to the fact that a clear inflection point--where the pace of innovation visibly accelerates--has been slow to materialize in law. Indeed, the author did not sugar coat the difficult reality of start-ups in this space. But I believe that the article reached an important audience of engineers, tech entrepreneurs and investors, who can help this inflection take place. And I have been fortunate, as a result of the article, to be able to speak with some of the trailblazers in legal tech, including Rich Granat, who started the network, and Donna Seyle, founder of Law Practice Strategy. Both will be presenting at the ABA Techshow in Chicago. And though I will miss that event, thanks to these recent contacts and the growing legal tech community in the Bay Area, I feel that I've gotten a mini version of the Techshow experience out West.

Saturday, March 17, 2012

Legislative Transparency and the Open Government Partnership

The elements are aligning for a major transformation in how governments publish laws. Open data and open government go hand-in-hand and one significant benefit of this pairing is that the accelerated pace of technology can pull open government forward faster than it would otherwise move.

Earlier this week, President Obama and the UK Prime Minister issued a press release that highlighted a number of joint initiatives of the two countries. Among them is the Open Government Partnership, promoting transparency initiatives around the world. Transparency means many things to many people, but at the core, it is about improving citizen access to information about the actions and workings of government. I am encouraged to see that the U.S. commitments to the Open Government Partnership are headed by two items that are directly related to making legal data more accessible to citizens: Promote Public Participation in Government and Modernize Management of Government Records. These two items could make a big difference, at a relatively low cost, for many of the other participant countries in the OGP, and improve transparency for all countries.

Transparency in the 21st century is almost synonymous with being web accessible. Making laws web accessible, in a standard, structured data format means that laws are not only accessible to a wider population of citizens within a country, but means that citizens around the world can compare their own laws to those of other countries, on the subjects of Freedom of Information, anti-Corruption, environmental protection and a hundred other dimensions of an open society. This can create a race to the top and pressure on laggards to bring up their standards in these areas.

Friday, March 9, 2012

A Glimpse of Linked Legal Data: U.S. Named Statutes

At, we are trying to push the envelope of digitization of primary tax law. Early on, we worked to convert references in documents to other parts of the law, regulations, etc. This gets us closer to a digital linked data set, but anyone who has looked at legal citations can tell you that this is a never-ending task. One of the challenges is to convert the references that are made to individual statutes, as opposed to the U.S. Code. So, for example, references may be made to the Olympic Commemorative Coin Act, but where has that been classified in the U.S. Code? The Law Revision Counsel maintains an amazing Popular Names table that points to the classification of named statutes, where possible. (See here.) However, that table is in pdf format and, as far as I could find, there is no spreadsheet or database representation connecting a statute with its place in the code. So Serge Ulitin, the talented data whiz who I work with, took on this task and has converted the Popular Names table to digital form: This provides a listing of the statutes and, where possible, the source in the U.S. Code has been linked. This allows us to capture many more of the references to these "Named Acts" in the Code itself. The conversion is not perfect, and a lot of clean up is still needed, but this is one more of the steps that is needed to fulfill the promise of digital law I spoke of in the last post. I welcome your thoughts and comments.
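
To give a flavor of what "converting references" involves, here is a minimal Python sketch that finds U.S. Code citations in a passage and wraps them in links. It is not the code we actually run: the regular expression covers only the simplest citation shapes, and the `url_template` default (an example.org address) and both function names are made up for illustration.

```python
import re

# Matches simple citations such as "42 U.S.C. 1395w-4" or "26 U.S.C. § 501".
# Real citations come in many more shapes; this pattern is an illustrative assumption.
USC_CITATION = re.compile(
    r"(?P<title>\d+)\s+U\.S\.C\.\s+(?:§\s*)?"
    r"(?P<section>\d+[A-Za-z]*(?:[-\u2013]\d+[A-Za-z]*)?)"
)


def find_usc_citations(text):
    """Return (title, section) pairs for each U.S. Code citation found."""
    return [
        (int(m.group("title")), m.group("section"))
        for m in USC_CITATION.finditer(text)
    ]


def link_usc_citations(text, url_template="https://example.org/uscode/{title}/{section}"):
    """Wrap each citation in an HTML link. The URL scheme is hypothetical."""
    def to_link(match):
        url = url_template.format(
            title=match.group("title"), section=match.group("section")
        )
        return f'<a href="{url}">{match.group(0)}</a>'

    return USC_CITATION.sub(to_link, text)


if __name__ == "__main__":
    sample = "Section 1848(d) of the Social Security Act (42 U.S.C. 1395w-4(d)) is amended"
    print(find_usc_citations(sample))
    print(link_usc_citations("see 26 U.S.C. 501."))
```

Named statutes are the harder case: a phrase like "Olympic Commemorative Coin Act" carries no title or section number, which is why a lookup table such as the Popular Names table is needed before a link can be built.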

Wednesday, March 7, 2012

Who is the Steve Jobs of Digital Legislation?

With the announcement of the new iPad 3 (including in this hilarious Onion article: This Article Generating Thousands of Dollars in Ad Revenue), it's a good time to think about the tremendous gap between the promise of digitization and the widespread adoption of digital technology for different media.

A recurring theme for technically-minded people who start to think about how law and legislation work is: why can't the law be more like computer code? By this, they mean, why can't we use the same tools (version control, integrated development environments, compiling, testing) to work with legal code? This thought has been reflected in many recent conversations I've had with programmers about the law, and by the popularity, among a largely technical audience, of my answer to this Quora question:

For people who work with digitized data in other fields, the state of tools in law is puzzling.  The current state of legal data is similar to that of music before the iPod.  At the time, CDs had been around for a number of years. Anyone who had "ripped" a CD or otherwise moved music on and off of a desktop computer's hard drive had, at some point, thought about what it would be like to skip the CD altogether.  I, myself, had a 100 CD player, and thought often about how nice it would be to compress that into a (relatively) small hard drive.  In fact, digital music players did exist, but they were clunky. Getting mp3s on and off of them was cumbersome.  And most people professed being quite happy with their large CD libraries.  Then came the iPod.

The evolution of digital cameras tells a similar story.  And it is inevitable that digital law will follow that path, sooner or later.  There are lurches toward building digital toolsets in various jurisdictions: e-discovery, e-filing, e-compliance...

But these all have to contend with the basic lack of data structure in underlying legal data.  What can be done with the data is therefore severely limited.  But it is not far-fetched to imagine that laws around the world will soon be tagged with a common data format, taking us one step closer to having an iPod for law.

Grant Vergottini and I are hatching a plan for a hackathon, following up on the California Law Hackathon, to mark up sample legislation around the world in a standard XML format.  Grant has done this already for the U.S. Code, California legislation, Hong Kong's Basic Law (essentially their Constitution) among others.

Although we have not yet "officially" announced this event, response to the idea has already been extremely encouraging, thanks in large part to Robert Richards' great outreach.  We are already getting hints of legislatures around the world that may be ready to make the leap to linked digital data (like the UK has largely done, under John Sheridan's leadership). So there may not be a Steve Jobs of legislation (yet), but I believe that there are visionaries in legislatures worldwide who, together, can make this happen.

Friday, February 24, 2012

IRS and Tech: Cudgel or Lever?

The National Taxpayer Advocate, Nina Olson, has a posse.  And for good reason. I attended the American Bar Association Tax Section mid-year meeting in San Diego last week and had the opportunity to hear Olson speak.  She wields statistics, legal provisions and specific taxpayer examples to show how the IRS has steadily de-personalized tax administration.  This has dramatically increased inequities for small business owners and lower income taxpayers.  Automated systems have replaced individual judgement and human contact throughout the IRS, while tax laws, regulations and guidance become more impenetrable.

The problem with technology at the IRS, according to Olson, is emphasis.  The relatively small technology budget has been primarily applied to enforcement and in creating a distance between taxpayers and individuals.  I asked about technologies to *improve* customer service, and Olson said that her office was starting to take some steps in that direction (e.g. video conferencing with taxpayers at remote Taxpayer Advocate offices), and that more needs to be done.  

She suggested putting together a conference of technologists and tax experts to discuss ways that technology could be used on the taxpayer's behalf, and I think that there are a host of creative and capable consumer-facing companies in the Silicon Valley and Bay Area that could take up this task.

What about an online tax dispute system that reduces the barriers for a taxpayer challenging an assessment? The IRS has a list of FAQs on its site, but what about a more comprehensive Q&A site, with technologies like Quora or to identify the relevant taxpayer issue?  These are just the tip of the iceberg, and I am confident that there are dozens of other technologies that could help cut through the complexity of tax law, if civic hackers and consumer internet entrepreneurs set their minds to the task.

What do you think? Olson also now has a blog.  If you have ideas for technology and tax, you can let her know there, or post your comments here.

Thursday, February 16, 2012

US Code in Standardized XML

Grant Vergottini has done it again.  He has taken the XML of the U.S. Code, published by the Law Revision Counsel, and converted it to a format (obscurely) called Akoma Ntoso, which is growing to be the basis for an international standard for legislation. (See his post here.)

Standards for their own sake have little meaning.  What we'd like is a standard that would allow easy sharing and comparison of legislative information from various jurisdictions, while being flexible enough to integrate the kinds of metadata that Jim Harper of the Cato Institute has called for.  By focusing on core structural elements, Grant has shown that translation between the different data formats is not only possible, but can be relatively straightforward. The U.S. legislative process is unique, as some experts at the House conference on legislative data pointed out.

True enough: every legislative process is, in some ways, unique. But there is enough overlap that a robust standard is possible.  We're still far off from having a "computable" body of legislation, but this is a major step forward for making the code machine readable.
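
To make "core structural elements" a little more concrete, here is a small Python sketch that builds one section of a statute as XML. The element names (`section`, `num`, `heading`, `content`, `p`) follow Akoma Ntoso naming as I understand it, but this is a simplification and not Grant's conversion: it leaves out the namespace and the metadata block, the `id` attribute format is my own assumption, and the output is not meant to validate against the schema.

```python
import xml.etree.ElementTree as ET


def build_section(num, heading, paragraphs):
    """Build a simplified, Akoma Ntoso-style <section> element.

    Only the core structure is modeled; the namespace and metadata
    that a real document needs are deliberately omitted.
    """
    # The "sec_<num>" id format is a hypothetical choice for this sketch.
    section = ET.Element("section", {"id": f"sec_{num}"})
    ET.SubElement(section, "num").text = f"Section {num}."
    ET.SubElement(section, "heading").text = heading
    content = ET.SubElement(section, "content")
    for text in paragraphs:
        ET.SubElement(content, "p").text = text
    return section


if __name__ == "__main__":
    element = build_section("501", "Exemption", ["Text of the section."])
    print(ET.tostring(element, encoding="unicode"))
```

The point of the sketch is that number, heading and content are shared by nearly every jurisdiction's legislation, so a translation that targets only those elements has a good chance of carrying over from one format to another.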

Wednesday, February 15, 2012

Convert PDF to Text, HTML, Word...

I've put together a small demonstration site to convert pdfs to clean html: you can try it out here.  There are many caveats that go along with this (e.g. the current server is not very stable, it only works with javascript enabled browsers, only 5 documents at a time, limited size on each document, no OCR, etc.).  But I thought I'd get it out there for all the legal data fans to try it out and get a conversation started about data encoding. Do you have a favorite way of getting text out of pdfs?

PDF documents are the only available starting point for a lot of government legal information.  I've discussed some of the problems with this before, and suffice it to say that this is a recurring problem in legal informatics.  To extract useful metadata, and to make the documents web-accessible, it is usually necessary to convert the PDF to a more portable format. The devil is in the details.

While there are many programs available that make the conversion from pdf to text, html or MS Word, there are many trade-offs, the biggest of which is to preserve layout or to make it easier to extract metadata.  Most of the converters to html that I have found, for example, include a huge number of extra tags that clutter up the text, break up sentences and paragraphs and generally make it very hard to extract meaningful metadata from the document.

I've combined a couple of open source programs (pdf2text -> txt2html) and an open source tool to upload documents, to make this small site.  If you find it useful, or need to convert large volumes of pdf documents to clean html, get in touch.
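
For the curious, the second half of that pipeline (plain text to clean html) can be approximated in a few lines of Python. This is a stand-in written for illustration, not the txt2html program the site uses, and it assumes the text has already been extracted from the pdf. It emits one `<p>` per paragraph and nothing else, which is the "clean" part.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Turn extracted plain text into minimal HTML: one <p> per paragraph."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Re-join hard-wrapped lines so sentences are not broken up.
        joined = " ".join(
            line.strip() for line in block.splitlines() if line.strip()
        )
        if joined:
            # Escape &, < and > so the text cannot be mistaken for markup.
            paragraphs.append(f"<p>{html.escape(joined)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    extracted = "Sec. 1.\nShort title.\n\nA & B"
    print(text_to_html(extracted))
```

Because the output has no layout tags at all, sentences and paragraphs stay intact, which makes later metadata extraction much easier. The trade-off is the one described above: any layout information in the original pdf is lost.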

Thursday, February 2, 2012

Roundup: House Legislative Data and Transparency

Many kudos to those who put together today's House Legislative Data & Transparency conference. I was impressed with the high-level and high-quality line-up of speakers and participants, and very grateful to the Committee on House Administration, which provided a livestreaming feed, and to all the Tweeters in the room and around the country who helped fill in the blanks (search: #ldtc).

The Conference provided a clear picture of what the current state of play is with legislative data, and some very clear recommendations from the audience and some participants about where things should move. What is needed now is a commitment to make those improvements.

John Wonderlich (also at Sunlight) expressed <understatement>disappointment</understatement> in the government's lack of commitment, after many years of requests, to providing bulk data.  I agree, though there are some bright spots: I have been pleased with the bulk data being provided by the LRC for the U.S. Code Prelim, and the regular url scheme at, which is not far off from providing bulk data.

One of the most underappreciated statements of the day came from the Law Revision Counsel.  On a question, I believe, related to authentication, the LRC highlighted the importance of positive law codification.  

I don't think most people realize: there is no single, authoritative publication of Federal statutory law.  The printed version of the U.S. Code is six years out of date. The online USC Prelim is up to date, for now. But neither one is the current law of the United States.

The Conference opened with a strong showing of bipartisanship.  That is exactly what is needed to move codification legislation forward.

XML Standard from Bill to Code: Legislative Recommendation #4

The fourth of my structural recommendations for the U.S. House conference on legislative data and transparency (being held now), is to establish a consistent XML standard from publication of a bill to incorporation in the Code.

I'll keep this one short, since others, particularly Jim Harper of the Cato Institute, have described in great detail what should go into this standard and why it is important.

Here, I want to focus on the importance of having a single XML standard from the first drafting of a bill to its codification.  Lest you think this is already being done, or is an easy task to accomplish, Alex Howard (@digiphile) of O'Reilly Media has posted a helpful flowchart [and here] of the various offices that are involved in the first part of the legislative process (until the bill becomes law and is published by the GPO).  The process of codification takes place after that.

A key element of any XML standard for legislation is that it be consistent throughout this process, as it passes from the jurisdiction of one office to another.

Wednesday, February 1, 2012

Positive Law Codification: Legislative Recommendation #3

Positive law codification is probably the most under-appreciated facet of legislative transparency.  It is hard work, and requires fighting against the entropy of legislative history. But ultimately, any effort to create more accessible legislation will be limited without positive law codification.

The best description comes from the Office of Law Revision Counsel (LRC), here. There are many minutiae of the process that I don't know or understand, so my discussion here will necessarily be an approximation.  I welcome any corrections in the comments.

The LRC is charged with codifying Federal statutes, which is essentially organizing them into the Titles of the U.S. Code.  However, unless Congress passes a Title as law, and replaces the various laws which make up the Title, the Title will live in a parallel universe from the laws that Congress actually passed.  So relying on text in the Title alone can often lead to trouble. The whole Code is revised on a 6-year schedule, so some sections can be as much as 6 years out of date.  

The LRC is moving ever faster, and has started to release a "USCPrelim" version, which updates Titles on a faster cycle.  However, as long as (1) the Code is not positive law, and (2) changes are not made in a consistent manner, this codification process will continue to require a great deal of manual work and artistry.

At the same time, the LRC has taken up a number of projects to ask Congress to pass certain Titles into "positive law", so that the text of that Title in the Code is the law.  There are currently 8 Titles listed on the LRC's website that are, it appears, ready for Congressional action.

Congress could make tremendous progress toward legislative transparency by prioritizing positive law codification, and committing to completion of the process by a certain date (2014?).  

Now is a terrific time to start, for a number of reasons:
  1. The 6 year cycle completed in January.  So the Code is almost completely "up to date".
  2. Legislative gridlock on other issues creates a space for passing legislation of this technical nature that has few policy implications, but could offer great gains in efficiency and transparency.
  3. Data technologies have advanced to the point that the process of codification can be accelerated. Quality and completeness could be ensured by a number of automated, as well as manual tests. And the benefits of codification would be immediately visible, in the ability to update the Code in real time, just as is currently being done for the Code of Federal Regulations.

This is an exciting time for legislative data, and the House can make changes now, for a relatively modest investment, that will yield benefits for years to come.

Tuesday, January 31, 2012

Cato Institute: Legislative Transparency and Data Model

An email to the Sunlight Foundation's OpenGov listserve from Jim Harper of the Cato Institute points to his blog post, chock full of links and information, about the upcoming House conference on legislative transparency.  Among the references are a number of quite detailed and on-target recommendations that Cato is making for the drafting process and content.

In particular, Cato proposes a legislative data model that would include a great deal of useful metadata at every level of a bill.  The model is quite similar to the California legislative model described at, parts of which I worked with in the California Law Hackathon. Although the model requires adding a lot of different kinds of metadata to the text,  all of that data is easily available when a bill is being written (e.g. the bill sponsor). It is much more difficult to extract by parsing after the fact. As I've discussed in earlier posts, a richly marked-up legislative text would be very valuable, and goes hand-in-hand with the recommendations I make to make the text itself more amenable to automated analysis.

Monday, January 30, 2012

Make Change Consistent: Legislative Recommendation #2

In preparation for the U.S. House conference on data and transparency, I'm making four basic recommendations that, together, would create a framework for a more efficient electronic representation of U.S. Federal statutes.

The attention from the House on data transparency, combined with current partisan gridlock on issues of policy, makes this a perfect time to "reboot" Federal legislation for the 21st Century.  The reboot comes in two mutually reinforcing parts. The first is the subject of this blogpost: make change to existing law in a consistent manner. The second is to push ahead with codification of existing law, so that future legislation can be built on a cleaner, more consistent and accessible platform. That is the topic of my next post.

Current legislation is riddled with language that describes the changes that should be made to the law.  Take as an example, the Health Care Reform Act (HR 1692) which I've also referred to in my Quora answer:
Section 1848(d) of the Social Security Act (42 U.S.C. 1395w–4(d)) is amended—
(1) in paragraph (10), in the heading, by striking ‘‘PORTION’’
and inserting ‘‘JANUARY THROUGH MAY ’’; and
(2) by adding at the end the following new paragraph...
Instead of this word-by-word description of the edits, a wholesale replacement should be made, preferably at the section level, changing the old section for the new one.  In California, for example, this is done with language like this-- "Section [53395.1] of the [Government Code] is amended to read:"

If you are co-authoring a memo, Congress's writing approach is the equivalent of describing edits to your co-author in the text of an email ("In sentence three, take out the first two words...").  What I am recommending, and what California does, essentially, is redlining.  Make changes in a consistent way, and replace entire sections with any amendment.
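
Once a bill supplies the full new text of a section, producing the redline is something a computer can do. Here is a minimal Python sketch using the standard library's `difflib`; the bracket notation for deletions and insertions, and the sample sentence in the usage example, are my own inventions for illustration.

```python
import difflib


def redline(old_section: str, new_section: str) -> str:
    """Mark word-level deletions as [-...-] and insertions as {+...+}."""
    old_words = old_section.split()
    new_words = new_section.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(" ".join(old_words[i1:i2]))
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    old = "PORTION of the payment year"
    new = "JANUARY THROUGH MAY of the payment year"
    print(redline(old, new))
```

Note that the function needs only the old text and the new text. With the "strike and insert" style of drafting, a program would first have to parse and apply the English-language instructions before it could compare anything.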

I understand, and have heard many of the reasons why this kind of change is not easy.  Tradition and bureaucratic inertia play a large part in how Congressional language is currently crafted.  Writing by committee is already difficult.  Imagine the challenge of writing by various committees, hundreds of members, two chambers with, to put it mildly, some disagreements in priorities.

There are many reasons to think, however, that this technical change to the drafting form will be welcomed on Capitol Hill. It will provide more clarity, not only for the public, but for Congressional offices themselves, about what impact a bill would have on existing law. The "replacement" method of legislative drafting would ultimately be easier for each Congressional office to participate in.  And there are models to follow: California's legislature, not known for its easygoing legislative process, is, by and large, able to make its changes using this method.

A major technical challenge to adopting this method at the Federal level is that many of our statutes are "free floating".  Either they stand apart from the U.S. Code, and exist only in the "Statutes at Large", or they have been incorporated into a Title of the U.S. Code, but that Title, as an organized volume, has not been passed into law in the process known as positive law codification.  Congress, then, cannot technically refer to the existing text as "section 501 of Title 26", because Title 26 is not "positive law".

Instead, Congress refers to the original Acts which passed and which are being modified (e.g. the "Internal Revenue Code of 1986" or the "Patient Protection and Affordable Care Act"), and may include a parallel citation to the Code Title.  These Acts, in turn, make their amendments to prior Acts, some of which have been codified and some of which haven't. This has led to a significant, tangled backlog of legislation, which just makes the current system more difficult to change.  And that is why this change goes hand-in-hand with positive law codification, the subject of my next post.

Wednesday, January 25, 2012

Write in Plain English: Legislative Recommendation #1

I earlier listed recommendations for the U.S. House's Feb. 2 conference on legislative transparency and accessibility.  These recommendations, meant to improve the human and machine accessibility of Federal legislation, first require changes in the way that legislation is written, and second focus on the technology to support those changes.  Some of these changes will meet with more cultural resistance and be harder to implement.  This is probably the case with my first recommendation, to Write in Plain English, but I believe that Congress can take initial steps toward this goal right away.

By plain English, I mean writing clearly and consistently.  In the highly nuanced and technical areas that are addressed by much of our legislation, plain language will still include technical language.  And statutes will still require some expertise to understand and apply.  Legislation will still include ambiguity: in fact, much legislative compromise is built on carefully crafted, ambiguous language.  However, such ambiguity also carries heavy costs in decreased certainty, increased litigation costs and increased polarization.

As for implementation, there are already plain language initiatives that apply to Federal agencies and stylistic guidelines for how to write in plain language. Although Congress is clearly a different beast, lessons from these initiatives can apply to legislative drafting, as well.  The centralized Office of Legislative Counsel already plays a very large role in drafting laws and crafting legislative language, for which most offices are very grateful. Strong plain language guidelines can be incorporated into the existing OLC guidelines (pdf) and, especially when backed by intelligent automation, can eventually make drafting a bit more of a science and less of an art.  For example, to the basic stylistic guidelines, we can add a bit of technology: expand the vocabulary that is explicitly defined, set standards for the categories and kinds of words and phrases that require definitions, and work to harmonize definitions, to the extent possible, going forward.

Another single, but very powerful, stylistic change is to write all legislation as a full text replacement, at the section level, as is done in California and some other state legislative systems.  I will discuss this more in my next post.

Friday, January 20, 2012

Legislative Standards: U.S. House Conference

The U.S. House will hold a conference about transparency and legislative data on February 2, 2012 (with remote streaming?).  As Daniel Schuman of the Sunlight Foundation writes, "This is a big deal."  In this time of extreme partisanship and one-upmanship in politics, legislative data transparency is both important and something that both sides can, in principle, agree on.

I've written before about the importance of consistent standards in legislation, the use of meaningful metadata, and the value of version control. Federal legislative data currently passes through at least five offices on the way to being codified.  Because of many historical quirks, the codified version (the "U.S. Code") is often not the actual law, though for convenience most people, even lawyers, pretend that it is.  Simplifying the content of the law, as President Obama and others have called for in tax law, becomes harder when the basic technical issues of changing the law are so arcane and complex.

The Feb. 2 conference brings together six offices that prepare legislative data on its way from bill to code, and can spark the creation of a unified data standard that makes the law easier to create and to understand.  In the next few posts, I will detail some suggestions of key changes that would advance these goals.  For example:

  1. Write in plain English
  2. Write changes to statutory sections as full replacements of previous sections. Use a consistent format to make changes (e.g. "Section 444 is amended to read as follows:"). 
  3. Commit to enacting positive law codification for all Titles. A positive law codification project for Title 26 should be considered as a non-partisan starting point for the effort to simplify tax law.
  4. Adopt a clean and simple XML data standard for Federal Legislation.
These are not original suggestions, nor are they comprehensive, but making these changes would provide the foundations for a far more transparent legislative framework.
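
As a rough illustration of why a consistent format (suggestion 2) matters for software, the following Python sketch recognizes a full-replacement clause with one regular expression. The pattern `AMENDS` and the function `parse_amendment` are hypothetical names, and the pattern covers only the two phrasings quoted in these posts; a real drafting standard would need to define the allowed forms precisely.

```python
import re

# Hypothetical pattern for a full-replacement amendatory clause, covering
# "Section 444 is amended to read as follows:" and
# "Section 53395.1 of the Government Code is amended to read:".
AMENDS = re.compile(
    r"^Section\s+(?P<section>[\w.\-()]+)"
    r"(?:\s+of\s+the\s+(?P<code>.+?))?"
    r"\s+is\s+amended\s+to\s+read(?:\s+as\s+follows)?:\s*$"
)


def parse_amendment(clause: str):
    """Return the section (and code, if named) being replaced, or None."""
    match = AMENDS.match(clause.strip())
    if match is None:
        return None
    return {"section": match.group("section"), "code": match.group("code")}


print(parse_amendment("Section 444 is amended to read as follows:"))
print(parse_amendment("Section 53395.1 of the Government Code is amended to read:"))
# A word-by-word instruction does not fit the pattern and yields None.
print(parse_amendment("Section 1848(d) of the Social Security Act is amended by striking a word"))
```

When every amendment opens with a clause of this kind, a program can tell which section is being replaced without interpreting free-form editing instructions.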

Tuesday, January 17, 2012

New Users = New Bugs. Patience requested.

We're getting a lot of new sign-ups for the private beta of Tabulaw's legal research and writing platform.  Most of this was triggered by the review by Bob Ambrogi on his blog, and word-of-mouth growth from there.  One of the results is that the new users have quickly found many bugs in the software.  Some are small (logins are case sensitive), some are large (rendering doesn't work on all versions of Internet Explorer), and many will require significant changes in the application.

I've been pleasantly surprised by the range of people who are interested in an integrated legal research and writing platform like Tabulaw: lawyers from large and small firms, many law professors and students, legal technology entrepreneurs, and government lawyers from an impressive selection of agencies from across the country.

Along with bugs and feature requests, the higher volume has also created a significant problem that may take more than software engineering to resolve: access to Google Scholar from the application has been disabled.  The reason (Scholar limits search requests from a single source) points out one of the paradoxes of Google Scholar.  Scholar has become a terrific free resource for legal research.  Many articles have pointed out the potential for making Google Scholar (and Google Documents) a mainstay of legal work. As a free source of court opinions, connected to the open web, Google Scholar stands in contrast to the proprietary, walled-off databases of the major legal publishers.  This makes it -- theoretically -- possible to create a fluid workflow for lawyers who want to access legal information on the web and directly integrate it into their writing, without the inconvenience and cost of a publisher's paywall.

That is where Tabulaw comes in.  We are building a set of tools that helps to organize sources such as Google Scholar and integrate them seamlessly into the documents that lawyers and legal researchers are writing.  It makes it possible to imagine an ecosystem of applications by entrepreneurial startups using a common set of open access data, similar to the access developers have to APIs (programming interfaces) from Google and other tech leaders.

But there's the rub... Google Scholar Law, like other free online sources, still has hidden limits, vestiges of the way that legal data is collected and owned, which make this vision of an open access web of legal data part of a more distant future.  Through initiatives like the California Law Hackathon, and our development of primary tax law resources, we are working to bring that future closer.

I hope that some of the people who are trying out the private beta will work with us toward that goal. In the meantime, we have a lot of bugs to fix!

Wednesday, January 11, 2012

Tax Simplification: Use Data

This year, I hope politicians on both sides of the aisle heed Professor Annette Nellen's advice to simplify tax law.  Nellen's article for the American Institute of CPAs (AICPA) highlights remarks by IRS Commissioner Douglas H. Shulman on simplifying tax law.  Nellen provides a table showing broad agreement between the Commissioner's prescription for simplification and recommendations by the AICPA.  They both start with setting common standards and seeking a single, simple approach to a tax problem.

I have often thought about tax simplification from the starting point of tax statutes, and simplification of the underlying Code. That may be what David Brooks had in mind, in this week's column, advising Obama to champion simplifying the tax code among other good government measures.  However, by using the data at its disposal, the IRS can go a long way toward simplification on its own.  What I mean is this:

The IRS has massive amounts of data on the transactions of citizens and businesses and what impact those transactions had on tax determinations by the IRS.  This data could be used to provide deterministic answers for a huge variety of specific taxpayer questions.

The IRS already provides guidance in a number of forms to individuals and businesses, with respect to the agency's interpretation of the law for particular circumstances.  However, this advice comes in the form of written documents and letters that add to the overall volume of information that is required to understand and act on the law.

What I imagine is a tax calculator -- something like H&R Block or TurboTax software  -- but using prior years' data, combined with IRS policy decisions, to prospectively help taxpayers determine the consequences of certain events or tax decisions.

Of course, the devil is in the details.  But this is a place where the high volume of data that the IRS deals with can actually be an advantage toward simplification, just as Google uses its tremendous data advantages to simplify many informational challenges that would otherwise be far too complex.