<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Softcore software development &#187; seneca</title>
	<atom:link href="http://tea.cesaroliveira.net/archives/tag/seneca/feed" rel="self" type="application/rss+xml" />
	<link>http://tea.cesaroliveira.net</link>
	<description>It&#039;s all about the cycles</description>
	<lastBuildDate>Sun, 01 Aug 2010 02:04:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Uncovering the underlying metadata</title>
		<link>http://tea.cesaroliveira.net/archives/54</link>
		<comments>http://tea.cesaroliveira.net/archives/54#comments</comments>
		<pubDate>Thu, 22 Jan 2009 18:28:43 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
				<category><![CDATA[Web]]></category>
		<category><![CDATA[hugs]]></category>
		<category><![CDATA[audio]]></category>
		<category><![CDATA[bug]]></category>
		<category><![CDATA[extension]]></category>
		<category><![CDATA[html5]]></category>
		<category><![CDATA[seneca]]></category>
		<category><![CDATA[sleep]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=53</guid>
		<description><![CDATA[A few weeks ago, I wanted to do some C++ Mozilla coding to make sure I wasn&#8217;t going soft. But I didn&#8217;t really know what to do. I left it for a bit until I found something weird about the HTML5 spec &#8211; there was a method of testing whether metadata has been loaded, but [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago, I wanted to do some C++ Mozilla coding to make sure I wasn&#8217;t going soft. But I didn&#8217;t really know what to do. I left it for a bit until I found something <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#dom-media-have_metadata" onclick="pageTracker._trackPageview('/outgoing/www.whatwg.org/specs/web-apps/current-work/multipage/video.html_dom-media-have_metadata?referer=');">weird about the HTML5 spec</a> &#8211; there was a method of testing whether metadata has been loaded, but no way to expose the metadata (eg. song title, artist, album, etc) to the user such as through page info.</p>
<p>I think this will be useful. As media starts being embedded into the web browser, it would make sense to start exposing this to the user. I know there have been a few instances where I was listening to something on the radio, but there was little hint of what the song was called (I usually tried to remember a few lyrics and did a Google search. Mixed success).</p>
<p>I brought this up in the whatwg irc channel, and apparently this is being considered for the next version of the spec. Which is understandable, because the server can always display the metadata. But often, media may not be central to the website. For example, background music.</p>
<p>I started looking at the Audio/Video backend stuff that moz uses. It got confusing real quick (it doesn&#8217;t help that the <a href="http://mxr.mozilla.org/mozilla-central/source/content/html/content/src/nsHTMLAudioElement.cpp" onclick="pageTracker._trackPageview('/outgoing/mxr.mozilla.org/mozilla-central/source/content/html/content/src/nsHTMLAudioElement.cpp?referer=');">audio code</a> itself is completely empty). Plus I was in a hurry. So I decided to implement it as an extension.</p>
<p>It was a lovely experience. I had a few problems, including finding out that audio/video wasn&#8217;t actually being saved to the cache (<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=469446" onclick="pageTracker._trackPageview('/outgoing/bugzilla.mozilla.org/show_bug.cgi?id=469446&amp;referer=');">bug 469446</a>). The fix was checked in like 2 days after I found out about it. Also, I hate Mozilla strings very much. The string guide helped, but they are still awful. And I made Firefox crash a few times because I&#8217;m an nsCOMPtr n00b.</p>
<p>Right now, this extension works only with Ogg Vorbis files. Which is stupid because &lt;audio /&gt; is rarely used anywhere, and when it is used, only under certain conditions. Wikimedia Commons uses the audio tag, but not really: apparently, the video/audio tags start downloading the media automatically even without autoplay, which is a mess if you have dozens of audio tags on one page (<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=464272" onclick="pageTracker._trackPageview('/outgoing/bugzilla.mozilla.org/show_bug.cgi?id=464272&amp;referer=');">bug 464272</a>). It is so rarely used that I had to create an <a href="/files/2009-01-22/audio">audio demo page</a> for testing purposes.</p>
<p>Using it is very simple. Right-clicking on an audio tag brings up the context menu. I decided to use the context menu over Page Info because the media tab of the Page Info dialog is very much geared towards images, and that code has to be changed in the Firefox source (it&#8217;s not easy/pretty to overlay).</p>
<div style="text-align:center;"><img src="/files/2009-01-22/context.png" alt="audio context menu"/></div>
<p>Which brings up the audio&#8217;s metadata</p>
<div style="text-align:center"><img src="/files/2009-01-22/properties.png" alt="audio properties"/></div>
<p>While a lot of metadata is displayed, some isn&#8217;t. For example, iTunes has support for cover art as a COVERART header. While you can put that in vorbis, it should be noted that it isn&#8217;t <a href="http://wiki.xiph.org/index.php/VorbisComment#Unofficial_.22COVERART.22_field" onclick="pageTracker._trackPageview('/outgoing/wiki.xiph.org/index.php/VorbisComment_Unofficial_.22COVERART.22_field?referer=');">widely supported</a>. So I decided to put in only the <a href="http://www.xiph.org/vorbis/doc/v-comment.html" onclick="pageTracker._trackPageview('/outgoing/www.xiph.org/vorbis/doc/v-comment.html?referer=');">standard headers</a> for now.</p>
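<p>For the curious, the comment header those standard fields live in is a simple length-prefixed layout. Here is a minimal Python sketch of reading it (not the extension&#8217;s C++ code). It assumes you already have the raw comment header packet in hand, so pulling that packet out of the Ogg pages is left out, and the function name is made up for illustration. Note the bounds checks, since a bad file can lie about its lengths.</p>

```python
import struct


def parse_vorbis_comments(packet):
    """Parse a Vorbis comment header packet into (vendor, {FIELD: [values]})."""
    # The packet starts with the type byte 0x03 followed by the ASCII tag "vorbis".
    if packet[:7] != b"\x03vorbis":
        raise ValueError("not a Vorbis comment header")
    pos = 7

    def read_string(pos):
        # Every string is a 32-bit little-endian length followed by UTF-8 bytes.
        if pos + 4 > len(packet):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        if pos + length > len(packet):
            raise ValueError("truncated string")
        return packet[pos:pos + length].decode("utf-8", "replace"), pos + length

    vendor, pos = read_string(pos)
    if pos + 4 > len(packet):
        raise ValueError("truncated comment count")
    (count,) = struct.unpack_from("<I", packet, pos)
    pos += 4

    comments = {}
    for _ in range(count):
        entry, pos = read_string(pos)
        name, sep, value = entry.partition("=")
        if not sep:
            continue  # skip malformed entries that lack NAME=value
        # Field names are case-insensitive and may repeat (e.g. two ARTISTs).
        comments.setdefault(name.upper(), []).append(value)
    return vendor, comments
```

<p>Values are kept in lists because the same field name can legitimately show up more than once.</p>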
<p>The extension is C++ code, which is much more dangerous than JavaScript code because NS_ERROR_OMGWTF doesn&#8217;t appear in your error console when you try to free an uninitialized pointer. I made basic checks so hopefully nothing bad will happen. But I didn&#8217;t do extensive checking in case we have a bad ogg file or something.</p>
<p>Well, to be fair to me, I always save the function&#8217;s return value. I just didn&#8217;t check whether it succeeded or do anything about it. And this won&#8217;t just crash at any time. It&#8217;ll crash only if you try to load the metadata (I&#8217;m very nice like that).</p>
<p>The name of the <a href="https://addons.mozilla.org/en-US/firefox/addon/10465" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org/en-US/firefox/addon/10465?referer=');">extension is saraswati</a>, named after the <a href="http://en.wikipedia.org/wiki/Saraswati" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Saraswati?referer=');">Hindu goddess of music and knowledge</a> (really, a Google search helped out a lot here). Please enjoy! (Linux x86, x86-64 and Windows x86 only right now)</p>
]]></content:encoded>
			<wfw:commentRss>http://tea.cesaroliveira.net/archives/54/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Editor tool finally landed on AMO</title>
		<link>http://tea.cesaroliveira.net/archives/41</link>
		<comments>http://tea.cesaroliveira.net/archives/41#comments</comments>
		<pubDate>Fri, 10 Oct 2008 02:56:55 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
				<category><![CDATA[addons]]></category>
		<category><![CDATA[editor]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[seneca]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=41</guid>
		<description><![CDATA[I made a post several months ago about diffing zippy files on the web. While that stuff landed, it was difficult to use because I deferred actually showing what files changed to a later date. oops Well, that made it nearly useless, because it was less effort to download each xpi file and do [...]]]></description>
			<content:encoded><![CDATA[<p>I made a post several months ago about <a href="http://www.cesaroliveira.net/tea/archives/26" onclick="pageTracker._trackPageview('/outgoing/www.cesaroliveira.net/tea/archives/26?referer=');">diffing zippy files</a> on the web. While that stuff landed, it was difficult to use because I deferred actually <em>showing what files changed</em> to a later date. <a href="http://failblog.org/" onclick="pageTracker._trackPageview('/outgoing/failblog.org/?referer=');">oops</a></p>
<p>Well, that made it nearly useless, because it was less effort to download each xpi file and do a diff locally.</p>
<p>Well, I&#8217;m glad to say that I&#8217;ve righted a world&#8217;s wrong. Some stuff I was working on finally landed recently (can&#8217;t remember when. But it wasn&#8217;t working two days ago. So somewhere between Monday and today). So you may notice a few changes.</p>
<p>The first being the side panel that shows all the files. Any files that were modified appear italicized. Which is a bit subtle, but it is the only indication I could give that wouldn&#8217;t make it stick out like a bad rash, or a <a href="http://www.autocarparts.com/images/products/Honda/honda_element.jpg" onclick="pageTracker._trackPageview('/outgoing/www.autocarparts.com/images/products/Honda/honda_element.jpg?referer=');">Honda Element</a>. Suggestions welcome.</p>
<div style="text-align:center;"><img src="/files/2008-10-09/side-panel2.jpg" alt="side panel" /><br />Side Panel</div>
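<p>If you want to reproduce the &#8220;which files changed&#8221; list locally, here is a rough Python sketch. This is not the AMO code, just one way to do it under the assumption that comparing the CRC-32 of each zip entry is good enough; both function names are made up for the example.</p>

```python
import io
import zipfile


def changed_files(old_xpi, new_xpi):
    """Return sorted lists (added, removed, modified) of file names."""
    with zipfile.ZipFile(old_xpi) as old, zipfile.ZipFile(new_xpi) as new:
        # Map each entry name to its CRC-32; a differing CRC means the bytes changed.
        old_crc = {i.filename: i.CRC for i in old.infolist()}
        new_crc = {i.filename: i.CRC for i in new.infolist()}
    added = sorted(set(new_crc) - set(old_crc))
    removed = sorted(set(old_crc) - set(new_crc))
    modified = sorted(n for n in old_crc.keys() & new_crc.keys()
                      if old_crc[n] != new_crc[n])
    return added, removed, modified


def make_archive(files):
    """Build an in-memory zip from a {name: text} dict (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, text in files.items():
            z.writestr(name, text)
    buf.seek(0)
    return buf
```

<p>The modified list is what would end up italicized in a side panel like the one above.</p>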
<p>So this pretty much completes what was started. Only some minor improvements were made since the last post, including a Wikipedia-style coloured diff:</p>
<div style="text-align:center;"><img src="/files/2008-10-09/fullscreen.jpg" alt="full screen image of the xpi diff"/><br />Full screen view</div>
<p>I hope that this will be useful to AMO editors and help speed up the reviews, which are in much better shape than they were two months ago.</p>
]]></content:encoded>
			<wfw:commentRss>http://tea.cesaroliveira.net/archives/41/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Nuit Blanche</title>
		<link>http://tea.cesaroliveira.net/archives/39</link>
		<comments>http://tea.cesaroliveira.net/archives/39#comments</comments>
		<pubDate>Sun, 05 Oct 2008 23:53:54 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
				<category><![CDATA[hugs]]></category>
		<category><![CDATA[seneca]]></category>
		<category><![CDATA[sleep]]></category>
		<category><![CDATA[toronto]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=39</guid>
		<description><![CDATA[Toronto was transformed at night on Saturday October 4th for some art thing called nuit blanche. Basically, it&#8217;s an event open all night with galleries across the city open for the public. Tom, Lucy and I spent all Saturday night and Sunday morning making trips to these exhibits, and had breakfast at 6-7am with Mike. [...]]]></description>
			<content:encoded><![CDATA[<p>Toronto was transformed at night on Saturday October 4th for some art thing called <a href="http://www.scotiabanknuitblanche.ca/home.shtml" onclick="pageTracker._trackPageview('/outgoing/www.scotiabanknuitblanche.ca/home.shtml?referer=');">nuit blanche</a>. Basically, it&#8217;s an event open all night with galleries across the city open for the public. Tom, Lucy and I spent all Saturday night and Sunday morning making trips to these exhibits, and had breakfast at 6-7am with Mike.</p>
<p>I was also surprised by meeting Armen there, and his brothers. It makes sense now.</p>
<p>While loads of fun (I didn&#8217;t know what we were doing when we met up. And I didn&#8217;t know how much I would enjoy it when I found out. But it turned out to be a great way to spend a Saturday night), I didn&#8217;t get home until later in the morning, where I crashed in my bed and didn&#8217;t wake up until 16:00 today. I didn&#8217;t manage to get any pictures because I didn&#8217;t bring my camera, but I did get a video of <a href="http://blinkenlights.net/blog" onclick="pageTracker._trackPageview('/outgoing/blinkenlights.net/blog?referer=');">blinkenlights</a> on my cell. They took over City Hall, and it was an entertaining light show. They also had it rigged up to play pong and tetris, but it was not well executed.</p>
<p><video src="http://www.cesaroliveira.net/files/2008-10-05/second.ogg" controls="true">You can view the video directly in the browser if you have a browser that supports the video tag. Otherwise, you&#8217;re stuck with <a href="http://www.cesaroliveira.net/files/2008-10-05/second.ogg" onclick="pageTracker._trackPageview('/outgoing/www.cesaroliveira.net/files/2008-10-05/second.ogg?referer=');">downloading it</a> <img src='http://tea.cesaroliveira.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </video></p>
]]></content:encoded>
			<wfw:commentRss>http://tea.cesaroliveira.net/archives/39/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>screen + irssi and the dreaded reboot</title>
		<link>http://tea.cesaroliveira.net/archives/36</link>
		<comments>http://tea.cesaroliveira.net/archives/36#comments</comments>
		<pubDate>Tue, 16 Sep 2008 21:40:04 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
				<category><![CDATA[hugs]]></category>
		<category><![CDATA[seneca]]></category>
		<category><![CDATA[tip]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=36</guid>
		<description><![CDATA[If you use screen + irssi a lot, you&#8217;ll know that sickening feeling when &#8220;screen -r&#8221; gives you a message that there are no screens to be resumed. This happens when the computer is rebooted, and you lose all your screens. To add salt to the wound, you probably had your channels in some very [...]]]></description>
			<content:encoded><![CDATA[<p>If you use screen + irssi a lot, you&#8217;ll know that sickening feeling when &#8220;screen -r&#8221; gives you a message that there are no screens to be resumed. This happens when the computer is rebooted, and you lose all your screens. To add salt to the wound, you probably had your channels in some very specific window. For example, #seneca could be window 2 and #developers could be window 6. And you can&#8217;t quite remember what was between 2 and 6.</p>
<p>While I can&#8217;t solve the computer rebooting problem, I have figured out a way to make connecting back to all your channels painless.</p>
<p>The first thing you have to do is create a network. A network would then contain a list of channels. Here&#8217;s the syntax to create a network :<br />
<strong>/network add -nick cesar -realname &#8220;Cesar Oliveira&#8221; -autosendcmd &#8220;/^msg nickserv identify password&#8221; mozilla</strong><br />
It&#8217;s pretty self-explanatory.<br />
<em>-autosendcmd</em> sends a message to the server once you are connected. In my case, I identified myself to nickserv with my cryptographically strong password (The /^msg means I don&#8217;t want to see the input. That way it doesn&#8217;t open up a new query window in irssi).<br />
The last parameter is just the name of the network, which doesn&#8217;t have to be the same name as the server you&#8217;re connecting to (e.g. irc.mozilla.org).</p>
<p>Then you add channels:<br />
<strong>/channel add -auto #seneca mozilla</strong><br />
<strong>/channel add -auto #firefox mozilla</strong><br />
&#8230;<br />
<strong>/channel add -auto #kittens mozilla</strong><br />
mozilla should correspond to your network. #seneca will be window 2, #firefox will be window 3&#8230;</p>
<p>Finally, when you get disconnected, you can connect to the irc server :<br />
<strong>/connect -ssl -network mozilla irc.mozilla.org</strong></p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://tea.cesaroliveira.net/archives/36/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Not even bytecode can save me now&#8230;</title>
		<link>http://tea.cesaroliveira.net/archives/35</link>
		<comments>http://tea.cesaroliveira.net/archives/35#comments</comments>
		<pubDate>Tue, 16 Sep 2008 07:57:18 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
				<category><![CDATA[addons]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[editor]]></category>
		<category><![CDATA[seneca]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=35</guid>
		<description><![CDATA[I&#8217;ve been spending a few days on trying to develop a few tools for editors to use to quickly reject addons for obvious defects, such as loading remote scripts. But I wanted to get deeper into the javascript stuff mainly because a) it&#8217;s harder and b) it&#8217;s where the real problems lie. But as [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been spending a few days on trying to develop a few tools for editors to use to quickly reject addons for obvious defects, such as loading remote scripts. But I wanted to get deeper into the javascript stuff mainly because a) it&#8217;s harder and b) it&#8217;s where the real problems lie.</p>
<p>But as anyone can tell you, it&#8217;s not an easy task (going towards damn near impossible). Firstly, you can&#8217;t really use a lexical parser. Well, you can, but it won&#8217;t be dependable. Let&#8217;s take an example out of the Reviewer&#8217;s guide :</p>
<p><code>document["crea" + "teElement"]("s" + "c" + "r" + ["i", "p", "t"].join(""));</code></p>
<p>Which is a sneaky way of creating a script element, though I question the competence of someone who will use this as their main line of attack (it&#8217;s practically spelled out for you). But taking this as a use case, and ignoring the fact that they can use document[cheese] instead, I was wondering if parsing the javascript would make figuring this stuff out any better.</p>
<p>Happily, I have spidermonkey and a <a href="http://developer.mozilla.org/en/Introduction_to_the_JavaScript_shell" onclick="pageTracker._trackPageview('/outgoing/developer.mozilla.org/en/Introduction_to_the_JavaScript_shell?referer=');">js shell</a> to do any parsing I wish. But I found out some cool things that you can do in the shell, such as looking at the bytecode for an entire function using the dis() command.</p>
<p>This was interesting. Certainly, there are some optimizations you can do for :<br />
<code>document["crea" + "teElement"]("s" + "c" + "r" + ["i", "p", "t"].join("")); </code><br />
I would be shocked if it didn&#8217;t end up spelling out :<br />
<code>document["createElement"]("script"); </code></p>
<p>I had a few hurdles to overcome. Firstly, document is not defined in the javascript shell. Thinking it was defined in the xpcshell (owww. I was misled by some apparently <a href="http://mxr.mozilla.org/mozilla-central/search?find=%2Fjs%2Fsrc%2Fxpconnect%2Ftests%2F&amp;string=document" onclick="pageTracker._trackPageview('/outgoing/mxr.mozilla.org/mozilla-central/search?find=_2Fjs_2Fsrc_2Fxpconnect_2Ftests_2F_amp_string=document&amp;referer=');">unused tests</a> and my general ignorance that xpcshell tests went into unit/ and not js/ directory) I went through the added trouble of copying dis() and related functions from <a href="http://mxr.mozilla.org/mozilla-central/source/js/src/js.cpp#1373" onclick="pageTracker._trackPageview('/outgoing/mxr.mozilla.org/mozilla-central/source/js/src/js.cpp_1373?referer=');">js.cpp</a> to xpcshell.cpp. Once I realized that document wasn&#8217;t defined there either, I made a document mock object just to see what the blasted bytecode would look like.</p>
<p>I was a little disappointed. This source:</p>
<!--start_raw-->
<pre><code>var document = {
    createElement : function(s) {
        print("damn");
    }
};

function foo() {
    document["crea" + "teElement"]("s" + "c" + "r" + ["i", "p", "t"].join(""));
}

dis(foo);</code></pre>
<!--end_raw-->
<p>Ended up being this bytecode:</p>
<!--start_raw-->
<pre><code style="font-size:smaller;">00000:  name "document"
00003:  string "createElement"
00006:  callelem
00007:  string "s"
00010:  string "c"
00013:  add
00014:  string "r"
00017:  add
00018:  newinit 3
00020:  zero
00021:  string "i"
00024:  initelem
00025:  one
00026:  string "p"
00029:  initelem
00030:  int8 2
00032:  string "t"
00035:  initelem
00036:  endinit
00037:  callprop "join"
00040:  string ""
00043:  call 1
00046:  add
00047:  call 1
00050:  pop
00051:  stop

Source notes:
  0:     0 [   0] newline
  1:     6 [   6] pcbase   offset 6
  3:    37 [  31] xdelta
  4:    37 [   0] pcbase   offset 19
  6:    43 [   6] pcbase   offset 25
  8:    47 [   4] pcbase   offset 47</code></pre>
<!--end_raw-->
<p>So, almost. The document["createElement"] part was folded, but the "s" + "c" + "r" concatenation and the .join() weren&#8217;t optimized. Although I wasn&#8217;t overly ecstatic, I knew that this was just one (somewhat lame) use case among countless possible others.</p>
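<p>For comparison, folding the literal concatenations can also be done on the source text before anything is compiled. A minimal Python sketch, assuming plain double-quoted literals with no escapes and no real tokenizing (so it is easy to fool, and it is not what SpiderMonkey does); the function name is made up:</p>

```python
import re

# Two adjacent double-quoted literals joined by "+", with no escapes inside.
_CONCAT = re.compile(r'"([^"\\]*)"\s*\+\s*"([^"\\]*)"')


def fold_string_concats(source):
    """Repeatedly merge "a" + "b" into "ab" until nothing changes."""
    while True:
        folded = _CONCAT.sub(r'"\1\2"', source)
        if folded == source:
            return folded
        source = folded
```

<p>On the reviewer&#8217;s guide example this yields <code>document["createElement"]("scr" + ["i", "p", "t"].join(""));</code>, so the .join() still has to be handled some other way.</p>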
<p>This is making me rethink whether lexical tools <em>are</em> the way to go. While they don&#8217;t give you any definitive proof that there is a possible security hole, they can still be useful. For example, if you want to use XMLHttpRequest, then you have to call it at least once in your program (even if you say <code>var Widget = XMLHttpRequest</code>). And at least that can bring up warning flags, or at least give editors a place to look.</p>
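<p>A lexical pass like that can be tiny. Here is a sketch in Python that reports the lines where watched identifiers show up. The watch list is hypothetical, and since it matches inside strings and comments too, it produces leads for an editor rather than proof of anything.</p>

```python
import re

# Hypothetical watch list; a real one would come from the reviewers.
WATCHED = ("XMLHttpRequest", "eval", "createElement")
_IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def flag_identifiers(source, watched=WATCHED):
    """Return sorted (line_number, identifier) pairs for watched names."""
    hits = set()
    for number, line in enumerate(source.splitlines(), start=1):
        for match in _IDENT.finditer(line):
            if match.group(0) in watched:
                hits.add((number, match.group(0)))
    return sorted(hits)
```

<p>It flags the line with <code>var Widget = XMLHttpRequest</code>, but a name assembled from string pieces at runtime slips straight past it.</p>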
<p>I don&#8217;t think any tool can completely replace a human being. But hopefully, tools will help make the review process easier because you can start looking at high-risk areas first rather than starting from an arbitrary point and not coming across something until 10 minutes later.</p>
]]></content:encoded>
			<wfw:commentRss>http://tea.cesaroliveira.net/archives/35/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An overly-complex diabolical plan</title>
		<link>http://tea.cesaroliveira.net/archives/18</link>
		<comments>http://tea.cesaroliveira.net/archives/18#comments</comments>
		<pubDate>Fri, 06 Jun 2008 00:28:50 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
				<category><![CDATA[addons]]></category>
		<category><![CDATA[intern]]></category>
		<category><![CDATA[seneca]]></category>
		<category><![CDATA[wildon]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=18</guid>
		<description><![CDATA[So here is a diagram of the plan I had in mind to take over the world and catalog all of the extensions on the web: Click for a larger image Thank you Dia for letting me express my thoughts in boxes and stick figures. Here is a quick breakdown of some of [...]]]></description>
			<content:encoded><![CDATA[<p>So here is a diagram of the plan I had in mind to take over the world and catalog all of the extensions on the web:<br />
<a href="/images/misc/2008-06-05/theplan.png"><img src="/images/misc/2008-06-05/theplan-resize.png"/></a><br />
<i>Click for a larger image</i></p>
<p>Thank you <a href="http://www.gnome.org/projects/dia/" onclick="pageTracker._trackPageview('/outgoing/www.gnome.org/projects/dia/?referer=');">Dia</a> for letting me express my thoughts in boxes and stick figures. Here is a quick breakdown of some of the components:</p>
<ol>
<li>A <strong>URL list</strong> is simply a list of URLs that are known to contain extensions, for example repositories such as AMO and mozdev.</li>
<li><strong>Google API</strong> for more scattered addons, such as those on blogs and personal sites.</li>
<li><strong>Manual entries</strong> for addons not hosted on webpages. These are usually commercial addons such as McAfee.</li>
<li><strong>Site-specific</strong> and <strong>generic</strong> refer to the rules that the crawler must obey. For example, a generic policy would cover a personal site such as example.com, while a site-specific policy would handle sites such as AMO where experimental addons require a login.</li>
<li><strong>Crawler</strong> is a web crawler. I have been having difficulty finding the best tool for the job.</li>
<li><strong>Parser</strong> parses .xpi files. We should also save the html files to extract contextual information wherever possible.</li>
<li><strong>Site-specific persistent storage</strong> is just a database for each site we visit. This may have to be rethought, but I want some sort of redundancy plan to keep files saved even if something horrendous happens to a central database. Especially when dealing with beta software and unfamiliar technology such as web crawlers.</li>
<li><strong>Compared</strong> compares what is stored with a central database. Addons are updated all the time, so we want the most up-to-date versions available.</li>
<li><strong>View</strong> is used by the <strong>website</strong> to provide information for the <strong>user</strong>.</li>
</ol>
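<p>To make the <strong>Parser</strong> step a bit more concrete, here is a minimal Python sketch of pulling the id, name and version out of an .xpi. It is not the project&#8217;s actual parser. It assumes install.rdf sits at the top of the archive and stores those values as child elements rather than attributes, and the function name is made up.</p>

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EM_NS = "http://www.mozilla.org/2004/em-rdf#"
MANIFEST_URN = "urn:mozilla:install-manifest"


def read_install_manifest(xpi):
    """Return {'id', 'name', 'version'} from an .xpi path or file object."""
    with zipfile.ZipFile(xpi) as archive:
        root = ET.fromstring(archive.read("install.rdf"))
    for desc in root.iter("{%s}Description" % RDF_NS):
        # The "about" attribute may or may not carry the rdf: prefix.
        about = desc.get("{%s}about" % RDF_NS) or desc.get("about")
        if about != MANIFEST_URN:
            continue
        return {key: desc.findtext("{%s}%s" % (EM_NS, key))
                for key in ("id", "name", "version")}
    raise ValueError("no install manifest found in install.rdf")
```

<p>The version returned here is also what the compare step would check against the central database.</p>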
<p>There are still some quirks which have to be figured out:</p>
<ul>
<li>Version bumping on AMO doesn&#8217;t change the actual install.rdf in the xpi file. Instead, Firefox does some update magic to fix that. I either need to work with said magic, or leave it alone (I don&#8217;t think it is entirely a big deal. But it should be noted).</li>
<li>JSpider is a java spider that I have had my eye on. Yeah, it&#8217;s java, but many other crawlers are too. Many other crawlers both crawl and index, and I need different functionality (I need a flexible crawler. Forget the indexer). Unfortunately, JSpider doesn&#8217;t support POST data or web form authentication. Which means I&#8217;m going to have to fix that if I want to use it.</li>
<li><a href="http://code.google.com/apis/ajaxsearch/terms.html" onclick="pageTracker._trackPageview('/outgoing/code.google.com/apis/ajaxsearch/terms.html?referer=');">Google&#8217;s Search API TOS</a> doesn&#8217;t seem to be spider friendly. I may have to try out other web search engines.</li>
</ul>
<p>On a brighter note, I put up the <a href="http://repository.cesaroliveira.net/index.cgi/wildon/" onclick="pageTracker._trackPageview('/outgoing/repository.cesaroliveira.net/index.cgi/wildon/?referer=');">sources of my project</a> on the web. And even a nice place to <a href="http://www.cesaroliveira.net/wildon/frontend/" onclick="pageTracker._trackPageview('/outgoing/www.cesaroliveira.net/wildon/frontend/?referer=');">play in</a>. It&#8217;s a bit slow, but I&#8217;m probably into the &#8220;<a href="http://www.sqlite.org/whentouse.html" onclick="pageTracker._trackPageview('/outgoing/www.sqlite.org/whentouse.html?referer=');">this isn&#8217;t what you should use sqlite for</a>&#8221; territory.</p>
]]></content:encoded>
			<wfw:commentRss>http://tea.cesaroliveira.net/archives/18/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The many ways around a problem</title>
		<link>http://tea.cesaroliveira.net/archives/17</link>
		<comments>http://tea.cesaroliveira.net/archives/17#comments</comments>
		<pubDate>Wed, 28 May 2008 16:11:13 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[intern]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[seneca]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=17</guid>
		<description><![CDATA[I came across a bug in the zipfile python module yesterday that I had to fix today. The problem occurs when you try to create a ZipFile object and pass it a corrupt zip file. It doesn&#8217;t handle it gracefully like returning None or throwing an exception. Rather it heads into an infinite loop. This [...]]]></description>
			<content:encoded><![CDATA[<p>I came across a bug in the zipfile python module yesterday that I had to fix today. The problem occurs when you try to create a ZipFile object and pass it a corrupt zip file. It doesn&#8217;t handle it gracefully like returning None or throwing an exception. Rather it heads into an infinite loop.</p>
<p>This is rather unfortunate for me. How would I get around this problem? The first thing I did was check for an updated python. There was a minor version upgrade. I found the changelog (why do they hide these things?) and noticed a few bugs resolved in the zipfile module. So I installed it. Unfortunately, this didn&#8217;t solve my problem.</p>
<p>I managed to find a <a href="http://bugs.python.org/issue1622" onclick="pageTracker._trackPageview('/outgoing/bugs.python.org/issue1622?referer=');">bug number</a> in the python bug tracking software about people having similar problems. There was a patch, but it hasn&#8217;t landed. I downloaded the latest stable version, but the patch wouldn&#8217;t apply. So I had to cvs checkout trunk and apply it there. Once installed, I tried it and it worked! Success.</p>
<p>However, it broke another library I was using (PyXML). Unfortunately for me, the recent trunk build didn&#8217;t seem to fare any better.</p>
<p>At this point, I wasn&#8217;t in the mood for debugging. I had a few options at my disposal :</p>
<ol>
<li>Ignore this particular file</li>
<li>Suck it up and debug it.</li>
<li>Find a wacky work-around</li>
</ol>
<p>Option 1 isn&#8217;t an option. Option 2 I tried for a fair while, but nothing worked. So Option 3 was my only option!</p>
<p>I tried using a lower level library to see if I can fix the problem (zlib library), but that didn&#8217;t work well at all.</p>
<p>I finally thought I had no choice but to start a thread to try and unzip the xpi, and if it took longer than 10 seconds, to kill the thread somehow. While seriously looking into this, and fighting the temptation to take tequila shots at work, I came across signals (which I thought I could send to the thread. I&#8217;m so naive). It turns out you can ask for a signal to be delivered after a specific number of seconds, and it arrives as SIGALRM. This was <strong>exactly</strong> what I needed without the extra complexity. The <a href="http://docs.python.org/lib/node545.html" onclick="pageTracker._trackPageview('/outgoing/docs.python.org/lib/node545.html?referer=');">example provided</a> was almost exactly what I did too! Here is my solution to the problem:</p>
<pre><code>		# signal_handler (not shown) has to raise an exception when SIGALRM fires
		signal.signal(signal.SIGALRM, signal_handler)
		signal.alarm(10)  # give ZipFile at most 10 seconds
		try:
			zippy = zipfile.ZipFile(io, 'r')
		except Exception:
			# either the alarm fired or ZipFile rejected the archive
			print("\tZipFile Timeout")
			continue
		finally:
			signal.alarm(0)  # always cancel the pending alarm
</code></pre>
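<p>The snippet above lives inside a loop and leans on a handler that isn&#8217;t shown. As a self-contained sketch of the same trick, assuming a Unix system and that the call happens in the main thread (the <code>Timeout</code> class and the helper names are made up for the example):</p>

```python
import signal
import time


class Timeout(Exception):
    """Raised when the alarm fires before the call returns."""


def _on_alarm(signum, frame):
    raise Timeout()


def call_with_timeout(func, seconds, *args):
    """Run func(*args); raise Timeout if it takes longer than `seconds`.

    Unix only, main thread only, whole seconds only (signal.alarm limits).
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return func(*args)
    finally:
        signal.alarm(0)                           # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)   # restore the old handler
```

<p>With this, the call becomes something like <code>call_with_timeout(zipfile.ZipFile, 10, io, 'r')</code> wrapped in a <code>try</code>/<code>except Timeout</code>.</p>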
<p>Maybe python isn&#8217;t just for programming sissies after all.</p>
]]></content:encoded>
			<wfw:commentRss>http://tea.cesaroliveira.net/archives/17/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>It&#8217;s hot down here</title>
		<link>http://tea.cesaroliveira.net/archives/16</link>
		<comments>http://tea.cesaroliveira.net/archives/16#comments</comments>
		<pubDate>Tue, 27 May 2008 07:56:21 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
				<category><![CDATA[addons]]></category>
		<category><![CDATA[personal]]></category>
		<category><![CDATA[seneca]]></category>
		<category><![CDATA[tinderbox]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=16</guid>
		<description><![CDATA[So I have been spending a few hours here and there since starting my internship doing this side project. It&#8217;s an extension that watches the tinderbox tree and reports back what is burning, and the status of certain tinderboxen that you&#8217;re interested in. There were a few goals I had in this release. The main [...]]]></description>
			<content:encoded><![CDATA[<p>So I have been spending a few hours here and there since starting my internship doing this side project. It&#8217;s an extension that watches the tinderbox tree and reports back what is burning, and the status of certain tinderboxen that you&#8217;re interested in.</p>
<p>There were a few goals I had in this release. The main objective however, is to help avoid making trips to the tinderbox page (because it&#8217;s large, and slow). For me at least, I am only concerned about Linux tinderboxes being red so I can checkout <img src='http://tea.cesaroliveira.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> . But others might have different needs. So I generally tried to include everything I can. But I could have made a mess of things.</p>
<p>I should mention that you should have a reasonably fast connection (ie. not 56K modem). Even GoogleWiFi was able to reasonably download the json and bonsai xml files that I needed to get things working. Most developers should be fine.</p>
<p>I mainly tried to squeeze as much information as possible into two popup menus, making use of the tooltip to show more information then would be otherwise possible. I also show what menuitems are links by giving them an icon. But it has been a bit overdone.</p>
<p>Anyways, here are some images to show you what you can expect.</p>
<p>When loading, you&#8217;ll be amused by the animated png throbber that shows up on the statusbar<br />
<img src="http://cesaroliveira.net/images/misc/2008-05-27/loading.png"/></p>
<p>Before it can be useful, you have to set it up<br />
<img src="http://cesaroliveira.net/images/misc/2008-05-27/options.png"/></p>
<p>The options menu shows you which tinderboxes are available to be watched. For now, you will only see the Firefox tinderbox, mostly because I was less interested in the other trees. Timeout refers to how long the extension should wait before updating. Keep this value reasonable.<br />
<img src="http://cesaroliveira.net/images/misc/2008-05-27/options-dialog.png"/></p>
<p>The statusbar icon will show you the worst state of any of your chosen tinderbox trees.<br />
<img src="http://cesaroliveira.net/images/misc/2008-05-27/warning.png"/></p>
<p>A left click shows tinderboxes and their status<br />
<img src="http://cesaroliveira.net/images/misc/2008-05-27/tinderbox.png"/></p>
<p>A right click shows bonsai information. From bottom to top, it shows the most recent check-ins. Hovering over menuitems gives you the time/date of the check-in as well as the check-in message.<br />
<img src="http://cesaroliveira.net/images/misc/2008-05-27/tooltip.png"/></p>
<p>Sub menus show a component::file display. Showing the full path took too much room, so I wanted to show just enough information for you to take an educated guess as to what was being changed. Hovering, of course, shows you the full path and new version.<br />
<img src="http://cesaroliveira.net/images/misc/2008-05-27/bonsai.png"/></p>
<p>Bwahaha, the extension lives here in this <a href="http://www.cesaroliveira.net/extensions/smokey.xpi" onclick="pageTracker._trackPageview('/outgoing/www.cesaroliveira.net/extensions/smokey.xpi?referer=');">insecure site</a> until I get it up on <acronym title="addons.mozilla.org">AMO</acronym>. You can also fetch the source from <a href="http://repository.cesaroliveira.net/" onclick="pageTracker._trackPageview('/outgoing/repository.cesaroliveira.net/?referer=');">repository.cesaroliveira.net</a>. Any criticisms (hopefully constructive) can be emailed. In the meantime, enjoy this most beta software <img src='http://tea.cesaroliveira.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://tea.cesaroliveira.net/archives/16/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Taming the beast from within</title>
		<link>http://tea.cesaroliveira.net/archives/15</link>
		<comments>http://tea.cesaroliveira.net/archives/15#comments</comments>
		<pubDate>Thu, 22 May 2008 17:09:14 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
				<category><![CDATA[addons]]></category>
		<category><![CDATA[intern]]></category>
		<category><![CDATA[seneca]]></category>
		<category><![CDATA[wildon]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=15</guid>
		<description><![CDATA[The next 5 paragraphs are me whining. To get to the really important stuff, start at paragraph 6 So I have been pouring two weeks into WildOn, which is finding out how many addons exist out there in the wild. But before I start unleashing web crawlers on the web causing havoc and chaos, it [...]]]></description>
			<content:encoded><![CDATA[<p><em>The next 5 paragraphs are me whining. To get to the really important stuff, start at paragraph 6.</em></p>
<p>So I have been pouring two weeks into <a href="http://www.cesaroliveira.net/tea/archives/9" onclick="pageTracker._trackPageview('/outgoing/www.cesaroliveira.net/tea/archives/9?referer=');">WildOn</a>, which is about finding out how many addons exist out there in the wild. But before I start unleashing web crawlers on the web causing havoc and chaos, it would be helpful if we could compare what&#8217;s out there with what we know. What we know is everything from <acronym title="addons.mozilla.org"><a href="https://addons.mozilla.org" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org?referer=');">AMO</a></acronym>, so we start there. The point of this extra work is to have some results, so that when we release a web crawler on AMO and tell it to find all the extensions, we&#8217;ll have something to compare its results to.</p>
<p>Actually, even this was a bit confusing. AMO provides an <a href="http://wiki.mozilla.org/Update:Remora_API_Docs" onclick="pageTracker._trackPageview('/outgoing/wiki.mozilla.org/Update_Remora_API_Docs?referer=');">API</a> to view its addons (well actually, two versions of the API, with the older being slightly more useful). But that information was eventually scrapped for several reasons. The main one is that there is a lot of information on AMO that isn&#8217;t in the extension itself (such as which operating systems are supported, and whether the addon is a theme or an extension. While the former has been supported since <a href="http://developer.mozilla.org/en/docs/install.rdf#targetPlatform" onclick="pageTracker._trackPageview('/outgoing/developer.mozilla.org/en/docs/install.rdf_targetPlatform?referer=');">Firefox 2</a>, I have rarely seen it used; the latter is completely <a href="http://developer.mozilla.org/en/docs/install.rdf#type" onclick="pageTracker._trackPageview('/outgoing/developer.mozilla.org/en/docs/install.rdf_type?referer=');">optional</a>). This leaves any conclusion shaky because you don&#8217;t have enough information.</p>
<p>Then there was the problem of having too much information in the database, to the point where ~4000 addons took up ~1.8 GB. For an SQLite database, this can get slow. Some queries, such as the number of extensions that support the &#8216;jp-JP&#8217; locale, become even more intensive, because you build a table that comprises tens of thousands of rows (one row for each guid/locale combination). The reason is that older versions were being included in the same table as the newest version of the addon. Some addons had something like <a href="https://addons.mozilla.org/en-US/firefox/addons/versions/166" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org/en-US/firefox/addons/versions/166?referer=');">50+ different versions</a>. The solution seemed to be to move old versions of extensions to a different table. SQL queries now seem to go much faster.</p>
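<p>To make the table split concrete, here is a small sketch using an in-memory SQLite database. The table and column names (<em>addon_locales</em>, <em>addon_locales_old</em>, <em>is_latest</em>) and the sample rows are invented for this example and do not match my real schema.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addon_locales (guid TEXT, version TEXT, is_latest INTEGER, locale TEXT);
    CREATE TABLE addon_locales_old (guid TEXT, version TEXT, is_latest INTEGER, locale TEXT);
""")

# One row per guid/version/locale combination (hypothetical sample data).
rows = [
    ("a@example", "1.0", 0, "en-US"),
    ("a@example", "1.0", 0, "jp-JP"),
    ("a@example", "2.0", 1, "en-US"),
    ("b@example", "0.5", 1, "en-US"),
    ("b@example", "0.5", 1, "jp-JP"),
]
conn.executemany("INSERT INTO addon_locales VALUES (?, ?, ?, ?)", rows)

# Move every row that belongs to an old version into the archive table,
# so that everyday queries only scan the newest version of each addon.
conn.execute("INSERT INTO addon_locales_old SELECT * FROM addon_locales WHERE is_latest = 0")
conn.execute("DELETE FROM addon_locales WHERE is_latest = 0")
conn.commit()


def count_supporting(locale):
    """Number of distinct addons whose newest version ships `locale`."""
    cur = conn.execute(
        "SELECT COUNT(DISTINCT guid) FROM addon_locales WHERE locale = ?",
        (locale,),
    )
    return cur.fetchone()[0]


def archived_rows():
    """Number of rows that were moved to the archive table."""
    return conn.execute("SELECT COUNT(*) FROM addon_locales_old").fetchone()[0]
```

<p>With the sample rows, only one addon still supports &#8216;jp-JP&#8217; in its newest version, which is also why some locales can end up with a count of 0.</p>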
<p>Another issue that makes me loathe <a href="http://developer.mozilla.org/en/docs/RDF" onclick="pageTracker._trackPageview('/outgoing/developer.mozilla.org/en/docs/RDF?referer=');">RDF</a> is <a href="http://developer.mozilla.org/en/docs/install.rdf" onclick="pageTracker._trackPageview('/outgoing/developer.mozilla.org/en/docs/install.rdf?referer=');">install.rdf</a>. I <strong>strongly</strong> disagree with the use of RDF for anything <img src='http://tea.cesaroliveira.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  It is difficult to parse with a regular XML parser (there are a few Python RDF libraries out there. But <a href="http://rdflib.net/" onclick="pageTracker._trackPageview('/outgoing/rdflib.net/?referer=');">rdflib</a>, the most promising, seems to like not working and not having good examples. Only <a href="http://www.bitstampede.com/" onclick="pageTracker._trackPageview('/outgoing/www.bitstampede.com/?referer=');">sheppy</a> can save them now, but he&#8217;s working on <acronym title="Mozilla Developer Center">mdc</acronym>). This is especially true of rdf:resource, which I am completely ignoring right now. It also seems that AMO editors like to get creative with install.rdf, which has caused problems for me (eg. I can not rely on targetPlatform. Some extensions actually have their targetPlatform in the Description tag. I know this because one of the extensions had Firefox&#8217;s GUID <img src='http://tea.cesaroliveira.net/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> ). There are also other quirks, like having the id as an attribute of Description instead of a child tag. These are all things that are probably perfectly valid, but they make my life significantly more difficult.</p>
<p><acronym title="Yet Another Problem">YAP</acronym> was that many early extensions did not use chrome.manifest, and some newer ones don&#8217;t either. So the locale information has to be looked up in either <em>install.rdf</em> or <em>contents.rdf</em> instead. This makes me (and by extension, kittens and baby Jesus) sad. I don&#8217;t have a fix for this yet.</p>
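<p>As an illustration of the id quirk, here is a rough sketch of reading the addon id with a plain XML parser (ElementTree) rather than an RDF library. The function name <em>addon_id</em> is made up for this example. It only handles the two shapes described above (id as a child tag, or id as an attribute of Description) and ignores rdf:resource entirely.</p>

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EM_NS = "http://www.mozilla.org/2004/em-rdf#"


def addon_id(install_rdf_text):
    """Return the em:id of the install manifest, or None if it is missing."""
    root = ET.fromstring(install_rdf_text)
    for desc in root.iter("{%s}Description" % RDF_NS):
        # The "about" attribute may or may not carry the RDF prefix.
        about = desc.get("{%s}about" % RDF_NS) or desc.get("about")
        if about != "urn:mozilla:install-manifest":
            continue  # e.g. a nested targetApplication description
        child = desc.find("{%s}id" % EM_NS)
        if child is not None and child.text:
            return child.text.strip()  # id written as a child tag
        return desc.get("{%s}id" % EM_NS)  # id written as an attribute
    return None
```

<p>Checking the child tag first and falling back to the attribute keeps both spellings on one code path, so the rest of the analysis does not need to care which one an author picked.</p>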
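<p>For the extensions that do ship a chrome.manifest, pulling out the locales is the easy part. The sketch below assumes the usual registration line of the form <em>locale packagename localename path</em>; the function name <em>manifest_locales</em> is made up for this example.</p>

```python
def manifest_locales(manifest_text):
    """Return the sorted locale names registered in a chrome.manifest."""
    locales = set()
    for line in manifest_text.splitlines():
        fields = line.split()
        # A locale registration has at least four fields:
        #   locale packagename localename path/to/files/
        # Comment lines start with "#", so their first field never matches.
        if len(fields) >= 4 and fields[0] == "locale":
            locales.add(fields[2])
    return sorted(locales)
```

<p>This does nothing for the older extensions that keep their locale information in <em>install.rdf</em> or <em>contents.rdf</em>; those still need their own handling.</p>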
<p>But enough about problems, what about SUCCESS!?</p>
<p>Ok. So I managed to get a local copy of every extension that is on AMO. Since parsing, analyzing, and writing to persistent storage takes a long time, I decided to save myself some trouble and just do the first 2500 extensions (out of the ~7K folders that I have).</p>
<p>Of the 2500 &#8216;extension numbers&#8217;, <b>1630</b> were successfully analyzed. This is mainly because extension numbers don&#8217;t increment perfectly (eg. there is no <a href="https://services.addons.mozilla.org/en-US/firefox/api/1.1/addon/1" onclick="pageTracker._trackPageview('/outgoing/services.addons.mozilla.org/en-US/firefox/api/1.1/addon/1?referer=');">addon #1</a>. The first one starts at <a href="https://services.addons.mozilla.org/en-US/firefox/api/1.1/addon/4" onclick="pageTracker._trackPageview('/outgoing/services.addons.mozilla.org/en-US/firefox/api/1.1/addon/4?referer=');">#4</a>). Only about 100 addons failed to parse, giving me a success rate of 94%. Some extensions had quirks in them (eg. bad RDF) that were either invalid or that I couldn&#8217;t figure out.</p>
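<p>For the record, this is how the numbers above fit together. The failure count is only &#8220;about 100&#8221;, so everything derived from it is approximate, and the variable names are just for this example.</p>

```python
numbers_tried = 2500   # extension numbers looked at
analyzed = 1630        # successfully analyzed
failed = 100           # "about 100", so only approximate

# Success rate among the extension numbers that actually had an addon.
success_rate = analyzed / float(analyzed + failed)

# Whatever is left over are numbers with no addon behind them.
unused_numbers = numbers_tried - analyzed - failed

print("success rate: %.0f%%" % (success_rate * 100))
print("numbers without an addon: about %d" % unused_numbers)
```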
<p>Out of the 1630 extensions, this is which xulrunner-like applications they supported :<br />
<img src="/images/misc/2008-05-21/addons.png"/><br />
And here are the approximate numbers :</p>
<table>
<tr>
<th>Name</th>
<th>Count</th>
</tr>
<tr>
<td>Prism/Webrunner</td>
<td>2</td>
</tr>
<tr>
<td>Songbird (old)</td>
<td>2</td>
</tr>
<tr>
<td>Instant</td>
<td>1</td>
</tr>
<tr>
<td>Midbrowser</td>
<td>3</td>
</tr>
<tr>
<td>toolkit (any gecko 1.9 application)</td>
<td>7</td>
</tr>
<tr>
<td>eMusic DLM</td>
<td>12</td>
</tr>
<tr>
<td>Seamonkey (broken GUID)</td>
<td>2</td>
</tr>
<tr>
<td>Nvu</td>
<td>11</td>
</tr>
<tr>
<td>Sunbird</td>
<td>16</td>
</tr>
<tr>
<td>Thunderbird</td>
<td>256</td>
</tr>
<tr>
<td>Songbird</td>
<td>13</td>
</tr>
<tr>
<td>Seamonkey</td>
<td>101</td>
</tr>
<tr>
<td>Flock</td>
<td>159</td>
</tr>
<tr>
<td>Netscape Navigator</td>
<td>68</td>
</tr>
<tr>
<td>Mozilla Suite</td>
<td>166</td>
</tr>
<tr>
<td>Firefox</td>
<td>1466</td>
</tr>
</table>
<p>This looks ok so far. One expects a few non-Firefox extensions. The Thunderbird numbers seem a little low. Reminder that this is only ~33% of the total addons.</p>
<p>Locales seem to be a bigger mess, as there are many early extensions that don&#8217;t use chrome.manifest, so I decided to skip it, but now realize I have to fix it. Out of 1630 addons, only 464 addons had chrome.manifest files that I was able to read. But here is the breakdown anyways :</p>
<p>Number of locales : 173 (en, en-US, en-GB are all considered different locales). There are some invalid locales. For example, <a href="https://addons.mozilla.org/en-US/firefox/addons/versions/2155" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org/en-US/firefox/addons/versions/2155?referer=');">Xultris</a> has an invalid locale called xultrisLocale. These could be filtered out with a regular expression, but I have left them in for now.</p>
<div style="height:300px; overflow:scroll">
<table>
<tr>
<th>Locale</th>
<th>Supported Extensions</th>
</tr>
<tr>
<td>en-US</td>
<td>439</td>
</tr>
<tr>
<td>sv-SE</td>
<td>57</td>
</tr>
<tr>
<td>it-IT</td>
<td>190</td>
</tr>
<tr>
<td>de-DE</td>
<td>189</td>
</tr>
<tr>
<td>pl-PL</td>
<td>137</td>
</tr>
<tr>
<td>es-ES</td>
<td>181</td>
</tr>
<tr>
<td>fi-FI</td>
<td>64</td>
</tr>
<tr>
<td>ru-RU</td>
<td>129</td>
</tr>
<tr>
<td>nl-NL</td>
<td>145</td>
</tr>
<tr>
<td>pt-BR</td>
<td>162</td>
</tr>
<tr>
<td>fr-FR</td>
<td>204</td>
</tr>
<tr>
<td>ja-JP</td>
<td>124</td>
</tr>
<tr>
<td>zh-CN</td>
<td>126</td>
</tr>
<tr>
<td>zh-TW</td>
<td>114</td>
</tr>
<tr>
<td>ko-KR</td>
<td>86</td>
</tr>
<tr>
<td>cs-CZ</td>
<td>90</td>
</tr>
<tr>
<td>en-GB</td>
<td>29</td>
</tr>
<tr>
<td>es-AR</td>
<td>54</td>
</tr>
<tr>
<td>mn-MN</td>
<td>4</td>
</tr>
<tr>
<td>ro-RO</td>
<td>30</td>
</tr>
<tr>
<td>sk-SK</td>
<td>118</td>
</tr>
<tr>
<td>ca-AD</td>
<td>56</td>
</tr>
<tr>
<td>el-GR</td>
<td>38</td>
</tr>
<tr>
<td>pt-PT</td>
<td>49</td>
</tr>
<tr>
<td>ar</td>
<td>18</td>
</tr>
<tr>
<td>uk-UA</td>
<td>61</td>
</tr>
<tr>
<td>sr-YU</td>
<td>12</td>
</tr>
<tr>
<td>bg-BG</td>
<td>28</td>
</tr>
<tr>
<td>hu-HU</td>
<td>84</td>
</tr>
<tr>
<td>hr-HR</td>
<td>64</td>
</tr>
<tr>
<td>da-DK</td>
<td>92</td>
</tr>
<tr>
<td>nb-NO</td>
<td>32</td>
</tr>
<tr>
<td>sl-SI</td>
<td>23</td>
</tr>
<tr>
<td>lt-LT</td>
<td>21</td>
</tr>
<tr>
<td>tr-TR</td>
<td>72</td>
</tr>
<tr>
<td>ar-TN</td>
<td>0</td>
</tr>
<tr>
<td>de-AT</td>
<td>10</td>
</tr>
<tr>
<td>he-IL</td>
<td>41</td>
</tr>
<tr>
<td>el</td>
<td>6</td>
</tr>
<tr>
<td>ja-JA</td>
<td>1</td>
</tr>
<tr>
<td>mk-MK</td>
<td>10</td>
</tr>
<tr>
<td>be-BY</td>
<td>25</td>
</tr>
<tr>
<td>sq-AL</td>
<td>8</td>
</tr>
<tr>
<td>en</td>
<td>19</td>
</tr>
<tr>
<td>de</td>
<td>22</td>
</tr>
<tr>
<td>es</td>
<td>7</td>
</tr>
<tr>
<td>km-KH</td>
<td>6</td>
</tr>
<tr>
<td>th-TH</td>
<td>14</td>
</tr>
<tr>
<td>it</td>
<td>13</td>
</tr>
<tr>
<td>az-AZ</td>
<td>2</td>
</tr>
<tr>
<td>id-ID</td>
<td>8</td>
</tr>
<tr>
<td>fy-NL</td>
<td>13</td>
</tr>
<tr>
<td>fa-IR</td>
<td>33</td>
</tr>
<tr>
<td>af-ZA</td>
<td>8</td>
</tr>
<tr>
<td>ar-SA</td>
<td>4</td>
</tr>
<tr>
<td>cy-GB</td>
<td>0</td>
</tr>
<tr>
<td>gl-ES</td>
<td>11</td>
</tr>
<tr>
<td>ms-MY</td>
<td>3</td>
</tr>
<tr>
<td>ar-JO</td>
<td>1</td>
</tr>
<tr>
<td>es-CH</td>
<td>0</td>
</tr>
<tr>
<td>es-CL</td>
<td>6</td>
</tr>
<tr>
<td>am-HY</td>
<td>1</td>
</tr>
<tr>
<td>hi-IN</td>
<td>5</td>
</tr>
<tr>
<td>vi-VN</td>
<td>4</td>
</tr>
<tr>
<td>en-AU</td>
<td>5</td>
</tr>
<tr>
<td>cz-CZ</td>
<td>1</td>
</tr>
<tr>
<td>he</td>
<td>1</td>
</tr>
<tr>
<td>fa</td>
<td>1</td>
</tr>
<tr>
<td>ur</td>
<td>1</td>
</tr>
<tr>
<td>ja</td>
<td>18</td>
</tr>
<tr>
<td>fr</td>
<td>23</td>
</tr>
<tr>
<td>nl</td>
<td>9</td>
</tr>
<tr>
<td>pl</td>
<td>9</td>
</tr>
<tr>
<td>ru</td>
<td>14</td>
</tr>
<tr>
<td>sk</td>
<td>15</td>
</tr>
<tr>
<td>eu-EU</td>
<td>1</td>
</tr>
<tr>
<td>de-CH</td>
<td>5</td>
</tr>
<tr>
<td>ko</td>
<td>4</td>
</tr>
<tr>
<td>hr</td>
<td>1</td>
</tr>
<tr>
<td>sr-Yu</td>
<td>3</td>
</tr>
<tr>
<td>ga-IE</td>
<td>7</td>
</tr>
<tr>
<td>pt-PR</td>
<td>0</td>
</tr>
<tr>
<td>tr</td>
<td>3</td>
</tr>
<tr>
<td>cs</td>
<td>4</td>
</tr>
<tr>
<td>hu</td>
<td>7</td>
</tr>
<tr>
<td>en-BZ</td>
<td>3</td>
</tr>
<tr>
<td>en-CA</td>
<td>4</td>
</tr>
<tr>
<td>en-IE</td>
<td>3</td>
</tr>
<tr>
<td>en-JM</td>
<td>3</td>
</tr>
<tr>
<td>en-NZ</td>
<td>3</td>
</tr>
<tr>
<td>en-PH</td>
<td>3</td>
</tr>
<tr>
<td>en-TT</td>
<td>3</td>
</tr>
<tr>
<td>en-ZA</td>
<td>3</td>
</tr>
<tr>
<td>en-ZW</td>
<td>3</td>
</tr>
<tr>
<td>es-BO</td>
<td>1</td>
</tr>
<tr>
<td>es-CO</td>
<td>1</td>
</tr>
<tr>
<td>es-CR</td>
<td>1</td>
</tr>
<tr>
<td>es-DO</td>
<td>1</td>
</tr>
<tr>
<td>es-EC</td>
<td>1</td>
</tr>
<tr>
<td>es-SV</td>
<td>1</td>
</tr>
<tr>
<td>es-GT</td>
<td>1</td>
</tr>
<tr>
<td>es-HN</td>
<td>1</td>
</tr>
<tr>
<td>es-NI</td>
<td>1</td>
</tr>
<tr>
<td>es-PA</td>
<td>1</td>
</tr>
<tr>
<td>es-PY</td>
<td>1</td>
</tr>
<tr>
<td>es-PE</td>
<td>1</td>
</tr>
<tr>
<td>es-PR</td>
<td>1</td>
</tr>
<tr>
<td>es-MX</td>
<td>2</td>
</tr>
<tr>
<td>es-UY</td>
<td>1</td>
</tr>
<tr>
<td>es-VE</td>
<td>1</td>
</tr>
<tr>
<td>fr-BE</td>
<td>2</td>
</tr>
<tr>
<td>fr-CA</td>
<td>2</td>
</tr>
<tr>
<td>fr-CH</td>
<td>2</td>
</tr>
<tr>
<td>fr-LU</td>
<td>2</td>
</tr>
<tr>
<td>fr-MC</td>
<td>2</td>
</tr>
<tr>
<td>eu-ES</td>
<td>3</td>
</tr>
<tr>
<td>zw-TH</td>
<td>0</td>
</tr>
<tr>
<td>da-DA</td>
<td>1</td>
</tr>
<tr>
<td>be</td>
<td>1</td>
</tr>
<tr>
<td>eo</td>
<td>1</td>
</tr>
<tr>
<td>ca</td>
<td>7</td>
</tr>
<tr>
<td>pt</td>
<td>2</td>
</tr>
<tr>
<td>ar-DZ</td>
<td>1</td>
</tr>
<tr>
<td>jp-JP</td>
<td>0</td>
</tr>
<tr>
<td>et-EE</td>
<td>2</td>
</tr>
<tr>
<td>nl-BE</td>
<td>1</td>
</tr>
<tr>
<td>eu</td>
<td>1</td>
</tr>
<tr>
<td>en-EN</td>
<td>0</td>
</tr>
<tr>
<td>sr-CS</td>
<td>1</td>
</tr>
<tr>
<td>ua-UA</td>
<td>1</td>
</tr>
<tr>
<td>no-NO</td>
<td>1</td>
</tr>
<tr>
<td>mn-MK</td>
<td>0</td>
</tr>
<tr>
<td>sl-SL</td>
<td>2</td>
</tr>
<tr>
<td>is</td>
<td>2</td>
</tr>
<tr>
<td>nn-NO</td>
<td>1</td>
</tr>
<tr>
<td>lv-LV</td>
<td>0</td>
</tr>
<tr>
<td>uk-AU</td>
<td>1</td>
</tr>
<tr>
<td>ja-JP-mac</td>
<td>2</td>
</tr>
<tr>
<td>ml-IN</td>
<td>1</td>
</tr>
<tr>
<td>wa-BE</td>
<td>1</td>
</tr>
<tr>
<td>is-IS</td>
<td>2</td>
</tr>
<tr>
<td>ca-ES</td>
<td>0</td>
</tr>
<tr>
<td>sv</td>
<td>1</td>
</tr>
<tr>
<td>fr-fR</td>
<td>0</td>
</tr>
<tr>
<td>da</td>
<td>7</td>
</tr>
<tr>
<td>fi</td>
<td>2</td>
</tr>
<tr>
<td>ro</td>
<td>1</td>
</tr>
<tr>
<td>ar-LB</td>
<td>0</td>
</tr>
<tr>
<td>sr-RS</td>
<td>3</td>
</tr>
<tr>
<td>en-UK</td>
<td>2</td>
</tr>
<tr>
<td>es-US</td>
<td>1</td>
</tr>
<tr>
<td>de-LI</td>
<td>1</td>
</tr>
<tr>
<td>de-LU</td>
<td>1</td>
</tr>
<tr>
<td>ko-Kr</td>
<td>1</td>
</tr>
<tr>
<td>no</td>
<td>1</td>
</tr>
<tr>
<td>zh</td>
<td>1</td>
</tr>
<tr>
<td>bg</td>
<td>1</td>
</tr>
<tr>
<td>tl</td>
<td>1</td>
</tr>
<tr>
<td>sr</td>
<td>1</td>
</tr>
<tr>
<td>sq</td>
<td>1</td>
</tr>
<tr>
<td>sl</td>
<td>2</td>
</tr>
<tr>
<td>xultrisLocale</td>
<td>1</td>
</tr>
<tr>
<td>ca-CD</td>
<td>1</td>
</tr>
<tr>
<td>se-SV</td>
<td>1</td>
</tr>
<tr>
<td>mn</td>
<td>0</td>
</tr>
<tr>
<td>mk</td>
<td>1</td>
</tr>
<tr>
<td>pa-IN</td>
<td>0</td>
</tr>
<tr>
<td>ka</td>
<td>1</td>
</tr>
<tr>
<td>lt</td>
<td>1</td>
</tr>
<tr>
<td>uk</td>
<td>2</td>
</tr>
<tr>
<td>ar-AR</td>
<td>1</td>
</tr>
<tr>
<td>he-HL</td>
<td>0</td>
</tr>
<tr>
<td>convertLocale</td>
<td>1</td>
</tr>
</table>
</div>
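<p>The obviously bogus names in the table above (xultrisLocale, convertLocale) are the kind of thing a regular expression can weed out. Here is a minimal sketch; the pattern is my own guess at what a plausible locale name looks like, not an official definition, and <em>is_plausible_locale</em> is a made-up name.</p>

```python
import re

# Assumed shape: a 2-3 letter lowercase language code, optionally followed
# by hyphen-separated subtags, e.g. "en", "pt-BR", "ja-JP-mac".
LOCALE_RE = re.compile(r"^[a-z]{2,3}(?:-[A-Za-z0-9]+)*$")


def is_plausible_locale(name):
    """True if `name` is shaped like a locale code."""
    return LOCALE_RE.match(name) is not None
```

<p>This only checks the shape. Codes that are well-formed but wrong, such as jp-JP or en-UK, still pass and would need a lookup against a list of real language and country codes.</p>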
<p>Some locales have 0 supported extensions. This is because we are only counting the most up-to-date version of each extension, and not counting previous versions which may have supported that locale. A graph for each locale would be unwise; a much wiser choice is to break it down by language.</p>
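<p>Rolling locales up into languages is not just a matter of adding the locale counts together, because an extension that ships both en-US and en-GB should only count once for &#8216;en&#8217;. A sketch of the idea, assuming a mapping from locale name to the set of addon guids that support it (the function name and the data shape are made up for this example):</p>

```python
from collections import defaultdict


def languages_from_locales(locale_to_guids):
    """Collapse locales like "en-US" and "en-GB" into one "en" count."""
    by_language = defaultdict(set)
    for locale, guids in locale_to_guids.items():
        language = locale.split("-")[0]  # "pt-BR" becomes "pt"
        # A set union counts each addon once per language.
        by_language[language].update(guids)
    return dict((lang, len(guids)) for lang, guids in by_language.items())
```

<p>Names without a hyphen, including junk like xultrisLocale, pass through unchanged, which is why they show up again in the language table.</p>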
<p>So which languages are best supported?</p>
<div style="height:300px; overflow:scroll">
<table>
<tr>
<th>Language</th>
<th>Extensions supported</th>
</tr>
<tr>
<td>en</td>
<td>462</td>
</tr>
<tr>
<td>sv</td>
<td>58</td>
</tr>
<tr>
<td>it</td>
<td>202</td>
</tr>
<tr>
<td>de</td>
<td>212</td>
</tr>
<tr>
<td>pl</td>
<td>145</td>
</tr>
<tr>
<td>es</td>
<td>192</td>
</tr>
<tr>
<td>fi</td>
<td>66</td>
</tr>
<tr>
<td>ru</td>
<td>143</td>
</tr>
<tr>
<td>nl</td>
<td>154</td>
</tr>
<tr>
<td>pt</td>
<td>165</td>
</tr>
<tr>
<td>fr</td>
<td>225</td>
</tr>
<tr>
<td>ja</td>
<td>142</td>
</tr>
<tr>
<td>zh</td>
<td>148</td>
</tr>
<tr>
<td>ko</td>
<td>91</td>
</tr>
<tr>
<td>cs</td>
<td>94</td>
</tr>
<tr>
<td>mn</td>
<td>4</td>
</tr>
<tr>
<td>ro</td>
<td>31</td>
</tr>
<tr>
<td>sk</td>
<td>133</td>
</tr>
<tr>
<td>ca</td>
<td>64</td>
</tr>
<tr>
<td>el</td>
<td>44</td>
</tr>
<tr>
<td>ar</td>
<td>21</td>
</tr>
<tr>
<td>uk</td>
<td>64</td>
</tr>
<tr>
<td>sr</td>
<td>19</td>
</tr>
<tr>
<td>bg</td>
<td>29</td>
</tr>
<tr>
<td>hu</td>
<td>91</td>
</tr>
<tr>
<td>hr</td>
<td>65</td>
</tr>
<tr>
<td>da</td>
<td>100</td>
</tr>
<tr>
<td>nb</td>
<td>32</td>
</tr>
<tr>
<td>sl</td>
<td>27</td>
</tr>
<tr>
<td>lt</td>
<td>22</td>
</tr>
<tr>
<td>tr</td>
<td>75</td>
</tr>
<tr>
<td>he</td>
<td>42</td>
</tr>
<tr>
<td>mk</td>
<td>11</td>
</tr>
<tr>
<td>be</td>
<td>26</td>
</tr>
<tr>
<td>sq</td>
<td>9</td>
</tr>
<tr>
<td>km</td>
<td>6</td>
</tr>
<tr>
<td>th</td>
<td>14</td>
</tr>
<tr>
<td>az</td>
<td>2</td>
</tr>
<tr>
<td>id</td>
<td>8</td>
</tr>
<tr>
<td>fy</td>
<td>13</td>
</tr>
<tr>
<td>fa</td>
<td>34</td>
</tr>
<tr>
<td>af</td>
<td>8</td>
</tr>
<tr>
<td>cy</td>
<td>0</td>
</tr>
<tr>
<td>gl</td>
<td>11</td>
</tr>
<tr>
<td>ms</td>
<td>3</td>
</tr>
<tr>
<td>am</td>
<td>1</td>
</tr>
<tr>
<td>hi</td>
<td>5</td>
</tr>
<tr>
<td>vi</td>
<td>4</td>
</tr>
<tr>
<td>cz</td>
<td>1</td>
</tr>
<tr>
<td>ur</td>
<td>1</td>
</tr>
<tr>
<td>eu</td>
<td>5</td>
</tr>
<tr>
<td>ga</td>
<td>7</td>
</tr>
<tr>
<td>zw</td>
<td>0</td>
</tr>
<tr>
<td>eo</td>
<td>1</td>
</tr>
<tr>
<td>jp</td>
<td>0</td>
</tr>
<tr>
<td>et</td>
<td>2</td>
</tr>
<tr>
<td>ua</td>
<td>1</td>
</tr>
<tr>
<td>no</td>
<td>2</td>
</tr>
<tr>
<td>is</td>
<td>4</td>
</tr>
<tr>
<td>nn</td>
<td>1</td>
</tr>
<tr>
<td>lv</td>
<td>0</td>
</tr>
<tr>
<td>ml</td>
<td>1</td>
</tr>
<tr>
<td>wa</td>
<td>1</td>
</tr>
<tr>
<td>tl</td>
<td>1</td>
</tr>
<tr>
<td>xultrisLocale</td>
<td>1</td>
</tr>
<tr>
<td>se</td>
<td>1</td>
</tr>
<tr>
<td>pa</td>
<td>0</td>
</tr>
<tr>
<td>ka</td>
<td>1</td>
</tr>
<tr>
<td>convertLocale</td>
<td>1</td>
</tr>
</table>
</div>
<p>And here is the obligatory graph for those numerically challenged by high school mathematics teachers.</p>
<p><img src="http://www.cesaroliveira.net/images/misc/2008-05-21/addons2.png" alt="top 10 languages for 464 analyzed extensions"/></p>
<p>So what does this lead to? First I need to fix locales. We need to get the vast majority of them. Next, I want to profile all the extensions and not just the first 2500. And then, I want to start looking at web crawlers and learning how to crawl a simple website before unleashing a monster on AMO.</p>
]]></content:encoded>
			<wfw:commentRss>http://tea.cesaroliveira.net/archives/15/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Taking on the WildOn(es)</title>
		<link>http://tea.cesaroliveira.net/archives/9</link>
		<comments>http://tea.cesaroliveira.net/archives/9#comments</comments>
		<pubDate>Thu, 15 May 2008 23:33:37 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
				<category><![CDATA[addons]]></category>
		<category><![CDATA[intern]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[seneca]]></category>
		<category><![CDATA[wildon]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=9</guid>
		<description><![CDATA[I started writing this a week and a half ago, but just finished it today. First day interning at Mozilla. I finally found out what I get to do this summer. I got the OK to blog about it, because you know how secret them Mozilla folks are about their secret in-house project (ie. [...]]]></description>
			<content:encoded><![CDATA[<p><em>I started writing this a week and a half ago, but just finished it today.</em></p>
<p>First day interning at Mozilla. I finally found out what I get to do this summer. I got the OK to blog about it, because you know how secret them Mozilla folks are about their secret in-house projects (ie. <a href="http://starkravingfinkle.org/blog/" onclick="pageTracker._trackPageview('/outgoing/starkravingfinkle.org/blog/?referer=');">What is this guy up to?</a> <img src='http://tea.cesaroliveira.net/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> ).</p>
<p>The actual wiki page was apparently out in the open, but no-one heard about it. It&#8217;s called <a href="http://wiki.mozilla.org/Update:WildOn" onclick="pageTracker._trackPageview('/outgoing/wiki.mozilla.org/Update_WildOn?referer=');">WildOnAddons</a>. While a new name is, <acronym title="In My Opinion">IMO</acronym>, mandatory, it&#8217;s actually a pretty neat idea. There are many great extensions such as Ted&#8217;s <a href="http://ted.mielczarek.org/code/mozilla/extensiondev/" onclick="pageTracker._trackPageview('/outgoing/ted.mielczarek.org/code/mozilla/extensiondev/?referer=');">Extension Developer&#8217;s Extension</a> that aren&#8217;t hosted on AMO. Some other extensions are hosted on AMO, but frequently get updates on their own website well before those updates go public on AMO.</p>
<p>Sometimes, extensions come bundled with packages such as Norton and McAfee. <a href="http://www.google.com/tools/firefox/" onclick="pageTracker._trackPageview('/outgoing/www.google.com/tools/firefox/?referer=');">Google Notebook</a> is one of many Google Labs extensions hosted on their own server.</p>
<p>In short, they&#8217;re hosted everywhere. But that presents a problem: how many are out there, and can we find and index them?</p>
<p>This is actually a lot harder than going on Google and typing <a href="http://www.google.com/search?hl=en&amp;q=filetype%3Axpi&amp;btnG=Google+Search" onclick="pageTracker._trackPageview('/outgoing/www.google.com/search?hl=en_amp_q=filetype_3Axpi_amp_btnG=Google+Search&amp;referer=');">filetype:xpi</a>, because according to those results, <a href="http://www.google.com/search?hl=en&amp;q=filetype%3Axpi+site%3Aaddons.mozilla.org&amp;btnG=Google+Search" onclick="pageTracker._trackPageview('/outgoing/www.google.com/search?hl=en_amp_q=filetype_3Axpi+site_3Aaddons.mozilla.org_amp_btnG=Google+Search&amp;referer=');">AMO only has 78 extensions</a>. In fact, there are <a href="http://www.addonsmirror.net/" onclick="pageTracker._trackPageview('/outgoing/www.addonsmirror.net/?referer=');">several</a> <a href="http://addons.sociz.com/" onclick="pageTracker._trackPageview('/outgoing/addons.sociz.com/?referer=');">repositories</a> <a href="http://en.addons.pl/" onclick="pageTracker._trackPageview('/outgoing/en.addons.pl/?referer=');">of</a> <a href="http://www.foxiewire.com/" onclick="pageTracker._trackPageview('/outgoing/www.foxiewire.com/?referer=');">addons</a> <a href="http://addons.songbirdnest.com/" onclick="pageTracker._trackPageview('/outgoing/addons.songbirdnest.com/?referer=');">each</a> <a href="https://extensions.flock.com/" onclick="pageTracker._trackPageview('/outgoing/extensions.flock.com/?referer=');">catering</a> to a different crowd (yes, we are counting <strong>all</strong> addons). While I don&#8217;t think that AMO can satisfy everyone all the time, this might help us figure out how many extensions are out there and how many are hosted on our servers. Actually figuring this out will take a lot of work, and is not as straightforward as it sounds (eg. all of AMO&#8217;s sandboxed addons require authentication, so a web crawler would have to know about it if we were crawling through the web), but it will be worth it in the end.</p>
<p>I&#8217;ll keep blogging about it under the <a href="http://www.cesaroliveira.net/tea/archives/tag/wildon/feed" onclick="pageTracker._trackPageview('/outgoing/www.cesaroliveira.net/tea/archives/tag/wildon/feed?referer=');">wildon tag RSS feed</a> if you&#8217;re interested in how progress goes.</p>
]]></content:encoded>
			<wfw:commentRss>http://tea.cesaroliveira.net/archives/9/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
