Reading Twitter Data with C# and LINQ

I wanted to read Twitter.com search results (tweets) using C#.  I started by deciding that a tweet looks something like this:

    public class Tweet
    {
        public string Id { get; set; }
        public DateTime Published { get; set; }
        public string Link { get; set; }
        public string Title { get; set; }
        public Author Author { get; set; }
    }
    public class Author
    {
        public string Name { get; set; }
        public string Uri { get; set; }
    }

Next I needed a way to get the data from the web and into a collection of Tweet objects.  I defined a TweetStream type that would handle the following:

  1. Downloading the data from Twitter.com
  2. Deserializing the Tweet data
  3. Remembering the high-watermark so that each refresh returns only new tweets.

I stubbed out this:

    public class TweetStream
    {
        private string m_refreshUri;
        List<Tweet> m_tweets;
        public TweetStream(string queryUri)
        {
            m_refreshUri = queryUri;
            m_tweets = new List<Tweet>();
        }
        public List<Tweet> Tweets
        {
            get
            {
                return m_tweets;
            }
        }
        public void Refresh()
        {
            // TODO - download tweet information from Twitter.com,
            // populate the tweet collection with the new
            // tweet data, and store the new high-watermark
        }
    }

Now I just need to fill in Refresh.

Download the results from Twitter

The LINQ to XML XDocument type makes this very easy.  What the code does not make obvious is that I am loading ATOM data from the Twitter ATOM search API.  The result stream looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:google="http://base.google.com/ns/1.0" xml:lang="en-US" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns="http://www.w3.org/2005/Atom" xmlns:twitter="http://api.twitter.com/">
  <id>tag:search.twitter.com,2005:search/TFS</id>
  <link type="text/html" rel="alternate" href="https://search.twitter.com/search?q=TFS"/>
  <link type="application/atom+xml" rel="self" href="https://search.twitter.com/search.atom?q=TFS"/>
  <title>TFS - Twitter Search</title>
  <link type="application/opensearchdescription+xml" rel="search" href="https://search.twitter.com/opensearch.xml"/>
  <link type="application/atom+xml" rel="refresh" href="https://search.twitter.com/search.atom?q=TFS&amp;since_id=1152416895"/>
  <twitter:warning>adjusted since_id, it was older than allowed</twitter:warning>
  <updated>2009-01-27T16:13:14Z</updated>
  <openSearch:itemsPerPage>15</openSearch:itemsPerPage>
  <openSearch:language>en</openSearch:language>
  <link type="application/atom+xml" rel="next" href="https://search.twitter.com/search.atom?max_id=1152416895&amp;page=2&amp;q=TFS"/>
  <entry>
    <id>tag:search.twitter.com,2005:1152416895</id>
    <published>2009-01-27T16:13:14Z</published>
    <link type="text/html" rel="alternate" href="https://twitter.com/bubbafat/statuses/1152416895"/>
    <title>All day TFS training.</title>
    <content type="html">All day &lt;b&gt;TFS&lt;/b&gt; training.</content>
    <updated>2009-01-27T16:13:14Z</updated>
    <link type="image/png" rel="image" href="https://static.twitter.com/images/default_profile_normal.png"/>
    <author>
      <name>bubbafat (Robert Horvick)</name>
      <uri>https://twitter.com/bubbafat</uri>
    </author>
  </entry>
  <entry>...</entry>
  <entry>...</entry>
  <entry>...</entry>
</feed>

And we load it like this:

 XDocument feed = XDocument.Load(m_refreshUri);

Populate the Tweet Collection

With the document loaded we can now start pulling out the data we want.  First we need to define the ATOM namespace:

 XNamespace atomNS = "http://www.w3.org/2005/Atom";
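
To see why the namespace matters, here is a minimal sketch that counts the entries in a small in-memory feed, once without a namespace and once with the one declared on the root element. NamespaceDemo, CountEntries and the sample feed are made up for illustration, and the sketch assumes the standard ATOM namespace URI.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Sample data only: a cut-down feed in the same shape as the one above.
    public const string SampleFeed =
        @"<feed xmlns=""http://www.w3.org/2005/Atom"">
            <entry><title>All day TFS training.</title></entry>
            <entry><title>Another tweet</title></entry>
          </feed>";

    // Counts the "entry" elements that live in the given namespace.
    public static int CountEntries(XDocument feed, XNamespace ns)
    {
        return feed.Descendants(ns + "entry").Count();
    }

    public static void Main()
    {
        XDocument feed = XDocument.Parse(SampleFeed);

        // The default xmlns on <feed> applies to every child element,
        // so a name without a namespace matches nothing.
        Console.WriteLine(CountEntries(feed, XNamespace.None));

        // Taking the namespace from the root element avoids hard-coding it.
        Console.WriteLine(CountEntries(feed, feed.Root.Name.Namespace));
    }
}
```

The first call finds no entries and the second finds both, which is the usual reason a LINQ to XML query over a feed comes back empty.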

Next we need to iterate over every “entry” element in the feed and extract the data we care about.  LINQ makes this a breeze (though perhaps a bit tough to debug and grok at first).

 m_tweets = (from tweet in feed.Descendants(atomNS + "entry")
    select new Tweet
    {
        Title = (string)tweet.Element(atomNS + "title"),
        Published = DateTime.Parse((string)tweet.Element(atomNS + "published")),
        Id = (string)tweet.Element(atomNS + "id"),
        Link = tweet.Elements(atomNS + "link")
            .Where(link => (string)link.Attribute("rel") == "alternate")
            .Select(link => (string)link.Attribute("href"))
            .First(),
        Author = (from author in tweet.Descendants(atomNS + "author")
            select new Author
            {
                Name = (string)author.Element(atomNS + "name"),
                Uri = (string)author.Element(atomNS + "uri"),
            }).First(),
    }).ToList<Tweet>();
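
One detail the query above glosses over: DateTime.Parse converts a Z-suffixed (UTC) timestamp to local time by default. If you would rather keep the offset, a minimal sketch is to use the explicit XElement to DateTimeOffset conversion instead. PublishedDemo and ReadPublished are names made up for this illustration, and switching Tweet.Published to DateTimeOffset is my assumption, not something the code in this post does.

```csharp
using System;
using System.Xml.Linq;

public static class PublishedDemo
{
    // Reads an ATOM timestamp element through the explicit
    // XElement -> DateTimeOffset conversion, which keeps the UTC offset.
    public static DateTimeOffset ReadPublished(XElement published)
    {
        return (DateTimeOffset)published;
    }

    public static void Main()
    {
        // Sample data only: the timestamp from the feed shown earlier.
        XElement published = new XElement("published", "2009-01-27T16:13:14Z");
        DateTimeOffset when = ReadPublished(published);

        Console.WriteLine(when.Offset);                      // offset from UTC
        Console.WriteLine(when.UtcDateTime.ToString("s"));   // sortable UTC time
    }
}
```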

Store the High-watermark

Finally we want to store the high-watermark so we don’t ask for the same data twice.  This is just a matter of reading the “refresh” link from the top-level feed elements.

 m_refreshUri = feed.Descendants(atomNS + "link")
    .Where(link => link.Attribute("rel").Value == "refresh")
    .Select(link => link.Attribute("href").Value)
    .First();
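
The three pieces can be combined into one Refresh method. What follows is a sketch under some assumptions rather than the finished class: Tweet is cut down to two properties, the Apply helper and the RefreshUri accessor are hypothetical additions so the parsing can be run against an in-memory document, and FirstOrDefault is swapped in for First so a feed without a “refresh” link keeps the previous URI instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Tweet
{
    public string Id { get; set; }
    public string Title { get; set; }
}

public class TweetStream
{
    private static readonly XNamespace s_atomNS = "http://www.w3.org/2005/Atom";
    private string m_refreshUri;
    private List<Tweet> m_tweets = new List<Tweet>();

    public TweetStream(string queryUri) { m_refreshUri = queryUri; }

    public List<Tweet> Tweets { get { return m_tweets; } }

    // Hypothetical accessor, added only so the sketch can show the watermark.
    public string RefreshUri { get { return m_refreshUri; } }

    public void Refresh()
    {
        // Download the feed, then hand it to the parsing step.
        Apply(XDocument.Load(m_refreshUri));
    }

    // Hypothetical helper: parsing is split from downloading so it can be
    // exercised without touching the network.
    public void Apply(XDocument feed)
    {
        m_tweets = (from entry in feed.Descendants(s_atomNS + "entry")
                    select new Tweet
                    {
                        Id = (string)entry.Element(s_atomNS + "id"),
                        Title = (string)entry.Element(s_atomNS + "title"),
                    }).ToList();

        // Look only at the feed's own links, and keep the old URI when the
        // feed carries no "refresh" link.
        string refresh = feed.Root.Elements(s_atomNS + "link")
            .Where(link => (string)link.Attribute("rel") == "refresh")
            .Select(link => (string)link.Attribute("href"))
            .FirstOrDefault();
        if (refresh != null)
        {
            m_refreshUri = refresh;
        }
    }
}

public static class Program
{
    // Sample data only, cut down from the feed shown earlier.
    public const string SampleFeed = @"<feed xmlns=""http://www.w3.org/2005/Atom"">
  <link rel=""refresh"" href=""https://search.twitter.com/search.atom?q=TFS&amp;since_id=1152416895""/>
  <entry>
    <id>tag:search.twitter.com,2005:1152416895</id>
    <title>All day TFS training.</title>
  </entry>
</feed>";

    public static void Main()
    {
        TweetStream stream = new TweetStream("https://search.twitter.com/search.atom?q=TFS");
        stream.Apply(XDocument.Parse(SampleFeed));
        Console.WriteLine(stream.Tweets.Count);
        Console.WriteLine(stream.RefreshUri);
    }
}
```

Note that each refresh replaces the Tweets collection rather than appending to it, which is what the host program further down relies on when it treats every tweet in the collection as new.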

Putting it all Together

Now to put it all together you can create a collection of TweetStreams and refresh them in a loop, printing out the tweets as you find them.

 using System;
using System.Threading;

using TwitterLib;
namespace TestHost
{
    class Program
    {
        static void Main(string[] args)
        {
            int totalTweets = 0;
            TweetStream[] tweetStreams = new TweetStream[] {
                new TweetStream("https://search.twitter.com/search.atom?q=TFS"),
                new TweetStream("https://search.twitter.com/search.atom?q=Team+Foundation+Server"),
                new TweetStream("https://search.twitter.com/search.atom?q=TFS2008"),
                new TweetStream("https://search.twitter.com/search.atom?q=TFS2005"),
            };
            while (true)
            {
                int currentTweets = 0;
                bool newTweets = false;
                foreach (TweetStream stream in tweetStreams)
                {
                    stream.Refresh();
                    foreach(Tweet tweet in stream.Tweets)
                    {
                        Console.WriteLine("{0}: {1}",
                            tweet.Author.TwitterId,
                            tweet.Title);
                        newTweets = true;
                        totalTweets++;
                        currentTweets++;
                    }
                }
                if (newTweets)
                {
                    Console.WriteLine("Loaded {0} more tweets", currentTweets);
                    Console.WriteLine("Loaded {0} total tweets", totalTweets);
                }
                else
                {
                    Console.WriteLine("No new tweets.");
                }
                DateTime nextCheck = DateTime.Now.AddMinutes(5);
                Console.WriteLine("Will check again at {0}", nextCheck.ToShortTimeString());
                Thread.Sleep(TimeSpan.FromMinutes(5));
            }
        }
    }
}

You may have noticed I used a property (Author.TwitterId) that is not in the Author class shown earlier.  It’s just a simple Regex over Name:

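 // Requires: using System.Text.RegularExpressions;
// Goes in the Author class.  Name looks like "bubbafat (Robert Horvick)".
public string TwitterId
{
    get
    {
        // Value is an empty string when Name does not match the pattern.
        return s_idParser.Match(Name).Groups["twitterid"].Value;
    }
}
// \S+ limits the id to the first whitespace-free token, so a display
// name that itself contains parentheses cannot leak into it.
private static readonly Regex s_idParser =
    new Regex(@"^(?<twitterid>\S+)\s+\((?<displayname>.*)\)");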
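
As a quick check of how that pattern behaves, here is a small standalone sketch. TwitterIdDemo and ParseTwitterId are made-up names, the inputs are sample strings, and the pattern is a slightly stricter variant (\S+ for the id, plus a trailing $) that I am assuming is acceptable because Twitter ids contain no whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwitterIdDemo
{
    // Assumed variant of the pattern: the id is a single token without
    // whitespace, followed by the display name in parentheses.
    private static readonly Regex s_idParser =
        new Regex(@"^(?<twitterid>\S+)\s+\((?<displayname>.*)\)$");

    public static string ParseTwitterId(string name)
    {
        // A failed match yields an empty group, so Value is "" rather than null.
        return s_idParser.Match(name).Groups["twitterid"].Value;
    }

    public static void Main()
    {
        // Author name in the "id (display name)" shape used by the feed.
        Console.WriteLine(ParseTwitterId("bubbafat (Robert Horvick)"));

        // A name without the parenthesised part does not match at all.
        Console.WriteLine(ParseTwitterId("no parentheses here").Length);
    }
}
```

Because a failed match returns an empty string, callers that print the id should be ready for a blank value when an author name does not follow the expected shape.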

What’s Next?

To be clear this is not production-ready code.  It is very optimistic, quite inefficient and a bit ugly.  Also it doesn’t do anything terribly interesting.

But in less than 100 lines you have an ugly, slow, optimistic Twitter feed reader (which could read any ATOM stream with just a few additional lines).
