Reading Twitter Data with C# and LINQ

01/27/2009

I wanted to read Twitter.com search results (tweets) using C#. I started by deciding that a tweet looks something like this:

    public class Tweet
    {
        public string Id { get; set; }
        public DateTime Published { get; set; }
        public string Link { get; set; }
        public string Title { get; set; }
        public Author Author { get; set; }
    }

    public class Author
    {
        public string Name { get; set; }
        public string Uri { get; set; }
    }

Next I needed a way to get the data from the web and into a collection of Tweet objects. I defined a TweetStream type that would handle the following:

Downloading the data from Twitter.com
Deserializing the Tweet data
Remembering the high-watermark so that we always only get new tweets.

I stubbed out this:

    public class TweetStream
    {
        private string m_refreshUri;
        List<Tweet> m_tweets;

        public TweetStream(string queryUri)
        {
            m_refreshUri = queryUri;
            m_tweets = new List<Tweet>();
        }

        public List<Tweet> Tweets
        {
            get
            {
                return m_tweets;
            }
        }

        public void Refresh()
        {
            // TODO - download tweet information from Twitter.com,
            // populate the tweet collection with the new
            // tweet data, and store the new high-watermark
        }
    }

Now I just need to fill in Refresh.

Download the results from Twitter

The LINQ to XML XDocument type makes this very easy. What may not be obvious is that I am loading Atom data using the Twitter Atom search API. The result stream will look something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <feed xmlns:google="http://base.google.com/ns/1.0" xml:lang="en-US" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns="http://www.w3.org/2005/Atom" xmlns:twitter="http://api.twitter.com/">
      <id>tag:search.twitter.com,2005:search/TFS</id>
      <link type="text/html" rel="alternate" href="https://search.twitter.com/search?q=TFS"/>
      <link type="application/atom+xml" rel="self" href="https://search.twitter.com/search.atom?q=TFS"/>
      <title>TFS - Twitter Search</title>
      <link type="application/opensearchdescription+xml" rel="search" href="https://search.twitter.com/opensearch.xml"/>
      <link type="application/atom+xml" rel="refresh" href="https://search.twitter.com/search.atom?q=TFS&amp;since_id=1152416895"/>
      <twitter:warning>adjusted since_id, it was older than allowed</twitter:warning>
      <updated>2009-01-27T16:13:14Z</updated>
      <openSearch:itemsPerPage>15</openSearch:itemsPerPage>
      <openSearch:language>en</openSearch:language>
      <link type="application/atom+xml" rel="next" href="https://search.twitter.com/search.atom?max_id=1152416895&amp;page=2&amp;q=TFS"/>
      <entry>
        <id>tag:search.twitter.com,2005:1152416895</id>
        <published>2009-01-27T16:13:14Z</published>
        <link type="text/html" rel="alternate" href="https://twitter.com/bubbafat/statuses/1152416895"/>
        <title>All day TFS training.</title>
        <content type="html">All day &lt;b&gt;TFS&lt;/b&gt; training.</content>
        <updated>2009-01-27T16:13:14Z</updated>
        <link type="image/png" rel="image" href="https://static.twitter.com/images/default_profile_normal.png"/>
        <author>
          <name>bubbafat (Robert Horvick)</name>
          <uri>https://twitter.com/bubbafat</uri>
        </author>
      </entry>
      <entry>...</entry>
      <entry>...</entry>
      <entry>...</entry>
    </feed>

And we load it like this:

 XDocument feed = XDocument.Load(m_refreshUri);

Populate the Tweet Collection

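With the document loaded, we can now start pulling out the data we want. First we need to define the Atom namespace:

    // The Atom namespace name uses the http scheme. Namespace names are
    // compared as literal strings, so it must match the feed's xmlns exactly.
    XNamespace atomNS = "http://www.w3.org/2005/Atom";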

Next we need to iterate over every “entry” element in the feed and extract the data we care about. LINQ makes this a breeze (though perhaps a bit tough to debug and grok at first).

    m_tweets = (from tweet in feed.Descendants(atomNS + "entry")
                select new Tweet
                {
                    Title = (string)tweet.Element(atomNS + "title"),
                    Published = DateTime.Parse((string)tweet.Element(atomNS + "published")),
                    Id = (string)tweet.Element(atomNS + "id"),
                    Link = tweet.Elements(atomNS + "link")
                        .Where(link => (string)link.Attribute("rel") == "alternate")
                        .Select(link => (string)link.Attribute("href"))
                        .First(),
                    Author = (from author in tweet.Descendants(atomNS + "author")
                              select new Author
                              {
                                  Name = (string)author.Element(atomNS + "name"),
                                  Uri = (string)author.Element(atomNS + "uri"),
                              }).First(),
                }).ToList();
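
To try the query without calling Twitter, here is a minimal, self-contained sketch that runs the same kind of LINQ to XML query against an in-memory feed. The names EntrySummary, AtomParsingSketch, ParseEntries, CountUnqualifiedEntries and SampleFeed are made up for this illustration and are not part of the article's TweetStream code. The sketch also shows why the namespace matters: asking for "entry" without the Atom namespace finds nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, trimmed-down stand-in for the article's Tweet type.
public class EntrySummary
{
    public string Title { get; set; }
    public string Link { get; set; }
    public string AuthorName { get; set; }
}

public static class AtomParsingSketch
{
    // Atom namespace name (note the http scheme).
    private static readonly XNamespace AtomNS = "http://www.w3.org/2005/Atom";

    // A cut-down copy of the sample feed, held in memory for the demo.
    public const string SampleFeed =
@"<feed xmlns=""http://www.w3.org/2005/Atom"">
  <entry>
    <id>tag:search.twitter.com,2005:1152416895</id>
    <link type=""text/html"" rel=""alternate"" href=""https://twitter.com/bubbafat/statuses/1152416895""/>
    <title>All day TFS training.</title>
    <link type=""image/png"" rel=""image"" href=""https://static.twitter.com/images/default_profile_normal.png""/>
    <author>
      <name>bubbafat (Robert Horvick)</name>
      <uri>https://twitter.com/bubbafat</uri>
    </author>
  </entry>
</feed>";

    // Projects every namespace-qualified <entry> into an EntrySummary.
    public static List<EntrySummary> ParseEntries(string atomXml)
    {
        XDocument feed = XDocument.Parse(atomXml);
        return (from entry in feed.Descendants(AtomNS + "entry")
                select new EntrySummary
                {
                    Title = (string)entry.Element(AtomNS + "title"),
                    // Like the article's code, this throws if no "alternate" link exists.
                    Link = entry.Elements(AtomNS + "link")
                        .Where(link => (string)link.Attribute("rel") == "alternate")
                        .Select(link => (string)link.Attribute("href"))
                        .First(),
                    // Null when the entry has no author/name element.
                    AuthorName = (string)entry.Elements(AtomNS + "author")
                        .Elements(AtomNS + "name")
                        .FirstOrDefault(),
                }).ToList();
    }

    // Looks for <entry> with no namespace; returns 0 for a real Atom feed.
    public static int CountUnqualifiedEntries(string atomXml)
    {
        return XDocument.Parse(atomXml).Descendants("entry").Count();
    }

    public static void Main()
    {
        foreach (EntrySummary e in ParseEntries(SampleFeed))
        {
            Console.WriteLine("{0}: {1} ({2})", e.AuthorName, e.Title, e.Link);
        }
        Console.WriteLine("Unqualified matches: {0}", CountUnqualifiedEntries(SampleFeed));
    }
}
```

If a query like this comes back empty against a live feed, a mismatched namespace string is the first thing worth checking.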

Store the High-watermark

Finally, we want to store the high-watermark so we don’t ask for the same data twice. This is just a matter of reading the “refresh” link from the top-level feed elements and using it as the URI for the next request.

    m_refreshUri = feed.Descendants(atomNS + "link")
        .Where(link => link.Attribute("rel").Value == "refresh")
        .Select(link => link.Attribute("href").Value)
        .First();
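
As a slightly more forgiving variation on the refresh-link lookup (a sketch, not the article's implementation), the following reads only the feed's direct child links and keeps the current URI when no "refresh" link is present. RefreshLinkSketch and NextRefreshUri are hypothetical names introduced for this example. Note that the XML parser has already unescaped the attribute value, so the `&amp;` in the feed comes back as a plain `&`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RefreshLinkSketch
{
    private static readonly XNamespace AtomNS = "http://www.w3.org/2005/Atom";

    // Returns the href of the top-level rel="refresh" link,
    // or currentUri when the feed does not contain one.
    public static string NextRefreshUri(XDocument feed, string currentUri)
    {
        string refresh = feed.Root
            .Elements(AtomNS + "link")   // direct children only, not links inside entries
            .Where(link => (string)link.Attribute("rel") == "refresh")
            .Select(link => (string)link.Attribute("href"))
            .FirstOrDefault();

        return refresh ?? currentUri;
    }

    public static void Main()
    {
        XDocument feed = XDocument.Parse(
@"<feed xmlns=""http://www.w3.org/2005/Atom"">
  <link type=""application/atom+xml"" rel=""self"" href=""https://search.twitter.com/search.atom?q=TFS""/>
  <link type=""application/atom+xml"" rel=""refresh"" href=""https://search.twitter.com/search.atom?q=TFS&amp;since_id=1152416895""/>
</feed>");

        // Prints: https://search.twitter.com/search.atom?q=TFS&since_id=1152416895
        Console.WriteLine(NextRefreshUri(feed, "https://search.twitter.com/search.atom?q=TFS"));
    }
}
```

Falling back to the current URI means a feed without a refresh link simply repeats the previous query instead of throwing from First().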

Putting it all Together

Now, to put it all together, you can create a collection of TweetStream objects and refresh them in a loop, printing out the tweets as you find them.

    using System;
    using System.Threading;
    using TwitterLib;

    namespace TestHost
    {
        class Program
        {
            static void Main(string[] args)
            {
                int totalTweets = 0;

                TweetStream[] tweetStreams = new TweetStream[] {
                    new TweetStream("https://search.twitter.com/search.atom?q=TFS"),
                    new TweetStream("https://search.twitter.com/search.atom?q=Team+Foundation+Server"),
                    new TweetStream("https://search.twitter.com/search.atom?q=TFS2008"),
                    new TweetStream("https://search.twitter.com/search.atom?q=TFS2005"),
                };

                while (true)
                {
                    int currentTweets = 0;
                    bool newTweets = false;

                    foreach (TweetStream stream in tweetStreams)
                    {
                        stream.Refresh();

                        foreach (Tweet tweet in stream.Tweets)
                        {
                            Console.WriteLine("{0}: {1}",
                                tweet.Author.TwitterId,
                                tweet.Title);

                            newTweets = true;
                            totalTweets++;
                            currentTweets++;
                        }
                    }

                    if (newTweets)
                    {
                        Console.WriteLine("Loaded {0} more tweets", currentTweets);
                        Console.WriteLine("Loaded {0} total tweets", totalTweets);
                    }
                    else
                    {
                        Console.WriteLine("No new tweets.");
                    }

                    DateTime nextCheck = DateTime.Now.AddMinutes(5);
                    Console.WriteLine("Will check again at {0}", nextCheck.ToShortTimeString());
                    Thread.Sleep(TimeSpan.FromMinutes(5));
                }
            }
        }
    }

You may have noticed I used a property (Author.TwitterId) that is not in the code – it’s just a simple Regex:

    // Member of the Author class; requires: using System.Text.RegularExpressions;
    public string TwitterId
    {
        get
        {
            return s_idParser.Match(Name).Groups["twitterid"].Value;
        }
    }

    private static Regex s_idParser =
        new Regex(@"^(?<twitterid>.*)\s+\((?<displayname>.*)\)");
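
To see what the pattern captures, here is a small standalone sketch that applies the same regular expression to the author name from the sample feed. TwitterIdSketch and ExtractTwitterId are hypothetical names for this illustration, and returning the whole name when the pattern does not match is an assumption of the sketch; the property above returns an empty string in that case.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwitterIdSketch
{
    // Same pattern as the article: "<twitterid> (<displayname>)".
    private static readonly Regex s_idParser =
        new Regex(@"^(?<twitterid>.*)\s+\((?<displayname>.*)\)");

    // Returns the part before " (display name)". If the name does not
    // follow that shape, the whole name is returned (sketch-only fallback).
    public static string ExtractTwitterId(string authorName)
    {
        Match match = s_idParser.Match(authorName);
        return match.Success ? match.Groups["twitterid"].Value : authorName;
    }

    public static void Main()
    {
        // Prints: bubbafat
        Console.WriteLine(ExtractTwitterId("bubbafat (Robert Horvick)"));

        // No parenthesised display name, so the fallback returns the input.
        // Prints: bubbafat
        Console.WriteLine(ExtractTwitterId("bubbafat"));
    }
}
```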

What’s Next?

To be clear this is not production-ready code. It is very optimistic, quite inefficient and a bit ugly. Also it doesn’t do anything terribly interesting.

But in fewer than 100 lines you have an ugly, slow, optimistic Twitter feed reader (which could read any Atom feed with just a few additional lines).

Comments

Anonymous
March 02, 2009
My pal Lieven and I are preparing some cool demos to show at the Belgian TechDays SharePoint Preconference
