How to make an HTTP Post with the Requests Package in Python 3.4

python 3, HTTP Post, Ipython Notebook, Requests

A look from Ipython Notebook inverted.

Twitter @CodeDocta

Making an HTTP POST Request

 

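import requests
from pprint import pprint


# A local copy of httpbin running on port 5000
URL = 'http://127.0.0.1:5000/post'

REFERER = 'http://127.0.0.1:5000/forms/post'
UA = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0'

# Browser-like headers. Header values must be strings ('DNT': '1', not 1);
# newer Requests versions reject non-string header values.
HEADERS = {
    'Host': 'httpbin.org',  # overrides the Host header; httpbin echoes it back
    'Referer': REFERER,
    'User-Agent': UA,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',
    # Requests sets this itself when data= is a dict; shown here for clarity
    'Content-Type': 'application/x-www-form-urlencoded'
}

# Form fields, sent form-encoded in the request body
PARAMS = {
    'custname': 'Nick',
    'custtel': '777-867-5309',
    'custemail': 'noob@lala.com',
    'size': 'medium',
    'topping': 'onion',
    'delivery': '18:20',
    'comments': 'I woutld like extra peppers please.'
}

# Placeholder: put a real host:port here before using PROXIES
PROXY = 'proxy goes here'
PROXIES = {
    "http": "http://" + PROXY,
    "https": "http://" + PROXY,
}

# proxies=None: the PROXIES dict is not used for this request
resp = requests.post(URL, data=PARAMS, headers=HEADERS, proxies=None)
resp.close()
resp.status_code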


200

Cool, we have our 200.

A 200 status code means the server accepted the request, so now we can do something else, like save all the inputs to a database or file.
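Here is one way to save the inputs to a file. This is a minimal sketch that assumes the body is httpbin-style JSON, where the submitted fields come back under the "form" key. `body` is a trimmed stand-in for resp.text, and the file name is hypothetical.

```python
import json
import os
import tempfile

# Assumption: `body` stands in for resp.text from the POST above (trimmed);
# httpbin echoes the submitted form fields back under the "form" key.
body = '{"form": {"custname": "Nick", "size": "medium", "topping": "onion"}}'


def save_form(body_text, path):
    """Parse an httpbin-style JSON body and write its "form" dict to `path`."""
    form = json.loads(body_text)["form"]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(form, fh, indent=2, sort_keys=True)
    return form


# Hypothetical output file in the system temp directory
path = os.path.join(tempfile.gettempdir(), "pizza_order.json")
saved = save_form(body, path)
print(saved["custname"])  # Nick
```

With a live response, you can skip json.loads and use resp.json()["form"] directly.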

 

if resp.status_code == 200:
    pprint(resp.text)


('{\n'
 '  "args": {},\n'
 '  "data": "",\n'
 '  "files": {},\n'
 '  "form": {\n'
 '    "comments": "I woutld like extra peppers please.",\n'
 '    "custemail": "noob@lala.com",\n'
 '    "custname": "Nick",\n'
 '    "custtel": "777-867-5309",\n'
 '    "delivery": "18:20",\n'
 '    "size": "medium",\n'
 '    "topping": "onion"\n'
 '  },\n'
 '  "headers": {\n'
 '    "Accept": '
 '"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",\n'
 '    "Accept-Encoding": "gzip, deflate",\n'
 '    "Accept-Language": "en-US,en;q=0.5",\n'
 '    "Connection": "keep-alive",\n'
 '    "Content-Length": "148",\n'
 '    "Content-Type": "application/x-www-form-urlencoded",\n'
 '    "Dnt": "1",\n'
 '    "Host": "httpbin.org",\n'
 '    "Referer": "http://127.0.0.1:5000/forms/post",\n'
 '    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) '
 'Gecko/20100101 Firefox/37.0"\n'
 '  },\n'
 '  "json": null,\n'
 '  "origin": "127.0.0.1",\n'
 '  "url": "http://httpbin.org/post"\n'
 '}\n')

Without pprint it looks a little messy; try it on a real webpage.

pprint = pretty print
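pprint above is formatting one long string. Because httpbin answers in JSON, you can also parse the text into a real dict first, and pprint will then lay out the nested structure. This is a sketch on a trimmed, made-up copy of the body shown above. For a live response, resp.json() does roughly the same parsing (and raises an error if the body is not valid JSON).

```python
import json
from pprint import pprint

# Assumption: a trimmed copy of the httpbin body printed above.
body = '{"args": {}, "form": {"size": "medium", "custname": "Nick"}, "json": null}'

data = json.loads(body)          # roughly what resp.json() gives you
pprint(data)                     # a dict now, so pprint can format it
print(data["form"]["custname"])  # Nick
```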

Well, that’s it…

Ok, so I lied. Let’s do some other cool stuff.

An if check on the status code is not the way to go here. If the request fails outright, there is no response to check, so we need some exception handling.

 

PARAMS = {
    'custname': 'Nick',
    'custtel': '777-867-5309',
    'custemail': 'noob@lala.com',
    'size': 'medium',
    'topping': 'mushroom',
    'delivery': '18:20',
    'comments': 'can you make it free?'
}

try:
    r = requests.post("http://httpbin.org/pos", data=PARAMS, proxies=PROXIES)
    print(r.text)
    r.close()  # close inside try: if the post fails, r is never assigned

except Exception as postError:
    print('AHHH... Your end of the world message!!!')
    print(postError)


AHHH... Your end of the world message!!!
HTTPConnectionPool(host='proxy goes here', port=80): Max retries exceeded with url: http://httpbin.org/pos (Caused by ProxyError('Cannot connect to proxy.', gaierror(11004, 'getaddrinfo failed')))

I suggest you read the Python documentation on exception handling closely. There are many exception types, and Exception is the catch-all. At some point I will do a tutorial on this topic.

 

PARAMS = {
    'custname': 'Nick',
    'custtel': '888-867-5309',
    'custemail': 'noob@lala.com',
    'size': 'medium',
    'topping': 'mushroom',
    'delivery': '18:20',
    'comments': 'can you make it free?'
}

try:
    r = requests.post("http://127.0.0.1:5000/post", data=PARAMS, proxies=PROXIES, timeout=5)
    print(r.text)
    r.close()  # close inside try: if the post fails, r is never assigned

except Exception as postError:
    print('You can put whatever on top of the real error.')
    print(postError)


You can put whatever on top of the real error.
HTTPConnectionPool(host='proxy goes here', port=80): Max retries exceeded with url: http://127.0.0.1:5000/post (Caused by ProxyError('Cannot connect to proxy.', gaierror(11004, 'getaddrinfo failed')))

As you can see, I did not change the proxy to a real one, but the error is quite telling.

You can see some defaults in there, like the port (80) and max retries, and I snuck a timeout in on you. There are two timeouts: one for connecting to the server and one for reading the response. A single number such as timeout=5 sets both, and a tuple such as timeout=(3.05, 27) sets them separately. Now that you are aware of them, you can read more in the Requests documentation. It is well written, so don’t be scared.
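To see the read timeout by itself, here is an offline sketch. A local listening socket stands in for a hypothetical server that accepts the connection but never answers. The connect phase succeeds, so only the read timeout in the (connect, read) tuple can fire.

```python
import socket

import requests

# A listening socket that never replies: the OS completes the TCP handshake
# from the backlog, but no HTTP response ever arrives.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))   # port 0 = let the OS pick a free port
silent.listen(1)
url = "http://127.0.0.1:{}/post".format(silent.getsockname()[1])

try:
    # timeout=(connect seconds, read seconds)
    requests.post(url, data={"custname": "Nick"}, timeout=(3.05, 0.5))
    outcome = "answered"
except requests.exceptions.ConnectTimeout:
    outcome = "connect timeout"
except requests.exceptions.ReadTimeout:
    outcome = "read timeout"
except requests.exceptions.RequestException as err:
    outcome = "other error: {}".format(err)
finally:
    silent.close()

print(outcome)  # read timeout
```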

Timeouts http://docs.python-requests.org/en/latest/user/advanced/#timeouts and http://docs.python-requests.org/en/latest/user/quickstart/#timeouts

Requests has its own exception handling too http://docs.python-requests.org/en/latest/user/quickstart/#errors-and-exceptions
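Here is a sketch of catching the Requests exceptions by type instead of using the catch-all Exception. The helper name post_order is hypothetical. The proxy failures shown above are ProxyError, which is a subclass of ConnectionError. Timeout is checked first because ConnectTimeout is a subclass of both Timeout and ConnectionError. To stay offline, the sketch posts to a localhost port that nothing listens on, so the connection is refused.

```python
import socket

import requests


def unused_local_url():
    """Return a localhost URL on a port that nothing is listening on."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        port = s.getsockname()[1]
    # the socket is closed here, so connecting to `port` should be refused
    return "http://127.0.0.1:{}/post".format(port)


def post_order(url, params):
    """POST form data; return (response, None) or (None, short error label)."""
    try:
        r = requests.post(url, data=params, timeout=5)
        r.raise_for_status()    # turn 4xx/5xx status codes into HTTPError
        return r, None
    except requests.exceptions.Timeout:
        return None, "timeout"
    except requests.exceptions.ConnectionError:   # ProxyError lands here too
        return None, "connection"
    except requests.exceptions.HTTPError:
        return None, "http status"
    except requests.exceptions.RequestException:  # base class of all of these
        return None, "other"


resp, problem = post_order(unused_local_url(), {"custname": "Nick"})
print(resp, problem)  # None connection
```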

Let’s see how to unpack a list into variables. I do this because named variables are easier to read than list indexes, so when I debug I don't have to wonder whether I grabbed the correct list item.

 

pList = ['77.88.45.80:8080', 'Jon Doe', '555-867-5309', 'Jon.Doe@Amail.com', 'large', 'mushroom', '12:00', 'Make sure that coke is a diet coke!!']
PROXY, CUST, PHONE, EMAIL, SIZE, TOP, TIME, COMM = pList

pList


['77.88.45.80:8080',
 'Jon Doe',
 '555-867-5309',
 'Jon.Doe@Amail.com',
 'large',
 'mushroom',
 '12:00',
 'Make sure that coke is a diet coke!!']
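One thing to watch with unpacking is that the number of names on the left must match the list length exactly. If it does not, Python raises ValueError. Python 3 also lets you take the first item and collect the rest with a starred name. This is a sketch on a trimmed, made-up list.

```python
# Stand-in list (trimmed); the proxy entry is a made-up host:port
pList = ['77.88.45.80:8080', 'Jon Doe', '555-867-5309']

message = None
try:
    PROXY, CUST = pList          # 2 names but 3 items
except ValueError as err:
    message = str(err)
print(message)

# Python 3 extended unpacking: first item into PROXY, the rest into a list
PROXY, *order = pList
print(PROXY)   # 77.88.45.80:8080
print(order)   # ['Jon Doe', '555-867-5309']
```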

Now we can use dynamic parameters…

 

# Requests expects a dict mapping scheme to proxy URL, not a bare string
PROXIES = {"http": "http://" + PROXY, "https": "http://" + PROXY}
PARAMS = {
    'custname': CUST,
    'custtel': PHONE,
    'custemail': EMAIL,
    'size': SIZE,
    'topping': TOP,
    'delivery': TIME,
    'comments': COMM
}

try:
    r = requests.post("http://127.0.0.1:5000/post", data=PARAMS, proxies=None, timeout=5)
    print(r.text)
    r.close()  # close inside try: if the post fails, r is never assigned

except Exception as postError:
    print('You can put whatever on top of the real error.')
    print(postError)


{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "comments": "Make sure that coke is a diet coke!!",
    "custemail": "Jon.Doe@Amail.com",
    "custname": "Jon Doe",
    "custtel": "555-867-5309",
    "delivery": "12:00",
    "size": "large",
    "topping": "mushroom"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
    "Content-Length": "162",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "127.0.0.1:5000",
    "User-Agent": "python-requests/2.6.2 CPython/3.4.3 Windows/7"
  },
  "json": null,
  "origin": "127.0.0.1",
  "url": "http://127.0.0.1:5000/post"
}
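If the field names are in the same order as the values, you can also build the same dict without naming every variable by zipping the two lists together. This is a sketch that assumes that ordering. FIELDS is a hypothetical list written out by hand.

```python
# Assumption: FIELDS lists the form field names in the same order as pList[1:]
FIELDS = ['custname', 'custtel', 'custemail', 'size', 'topping', 'delivery', 'comments']
pList = ['77.88.45.80:8080', 'Jon Doe', '555-867-5309', 'Jon.Doe@Amail.com',
         'large', 'mushroom', '12:00', 'Make sure that coke is a diet coke!!']

PROXY = pList[0]
PARAMS = dict(zip(FIELDS, pList[1:]))   # pair each field name with its value
print(PARAMS['custname'])  # Jon Doe
```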

When we are done with our variables, we can delete them with the “del” keyword…

 

PARAMS


{'comments': 'Make sure that coke is a diet coke!!',
 'custemail': 'Jon.Doe@Amail.com',
 'custname': 'Jon Doe',
 'custtel': '555-867-5309',
 'delivery': '12:00',
 'size': 'large',
 'topping': 'mushroom'}


del PARAMS

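This removes the name. Once nothing else refers to the object, Python can free its memory. It is handy for large variables and iterables you no longer need in a long notebook session; in short scripts Python cleans up on its own.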

 

PARAMS


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-15-6908596e41fe> in <module>()
----> 1 PARAMS

NameError: name 'PARAMS' is not defined
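To make clear what del actually does: it unbinds a name, and the object stays alive as long as another name still refers to it. This is a small sketch with a made-up dict.

```python
PARAMS = {"custname": "Jon Doe", "size": "large"}
alias = PARAMS      # a second name for the same dict object
del PARAMS          # removes the name PARAMS, not the dict

print(alias["size"])  # large (the dict is still reachable through alias)

missing = False
try:
    PARAMS
except NameError:
    missing = True
print(missing)  # True
```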


Scraping a Form for Input Fields Python 3.4

xpath, python, regex, ipython notebook

Form scrape using Regex and some Xpath.


 

import requests
import regex                     # third-party: pip install regex
from pprint import pprint
from lxml import html


URL = 'http://httpbin.org/forms/post'
resp = requests.get(URL)
respText = resp.text             # keep a copy of the page HTML
resp.close()
print(resp.status_code)


200


respTree = html.fromstring(respText)
inputs = respTree.xpath("//input")
pprint(inputs)


[<InputElement 4657778 name='custname' type='text'>,
 <InputElement 4657868 name='custtel' type='tel'>,
 <InputElement 4657958 name='custemail' type='email'>,
 <InputElement 46579a8 name='size' type='radio'>,
 <InputElement 46579f8 name='size' type='radio'>,
 <InputElement 4660278 name='size' type='radio'>,
 <InputElement 46609f8 name='topping' type='checkbox'>,
 <InputElement 4660a98 name='topping' type='checkbox'>,
 <InputElement 4660ae8 name='topping' type='checkbox'>,
 <InputElement 4660b38 name='topping' type='checkbox'>,
 <InputElement 4660b88 name='delivery' type='time'>]


print(type(inputs))
print(type(inputs[0]))


<class 'list'>
<class 'lxml.html.InputElement'>


for x in inputs:
    print(x)
   


<InputElement 4657778 name='custname' type='text'>
<InputElement 4657868 name='custtel' type='tel'>
<InputElement 4657958 name='custemail' type='email'>
<InputElement 46579a8 name='size' type='radio'>
<InputElement 46579f8 name='size' type='radio'>
<InputElement 4660278 name='size' type='radio'>
<InputElement 46609f8 name='topping' type='checkbox'>
<InputElement 4660a98 name='topping' type='checkbox'>
<InputElement 4660ae8 name='topping' type='checkbox'>
<InputElement 4660b38 name='topping' type='checkbox'>
<InputElement 4660b88 name='delivery' type='time'>

You need to convert an element to a string before you can split it into a list…

 

firstA = inputs[0]
firstB = str(inputs[0])
print(type(firstA))
print(type(firstB))


<class 'lxml.html.InputElement'>
<class 'str'>


itemSplit = firstB.split()
itemSplit


['<InputElement', '4657778', "name='custname'", "type='text'>"]

Now you can get at the name and type.

Notice… I used a capital T in Type, because “type” is a built-in Python function; naming a variable type would shadow it.

 

name = itemSplit[2]
Type = itemSplit[3]

print(name)
print(Type)


name='custname'
type='text'>
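Splitting the string works, but lxml's InputElement already exposes .name and .type properties, and .get() reads any attribute. The .type property falls back to 'text' when the tag has no type attribute. That is why custname showed type='text' above even though the HTML never says so. This is a sketch on a small, made-up snippet.

```python
from lxml import html

# Assumption: a trimmed copy of the httpbin form
snippet = '''
<form method="post" action="/post">
  <input name="custname">
  <input type=tel name="custtel">
  <input type=radio name=size value="small">
</form>
'''
tree = html.fromstring(snippet)

# (name, type, value attribute) for every <input>; value is None when absent
fields = [(el.name, el.type, el.get("value")) for el in tree.xpath("//input")]
print(fields)
# [('custname', 'text', None), ('custtel', 'tel', None), ('size', 'radio', 'small')]
```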

Or just regex it…

regex.findall returns a list of every match, so grab the first item.

 

c = regex.findall(r"(?<=name=').*?(?=')", firstB)
print(c)

print(type(c))
print(c[0])


['custname']
<class 'list'>
custname


t = regex.findall(r"(?<=type=').*?(?=')", firstB)
print(t[0])


text

Now you can loop through the inputs list, convert each element to a string, and add the pieces to another list, or…

just Xpath the //form and regex what you need.
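Here is a sketch of the loop version, run on a small made-up snippet. It uses the standard-library re module (the same patterns work with regex) and reads the name and type out of each element's string form. Treat it as fragile, because it depends on the exact repr format lxml prints.

```python
import re

from lxml import html

snippet = '<form><input name="custname"><input type=tel name="custtel"></form>'
inputs = html.fromstring(snippet).xpath("//input")

pairs = []
for el in inputs:
    text = str(el)   # e.g. "<InputElement 7f... name='custname' type='text'>"
    name = re.findall(r"(?<=name=').*?(?=')", text)[0]
    kind = re.findall(r"(?<=type=').*?(?=')", text)[0]
    pairs.append((name, kind))

print(pairs)  # [('custname', 'text'), ('custtel', 'tel')]
```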

Let’s put everything into a list with regex instead.

But first, I will show you the form real quick…

 

form = respTree.xpath("//form[@method='post']")
print(type(form))
print(type(form[0]))
print(str(form[0]))


<class 'list'>
<class 'lxml.html.FormElement'>
<Element form at 0x54d0c28>

Not what we expected

Hmmm… Well, this is a pain!! Let’s just try regex, and I will explain all that XPath stuff later… I'll give you a hint though: the “IO” package/module.
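If you want to stay with lxml for a moment, here is a sketch on a trimmed, made-up copy of the page. html.tostring() serializes an element back to markup instead of printing its repr. Also, the //input query earlier skipped the comments field because it is a <textarea>; an XPath union (|) picks up both kinds of tags.

```python
from lxml import html

# Assumption: a trimmed copy of the httpbin form; note the <textarea>
page = """<html><body>
<form method="post" action="/post">
  <input name="custname">
  <textarea name="comments"></textarea>
</form>
</body></html>"""

tree = html.fromstring(page)
form = tree.xpath("//form[@method='post']")[0]

# tostring() gives the element's markup; encoding="unicode" returns a str
markup = html.tostring(form, encoding="unicode")
print(markup)

# Union of two paths, returned in document order
names = [el.get("name") for el in form.xpath(".//input | .//textarea")]
print(names)  # ['custname', 'comments']
```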

 

allTypes = regex.findall(r"(?<=type=').*?(?=')", resp.text)
allTypes


[]

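Oops! What happened?

It's not because we closed the connection like good boys and girls. Without stream=True, Requests reads the whole body before it hands you the response, so resp.text still works after resp.close().

Keeping a copy in respText is still a good habit, though.

So what went wrong?

Look at the regex closely.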

Here is the HTML so we can see what we are doing.

 

pprint(respText)


('<!DOCTYPE html>\n'
 '<html>\n'
 '  <head>\n'
 '  </head>\n'
 '  <body>\n'
 '  <!-- Example form from HTML5 spec '
 "http://www.w3.org/TR/html5/forms.html#writing-a-form's-user-interface -->\n"
 '  <form method="post" action="/post">\n'
 '   <p><label>Customer name: <input name="custname"></label></p>\n'
 '   <p><label>Telephone: <input type=tel name="custtel"></label></p>\n'
 '   <p><label>E-mail address: <input type=email '
 'name="custemail"></label></p>\n'
 '   <fieldset>\n'
 '    <legend> Pizza Size </legend>\n'
 '    <p><label> <input type=radio name=size value="small"> Small '
 '</label></p>\n'
 '    <p><label> <input type=radio name=size value="medium"> Medium '
 '</label></p>\n'
 '    <p><label> <input type=radio name=size value="large"> Large '
 '</label></p>\n'
 '   </fieldset>\n'
 '   <fieldset>\n'
 '    <legend> Pizza Toppings </legend>\n'
 '    <p><label> <input type=checkbox name="topping" value="bacon"> Bacon '
 '</label></p>\n'
 '    <p><label> <input type=checkbox name="topping" value="cheese"> Extra '
 'Cheese </label></p>\n'
 '    <p><label> <input type=checkbox name="topping" value="onion"> Onion '
 '</label></p>\n'
 '    <p><label> <input type=checkbox name="topping" value="mushroom"> '
 'Mushroom </label></p>\n'
 '   </fieldset>\n'
 '   <p><label>Preferred delivery time: <input type=time min="11:00" '
 'max="21:00" step="900" name="delivery"></label></p>\n'
 '   <p><label>Delivery instructions: <textarea '
 'name="comments"></textarea></label></p>\n'
 '   <p><button>Submit order</button></p>\n'
 '  </form>\n'
 '  </body>\n'
 '</html>')

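Notice the quotes? The page uses double quotes (or none at all), not the single quotes we saw in the element strings.

I switched the regex to double quotes, and now it works!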

 

allNames = regex.findall(r'(?<=name=").*?(?=")', respText)
allNames


['custname',
 'custtel',
 'custemail',
 'topping',
 'topping',
 'topping',
 'topping',
 'delivery',
 'comments']


allValues = regex.findall(r'(?<=value=").*?(?=")', respText)
allValues


['small', 'medium', 'large', 'bacon', 'cheese', 'onion', 'mushroom']


allTypes = regex.findall(r'(?<=type=).*?(?=\s)', respText)
allTypes


['tel',
 'email',
 'radio',
 'radio',
 'radio',
 'checkbox',
 'checkbox',
 'checkbox',
 'checkbox',
 'time']

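This is not looking good, my lists are uneven 😦 The HTML explains why: name=size has no quotes, so the name regex skips the three size radio buttons. The custname input has no type attribute at all. Only the radio buttons and checkboxes have a value.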

 

print('Names ' + str(len(allNames)))
print('Types ' + str(len(allTypes)))
print('Values ' + str(len(allValues)))


Names 9
Types 10
Values 7

Notice I converted the integers into strings there?

The “len” function returns an int, and + won't join a str and an int, so str() does the conversion first.
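This sketch shows the TypeError you get without str(), plus two ways that do the conversion for you. It uses a stand-in list.

```python
allNames = ['custname', 'custtel', 'custemail']   # stand-in list

try:
    label = 'Names ' + len(allNames)   # str + int raises TypeError
except TypeError:
    label = 'Names ' + str(len(allNames))   # so convert explicitly

same = 'Names {}'.format(len(allNames))     # str.format converts for you
print(label)                    # Names 3
print(same)                     # Names 3
print('Names', len(allNames))   # print() converts each argument: Names 3
```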

 

allLabels = regex.findall(r'(?<=<label>).*?(?=</label>)', respText)
allLabels


['Customer name: <input name="custname">',
 'Telephone: <input type=tel name="custtel">',
 'E-mail address: <input type=email name="custemail">',
 ' <input type=radio name=size value="small"> Small ',
 ' <input type=radio name=size value="medium"> Medium ',
 ' <input type=radio name=size value="large"> Large ',
 ' <input type=checkbox name="topping" value="bacon"> Bacon ',
 ' <input type=checkbox name="topping" value="cheese"> Extra Cheese ',
 ' <input type=checkbox name="topping" value="onion"> Onion ',
 ' <input type=checkbox name="topping" value="mushroom"> Mushroom ',
 'Preferred delivery time: <input type=time min="11:00" max="21:00" step="900" name="delivery">',
 'Delivery instructions: <textarea name="comments"></textarea>']
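One way to keep the lists even is to grab each control tag first and then read its attributes, whether they are quoted or not. That way the name, type and value of one field always stay on the same row. This sketch uses re and a made-up trimmed page. It defaults the type to 'text' for inputs, the same as lxml does. Regex on HTML is fragile, so on real pages an HTML parser is the safer route.

```python
import re

# Assumption: a trimmed copy of the form markup printed above
page = '''<input name="custname">
<input type=tel name="custtel">
<input type=radio name=size value="small">
<textarea name="comments"></textarea>'''

# One match per form control: the tag name plus its raw attribute text
TAG = re.compile(r'<(input|textarea)\b([^>]*)>')
# name=value where the value is double-quoted, single-quoted, or unquoted
ATTR = re.compile(r'''(\w+)=(?:"([^"]*)"|'([^']*)'|([^\s>]+))''')


def attributes(text):
    """Turn 'type=tel name="custtel"' into {'type': 'tel', 'name': 'custtel'}."""
    found = {}
    for m in ATTR.finditer(text):
        dq, sq, bare = m.group(2), m.group(3), m.group(4)
        found[m.group(1)] = dq if dq is not None else sq if sq is not None else bare
    return found


rows = []
for tag, attr_text in TAG.findall(page):
    attrs = attributes(attr_text)
    default_type = 'text' if tag == 'input' else tag
    rows.append((attrs.get('name'), attrs.get('type', default_type), attrs.get('value')))

for row in rows:
    print(row)
```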

So what should I use?

The great thing is that it is totally up to you and your needs.

Now you know several ways, and yes, there are several more.

This regex syntax works with the built-in “re” module too.

I used the newer “regex” package because I expected it to replace “re”; so far it hasn't, and “re” is still the standard-library module.

Just “pip install regex” to get it.

As for the XPath, I will be doing a separate tutorial for this as it is more complex.

What to do now?

The obvious use is to look at the form and write the POST code by hand. Otherwise, think outside the box. 😉

Think about how you can automate this for most pages…
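Here is one possible starting point for automating this. It is a sketch on a made-up trimmed page, not a finished tool. It reads the form's method and action, creates an empty slot for every field name, and records the allowed values for radio buttons and checkboxes. The names params and choices are hypothetical.

```python
from lxml import html

# Assumption: a trimmed copy of the httpbin pizza form
page = """<html><body>
<form method="post" action="/post">
  <input name="custname">
  <input type=radio name=size value="small">
  <input type=radio name=size value="large">
  <textarea name="comments"></textarea>
</form>
</body></html>"""

form = html.fromstring(page).xpath("//form")[0]
method = form.get("method", "get").upper()
action = form.get("action")

params = {}    # one empty slot per field name, ready to fill in
choices = {}   # allowed values for radio buttons and checkboxes
for el in form.xpath(".//input | .//textarea | .//select"):
    name = el.get("name")
    if not name:
        continue
    params.setdefault(name, "")
    if el.get("type") in ("radio", "checkbox"):
        choices.setdefault(name, []).append(el.get("value"))

print(method, action)  # POST /post
print(params)          # {'custname': '', 'size': '', 'comments': ''}
print(choices)         # {'size': ['small', 'large']}
```

After you fill in params, you could send it with requests.post(base_url + action, data=params), where base_url is whatever site the form came from.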