My new title

2016-02-15 by Code Docta in test, Tests and tagged firstpost, test

This is the body of my new post.

How to make an HTTP Post with the Requests Package in Python 3.4

2015-05-07 by Code Docta in Uncategorized | Leave a comment

python 3, HTTP Post, Ipython Notebook, Requests

A look from Ipython Notebook inverted.

Twitter @CodeDocta

Making an HTTP POST Request

import requests
from pprint import pprint

URL = 'http://127.0.0.1:5000/post'

REFERER  = 'http://127.0.0.1:5000/forms/post'
UA = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0'

HEADERS = {
    'Host': 'httpbin.org',
    'Referer': REFERER,
    'User-Agent': UA,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',  # header values should be strings; newer Requests versions reject ints
    'Content-Type': 'application/x-www-form-urlencoded'
    }
PARAMS = {
    'custname': 'Nick',
    'custtel': '777-867-5309',
    'custemail': 'noob@lala.com',
    'size': 'medium',
    'topping': 'onion',
    'delivery': '18:20',
    'comments': 'I woutld like extra peppers please.'
}

PROXY = 'proxy goes here'
PROXIES = {
  "http": "http://" + PROXY,
  "https": "http://" + PROXY,
    }

resp = requests.post(URL, data=PARAMS, headers=HEADERS, proxies=None)
resp.close()
resp.status_code

Cool, we have our 200

This means it worked and we can do something else, like save all the inputs to a database or file.

if resp.status_code == 200:
    pprint(resp.text)

('{\n'
 '  "args": {},\n'
 '  "data": "",\n'
 '  "files": {},\n'
 '  "form": {\n'
 '    "comments": "I woutld like extra peppers please.",\n'
 '    "custemail": "noob@lala.com",\n'
 '    "custname": "Nick",\n'
 '    "custtel": "777-867-5309",\n'
 '    "delivery": "18:20",\n'
 '    "size": "medium",\n'
 '    "topping": "onion"\n'
 '  },\n'
 '  "headers": {\n'
 '    "Accept": '
 '"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",\n'
 '    "Accept-Encoding": "gzip, deflate",\n'
 '    "Accept-Language": "en-US,en;q=0.5",\n'
 '    "Connection": "keep-alive",\n'
 '    "Content-Length": "148",\n'
 '    "Content-Type": "application/x-www-form-urlencoded",\n'
 '    "Dnt": "1",\n'
 '    "Host": "httpbin.org",\n'
 '    "Referer": "http://127.0.0.1:5000/forms/post",\n'
 '    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) '
 'Gecko/20100101 Firefox/37.0"\n'
 '  },\n'
 '  "json": null,\n'
 '  "origin": "127.0.0.1",\n'
 '  "url": "http://httpbin.org/post"\n'
 '}\n')

Without pprint it looks a little messy; try it on a real webpage.

pprint = pretty print
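Since the body that came back is JSON, here is a small sketch of the "save the inputs to a file" idea. It only uses the standard library. The sample text is a trimmed copy of the output above, and the function name and file name are made up for illustration. With a live response, resp.json() should give you the same dictionary that json.loads builds here.

```python
import json
import os
import tempfile

# Assumption: a trimmed copy of the JSON body printed above, standing in for resp.text.
sample_text = '''{
  "args": {},
  "form": {
    "custname": "Nick",
    "size": "medium",
    "topping": "onion"
  },
  "url": "http://httpbin.org/post"
}'''


def save_form(body_text, path):
    """Parse a JSON response body and write its "form" section to a file."""
    body = json.loads(body_text)
    form = body.get("form", {})
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(form, handle, indent=2, sort_keys=True)
    return form


# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "pizza_order.json")
saved = save_form(sample_text, out_path)
print(saved["custname"])
```

Once the data is a dictionary you can pick out single fields instead of reading one long string.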

Well, that’s it…

OK, so I lied. Let’s do some other cool stuff.

A plain if check is not the way to go here; we need some exception handling.

PARAMS = {
    'custname': 'Nick',
    'custtel': '777-867-5309',
    'custemail': 'noob@lala.com',
    'size': 'medium',
    'topping': 'mushroom',
    'delivery': '18:20',
    'comments': 'can you make it free?'
}

try:
    r = requests.post("http://httpbin.org/pos", data=PARAMS, proxies=PROXIES)
    print(r.text)
    # Close inside the try block: if the POST fails, r is never assigned.
    r.close()

except Exception as postError:
    print('AHHH... Your end of the world message!!!')
    print(postError)

AHHH... Your end of the world message!!!
HTTPConnectionPool(host='proxy goes here', port=80): Max retries exceeded with url: http://httpbin.org/pos (Caused by ProxyError('Cannot connect to proxy.', gaierror(11004, 'getaddrinfo failed')))

I suggest you read deeply into the exception handling documentation. There are many exception types, and Exception is the catch-all. At some point I will do a tutorial on this.

PARAMS = {
    'custname': 'Nick',
    'custtel': '888-867-5309',
    'custemail': 'noob@lala.com',
    'size': 'medium',
    'topping': 'mushroom',
    'delivery': '18:20',
    'comments': 'can you make it free?'
}

try:
    r = requests.post("http://127.0.0.1:5000/post", data=PARAMS, proxies=PROXIES, timeout=5)
    print(r.text)
    # Close inside the try block: if the POST fails, r is never assigned.
    r.close()

except Exception as postError:
    print('You can put whatever on top of the real error.')
    print(postError)

You can put whatever on top of the real error.
HTTPConnectionPool(host='proxy goes here', port=80): Max retries exceeded with url: http://127.0.0.1:5000/post (Caused by ProxyError('Cannot connect to proxy.', gaierror(11004, 'getaddrinfo failed')))

As you can see, I did not change the proxy to a real proxy; however, the error is quite telling.

You can see some defaults there, like the port and max retries, and I snuck in a timeout on you. There are two timeouts: one for connecting to the server and one for the read. Now that you are aware, you can go see more about them in the Requests documentation. It is done well, so don’t be scared.

Timeouts: http://docs.python-requests.org/en/latest/user/advanced/#timeouts and http://docs.python-requests.org/en/latest/user/quickstart/#timeouts

Requests has its own exception handling too: http://docs.python-requests.org/en/latest/user/quickstart/#errors-and-exceptions
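To tie the two timeouts and the Requests exception classes together, here is a rough sketch rather than the one true way. The helper name safe_post, the timeout numbers, and the fake_post stand-in are all my own assumptions; the stand-in is only there so you can run it without a network or a real proxy.

```python
import requests


def safe_post(url, data, timeout=(3.05, 10), post=requests.post):
    """POST form data and return (response, error); one of the two is None.

    timeout is a (connect, read) tuple in seconds.
    post can be swapped out so the function can be tried without a network.
    """
    try:
        resp = post(url, data=data, timeout=timeout)
        resp.raise_for_status()  # turn 4xx/5xx status codes into HTTPError
        return resp, None
    except requests.exceptions.Timeout as err:
        return None, "timeout: {}".format(err)
    except requests.exceptions.ConnectionError as err:
        # ProxyError is a subclass of ConnectionError, so it lands here too
        return None, "connection problem: {}".format(err)
    except requests.exceptions.HTTPError as err:
        return None, "bad status: {}".format(err)
    except requests.exceptions.RequestException as err:
        # base class for everything else Requests raises
        return None, "request failed: {}".format(err)


def fake_post(url, data=None, timeout=None):
    """Hypothetical stand-in for requests.post that fails like a dead proxy."""
    raise requests.exceptions.ProxyError("Cannot connect to proxy.")


resp, error = safe_post("http://127.0.0.1:5000/post", {"size": "medium"}, post=fake_post)
print(error)
```

Catching the specific classes first and RequestException last means you can react differently to a slow server and a bad proxy, instead of one catch-all message.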

Let’s see how to unpack a list into variables. I do this because named variables are easier to read when debugging; I don’t have to stop and think about whether I have the correct list item.

pList = ['77.888.45.80.8080', 'Jon Doe', '555-867-5309', 'Jon.Doe@Amail.com', 'large', 'mushroom', '12:00', 'Make sure that coke is a diet coke!!']
PROXY, CUST, PHONE, EMAIL, SIZE, TOP, TIME, COMM = pList

pList

['77.888.45.80.8080',
 'Jon Doe',
 '555-867-5309',
 'Jon.Doe@Amail.com',
 'large',
 'mushroom',
 '12:00',
 'Make sure that coke is a diet coke!!']
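A few more things unpacking can do, as a quick sketch. The short order list below is made up for the example; it is not the full pList from above.

```python
# Assumption: a shortened, made-up order row, not the full pList.
order = ['Jon Doe', '555-867-5309', 'large', 'mushroom']

# Plain unpacking needs exactly one name per item.
cust, phone, size, top = order

# A starred name collects "the rest", handy when only the first item matters.
first, *rest = order

# The wrong number of names raises ValueError instead of silently dropping data.
try:
    a, b = order
    mismatch = None
except ValueError as err:
    mismatch = str(err)

print(cust, size)
print(rest)
print(mismatch)
```

The ValueError is your friend here: if the list ever changes length, you find out right away instead of posting the phone number as the pizza size.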

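Now we can use dynamic parameters…

# requests expects a dict that maps each scheme to a proxy URL
PROXIES = {
    "http": "http://" + PROXY,
    "https": "http://" + PROXY,
}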
PARAMS = {
    'custname': CUST,
    'custtel': PHONE,
    'custemail': EMAIL,
    'size': SIZE,
    'topping': TOP,
    'delivery': TIME,
    'comments': COMM
}

try:
    r = requests.post("http://127.0.0.1:5000/post", data=PARAMS, proxies=None, timeout=5)
    print(r.text)
    # Close inside the try block: if the POST fails, r is never assigned.
    r.close()

except Exception as postError:
    print('You can put whatever on top of the real error.')
    print(postError)

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "comments": "Make sure that coke is a diet coke!!",
    "custemail": "Jon.Doe@Amail.com",
    "custname": "Jon Doe",
    "custtel": "555-867-5309",
    "delivery": "12:00",
    "size": "large",
    "topping": "mushroom"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
    "Content-Length": "162",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "127.0.0.1:5000",
    "User-Agent": "python-requests/2.6.2 CPython/3.4.3 Windows/7"
  },
  "json": null,
  "origin": "127.0.0.1",
  "url": "http://127.0.0.1:5000/post"
}

When we are done with our variables, we can delete them with the “del” statement…

PARAMS

{'comments': 'Make sure that coke is a diet coke!!',
 'custemail': 'Jon.Doe@Amail.com',
 'custname': 'Jon Doe',
 'custtel': '555-867-5309',
 'delivery': '12:00',
 'size': 'large',
 'topping': 'mushroom'}

del PARAMS

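This removes the name, and Python can free the object once nothing else refers to it. In a long notebook session that helps keep memory from building up. Use this on variables and iterables when you no longer need them.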

PARAMS

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-15-6908596e41fe> in <module>()
----> 1 PARAMS

NameError: name 'PARAMS' is not defined
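One thing worth knowing about del: it deletes the name, not necessarily the object. A tiny sketch with made-up variable names shows the difference.

```python
params = {'size': 'large', 'topping': 'mushroom'}
alias = params          # a second name for the same dict

del params              # removes the name "params" only

try:
    params
    name_gone = False
except NameError:
    name_gone = True

print(name_gone)        # the name is gone
print(alias['size'])    # the dict is still alive through alias
```

So if another variable, list, or dict still points at the object, the memory stays in use until those references go away too.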

Scraping a Form for Input Fields Python 3.4

2015-05-04 by Code Docta in Python and tagged ipython notebook, lxml, Python 3, regex, requests, scpraping forms, scraping, xpath | Leave a comment

Form scrape using Regex and some Xpath.

import requests, regex
from pprint import pprint
from lxml import html
from lxml.etree import XPath

URL = 'http://httpbin.org/forms/post'
resp = requests.get(URL)
respText = resp.text
resp.close()
print(resp.status_code)

respTree = html.fromstring(respText)
inputs = respTree.xpath("//input")
pprint(inputs)

[<InputElement 4657778 name='custname' type='text'>,
 <InputElement 4657868 name='custtel' type='tel'>,
 <InputElement 4657958 name='custemail' type='email'>,
 <InputElement 46579a8 name='size' type='radio'>,
 <InputElement 46579f8 name='size' type='radio'>,
 <InputElement 4660278 name='size' type='radio'>,
 <InputElement 46609f8 name='topping' type='checkbox'>,
 <InputElement 4660a98 name='topping' type='checkbox'>,
 <InputElement 4660ae8 name='topping' type='checkbox'>,
 <InputElement 4660b38 name='topping' type='checkbox'>,
 <InputElement 4660b88 name='delivery' type='time'>]

print(type(inputs))
print(type(inputs[0]))

<class 'list'>
<class 'lxml.html.InputElement'>

for x in inputs:
    print(x)

<InputElement 4657778 name='custname' type='text'>
<InputElement 4657868 name='custtel' type='tel'>
<InputElement 4657958 name='custemail' type='email'>
<InputElement 46579a8 name='size' type='radio'>
<InputElement 46579f8 name='size' type='radio'>
<InputElement 4660278 name='size' type='radio'>
<InputElement 46609f8 name='topping' type='checkbox'>
<InputElement 4660a98 name='topping' type='checkbox'>
<InputElement 4660ae8 name='topping' type='checkbox'>
<InputElement 4660b38 name='topping' type='checkbox'>
<InputElement 4660b88 name='delivery' type='time'>

You need to convert the element to a string before you can split it into another list…

firstA = inputs[0]
firstB = str(inputs[0])
print(type(firstA))
print(type(firstB))

<class 'lxml.html.InputElement'>
<class 'str'>

itemSplit = firstB.split()
itemSplit

['<InputElement', '4657778', "name='custname'", "type='text'>"]

Now you can get at the name and type.

Notice… I did not use a lowercase t, as “type” is a Python built-in function and reusing the name would hide it.

name = itemSplit[2]
Type = itemSplit[3]

print(name)
print(Type)

name='custname'
type='text'>
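If you want to check for yourself whether a name like type is reserved, the standard library can tell you. This is just a quick sketch; shadow_demo is a made-up function name.

```python
import builtins
import keyword

is_keyword = keyword.iskeyword('type')   # is it a reserved word?
is_builtin = hasattr(builtins, 'type')   # is it a built-in name?
print(is_keyword, is_builtin)


# Reusing the name is legal, but it hides the built-in in that scope.
def shadow_demo():
    type = 'text'                 # local name now hides builtins.type
    return builtins.type(type)    # the original is still reachable here


print(shadow_demo())
```

Python will let you assign to type without complaint, which is exactly why it is a good habit to pick another name.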

Or just regex it…

As you will see, regex.findall returns a list.

c = regex.findall(r"(?<=name=').*?(?=')", firstB)
print(c)

print(type(c))
print(c[0])

['custname']
<class 'list'>
custname

t = regex.findall(r"(?<=type=').*?(?=')", firstB)
print(t[0])

text

Now you can loop through the inputs list, convert each element to a string, and add the results to another list, or…

just XPath the //form and regex what you need.

Let’s put everything into a list with regex instead.

But first I will show you the form real quick…

form = respTree.xpath("//form[@method='post']")
print(type(form))
print(type(form[0]))
print(str(form[0]))

<class 'list'>
<class 'lxml.html.FormElement'>
<Element form at 0x54d0c28>

Not what we expected

Hmmm… Well, this is a pain!! Let’s just try regex, and I will explain all the XPath stuff later. I’ll give you a hint, though: the “IO” package/module.

allTypes = regex.findall(r"(?<=type=').*?(?=')", resp.text)
allTypes

[]

Oops! What happened?

We closed the connection like good boys and girls, but that is not the culprit: the body had already been read, so resp.text still works.

Good thing we stuck it in a variable anyway!!

Do you see what else?

Look at the regex closely.

Here is the HTML so we can see what we are doing.

pprint(respText)

('<!DOCTYPE html>\n'
 '<html>\n'
 '  <head>\n'
 '  </head>\n'
 '  <body>\n'
 '  <!-- Example form from HTML5 spec '
 "http://www.w3.org/TR/html5/forms.html#writing-a-form's-user-interface -->\n"
 '  <form method="post" action="/post">\n'
 '   <p><label>Customer name: <input name="custname"></label></p>\n'
 '   <p><label>Telephone: <input type=tel name="custtel"></label></p>\n'
 '   <p><label>E-mail address: <input type=email '
 'name="custemail"></label></p>\n'
 '   <fieldset>\n'
 '    <legend> Pizza Size </legend>\n'
 '    <p><label> <input type=radio name=size value="small"> Small '
 '</label></p>\n'
 '    <p><label> <input type=radio name=size value="medium"> Medium '
 '</label></p>\n'
 '    <p><label> <input type=radio name=size value="large"> Large '
 '</label></p>\n'
 '   </fieldset>\n'
 '   <fieldset>\n'
 '    <legend> Pizza Toppings </legend>\n'
 '    <p><label> <input type=checkbox name="topping" value="bacon"> Bacon '
 '</label></p>\n'
 '    <p><label> <input type=checkbox name="topping" value="cheese"> Extra '
 'Cheese </label></p>\n'
 '    <p><label> <input type=checkbox name="topping" value="onion"> Onion '
 '</label></p>\n'
 '    <p><label> <input type=checkbox name="topping" value="mushroom"> '
 'Mushroom </label></p>\n'
 '   </fieldset>\n'
 '   <p><label>Preferred delivery time: <input type=time min="11:00" '
 'max="21:00" step="900" name="delivery"></label></p>\n'
 '   <p><label>Delivery instructions: <textarea '
 'name="comments"></textarea></label></p>\n'
 '   <p><button>Submit order</button></p>\n'
 '  </form>\n'
 '  </body>\n'
 '</html>')

Notice the quotes?

I switched them; now we can use the regex!

allNames = regex.findall(r'(?<=name=").*?(?=")', respText)
allNames

['custname',
 'custtel',
 'custemail',
 'topping',
 'topping',
 'topping',
 'topping',
 'delivery',
 'comments']

allValues = regex.findall(r'(?<=value=").*?(?=")', respText)
allValues

['small', 'medium', 'large', 'bacon', 'cheese', 'onion', 'mushroom']

allTypes = regex.findall(r'(?<=type=).*?(?=\s)', respText)
allTypes

['tel',
 'email',
 'radio',
 'radio',
 'radio',
 'checkbox',
 'checkbox',
 'checkbox',
 'checkbox',
 'time']

This is not looking good, my lists are uneven 😦

print('Names ' + str(len(allNames)))
print('Types ' + str(len(allTypes)))
print('Values ' + str(len(allValues)))

Names 9
Types 10
Values 7

Notice I converted integers into strings there?

The “len” function returns an int, so I wrapped it in str() before joining it to the text.

allLabels = regex.findall(r'(?<=<label>).*?(?=</label>)', respText)
allLabels

['Customer name: <input name="custname">',
 'Telephone: <input type=tel name="custtel">',
 'E-mail address: <input type=email name="custemail">',
 ' <input type=radio name=size value="small"> Small ',
 ' <input type=radio name=size value="medium"> Medium ',
 ' <input type=radio name=size value="large"> Large ',
 ' <input type=checkbox name="topping" value="bacon"> Bacon ',
 ' <input type=checkbox name="topping" value="cheese"> Extra Cheese ',
 ' <input type=checkbox name="topping" value="onion"> Onion ',
 ' <input type=checkbox name="topping" value="mushroom"> Mushroom ',
 'Preferred delivery time: <input type=time min="11:00" max="21:00" step="900" name="delivery">',
 'Delivery instructions: <textarea name="comments"></textarea>']
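One way to keep the lists from going uneven is to grab each input tag first and then read the attributes inside that one tag, so name, type, and value stay together. Here is a sketch using the standard “re” module. The sample HTML is a few lines copied from the form above, the function and pattern names are my own, and the pattern only handles double-quoted or bare values.

```python
import re

# Assumption: a few lines copied from the form HTML printed earlier.
sample_html = '''
<p><label>Customer name: <input name="custname"></label></p>
<p><label>Telephone: <input type=tel name="custtel"></label></p>
<p><label> <input type=radio name=size value="small"> Small </label></p>
<p><label> <input type=checkbox name="topping" value="bacon"> Bacon </label></p>
'''

INPUT_TAG = re.compile(r'<input\b([^>]*)>')
# attribute="quoted value" or attribute=bare_value
ATTRIBUTE = re.compile(r'([\w-]+)=(?:"([^"]*)"|([^\s>]+))')


def input_fields(html_text):
    """Return one dict of attributes per <input> tag, in page order."""
    fields = []
    for tag_body in INPUT_TAG.findall(html_text):
        attrs = {}
        for name, quoted, bare in ATTRIBUTE.findall(tag_body):
            # findall gives '' for the alternative that did not match
            attrs[name] = quoted or bare
        fields.append(attrs)
    return fields


for field in input_fields(sample_html):
    print(field.get('name'), field.get('type'), field.get('value'))
```

A missing attribute shows up as None for that one input instead of shifting every list after it.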

So what should I use?

The great thing is that it is totally up to you and your needs.

Now you know several ways, and yes, there are several more.

This regex syntax is good for the “re” package too.

I used the third-party “regex” package, which is backwards-compatible with “re” and adds extra features.

Just “pip install regex” to get it.

As for the XPath, I will be doing a separate tutorial for this as it is more complex.

What to do now?

The obvious utility is to just see the fields and create the POST code manually. Otherwise, think outside the box. 😉

Think about how you can automate this for most pages…
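As one possible starting point for automating it, here is a sketch that swaps in the standard library’s html.parser instead of lxml or regex. The class name, the cut-down sample form, and the "first offered value" rule are all assumptions for illustration. It collects every named input, textarea, and select and builds a dict you could hand to requests.post as data.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the name of every input, textarea, and select on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}   # field name -> list of values offered by the page

    def handle_starttag(self, tag, attrs):
        if tag not in ('input', 'textarea', 'select'):
            return
        attributes = dict(attrs)
        name = attributes.get('name')
        if name is None:
            return
        values = self.fields.setdefault(name, [])
        value = attributes.get('value')
        if value is not None:
            values.append(value)


# Assumption: a cut-down version of the pizza form shown earlier.
sample_html = '''
<form method="post" action="/post">
 <input name="custname">
 <input type=radio name=size value="small">
 <input type=radio name=size value="large">
 <textarea name="comments"></textarea>
</form>
'''

collector = FormFieldCollector()
collector.feed(sample_html)

# Build a POST-ready dict: first offered value, or an empty string to fill in.
params = {}
for name, values in collector.fields.items():
    params[name] = values[0] if values else ''

print(params)
```

Unlike the name regex above, this also picks up the unquoted name=size and the textarea, since the parser does not care how the attributes are quoted.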

Install Python 3.4 with Anaconda 3 Distribution (Distro) Part 1

2015-04-27 by Code Docta in Python and tagged Anaconda 3, Anaconda 3 tutorial, Python 3, Python 3 download, python install tutorial, python tutorial | Leave a comment

Step 1: Download Anaconda 3 Distro and Install Python 3.4

The following tutorials are geared toward Python 3.x+ on a 64-bit Windows 7 system; however, the process is the same for Python 2.7. Go to the Continuum site and download Anaconda 3.

On the right you can see the link for Python 3

Click on “I want Python 3” link to get the correct version. Then make sure it is for 64 bit.

After you have selected the correct version click the link to download.

Click the highlighted area to download 64 bit Anaconda3 Python 3. While waiting for the file to download…

Click on Packages to see the list that comes with this distribution.

As you can see, there are over 200 packages that will be included. This saves a lot of time in the long run. This distro also enables you to update all these packages with one command. I will be showing you this in part 2.

Step 2: Running the Installer

Go to the folder where you downloaded it and double-click to install. Then click Run.

Click Next

If you agree click I agree.

Please read the agreement and if you don’t have any problems with it click agree to move forward.

Click whatever is appropriate and click next

You can either have this available to just you or to everyone who uses this computer. Choose what is appropriate for your situation.

I strongly suggest you keep the default location until you know what you are doing.

Otherwise feel free to put it wherever you wish.

Check the box highlighted and click next.

Check the box highlighted and click install. This may be checked already.

After it is installed, click Start or press the Windows key.

Then click Anaconda. You will see all the cool goodies that come with this distribution (distro) too!!

See all the cool stuff we will be playing with?

You should see the Ipython Notebook, Anaconda command prompt, Ipython Qt console (interpreter) and much more. These will be the focus of the upcoming tutorials on Python 3.4+. Please watch part 2 to see how to install, update packages and more!!

Continued in Part 2

Or visit thebotdoc Youtube channel to see all my tutorials.

You are welcome to follow me on Twitter @CodeDocta

Or please feel free to join the conversation over at our Facebook page

Code Docta

Also, please check the resources page for Python tutorials I have found helpful along the way and much more useful information.

Thank you for reading and have a Pythonic Day!
