My e-Notes about DataScience, Machine Learning, Python, Data Analytics, DataStage, DWH and ETL Concepts

Sunday, 6 March 2016

Python SyntaxError - Non-ASCII character '\xe2' in file


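If you get the error below while running your Python code:

SyntaxError: Non-ASCII character '\xe2' in file .\set_learn.py on line 32, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

the file contains a byte outside the ASCII range and no source encoding is declared. The byte \xe2 is often the first byte of a UTF-8 curly quote or dash pasted in from a web page or word processor. Here is how to resolve it, first in Notepad++ (steps 1 and 2) and then on Linux (step 3):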

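1. By converting the text encoding

Go to Menu -> Encoding -> Convert to UTF-8

and save the file. If Python 2 still reports the error after the conversion, add the encoding declaration from PEP 263, # -*- coding: utf-8 -*-, as the first or second line of the script.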
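
A minimal sketch of a script that carries the PEP 263 declaration. The variable name and the string are only illustrative. Python 3 already assumes UTF-8 for source files, so the declaration mainly matters on Python 2, which is where this error message comes from.

```python
# -*- coding: utf-8 -*-
# The declaration above has to be on the first or second line of the file.

# A curly apostrophe (U+2019) is stored as the bytes E2 80 99 in UTF-8,
# which is the kind of character that triggers the '\xe2' error.
message = u"It’s fine to keep this character once the encoding is declared"

print(len(message))
```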


2. By searching for the non-ASCII characters and replacing them with nothing

Use Ctrl-H (Replace)
Find \xe2+
or Find [^\x00-\x7F]+ to delete all non-ASCII characters
Leave "Replace with" empty
Select Search mode as Regular expression
Click Replace All
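
The same regular expression can be tried outside Notepad++. A minimal Python 3 sketch, assuming the text is already loaded as a string; strip_non_ascii and sample are illustrative names only:

```python
import re

# Matches one or more characters outside the ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(text):
    """Return text with every run of non-ASCII characters deleted."""
    return NON_ASCII.sub("", text)


# The curly quotes around value are removed, the ASCII text is kept.
sample = "name = \u201cvalue\u201d"
print(strip_non_ascii(sample))
```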


3. In Linux

a. Find the lines which have bad characters (LC_ALL=C makes grep compare raw bytes instead of decoded characters) -
LC_ALL=C grep -nP "[\x80-\xFF]" INPUT_FILE
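
If grep -P is not available, the same check can be sketched in Python. This is only an illustration: find_non_ascii_lines is a hypothetical helper name, and the file is read as raw bytes so that its encoding does not matter.

```python
def find_non_ascii_lines(path):
    """Return (line_number, raw_line) pairs for lines holding bytes >= 0x80."""
    hits = []
    with open(path, "rb") as handle:  # bytes mode, so nothing is decoded
        for number, raw in enumerate(handle, start=1):
            if any(byte > 0x7F for byte in bytearray(raw)):
                hits.append((number, raw.rstrip(b"\r\n")))
    return hits
```

Calling find_non_ascii_lines("set_learn.py") would return the offending line numbers together with the raw bytes of each line.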


b. Some ways to remove them (each one writes the result to clean-file and leaves INPUT_FILE unchanged)
LC_ALL=C sed 's/[^[:print:]]//g' INPUT_FILE > clean-file
LC_ALL=C sed 's/[\x80-\xff]//g' INPUT_FILE > clean-file
tr -cd '\11\12\15\40-\176' < INPUT_FILE > clean-file

** Word of caution - these commands work on character ranges, so they may remove characters which you need (the [:print:] variant also removes tabs). Take a backup of your file first.
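
If deleting is too blunt, one option is to swap the common typographic characters for their ASCII equivalents before dropping the rest. A rough Python 3 sketch, assuming the file is UTF-8. REPLACEMENTS, to_ascii and clean_file are hypothetical names, and the mapping is only a starting point to extend for your own file.

```python
import io

# Hypothetical mapping of common pasted-in characters to ASCII stand-ins.
REPLACEMENTS = {
    u"\u2018": u"'",   # left single quote
    u"\u2019": u"'",   # right single quote / apostrophe
    u"\u201c": u'"',   # left double quote
    u"\u201d": u'"',   # right double quote
    u"\u2013": u"-",   # en dash
    u"\u2014": u"-",   # em dash
}


def to_ascii(text):
    """Replace known characters, then drop any remaining non-ASCII ones."""
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    return text.encode("ascii", "ignore").decode("ascii")


def clean_file(src, dst):
    """Write an ASCII-only copy of src to dst; src itself is left untouched."""
    with io.open(src, encoding="utf-8", newline="") as handle:
        text = handle.read()
    with io.open(dst, "w", encoding="ascii", newline="") as handle:
        handle.write(to_ascii(text))
```

Writing to a separate output file keeps the original as the backup that the caution above asks for.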



