3/5/2023

Osx renamer

How to Batch Rename Multiple Files in OS X Yosemite

As our digital libraries continue to grow, so too has the importance of efficiently managing our data. To this end, Apple has introduced several features in OS X to help users corral their pictures, documents, and other files, such as Spotlight metadata and Finder Tags. Sometimes, however, nothing beats a good file naming scheme. A consistent file name structure, including information such as the date, project, and description, can often be the best way to properly organize and locate digital data.

If you don't want to use homebrew and pdfgrep, you can do it with native OSX tools, but it is a bit harder. Basically, you make an Automator workflow to extract the text from your PDF into a temporary text document, and then you convert that from UTF-16 into ASCII and grep in there. If that makes sense to you, here are the steps:

Make an Automator workflow that extracts the text from the PDF and saves the output to /tmp. You get /tmp in the "Save Output to" field by using SHIFT COMMAND G and typing /tmp. Check the Replace Existing Files box so that it still works for your second PDF when the licence from the previous file is there. Save that "as an Application", called pdf2text.

Now you can run the following instead of pdfgrep:

    pdf2text.app/Contents/MacOS/"Application Stub" SomeFile.pdf

And it will extract the text to /tmp/licence.txt. But you are not done yet, because that is UTF-16, so, to search in the file, you need:

    iconv -c -f UTF-16 -t ASCII /tmp/licence.txt | grep -oE ""

So, now you need to put that inside the for loop of the NameByLicence bash script, in place of the pdfgrep command. At the moment the script just tells you what it would do, rather than doing anything. If the output looks correct, remove the word echo and it will actually do the name changes.
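The iconv and grep step can also be sketched in Python, which may be easier to test on a few dummy files. This is only a minimal sketch: the function name licence_from_utf16 is hypothetical, and the pattern is an assumption inferred from the single sample number 9-91-053-01-4L-04292, so it may need loosening for other documents.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed shape of a licence number, inferred only from the one sample
# 9-91-053-01-4L-04292. Adjust it after looking at more of your PDFs.
LICENCE_PATTERN = re.compile(r"\d-\d{2}-\d{3}-\d{2}-[0-9A-Z]{2}-\d{5}")


def licence_from_utf16(path: str) -> Optional[str]:
    """Return the first licence-like string in a UTF-16 text file, or None."""
    raw = Path(path).read_bytes()
    # Decode UTF-16, silently dropping malformed data (similar to `iconv -c`).
    text = raw.decode("utf-16", errors="ignore")
    # Discard everything that is not plain ASCII before searching.
    ascii_text = text.encode("ascii", errors="ignore").decode("ascii")
    match = LICENCE_PATTERN.search(ascii_text)
    return match.group(0) if match else None
```

Calling licence_from_utf16("/tmp/licence.txt") after running the pdf2text application would then return the licence number for that PDF, or None if nothing matched.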
You can extract the license number using pdfgrep, which you can install using homebrew. You would need to go to the homebrew website and copy the one-liner from there (which I don't want to put here in case it gets outdated), paste it into Terminal and run it. Then you can install pdfgrep with:

    brew install pdfgrep

Alternatively, you can download and build pdfgrep yourself if you like doing that sort of thing!

You can then extract the licence from a PDF file with:

    pdfgrep -i "License Number" SomeFile.pdf | grep -oE " "

And put that in a variable with:

    lic=$(pdfgrep -i "License Number" SomeFile.pdf | grep -oE " ")

So, if you have 7,000 PDF files in a directory, you would need to go to that directory and save the following as a script called NameByLicence:

    #!/bin/bash
    lic=$(pdfgrep -i "License Number" "$f" | grep -oE " ")

PLEASE MAKE A BACKUP FIRST AND TEST ON A FEW DUMMY FILES

Once you have saved the script, make it executable (just necessary once) with:

    chmod +x NameByLicence

Ok, I think we can do a bit better now I understand the format of the number better.

    # Don't barf if no files, or if upper or lower case names
    # Check licence is at least 15 characters, else do nothing

If it takes forever, you could also use homebrew to install GNU Parallel so you can do them all in parallel and get the job done faster. So, you would install with:

    brew install parallel

And then change the script to do just a single file like this:

    #!/bin/bash

Then you can get them all done with:

    parallel

If some other PDF documents have slightly different formatting and the Python script rename.py does not work for them, simply investigate the text contents with:

    commands.getstatusoutput('pdf2txt.py ' + file)

For your sample file, the output contained the substring \n\nLicense\nNumber\n\n9-91-053-01-4L-04292\n\nA, so I have created a regex to find that substring and get the License Number from it. Maybe you can create a more tolerant/general regex for your PDF documents by investigating more samples.
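As an illustration of what a more tolerant regex might look like, here is a minimal sketch. The pattern is an assumption rather than the one used in the original answer: it allows any whitespace or line breaks between "License", "Number" and the value, accepts either spelling of licence, and applies the "at least 15 characters" check mentioned in the script comments. The names LABELLED_LICENCE and find_licence are hypothetical.

```python
import re
from typing import Optional

# Assumption: the label can be split across lines and vary in case, and the
# value is a run of letter/digit groups joined by hyphens.
LABELLED_LICENCE = re.compile(
    r"licen[sc]e\s+number\s*:?\s*([0-9A-Z]+(?:-[0-9A-Z]+)+)",
    re.IGNORECASE,
)

# The bash script's comment says shorter matches should be ignored.
MIN_LENGTH = 15


def find_licence(pdf_text: str) -> Optional[str]:
    """Return the licence number that follows the label, or None."""
    match = LABELLED_LICENCE.search(pdf_text)
    if not match:
        return None
    licence = match.group(1)
    # Reject short matches so a stray fragment never becomes a file name.
    return licence if len(licence) >= MIN_LENGTH else None
```

Anchoring on the label instead of the exact digit layout means a number with a different grouping is still picked up, at the cost of trusting whatever follows the label.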
First please install pip:

    sudo easy_install pip

Secondly install pdfminer:

    pip install pdfminer

By using pdfminer and the standard libraries of Python I have created a script that is specific to your problem:

    import commands
    import glob
    import re

    for file in glob.glob("*.pdf"):  # For all files with the .pdf extension
        # Get text content of the pdf file; getstatusoutput returns (status, output)
        pdf_text = commands.getstatusoutput('pdf2txt.py ' + file)[1]
        # Search using a regex specific to your solution and find the license number
        result = re.search('-', pdf_text)
        if result:  # If license number has been found
            command = 'mv ' + file + ' ' + result.group(0) + '.pdf'
            commands.getstatusoutput(command)  # Rename file to LICENSE_NUMBER.pdf
            print command + ' :: Command executed.'  # Show what command has been executed

You can execute it by simply typing python rename.py. Because it uses the commands module and the print statement, it needs Python 2.

This Python script will search the directory (the same directory as itself) for files with the .pdf extension. Then it will search each file for the license number according to a regular expression that I wrote for you. Lastly, if there is a result, it will change the file's name to LICENSE_NUMBER.pdf.
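For readers on Python 3, where the commands module no longer exists, the same idea could be sketched roughly as follows. This is not the original author's script: rename_by_licence and pdf_to_text are hypothetical names, pdf2txt.py is assumed to be on the PATH after installing pdfminer, and the licence finder is passed in as a function so that you can plug in whatever regex suits your documents. By default it only reports what it would do, much like the echo in the bash version.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional, Tuple


def pdf_to_text(pdf: Path) -> str:
    """Run pdfminer's pdf2txt.py (assumed to be on PATH) and return its output."""
    done = subprocess.run(
        ["pdf2txt.py", str(pdf)], capture_output=True, text=True, check=True
    )
    return done.stdout


def rename_by_licence(
    directory: str,
    find_licence: Callable[[str], Optional[str]],
    extract_text: Callable[[Path], str] = pdf_to_text,
    dry_run: bool = True,
) -> List[Tuple[str, str]]:
    """Rename each PDF to <licence>.pdf and return the (old, new) name pairs."""
    planned = []
    for pdf in sorted(Path(directory).iterdir()):
        # Accept both .pdf and .PDF, and skip anything that is not a file.
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        licence = find_licence(extract_text(pdf))
        if not licence:
            continue
        target = pdf.with_name(licence + ".pdf")
        if target.exists():
            # Never overwrite a file that already carries this name.
            continue
        planned.append((pdf.name, target.name))
        if not dry_run:
            pdf.rename(target)
    return planned
```

Passing the file name as a list element to subprocess.run, rather than building an mv command string, avoids trouble with spaces in file names. Run it once with dry_run left at True, check the returned pairs, and only then call it again with dry_run=False on a backed-up directory.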